Thursday, August 22, 2013

hSNP project, 0822 notes

Current goal: 
 Parse the dbSNP XML data files into CSV files, map SNPs to genes, and then perform network permutation analysis. Do diseases cluster differently on the gene network between races?

 dbSNP XML -> parse coordinates --> genes --> map to gene network --> association patterns?

0) Figure out collaboration on project using GitHub.

1a) Download dbSNP data from

1b) Download the human reference genome.
(There are probably better sources for this. Hong needs to check the right version for dbSNP).

1c) Register and download the OMIM database. OMIM is another database of human disease information; it cannot be shared with third parties.

2a) Parse dbSNP XML files into CSV or tab-delimited format. This can probably be done using Python and Biopython.

2b) Map SNP coordinates and disease associations to genes using the human reference genome.

2c) Cross-validation of dbSNP disease associations with OMIM.

3) Network permutation test of disease clustering by race. Are there racial differences from a network perspective?
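
A minimal sketch of the permutation test in step 3, assuming the gene network is an undirected edge list and a disease (or one population's disease set) is a list of gene names. The clustering statistic (number of edges inside the gene set), the function names, and the toy network with genes G1..G12 are all illustrative assumptions, not a method fixed in these notes.

```python
import random


def within_set_edges(edges, genes):
    """Count network edges whose two endpoints both lie in `genes`."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def permutation_p_value(edges, all_genes, disease_genes, n_perm=1000, seed=0):
    """One-sided test: is the disease set more interconnected than random sets?"""
    rng = random.Random(seed)
    observed = within_set_edges(edges, disease_genes)
    k = len(set(disease_genes))
    pool = sorted(all_genes)
    hits = 0
    for _ in range(n_perm):
        # Null model: a random gene set of the same size.
        if within_set_edges(edges, rng.sample(pool, k)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)


def toy_network():
    """Hypothetical network: a 4-gene clique attached to a chain."""
    genes = ["G%d" % i for i in range(1, 13)]
    clique = [(genes[i], genes[j]) for i in range(4) for j in range(i + 1, 4)]
    chain = [(genes[i], genes[i + 1]) for i in range(3, 11)]
    return genes, clique + chain


if __name__ == "__main__":
    genes, edges = toy_network()
    print(permutation_p_value(edges, genes, ["G1", "G2", "G3", "G4"]))
```

Sampling random gene sets ignores node degree; a degree-preserving null model may be more appropriate for real networks. To compare races, the same test could be run separately on each population's disease gene set.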

Hong needs to check
dbGaP,  Genetics association database

Which human reference genome should we use for dbSNP? Or does it not matter?
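
SNP coordinates are only meaningful relative to a particular assembly, so the gene table and the SNP positions have to come from the same build. With that assumed, step 2b reduces to an interval lookup. The sketch below uses made-up gene names and coordinates (1-based, inclusive); the function names are hypothetical.

```python
import bisect


def build_index(genes):
    """genes: iterable of (chrom, start, end, name) tuples."""
    index = {}
    for chrom, start, end, name in genes:
        index.setdefault(chrom, []).append((start, end, name))
    for intervals in index.values():
        intervals.sort()  # sort by start coordinate
    return index


def genes_at(index, chrom, pos):
    """Return names of all genes on `chrom` whose interval contains `pos`."""
    intervals = index.get(chrom, [])
    # Every interval before `hi` starts at or before pos.
    hi = bisect.bisect_right(intervals, (pos, float("inf"), ""))
    # Simple linear scan over the candidates; fine for a sketch,
    # an interval tree would scale better.
    return [name for start, end, name in intervals[:hi] if end >= pos]


if __name__ == "__main__":
    toy = [("chr1", 100, 200, "GENE_A"),
           ("chr1", 150, 300, "GENE_B"),
           ("chr2", 50, 80, "GENE_C")]
    idx = build_index(toy)
    print(genes_at(idx, "chr1", 175))
```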

Wednesday, August 21, 2013

Create hSNP and disease GitHub repository

cd hSNP_disease/ 

git init
git add *

git remote add origin

git config http.postBuffer 209715200

git commit -m 'first'

git push --force origin master 

rwwltdsmac110:hSNP_disease hongqin$ git push  --force origin master
Counting objects: 7, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (7/7), 225.04 MiB | 39.85 MiB/s, done.
Total 7 (delta 0), reused 0 (delta 0)

human disease genes

6548 "disease genes" are currently present in the GeneCards database  

human genome facts

From wikipedia:

The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and to contain 20,000–25,000 distinct genes.[1][2][3]

Biopython and SNP

2010/4/13 Tiago Antão <tiagoantao at>:
> Hi,
> Just a simple question:
> Entrez SNP seems to return ASN.1 format only.
> Is there any way to parse this in biopython? I've looked at SeqIO and
> found nothing...
> I can think of tools to process this outside, but I am just curious if
> this is processed natively with Biopython (being an exposed NCBI
> format...)
> Many thanks,
> Tiago
> PS - You can easily try this with:
> hdl = Entrez.efetch(db="snp", id="3739022")
> print

Hi Tiago,

No, we don't support ASN.1, and I don't see any good reason to - I
think it would only be NCBI ASN.1 we'd be interested in, and I think
that all their resources are available in other easier to use formats
like XML these days.

See also

Instead ask Entrez to give you the SNP data as XML:

Entrez.efetch(db="snp", id="3739022", retmode="xml")

Hopefully the SNP XML file has everything in it.

You have a choice of Python XML parsers to use. However, the
Bio.Entrez parser doesn't like this XML. This appears to be related
(or caused by) a known NCBI bug. See
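
Note to self: since Bio.Entrez may not parse this XML, one option is the standard-library parser. Below is a minimal sketch of streaming XML into CSV with xml.etree.ElementTree.iterparse. The element and attribute names in TOY_XML (SnpSet, Snp, Gene, id, chrom, pos) are hypothetical placeholders, not the real dbSNP schema, and would have to be replaced after inspecting an actual file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical toy document; NOT the real dbSNP schema.
TOY_XML = """<SnpSet>
  <Snp id="rs0000001" chrom="1" pos="12345"><Gene>GENE_A</Gene></Snp>
  <Snp id="rs0000002" chrom="2" pos="67890"><Gene>GENE_B</Gene></Snp>
</SnpSet>"""


def snp_xml_to_csv(xml_source, out):
    """Stream records from xml_source (path or binary file) into CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["snp_id", "chrom", "pos", "gene"])
    n = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "Snp":
            writer.writerow([elem.get("id"), elem.get("chrom"),
                             elem.get("pos"), elem.findtext("Gene", default="")])
            elem.clear()  # free memory; matters for very large files
            n += 1
    return n


if __name__ == "__main__":
    buf = io.StringIO()
    snp_xml_to_csv(io.BytesIO(TOY_XML.encode("utf-8")), buf)
    print(buf.getvalue())
```

Real dbSNP XML uses namespaces, so tag comparisons would likely need the namespace prefix as well.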


Thursday, August 15, 2013

cell size control

cell cycle regulation OR growth rate regulation?

Do smaller cells have longer G1 phases?
negative correlation between G1 length and cell size

antibody screen for size sensors

identify drugs that weaken the negative correlation between size and cell cycle, and between size and growth rate.

mTOR affects size and growth rate.

some kind of integral feedback loop might be involved.

Albert Goldbeter, Vatech talk

Gerard and Goldbeter 2012 argued that the cell cycle is a limit cycle. This seems to imply that the cell cycle would go on indefinitely. An aging model would require a damped cycle.
Gerard Goldbeter Math Model Nat Phenom, 2012
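
To make the limit-cycle versus damped-cycle distinction concrete, here is a toy sketch. It is not the Gerard-Goldbeter model; it integrates only the amplitude equation dr/dt = r(mu - r^2) of a generic oscillator near a Hopf bifurcation, chosen here as an assumption for illustration. For mu > 0 the amplitude settles at sqrt(mu) (sustained oscillation); for mu < 0 it decays to zero (damped oscillation).

```python
def final_amplitude(mu, r0=0.5, dt=0.01, steps=5000):
    """Forward-Euler integration of dr/dt = r * (mu - r**2)."""
    r = r0
    for _ in range(steps):
        r += dt * r * (mu - r * r)
    return r


if __name__ == "__main__":
    print("mu = +1.0:", final_amplitude(1.0))   # sustained, limit cycle
    print("mu = -0.2:", final_amplitude(-0.2))  # damped
```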

Tuesday, August 13, 2013

Inquiry based learning

Some practices:

This seems to be rooted in the Socratic method.

Literature review on a research topic.  Writing a term paper.

Case studies.

Capstone experiences

Blended classrooms, worksheet for a group of 4-5 students, with information. The worksheet will guide students to develop concepts.

Case: Dr. Oz's show, is there arsenic in our juice?

SWOT analysis

Lee, Virginia S. What is inquiry-guided learning?

Micro UN symposium

Microbiology,  yeast genetics,

setting policies of the microbe world

signal transduction

war and peace theme

shopping list, food recipe

Sunday, August 11, 2013

Banner tutorial

How do I create a new requisition? Please use the link below which will give you step by step directions:

Banner Tutorial:

Who can I buy goods and services from? Please use the preferred vendor list for a list by commodities and vendors. Click on commodity and all information is there to assist you in completing your requisition and acquiring a Purchase Order.

Preferred Vendor List

How does a new vendor get approved to do business with Spelman College? How do I requisition a Guest Speaker? The first step is to complete the attached vendor profile. After the vendor or Speaker is approved a Banner requisition is required and a PO must be created. Special Note: Completion of a profile does not guarantee approval.

Vendor Profile

To look for vendor codes, go to banner database,
enter ARMARK%, F8 to execute the query
F7 to clear the form

Saturday, August 10, 2013

Tony Hui, UCSD

Schaechter, J Gen Microbiol 1958, fraction of proteins (R-proteins) ~ growth rate

Bacterial growth laws

Three modes of growth limitation:

C-limitation -| catabolic section
A-lim  --| anabolic section
R-lim  ---| polymerization section

Mass spec -> changes of groups of proteins

Proteome partition model
 Scott, Science 2010,

  You, Nature 2013

6 Coarse grained proteome sections.

Spencer, qbio, proliferation-quiescence decision

Spencer, Tobias Meyer, Cell, in press

Live-cell sensor for CDK2 activity,

CDK2-Venus green in nucleus

S, G2/M
CDK2-Venus in cytosol

p21 levels are different in CDK2-inc and CDK2-low cells.
  p21---| CDK2

CDK2-low is in transient G0 state.

E coli aging !!

Suckjoon Jun and Minsu Kim: 

CFU viability is exponential, but RLS is Gompertz. When SOS is knocked out, RLS becomes exponential.
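
For reference, a small sketch of the two survival models named here. The exponential model has a constant hazard; the Gompertz model has a hazard a*exp(b*t) that rises with age. The parameter values are arbitrary illustrations, not fitted to any E coli data.

```python
import math


def survival_exponential(t, rate):
    """S(t) = exp(-rate * t); constant hazard, i.e. no aging."""
    return math.exp(-rate * t)


def survival_gompertz(t, a, b):
    """S(t) = exp(-(a/b) * (exp(b*t) - 1)); hazard a*exp(b*t) grows with age."""
    return math.exp(-(a / b) * (math.exp(b * t) - 1.0))


def hazard(survival, t, dt=1e-4, **params):
    """Numerical hazard -d ln S / dt by forward difference."""
    s0 = survival(t, **params)
    s1 = survival(t + dt, **params)
    return -(math.log(s1) - math.log(s0)) / dt


if __name__ == "__main__":
    for t in (1.0, 10.0):
        print(t, hazard(survival_exponential, t, rate=0.1),
              hazard(survival_gompertz, t, a=0.01, b=0.2))
```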

RLS for E coli is measured in good growth conditions, and CLS for E coli is measured in nutrient-depleted conditions. It is possible that CLS conditions overwhelm network buffering and result in a network with too little redundancy. On the other hand, aging is able to manifest itself during growth because the external 'insults' are below the 'critical' point for network robustness.

Craig Skinner and Su-ju Lin, 2010 AMB review on CR

CR on E coli aging

Friday, August 9, 2013

Unscripted classrooms

Zip-Zap, introduction

Promotion commercials for a bacteria, pathogen, virus.

Thursday, August 8, 2013

Benjamin Machta, Princeton, information flow in plasma membrane.

Membrane is nearly critical.

Phase transition in synthetic lipid membranes
 2nd order critical point

Allosteric regulation

General anesthesia -> membrane involvement? However, proteins are unquestionably involved.

Lippincott-Schwartz, Imaging

Photoactivatable fluorescent proteins

Patterson, Lippincott-Schwartz 2002, Science.

Fitting a 2D Gaussian to the photoactivated signal to find the centroid of the source --> better resolution.
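
A minimal sketch of why localization beats the width of the spot. As a simplified stand-in for a full 2D Gaussian fit, it estimates the source position as the mean of the detected photon positions, which is the maximum-likelihood centre for an isotropic Gaussian spot when pixelation and background are ignored. The spot width, photon count, and units are made-up assumptions.

```python
import random


def simulate_photons(center, sigma, n, rng):
    """Draw n photon positions from an isotropic 2D Gaussian spot."""
    cx, cy = center
    return [(rng.gauss(cx, sigma), rng.gauss(cy, sigma)) for _ in range(n)]


def centroid(points):
    """Mean position of the detected photons."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


if __name__ == "__main__":
    rng = random.Random(1)
    # Spot width 100 (say, nm) and 10000 photons: the expected error of
    # the centre is about sigma / sqrt(n) = 1 per axis.
    photons = simulate_photons((10.0, -3.0), 100.0, 10000, rng)
    print(centroid(photons))
```

The error of the estimated centre shrinks roughly as sigma / sqrt(N) with N photons, so the centre can be located far more precisely than the spot is wide.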

This method can be applied to photobleaching and improves resolution of conventional fluorescence imaging.

Assaf Rotem, Harvard, drop-based microfluidics of single-cell assay

Chewing on biology, one bit at a time -- high throughput assays at single-cell resolution using Drop-Based Microfluidics reveal novel variations in heterogeneous populations, Assaf Rotem, Harvard University

 ChIP-Seq for every cell

Rob De Boer, Utrecht University

Heterogeneous Differentiation Patterns of Individual CD8+ T Cells,
Rob De Boer, Utrecht University

Bar coding of cells using retroviral library. Deep sequencing to find distribution of cells.

50% of total immune responses derived from 5% of the cells

50% of the families contain less than 200 daughter cells.
Disparity in family sizes is established early.
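
A small sketch of how a statement like "50% of the response comes from 5% of the cells" can be computed from barcode counts, assuming one family size per barcode. The function name and the example sizes are illustrative, not data from the talk.

```python
def fraction_of_families_for_half(sizes):
    """Smallest fraction of families that together hold >= 50% of all cells."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    running = 0
    for i, size in enumerate(ordered, start=1):
        running += size
        if 2 * running >= total:
            return i / len(ordered)
    return 1.0  # only reached for empty input


if __name__ == "__main__":
    # One dominant family among 100: a single family covers half the cells.
    print(fraction_of_families_for_half([1000] + [10] * 99))
    # Equal family sizes: half of the families are needed.
    print(fraction_of_families_for_half([5] * 10))
```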

Zaida Luthey-Schulten, stochastic simulations of cellular processes

UIUC, center of physics of living cells.

Noise contributions in an inducible genetic switch: A whole cell simulation study,

Choi, Cai, Xie, 2008, Science, a stochastic single-molecule event triggers phenotype switching of a bacterial cell.

Tang, Earnest, Wang, Xie, Zhuang, Science 2011, nucleoid packing

Wolfgang Baumeister, Elizabeth Villa, yeast nuclear structure.

ZLS emphasizes that crowding makes a difference.

Taniguchi, ... Xie, 2010, Science
Labhsetwar, Cole, ZLS, PNAS in press

BRENDA Database,

Comparative analysis of metabolic robustness: E coli and Synechocystis, Chintan Joshi, Ashok Prasad


Power laws lead to less robust systems.

Ali Tabei, 2013, PNAS


random walks

glass properties,

S. M. A. Tabei, S. Burov, H. Y. Kim, A. Kuznetsov, T. Huynh, J.
Jureller, L. H. Philipson, A. R. Dinner, and N. F. Scherer, Proc.
Acad. Sci. USA 110, 4911 (20

Mary Teruel, Stanford, Controlling size of tissue with stochastic noise


Spalding 2008 Nature, fat cells renew at 10% per year.

PPARG -> adipogenic factors --> change of preadipocytes to adipocytes.

Park et al, Cell Rep 2012, conversion of preadipocyte to adipocyte via a bistable switch. (All or none process)
[PPARgamma] ~ [C/EBPbeta] plot to show bi-modal distribution of two cell populations.

[PPARG] ~ rosiglitazone (uM) plot shows hysteresis, in support of a bistable switch and positive feedback.

Increased noise would lead to more switches; this can be easily shown by simulation.
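
A minimal simulation sketch of that claim. It is not the PPARG model from the talk; as an assumption it uses a generic double-well system, dx = (x - x^3) dt + sigma dW, with stable states near x = -1 and x = +1, integrated with the Euler-Maruyama scheme. A switch is counted when the trajectory crosses from one well deep into the other.

```python
import math
import random


def count_switches(sigma, t_total=500.0, dt=0.01, seed=0):
    """Count well-to-well transitions of dx = (x - x**3) dt + sigma dW."""
    rng = random.Random(seed)
    x, state, switches = -1.0, -1, 0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(t_total / dt)):
        x += (x - x ** 3) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        # Thresholds at +/-0.5 avoid counting jitter around x = 0 as switches.
        if state < 0 and x > 0.5:
            state, switches = 1, switches + 1
        elif state > 0 and x < -0.5:
            state, switches = -1, switches + 1
    return switches


if __name__ == "__main__":
    for sigma in (0.2, 0.5, 0.8):
        print(sigma, count_switches(sigma))
```

With low noise the system stays in its starting well; with high noise it switches often and no longer holds a state for long.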

Q: how to increase noise experimentally?

More noise leads to stronger de-differentiation.

Too little noise, hard to switch. Too much noise, cannot hold the states.

One-feedback loop model:
 R ->    PPARG(x0)    --> Terminal
Two ODE equations.

Increased cooperativity helps, but does not solve the optimization problem.

Multi-loop systems can solve the optimization problem in theory. How to prove this in experiments?

MT uses a triple-quadrupole mass spectrometer, aiming to identify feedback loops.  This method requires a unique peptide sequence. Time course data were obtained. Seven feedback loops were identified.

Fat cells -> nuclei -> proteins ---(trypsin)---> peptides ---> mass spec


Undergraduate research conference at UTK, Nov16-17 (Sat, Sun), 2013

November 16-17, 2013. The conference will begin in the afternoon on November 16 and end by 4 pm on November 17.  Travel from Spelman to UTK takes 4 hours. So, people can leave on Sat and return on Sun.

Funds can be requested, a good opportunity for SIAM chapter.


Alex Lang, Boston Univ, epigenetic landscapes

Alex Lang, Boston University (4th graduate student)

Epigenetic landscapes provide insight into cellular programming,

Takahashi, Yamanaka 2006, converted fibroblasts to stem cells.  Terminal cells can be converted to other terminal cell types without going through the 'stem' cell stage.

Waddington landscape 1957, treats cell types as dynamic attractors.

Lang argued that the Waddington landscape is an emergent property of underlying interaction networks.

Lang: Cell type = high dimensional Ising spin vector.

Landscape construction: 
Each cell is a vector. Each cell is a basin of attraction. 
The projection method neural network is a method to take vectors and construct a landscape.
( Hopfield PNAS 1982? )

Attractors due to correlation-based effective interaction matrix.

High dimensions often give spurious attractors, which are unavoidable for frustrated systems.
Lang claimed that partially reprogrammed cells are the spurious attractors.
Lang developed a projection method to match vectors to cell types.
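
To fix ideas, a minimal sketch of cell types as attractors of a spin network. It uses the plain Hebbian rule of the standard Hopfield network, not the projection method described in the talk, and the three 16-spin "cell type" patterns are made up (chosen to be mutually orthogonal so that recall is clean).

```python
def hebbian_weights(patterns):
    """W_ij = (1/N) * sum over patterns of p_i * p_j, with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Asynchronous updates s_i <- sign(sum_j W_ij s_j) until nothing changes."""
    s = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# Hypothetical cell-type vectors (+1 = gene on, -1 = gene off).
P1 = [1] * 8 + [-1] * 8
P2 = ([1] * 4 + [-1] * 4) * 2
P3 = ([1] * 2 + [-1] * 2) * 4

if __name__ == "__main__":
    w = hebbian_weights([P1, P2, P3])
    noisy = list(P1)
    noisy[0], noisy[9] = -noisy[0], -noisy[9]  # perturb two genes
    print(recall(w, noisy) == P1)
```

A perturbed state relaxing back to the stored pattern is the basin-of-attraction picture in the notes.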

Reprogramming facts: only the final cell type matters, only one set of TFs in time.


Q-bio, Nancy Albritton, UNC, Chemistry

Nancy Albritton, UNC, Chemistry

Fluorescent substrates,

detection limits, 10^-21 moles

Kinase/Protease substrates

Fast single-cell analyses

Microfluidic device, lyse cells by laser.

Sphingosine kinase:  Single breast cancer cells, Red blood cells

Subpopulations of clinical samples can be detected. Heterogeneity in cancer cells was emphasized.

Fluorescent PIP2 reporter libraries.
metabolism of PI3K
PIP2 --(PI3K) --> PIP3 ---(PTEN)-->PIP2 ... ...

Challenges: standardization between samples. Problems with cell lines.


Wednesday, August 7, 2013

Mean field approximation and network reliability

According to the Wikipedia entry, MFT studies the behavior of large and complex stochastic models by studying a simpler model. Such models often consist of a large number of small interacting individuals.  The effect of all the other individuals on any given individual is approximated by "a single average effect", thus reducing the many-body problem to a one-body problem.
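
As a concrete instance of the one-body reduction, here is a sketch of the textbook mean-field Ising model, picked here as an illustrative assumption. Each spin sees only the average magnetization m of its z neighbours, which gives the self-consistency condition m = tanh(beta * J * z * m). The code solves it by fixed-point iteration.

```python
import math


def mean_field_magnetization(beta_j_z, m0=0.5, tol=1e-12, max_iter=100000):
    """Solve m = tanh(beta_j_z * m) by fixed-point iteration from m0."""
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta_j_z * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


if __name__ == "__main__":
    # Below beta*J*z = 1 the only solution is m = 0; above it, order appears.
    for k in (0.5, 1.5, 2.0):
        print(k, mean_field_magnetization(k))
```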

I should apply the mean field approximation in network reliability studies. The challenge is that biological networks are heterogeneous, and a simple 'average' might miss some interesting properties. In any case, this is an interesting direction that I should explore.

Bialek, Nemenman, Tishby 2008, Predictability, complexity and learning (BNT08), discussed an Ising model and used the Boltzmann distribution to describe spins {$\sigma_i$}.
The Boltzmann distribution is basically an exponential decay function of the kind often used in reliability models. So, there seems to be a natural connection between statistical physics and reliability modeling.

There may be a problem or challenge for the mean field approach to studying aging. Based on reliability models, system ages are determined by the extreme values of components. So, the mean field approximation may not capture this maximal-minimal nature of aging.
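
A quick numerical sketch of this concern, assuming the simplest reliability model: a series system that fails when its first component fails, with independent exponentially distributed component lifetimes. The system lifetime is then the minimum over components, and its mean is 1/(n * rate) rather than the component average 1/rate.

```python
import random


def mean_system_lifetime(n_components, rate, trials=20000, seed=0):
    """Monte Carlo mean lifetime of a series system (fails at first failure)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(rate) for _ in range(n_components))
    return total / trials


if __name__ == "__main__":
    print("average component:", mean_system_lifetime(1, 1.0))
    print("10-component series system:", mean_system_lifetime(10, 1.0))
```

An 'average component' description would predict a lifetime near 1/rate, whereas the extreme-value (minimum) behaviour gives a tenfold shorter lifetime for ten components.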


Fiberglass needles for yeast tetrad dissection

1. Singer Instruments
2. Cora Styles   (most responses)
3. Newport Corporation, (Item F-SBB  and others).  This is the cheapest alternative at $5.10 with free shipping.

Yeast deletion library from Invitrogen (Life Technologies)

Invitrogen Yeast Deletion Product page:

User manual PDF:

Sanger resequenced strains

Ordering information for entire set 

Individual ordering information

Q-bio, Ilya Nemenman, tutorial on information processing in biological systems

Information processing in biological systems, Ilya Nemenman

IN uses a probabilistic model for information processing.

w --> P(r|w) --> r --

Information is not context free.

IN uses variability S[w] in his model. 

$S$ seems to stand for entropy. 

With probabilistic assumptions, IN can write down models explicitly.
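
A minimal sketch of what writing the model down explicitly can look like, assuming discrete signals w and responses r. Given P(w) and the channel P(r|w), it computes the signal entropy S[w] and the mutual information I(w; r) = S[r] - S[r|w] in bits. The two-state channels in the example are illustrative assumptions.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mutual_information(p_w, p_r_given_w):
    """I(w; r) = S[r] - sum_w P(w) * S[r | w], in bits."""
    n_r = len(p_r_given_w[0])
    p_r = [sum(p_w[i] * p_r_given_w[i][k] for i in range(len(p_w)))
           for k in range(n_r)]
    noise = sum(p_w[i] * entropy(p_r_given_w[i]) for i in range(len(p_w)))
    return entropy(p_r) - noise


if __name__ == "__main__":
    p_w = [0.5, 0.5]
    print("S[w] =", entropy(p_w))
    print("noiseless:", mutual_information(p_w, [[1.0, 0.0], [0.0, 1.0]]))
    print("10% errors:", mutual_information(p_w, [[0.9, 0.1], [0.1, 0.9]]))
    print("pure noise:", mutual_information(p_w, [[0.5, 0.5], [0.5, 0.5]]))
```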