Current goal:
Parse the dbSNP XML data files into CSV files, map SNPs onto genes, and then perform a network permutation analysis. Are there differences in the network clustering of diseases between races?
dbSNP XML --> parse coordinates --> genes --> map to gene network --> association patterns?
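The "coordinates --> genes" step is an interval lookup: for each SNP (chromosome, position), find the genes whose span contains it. A minimal sketch, assuming gene spans are available as (chromosome, start, end, name) tuples in 1-based inclusive coordinates from whichever reference build is chosen; the genes and coordinates in `GENES` are made up for illustration:

```python
import bisect

# Hypothetical gene annotation: (chromosome, start, end, name), 1-based inclusive.
GENES = [
    ("1", 1000, 5000, "GENE_A"),
    ("1", 8000, 12000, "GENE_B"),
    ("2", 500, 3000, "GENE_C"),
]


def build_index(genes):
    """Group gene intervals by chromosome and sort them by start coordinate."""
    index = {}
    for chrom, start, end, name in genes:
        index.setdefault(chrom, []).append((start, end, name))
    for chrom in index:
        index[chrom].sort()
    return index


def genes_for_snp(index, chrom, pos):
    """Return names of genes whose [start, end] interval contains pos."""
    intervals = index.get(chrom, [])
    starts = [s for s, _, _ in intervals]
    # Only intervals starting at or before pos can contain it.
    i = bisect.bisect_right(starts, pos)
    return [name for s, e, name in intervals[:i] if e >= pos]
```

For example, `genes_for_snp(build_index(GENES), "1", 9500)` returns `['GENE_B']`. The scan over candidate intervals is linear, so for a genome-wide annotation an interval tree would scale better.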
Tasks
0) Figure out collaboration on project using GitHub.
1a) Download dbSNP data from ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/genotype/
1b) Download the human reference genome. http://www.ncbi.nlm.nih.gov/genome/guide/human/
(There are probably better sources for this. Hong needs to check the right version for dbSNP).
1c) Register for and download the OMIM database http://omim.org/downloads. OMIM cannot be shared with third parties. This is another database of human disease information.
2a) Parse dbSNP XML files into CSV or tab-delimited format. This can probably be done with Python and Biopython.
2b) Map SNP coordinates and disease associations onto genes using the human reference genome.
2c) Cross-validate dbSNP disease associations with OMIM.
3) Network permutation test of disease clustering by race. Are there racial differences from a network perspective?
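For task 3, one simple design (an assumption, not a settled plan) is to count the network edges that fall entirely within a group's disease-gene set and compare that count with random gene sets of the same size drawn from the network. A standard-library sketch; `edges`, `all_genes`, and `disease_genes` are hypothetical inputs:

```python
import random


def within_edges(edges, gene_set):
    """Count edges whose two endpoints are both in gene_set."""
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def permutation_test(edges, all_genes, disease_genes, n_perm=1000, seed=0):
    """One-sided permutation test: do disease genes cluster more than random sets?

    Returns (observed within-set edge count, permutation p-value).
    """
    rng = random.Random(seed)
    genes = sorted(all_genes)  # sorted so results are reproducible for a given seed
    k = len(disease_genes)
    observed = within_edges(edges, set(disease_genes))
    hits = 0
    for _ in range(n_perm):
        sample = set(rng.sample(genes, k))
        if within_edges(edges, sample) >= observed:
            hits += 1
    # The +1 terms count the observed set itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)
```

Running it separately on the disease-gene set of each population gives one p-value per group; comparing groups directly would need a second permutation step (for example, shuffling group labels), which is not shown here.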
Hong needs to check
dbGaP, Genetic Association Database
Which human reference genome should we use for dbSNP? Or does it not matter?
This site serves as my notebook and as a way to communicate effectively with my students and collaborators. Every now and then, a post may be of interest to other researchers or teachers. Views in this blog are my own. All rights to research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Thursday, August 22, 2013
Wednesday, August 21, 2013
Create the hSNP_disease GitHub repository
cd hSNP_disease/
git init
git add *
git remote add origin https://github.com/hongqin/hSNP_disease.git
git config http.postBuffer 209715200
git commit -m 'first'
git push --force origin master
rwwltdsmac110:hSNP_disease hongqin$ git push --force origin master
Counting objects: 7, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (7/7), 225.04 MiB | 39.85 MiB/s, done.
Total 7 (delta 0), reused 0 (delta 0)
human disease genes
http://www.omim.org/
http://www.genecards.org/cgi-bin/listdiseasecards.pl?type=full
6548 "disease genes" are currently present in the GeneCards database
human genome facts
From wikipedia:
The haploid human genome (23 chromosomes) is estimated to be about 3.2 billion base pairs long and to contain 20,000–25,000 distinct genes.[1][2][3]
Biopython and SNP
References:
http://comments.gmane.org/gmane.comp.python.bio.devel/8928
https://github.com/ngopal/23andMe
http://biopython.org/pipermail/biopython/2010-April/006416.html
2010/4/13 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> Just a simple question:
> Entrez SNP seems to return ASN.1 format only.
> Is there any way to parse this in biopython? I've looked at SeqIO and
> found nothing...
> I can think of tools to process this outside, but I am just curious if
> this is processed natively with Biopython (being an exposed NCBI
> format...)
>
> Many thanks,
> Tiago
> PS - You can easily try this with:
> hdl = Entrez.efetch(db="snp", id="3739022")
> print hdl.read()

Hi Tiago,

No, we don't support ASN.1, and I don't see any good reason to - I think it would only be NCBI ASN.1 we'd we interested in, and I think that all their resources are available in other easier to use formats like XML these days. See also http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One

Instead ask Entrez to give you the SNP data as XML:
Entrez.efetch(db="snp", id="3739022", retmode="xml")

Hopefully the SNP XML file has everything in it. You have a choice of Python XML parsers to use. However, the Bio.Entrez parser doesn't like this XML. This appears to be related (or caused by) a known NCBI bug. See http://bugzilla.open-bio.org/show_bug.cgi?id=2771

Peter
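Since Bio.Entrez's parser may not handle the SNP XML, task 2a could instead use the standard library's streaming `xml.etree.ElementTree.iterparse`. A minimal sketch: the element and attribute names (`Rs`, `rsId`, `Component`, `chromosome`, `MapLoc`, `physMapInt`) are assumptions loosely modeled on dbSNP's docsum XML and must be checked against the actual files; the rs IDs and positions in `SAMPLE_XML` are made up:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified record layout (names are assumptions, values made up).
SAMPLE_XML = """<ExchangeSet>
  <Rs rsId="1000001">
    <Assembly><Component chromosome="1"><MapLoc physMapInt="12345"/></Component></Assembly>
  </Rs>
  <Rs rsId="1000002">
    <Assembly><Component chromosome="2"><MapLoc physMapInt="678"/></Component></Assembly>
  </Rs>
</ExchangeSet>"""


def local(tag):
    """Strip an XML namespace, e.g. '{http://...}Rs' -> 'Rs'."""
    return tag.rsplit("}", 1)[-1]


def rs_records(xml_source):
    """Yield (rs_id, chromosome, position) for every MapLoc under every Rs element."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local(elem.tag) != "Rs":
            continue
        rs_id = elem.get("rsId")
        for comp in elem.iter():
            if local(comp.tag) != "Component":
                continue
            for loc in comp:
                if local(loc.tag) == "MapLoc":
                    yield rs_id, comp.get("chromosome"), loc.get("physMapInt")
        elem.clear()  # release the finished record; real dbSNP files are large


def xml_to_csv(xml_source, out):
    """Write one CSV row per (rs_id, chromosome, position) triple."""
    writer = csv.writer(out)
    writer.writerow(["rs_id", "chromosome", "position"])
    writer.writerows(rs_records(xml_source))


buffer = io.StringIO()
xml_to_csv(io.BytesIO(SAMPLE_XML.encode("utf-8")), buffer)
```

For tab-delimited output, `csv.writer(out, delimiter="\t")` is the only change; `iterparse` also accepts a file name, so a downloaded file can be processed without loading it whole.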
Thursday, August 15, 2013
cell size control
cell cycle regulation OR growth rate regulation?
Do smaller cells have longer G1 phases?
Negative correlation between G1 length and cell size
Antibody screen for size sensors
Identify drugs that weaken the negative correlation between size and cell cycle, and between size and growth rate.
mTOR affects size and growth rate.
Some kind of integral feedback loop might be involved.
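One way to see why a negative correlation between G1 length and cell size is expected is a "sizer" model, assumed here for illustration: cells grow exponentially at rate g during G1 and leave G1 on reaching a critical size S_c, so G1 lasts ln(S_c/S_b)/g for a cell born at size S_b. All parameter values below are hypothetical:

```python
import math
import random


def g1_length(birth_size, critical_size=2.0, growth_rate=0.3):
    """Time to grow exponentially from birth_size to critical_size (0 if already larger)."""
    if birth_size >= critical_size:
        return 0.0
    return math.log(critical_size / birth_size) / growth_rate


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
birth_sizes = [rng.uniform(0.8, 1.6) for _ in range(500)]  # hypothetical birth sizes
g1_lengths = [g1_length(s) for s in birth_sizes]
r = pearson(birth_sizes, g1_lengths)  # strongly negative under this model
```

In this picture, a drug that weakened size control would push `r` toward zero.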
Albert Goldbeter, Virginia Tech talk
Gerard and Goldbeter 2012 argued that the cell cycle is a limit cycle. This seems to imply that the cell cycle would continue indefinitely. An aging model would require a damped cycle.
Gerard and Goldbeter, Math. Model. Nat. Phenom., 2012
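To illustrate the distinction between a limit cycle and a damped cycle, the sketch below integrates two textbook oscillators with a fourth-order Runge-Kutta step: the van der Pol oscillator, whose limit cycle keeps a fixed amplitude (about 2 for mu = 1) indefinitely, and a linearly damped oscillator, whose oscillations die out. These are generic stand-ins, not the Gerard-Goldbeter cell-cycle model:

```python
def rk4(f, state, dt, steps):
    """Integrate x' = v, v' = f(x, v)[1] with classic RK4; returns the trajectory."""
    traj = [state]
    for _ in range(steps):
        x, v = state
        k1 = f(x, v)
        k2 = f(x + dt / 2 * k1[0], v + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], v + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], v + dt * k3[1])
        state = (x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                 v + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        traj.append(state)
    return traj


def van_der_pol(mu=1.0):
    """x'' - mu (1 - x^2) x' + x = 0: self-sustained oscillation (limit cycle)."""
    return lambda x, v: (v, mu * (1 - x * x) * v - x)


def damped(c=0.2):
    """x'' + c x' + x = 0: oscillation that decays toward rest."""
    return lambda x, v: (v, -c * v - x)


def late_amplitude(traj, fraction=0.2):
    """Largest |x| over the last `fraction` of the trajectory."""
    tail = traj[int(len(traj) * (1 - fraction)):]
    return max(abs(x) for x, _ in tail)


vdp_amp = late_amplitude(rk4(van_der_pol(), (0.5, 0.0), 0.01, 5000))
damped_amp = late_amplitude(rk4(damped(), (2.0, 0.0), 0.01, 5000))
```

After t = 50, `vdp_amp` stays near 2 while `damped_amp` has decayed close to zero.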
Tuesday, August 13, 2013
Inquiry based learning
Some practices:
This seems to be rooted in the Socratic method.
Literature review on a research topic. Writing a term paper.
Case studies.
Capstone experiences
Blended classrooms: a worksheet with information for a group of 4-5 students. The worksheet guides students to develop concepts.
Case: Dr. Oz's show, is there arsenic in our juice?
SWOT analysis
References:
http://en.wikipedia.org/wiki/Socratic_method
http://en.wikipedia.org/wiki/Inquiry-based_learning
Lee, Virginia S. What is inquiry-guided learning?
Micro UN symposium
Microbiology, yeast genetics,
setting policies of the microbe world
signal transduction
war and peace theme
shopping list, food recipe
Sunday, August 11, 2013
Banner tutorial
How do I create a new requisition? Please use the link below which will give you step by step directions:
Banner Tutorial:
http://speldoc.spelman.edu/itc/AdminServices/index.htm
Who can I buy goods and services from? Please use the preferred vendor list for a list by commodities and vendors. Click on commodity and all information is there to assist you in completing your requisition and acquiring a Purchase Order.
Preferred Vendor List
http://princess.spelman.edu/vendorprofile.nsf/PreferredVendorList
How does a new vendor get approved to do business with Spelman College? How do I requisition a Guest Speaker? The first step is to complete the attached vendor profile. After the vendor or Speaker is approved a Banner requisition is required and a PO must be created. Special Note: Completion of a profile does not guarantee approval.
Vendor Profile
http://princess.spelman.edu/vendorprofile.nsf/frontpage
To look for vendor codes, go to banner database,
Form FTIIDEN
enter ARMARK%, F8 to execute the query
F7 to clear the form
Saturday, August 10, 2013
Tony Hui, UCSD
Schaechter, J. Gen. Microbiol. 1958: fraction of proteins (R-proteins) ~ growth rate
Bacterial growth laws
Three modes of growth limitation:
C-limitation -| catabolic section
A-lim --| anabolic section
R-lim ---| polymerization section
Mass spec -> changes of groups of proteins
Proteome partition model
Scott, Science 2010,
You, Nature 2013
Six coarse-grained proteome sections.
Spencer, q-bio, proliferation-quiescence decision
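The Schaechter-type relation between the R-protein fraction and growth rate is commonly written as a linear growth law, phi_R = phi_R0 + lambda / kappa_t, where lambda is the growth rate and kappa_t a translational capacity (Scott et al., Science 2010). The sketch below generates points from that law with hypothetical parameter values and recovers kappa_t from the slope of an ordinary least-squares fit:

```python
def r_fraction(growth_rate, phi_r0=0.05, kappa_t=4.5):
    """Linear growth law phi_R = phi_R0 + lambda / kappa_t (hypothetical parameters)."""
    return phi_r0 + growth_rate / kappa_t


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rates = [0.3, 0.6, 0.9, 1.2, 1.5]  # hypothetical growth rates (1/h)
fractions = [r_fraction(lam) for lam in rates]
slope, intercept = fit_line(rates, fractions)
kappa_t_estimate = 1.0 / slope  # translational capacity from the inverse slope
```

With real measurements the same fit would give an estimate of kappa_t, and the intercept an estimate of the R-fraction extrapolated to zero growth.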
Spencer, Tobias Meyer, Cell, in press
Live-cell sensor for CDK2 activity,
G1/G0,
CDK2-Venus green in nucleus
S, G2/M
CDK2-Venus in cytosol
p21 levels differ between CDK2-inc and CDK2-low cells.
p21---| CDK2
CDK2-low is in transient G0 state.
E coli aging !!
Suckjoon Jun and Minsu Kim:
CFU viability is exponential, but RLS is Gompertz. When SOS is knocked out, RLS becomes exponential.
RLS for E coli is measured under good growth conditions, whereas CLS for E coli is measured under nutrient-depleted conditions. It is possible that the CLS condition overwhelms network buffering and results in a network with too little redundancy. On the other hand, aging was able to manifest itself during growth, because the external 'insults' are below the 'critical' point for network robustness.
Craig Skinner and Su-ju Lin, 2010 AMB review on CR
CR on E coli aging
http://precedings.nature.com/documents/2071/version/1
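The contrast between exponential and Gompertz survival is a contrast between hazard functions: an exponential curve has a constant hazard, whereas a Gompertz curve has a hazard h(t) = r0 * exp(g * t) that rises with age, giving S(t) = exp(-(r0/g)(exp(g t) - 1)). A minimal sketch with hypothetical parameters, which recovers the hazard numerically from a survival curve as -d ln S/dt:

```python
import math


def exponential_survival(t, rate):
    """S(t) for a constant hazard `rate`."""
    return math.exp(-rate * t)


def gompertz_survival(t, r0, g):
    """S(t) for the Gompertz hazard h(t) = r0 * exp(g * t)."""
    return math.exp(-(r0 / g) * (math.exp(g * t) - 1.0))


def empirical_hazard(survival, t, dt=1e-4):
    """Approximate h(t) = -d/dt ln S(t) with a central difference."""
    return -(math.log(survival(t + dt)) - math.log(survival(t - dt))) / (2 * dt)
```

Applied to measured lifespan data, a roughly constant estimated hazard points to exponential survival, while a log-hazard that grows linearly with age points to Gompertz.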
Friday, August 9, 2013
Unscripted classrooms
Zip-Zap, introduction
Promotion commercials for a bacteria, pathogen, virus.
Thursday, August 8, 2013
Benjamin Machta, Princeton, information flow in the plasma membrane.
Membrane is nearly critical.
Phase transition in synthetic lipid membranes
2nd order critical point
Allosteric regulation
General anesthesia -> membrane involvement? However, proteins are unquestionably involved.
Lippincott-Schwartz, Imaging
Photoactivable fluorescent proteins
Patterson and Lippincott-Schwartz, 2002, Science.
Fitting a 2D Gaussian centroid to photoactivated signal to find the source--> better resolution.
This method can be applied to photobleaching and improves resolution of conventional fluorescence imaging.
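The localization idea is to estimate the emitter's center from the whole intensity spot rather than its brightest pixel. The note describes fitting a 2D Gaussian; the sketch below swaps in a simpler estimator, the background-subtracted intensity-weighted centroid, which for a noise-free symmetric spot lands essentially on the same center (with noisy data, a least-squares Gaussian fit is usually preferred). The spot position, width, amplitude, and background are hypothetical:

```python
import math


def gaussian_spot(nx, ny, x0, y0, sigma, amplitude=100.0, background=5.0):
    """Noise-free image (rows = y, columns = x) of one emitter; all values hypothetical."""
    return [[background + amplitude * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(nx)]
            for y in range(ny)]


def centroid(image, background):
    """Background-subtracted, intensity-weighted centroid (x, y) in pixel units."""
    sx = sy = total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            w = max(value - background, 0.0)
            sx += w * x
            sy += w * y
            total += w
    return sx / total, sy / total


image = gaussian_spot(15, 15, x0=7.3, y0=6.6, sigma=1.5)
cx, cy = centroid(image, background=5.0)  # sub-pixel estimate of (7.3, 6.6)
```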
Assaf Rotem, Harvard, drop-based microfluidics of single-cell assay
Chewing on biology, one bit at a time -- high throughput assays at single-cell resolution using Drop-Based Microfluidics reveal novel variations in heterogeneous populations, Assaf Rotem, Harvard University
ChIP-Seq for every cell
Rob De Boer, Utrecht University
Heterogeneous Differentiation Patterns of Individual CD8+ T Cells,
Rob De Boer, Utrecht University
http://theory.bio.uu.nl/rdb/
http://theory.bio.uu.nl/rdb/publications.html
Bar coding of cells using retroviral library. Deep sequencing to find distribution of cells.
50% of total immune responses derived from 5% of the cells
50% of the families contain fewer than 200 daughter cells.
Disparity in family sizes is established early.
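Statements such as "50% of the response comes from 5% of the cells" can be computed directly from barcode family sizes: sort families from largest to smallest and find how many are needed to reach half of all cells; "50% of families contain fewer than 200 cells" is a statement about the median family size. A minimal sketch; any family sizes passed in are the user's data, and the ones in the checks are invented:

```python
def fraction_of_families_for_share(family_sizes, share=0.5):
    """Smallest fraction of families (largest first) whose cells make up `share` of all cells."""
    sizes = sorted(family_sizes, reverse=True)
    target = share * sum(sizes)
    running = 0
    for i, size in enumerate(sizes, start=1):
        running += size
        if running >= target:
            return i / len(sizes)
    return 1.0


def median(values):
    """Median family size; half of the families are at or below this value."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
```

For one family of 1000 cells plus 99 families of 10 cells, `fraction_of_families_for_share` returns 0.01: a single family (1% of families) supplies half of the response.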
Zaida Luthey-Schulten, stochastic simulations of cellular processes
UIUC, Center for the Physics of Living Cells.
Noise contributions in an inducible genetic switch: A whole cell simulation study,
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002010
http://q-bio.org/w/images/d/d2/Talk_ZanLutheySchulten.pdf
Choi, Cai, Xie, 2008, Science, a stochastic single-molecule event triggers phenotype switching of a bacterial cell.
http://www.ncbi.nlm.nih.gov/pubmed/18927393
Tang, Earnest, Wang, Xie, Zhuang, Science 2011, nucleoid packing
Wolfgang Baumeister, Elizabeth Villa, yeast nuclear structure.
ZLS emphasizes that crowding makes a difference.
Taniguchi, ... Xie, 2010, Science
Labhsetwar, Cole, ZLS, PNAS in press
BRENDA Database, www.brenda-enzymes.org/
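Stochastic simulations of cellular processes are commonly built on the Gillespie algorithm. As a minimal illustration (not the whole-cell model from the talk), the sketch below simulates one species produced at a constant rate k and degraded at rate gamma per molecule; its stationary mean copy number is k/gamma. The rates are hypothetical:

```python
import random


def gillespie_birth_death(k, gamma, t_end, seed=0):
    """Exact stochastic simulation of production (rate k) and degradation (rate gamma * n)."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        birth = k
        death = gamma * n
        total = birth + death
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t >= t_end:
            break
        if rng.random() * total < birth:  # choose which reaction fires
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    area = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        area += n * (t_next - times[i])
    return area / t_end
```

With k = 10 and gamma = 1, the time-averaged copy number over a long run should settle near 10; molecular crowding, as emphasized in the talk, would enter through the rate expressions.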
Comparative analysis of metabolic robustness: E coli and Synechocystis, Chintan Joshi, Ashok Prasad
Power laws lead to less robust systems.
http://q-bio.org/w/images/b/bc/Talk_Prasad_Ashok.pdf
Ali Tabei, 2013, PNAS
random walks
glass properties,
S. M. A. Tabei, S. Burov, H. Y. Kim, A. Kuznetsov, T. Huynh, J. Jureller, L. H. Philipson, A. R. Dinner, and N. F. Scherer, Proc. Natl. Acad. Sci. USA 110, 4911 (2013).
Mary Teruel, Stanford, Controlling size of tissue with stochastic noise
Spalding et al. 2008, Nature: fat cells renew at 10% per year.
PPARG -> adipogenic factors --> change of preadipocytes to adipocytes.
Park et al, Cell Rep 2012, conversion of preadipocyte to adipocyte via a bistable switch. (All or none process)
[PPARgamma] ~ [C/EBPbeta] plot shows a bimodal distribution of two cell populations.
[PPARG] ~ Rosiglitazone (uM) plot shows hysteresis, in support of a bistable switch and positive feedback.
Increased noise leads to more switching; this can easily be shown by simulation.
Q: how to increase noises experimentally?
More noise leads to stronger de-differentiation.
Too little noise, hard to switch. Too much noise, cannot hold the states.
One-feedback loop model:
R -> PPARG(x0) --> Terminal
|->CEBPA(x1)-^
Two ODE equations.
Increased cooperativity helps, but does not solve the optimization problem.
Multi-loop systems can solve the optimization problem in theory. How to prove this in experiments?
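The hysteresis of a positive-feedback switch can be reproduced with two ODEs. The sketch below assumes one hypothetical form (not the published model): an input r drives x0 (PPARG-like), x0 and x1 (CEBPA-like) activate each other through Hill functions, and both decay linearly. Sweeping r up and then back down, starting each step from the previous state, leaves the system in different states at the same r = 0:

```python
def hill(x, k=0.5, n=4):
    """Activating Hill function; k and n are hypothetical."""
    return x ** n / (k ** n + x ** n)


def relax(x0, x1, r, dt=0.01, t=30.0):
    """Euler-integrate dx0/dt = r + hill(x1) - x0 and dx1/dt = hill(x0) - x1."""
    for _ in range(int(t / dt)):
        dx0 = r + hill(x1) - x0
        dx1 = hill(x0) - x1
        x0, x1 = x0 + dt * dx0, x1 + dt * dx1
    return x0, x1


def sweep(r_values, x0=0.0, x1=0.0):
    """Step the input r through r_values, relaxing from the previous state each time."""
    levels = []
    for r in r_values:
        x0, x1 = relax(x0, x1, r)
        levels.append(x0)
    return levels, (x0, x1)


r_up = [i / 10 for i in range(7)]  # 0.0, 0.1, ..., 0.6
r_down = r_up[::-1]
up_levels, state = sweep(r_up)
down_levels, _ = sweep(r_down, *state)
```

At r = 0, `up_levels[0]` is in the low (undifferentiated) state while `down_levels[-1]` stays high: the memory of a bistable switch. Adding a noise term to `relax` would be one way to explore noise-driven switching.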
MT uses a triple-quadrupole mass spectrometer, aiming to identify feedback loops. This method requires a unique peptide sequence. Time-course data were obtained. Seven feedback loops were identified.
Fat cells -> nuclei -> proteins ---(trypsin)---> peptides ---> mass spec
References:
http://q-bio.org/w/images/c/c3/Talk_TeruelMary.pdf
Undergraduate research conference at UTK, Nov16-17 (Sat, Sun), 2013
November 16-17, 2013. The conference will begin in the afternoon on November 16 and end by 4 pm on November 17. Travel from Spelman to UTK takes 4 hours, so people can leave on Saturday and return on Sunday.
Funds can be requested; this is a good opportunity for the SIAM chapter.
URL:
http://www.nimbios.org/education/undergrad_conf2013
Alex Lang, Boston Univ, epigenetic landscapes
Alex Lang, Boston University (4th graduate student)
Epigenetic landscapes provide insight into cellular programming,
Takahashi and Yamanaka 2006 converted fibroblasts into induced pluripotent stem cells. Terminal cells can also be converted into other terminal cell types without passing through a 'stem' cell stage.
Waddington's landscape (1957) treats cell types as dynamical attractors.
Lang argued that the Waddington landscape is an emergent property of the underlying interaction networks.
Lang: cell type = high-dimensional Ising spin vector.
Landscape construction:
Each cell type is a vector, and each cell type is a basin of attraction.
The projection method (a neural-network model) takes these vectors and constructs the landscape.
(Hopfield, PNAS 1982?)
Attractors arise from a correlation-based effective interaction matrix.
High dimensions often give spurious attractors, which are unavoidable in frustrated systems.
Lang claimed that partially reprogrammed cells are the spurious attractors.
Lang developed a projection method to match vectors to cell types.
Reprogramming facts: only the final cell type matters; only one set of TFs in time.
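The projection method can be sketched as a Hopfield-type network whose couplings come from the pseudo-inverse (projection) rule, J = Xi (Xi^T Xi)^-1 Xi^T, where the columns of Xi are the stored +/-1 cell-type vectors; every stored vector is then an exact fixed point of the sign update. This is a generic version of the technique, assuming random patterns in place of real expression data:

```python
import random


def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def projection_couplings(patterns):
    """J_ij = sum_ab xi_a[i] (C^-1)_ab xi_b[j], with overlap matrix C_ab = xi_a . xi_b."""
    p, n = len(patterns), len(patterns[0])
    c = [[sum(x * y for x, y in zip(patterns[a], patterns[b])) for b in range(p)]
         for a in range(p)]
    cinv = invert(c)
    # w[b][i] = sum_a xi_a[i] (C^-1)_ab, so that J_ij = sum_b w[b][i] xi_b[j]
    w = [[sum(patterns[a][i] * cinv[a][b] for a in range(p)) for i in range(n)]
         for b in range(p)]
    return [[sum(w[b][i] * patterns[b][j] for b in range(p)) for j in range(n)]
            for i in range(n)]


def update(J, s):
    """One synchronous update s_i <- sign(sum_j J_ij s_j)."""
    return [1 if sum(jij * sj for jij, sj in zip(row, s)) >= 0 else -1 for row in J]


rng = random.Random(3)
patterns = [[rng.choice((-1, 1)) for _ in range(60)] for _ in range(3)]  # hypothetical cell types
J = projection_couplings(patterns)
probe = patterns[0][:]
for i in (0, 1, 2):
    probe[i] = -probe[i]  # corrupt three of the 60 entries
recalled = update(J, probe)
```

A corrupted cell-type vector is pulled back to the stored one; spurious attractors would show up as fixed points of `update` that match none of the stored patterns.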
URL:
http://q-bio.org/w/images/8/89/Talk_LangAlex.pdf
Q-bio, Nancy Albritton, UNC, Chemistry
Nancy Albritton, UNC, Chemistry
Subpopulations in clinical samples can be detected. Heterogeneity in cancer cells was emphasized.
Fluorescent PIP2 reporter libraries.
FAM-PIP2
metabolism of PI3K
PIP2 --(PI3K) --> PIP3 ---(PTEN)-->PIP2 ... ...
Challenges: standardization between samples. Problems with cell lines.
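The PIP2 --(PI3K)--> PIP3 --(PTEN)--> PIP2 cycle can be written as a two-state mass-action model. A minimal sketch, assuming first-order kinetics with hypothetical rate constants and a conserved total pool normalized to 1; at steady state the PIP3 fraction is k_PI3K / (k_PI3K + k_PTEN):

```python
def pip3_fraction(k_pi3k, k_pten, dt=0.001, t_end=20.0, pip3=0.0):
    """Euler-integrate d[PIP3]/dt = k_pi3k * [PIP2] - k_pten * [PIP3], with [PIP2] = 1 - [PIP3]."""
    for _ in range(int(t_end / dt)):
        pip2 = 1.0 - pip3  # conserved total pool
        pip3 += dt * (k_pi3k * pip2 - k_pten * pip3)
    return pip3
```

With k_PI3K = 1 and k_PTEN = 3, the PIP3 fraction relaxes to 0.25; single-cell reporter measurements of the kind described would, in this picture, constrain the ratio of the two activities in each cell.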
URLs:
Fluorescent substrates,
detection limits, 10^-21 moles
Sphingosine-FAM
Bodipy-PIP2,C4
Kinase/Protease substrates
Fast single-cell analyses
Microfluidic device, lyse cells by laser.
Sphingosine kinase: single breast cancer cells, red blood cells
Wednesday, August 7, 2013
Mean field approximation and network reliability
According to the Wikipedia entry, MFT simplifies the behavior of large and complex stochastic models by studying a simpler model. Such models often consist of a large number of small interacting individuals. The effect of all the other individuals on any given individual is approximated by "a single average effect", thus reducing the many-body problem to a one-body problem.
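In the Ising model, the textbook example of mean field theory, replacing each spin's z neighbors by their average magnetization m gives the self-consistency equation m = tanh(beta (z J m + h)), with coupling J and external field h. The sketch below solves it by fixed-point iteration; the symbols are the standard textbook ones, not taken from a specific network-reliability model:

```python
import math


def mean_field_magnetization(beta, coupling, z, field=0.0, m0=0.5, tol=1e-12, max_iter=10_000):
    """Solve m = tanh(beta * (z * coupling * m + field)) by fixed-point iteration.

    Each spin feels only the average magnetization m of its z neighbors,
    which is the 'single average effect' of mean field theory.
    """
    m = m0
    for _ in range(max_iter):
        m_new = math.tanh(beta * (z * coupling * m + field))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

With no field, the solution is m = 0 when beta * z * J < 1 and a nonzero, ordered m when beta * z * J > 1; for z = 4 and J = 1 that threshold is beta = 0.25.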
I should apply the mean field approximation to network reliability studies. The challenge is that biological networks are heterogeneous, and a simple 'average' might miss some interesting properties. In any case, this is an interesting direction that I should explore.
In Bialek, Nemenman, Tishby 2008, Predictability, complexity and learning, BNT08 discussed an Ising model as
BNT08 used the Boltzmann distribution to describe spins $\{\sigma_i\}$.
The Boltzmann distribution is basically an exponential decay function, the kind often used in reliability models. So, there seems to be a natural connection between statistical physics and reliability modeling.
There may be a problem or challenge in using the mean field approach to study aging. Based on reliability models, system aging is determined by the extreme values of component lifetimes. So, the mean field approximation may not capture this max-min nature of aging.
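The max-min point can be made concrete with identical exponential components of rate lambda, whose average lifetime is 1/lambda. A series system fails at the first component failure (the minimum), with mean lifetime 1/(n lambda); a parallel system fails at the last (the maximum), with mean lifetime H_n / lambda, where H_n is the n-th harmonic number. Neither equals the component average. A Monte Carlo sketch with hypothetical values:

```python
import random


def system_lifetimes(n_components, rate, n_systems, combine, seed=0):
    """Lifetimes of systems built from identical exponential components.

    combine=min models a series system (fails at the first component failure);
    combine=max models a parallel system (fails when the last component fails).
    """
    rng = random.Random(seed)
    return [combine(rng.expovariate(rate) for _ in range(n_components))
            for _ in range(n_systems)]


def mean(values):
    return sum(values) / len(values)


series = system_lifetimes(10, 0.1, 20000, min, seed=2)
parallel = system_lifetimes(10, 0.1, 20000, max, seed=3)
```

Here each component lives 10 time units on average, but `mean(series)` comes out near 1 and `mean(parallel)` near 29.3, which is the extreme-value behavior an averaging approximation would miss.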
Reference:
http://en.wikipedia.org/wiki/Mean_field_theory
Fiberglass needles for yeast tetrad dissection
1. Singer Instruments
2. Cora Styles (most responses)
3. Newport Corporation, (Item F-SBB and others). This is the cheapest alternative at $5.10 with free shipping.
Yeast deletion library from Invitrogen (Life Technology)
Invitrogen Yeast Deletion Product page:
http://clones.invitrogen.com/cloneinfo.php?clone=yeast
User manual PDF:
http://tools.invitrogen.com/content/sfs/manuals/yeast_deletion_clones_man.pdf
Sanger resequenced strains
Ordering information for entire set
http://www.ncyc.co.uk/sgrp.html
Individual ordering information
http://www.ncyc.co.uk/search-sgrp.html
Q-bio, Ilya Nemenman, tutorial on information processing in biological systems
Information processing in biological systems, Ilya Nemenman
IN uses probabilistic models for information processing.
w --> P(r|w) --> r --
^----------------------|
Information is not context free.
IN uses variability S[w] in his model.
$S$ seems to stand for entropy.
With probabilistic assumptions, IN can write down models explicitly.
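If S denotes entropy, the information that a response r carries about a signal w through the channel P(r|w) is the mutual information I(w; r) = S[r] - <S[r|w]>. A minimal sketch for discrete signals and responses; the channel matrices in the checks are hypothetical:

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def mutual_information(p_w, p_r_given_w):
    """I(W; R) in bits for input distribution p_w and channel matrix p_r_given_w[w][r]."""
    n_r = len(p_r_given_w[0])
    p_r = [sum(p_w[w] * p_r_given_w[w][r] for w in range(len(p_w))) for r in range(n_r)]
    info = 0.0
    for w, pw in enumerate(p_w):
        for r, prw in enumerate(p_r_given_w[w]):
            if pw > 0 and prw > 0:
                info += pw * prw * math.log2(prw / p_r[r])
    return info
```

For a binary channel that flips the response with probability 0.1 and equally likely inputs, this gives 1 - S(0.1), about 0.53 bits; a noiseless channel gives 1 bit and a channel whose output ignores the input gives 0.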
References:
http://www.menem.com/~ilya/wiki/index.php/Stochastic_dynamics_on_biological_networks