This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Saturday, December 31, 2016
noiseqbio too many false positives
###################################################
### code chunk number 33: NOISeq.Rnw:875-877
mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "n", factor="Zconditions", lc = 1, r = 20, adj = 1.5, plot = FALSE, a0per = 0.9, filter = 1,
# random.seed = 12345,
conditions = c("control","high")
)#this runs 20161230 5:52pm
head(mynoiseqbio@results[[1]])
summary(mynoiseqbio@results[[1]]) #too many false positives
So, I concluded that noiseqbio did not give reasonable results for this data set, although different parameters may yield different results.
For comparison, I tried noiseq, which gave reasonable results.
Friday, December 30, 2016
NOISeq RNAseq differential analysis
http://bioconductor.org/packages/release/bioc/html/NOISeq.html
M: fold-change differences
D: absolute expression differences
(M,D) pair for each gene is evaluated based on a null distribution estimated from technical or biological replicates or simulations in 2011GR.
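To make the (M, D) idea concrete, here is a minimal sketch in Python (not R, and not the NOISeq package itself). It computes M as a log2 fold change and D as an absolute difference for one gene, then scores the pair against a set of noise pairs. The function names, the toy numbers, and the scoring rule (fraction of noise pairs that are smaller in both |M| and D) are assumptions for illustration, based on my reading of 2011GR.

```python
import math


def md_stats(expr_a, expr_b, pseudo=0.5):
    """Return (M, D) for one gene from its expression in two conditions.

    M is the log2 fold change and D is the absolute difference.
    Zero values are replaced by `pseudo` to avoid log(0); this mirrors
    the role of k = 0.5 in the calls above (an assumption of this sketch).
    """
    a = expr_a if expr_a > 0 else pseudo
    b = expr_b if expr_b > 0 else pseudo
    return math.log2(a / b), abs(a - b)


def de_probability(m, d, noise_pairs):
    """Fraction of noise (M*, D*) pairs smaller than the observed pair
    in both |M| and D."""
    hits = sum(1 for m0, d0 in noise_pairs if abs(m0) < abs(m) and d0 < d)
    return hits / len(noise_pairs)


# Toy example with made-up noise pairs (hypothetical values).
noise = [(0.1, 2.0), (-0.3, 5.0), (0.5, 1.0), (-1.2, 40.0)]
m, d = md_stats(80.0, 20.0)
prob = de_probability(m, d, noise)
print(m, d, prob)
```

A gene would then be called differentially expressed when this probability exceeds the chosen cutoff.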
In NOISEQBIO, theta = (M + D)/2 seems to be the statistic used for the null distribution, based on my understanding of its manual.
Probability = 0.8 was the cutoff for differentially expressed genes in 2011GR.
Probability = 0.95 (FDR) is recommended for biologically replicated samples.
In Tarazona2011GR, noiseq-real and noiseq-sim were used. These two versions have since evolved into noiseq and noiseqbio.
NOISEQBIO is optimized for biological replicates.
When using noiseq and noiseqbio, normalization and filtering can be done through parameters such as 'norm' and, in noiseqbio, 'filter'.
Regarding low-count filtering, it is not necessary to filter for the NOISeq method. In contrast, filtering is recommended for NOISeqBIO, which by default filters out low-count features with the CPM method (filter = 1).
# noiseq(input, k = 0.5, norm = c("rpkm","uqua","tmm","n"), replicates = c("technical","biological","no"), factor=NULL, conditions=NULL, pnr = 0.2, nss = 5, v = 0.02, lc = 0)
nss = 5, v = 0.02, lc = 1, replicates = "technical")
head(mynoiseq@results[[1]])
> myfactors
Tissue TissueRun
R1L1Kidney Kidney Kidney_1
R1L2Liver Liver Liver_1
R1L3Kidney Kidney Kidney_1
R1L4Liver Liver Liver_1
R1L6Liver Liver Liver_1
R1L7Kidney Kidney Kidney_1
R1L8Liver Liver Liver_1
R2L2Kidney Kidney Kidney_2
R2L3Liver Liver Liver_2
R2L6Kidney Kidney Kidney_2
mynoiseqbio = noiseqbio(mydata, k = 0.5, norm = "rpkm", factor="Tissue", lc = 1, r = 20, adj = 1.5, plot = FALSE, a0per = 0.9, random.seed = 12345, filter = 2)
# "r=20" seems to indicate 20 bootstraps when biological replicate number <5.
The authors stated that the noiseq output probabilities are not equivalent to p-values?
Q: what are "up" and "down" deg referenced to?
Output format
mynoiseq.deg1 = degenes(mynoiseq, q = 0.8, M = "up")
References:
[1] S. Tarazona, F. García-Alcalde, J. Dopazo, A. Ferrer, and A. Conesa. Differential expression in RNA-seq: A matter of depth. Genome Research, 21: 2213-2223, 2011.
[2] S. Tarazona, P. Furió-Tarí, D. Turrà, A. Di Pietro, M.J. Nueda, A. Ferrer, and A. Conesa. Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Research, 43(21):e140, 2015.
[8] B. Efron, R. Tibshirani, J.D. Storey, V. Tusher. Empirical Bayes Analysis of a Microarray Experiment. Journal of the American Statistical Association, 2001.
Wednesday, December 14, 2016
Tuesday, December 13, 2016
big challenges in evolution and ecology
https://dynamicecology.wordpress.com/2015/07/06/what-are-the-top-5-grand-challenges-in-biology/
1) linking genotype to phenotype, and understanding how the environment influences that link and 2) understanding biological diversity (its evolution, its maintenance, and the consequences of its loss).
So, understanding the origin of life would definitely be one of my grand challenges.
A clear fourth challenge relates to understanding the brain – clearly this is a huge, very active area of research, and it’s also something that students will find really engaging. (It’s also one of the challenges on the list I linked to earlier.)
- Linking phenotype to genotype
- Understanding biodiversity
- Origins of life
- Understanding the brain
- Sustainable agriculture
http://www.imperial.ac.uk/ecosystems-and-environment/
http://www.imperial.ac.uk/ecosystems-and-environment/grand-challenges/
Understanding biodiversity, linking past, present, and future of biodiversity
Environmental monitoring and evaluation, developing new tools and methods for environmental monitoring and evaluation
Engineering complex ecosystems.
Predicting and mitigating environmental change for managing the effect of local, regional and global change
Experiments: manipulation of the natural world to understand the mechanics of ecosystems
Scaling: summing up ecological and evolutionary processes acting locally on individuals to understand regional and global patterns
Ecoinformatics and genomics: integrating genomic and ecological data to understand the natural world
Exemplar Research Questions
The following list provides some examples of topics on which faculty in Ecology and Evolutionary Biology at University of Tennessee, Knoxville, would be interested in recruiting graduate students for entry in August 2017.
This list is not exhaustive – indeed, far from it. There are other faculty members who will be recruiting students in the Department. Also, the listed faculty members may recruit students who have different interests to those listed. But we prepared this list just to illustrate to prospective students some of the diversity of topics on which we envision recruiting, spanning conservation, macroevolution, global change ecology, molecular genetics, biology education and systematics, among many other topics.
Paul Armsworth (http://web.utk.edu/~parmswor)
- How can large-scale efforts to conserve biodiversity or ecosystem services, which are led by governments or international nonprofits, most effectively complement bottom-up conservation efforts led by local communities?
- Conservation organizations often have a hierarchical management structure – how effectively do hierarchies allocate resources to support conservation of biodiversity and ecosystem services?
Joe Bailey (http://joebaileylab.com/)
- How will species range dynamics drive genetic divergence? How do feedbacks reinforce patterns of genetic divergence on the landscape?
- Does contemporary evolution along the gradients of global change alter ecosystem function?
Jessica Budke (http://jmbudke.github.io)
- What morphological and transcriptomic changes do plant species undergo transitioning from terrestrial to aquatic habitats? How do we resolve relationships between morphologically austere taxa?
- What are the functional roles of maternal structures for offspring survival, development, and fitness?
Nina Fefferman (http://eeb.bio.utk.edu/people/nina-fefferman/)
- How do different types of selective pressures on individuals shape the evolution of animal social systems?
- What role (if any) does infectious disease play in conservation management planning for endangered populations?
Jim Fordyce (http://web.utk.edu/~jfordyce)
- How does among population variation in plant phenotype affect population structuring of herbivores?
- What role does host breadth play in range size and diversification rate of herbivorous insects?
Sergey Gavrilets (http://www.tiem.utk.edu/~gavrila)
- How can we better understand, theoretically, the origins of new species and the links between micro-evolutionary processes and macro-evolutionary patterns?
- How did human social complexity evolve and what are the implications of our evolutionary past for our social behavior?
Xingli Giam (http://eeb.bio.utk.edu/people/xingli-giam/)
- How do human activities impact species, communities, and ecosystem function across spatial and temporal scales?
- How will future demand for food and biofuels interact with likely agricultural yield improvements, climate change, and changes in land rental rates to affect future land-cover transformations and their subsequent impact on biodiversity?
Mike Gilchrist (http://eeb.bio.utk.edu/peopletwo/michael-gilchrist)
- How do assembly costs and translation errors shape selection on codon usage and how do they play themselves out in the face of biased mutation and genetic drift?
- Some pathogens replicate intracellularly within hosts and move between host cells through budding or bursting. How does the rate of intracellular replication affect the rates of immune response clearance by the host? How, in turn, does this lead to changes in the survival of the host and transmission of the pathogen between hosts?
Lou Gross (http://www.tiem.utk.edu/~gross/)
- How are biological processes integrated across scales and levels of biological resolution from within organism level to those operating at population/community/landscape levels?
- How do we effectively utilize mathematical and computational methods for spatial control – what to do, where to do it, when to do it, and how to assess the resulting solutions – for problems in epidemiology, invasive species management and conservation biology?
Susan Kalisz (http://kaliszlab.weebly.com/)
- How do invaders and antagonistic interactions alter soil fungal communities, the function of key plant mutualisms and shape the demography and life history evolution of native community members?
- What role does the ecological context, specifically selection driven by the absence of mates and pollinators, play in the evolution of selfing and genomic changes within and between species? Is selfing an evolutionary dead end or a reversible mating system?
Charlie Kwit (http://www.charleskwit.com)
- What are the effects (actual and predicted) and ramifications of land-use and climate change, management, and disturbance on biodiversity in natural, managed, and agricultural settings?
- What important roles do animals play in the seed dispersal process in animal-mediated seed dispersal systems?
Brandon Matheny (http://www.bio.utk.edu/matheny/Site/Home.html)
- How can we recognize species of mushroom-forming fungi? Why are there so many species of fungi? How are they related to each other, and what factors have promoted their diversification?
- What are general biogeographical patterns in fungi? What processes are responsible for patterns we observe?
Gary McCracken (email: gmccrack@utk.edu)
- How do highly mobile predators (bats) track ephemeral and patchy resources (insects) in three dimensional space?
- Why are some host species associated with a greater diversity of viral pathogens than are other host species?
Monica Papes (http://monapapes.wixsite.com/biodivmatters)
- Are spectral diversity metrics derived from hyperspectral imagery good indicators of forest species richness? What other remotely sensed indices can be used to investigate richness and seasonality of vegetation?
- Does inclusion of species’ physiological limits improve the precision of ecological niche models and potential distribution estimates?
Susan Riechert (email: sriecher@utk.edu)
- What is the importance of behavior in adapting animal populations to different and changing environments?
- What factors limit local adaptation to environmental context and why do weaker strategies persist?
Ed Schilling (http://www.bio.utk.edu/schilling)
- What is the parentage of the presumed allopolyploid lettuces (Lactuca) in North America, how many species are present, when did they arrive from Eurasia, and what has been the consequence of polyploidy for their biology and evolution?
Beth Schussler (http://schusslerlab.utk.edu/)
- How can biology programs enhance graduate student instruction of introductory biology courses?
- How do instructor active learning practices relate to student perception of their effectiveness in large introductory biology classes?
Jen Schweitzer (http://jenschweitzer.com)
- Under what varied circumstances do soils and soil microbial communities determine plant traits and act as selective agents?
- What is the role of plant-pollinator interactions on soil processes?
Kimberly Sheldon (http://www.biogeographyresearch.org/)
- What are the processes generating spatial patterns of biodiversity? What are the roles of biotic and abiotic factors in determining species’ range limits?
- How do population-level variation in physiology and climatic variation affect predictions of the impacts of climate change?
Dan Simberloff (http://eeb.bio.utk.edu/peopletwo/daniel-simberloff)
- What are the direct and indirect effects of particular plant invasions? A direct effect might be shading, for example, or allelopathy, while an indirect effect might be changing the nutrient cycle (e.g., by being a nitrogen fixer) or the fire regime.
- What are the non-target impacts of particular insects introduced for biological control?
Randy Small (http://web.utk.edu/~rsmall)
- What is the role of polyploidy in governing the success (in terms of species richness) of plant lineages? Why are some polyploid lineages highly diverse, while others are not?
- What can contemporary patterns of genetic variation within and among populations tell us about species boundaries and the process of speciation?
Joe Williams (http://eeb.bio.utk.edu/peopletwo/joseph-williams-jr)
- What are the causes/consequences of diversification of reproductive traits in plants?
- How does a particular reproductive trait, or set of traits, in a clade of plants develop and how does it contribute to diversification of the clade?
Below is the list, in the chronological order that I plan to introduce them, of “foundational questions in ecology and evolution”:
- Why does life exist at all?
- What makes life different from non-life?
- Why do some individuals die and some live?
- How do living things survive?
- How random is nature?
- Why don’t we live forever?
- Why do life forms look the way they do?
- Why are there diverse organisms?
- How do we partition diversity?
- What drives the patterns of diversity that we see across the earth?
- What determines the population size of different kinds of organisms?
- Why are some places more biodiverse than others?
- What are the various ways in which organisms interact with each other?
- Is there a difference between interactions between members of the same species versus different?
- Why is nature often a very nasty place?
- Why do organisms cooperate with each other?
- Why are there more plants than animals?
- What actually keeps ecosystems going? How do ecosystems work?
- How old is the earth?
- How do new species come into being?
- Are some species more closely related to each other?
- Why did some species go extinct?
- Why is there sexual reproduction?
- Why are there male and female organisms? Why aren’t there more types?
- Why are traits heritable?
- What are genes and how do they work in conjunction with the environment?
- Where do new traits come from?
- How are species often so well adapted to their environments?
- Why do organisms display behaviors? Different behaviors?
- Why do species change over time?
- Do different species affect each other’s evolution?
- What evolves?
- Are humans subject to evolutionary change in the same way as other organisms?
I recognize that if one were to write a list of contemporary “big questions” in ecology and evolution, there would be a lot of additional questions to add to this list. But my goal is not to capture the big questions of now: I want to create a comprehensive list of the questions that led to the formation of these scientific fields.
Feel free to comment on these “foundational questions”:
- Are these questions well-phrased and clear?
- Is this a complete list? Are there any critical questions missing?
- Do any of these questions seem superfluous?
- Is the order of the list logical?
Below is a list of some of the sources that I used to come up with these questions:
environment360 “On His Bicentennial, Mr. Darwin’s Questions Endure”
This page has some great commentary on Darwin’s tendency to ask questions about specific observations he made, questions that fall into some of the broad categories of my “foundational questions”. I also really like the “inherent tendency to vary” quote as it relates to the question of the diversity we observe in nature: being a keen observer of this diversity will be a key characteristic of WmD.
Ernst Mayr’s The Growth of Biological Thought
Mayr asserts that Darwin’s central questions were “Can species change, and can one species be transmuted into another?”.
Macroevolution.net “Alfred Russel Wallace”
This brief biography of Wallace discusses his “why do some die and some live?” question that was inspired in part by his malarial delirium.
UCLA Newsroom “Stepping out of Darwin’s shadow”
This page discusses some of Wallace’s important questions that relate to biogeography: why organisms exist in particular locations, and why species vary in abundance in different locations.
Natural History Museum London “Darwin’s questions on caterpillar colouring”
I like this page just because it highlights that Darwin was not above asking Wallace a question related to the evolution of organisms (in this case butterflies).
Thomas N. Sherratt and David M. Wilkinson Big Questions in Ecology and Evolution
This book contains a bunch of nicely-phrased questions that inspired some of my questions above, including the “why the world is green” question of Hairston, Smith, and Slobodkin and questions such as why species exist and why the tropics are more diverse. In particular it looks at the question of chaos, which inspired my question on randomness.
Journal of Ecology “Identification of 100 fundamental ecological questions”
Although these questions are by-and-large a lot more specific — and wonky! — than mine, it was important to see to what degree my questions encompassed these. A lot of these are about human impacts, an area that I will not approach until later in the WmD Project.
- Problem 1: No Viable Mechanism to Generate a Primordial Soup
- Problem 2: Unguided Chemical Processes Cannot Explain the Origin of the Genetic Code
- Problem 3: Random Mutations Cannot Generate the Genetic Information Required for Irreducibly Complex Structures
- Problem 4: Natural Selection Struggles to Fix Advantageous Traits into Populations
- Problem 5: Abrupt Appearance of Species in the Fossil Record Does Not Support Darwinian Evolution
- Problem 6: Molecular Biology has Failed to Yield a Grand "Tree of Life"
- Problem 7: Convergent Evolution Challenges Darwinism and Destroys the Logic Behind Common Ancestry
- Problem 8: Differences between Vertebrate Embryos Contradict the Predictions of Common Ancestry
- Problem 9: Neo-Darwinism Struggles to Explain the Biogeographical Distribution of many Species
- Problem 10: Neo-Darwinism has a Long History of Inaccurate Darwinian Predictions about Vestigial Organs and "Junk DNA"
- Bonus Problem: Humans Display Many Behavioral and Cognitive Abilities that Offer No Apparent Survival Advantage
GRAND CHALLENGE 1: BIOGEOCHEMICAL CYCLES
GRAND CHALLENGE 2: BIOLOGICAL DIVERSITY AND ECOSYSTEM FUNCTIONING
GRAND CHALLENGE 3: CLIMATE VARIABILITY
GRAND CHALLENGE 4: HYDROLOGIC FORECASTING
GRAND CHALLENGE 5: INFECTIOUS DISEASE AND THE ENVIRONMENT
Sunday, December 11, 2016
UTC, car rental
http://treasurer.tennessee.edu/travel/Web%20announcement.htm
The business rates may not be used for personal travel. For personal travel you must use corporate code XZ56TNP.
Friday, December 9, 2016
github website hosting
Github static website
http://blog.revolunet.com/blog/2015/07/15/beautiful-static-website-in-minutes-with-github/
Does a site that uses JavaScript count as a dynamic website or a static website?
Wednesday, December 7, 2016
Computational Geometry: Line Segment Properties ( Two lines Clockwise or Counterclockwise)
https://www.youtube.com/watch?v=3YFUQDRL1s4
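The clockwise versus counterclockwise question for two segments sharing an endpoint is usually settled with the sign of a 2D cross product. The sketch below is in Python, the function names are my own, and it assumes the usual axes (x to the right, y up).

```python
def cross(o, a, b):
    """Cross product of the vectors o->a and o->b (2D, returns a scalar)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def orientation(o, a, b):
    """Direction of the turn from segment o->a to segment o->b."""
    c = cross(o, a, b)
    if c > 0:
        return "counterclockwise"
    if c < 0:
        return "clockwise"
    return "collinear"


print(orientation((0, 0), (1, 0), (0, 1)))  # turn from the x-axis toward the y-axis
print(orientation((0, 0), (0, 1), (1, 0)))  # the reverse turn
```

Because only the sign is used, integer coordinates avoid any floating-point trouble.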
simulation of photonic crystals and metamaterials
X Zhang, simcenter thesis defense
https://en.wikipedia.org/wiki/Photonic_crystal
photonic crystals control the propagation of light
photonic crystals: bandgap properties, in-plane wave propagation
related commercial software
MPB:MIT photonic bands
HFSS: high frequency structural simulator
CST MWS(CST microwave studio)
Petrov-Galerkin methods for electromagnetic simulations
Maxwell's equations, 2D version: TE mode and TM mode
simulation at 500 THz
Adjoint variables
Bezier curves -> optimal band and optimization
introduction to systems biology, student training
why systems biology
Uri Alon's systems biology courses:
cellular aging studies in Qin's lab
MIT quantitative biology course
EdX systems biology
Coursera.org on "systems biology"
van Emde boas tree
Keys are unique integers drawn from the set {0, 1, 2, 3, ..., u-1}, where u = 2^(2^k) for some integer k, so that the universe can be split into square-root-sized clusters repeatedly.
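The core trick of the van Emde Boas layout is splitting each key into a cluster number and a position inside that cluster. Here is a minimal Python sketch of that split, assuming the universe size u is a perfect square (for example u = 16). The helper names `high`, `low`, and `index` are conventional textbook names, not from any library.

```python
import math


def high(x, u):
    """Cluster number of key x in a universe of size u (u a perfect square)."""
    return x // math.isqrt(u)


def low(x, u):
    """Position of key x inside its cluster."""
    return x % math.isqrt(u)


def index(h, l, u):
    """Rebuild the key from its cluster number h and position l."""
    return h * math.isqrt(u) + l


u = 16
x = 13
print(high(x, u), low(x, u), index(high(x, u), low(x, u), u))
```

A full tree applies the same split recursively, with each cluster being a smaller structure over a universe of size sqrt(u).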
Tuesday, December 6, 2016
time lapsed image analysis for RLS inference
image data: images produced by HYAA
Dang lab uses FIJI http://fiji.sc/
In the Dang lab, a person plays the video of time-lapse images in ImageJ (IJ) at a speed of 5 frames per second. Typically, each cell is counted twice.
Monday, December 5, 2016
structural controllability
Two scenarios make a structure structurally uncontrollable:
inaccessibility
dilation
cactus: minimal structure that contains neither inaccessible nodes nor dilations.
cacti:
Evolutionary theory and control theory.
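Of the two scenarios, inaccessibility is the easier one to check by computer: a state node is inaccessible when no directed path leads to it from any input node. The Python sketch below does that check with a breadth-first search. The graph, node names, and function name are made up for illustration, and it does not test for dilations.

```python
from collections import deque


def inaccessible_nodes(nodes, edges, inputs):
    """Return the nodes that cannot be reached from any input node.

    nodes  : iterable of all node names (inputs included)
    edges  : iterable of (source, target) directed edges
    inputs : iterable of input (driver) node names
    """
    adjacency = {n: [] for n in nodes}
    for src, dst in edges:
        adjacency[src].append(dst)
    seen = set(inputs)
    queue = deque(inputs)
    while queue:
        current = queue.popleft()
        for nxt in adjacency[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(nodes) - seen)


# Hypothetical network: input u drives x1, x1 drives x2, and x3 only feeds x2.
nodes = ["u", "x1", "x2", "x3"]
edges = [("u", "x1"), ("x1", "x2"), ("x3", "x2")]
print(inaccessible_nodes(nodes, edges, ["u"]))
```

In this toy network x3 has no incoming path from the input, so it is reported as inaccessible.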
Friday, December 2, 2016
Laplace transformation of graphic function
https://www.youtube.com/watch?v=f1mZArY0lLE
https://www.youtube.com/watch?v=ZGPtPkTft8g
UTC grade submission guideline
Final grading is open for the full-term and grades can be entered and changed until 9:00 a.m. on Monday, December 19, 2016.
To enter grades go to our main webpage https://www.utc.edu/ and click on the MyMocsNet link in the upper right hand corner, enter your UTCID and Password and hit enter, then click on Login to My MocsNet. Click on the SSB (Self Service Banner) link located in the bottom left area of the Home or Faculty tab, click on Faculty Services, choose XE Midterm & Final Grades, and choose the Final Grades tab.
There are grading guidelines to the right of the grading page along with a link to the training. If you need assistance you can email me or call me at 425-5780.
Link to Academic Calendars and Exam Schedules:
Thursday, December 1, 2016
Wednesday, November 30, 2016
linear and nonlinear ODEs
In essence, linear ODEs can be written as dX/dt = A * X, where X is the vector of state variables x_i and A is a constant matrix.
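As a small check of the matrix form, the Python sketch below integrates a two-variable linear system with a hand-written fourth-order Runge-Kutta step and compares it with the known solution. The example matrix (a harmonic oscillator), the step count, and the function names are my own choices for illustration.

```python
import math


def mat_vec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def rk4_linear(a, x0, t_end, steps):
    """Integrate dX/dt = a * X from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    x = list(x0)
    for _ in range(steps):
        k1 = mat_vec(a, x)
        k2 = mat_vec(a, [xi + 0.5 * h * k for xi, k in zip(x, k1)])
        k3 = mat_vec(a, [xi + 0.5 * h * k for xi, k in zip(x, k2)])
        k4 = mat_vec(a, [xi + h * k for xi, k in zip(x, k3)])
        x = [xi + h / 6.0 * (c1 + 2 * c2 + 2 * c3 + c4)
             for xi, c1, c2, c3, c4 in zip(x, k1, k2, k3, k4)]
    return x


# dx1/dt = x2, dx2/dt = -x1 with x(0) = (1, 0); exact solution is (cos t, -sin t).
A = [[0.0, 1.0], [-1.0, 0.0]]
x_final = rk4_linear(A, [1.0, 0.0], t_end=1.0, steps=1000)
print(x_final, [math.cos(1.0), -math.sin(1.0)])
```

A nonlinear system cannot be written with a constant matrix this way, which is the distinction the links below are about.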
nonlinear ODEs
http://eqworld.ipmnet.ru/en/solutions/ode/ode-toc3.htm
https://en.wikipedia.org/wiki/Linear_differential_equation
Tuesday, November 29, 2016
*** Control systems engineering, control theory, Laplace transform, observability,
A control system has an input, a process, and an output. It can be open loop or closed loop. Open loop systems do not monitor or correct the output. Closed loop systems can monitor output and make adjustments.
linear time-invariant differential equation
A transfer function is another way of mathematically modeling a system. It can be derived from the linear, time-invariant differential equation using the Laplace transform. Transfer functions can only be used for linear systems. (The Laplace transform was developed as a technique to solve differential equations.)
State-space representation is another model for systems and is suitable for non-linear systems.
Essentially, a state-space model changes an nth-order differential equation into n simultaneous first-order equations. It seems to me that the state-space model is the most commonly used ODE modeling method in systems biology.
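The conversion from one nth-order equation to n first-order equations can be sketched as follows, in Python. It assumes the equation has the form y^(n) + a_(n-1) y^(n-1) + ... + a_1 y' + a_0 y = u(t) and uses the states x1 = y, x2 = y', and so on. The function name `companion_form` is my own label for this standard construction.

```python
def companion_form(coeffs):
    """Build (A, B) for dX/dt = A X + B u from an nth-order linear ODE.

    coeffs = [a_0, a_1, ..., a_(n-1)] for
    y^(n) + a_(n-1) y^(n-1) + ... + a_1 y' + a_0 y = u(t).
    """
    n = len(coeffs)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        a[i][i + 1] = 1.0               # dx_i/dt = x_(i+1)
    a[n - 1] = [-c for c in coeffs]     # last row carries the ODE itself
    b = [0.0] * (n - 1) + [1.0]         # the input enters the last equation
    return a, b


# Example: y'' + 3 y' + 2 y = u(t)
A, B = companion_form([2.0, 3.0])
print(A)
print(B)
```

For the example, the two first-order equations are dx1/dt = x2 and dx2/dt = -2 x1 - 3 x2 + u.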
Test signals with different waveforms can be used to study systems.
The basic analysis of a system is to evaluate the time response of a system.
A sensitivity analysis can yield the percentage of change in a specification as a function of a change in a system parameter.
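One way to compute such a sensitivity numerically is a central finite difference, scaled so that the result reads as percent change in the specification per percent change in the parameter. The Python sketch below assumes a toy specification (settling time of roughly 4/a for a first-order system with pole at -a); the function names and the step size are my own choices.

```python
def relative_sensitivity(spec, p, rel_step=1e-6):
    """Approximate (dF/F) / (dP/P) for specification F = spec(P) at P = p."""
    dp = p * rel_step
    slope = (spec(p + dp) - spec(p - dp)) / (2.0 * dp)
    return slope * p / spec(p)


def settling_time(a):
    """Toy specification: settling time of about 4/a (an assumed example)."""
    return 4.0 / a


s = relative_sensitivity(settling_time, 2.0)
print(s)
```

A value near -1 here means that a 1% increase in a shortens the settling time by about 1%.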
In biology, many ODEs have nonlinear terms with products of variables. So, the transfer function cannot be applied, but the state-space method can be used.
Controllability and Observability are well understood in continuous time-invariant linear state-space model, see https://en.wikipedia.org/wiki/State-space_representation#State_variables
Stability: a system is stable if every bounded input yields a bounded output. So, does aging change a stable gene network into an unstable network?
Observability: If the initial state vector x(t0) can be found from input u(t) and output y(t) over a finite interval of time from t0, the system is observable; otherwise it is unobservable.
Observability is the ability to deduce state variables from knowledge of input u(t) and output y(t).
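For a linear time-invariant model dX/dt = A X + B u, y = C X with n states, the standard test is that the observability matrix [C; CA; ...; CA^(n-1)] has rank n. The Python sketch below implements that rank test with plain lists; the helper names, the tolerance, and the example matrices are my own.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(m, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= factor * m[r][j]
        r += 1
        if r == rows:
            break
    return r


def is_observable(a, c):
    """True if rank([C; CA; ...; CA^(n-1)]) equals the number of states n."""
    n = len(a)
    blocks, current = [], [row[:] for row in c]
    for _ in range(n):
        blocks.extend(current)
        current = mat_mul(current, a)
    return rank(blocks) == n


A = [[0.0, 1.0], [-2.0, -3.0]]
print(is_observable(A, [[1.0, 0.0]]))  # measuring x1 only
print(is_observable(A, [[1.0, 1.0]]))  # measuring x1 + x2 only
```

In this example measuring x1 alone is enough, while measuring only the sum x1 + x2 hides one of the two modes.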
genome compression
https://en.wikipedia.org/wiki/Compression_of_Genomic_Re-Sequencing_Data
Number theory, data compression for NGS data
Can RSA or other methods be used for NGS sequence compression?
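RSA itself is an encryption scheme rather than a compressor, so a simpler baseline to compare against is 2-bit packing: with only four bases, each nucleotide fits in 2 bits instead of an 8-bit character. The Python sketch below assumes sequences contain only A, C, G, and T (no N or quality scores), and the function names are my own.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an ACGT string into bytes, four bases per byte.

    Returns (data, length); the length is needed to drop padding on unpack.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out), len(seq)


def unpack(data, length):
    """Inverse of pack: recover the original ACGT string."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


packed, n = pack("GATTACA")
print(len(packed), unpack(packed, n))
```

This gives a fixed 4x reduction over plain text; reference-based and statistical methods, as in the linked article, can do much better on resequencing data.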
lab meeting
1a) DE gene lists for RNAseq project
TODO: there are various time points between control and treatment. Should we use the consensus DEG list?
It seems that "GeneID" in BGI report are from NCBI. Example of 57573 is
So, "Gene ID" is a standard NCBI number.
1b) Pathway analysis plan for DE gene lists
TODO: There are different sources of human gene/protein networks. We should try several for comparisons.
TODO: We should try different clustering methods, such as hclust, mcl, etc. (refer to Qin's previous paper for clustering analysis).
2) time-lapsed image analysis for yeast replicative lifespan
We can use ImageJ, MATLAB, or R.
Monday, November 28, 2016
Sunday, November 27, 2016
Saturday, November 26, 2016
simcenter qinlab tools
"module load qinlab" can add these to $PATH
hqin@ridgeside[~/demo.lgf/RNAseq.hisat2]->ls /usr/local/qinlab/
bin                            samtools-1.3.1.tar.bz2
hisat2                         share
hisat2-2.0.5                   stringtie
hisat2-2.0.5-Linux_x86_64.zip  stringtie-1.3.1c.Linux_x86_64
samtools-1.3.1                 stringtie-1.3.1c.Linux_x86_64.tar.gz
Monday, November 21, 2016
SimCenter mailing address
University of Tennessee at Chattanooga
701 E. ML King Blvd
Chattanooga, TN 37403
RNAseq software installation on qbert or Simcenter clusters
====================For hisat2 and supporting programs
Install hisat2
ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.5-Linux_x86_64.zip
Install stringtie 1.3.1c
http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.1c.Linux_x86_64.tar.gz
Install samtools
https://github.com/samtools/samtools/releases/download/1.3.1/samtools-1.3.1.tar.bz2
The above link is from http://www.htslib.org/download/
See also https://github.com/samtools/samtools/releases/
====================For R packages
Under shell, run R
Inside of R:
source("https://bioconductor.org/biocLite.R")
biocLite('ballgown')
install.packages('devtools') #A USA mirror site may be chosen
library(devtools)
devtools::install_github('alyssafrazee/RSkittleBrewer')
========== Testing the installation
Download the test files and codes from
ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/
under shell
$ ./rnaseq_pipeline.config.sh
$ ./rnaseq_pipeline.sh out
=========Additional R packages
#Please also run the following code to install all packages in R. This may take 10-12 hours.
install.packages(new.packages())
#A prompt will ask for a mirror site. Any site from USA should work.
Friday, November 18, 2016
bibtex doi bug
In qin_network.bib, I added a reference with a DOI field. This field generated an error in the *.bbl file when running bibtex. I removed the DOI fields and the bug disappeared.
Wednesday, November 16, 2016
toread, Graph Metrics for Temporal Networks - Springer
http://www.springer.com/cda/content/document/cda_downloaddocument/9783642364600-c1.pdf?SGWID=0-0-45-1393604-p174915729
toread Path Problems in Temporal Graphs
http://www.vldb.org/pvldb/vol7/p721-wu.pdf
Path Problems in Temporal Graphs
Huanhuan Wu*, James Cheng*, Silu Huang*, Yiping Ke#, Yi Lu*, Yanyan Xu*
*Department of Computer Science and Engineering, The Chinese University of Hong Kong, {hhwu,jcheng,slhuang,ylu,yyxu}@cse.cuhk.edu.hk
#Institute of High Performance Computing, Singapore
safety training, UTC
hazardous materials
gasoline is easily ignited, but diesel is not.
universal waste:
fluorescent lamps should be recycled.
computer batteries.
motor batteries.
DOT hazard marking
Global harmonization container markings
NFPA rating explanation guide, NFPA 704, HMIS
423-425-HELP
Tuesday, November 15, 2016
integrating gene expression and network, a reference collection
Convert the p-values of differential expression into Z-scores using the inverse Gaussian (normal) CDF.
At first I thought that, because Ideker02 is looking for 'active subnetworks', only positive Z-scores were used. That is not the case: both positive and negative Z-scores were calculated.
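As a worked form of this conversion, here is my reading of it, to be checked against Ideker02: with \Phi the standard normal CDF and p_i the differential-expression p-value of gene i, the Z-score is the inverse CDF evaluated at 1 - p_i. This explains why both signs occur: small p-values map to positive scores and large p-values to negative ones.

```latex
z_i = \Phi^{-1}(1 - p_i), \qquad
p_i < 0.5 \;\Rightarrow\; z_i > 0, \qquad
p_i > 0.5 \;\Rightarrow\; z_i < 0
% worked values: p_i = 0.025 gives z_i \approx 1.96; p_i = 0.975 gives z_i \approx -1.96
```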
Ideker02 seems to combine K-means and simulated annealing for network clustering.
Tornow,S. and Mewes,H.W. (2003) Functional modules by relating protein interaction networks and gene expression. Nucleic Acids Res., 31, 6283–6289.
Segal,E. et al. (2003) Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, 264–272.
Morrison,J.L. et al. (2005) GeneRank: using search engine technology for the analysis of microarray experiments. BMC Bioinformatics, 6, 233.
Ma,X. et al. (2007) CGI: a new approach for prioritizing genes by combining gene expression and protein–protein interaction data. Bioinformatics, 23, 215–221.
Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes, Chao Wu, Jun Zhu and Xuegong Zhang
http://www.biomedcentral.com/1471-2105/13/182
http://scholar.google.com/scholar?cites=14200881095439672925&as_sdt=5,43&sciodt=0,43&hl=en
Li et al. BMC Medical Genomics 2014, 7(Suppl 2):S4 Prediction of disease-related genes based on weighted tissue-specific networks by using DNA methylation
http://www.biomedcentral.com/1755-8794/7/S2/S4
http://hongqinlab.blogspot.com/2014/12/li14-bmc-medical-genomics-predict.html
http://hongqinlab.blogspot.com/2014/12/integrating-gene-expression-data-into.html
http://hongqinlab.blogspot.com/2014/03/build-human-gene-network-reliability.html
From Ma, 2007 Bioinformatics CGI paper:
Gene expression data and protein interaction data have been integrated for gene function prediction. For example, Ideker et al. (2002) used protein interaction data and gene expression data to screen for differentially expressed subnetworks between different conditions. In Tornow and Mewes (2003) and Segal et al. (2003), gene expression data and protein interactions are used to group genes into functional modules. These methods provide insights into the regulatory modules of the whole networks at the systems biology level. However, it is not clear how to adapt their methods to identify genes contributing to the phenotype of interest. Morrison et al. (2005) adapted the Google search engine to prioritize genes for a phenotype by integrating gene expression profiles and protein interaction data. However, the algorithm ignores the information from proteins linked to the target protein through other intermediate proteins, referred to in the rest of this paper as indirect neighbors.
Qin: Did the previous methods use human pathogenic genes? Seems not if they did not cite dbSNP or OMIM.
X. Zhou, M.-C. J. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. Proc Natl Acad Sci U S A, 99(20):12783–12788, Oct 2002
WGCNA: an R package for weighted correlation network analysis.