Reference: Lynch and Walsh, Genetics and quantitative traits
My understanding is that each pair of identical twins is treated as a clone or line in model organisms.
This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Thursday, December 31, 2015
git undo commit (git reset --soft HEAD~)
Reference:
http://stackoverflow.com/questions/927358/how-do-you-undo-the-last-commit
git reset --soft HEAD~
Wednesday, December 30, 2015
parallel R using foreach and doMC
Using foreach and doMC for parallel R computing. The foreach run has to be assigned to an variable.
See:
https://github.com/hongqin/demo/blob/master/R-parallel/foreach1.R
https://github.com/hongqin/mactower-network-failure-simulation/blob/master/1.grid.network.2015Oct/netwk_aging_sim.v0.1b.R
registerDoMC(cores=inputCores)
while ((j <= popSize) && ( count < popSize*5)) {
myParrallStep = inputCores * 5
bufferAges = foreach(i=1:myParrallStep) %dopar% {
currentNetworkAge = single_network_failure_v2(lambda1, lambda2, degreeThreshold, p, pairs, essenLookupTb)
}
goodAges = bufferAges [bufferAges>0] #goodAges() return NULL?
count = count + myParrallStep
if( length(goodAges)> 0 ){
currentEnd = j + length(goodAges)-1
popAges[j:currentEnd] = goodAges
j = currentEnd + 1
} else {
print(paste("newk_aging_sim:: goodAges", goodAges))
}
if(debug>2) {
print(paste("netwk_aging_sim::bufferAges=",bufferAges))
}
if(debug) {
print(paste("netwk_aging_sim::bufferAges=",bufferAges))
print(paste("netwk_aging_sim::count=",count, " j=", j))
}
}# end of j while-loop, population loop
See:
https://github.com/hongqin/demo/blob/master/R-parallel/foreach1.R
https://github.com/hongqin/mactower-network-failure-simulation/blob/master/1.grid.network.2015Oct/netwk_aging_sim.v0.1b.R
registerDoMC(cores=inputCores)
while ((j <= popSize) && ( count < popSize*5)) {
myParrallStep = inputCores * 5
bufferAges = foreach(i=1:myParrallStep) %dopar% {
currentNetworkAge = single_network_failure_v2(lambda1, lambda2, degreeThreshold, p, pairs, essenLookupTb)
}
goodAges = bufferAges [bufferAges>0] #goodAges() return NULL?
count = count + myParrallStep
if( length(goodAges)> 0 ){
currentEnd = j + length(goodAges)-1
popAges[j:currentEnd] = goodAges
j = currentEnd + 1
} else {
print(paste("newk_aging_sim:: goodAges", goodAges))
}
if(debug>2) {
print(paste("netwk_aging_sim::bufferAges=",bufferAges))
}
if(debug) {
print(paste("netwk_aging_sim::bufferAges=",bufferAges))
print(paste("netwk_aging_sim::count=",count, " j=", j))
}
}# end of j while-loop, population loop
git retrieve a single file
http://stackoverflow.com/questions/610208/how-to-retrieve-a-single-file-from-specific-revision-in-git
git checkout 08618129e66127921fbfcbc205a06153c92622fe -- [full/path]
To clarify with an example:
git checkout mybranchname ~/src/myapp/myfile.txt
In my case, I need to roll back netwk_aging_sim.v0.1.R.
$ git checkout 4bb272d6a304168fc6711479a8b0b34c55e47182 netwk_aging_sim.v0.1.R
It worked.
The longnumber (branchname?) was found here:
*** CRISPR/Cas9 in crevisiae
multigene knockout in yeast
[Mans15] Mans 2015 FEMS Yeast, CRISPR/Cas9: a molecular Swiss army knife for simultaneous introduction of multiple genetic modifications in Saccharomyces cerevisiae
Mans15 designed guide RNA (gRNA), a chimeric crRNA-tracrRNA, and provide a webtool, Yeastriction webtool, http://yeastriction.tnw.tudelft.nl/.
stack (MongoDB, Express, AngularJS and Node.js). The source code is available at https://github.com/hillstub/Yeastriction. Genome and ORF sequences were downloaded from SGD
(http://www.yeastgenome.org) in GFF and FASTA file format, respectively. ORFs, including their 1 kb up- and downstream sequences, were extracted and imported into Yeastriction, with the aid of an in-house script
CRISPR/cas 9 systems, Cong and Zheng 2015, Chapter 10
There are 3 known CRISPR systems, types I, II, and III. Type II CRISPR systems usually only require a single protein, Cas9, to perform the target cleavage.
Cas9 is a RNA-guided nuclease that is capable of binding to a target DNA and introduce a double strand break in a sequence-specific manner.
The guide sequence within sgRNA has a length of 20 bp and is the exact complementary sequence of the target site within the genome, sometimes referred to as “protospacer” following the
convention of microbial CRISPR field.
In choosing the target site, it is important to note the requirement of having the “NGG” trinucleotide motif, called protospacer adjacent motif (PAM), right next to the protospacer target on the 3′ end.
The PAM sequence depends on the Cas9 protein employed in the CRISPR-Cas9 system, and hence, the “NGG” PAM sequence applies specifi cally to the Streptococcus pyogenes Cas9 (SpCas9).
Monday, December 28, 2015
RNAseq end calling
change point analysis
yeast ngs data resource, Waern and Snyder 2013 G3.
Extensive Transcript Diversity and Novel Upstream
Open Reading Frame Regulation in Yeast
Karl Waern*,† and Michael Snyder*,†,1
*Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, and †Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520
http://downloads.yeastgenome.org/published_datasets/Waern_2013_PMID_23390610/fastq/
http://www.g3journal.org/content/3/2/343.full.pdf+html
Some discussion on problems during using these data
http://seqanswers.com/forums/archive/index.php/t-45228.html
"Data is single-end. I just checked the protocol accompanying the data- it confirms that the reads are indeed strand-specific."
Karl Waern*,† and Michael Snyder*,†,1
*Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, and †Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, Connecticut 06520
http://downloads.yeastgenome.org/published_datasets/Waern_2013_PMID_23390610/fastq/
http://www.g3journal.org/content/3/2/343.full.pdf+html
Some discussion on problems during using these data
http://seqanswers.com/forums/archive/index.php/t-45228.html
"Data is single-end. I just checked the protocol accompanying the data- it confirms that the reads are indeed strand-specific."
Github http.postBuffer out of range problem
http://stackoverflow.com/questions/6842687/the-remote-end-hung-up-unexpectedly-while-git-cloning
git config --global http.postBuffer 1048576000
Sunday, December 27, 2015
toread,
Mitochondrial and Cytoplasmic ROS Have Opposing Effects on Lifespan
-
Claire E. Schaar
¶,
-
Dylan J. Dues
¶,
-
Katie K. Spielbauer
¶,
-
Emily Machiela,
-
Jason F. Cooper,
-
Megan Senchuk,
-
Siegfried Hekimi,
- Jeremy M. Van Raamsdonk mail
http://www.plosgenetics.org/article/Authors/info:doi/10.1371/journal.pgen.1004972
Saturday, December 26, 2015
plot viability and mortality curves for the simulated grid networks 'net1'
Byte-4:1.grid.network.2015Oct hqin$ pwd
/Users/hqin/github/mactower-network-failure-simulation/1.grid.network.2015Oct
update lifespan.r to lifespan.20140711.R
modify analyze_netwk_aging.v0.0.R to do batch plot mortality and survival curves
change this to analyze_netwk_aging.v0.1.R
Plots show 100 cells are not enough. So, I modified 'net1.sh' with -n 1000 and re-run the shell script.
8:47pm ->
Plots show 100 cells are not enough. So, I modified 'net1.sh' with -n 1000 and re-run the shell script.
8:47pm ->
Wednesday, December 23, 2015
Tuesday, December 22, 2015
COSMIC Catalogue of somatic mutations in cancer
It seems that tumor is a positive sample. Tumor seem to have negative controls. Comparison of the positive and negative samples should lead to mutations in tumors.
Monday, December 21, 2015
Sunday, December 20, 2015
Cancer genomics resource
McFarland, Sunyaev, Mirny PNAS, 2013 impact of deleterious passernger mutations on cancer progression.
Catalogue of Somatic Mutations in Cancer (COSMIC) and The Cancer Genome Atlas (TCGA). We classified them as driver and passenger mutation groups and then characterized their effects
using PolyPhen, a tool widely used in population and medical genetics to predict the damaging effect of missense mutations (15).
Ref 15: Boyko AR, et al. (2008) Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet 4(5):e1000083.
Saturday, December 19, 2015
Friday, December 18, 2015
toread, fruit fly gene network on cell shape
https://medium.com/lifes-building-blocks/building-gene-networks-67069be4fcb0#.1gutmoyw0
Wednesday, December 16, 2015
R code to find the range of lambda.
After summarized network aging simulation runs, I realized I need an automated way to find the range of parameters, especially for lambda.
degree threshold=0 for grid networks or edges with uniform lambda
#############
# To use lambda1 for all edges, choose threshold = 0
single_network_failure_v2 = function(lambda1, lambda2=lambda1/10, threshold=5, p, pairs, essenLookupTb ) {
It took sometime before I realize this tricky choice.
# To use lambda1 for all edges, choose threshold = 0
single_network_failure_v2 = function(lambda1, lambda2=lambda1/10, threshold=5, p, pairs, essenLookupTb ) {
It took sometime before I realize this tricky choice.
bug, dataframe and vector, R 3.2.3
After update R to 3.2.3,
I have a bug due to implicit conversion from single column dataframe and vector, but strict requirement in R3.2.3.
essenLookupTb = read.csv(inLookupTbFile, row.names=1);
essenLookupTb = as.vector(essenLookupTb[,1]); #20151215 after update to R3.2.3
I have a bug due to implicit conversion from single column dataframe and vector, but strict requirement in R3.2.3.
essenLookupTb = read.csv(inLookupTbFile, row.names=1);
essenLookupTb = as.vector(essenLookupTb[,1]); #20151215 after update to R3.2.3
bug: R passing variable to function
When I used a variable name in the passing interface that is the same without inside variable, it seems to cause confusion by R, and claimed that the variable is undefined. I rename the variables, and the bug went away.
analyze_netwk_aging.v0.0.R.
File: analyze_netwk_aging.v0.0.R
$ Rscript analyze_netwk_aging.v0.0.R -id net1 -ip net1 -o __test.csv
When I moved the function into 'network.r', a bug was discovered in passing of the variables. I renamed the variables inside and outside of the function scope, and fixed the bug.
$ Rscript analyze_netwk_aging.v0.0.R -id net1 -ip net1 -o __test.csv
Passed.
When I moved the function into 'network.r', a bug was discovered in passing of the variables. I renamed the variables inside and outside of the function scope, and fixed the bug.
##########################################
# 2015Dec 16.
# Plan, loopover netwk-aging files in a directory and generate a report file
# Rscript analyze_netwk_aging.v0.0.R -id net1 -ip net1 -o __test.csv
rm(list=ls())
library(GetoptLong)
summarize_mean_from_files = function(myfiles){
outtb = data.frame(myfiles)
outtb$mean = NA;
for( i in 1:length(myfiles)) {
currentFile = paste( inputdir, '/', myfiles[i], sep='');
tb = read.csv(currentFile)
outtb$mean[i] = mean(tb[,1])
}
outtb;
}
inputdir = getwd();
inputprefix = '';
outputFile = "_tmp_networkaging_summary.csv";
debug = 0;
GetoptLong(c(
"inputdir|id=s", "input dir of network aging data in csv format. default: getwd()",
"inputprefix|ip=s", "prefix of input netwk aging csv data files. default empty",
"outputFile|of=s", "output csv file name",
"debug|d=i", "debug. default 0"
))
# inputdir = 'net1'; inputprefix = 'net1'; i = 1
myfiles = list.files(inputdir)
myfiles = myfiles[ grep(inputprefix, myfiles) ]
out = summarize_mean_from_files( myfiles )
#timestamp = format(Sys.time(), "%Y%b%d_%H%M%S")
write.csv( out, outputFile, row.names=F)
R GetoptLong, update to R 3.2.3
The R GetoptLong library was borrowed from PERL GetoptLong.
https://cran.r-project.org/web/packages/GetoptLong/vignettes/GetoptLong.pdf
It requires R 3.2.3. So, I updated R to 3.2.3.
#example from GetoptLong
# Rscript test-getoptlong.R --number 4 --verbose T
# Rscript test-getoptlong.R -n 4 --verbose T
library(GetoptLong)
cutoff = 0.05
GetoptLong(c(
"number|n=i", "Number of items, integer, mandatory option",
"cutoff|c=f", "cutoff to filter results, optional, default (0.05)",
"verbose|v", "print messages"
))
print(c("input:", 'number'=number, 'cutoff'=cutoff, 'verbose'=verbose))
https://cran.r-project.org/web/packages/GetoptLong/vignettes/GetoptLong.pdf
It requires R 3.2.3. So, I updated R to 3.2.3.
#example from GetoptLong
# Rscript test-getoptlong.R --number 4 --verbose T
# Rscript test-getoptlong.R -n 4 --verbose T
library(GetoptLong)
cutoff = 0.05
GetoptLong(c(
"number|n=i", "Number of items, integer, mandatory option",
"cutoff|c=f", "cutoff to filter results, optional, default (0.05)",
"verbose|v", "print messages"
))
print(c("input:", 'number'=number, 'cutoff'=cutoff, 'verbose'=verbose))
Tuesday, December 15, 2015
Generic network aging simulation code
First no-bug run:
R -f _todo.20151213-netsim-generic.R --args net1/Degree4N1000_network.csv net1/net1/Degree4N1000_EssenLookupTb.csv 0.002 0.0002 4 0.9 5 net1
change name to netwk_aging_sim.v0.1.R
Monday, December 14, 2015
Essential gene flag: zero is false, everything else is true
essenTb$essenflagNumeric = ifelse( essenTb$essenflag=='essential', 1, 0) #zero is false, everything else is true
See: 20151012-factorize_networks.R
todo student project, gillespie simulation of yeast replicative aging
mother cells
dividing
mitotic arrest
G0, G1, phase etc basec on dissection data
Q: how to estimate parameters using Gillspecie algorithm?
dividing
mitotic arrest
G0, G1, phase etc basec on dissection data
Q: how to estimate parameters using Gillspecie algorithm?
Sunday, December 13, 2015
Gillespie, predator-prey simulation
GillespieSSA test. Predator-Prey
H Qin
December 13, 2015
rm(list=ls())
library(GillespieSSA)
parms = c(b=2, d=1, K=1000, alpha=0.007, w=0.0035, c=2,g=2)
x0 = c(N=1000, P=100)
nu = matrix(c(+1,-1,-1, 0,0,
0, 0, 0,+1,-1), nrow=2, byrow=TRUE)
a = c("b*N",
"(d+(b-d)*N/K)*N",
"alpha/(1+w*N)*N*P",
"c*alpha/(1+w*N)*N*P",
"g*P")
tf = 100
method = "D"
simName = "Predator-prey model"
parms
## b d K alpha w c g
## 2.0e+00 1.0e+00 1.0e+03 7.0e-03 3.5e-03 2.0e+00 2.0e+00
nu
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 -1 -1 0 0
## [2,] 0 0 0 1 -1
a
## [1] "b*N" "(d+(b-d)*N/K)*N" "alpha/(1+w*N)*N*P"
## [4] "c*alpha/(1+w*N)*N*P" "g*P"
out = ssa(x0, a, nu, parms, tf, method, simName,
verbose = TRUE, consoleInterval = 10, maxWallTime = 30)
## Running D method with console output every 10 time step
## Start wall time: 2015-12-13 20:44:52...
## t=0 : 1000,100
## (4.082s) t=10 : 639,313
## (10.931s) t=20.00066 : 123,499
## (18.091s) t=30.00005 : 112,193
## (26.468s) t=40.00018 : 271,95
## t=43.15012 : 374,361
## tf: 43.15012
## TerminationStatus: maxWallTime
## Duration: 30.019 seconds
## Method: D
## Nr of steps: 96521
## Mean step size: 0.0004470542+/-0.0006694862
## End wall time: 2015-12-13 20:45:22
## --------------------
summary(out$data)
## V1 N P
## Min. : 0.000 Min. : 46.0 Min. : 9
## 1st Qu.: 9.575 1st Qu.: 294.0 1st Qu.: 92
## Median :19.063 Median : 496.0 Median :206
## Mean :20.357 Mean : 487.3 Mean :237
## 3rd Qu.:31.002 3rd Qu.: 677.0 3rd Qu.:361
## Max. :43.150 Max. :1014.0 Max. :622
plot( out$data[,2] ~ out$data[,1], ylab = "N")
plot( out$data[,3] ~ out$data[,1], ylab = "Predator")
Subscribe to:
Posts (Atom)