Thursday, July 31, 2014

ken rls

"pooled by" column?? file, genotype, mixed

set lifespan
ref lifespan

July 31, Thu, yeast14

 thyimidine kinase, lost in pre-cambrian age
TkMX absent
antifolate media


50 genes with > 3 mutations in long-term mutation lines

Frank Albert
McManus genomes research 2014, translation beffering between species, but not between strains

Meiotic rejuvenation hypothesis

Micro fluidic yeast aging paper


Wednesday, July 30, 2014

July 30, yeast 2014


=> Hippo pathway, -- Eric Weiss
 ancient pathway, morphogensis, cytokinesis

Ace2 is turned on once in a life time in yeast daughter cells.  It leads to separation of mother and daughter cells.

super computing molecular dynamics simulation


=> Swe1 morphology checkpoint, Sepp Kohlwein,
triglycerides is fat for yeast cells.

cerulenin inhibit de novo synthesis of fat, triglyceride

Swe1 kinase -> lipolysis mediated cell cycle progression.
Swe1 --| Cdc28 --(P)--> Swe1

=> Cdk28 (Cdk1), Andreas Doncic, Jan Skoheim. 
Single-cell fluorescence signal detection
Yeast mating pathway, 
Yeast has 'memory" of past experiences

Far1 feed-forward regulation combined current and past signals to provide informed decisions.

Inherited Far1 total protein level give yeast cells an intergenerational memory which can make an more informed cell cycle reentry decision. 

growth    --(CDK)-->cell cycle
                      ||       
MAPK(t) --Far1 --> mating arrest

It seems that growth before mating must be done in haploid strains. In wild, cells can switch mating types. The argument on 'memory' must a very short memory.


=> cdk regulated expression, CDK28/CDK1 targets
Cdk --> Fkh2, Ndd1  --> G2/M genes
Hcm1 --> Fkh2, Ndd1

HCM1 targets: TUB1, SPC34, NDD1, CIN8

=> RNP granules in quiescent cells, Paul Herman's lab
ribonucleoprotein granules: Processing bodies(PBS), stress granules (translation)

Decreased P-bodies shortens CLS.

=> Linden Breeden, Quiescence
stationary phased cells have two populations: quiescent and dividing ones.

cell wall fortification improved in quiescent cells
deletion library was screened, 131 genes -> cell wall defect, and several other categories

log phase: cln3/cdk --| Whi5 --| SBF/MBF  --> Cln1/Cln2
S phase: DNA damage

In Q-cells: how G1 arrest is maintained
Xbp1 --> Cnb3/Cdk

Q: do some genes change the population fractions of dividing and quietscent cells? 

=> Cdc14 phosphatase, a hub in NaCl responsive subnetwork.

=> metabolite regulation, Sean Hackett, Joshua Rabinowitz
LC-MS, 106 metabolite
Flux balance analysis (FBA)
Flux is carried in 270 reactions
Model comparison by fitting to measured flux. a p-value is calculated.
Limiting nutrient effects

=> Calcineurin (CN) substrates in yeast, Jagoree Roy
Calcineurin has a conserved docking site.
Screen method: motif search;  hyper-phosphorylation in calcineurin-null mutant

CN signaling network evolves rapidly in the yeast species.

=> George Church, future of yeast
1. 3D/4D structure of yeast cells
2. synthesis of genomes
3. Synthesis of cells.

3D RNA fluorescence in situ sequencing, FISSEQ, Lee, Daugharthy, Science 2014

Reasons for synthetic genomes: metabolic isolation, non-canonical AA, multi-virus resistance

Cause & effect of N-terminal codon bias in bacterial genes

Mirror DNA & RNA polymerase

Synthetic biology, a lab manual.


















Friday, July 25, 2014

Github sync problem

When a github repository is changed by two different computers, Github can have sync problem using the Gui version. In this case, command line version should be used.

$ Git pull origina mater

CNV papers to read

PubMed ID: 24244640; 
CNVannotator: a comprehensive annotation server for copy number variation in the human genome.

PMID:22729399 for PPI; 
Proteome-wide prediction of protein-protein interactions from high-throughput data.

PMID:24067414 for gene network, 
Gaussian graphical model for identifying significantly responsive regulatory networks from time course high-throughput data.

PMID:22360268 for network-based disease analysis; and 
Network-based analysis of complex diseases.

PMID:14735121 for network biology
Network biology: understanding the cell's functional organization.

SubmitR diagrid import file problem




IEEE BIBE2014

http://bibe2014.eng.fau.edu/

Thursday, July 24, 2014

EC 2.4.1.38



http://www.brenda-enzymes.org/php/result_flat.php4?ecno=2.4.1.38


Tuesday, July 15, 2014

Monday, July 14, 2014

Diagrid, SubmitR


https://diagrid.org/resources/submitr
https://diagrid.org/resources/submitr/supportingdocs

XSEDE Hub sign on and log in

https://portal.xsede.org/web/xup/single-sign-on-hub




parameter estimation

From Majoros WH 2008, advanced method for parameter estimation. Online supplement to method for computational gene prediction.



r_H is the Hasting ratio.

XSEDE tutorial

Mallet, languageE analysis

Gephi

Tableau, free for education
Tableau Teaching
Tableau Public

CyberGIS
cybergis.org
http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php

http://sandbox.cigi.illinois.edu/home/


1 service unit = 1 core hour.

ssh login.xsede.org  #login must be from the hub.

gsissh full-machine-name

such as gsish

An alias can be set up in ~/.ssh
Host name:
Host:?

$HOME permenant space on XSEDE
$SRATCH, temporary for running jobs

$ja, #time the job

qsub -i #for interactive

Mixture probability distributions

From http://en.wikipedia.org/wiki/Mixture_density

 the mixture distribution can be represented by writing either the density, f, or the distribution function, F, as a sum (which in both cases is a convex combination):
 F(x) = \sum_{i=1}^n \, w_i \, P_i(x),
 f(x) = \sum_{i=1}^n \, w_i \, p_i(x) .

Sunday, July 13, 2014

ms02 network aging simulation

Analysis using the first 15 simulated ms02 permuted networks. 




file : 20140713-compare-ms02,ginppi.R
#compare agine dynamcis between original and ms02 GINDIP.
# plot mortality rate ~time from simulated GINPPI aging.
# binomial aging model fitting

rm(list=ls())
setwd("~/github/mactower-network-failure-simulation/compareOrigMS02")
require(flexsurv)
getwd()
list.files()

source("lifespan.20140711.R")

ms02path = 'ms02.gindip.failures'
list.files(, path=ms02path)

ms02fullnames = NA;
for(i in 1:20){
  mypath = paste( ms02path, '/', i, '/popages', sep='' )
  myfiles = list.files(, path=mypath)
  myfullnames = paste( mypath, '/', myfiles, sep='')
  ms02fullnames = c(ms02fullnames, myfullnames)
}
ms02fullnames = ms02fullnames[-1]

origAgeFiles = list.files(path="ori_ginppit_ages")
origAgeFiles

mypattern = "p.0.8.lambda.0.01.popsiz"
ms02files= ms02fullnames[grep(mypattern, ms02fullnames)]
ms02output = data.frame(ms02files);
ms02output$mean = NA; ms02output$stddev = NA; ms02output$CV=NA;
ms02output$R=NA; ms02output$t0=NA; ms02output$n=NA;
ms02output$Rflex = NA; ms02output$Gflex = NA;

for( i in 1:length(ms02files)) {
  tb = read.csv(ms02files[i])
  summary(tb); names(tb) = c('rls'); tb$rls = round(tb$rls+0.5, digits=0)
  ms02output$mean[i]=mean(tb$rls)
  ms02output$stddev[i]= sd(tb$rls)
  ms02output$CV[i] = sd(tb$rls) / mean(tb$rls)
 
  fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')
  Rhat = fitGomp$res[2,1]; Ghat = fitGomp$res[1,1]
  ms02output$Rflex[i] = Rhat;
  ms02output$Gflex[i]=Ghat;
 
  #G = (n-1) / t0, so t0 = (n-1)/G
  nhat = 4; t0= (nhat-1)/Ghat; t0
  fitBinom = optim ( c(Rhat, t0, nhat),  llh.binomialMortality.single.run , lifespan=tb$rls,
                     method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );
 
  ms02output[i, c("R", "t0", "n")] = fitBinom$par
 
  tb2 = calculate.mortality.rate(tb$rls)
  plot( tb2$s ~ tb2$t, main=ms02files[i])
  plot( tb2$mortality.rate ~ tb2$t , main=ms02files[i])
  plot( tb2$mortality.rate ~ tb2$t, log='y', main= paste( 'linear for Gompertz', main=ms02files[i]) )
  plot( tb2$mortality.rate ~ tb2$t, log='yx', main=paste( 'linear for Weibull',main=ms02files[i]) )
}#i file loop

ms02output

oripath = "ori_ginppit_ages"
#origAgeFiles = list.files(path= oripath)
origAgeFiles
origFile = origAgeFiles[grep(mypattern, origAgeFiles)]
tb.ori = read.csv(paste( oripath, '/',origFile[2],sep='')  )
tb.ori$rls = round( tb.ori[,1]+0.5, digit=0)

oriGomp = flexsurvreg( formula=Surv(tb.ori$rls) ~ 1, dist='gompertz')
Rori = oriGomp$res[2,1]; Gori = oriGomp$res[1,1]
nori = 4; t0= (nori-1)/Gori; t0
oriBinom = optim ( c(Rori, t0, nori),  llh.binomialMortality.single.run , lifespan=tb.ori$rls,
                   method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );

cbind( mean(tb.ori$rls), median(ms02output$mean))
cbind( sd(tb.ori$rls), median(ms02output$stddev))
cbind( sd(tb.ori$rls)/mean(tb.ori$rls), median(ms02output$CV))
cbind( Rori, median(ms02output$Rflex))
cbind( oriBinom$par[1], median(ms02output$R))
cbind( oriBinom$par[2], median(ms02output$t0))
cbind( oriBinom$par[3], median(ms02output$n)) #ms02 does change n
cbind( Gori, median(ms02output$Gflex))

cv.ori =  sd(tb.ori$rls)/mean(tb.ori$rls)
hist(ms02output$CV, br=15)
arrows(cv.ori, 2.5, cv.ori, 1.1, lwd=2, col='red', xlab="t0" )

hist(ms02output$Rflex)
hist(ms02output$t0)
# arrows(oriBinom$par[2], 5, oriBinom$par[2], 1, lwd=2, col='red', xlab="t0" )

s.ori = calculate.mortality.rate(tb.ori$rls)
tb = read.csv(ms02files[6]); tb$rls = round(tb[,1]+0.5, digits=0)
s2 = calculate.mortality.rate(tb$rls)
ks.test( tb.ori$rls, tb$rls)
wilcox.test( tb.ori$rls, tb$rls)
var.test( tb.ori$rls, tb$rls)

plot( s.ori$s ~ s.ori$t, col='red',type='l')
lines(s2$s ~ s2$t)

plot( s.ori$mortality.rate ~ s.ori$t, type='l', col='red')
lines(s2$mortality.rate ~ s2$t, col='blue')

plot( s.ori$mortality.rate ~ s.ori$t, type='l', col='red', log='y')
lines(s2$mortality.rate ~ s2$t, col='blue')

plot( s.ori$mortality.rate ~ s.ori$t, type='l', col='red', log='xy')
lines(s2$mortality.rate ~ s2$t, col='blue')

quit('no')
##########END 20140713

Link files names to numbers, R, example

mixed_names = c(
  "BY4742_MATalpha_temp30_media_YPD  10 ngmL cycloheximide_n80.csv",  10,
  "BY4742_MATalpha_temp30_media_YPD + 10ngml cycloheximide_n120.csv", 10,
  "BY4742_MATalpha_temp30_media_YPD  10ngml cycloheximide_n120.csv",  10,
  "BY4742_MATalpha_temp30_media_YPD + 25 ngmL cycloheximide_n160.csv", 25,
  "BY4742_MATalpha_temp30_media_YPD  25 ngmL cycloheximide_n160.csv", 25,
  "BY4742_MATalpha_temp30_media_YPD  30ngml cycloheximide_n40.csv",   30,
  "BY4742_MATalpha_temp30_media_YPD  35ngml cycloheximide_n40.csv",   35,
  "BY4742_MATalpha_temp30_media_YPD  40ngml cycloheximide_n40.csv",   40,
  "BY4742_MATalpha_temp30_media_YPD  45ngml cycloheximide_n40.csv",   45,
  "BY4742_MATalpha_temp30_media_YPD  50 ngmL cycloheximide_n37.csv",  50,
  "BY4742_MATalpha_temp30_media_YPD  50ngml cycloheximide_n140.csv",  50,
  "BY4742_MATalpha_temp30_media_YPD + 30ngml cycloheximide_n40.csv",  30,
  "BY4742_MATalpha_temp30_media_YPD + 35ngml cycloheximide_n40.csv",  35,
  "BY4742_MATalpha_temp30_media_YPD + 40ngml cycloheximide_n40.csv",  40,
  "BY4742_MATalpha_temp30_media_YPD + 45ngml cycloheximide_n40.csv",  45,
  "BY4742_MATalpha_temp30_media_YPD + 50ngml cycloheximide_n140.csv", 50,
  "BY4742_MATalpha_temp30_media_YPD + 100ngml cycloheximide_n139.csv",100,
  "BY4742_MATalpha_temp30_media_YPD  100ngml cycloheximide_n139.csv",100
);
files = mixed_names[seq(1,(length(mixed_names)-1), 2)]
cycloheximide = as.numeric( mixed_names[seq(2,(length(mixed_names)), 2)] )

Saturday, July 12, 2014

Glucose dose effect on RLS in BY4742, code


File: _BY4742.glucose.dose.20140712.r

#test binomial aging model on glucose dose in BY4742

rm(list=ls());
setwd("~/projects/0.network.aging.prj/0a.network.fitting")
source("lifespan.20140711.r")
require(flexsurv)

pdf("BY4742GlucoseDose-binomial-20140713.pdf", width=10, height=8)
layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))

files = list.files(, path='DR', pattern="BY4742")

files = c(
  "BY4742_MATalpha_temp30_media_0.005% D_n299.csv",                
  "BY4742_MATalpha_temp30_media_0.05% D_n4690.csv",                
  "BY4742_MATalpha_temp30_media_0.1% D_n338.csv",                  
  "BY4742_MATalpha_temp30_media_0.5% D_n579.csv",                  
  "BY4742_MATalpha_temp30_media_YPD_n19930.csv"
);

output = data.frame(files);
output$mean = NA; output$stddev = NA; output$CV=NA;
output$R=NA; output$t0=NA; output$n=NA;
output$Rflex = NA; output$Gflex = NA;
output$glucose = c(0.005, 0.05, 0.1, 0.5, 2.0)

#i=1;
for( i in 1:5) {
  filename = paste('DR/', files[i], sep='')
  tb = read.csv(filename)
  summary(tb)
  output$mean[i]=mean(tb$rls)
  output$stddev[i]= sd(tb$rls)
  output$CV[i] = sd(tb$rls) / mean(tb$rls)

  fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')
  Rhat = fitGomp$res[2,1]; Ghat = fitGomp$res[1,1]
  output$Rflex[i] = Rhat; output$Gflex[i]=Ghat;

  #G = (n-1) / t0, so t0 = (n-1)/G
  nhat = 3; t0= (nhat-1)/Ghat; t0
  fitBinom = optim ( c(Rhat, t0, nhat),  llh.binomialMortality.single.run , lifespan=tb$rls,
                method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );

  output[i, c("R", "t0", "n")] = fitBinom$par

    #  fitGomp2 = optim ( c(0.005, 0.08),  llh.G.single.run , lifespan=tb$rls,
   #                    method="L-BFGS-B", lower=c(1E-10,1E-2), upper=c(10,10) );

  tb2 = calculate.mortality.rate(tb$rls)
  plot( tb2$s ~ tb2$t, main=filename)
  plot( tb2$mortality.rate ~ tb2$t , main=filename)
  plot( tb2$mortality.rate ~ tb2$t, log='y', main='linear for Gompertz' )
  plot( tb2$mortality.rate ~ tb2$t, log='yx', main='linear for Weibull' )
}#i file loop


plot( output$R ~ output$glucose, type='l' )
plot( output$t0 ~ output$glucose,type='l' )
plot( output$n ~ output$glucose,type='l' )
plot( output$mean ~ output$glucose, type='l')
plot( output$CV ~ output$glucose, type='l' )
plot( output$stddev ~ output$glucose, type='l')
plot( output$Rflex ~ output$glucose, type='l')
plot( output$Gflex ~ output$glucose, type='l')

dev.off()


q('no')
####END of 20140712 #########





Gompertz versus Weibull, semilog versus log-log plot

Gompertz m = R exp(Gt)
    so, ln(m) = Gt  semi-log is linear.

Weibull     m = A t^(B)
   so, ln(m) = lnA + B ln(t)   log-log is linear.

Because Weibull takes an extra log-transformation, it 'suppresses' nosies in the data, therefore naturally fits bettern than Gomeprtz model.



flexsurv usage and results




fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')

> fitGomp$res
              est        L95%       U95%
shape 0.074024323 0.065724069 0.08232458

rate  0.009551328 0.007533154 0.01211018



fitGomp$coeff['rate'] has to be transformed by exp() function. 

Efficacy, effectiveness, biological activity


Efficacy, as in therapeutic efficacy?

Effectiveness,
http://getedited.wordpress.com/2009/10/26/efficacy-vs-effectiveness/

"In medical parlance, “efficacy” and “effectiveness” mean different things, and it’s a nuance that’s quite significant. Efficacy is a narrower definition that means how well something works in an ideal or controlled setting, such as a clinical trial. Effectiveness describes how well it works under “real-world” conditions. Effectiveness, for example, takes into consideration how easy a drug is to use, and potential side effects, whereas efficacy measures only how well it produces the desired result."

So, assume 'functional efficacy' of gene interaction to be non-aging is more appropriate. 

Or, I can just used the word "reliability" (although, reliability of gene interaction may be confused with reliability of network). Reliability is clear to me, but not to others. 



Biological activity,
Effective activity




DR CR in humans

J Gerontol A Biol Sci Med Sci. 2004 Aug;59(8):789-95.

How much should we eat? The association between energy intake and mortality in a 36-year follow-up study of Japanese-American men.


http://www.ncbi.nlm.nih.gov/pubmed/15345727

Friday, July 11, 2014

nested binomial-aging model test,

Summary: This is proof-of-concept work, using simulated data to see whether nested-model test can discern change of parameters. 





#test binomial aging model
rm(list=ls());

source("lifespan.20140711.r")

#################################
##### LRT to exam two datasets on binomial aging model, 
### model H0,  R, t0, n are all the same
### model H1i  R1<>R2
### model H1g, 
### model H2, 
#  rawIt0n = c( R1, t01, n1, R2, t02, n2)
H0  <- function( rawIt0n )  { Rt0n <- c(rawIt0n[1],rawIt0n[2],rawIt0n[3],rawIt0n[1],rawIt0n[2],rawIt0n[3]) }  #all the same
H1r  <- function( rawIt0n ) { Rt0n <- c(rawIt0n[1],rawIt0n[2],rawIt0n[3],rawIt0n[4],rawIt0n[2],rawIt0n[3]) }  # R1<>R2
H1t  <- function( rawIt0n ) { Rt0n <- c(rawIt0n[1],rawIt0n[2],rawIt0n[3],rawIt0n[1],rawIt0n[5],rawIt0n[3]) }  # t01 <> t02
H1n  <- function( rawIt0n ) { Rt0n <- c(rawIt0n[1],rawIt0n[2],rawIt0n[3],rawIt0n[1],rawIt0n[2],rawIt0n[6]) }  # n1 <> n2

Hx.llh.binomialMortality.single.run <- function( INrawRt0n, model, lifespan1, lifespan2 ) {
  Rt0n = model( INrawRt0n )
  I1 = Rt0n[1]; t01 = Rt0n[2]; n1=Rt0n[3];
  I2 = Rt0n[4]; t02 = Rt0n[5]; n2=Rt0n[6];  
  x1 = lifespan1[!is.na(lifespan1)];
  x2 = lifespan2[!is.na(lifespan2)];

  log_s1 = (I1 * t01 /n1 )*(1 - (1 + x1/t01)^n1);
  log_m1 = log(I1) +  (n1-1) * log(1 + x1/t01 ); 
  log_s2 = (I2 * t02 /n2 )*(1 - (1 + x2/t02)^n2);
  log_m2 = log(I2) +  (n2-1) * log(1 + x2/t02 ); 
  my.lh = sum(log_s1)  + sum(log_m1) + sum(log_s2) + sum(log_m2)
  print (Rt0n ); #trace the convergence
  ret = - my.lh # because optim seems to maximize 
}

set.seed(20140711)
x1 = round( rbinomial.aging( c(0.01, 15, 4), 100) + 0.5, digits=0);
x2 = round( rbinomial.aging( c(0.01, 10, 4), 100) + 0.5, digits=0);

llh.H0  = optim( c(0.01, 15, 4, 0.01, 15, 4), Hx.llh.binomialMortality.single.run, model=H0,lifespan1=x1, lifespan2=x2, 
                 method="L-BFGS-B", lower=c(1E-5,0.05,0.1, 1E-5,0.05,0.1), upper=c(10,100,50,10,100,50) );  

llh.H1t  = optim( c(0.01, 15, 4, 0.01, 10, 4), Hx.llh.binomialMortality.single.run, model=H1t,lifespan1=x1, lifespan2=x2, 
                 method="L-BFGS-B", lower=c(1E-5,0.05,0.1, 1E-5,0.05,0.1), upper=c(10,100,50,10,100,50) );  

cbind(llh.H0$par, llh.H1t$par)
LH = c(llh.H0$value, llh.H1t$value)
deltaLH =  - LH + llh.H0$value; 
1 - pchisq( 2*deltaLH, df =1 );

calculate.mortality.rate() R function

calculate.mortality.rate = function( lifespan ) {
  tb = calculate.s(lifespan)
  tb$ds=NA; tb$dt=NA
 
  tb$dt[1] = tb$s[1]
  tb$ds[1] = 1 - tb$s[1]
  tb$mortality.rate[1] = tb$ds[1] / tb$dt[1]
 
  for( j in 2:length(tb[,1])) {
    tb$ds[j] =  tb$s[j-1] - tb$s[j]
    tb$dt[j] = -tb$t[j-1] + tb$t[j]
    tb$mortality.rate[j] = tb$ds[j] / ( tb$s[j] * tb$dt[j])
  }
  return( tb )
}

Compare binomial, gompertz, and weibull model, fitting with simulated binomial-aging-model lifespan


# ~/0.network.aging.prj/0a.network.fitting/_binomial_aging20140711.r


#test binomial aging model
rm(list=ls());

source("lifespan.20131221.r")
require(flexsurv)


##### log likelihood function, 3-parameter binomial mortality rate model
# http://hongqinlab.blogspot.com/2013/12/binomial-mortailty-model.html
# m = R ( 1 + t/t0)^(n-1)
# s = exp( (R t0/n)*(1 - (1+t/t0)^n ) )  

llh.binomialMortality.single.run <- function( Rt0n, lifespan ) {
  I = Rt0n[1]; t0 = Rt0n[2]; n=Rt0n[3];
  my.data = lifespan[!is.na(lifespan)];
  log_s = (I * t0 /n )*(1 - (1 + my.data/t0)^n);
  log_m = log(I) +  (n-1) * log(1 + my.data/t0 ); 
  my.lh = sum(log_s)  + sum(log_m);
  print (Rt0n ); #trace the convergence
  ret = - my.lh # because optim seems to maximize 
}

#random number from binomial-aging model
rbinomial.aging <- function ( Rt0n, n){
    x.uniform = runif(n)
    inverse.binomialaging.CDF = function(R,t0,n,y) {t0*((1- n*log(y)/(R*t0))^(1/n) -1) }
    x.binomialaging = inverse.binomialaging.CDF(Rt0n[1],Rt0n[2], Rt0n[3], x.uniform)
    return(x.binomialaging)
}

set.seed(20140711)
x = rbinomial.aging( c(0.001,10, 5), 1000)
fitBinom = optim ( c(0.001,10, 3),  llh.binomialMortality.single.run , lifespan=x, 
              method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );  

fitGomp = optim ( c(0.005, 0.08),  llh.G.single.run , lifespan=x,
                  method="L-BFGS-B", lower=c(1E-10,1E-2), upper=c(10,10) );  

fitWeib2 = flexsurvreg(formula = Surv(x) ~ 1, dist="weibull")
fitGomp2 = flexsurvreg(formula = Surv(x) ~ 1, dist="gompertz")
fitGomp2$coeff

c(-fitGomp$value, fitGomp2$loglik, fitWeib2$loglik, -fitBinom$value)
Flexsurv can over-fit the Gompertz with negative values, and Weibull gave the best value!!! 
Even though the random number are generated by binomial aging model, its fitting gave the lowest likelihood?!

TODO: I should to provide a gradient function for optim(method='L-BFGS-B')
Or try optimx()

For binomial-aging, $n$ is under-estimated by optim(), but $m0$ is over-estimated. When the entire data is fitted with Gompertz model, $G$ is too low. How about fitting within initial stage? 


set.seed(20140711)
x = round( rbinomial.aging( c(0.001,10, 4), 5000), digits=1 )#t0=10, n=4
tb = calculate.mortality.rate(x)
plot( tb$s ~ tb$t)
plot( tb$mortality.rate ~ tb$t )
plot( tb$mortality.rate ~ tb$t, log='y', main='linear for Gompertz' )
plot( tb$mortality.rate ~ tb$t, log='yx', main='linear for Weibull' )





set.seed(20140711)
x = rbinomial.aging( c(0.01,25, 4), 100)  #large m0, t0=25, n=4
x = round( x + 0.5, digits=0)
fitBinom = optim ( c(0.01,25, 3.5),  llh.binomialMortality.single.run , lifespan=x,
              method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );

fitGomp = optim ( c(0.005, 0.08),  llh.G.single.run , lifespan=x,
                  method="L-BFGS-B", lower=c(1E-10,1E-2), upper=c(10,10) );

fitWeib2 = flexsurvreg(formula = Surv(x) ~ 1, dist="weibull")
fitGomp2 = flexsurvreg(formula = Surv(x) ~ 1, dist="gompertz")
fitGomp2$coeff

c(-fitGomp$value, fitGomp2$loglik, fitWeib2$loglik, -fitBinom$value)
Good, binomial aging model gives the best likelihood.
Plot confirmed that Gompertz looks better than Weibull. 





Lessons learned, grant writing

Prediction and supporting evidence should be presented side by side.


Thursday, July 10, 2014

Schleit13 screen of DR in yeast


Figure 4, DR reduced cytoplasmic translation, --| mitochondrial proteotoxic stress ---| RLS

My alternative hypothesis on translational fidelity can be added here. The two model may be compared using a simulation approach?

Do I expect lambda ~ glucose level, cycloheximide? 

AFG3

SGD RLS increased

http://www.yeastgenome.org/phenotype/increased_replicative_lifespan/overview

Yeast aging data sets


http://www.ellisonfoundation.org/research/replicative-senescence

Wednesday, July 9, 2014

extract rls in csv format from rls.db



548 rls*.csv files are extracted with sample size > 100.


file "_export_rls20140709.r"

#export rls to individual files. I need them for nrmca fittings.

#####################################################################################################
#################install and declare needed packaqes for survival analysis
rm(list=ls())


#require('survival')
#library('KMsurv')
library('data.table')
library('RSQLite')
#library('flexsurv')
#library('muhaz')

#########################################################################################################
#########################################################################################################

#####################Declare names of files and folders. Move working directory to desired folder. Establish connection to database
#folderLinearHazard = '//udrive.uw.edu/udrive/Kaeberlein Lab/RLS Survival Analysis 2/Linear Hazard'
Folder = "~/projects/0.network.aging.prj/9.ken"
File = 'rls.db'
setwd(Folder)
drv <- SQLite()
con <- dbConnect(drv, dbname = File)

##########################################################################################################
##########################################################################################################

######################Create a complete table of conditions that have data in the RLS database.
######################(Each row in this table will be a unique combination of genotype, mating type, media, and temperature

###get unique conditions from "set" columns (experimental treatments/mutations)
conditions1 = dbGetQuery(con, "
  SELECT DISTINCT set_genotype as genotype, set_mating_type as mat, set_media as media, set_temperature as temp
  FROM result
  WHERE pooled_by = 'file'
  ")
### rows in the result table of the database are not mutually exclusive. In some rows, data has sometimes been pooled by genotype, background strain, etc.

###get unique conditions from "reference" columns (these columns are the control conditions/lifespan results for each row of experimental lifespan results. Rows are not mutually exclusive (1 control to many experimental conditions)
conditions2 = dbGetQuery(con, "
  SELECT DISTINCT ref_genotype as genotype, ref_mating_type as mat, ref_media as media, ref_temperature as temp
  FROM result
  WHERE pooled_by = 'file'
  ")

####combine and take unique conditions from these two
conditions = rbind(conditions1, conditions2)
conditions = unique(conditions)
conditions = conditions[complete.cases(conditions),]
row.names(conditions) = NULL ###renumber the rows. important because future processes will refer to a unique condition by its row number in this table
controlConditions = conditions[conditions$genotype %in% c('BY4741', 'BY4742', 'BY4743'),] ###create a table of conditions that have WT genotypes

###Add columns to the conditions data frame. These columns will be filled in by their respective variable: ie. gompertz shape/rate of the lifespan data associated with a given conditions (genotype, mating type, media, temp)
conditions$n = apply(conditions, 1, function(row) 0)
conditions$avgLS = apply(conditions, 1, function(row) 0)
conditions$StddevLS = apply(conditions, 1, function(row) 0)
conditions$medianLS = apply(conditions, 1, function(row) 0)
conditions$gompShape = apply(conditions, 1, function(row) 0)
conditions$gompRate = apply(conditions, 1, function(row) 0)
conditions$gompLogLik = apply(conditions, 1, function(row) 0)
conditions$gompAIC = apply(conditions, 1, function(row) 0)
conditions$weibShape = apply(conditions, 1, function(row) 0)
conditions$weibScale = apply(conditions, 1, function(row) 0)
conditions$weibLogLik = apply(conditions, 1, function(row) 0)
conditions$weibAIC = apply(conditions, 1, function(row) 0)

#############################################################################################################
#############################################################################################################

#########################################Loop through the conditions to get lifespan data
# r =1

for (r in 1:length(conditions$genotype)) {
  genotypeTemp = conditions$genotype[r]
  mediaTemp = conditions$media[r]
  temperatureTemp = conditions$temp[r]
  matTemp = conditions$mat[r]

  conditionName = apply(conditions[r,1:4], 1, paste, collapse=" ") #### create a string to name a possible output file
  conditionName = gsub("[[:punct:]]", "", conditionName) #### remove special characters from the name (ie. quotations marks, slashes, etc.)

  genotypeTemp = gsub("'", "''", genotypeTemp)
  genotypeTemp = gsub('"', '""', genotypeTemp)

  ##### Query the database to take data (including lifespan data) from every mutually exclusive row (pooled by file, not genotype, not background, etc).
  ##### There will often be multiple rows (representing different experiments) for each unique condition. This analysis pools lifespan data from multiple experiments if the conditions are all the same
  queryStatementSet = paste(
    "SELECT * ",
    "FROM result ",
    "WHERE pooled_by = 'file' AND set_genotype = '", genotypeTemp, "' AND set_mating_type = '", matTemp, "' AND set_media = '", mediaTemp, "' AND set_temperature = '", temperatureTemp, "'",
    sep = ""
  )

  queryStatementRef = paste( ### both the reference and set columns will be searched for matching conditions
    "SELECT * ",
    "FROM result ",
    "WHERE pooled_by = 'file' AND ref_genotype = '", genotypeTemp, "' AND ref_mating_type = '", matTemp, "' AND ref_media = '", mediaTemp, "' AND ref_temperature = '", temperatureTemp, "'",
    sep = ""
  )

  dataListSet = dbGetQuery(con, queryStatementSet)
  dataListRef = dbGetQuery(con, queryStatementRef)
  lifespansChar = unique(c(dataListSet$set_lifespans, dataListRef$ref_lifespans)) ### combine lifespan values for a given condition into a single data structure. (problems of having non-mutually exclusive rows in the ref_lifespans column are overcome by only taking unique groups of lifespans. The assumption is that no two experiments produced identical lifespan curvs)

  ##### Database codes the lifespan data for each experiment as a string. So, lifespanChar is a vector of strings
  ##### Convert lifespanChar into a single vector of integers lifespansTemp
  lifespansTemp = c()
  if (length(lifespansChar) > 0) {
    for (s in 1:length(lifespansChar)) {
      lifespansNum = as.numeric(unlist(strsplit(lifespansChar[s], ",")))
      lifespansNum = lifespansNum[!is.na(lifespansNum)]
      if (length(lifespansNum) > 0) {
        lifespansTemp = c(lifespansTemp, lifespansNum)
      }
    }
  }

  if (length(lifespansTemp) > 100) {
    require(stringr)
    conditions$media[r] = str_replace( conditions$media[r], "\\/", "")
    conditions$genotype[r] = str_replace( conditions$genotype[r], "\\/", "-")
    conditions$n[r] = length(lifespansTemp)   #### record number of individuals
    filename = paste( 'rls_csv/',conditions$genotype[r], '_', conditions$mat[r],'_temp', conditions$temp[r], '_n',conditions$n[r],
                      '_media_', conditions$media[r], '.csv',sep='')
    out = data.frame(lifespansTemp)
    names(out) = c("rls")
      write.csv(out, filename, quote=F, row.names=F)
  }

}#outer loop
 
 

remove backslash from string in R using stringr


    require(stringr)
    conditions$media[r] = str_replace( conditions$media[r], "\\/", "")

R excel

require(xlsx)

kino = read.xlsx("data/kinetochore_list.xls", 1)
names(kino)= c("id", "ORF", "name", "desc", "alias", "genelen")


robustness 20140709

gompShape batch regression




##############################
# batch regression ~ CV
pTb = 1: length(fit3[1,])
names(pTb) = names(fit3); length(pTb)
pTb[1:15]
#for( j in c(7:50) ) {
for( j in c(7:19, 22:32, 34:538) ) {
  m = lm( fit3[, j] ~ fit3$StddevLS/fit3$avgLS )
  sm = summary(m); sm
  pTb[j] = 1 - pf(sm$fsta[1], sm$fsta[2], sm$fsta[3])
}
pTb[pTb<0.05]
p= pTb[c(22:32, 34:538)]
p[p<0.01]
hist(p, br=20)
source("robust-fdr.R")
ret= robust.fdr(p)
summary(ret$q)
hist(ret$q, br=20)

#There are dozens of factors ~ rls CV, including the PPI degree. 




RLS CV ~ morphology CV (why a negative correlation?)



gompShape ~ morphology CV(positive correlation)?
Todo: I can exclude the small sample size measure to see if p-value can be improved.


RLS CV ~ (-)~ evolutionary measures. High selection pressure -> stronger robustness. 
    Kdata = read.csv( "data/Sce.Spa.KaKs.csv");

20140710note: The gene with small CV and high omega is highly interesting. 







Todo: use log, sqrt() transformation on variables. Remove small sample sized measures, say n<30?

Tuesday, July 8, 2014

Select ORF rls

# 9.ken/_explore.20140705

rm(list=ls())
setwd("~/projects/0.network.aging.prj/9.ken")

list.files()

tb = read.csv('conditionsWeibRedo_qin.csv')
tb$genotype = toupper( as.character(tb$genotype))
tb$media = as.character(tb$media)
str(tb)

tb[grep("ctf", tb$genotype, ignore.case=T), ]


tb2 = read.csv("SceORF_name.csv", header=F, colClass=c("character", "character"))
names(tb2) = c('ORF','name')
length(unique(tb$genotype))

table( tb$genotype %in% tb2$name )
table( tb2$name %in% tb$genotype )

tb$flag = tb$genotype %in% tb2$name
sub = tb[tb$flag, ]
sub$ORF = tb2$ORF[match(sub$genotype, tb2$name)]

length(unique(sub$genotype))
x = table(sub$media)
x[grep("YPD",names(x))]
tb$media[grep("% D", tb$media, ignore.case=T)]

write.csv(sub, "ken-RLS-byORF.csv", row.names=F, quote=F)

x = sub[sub$n>30, ]
hist(log10(x$n)/log10(3))
summary(sub)
tb[tb$n>1000,]


combine p-value for multiple tests



To combine 6 phenotypic measures for each SNP, one way is the Fisher’s method. When null hypotheses are true for all association tests, they are independent by definition, and their p-value can be combined as 


 for K number of traits. To correct for multiple tests, phenotypic data will be permuted 1000 times, and lowest p-value from each test will be used to generate a distribution to determine proper genome-wide significance. 

It seems reasonable to extrapolate this to combine q-values for multiple tests.