Showing posts with label fitting.

Saturday, July 12, 2014

Glucose dose effect on RLS in BY4742, code


File: _BY4742.glucose.dose.20140712.r

#test binomial aging model on glucose dose in BY4742

rm(list=ls());
setwd("~/projects/0.network.aging.prj/0a.network.fitting")
source("lifespan.20140711.r")
require(flexsurv)

pdf("BY4742GlucoseDose-binomial-20140713.pdf", width=10, height=8)
layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))

files = list.files(path='DR', pattern="BY4742")  # overridden by the explicit list below

files = c(
  "BY4742_MATalpha_temp30_media_0.005% D_n299.csv",                
  "BY4742_MATalpha_temp30_media_0.05% D_n4690.csv",                
  "BY4742_MATalpha_temp30_media_0.1% D_n338.csv",                  
  "BY4742_MATalpha_temp30_media_0.5% D_n579.csv",                  
  "BY4742_MATalpha_temp30_media_YPD_n19930.csv"
);

output = data.frame(files);
output$mean = NA; output$stddev = NA; output$CV=NA;
output$R=NA; output$t0=NA; output$n=NA;
output$Rflex = NA; output$Gflex = NA;
output$glucose = c(0.005, 0.05, 0.1, 0.5, 2.0)

#i=1;
for( i in 1:5) {
  filename = paste('DR/', files[i], sep='')
  tb = read.csv(filename)
  summary(tb)
  output$mean[i]=mean(tb$rls)
  output$stddev[i]= sd(tb$rls)
  output$CV[i] = sd(tb$rls) / mean(tb$rls)

  fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')
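  # flexsurv's Gompertz: $res row 1 is shape (G), row 2 is rate (R); see the $res printout in the flexsurv notes below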
  Rhat = fitGomp$res[2,1]; Ghat = fitGomp$res[1,1]
  output$Rflex[i] = Rhat; output$Gflex[i]=Ghat;

  #G = (n-1) / t0, so t0 = (n-1)/G
  nhat = 3; t0= (nhat-1)/Ghat; t0
  fitBinom = optim ( c(Rhat, t0, nhat),  llh.binomialMortality.single.run , lifespan=tb$rls,
                method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );

  output[i, c("R", "t0", "n")] = fitBinom$par

    #  fitGomp2 = optim ( c(0.005, 0.08),  llh.G.single.run , lifespan=tb$rls,
   #                    method="L-BFGS-B", lower=c(1E-10,1E-2), upper=c(10,10) );

  tb2 = calculate.mortality.rate(tb$rls)
  plot( tb2$s ~ tb2$t, main=filename)
  plot( tb2$mortality.rate ~ tb2$t , main=filename)
  plot( tb2$mortality.rate ~ tb2$t, log='y', main='linear for Gompertz' )
  plot( tb2$mortality.rate ~ tb2$t, log='yx', main='linear for Weibull' )
}#i file loop


plot( output$R ~ output$glucose, type='l' )
plot( output$t0 ~ output$glucose,type='l' )
plot( output$n ~ output$glucose,type='l' )
plot( output$mean ~ output$glucose, type='l')
plot( output$CV ~ output$glucose, type='l' )
plot( output$stddev ~ output$glucose, type='l')
plot( output$Rflex ~ output$glucose, type='l')
plot( output$Gflex ~ output$glucose, type='l')

dev.off()


q('no')
####END of 20140712 #########





Gompertz versus Weibull, semilog versus log-log plot

Gompertz:  m = R exp(Gt)
    so, ln(m) = ln(R) + Gt   -- a semi-log plot is linear.

Weibull:   m = A t^B
    so, ln(m) = ln(A) + B ln(t)   -- a log-log plot is linear.

Because Weibull takes an extra log transformation (of the time axis), it 'suppresses' noise in the data, and therefore naturally fits better than the Gompertz model.
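A quick self-contained check of the two diagnostics (simulated hazard curves, not data from the posts here; the R, G, A, B values are arbitrary):

t = seq(1, 40, by=1)
m.gomp = 0.01 * exp(0.08 * t)   # Gompertz: m = R exp(Gt)
m.weib = 0.005 * t^1.5          # Weibull:  m = A t^B
par(mfrow=c(1,2))
plot(m.gomp ~ t, log='y',  main='Gompertz: linear on semi-log')
plot(m.weib ~ t, log='xy', main='Weibull: linear on log-log')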



flexsurv usage and results




fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')

> fitGomp$res
              est        L95%       U95%
shape 0.074024323 0.065724069 0.08232458
rate  0.009551328 0.007533154 0.01211018



fitGomp$coeff['rate'] is stored on the log scale and has to be transformed back with the exp() function.
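A minimal check of that transformation (assuming a flexsurv version that stores the coefficient on the log scale, as the note above implies):

exp(fitGomp$coeff['rate'])   # should recover the natural-scale rate, ~0.009551 in $res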

Wednesday, December 4, 2013

Fitting with weights in R: gnls, weights, and form



gnls with weights (form)
http://stackoverflow.com/questions/10508474/step-halving-issue-in-gnlsnlme

dat:
Site_Code   SF
5   3
5   0
5   2
5   0
5   0
5   0

library(nlme)
g0 <- gnls(SF ~ a * Site_Code^b, data = dat,
           weights = varPower(form = ~Site_Code),
           start=list(a=30,b=-0.5))
 
 
This example indicates that averaged data should NOT be used for regression when weights
are estimated. Use the original data instead, because the replicate-to-replicate variability
in the un-averaged data is what the variance function uses to estimate the weights.
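Concretely, varPower(form = ~v) models the residual variance as a power of a covariate; a minimal sketch of the variance function it encodes (Site_Code as the covariate is just the example above):

library(nlme)
# varPower(form = ~v) encodes Var(e_i) = sigma^2 * |v_i|^(2*delta);
# delta (reported as 'power' in the summaries below) is estimated from the
# replicate-to-replicate scatter, which averaging would remove.
vf = varPower(form = ~Site_Code)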
 
 
The following example shows that varPower() can bring the fitted model closer to the 'truth'. 
 
> require(nlme)
> x1 = rnorm(20)
> y1 = x1 + rnorm(20)/10
> 
> #weird ones
> x2 = rnorm(5)+5
> y2 = rnorm(5)
> 
> y=c(y1,y2)
> x=c(x1,x2)
> mydata = data.frame(cbind(y,x))
> 
> foo = function(a,b) { y = x*a + b }
> model1 = gnls( y ~foo(a,b), start=list(a=1,b=0))
> summary(model1)
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: NULL 
       AIC      BIC    logLik
  72.21925 75.87587 -33.10962

Coefficients:
       Value  Std.Error    t-value p-value
a -0.0159375 0.08311006 -0.1917637  0.8496
b -0.3550085 0.20216324 -1.7560486  0.0924

 Correlation: 
  a     
b -0.346

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.8712584 -0.7939610  0.2684611  0.7910997  1.2959748 

Residual standard error: 0.9485101 
Degrees of freedom: 25 total; 23 residual
> 
> model2 = gnls( y ~foo(a,b), data=mydata, start=list(a=1,b=0), weights=varPower(form = ~x))
> summary(model2)
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: mydata 
       AIC      BIC    logLik
  20.85874 25.73424 -6.429369

Variance function:
 Structure: Power of variance covariate
 Formula: ~x 
 Parameter estimates:
   power 
1.468287 

Coefficients:
       Value  Std.Error   t-value p-value
a  0.9281320 0.06104714 15.203530  0.0000
b -0.0255823 0.00891448 -2.869751  0.0087

 Correlation: 
  a   
b 0.17

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.0273323 -0.8480791 -0.1211550  0.3073090  2.2141947 

Residual standard error: 0.4204751 
Degrees of freedom: 25 total; 23 residual
> model1
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: NULL 
  Log-likelihood: -33.10962

Coefficients:
         a          b 
-0.0159375 -0.3550085 

Degrees of freedom: 25 total; 23 residual
Residual standard error: 0.9485101 
> model2
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: mydata 
  Log-likelihood: -6.429369

Coefficients:
          a           b 
 0.92813202 -0.02558234 

Variance function:
 Structure: Power of variance covariate
 Formula: ~x 
 Parameter estimates:
   power 
1.468287 
Degrees of freedom: 25 total; 23 residual
Residual standard error: 0.4204751 
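Since both fits were obtained by maximum likelihood, they can also be compared head to head; anova() on gls/gnls objects reports AIC, BIC, and a likelihood ratio (a quick follow-up to the transcript above):

anova(model1, model2)   # the varPower fit should win on AIC, BIC, and logLik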
 

Wednesday, October 16, 2013

flexsurvreg() in package flexsurv

Notes on 20140818: I did not see a gradient function specified in the call to optim()?!
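For reference, optim() does accept an analytic gradient through its gr argument; a toy sketch (the quadratic below is illustrative only, not from flexsurv):

fn = function(p) sum((p - c(1,2))^2)         # objective function
gr = function(p) 2 * (p - c(1,2))            # its analytic gradient
optim(c(0,0), fn, gr=gr, method="BFGS")$par  # converges to (1, 2)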


# package(flexsurv)

> flexsurvreg
function (formula, data, dist, inits, fixedpars = NULL, cl = 0.95, 
    ...) 
{
    call <- match.call() #a useful base function
    indx <- match(c("formula", "data"), names(call), nomatch = 0)
    if (indx[1] == 0) 
        stop("A \"formula\" argument is required")
    temp <- call[c(1, indx)]
    temp[[1]] <- as.name("model.frame")
    m <- eval(temp, parent.frame())
    Y <- model.extract(m, "response")
    if (!inherits(Y, "Surv")) 
        stop("Response must be a survival object")
    Terms <- attr(m, "terms")
    X <- model.matrix(Terms, m)
    dat <- list(Y = Y, X = X[, -1, drop = FALSE], Xraw = m[, 
        -1, drop = FALSE])
    X <- dat$X
    if (missing(dist)) 
        stop("Distribution \"dist\" not specified")
    if (is.character(dist)) {
        match.arg(dist, names(flexsurv.dists))
        dlist <- flexsurv.dists[[dist]]
        dlist$name <- dist
    }
    else if (is.list(dist)) {
        dlist <- check.dlist(dist)
    }
    else stop("\"dist\" should be a string for a built-in distribution, or a list for a custom distribution")
    parnames <- dlist$pars
    ncovs <- ncol(X)
    nbpars <- length(parnames)
    npars <- nbpars + ncovs
    if (!missing(inits) && (!is.numeric(inits) || (length(inits) != 
        npars))) 
        stop("inits must be a numeric vector of length ", npars)
    if (missing(inits) || any(is.na(inits))) 
        default.inits <- c(dlist$inits(Y[, "time"]), rep(0, ncovs))
    if (missing(inits)) 
        inits <- default.inits
    else if (any(is.na(inits))) 
        inits[is.na(inits)] <- default.inits[is.na(inits)]
    for (i in 1:nbpars) inits[i] <- dlist$transforms[[i]](inits[i])
    outofrange <- which(is.nan(inits) | is.infinite(inits))
    if (any(outofrange)) {
        plural <- if (length(outofrange) > 1) 
            "s"
        else ""
        stop("Initial value", plural, " for parameter", plural, 
            " ", paste(outofrange, collapse = ","), " out of range")
    }
    cnames <- if (ncovs == 0) 
        NULL
    else colnames(X)
    names(inits) <- c(parnames, cnames)
    if (!is.null(fixedpars) && !is.logical(fixedpars) && (!is.numeric(fixedpars) || 
        any(!(fixedpars %in% 1:npars)))) {
        dots <- if (npars > 2) 
            "...,"
        else ""
        stop("fixedpars must be TRUE/FALSE or a vector of indices in 1,", 
            dots, npars)
    }
    if ((is.logical(fixedpars) && fixedpars == TRUE) || (is.numeric(fixedpars) && 
        all(fixedpars == 1:npars))) {
        minusloglik <- minusloglik.flexsurv(inits, t = Y[, "time"], 
            dead = Y[, "status"], X = X, dlist = dlist, inits = inits)
        for (i in 1:nbpars) inits[i] <- dlist$inv.transforms[[i]](inits[i])
        res <- matrix(inits, ncol = 1)
        dimnames(res) <- list(names(inits), "est")
        ret <- list(call = call, dlist = dlist, res = res, npars = 0, 
            loglik = -minusloglik, AIC = 2 * minusloglik, data = dat, 
            datameans = colMeans(dat$X), N = nrow(dat$Y), events = sum(dat$Y[, 
                "status"]), trisk = sum(dat$Y[, "time"]))
    }
    else {
        optpars <- inits[setdiff(1:npars, fixedpars)]  #nice trick to optimize a subset of pars
        opt <- optim(optpars, minusloglik.flexsurv, t = Y[, "time"], 
            dead = Y[, "status"], X = X, dlist = dlist, inits = inits, 
            fixedpars = fixedpars, hessian = TRUE, ...) #fixedpars is a useful optim feature
        est <- opt$par
        if (all(!is.na(opt$hessian)) && all(!is.nan(opt$hessian)) && 
            all(is.finite(opt$hessian)) && all(eigen(opt$hessian)$values > 
            0)) {
            cov <- solve(opt$hessian)
            se <- sqrt(diag(cov)) #qin, standard errors, see wiki entry 
            if (!is.numeric(cl) || length(cl) > 1 || !(cl > 0) || 
                !(cl < 1)) 
                stop("cl must be a number in (0,1)")
            lcl <- est - qnorm(1 - (1 - cl)/2) * se
            ucl <- est + qnorm(1 - (1 - cl)/2) * se
        }
        else {
            warning("Could not calculate asymptotic standard errors - Hessian is not positive definite. Optimisation has probably not converged to the maximum likelihood")
            lcl <- ucl <- NA
        }
        res <- cbind(est = inits, lcl = NA, ucl = NA)
        res[setdiff(1:npars, fixedpars), ] <- cbind(est, lcl, 
            ucl)
        colnames(res) <- c("est", paste(c("L", "U"), round(cl * 
            100), "%", sep = ""))
        for (i in 1:nbpars) res[i, ] <- dlist$inv.transforms[[i]](res[i, 
            ])
        ret <- list(call = match.call(), dlist = dlist, res = res, 
            npars = length(est), loglik = -opt$value, AIC = 2 * 
                opt$value + 2 * length(est), cl = cl, opt = opt, 
            data = dat, datameans = colMeans(dat$X), N = nrow(dat$Y), 
            events = sum(dat$Y[, "status"]), trisk = sum(dat$Y[, 
                "time"]))
    }
    class(ret) <- "flexsurvreg"
    ret
}

flexsurv, R package

The R package 'flexsurv' provides optim()-based parametric fitting. Two-parameter Gompertz and Weibull models are included. 'flexsurv' uses the Hessian to build confidence intervals around the maximum-likelihood estimates.

I did not see an obvious way to do a nested-model test using 'flexsurv'.
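A manual likelihood-ratio test from the $loglik elements seems like a workable substitute, at least for genuinely nested pairs; a sketch, assuming lifespan data tb$rls as in the posts above (the exponential model is Weibull with shape fixed at 1):

require(flexsurv)
fit.exp = flexsurvreg(Surv(tb$rls) ~ 1, dist='exp')
fit.wei = flexsurvreg(Surv(tb$rls) ~ 1, dist='weibull')
LR = 2 * (fit.wei$loglik - fit.exp$loglik)  # likelihood-ratio statistic
pchisq(LR, df=1, lower.tail=FALSE)          # p-value for the one extra parameter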


Wednesday, October 2, 2013

Overlay of natural isolates with reliability simulations, draft

I want to see what range of $n$, $p$, $\lambda$ can explain the observed yeast natural isolates.

It seems n = 5, 6, 7 with varying p can cover the lnR-G scatter plots in Qin06EXG.