Showing posts with label fitting.

Saturday, July 12, 2014

Glucose dose effect on RLS in BY4742, code


File: _BY4742.glucose.dose.20140712.r

#test binomial aging model on glucose dose in BY4742

rm(list=ls());
setwd("~/projects/0.network.aging.prj/0a.network.fitting")
source("lifespan.20140711.r")
require(flexsurv)

pdf("BY4742GlucoseDose-binomial-20140713.pdf", width=10, height=8)
layout(matrix(c(1,2,3,4), 2, 2, byrow = TRUE))

files = list.files(path='DR', pattern="BY4742")  # overridden by the explicit list below

files = c(
  "BY4742_MATalpha_temp30_media_0.005% D_n299.csv",                
  "BY4742_MATalpha_temp30_media_0.05% D_n4690.csv",                
  "BY4742_MATalpha_temp30_media_0.1% D_n338.csv",                  
  "BY4742_MATalpha_temp30_media_0.5% D_n579.csv",                  
  "BY4742_MATalpha_temp30_media_YPD_n19930.csv"
);

output = data.frame(files);
output$mean = NA; output$stddev = NA; output$CV=NA;
output$R=NA; output$t0=NA; output$n=NA;
output$Rflex = NA; output$Gflex = NA;
output$glucose = c(0.005, 0.05, 0.1, 0.5, 2.0)

#i=1;
for( i in 1:5) {
  filename = paste('DR/', files[i], sep='')
  tb = read.csv(filename)
  summary(tb)
  output$mean[i]=mean(tb$rls)
  output$stddev[i]= sd(tb$rls)
  output$CV[i] = sd(tb$rls) / mean(tb$rls)

  fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')
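  # flexsurv's Gompertz: $res row 1 is shape (G), row 2 is rate (R); see the $res printout in the flexsurv notes below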
  Rhat = fitGomp$res[2,1]; Ghat = fitGomp$res[1,1]
  output$Rflex[i] = Rhat; output$Gflex[i]=Ghat;

  #G = (n-1) / t0, so t0 = (n-1)/G
  nhat = 3; t0= (nhat-1)/Ghat; t0
  fitBinom = optim ( c(Rhat, t0, nhat),  llh.binomialMortality.single.run , lifespan=tb$rls,
                method="L-BFGS-B", lower=c(1E-10,0.05,1E-1), upper=c(10,100,50) );

  output[i, c("R", "t0", "n")] = fitBinom$par

    #  fitGomp2 = optim ( c(0.005, 0.08),  llh.G.single.run , lifespan=tb$rls,
   #                    method="L-BFGS-B", lower=c(1E-10,1E-2), upper=c(10,10) );

  tb2 = calculate.mortality.rate(tb$rls)
  plot( tb2$s ~ tb2$t, main=filename)
  plot( tb2$mortality.rate ~ tb2$t , main=filename)
  plot( tb2$mortality.rate ~ tb2$t, log='y', main='linear for Gompertz' )
  plot( tb2$mortality.rate ~ tb2$t, log='yx', main='linear for Weibull' )
}#i file loop


plot( output$R ~ output$glucose, type='l' )
plot( output$t0 ~ output$glucose,type='l' )
plot( output$n ~ output$glucose,type='l' )
plot( output$mean ~ output$glucose, type='l')
plot( output$CV ~ output$glucose, type='l' )
plot( output$stddev ~ output$glucose, type='l')
plot( output$Rflex ~ output$glucose, type='l')
plot( output$Gflex ~ output$glucose, type='l')

dev.off()


q('no')
####END of 20140712 #########





Gompertz versus Weibull, semilog versus log-log plot

Gompertz:  m = R exp(Gt)
    so, ln(m) = ln(R) + Gt   -- a semi-log plot is linear.

Weibull:   m = A t^B
    so, ln(m) = ln(A) + B ln(t)   -- a log-log plot is linear.

Because Weibull takes an extra log transformation (of the time axis), it 'suppresses' noise in the data, and therefore naturally fits better than the Gompertz model.
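A quick self-contained check of the two diagnostics (simulated hazard curves, not data from the posts here; the R, G, A, B values are arbitrary):

t = seq(1, 40, by=1)
m.gomp = 0.01 * exp(0.08 * t)   # Gompertz: m = R exp(Gt)
m.weib = 0.005 * t^1.5          # Weibull:  m = A t^B
par(mfrow=c(1,2))
plot(m.gomp ~ t, log='y',  main='Gompertz: linear on semi-log')
plot(m.weib ~ t, log='xy', main='Weibull: linear on log-log')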



flexsurv usage and results




fitGomp = flexsurvreg( formula=Surv(tb$rls) ~ 1, dist='gompertz')

> fitGomp$res
              est        L95%       U95%
shape 0.074024323 0.065724069 0.08232458
rate  0.009551328 0.007533154 0.01211018



fitGomp$coeff['rate'] is stored on the log scale and has to be transformed back with the exp() function.
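A minimal check of that transformation (assuming a flexsurv version that stores the coefficient on the log scale, as the note above implies):

exp(fitGomp$coeff['rate'])   # should recover the natural-scale rate, ~0.009551 in $res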

Wednesday, December 4, 2013

Fitting with weights in R: gnls, weights, and form



gnls with weights (form)
http://stackoverflow.com/questions/10508474/step-halving-issue-in-gnlsnlme

dat:
Site_Code   SF
5   3
5   0
5   2
5   0
5   0
5   0

library(nlme)
g0 <- gnls(SF ~ a * Site_Code^b, data = dat,
           weights = varPower(form = ~Site_Code),
           start=list(a=30,b=-0.5))
 
 
This example indicates that averaged data should NOT be used for regression when weights
are estimated. Use the original data instead, because the replicate-to-replicate variability
in the un-averaged data is what the variance function uses to estimate the weights.
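Concretely, varPower(form = ~v) models the residual variance as a power of a covariate; a minimal sketch of the variance function it encodes (Site_Code as the covariate is just the example above):

library(nlme)
# varPower(form = ~v) encodes Var(e_i) = sigma^2 * |v_i|^(2*delta);
# delta (reported as 'power' in the summaries below) is estimated from the
# replicate-to-replicate scatter, which averaging would remove.
vf = varPower(form = ~Site_Code)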
 
 
The following example shows that varPower() can bring the fitted model closer to the 'truth'. 
 
> require(nlme)
> x1 = rnorm(20)
> y1 = x1 + rnorm(20)/10
> 
> #weird ones
> x2 = rnorm(5)+5
> y2 = rnorm(5)
> 
> y=c(y1,y2)
> x=c(x1,x2)
> mydata = data.frame(cbind(y,x))
> 
> foo = function(a,b) { y = x*a + b }
> model1 = gnls( y ~foo(a,b), start=list(a=1,b=0))
> summary(model1)
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: NULL 
       AIC      BIC    logLik
  72.21925 75.87587 -33.10962

Coefficients:
       Value  Std.Error    t-value p-value
a -0.0159375 0.08311006 -0.1917637  0.8496
b -0.3550085 0.20216324 -1.7560486  0.0924

 Correlation: 
  a     
b -0.346

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-1.8712584 -0.7939610  0.2684611  0.7910997  1.2959748 

Residual standard error: 0.9485101 
Degrees of freedom: 25 total; 23 residual
> 
> model2 = gnls( y ~foo(a,b), data=mydata, start=list(a=1,b=0), weights=varPower(form = ~x))
> summary(model2)
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: mydata 
       AIC      BIC    logLik
  20.85874 25.73424 -6.429369

Variance function:
 Structure: Power of variance covariate
 Formula: ~x 
 Parameter estimates:
   power 
1.468287 

Coefficients:
       Value  Std.Error   t-value p-value
a  0.9281320 0.06104714 15.203530  0.0000
b -0.0255823 0.00891448 -2.869751  0.0087

 Correlation: 
  a   
b 0.17

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.0273323 -0.8480791 -0.1211550  0.3073090  2.2141947 

Residual standard error: 0.4204751 
Degrees of freedom: 25 total; 23 residual
> model1
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: NULL 
  Log-likelihood: -33.10962

Coefficients:
         a          b 
-0.0159375 -0.3550085 

Degrees of freedom: 25 total; 23 residual
Residual standard error: 0.9485101 
> model2
Generalized nonlinear least squares fit
  Model: y ~ foo(a, b) 
  Data: mydata 
  Log-likelihood: -6.429369

Coefficients:
          a           b 
 0.92813202 -0.02558234 

Variance function:
 Structure: Power of variance covariate
 Formula: ~x 
 Parameter estimates:
   power 
1.468287 
Degrees of freedom: 25 total; 23 residual
Residual standard error: 0.4204751 
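Since both fits were obtained by maximum likelihood, they can also be compared head to head; anova() on gls/gnls objects reports AIC, BIC, and a likelihood ratio (a quick follow-up to the transcript above):

anova(model1, model2)   # the varPower fit should win on AIC, BIC, and logLik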
 

Wednesday, October 16, 2013

flexsurvreg() in package flexsurv

Notes on 20140818: I did not see a gradient function specified in the call to optim()?!
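For reference, optim() does accept an analytic gradient through its gr argument; a toy sketch (the quadratic below is illustrative only, not from flexsurv):

fn = function(p) sum((p - c(1,2))^2)         # objective function
gr = function(p) 2 * (p - c(1,2))            # its analytic gradient
optim(c(0,0), fn, gr=gr, method="BFGS")$par  # converges to (1, 2)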


# package(flexsurv)

> flexsurvreg
function (formula, data, dist, inits, fixedpars = NULL, cl = 0.95, 
    ...) 
{
    call <- match.call() #a useful base function
    indx <- match(c("formula", "data"), names(call), nomatch = 0)
    if (indx[1] == 0) 
        stop("A \"formula\" argument is required")
    temp <- call[c(1, indx)]
    temp[[1]] <- as.name("model.frame")
    m <- eval(temp, parent.frame())
    Y <- model.extract(m, "response")
    if (!inherits(Y, "Surv")) 
        stop("Response must be a survival object")
    Terms <- attr(m, "terms")
    X <- model.matrix(Terms, m)
    dat <- list(Y = Y, X = X[, -1, drop = FALSE], Xraw = m[, 
        -1, drop = FALSE])
    X <- dat$X
    if (missing(dist)) 
        stop("Distribution \"dist\" not specified")
    if (is.character(dist)) {
        match.arg(dist, names(flexsurv.dists))
        dlist <- flexsurv.dists[[dist]]
        dlist$name <- dist
    }
    else if (is.list(dist)) {
        dlist <- check.dlist(dist)
    }
    else stop("\"dist\" should be a string for a built-in distribution, or a list for a custom distribution")
    parnames <- dlist$pars
    ncovs <- ncol(X)
    nbpars <- length(parnames)
    npars <- nbpars + ncovs
    if (!missing(inits) && (!is.numeric(inits) || (length(inits) != 
        npars))) 
        stop("inits must be a numeric vector of length ", npars)
    if (missing(inits) || any(is.na(inits))) 
        default.inits <- c(dlist$inits(Y[, "time"]), rep(0, ncovs))
    if (missing(inits)) 
        inits <- default.inits
    else if (any(is.na(inits))) 
        inits[is.na(inits)] <- default.inits[is.na(inits)]
    for (i in 1:nbpars) inits[i] <- dlist$transforms[[i]](inits[i])
    outofrange <- which(is.nan(inits) | is.infinite(inits))
    if (any(outofrange)) {
        plural <- if (length(outofrange) > 1) 
            "s"
        else ""
        stop("Initial value", plural, " for parameter", plural, 
            " ", paste(outofrange, collapse = ","), " out of range")
    }
    cnames <- if (ncovs == 0) 
        NULL
    else colnames(X)
    names(inits) <- c(parnames, cnames)
    if (!is.null(fixedpars) && !is.logical(fixedpars) && (!is.numeric(fixedpars) || 
        any(!(fixedpars %in% 1:npars)))) {
        dots <- if (npars > 2) 
            "...,"
        else ""
        stop("fixedpars must be TRUE/FALSE or a vector of indices in 1,", 
            dots, npars)
    }
    if ((is.logical(fixedpars) && fixedpars == TRUE) || (is.numeric(fixedpars) && 
        all(fixedpars == 1:npars))) {
        minusloglik <- minusloglik.flexsurv(inits, t = Y[, "time"], 
            dead = Y[, "status"], X = X, dlist = dlist, inits = inits)
        for (i in 1:nbpars) inits[i] <- dlist$inv.transforms[[i]](inits[i])
        res <- matrix(inits, ncol = 1)
        dimnames(res) <- list(names(inits), "est")
        ret <- list(call = call, dlist = dlist, res = res, npars = 0, 
            loglik = -minusloglik, AIC = 2 * minusloglik, data = dat, 
            datameans = colMeans(dat$X), N = nrow(dat$Y), events = sum(dat$Y[, 
                "status"]), trisk = sum(dat$Y[, "time"]))
    }
    else {
        optpars <- inits[setdiff(1:npars, fixedpars)]  #nice trick to optimize a subset of pars
        opt <- optim(optpars, minusloglik.flexsurv, t = Y[, "time"], 
            dead = Y[, "status"], X = X, dlist = dlist, inits = inits, 
            fixedpars = fixedpars, hessian = TRUE, ...) #fixedpars is a useful optim feature
        est <- opt$par
        if (all(!is.na(opt$hessian)) && all(!is.nan(opt$hessian)) && 
            all(is.finite(opt$hessian)) && all(eigen(opt$hessian)$values > 
            0)) {
            cov <- solve(opt$hessian)
            se <- sqrt(diag(cov)) #qin, standard errors, see wiki entry 
            if (!is.numeric(cl) || length(cl) > 1 || !(cl > 0) || 
                !(cl < 1)) 
                stop("cl must be a number in (0,1)")
            lcl <- est - qnorm(1 - (1 - cl)/2) * se
            ucl <- est + qnorm(1 - (1 - cl)/2) * se
        }
        else {
            warning("Could not calculate asymptotic standard errors - Hessian is not positive definite. Optimisation has probably not converged to the maximum likelihood")
            lcl <- ucl <- NA
        }
        res <- cbind(est = inits, lcl = NA, ucl = NA)
        res[setdiff(1:npars, fixedpars), ] <- cbind(est, lcl, 
            ucl)
        colnames(res) <- c("est", paste(c("L", "U"), round(cl * 
            100), "%", sep = ""))
        for (i in 1:nbpars) res[i, ] <- dlist$inv.transforms[[i]](res[i, 
            ])
        ret <- list(call = match.call(), dlist = dlist, res = res, 
            npars = length(est), loglik = -opt$value, AIC = 2 * 
                opt$value + 2 * length(est), cl = cl, opt = opt, 
            data = dat, datameans = colMeans(dat$X), N = nrow(dat$Y), 
            events = sum(dat$Y[, "status"]), trisk = sum(dat$Y[, 
                "time"]))
    }
    class(ret) <- "flexsurvreg"
    ret
}

flexsurv, R package

The R package 'flexsurv' provides optim()-based parametric fitting. Two-parameter Gompertz and Weibull models are included. 'flexsurv' uses the Hessian to build confidence intervals around the maximum-likelihood estimates.

I did not see an obvious way to do a nested-model test using 'flexsurv'.
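A manual likelihood-ratio test from the $loglik elements seems like a workable substitute, at least for genuinely nested pairs; a sketch, assuming lifespan data tb$rls as in the posts above (the exponential model is Weibull with shape fixed at 1):

require(flexsurv)
fit.exp = flexsurvreg(Surv(tb$rls) ~ 1, dist='exp')
fit.wei = flexsurvreg(Surv(tb$rls) ~ 1, dist='weibull')
LR = 2 * (fit.wei$loglik - fit.exp$loglik)  # likelihood-ratio statistic
pchisq(LR, df=1, lower.tail=FALSE)          # p-value for the one extra parameter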


Wednesday, October 2, 2013

Overlay of natural isolates with reliability simulations, draft

I want to see what range of $n$, $p$, $\lambda$ can explain the observed yeast natural isolates.

It seems n = 5, 6, 7 with varying p can cover the lnR-G scatter plots in Qin06EXG.