Sunday, June 30, 2013

Using yeast network to calculate likelihood of failure?

Can we use the yeast network, say PIN, to calculate likelihood of its failure? If we can compute this, then component decay rate and activation chance can be estimated by MLE.

This is similar to network reliability analysis, which is often computing intensive for path-related calculations. 

July 22, 2013. This also means that network parameter changes will lead to new calculation of the population distribution. I learned that a similar idea from ODE to likelihood calculation was discussed in a NIMBioS workshop on infectious disease. 

Age structure, mutation rate, and population genetic varaince

If mutation rate increases with age, age structure will leads to increasing genetic variance. So genetic variance can be a game strategy for changing environment.

A recent paper from Annette Baudisch's group argues that mutation accumulation plays a minor role in evolutionary genetics of aging.

Brian Charlesworth studied this topic in this book.

Friday, June 28, 2013

yeast mitochondrial protein network

Mitochondria has its own heat shock proteins. Some mito proteins are coded in its own genome, but many are coded in the nuclear genome.

Thursday, June 27, 2013

State space and network aging

The state space in NRMCA can be described by the adjacency matrix A. For the i-th essential gene gi, its active interactions are described by 1s in the i-th row and column, and dysfunctional and non-existing interactions are described by 0s. 

Tuesday, June 25, 2013

*** Two-parameter Gompertz model

Median lifespan of the 2-parameter


The Gompertz model in 'flexsurv' is

Notice that pdf = mortality rate * (1-s)  
pdf = mortality rate * s (based on mortality definition)
 m = - 1/s ds/dt  , so - ds/dt = pdf = m * s

So, flexsurv-shape a ==G, flexsurv rate b = R. 

Inverse of Gompertz CDF:

Component lifespan versus network lifespan, implication on life-history evolution.

For thousands of independent components, network survivor function is the product of all the component survivor function. Which means that system lifespan is the minimal lifespan of the component lifespan. This indicates that a significant amount of cost need to be invested to keep system functional. In comparison, if components can be replaced (renewals and repairs), the cost is probably much economic from the cell's perspective.

One solution is the redundancy building into the networks.

Monday, June 24, 2013

Synthetic lethal interaction and survival functions

I am really excited by this formula, which means that survivorships of single deletions can be used to infer gene interaction of the two genes involved. 

June 25, 2013 Correction: 
I made a mistake here. The final simplication of $S_dX / S_dY$ was mistakenly used of mortality rates. Instead, I should use the survival functions directly. 
$t_0$ should be the denominator of $t$ inside of the parenthesis. 

Reliability framework for gene interaction and limiting effects

The general reliability framework to study gene interaction is incredibly straightforward. It is a general mixture model of parallel and serial configuration. 
  S' = S'_parallel * phi + S'_serial * (1-phi)
 S = S_parallel * phi + S_serial * (1-phi)
where phi is the mixing weight.  

Thia framework can be used to test whether a gene can have broad network influence or a local influence. For example, TOR1 has been shown to activate ribosomal protein S6 kinase (S6K), and have a broad impact on protein networks. So, modeling the net' and X for tor1D, wildtype and compare them with AIC can be useful.

The mixing parameter $\phi$ is informative a gene's function.


Saturday, June 22, 2013

From gene interactions to QTL

Gene interaction network can be generalized as genetic interaction. Network reliability can be extrapolate to model fitness, and may offer an opportunity to study how product form of gene interaction can be translated to additive form of gene interaction in QTL.

Friday, June 21, 2013

Survivorship and QTL for a limiting aging locus in serial configuration with the rest of the network

 On 2013 June 25, I realized a mistake in E(t)_net, and it is not a simple sum of the two serial components.

Note that S'(t) = S(t) * mu(t)
Comparison of the two form can explain the so-called missing heritability.

How about the variance? 

Limiting gene interaction and network aging

To evaluate a mutant effect, the null hypothesis is that the effect is the same with the control. In this case, I can partition the wildtype survival function into the bulk and limiting module with a mixing parameter.  Both wildtype and mutant lifespan data can be fitting with the null model. 

Alternatively, mutant survival functions can be fitting with different parameters. 

Likelihood ratio test, AIC or BIC can be used. Alternatively, permutation of experimental data can be used to generate the null-distribution. 

Adjacency matrix and network reliability model of aging, overview

Thursday, June 20, 2013

Missing heritability (in progress)

What is the definition? Does it only apply to additive variance? It should, because of the linear model used.

Wikipedia gives a good explanation for broad and narrow sense heritability. The narrow sense heritability is
   h^2 = Var(A) / Var(P)
where Var(A) refers to the additive genetic variance (?).

In human, common SNPs are defined as >5% for major allele frequency (MAF)?

Eric Lander explain phantom heritability at 1hour timepoint in this video (May 20, 2011). Lander basically thinks linear additive model is inadequate to describe the biological mechanism. Lander and Francis Colin had an interesting exchange on independent loci, additive effect. (About 1:12 in this video).

Nature reviews genetics have some interesting 'Viewpoints' on the missing heritability problem. Gibson seems to think it due to mistreatment of environmental factors.

A blog "Genetic Inference" defended the additivity model, and cited some evolutionary arguments on additivity.

A Discovery blog explains, "the missing heritability can be solved by reconceptualizing the “genetic architecture” of the trait. This means that currently a major assumption of many models for putatively polygenic traits is that the variation is due to many genes of small effect which modify the trait value in an additive and independent manner. In other words, the genetic architecture in this sense is a linear system. A clear alternative, or complementary, possibility is that there are genetic interactions which are generating deviations from linearity. This would be epistasis, which has different implications depending on the sort of biology you’re talking about (e.g., molecular vs. evolutionary)." Some comments on this blog are worth reading.

Hemani et al 2012, argued that non-additive genetic variance contributed to the missing heritability problem.

Zuk et al 2012, PNAS

Related artilce on mathematic modeling of epistatiss
Gjuvsland 2007 Genetics, statistical epistatsis is a generic feature of gene regulatory networks.

Using adjacency matrix to evaluate network reliability

Ax = b.

Vector $x$ is my evaluation vector, which decides which nodes will cause network failure.

Bloom et al, 2013 Nature, missing heritability in a yeast cross, BYxRM (in progress)

Finding the sources of missing heritability in a yeast cross
Joshua S. Bloom1,2, Ian M. Ehrenreich1,3, Wesley T. Loo1,2, Thu´y-Lan Vo˜ Lite1,2 & Leonid Kruglyak1,4,5
Nature, 2013

Bloom13 measured 1008 haploid segregants from a BYxRM cross. All strains are genotyped by deep sequencing. 30.6K SNPs were identified, which is about 0.5%. 

Bloom13 stated that difference between broad sense heritability H^2 and narrow sense heritability h^2 is due to gene-gene interaction. Bloom13 showed that most additive heritability are explained by detected QTL. (This is somewhat expected, isn't it? This is what linear regression is designed for.). 

Bloom13 provided their raw data and code. This is a great effort. 

There are several reviews and comments on this publication. One comment is on the 50% allele frequency in this controlled study. Both Brookfield13 and Bloom13 seem to define missing heritability 

Brookfield, 2013, Current biology, Quantitative Genetics: Heritability Is Not Always Missing

Ben Bolker, estimation of parameters for stochastic dynamic models

Ben Bolker, estimation of parameters for stochastic dynamic models

Stochastic simulation: discrete time and continuous time.
Simple approaches:  trajectory matching, gradient matching, comparison
Fancier method: SIMEX, Kalman filter
State space models: Markov Chain Monte Carlo.

Typical statistical model:     Typical math model:
 stochastic                                 deterministic
 static                                         dynamic
 phenomenological                    mechanistic

Standard time-series models (ARIMA, spectral/wavelet analyses) are (mostly) phenomenological. Most are assessing patterns.

For stochastic models, need to define both a process model and an observation model (measurement model).
Process model Y(t+1) = F(Y(t))
Measurement model Y_obs(t) ~ Y(t)
Might decompose process model into a deterministic model for the expectation and (additive) noises around the expectation: e.g, Y(t) = mean + noise, Y(t) ~ Poisson(exp(eta)).

My understanding is that Walker defines  process errors that induce dynamic changes in variance (cause chaotic behavior in the future?!). Observation error is the ascertainment uncertainty.
Walker joked that 'process error' are given by God. Walker considered model bias as systematic errors.

Stochastic ODEs
ODE + Wiener process (derivative of a Brownian motion)
Delicate analysis (for biologists, Turelli, 1977, Roughgarden 1995, For mathamaticians, Oksendal 2003.
Specialized integration methods
Suited for cellular, physiological, but not population model?
Gillespie algorithm, exponentially distributed time between transitions.

SIMulation-EXtrapolation method.

Kalman filter is widely used in engineering. Often works for linear (typically normal) models, but can be extended to nonlinear models. Multivaraite extensions are natural (Schnute, 1994)
Kalman filter is kind of a state-status method.


Tuesday, June 18, 2013

Update (push) H2O2_LOH2012 repository to Github

$ pwd


$ git add *

$ git commit -m "updated analysis and plots"
[master 276db2b] updated analysis and plots

 160 files changed, 3726 insertions(+), 516 deletions(-)

$ git push -u origin master
Counting objects: 161, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (147/147), done.
Writing objects: 100% (148/148), 535.82 KiB, done.
Total 148 (delta 110), reused 0 (delta 0)
   47f5205..276db2b  master -> master
Branch master set up to track remote branch master from origin.

Checked my Github website and saw the changes. 

Positive LOH proxy for robustness

I found that Cv/Cb, Tc/Tg are positively correlated with G, and suggesting that they are positive proxies for robustness.  Because G~ARLS is negative, this also means that positive robustness proxies will correlative negatively with ARLS. In other words, trade-off will occur.

See _2013June18,
> summary( lm( 1/tb$Tg.vs.Tc ~ tb$G*log(tb$R0)) ) #positive correlation
lm(formula = 1/tb$Tg.vs.Tc ~ tb$G * log(tb$R0))

     Min       1Q   Median       3Q      Max 
-0.16425 -0.04734 -0.01812  0.04014  0.22382 

                Estimate Std. Error t value Pr(>|t|)
(Intercept)       1.8971     1.8101   1.048    0.329
tb$G              2.4978    12.6409   0.198    0.849
log(tb$R0)        0.2725     0.3067   0.888    0.404
tb$G:log(tb$R0)  -0.3646     2.0899  -0.174    0.866

Residual standard error: 0.1174 on 7 degrees of freedom
Multiple R-squared: 0.6008, Adjusted R-squared: 0.4297 
F-statistic: 3.512 on 3 and 7 DF,  p-value: 0.0776 

See _2013May31-H2O2LOH.R
>  ### Cv/Cb or Cb/Cv ~ robustness? I need a positive proxy
>  summary(lm( tb$Cv.vs.Cb ~ tb$G ) )  #positive, p=0.37,

lm(formula = tb$Cv.vs.Cb ~ tb$G)

    Min      1Q  Median      3Q     Max 
-1.2357 -0.7075 -0.3445  0.5111  1.9584 

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.5274     1.4296   0.369    0.720
tb$G         10.6344    11.3144   0.940    0.369

Residual standard error: 1.13 on 10 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.08117, Adjusted R-squared: -0.01071 
F-statistic: 0.8834 on 1 and 10 DF,  p-value: 0.3694 

Kotz, Nadarajah, 2000, Extreme value distributions, theory and applications

The extreme value distribution is (Gumbel-type distribution): 
  Pr[X<=x] = exp[-exp[-(x-mu)/sigma] ]
This is often called the double-exponential distribution. 

The generalized extreme value distribution is:

There should some forms of conversion between the extreme value distribution and Gompertz distribution. 

Kotz, Nadarajah, 2000, Extreme value distributions, theory and applications

Interaction terms in lm() in R

lm0 <- lm(y ~ r*s, data=d)
lm1 <- lm(y ~ r + s + r:s, data=d)

Molecular number and noises using coefficient of variation -> diploid cells are more robust than haploid cells

Diploid cells are more robust than haploid cells.

Diploid cells are generally considered more robust than haploid cells. If we use binomial distribution to model the stochastic variation in the n number of molecules as a source gene expression noises, the coefficient of variation (CV) can be calculated from standard deviation divided by mean: 
                sqrt(np(1-p)) / np  = 1/ sqrt(np/(1-p) ~ 1/sqrt(n)
Hence, doubling the numbers of molecules in diploid cells will reduce CV by sqrt(2). Because small CV indicate less noisy and more robustness, diploid cells are sqrt(2) more robust than haploid cells. Similar argument was put by Schroedinger in 1944.

For binomial distribution, mean = np. Variance = np(1-p). Standard deviation = sqrt(np(1-p)). So, 
  CV = stddev / mean 

CV is basically the inverse of the signal-to-noise ration. 


Monday, June 17, 2013

gsub and strplit on vector of names

#I want to remove '_alpha' in this list
myInput = c( "afg3_alpha", "alg12_alpha", "dbp3_alpha",  "elp4_alpha",  "fob1_alpha",  "gpr1_alpha")

#my current solution

tmpsplit = strsplit(myInput, '_')
tmp = NA
for (item in tmpsplit){
  tmp = c(tmp, item[1])
results1  = tmp[-1];   
#this is what I need, but I want a better solution. 

#Here is what I come up with, but it is pretty awkward

tmp3 = lapply(myInput, FUN=function(x){strsplit(x, '_')[[1]][1]})
results2 = unlist(tmp3)

#gsub is the best solution
gsub("_.*", "", myInput)

R plot in bold font

plot( log(tb$a) ~ tb$b, pch=19, col='blue', xlim=c(0.05, 0.22),ylim=c(-8,-3.5),xlab="G",ylab="lnR") 
text( tb$b*1.02, log(tb$a*1.1), as.character(tb$strain ) )

Github, Qin06EXG data, using a Snow Leopard laptop

First I created a repository on Gibhut

Then I cloned the repository to laptop: /Users/hongqin/github/Qin06EXGData-master

$mv ../qinexg06data.local/* .
$ ll
total 8
drwxr-xr-x  20 hongqin  staff   680B Dec 16  2011 101S
drwxr-xr-x  20 hongqin  staff   680B Dec 16  2011 BY4743
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 M1-2
drwxr-xr-x  19 hongqin  staff   646B Dec 16  2011 M13
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 M14
drwxr-xr-x  19 hongqin  staff   646B Dec 16  2011 M2-8
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 M22
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 M32
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 M34
drwxr-xr-x  27 hongqin  staff   918B Dec 16  2011 M5
drwxr-xr-x  20 hongqin  staff   680B Dec 16  2011 M8
drwxr-xr-x  16 hongqin  staff   544B Dec 16  2011 PSY316
-rw-r--r--@  1 hongqin  staff    78B Jun 17 12:09
drwxr-xr-x  22 hongqin  staff   748B Dec 16  2011 RM112N
drwxr-xr-x  22 hongqin  staff   748B Dec 16  2011 SGU57
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 SK1
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 W303
drwxr-xr-x  18 hongqin  staff   612B Dec 16  2011 YPS128
drwxr-xr-x  23 hongqin  staff   782B Dec 16  2011 YPS163

$git init

$ git remote add origin

$ git add *
$ git commit -m "qin06exg"
# A lot of things showed up on the screen, including subdirectory files. 
$$ git commit -m "qin06exg"
[master (root-commit) f11f5a4] qin06exg
 345 files changed, 84504 insertions(+)
 create mode 100755 101S/.RData
 create mode 100755 101S/.Rhistory
 create mode 100755 101S/
 create mode 100755 101S/
 create mode 100755 101S/101S.boot.dat

 create mode 100755 101S/101S.merged
 ... .... ...

$ git push --force origin master 
Counting objects: 251, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (247/247), done.
Writing objects: 100% (251/251), 638.97 KiB, done.
Total 251 (delta 15), reused 0 (delta 0)

 + d37a505...f11f5a4 master -> master (forced update)

Thursday, June 13, 2013

Survivor function for binomial mortality function

June 20, 2013, This is actually wrong. I cannot believe that I made a mistake in the basic calculus.  

Notes, intro to modeling, Nimbios REU, day 2

S<-->I is introduced using matrix.

S(t+1) = 0.9 S + 0.2 I
I(t+1) = 0.1S + 0.8 I

[S(t+1), I(t+1)]^T = [(0.9, 0.1)^T, (0.2, 0.8)^T] [S, I]

X = A X for equilibrium, and eigen value will be found.

Leslie matrix model is introduced using locust with  3 stages: egg, nymph, adult. This simple case actually oscillates.

Differential equations

dx/dt = r x ( 1 - x/K)    logistic growth

dx/dt = r x (K-x) ( x-b)     Allee effect

Two interacting populations (predator prey model)
 dx/dt = a x - bxy
 dy/dt = cxy - dy

More on disease models:
  S-I-R model
I can be infected or immune
R can be recovered (immue) or removed.

Discussion focused on SIR without demographics (no birth or death).

R0 basic reproduction ratio is introduced.

SIR with waning immunity
  dS/dT = - beta SI + gamma R
  dI/dt = beta SI- v I
  dR/dt = vI - gamma R

SEIR (E for exposed, latent, not able to transmit disease)

Student group exercise is to model 3 population:
 pop1: logistic growth
 pop2: gowth with Allee effect
 pop3: exponential growht
 pop1 and pop2 compete with each other. Pop 3 cooperate with the other two populations.

Wednesday, June 12, 2013

Modeling positive and negative gene interactions in network reliability model

In Costanzo10's paper, there are positive and negative genetic interactions that indicate antagonizing and cooperative functions of the two genes involved, respectively. The extreme case negative gene interaction is synthetic lethality.  Synthetic lethality can be modeled by parallel configuration. Positive/Antagonizing interactions may be modeled by state vectors of the nodes involved. In fact, state vector is a general way for network modeling.

Bibtex, reliability and network (in progress)

Title: Network Reliability
Authors: Ball, Michael O.
Colbourn, Charles J.
Provan, J.S.
Department/Program: ISR
Type: Technical Report
Keywords: algorithms, combinatorics, computational complexity, graph theory, reliability, network reliability, networks, Systems Integration
Issue Date: 1992
Series/Report no.: ISR; TR 1992-74
Abstract: This paper provides a detailed review of the state of the art in the field of network reliability analysis. The primary model treated is a stochastic network in which arcs fail randomly and independently with known failure probabilities. The inputs to the basic network reliability analysis problem consist of the network and a failure probability for each are in the network. The output is some measure of the reliability of the network. The reliability measures treated most extensively in this paper are: the two terminal measure, the probability that there exists a path between two specified nodes; the all-terminal measure the probability that the network is connected and the k-terminal measure, the probability that a specified node subset, K, is connected. In all cases the results concerning each problem's computational complexity, exact algorithms, analytic bounds and Monte Carlo methods are covered. The paper also treats more complex reliability measures including performability measures and stochastic shortest path, max flow and PERT problems. A discussion is provided on applications and using the techniques covered in practice.
Appears in Collections:Institute for Systems Research Technical Reports

author={Ball, M.O.},
journal={Reliability, IEEE Transactions on},
title={Computational Complexity of Network Reliability Analysis: An Overview},
abstract={This paper presents an overview of results related to the computational complexity of network reliability analysis problems. Network reliability analysis problems deal with the determination of reliability measures for stochastic networks. We show how these problems are related to the more familiar computational network problems of recognizing certain subnetworks, finding optimal subnetworks, and counting certain subnetworks. We use these relationships to show that the k-terminal, the 2-terminal, and the all-terminal network reliability analysis problems are at least as hard as the renowned set of computationally difficult problems, NP-Complete. Finally, we discuss the impact of these results on how one should approach problem solving in this area.},
keywords={Algorithm design and analysis;Computational complexity;Computer network reliability;Computer networks;Educational institutions;Graph theory;Problem-solving;Reliability theory;Seismic measurements;Stochastic processes},

author={Ray, G. A. and Dunsmore, J. J.},
booktitle={INFOCOM '88. Networks: Evolution or Revolution, Proceedings. Seventh Annual Joint Conference of the IEEE Computer and Communcations Societies, IEEE},
title={Reliability of network topologies},
abstract={The authors present some analytical results which can be used to compute network reliability. Some simple techniques to make asymptotic approximations based on parameters of the network topology are derived. Explicit reliability formulae are computed for four network topologies, including the star and counterrotating ring. Actual manufacturer's data and failure models for laser diodes and LEDs are used to compare the effects of transmitter reliability on the whole network.<>},
keywords={computer networks;graph theory;network topology;reliability;LAN;LEDs;asymptotic approximations;counterrotating ring;failure models;laser diodes;network reliability;network topologies;network topology;reliability formulae;star topology;transmitter reliability;Communication networks;Computer aided manufacturing;Computer network reliability;Computer networks;Light emitting diodes;Network topology;Optical transmitters;Reliability theory;Telecommunication network reliability;Virtual manufacturing},

author={Abraham, J.A.},
journal={Reliability, IEEE Transactions on},
title={An Improved Algorithm for Network Reliability},
abstract={Boolean algebra has been used to find the probability of communication between a pair of nodes in a network by starting with a Boolean product corresponding to simple paths between the pair of nodes and making them disjoint (mutually exclusive). A theorem is given, the use of which enables the disjoint products to be found much faster than by existing methods. An algorithm and results of its implementation on a computer are given. Comparisons with existing methods show the usefulness of the algorithm for large networks.},
keywords={Algorithm design and analysis;Boolean algebra;Computer network reliability;Computer networks;Ducts;Fault trees;Telecommunication network reliability;Boolean expressions;Disjoint products;Reliability algorithm;Terminalpair reliability},

Reliability network equivalent approach to distribution systems reliability evaluation, R. Billongton, P. Wong

Reliability network equivalent approach to distribution systems reliability evaluation,
R. Billongton, P. Wong
IEE, Proc. Gener. Transm. Distrib. Vol 145, No2, March 1998

This paper grouped subnetwork into large blocks, which is similar to the way that I am modeling the interacting loci for QTL study.

Reliability model with repairs

Akinpelo's slides

Interacting loci and QTL in network model of aging

Complex system configuration can be addressed by state vectors.

See PPT slides of J. Akipelu.

Synthetical lethal pairs is equivalent to parallel components

The network survivor function S(t) is based on network structure function (see Leemis09, example 3.9 figure 3.9 on page72). Mortality rate (hazard rate) can be found by definition:  h(t) = -S' / dS .

$S_a$ and $S_b$ follow the Weibull model in earlier aging based on GG01.

System mortality rate for serial components

This explains that product of survivor function leads to summation of mortality rate (hazard function).

Basic rules in differential calculus

Notes, discrete time modeling, NimbioS REU course, day 1

Dr. Suzanne Lenhart focused on discrete time model. 

She started with the population growth of Florida sandhill cranes. She stared with  a exponential growth in discrete steps. 

She emphasized that order of events is crucial in discrete models. 

She modifed the model to x(n+1) = 0.97 (x(n) + 5)
She explained stable models and equilibrium. 

She added logistical growth:
  x(n+1) = x(n) + R x(n) (1-x(n)/K)
where R is intrinsic growth rate.  Here a student explained the carrying capacity K. 
She then found the equilibrium population X_bar. 
She then asked students what are the unrealistic assumptions for logistical growth and discuss way to improve them. She suggested R and K can be time dependent. 

She gave another growth model (Allee model?)

x(n+1) = X(n) + R x(n) (1-x(n)/K)(x(n)-a),   0<a<K.
where $a$ is the critical mass. 

Lenhart gave a example of 'augmentation'. 
x_k+1 = x_k + r x_k (1- x_k / K1)    threatened population
y_k+1 = y_k + s y_k (1 - y_k /K2)    captive breed population

She asked students to work in groups to modify the model to "move a percentage of pop $y$ to $x$ 'instantly' at each time step. 
After about 5 minutes of discussion, Lenhart worked out solutions on the board. She used 'diagrams' to illustrate two ways to approach the problem: growth + then augment, augment + then growth. 

After 50 minutes, Lenhart started another example, vehicular stopping distance. 
This case asks students to estimate the distance in feet that car can travel in 2 second with speed at miles per hour.  One mile = 5280 feet and one hour = 3600 seconds. 
The final form for stop_distance consist of a reactive_distance (c1 * v) and a brake distance (c2 * v^2).  The brake distance is reasoned out by using force, moment, and physic principles.

Lenhart concluded the lecture with introducing gender to a population model. 

Tuesday, June 11, 2013

Brian Caffo's lecture on law of large numbers

Geometric interpretation of additive variance

My intuitive understanding of additive variance:
   Total variance = Var(X) + Var(Y)
  where Var(X) = sum of (x_i - mean)^2 .

So, Var(X) and Var(Y) are made of the square areas around their mean.  Total variance is the total area. Additive indicate X and Y are independent.

Co-variance will be the rectangles from this view.
Var(x+y) = Var(x) + Var(y) + 2Cor(X,Y)

I need to add a diagram to illustate this.

Sample R code for bootstrapping residues for regression

y = rnorm(100) + rep(c(0, 1), each = 50)
x = rep(c(1,2), each = 50)

my = mean(y)
res = lm(y~x)$resid

# Log LR test
lr.obs = sum((y-my)^2) - sum(res^2) # Can replace with "F" like statistic = c()
for(boot in 1:1000){ = my + sample(res, replace = T) = lm($resid[boot] = sum((^2) - sum(^2)

pval = mean(>= lr.obs)

This sample code was provided by Dr. Dipen Sangurdekar.  Boostrapping residue of the empirical data is considered to be more reasonable estimation of p-values.