Showing posts with label regression. Show all posts
Showing posts with label regression. Show all posts

Friday, November 15, 2013

Multicollinearity, correlated predictors in regression


Multicollinearity does not influence the prediction of response (y), but it affect the interpretation of individual predictors.


http://en.wikipedia.org/wiki/Multicollinearity

https://onlinecourses.science.psu.edu/stat501/node/82

Correlated regression, Langley example, Faraway

Correlated errors g <- gls( Employed ~ GNP + Population, correlation=corAR1(form=~Year), data=longley)

Thursday, November 14, 2013

nlme lme usage




Regression with correlated variance, from R blogger


Regression with correlated variance/error
http://www.r-bloggers.com/linear-regression-with-correlated-data/
http://www.quantumforest.com/wp-content/uploads/2011/10/unemployment.txt

# Loading packages
library(lattice)       # Fancy graphics
library(nlme)          # Generalized linear mixed models 

# Reading data
setwd('~/Dropbox/quantumforest')  # Sets default working directory

un = read.csv('nzunemployment.csv', header = TRUE)

# Plotting first youth data and then adding adults
# as time series
with(un,
  {
  plot(youth ~ q, type = 'l', ylim = c(0,30), col='red',
       xlab = 'Quarter', ylab = 'Percentage unemployment')
  lines(q, adult, lty=2, col='blue')
  legend('topleft', c('Youth', 'Adult'), lty=c(1, 2), col=c('red', 'blue'))
  abline(v = 90)
  })


# Creating minimum wage policy factor
un$minwage = factor(ifelse(un$q < 90, 'Different', 'Equal'))

# And a scatterplot
xyplot(youth ~ adult, group=minwage, data = un, 
       type=c('p', 'r'), auto.key = TRUE)


# Linear regression accounting for change of policy
mod1 = lm(youth ~ adult*minwage, data = un)
summary(mod1)

# Centering continuous predictor
un$cadult = with(un, adult - mean(adult))
mod2 = lm(youth ~ cadult*minwage, data = un)
summary(mod2)

plot(mod2)    # Plots residuals for the model fit
acf(mod2$res) # Plots autocorrelation of the residuals

# Now we move to use nlme

# gls() is an nlme function when there are no random effects
mod3 = gls(youth ~ cadult*minwage, data = un)
summary(mod3)

# Adding autocorrelation
mod4 = gls(youth ~ cadult*minwage, correlation = corAR1(form=~1), data = un)
summary(mod4)

Tuesday, June 18, 2013

Interaction terms in lm() in R

lm0 <- lm(y ~ r*s, data=d)
lm1 <- lm(y ~ r + s + r:s, data=d)
Reference:
http://stats.stackexchange.com/questions/19271/different-ways-to-write-interaction-terms-in-lm

Tuesday, June 11, 2013

Sample R code for bootstrapping residues for regression

y = rnorm(100) + rep(c(0, 1), each = 50)
x = rep(c(1,2), each = 50)

my = mean(y)
res = lm(y~x)$resid

# Log LR test
lr.obs = sum((y-my)^2) - sum(res^2) # Can replace with "F" like statistic

lr.star = c()
for(boot in 1:1000){
y.star = my + sample(res, replace = T)
res.star = lm(y.star~x)$resid
lr.star[boot] = sum((y.star-my)^2) - sum(res.star^2)
}

pval = mean(lr.star>= lr.obs)



This sample code was provided by Dr. Dipen Sangurdekar.  Boostrapping residue of the empirical data is considered to be more reasonable estimation of p-values.

Friday, May 31, 2013

Loop over all columns for exploratory regression analysis in R



### all possible regression analysis for a given column
pTb = 1: length(tb[1,])
names(pTb) = names(tb)
 for( j in c(2:15,17:34) ) {
   m = lm( tb[, j] ~ tb$Cv.vs.Cb)
   sm = summary(m)
   pTb[j] = 1 - pf(sm$fsta[1], sm$fsta[2], sm$fsta[3])
 }
 pTb[pTb<0.05]


pTb[pTb<0.05]
         ARLS          Lmax        L0.all     CLS.vs.Tc      Cv.vs.Cb      Cb.vs.Cv 
 3.546097e-02  3.877701e-02  3.137392e-02  4.748623e-02  0.000000e+00  7.306331e-05 
Cb.vs.Cv.mean 
 5.729385e-04