Open Notebook: regression

Showing posts with label regression. Show all posts

Monday, July 12, 2021

linear models in Python, Faraway

https://github.com/julianfaraway/LMP

Wednesday, December 30, 2015

linear mixed effect model, lme4

Reference:
https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf

Tuesday, December 30, 2014

least square method, matrix view

Khan academy
https://youtu.be/MC7l96tW8V8

Friday, November 15, 2013

Multicollinearity, correlated predictors in regression

Multicollinearity does not influence the prediction of response (y), but it affect the interpretation of individual predictors.

http://en.wikipedia.org/wiki/Multicollinearity

https://onlinecourses.science.psu.edu/stat501/node/82

Correlated regression, Langley example, Faraway

Correlated errors g <- gls( Employed ~ GNP + Population, correlation=corAR1(form=~Year), data=longley)

Thursday, November 14, 2013

nlme lme usage

From "introduction to R code for linear, nonlinear and linear mixed effect models", compiled by Courtney Meier

Regression with correlated variance, from R blogger

Regression with correlated variance/error
http://www.r-bloggers.com/linear-regression-with-correlated-data/
http://www.quantumforest.com/wp-content/uploads/2011/10/unemployment.txt

# Loading packages
library(lattice)       # Fancy graphics
library(nlme)          # Generalized linear mixed models 

# Reading data
setwd('~/Dropbox/quantumforest')  # Sets default working directory

un = read.csv('nzunemployment.csv', header = TRUE)

# Plotting first youth data and then adding adults
# as time series
with(un,
  {
  plot(youth ~ q, type = 'l', ylim = c(0,30), col='red',
       xlab = 'Quarter', ylab = 'Percentage unemployment')
  lines(q, adult, lty=2, col='blue')
  legend('topleft', c('Youth', 'Adult'), lty=c(1, 2), col=c('red', 'blue'))
  abline(v = 90)
  })


# Creating minimum wage policy factor
un$minwage = factor(ifelse(un$q < 90, 'Different', 'Equal'))

# And a scatterplot
xyplot(youth ~ adult, group=minwage, data = un, 
       type=c('p', 'r'), auto.key = TRUE)


# Linear regression accounting for change of policy
mod1 = lm(youth ~ adult*minwage, data = un)
summary(mod1)

# Centering continuous predictor
un$cadult = with(un, adult - mean(adult))
mod2 = lm(youth ~ cadult*minwage, data = un)
summary(mod2)

plot(mod2)    # Plots residuals for the model fit
acf(mod2$res) # Plots autocorrelation of the residuals

# Now we move to use nlme

# gls() is an nlme function when there are no random effects
mod3 = gls(youth ~ cadult*minwage, data = un)
summary(mod3)

# Adding autocorrelation
mod4 = gls(youth ~ cadult*minwage, correlation = corAR1(form=~1), data = un)
summary(mod4)

Tuesday, June 18, 2013

Interaction terms in lm() in R

lm0 <- lm(y ~ r*s, data=d)
lm1 <- lm(y ~ r + s + r:s, data=d)

Reference:
http://stats.stackexchange.com/questions/19271/different-ways-to-write-interaction-terms-in-lm

Tuesday, June 11, 2013

Sample R code for bootstrapping residues for regression

y = rnorm(100) + rep(c(0, 1), each = 50)

x = rep(c(1,2), each = 50)

my = mean(y)

res = lm(y~x)$resid

# Log LR test

lr.obs = sum((y-my)^2) - sum(res^2) # Can replace with "F" like statistic

lr.star = c()

for(boot in 1:1000){

y.star = my + sample(res, replace = T)

res.star = lm(y.star~x)$resid

lr.star[boot] = sum((y.star-my)^2) - sum(res.star^2)

}

pval = mean(lr.star>= lr.obs)

This sample code was provided by Dr. Dipen Sangurdekar. Boostrapping residue of the empirical data is considered to be more reasonable estimation of p-values.

Friday, May 31, 2013

Loop over all columns for exploratory regression analysis in R

### all possible regression analysis for a given column
pTb = 1: length(tb[1,])
names(pTb) = names(tb)
for( j in c(2:15,17:34) ) {
m = lm( tb[, j] ~ tb$Cv.vs.Cb)
sm = summary(m)
pTb[j] = 1 - pf(sm$fsta[1], sm$fsta[2], sm$fsta[3])
}
pTb[pTb<0.05]

pTb[pTb<0.05]
ARLS Lmax L0.all CLS.vs.Tc Cv.vs.Cb Cb.vs.Cv
3.546097e-02 3.877701e-02 3.137392e-02 4.748623e-02 0.000000e+00 7.306331e-05
Cb.vs.Cv.mean
5.729385e-04