Monday, December 22, 2014

study notes on PCA, principal components, with R testing code, princomp()

ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.

37:50 in Ng's video
https://www.youtube.com/watch?v=ey2PE5xi9-A
Ng shows how to use PCA (linear combinations of the raw data) to reduce the dimension of the data.
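A minimal sketch of that dimension reduction in R, on simulated data of my own (not from the video). The seed, the variable names and the choice k = 2 are assumptions for illustration only. It keeps the first two components and checks how much is lost when mapping back to the original three variables.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- x1 + x2 + rnorm(100) / 20               # nearly a linear combination of x1 and x2
X  <- cbind(x1, x2, x3)

pc <- princomp(X)
k  <- 2                                       # number of components to keep (assumed)
Z  <- pc$scores[, 1:k]                        # 100 x 2 reduced representation
L  <- unclass(loadings(pc))[, 1:k]            # 3 x 2 loading matrix

# Map the reduced data back to the original space and measure what was lost.
X_hat     <- sweep(Z %*% t(L), 2, pc$center, "+")
rel_error <- sum((X - X_hat)^2) / sum(sweep(X, 2, pc$center)^2)
rel_error                                     # fraction of total variance dropped
```

Because x3 is almost fully determined by x1 and x2, two components should carry nearly all of the variance, so rel_error should be small.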

PCA is basically an orthogonal transformation
http://en.wikipedia.org/wiki/Principal_component_analysis
"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components."
"PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)."
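To make the quoted description concrete, here is a minimal sketch (my own illustration, not from the Wikipedia entry) that computes the components by eigenvalue decomposition of the covariance matrix and by SVD of the centered data matrix, then compares both with princomp(). It uses three columns of the built-in USArrests data. The covariance is divided by n rather than n - 1 because that is what princomp() does.

```r
X  <- as.matrix(USArrests[, c("Murder", "Assault", "UrbanPop")])
Xc <- scale(X, center = TRUE, scale = FALSE)   # mean-center each column
n  <- nrow(Xc)

S  <- crossprod(Xc) / n                        # covariance matrix with divisor n
e  <- eigen(S, symmetric = TRUE)               # eigenvectors = loadings, eigenvalues = variances
scores_eigen <- Xc %*% e$vectors               # component scores

sv <- svd(Xc)                                  # same components from the SVD
scores_svd <- sv$u %*% diag(sv$d)

pc <- princomp(X)                              # covariance-based PCA
# The sign of each component is arbitrary, so compare absolute values.
max_diff_eigen <- max(abs(abs(scores_eigen) - abs(pc$scores)))
max_diff_svd   <- max(abs(abs(scores_svd)   - abs(pc$scores)))
c(max_diff_eigen, max_diff_svd)

round(cor(pc$scores), 10)                      # off-diagonals are 0: scores are uncorrelated
```

Both differences should be at the level of floating point error, and sqrt(e$values) should match pc$sdev.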


R:  princomp( )
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]  #scores are probably the transformed values (component scores), based on the Wikipedia entry

I can use examples of linear combinations to verify my guess.
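One direct check on the USArrests fit, as a minimal sketch: if the scores really are linear combinations of the variables, then multiplying the standardized data by the loadings should reproduce them. The names Z, L and manual are my own. Standardization uses the center and scale stored in the fitted object, since cor = TRUE works on standardized variables.

```r
pc.cr <- princomp(~ Murder + Assault + UrbanPop, data = USArrests, cor = TRUE)

# Standardize with the same center and scale that princomp() used.
Z <- scale(USArrests[, c("Murder", "Assault", "UrbanPop")],
           center = pc.cr$center, scale = pc.cr$scale)
L <- unclass(loadings(pc.cr))      # 3 x 3 matrix of loadings

manual <- Z %*% L                  # linear combinations of the standardized variables
max(abs(manual - pc.cr$scores))    # close to zero if the guess is right
```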


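#######start of the R testing code and results #########
set.seed(2014)  #for reproducibility
x1 = rnorm(100)
x2 = rnorm(100)
x3 = x1 + x2 + rnorm(100)/20  #nearly a linear combination of x1 and x2
x4 = 2*x1 + rnorm(100)/20     #nearly a multiple of x1
X = data.frame(x1, x2, x3, x4)
pc <- princomp(X)
plot(pc) #scree plot: only two major components, consistent with x3 and x4 being built from x1 and x2

summary(pc)      #proportion of variance per component
loadings(pc)     #weights of the linear combinations
head(pc$scores)  #head(pc) would list the pieces of the princomp object, not the scores

#######End of the R testing code and results #########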


#######start of 2nd R testing code and results #########
set.seed(2014)
x1 = rnorm(100)
x2 = x1 + rnorm(100)/20
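X = data.frame(x1, x2)
pc <- princomp(X, cor = TRUE)
loadings(pc)     #both variables load about 0.707 (1/sqrt(2)) on Comp.1; the sign is arbitrary
head(pc$scores)
#with cor = TRUE the scores are built from standardized variables, so standardize before comparing
z1 = (x1 - pc$center["x1"]) / pc$scale["x1"]
z2 = (x2 - pc$center["x2"]) / pc$scale["x2"]
s = sign(pc$loadings["x1", 1])  #match the sign princomp chose for Comp.1
summary( pc$scores[,1] - s*(0.707*z1 + 0.707*z2) ) #does this approach zero?

#good, the differences are close to zero (not exactly zero, because 0.707 is a rounded 1/sqrt(2))

summary(lm( pc$scores[,1] ~ x1 )) #Comp.1 is nearly a linear function of x1, since x2 is almost equal to x1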

#######End of 2nd R testing code and results #########


