Open Notebook: study notes on PCA, principal components, with R testing codes, princomp()

Monday, December 22, 2014

ResearchGate: Principal components are linear combinations of original variables x1, x2, etc. So when you do SVM on PCA decomposition you work with these combinations instead of original variables.

37:50 in Ng's video
https://www.youtube.com/watch?v=ey2PE5xi9-A
Ng shows how to use PCA (linear combinations of the raw data) to reduce the dimension of the data.

PCA is basically an orthogonal transformation
http://en.wikipedia.org/wiki/Principal_component_analysis
"Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components."
"PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute. The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score)."
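
The two routes named in the quote can be compared directly in R. This is a minimal sketch and not part of the original notes: it assumes the built-in USArrests data, and the names ev_eigen and ev_svd are made up for illustration. It checks that the eigenvalues of the correlation matrix match the squared singular values of the standardized data matrix divided by n - 1.

```r
# Three columns of the built-in USArrests data set (an assumption of this sketch)
X <- USArrests[, c("Murder", "Assault", "UrbanPop")]

# Route 1: eigenvalue decomposition of the correlation matrix
ev_eigen <- eigen(cor(X), symmetric = TRUE)$values

# Route 2: SVD of the mean-centered, standardized (z-score) data matrix
Z <- scale(X)                          # center, then divide by the sample sd
ev_svd <- svd(Z)$d^2 / (nrow(Z) - 1)   # squared singular values, rescaled

# princomp() with cor = TRUE stores the square roots of the same values in $sdev
pc <- princomp(X, cor = TRUE)

print(all.equal(ev_eigen, ev_svd))
print(all.equal(ev_eigen, unname(pc$sdev^2)))
```

Both comparisons should agree up to floating-point tolerance, which is what "eigenvalue decomposition ... or singular value decomposition" amounts to in practice.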

R: princomp( )
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)
pc.cr$scores[1:5, ]  # scores are probably the PCA results (the "component scores" in the Wikipedia entry)
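
One way to check the guess about the scores is to rebuild them by hand. The sketch below is an addition, not the original author's code, and manual_scores, max_diff and score_cor are illustrative names. It standardizes the data with the center and scale that princomp() stores in the fitted object, multiplies by the loadings, and compares the result with pc.cr$scores.

```r
pc.cr <- princomp(~ Murder + Assault + UrbanPop,
                  data = USArrests, na.action = na.exclude, cor = TRUE)

X <- as.matrix(USArrests[, c("Murder", "Assault", "UrbanPop")])

# Standardize with the center and scale stored in the fitted object
# (princomp() uses divisor n, not n - 1, for the standard deviations)
Z <- scale(X, center = pc.cr$center, scale = pc.cr$scale)

# Each score column is a linear combination of the standardized variables,
# with the loadings as weights
manual_scores <- Z %*% unclass(pc.cr$loadings)

# Largest absolute difference from the scores princomp() returned
max_diff <- max(abs(manual_scores - pc.cr$scores))
print(max_diff)

# The components should be uncorrelated: off-diagonal entries near zero
score_cor <- cor(pc.cr$scores)
print(round(score_cor, 10))
```

If the difference is at the level of rounding error, the scores are the standardized data expressed in the principal-component basis, which matches the Wikipedia description of scores and loadings.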

I can verify my guess with an example in which some variables are linear combinations of the others.

#######start of the R testing code and results #########
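# two independent variables
x1 <- rnorm(100)
x2 <- rnorm(100)
# x3 and x4 are linear combinations of x1 and x2, plus a little noise
x3 <- x1 + x2 + rnorm(100)/20
x4 <- 2*x1 + rnorm(100)/20
X <- data.frame(x1, x2, x3, x4)
pc <- princomp(X)
plot(pc)  # variance per component: only two major components, consistent with the construction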