This site is to serve as my note-book and to effectively communicate with my students and collaborators. Every now and then, a blog may be of interest to other researchers or teachers. Views in this blog are my own. All rights of research results and findings on this blog are reserved. See also http://youtube.com/c/hongqin @hongqin
Friday, November 15, 2013
*** How to generate correlated random numbers with a specified R-squared
R^2 = 1 - SSres / SStot
SSres = sum of (y_obs - fitted)^2
SStot = sum of (y_obs - y_mean)^2
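For y = rho * x1 + sqrt(1-rho^2)*x2, the residual is y - rho*x1 = sqrt(1-rho^2)*x2, so
rho^2 = 1 - (y - x1*rho)^2 / (x2^2)
When x1 and x2 are independent standard normal samples, y also has variance 1, so SSres/SStot is approximately 1 - rho^2 and R^2 is approximately rho^2.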
For standardized random numbers
x1 = rnorm(100)
x2 = rnorm(100)
rho = 0.5
y = rho * x1 + sqrt(1-rho^2)*x2
http://www.sitmo.com/article/generating-correlated-random-numbers/
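The definitions of SSres and SStot above can be checked numerically. The following is a minimal sketch, assuming an arbitrary seed and a sample size of 1000 (neither comes from the recipe itself); it computes R^2 by hand and compares it with the value reported by lm() and with the squared sample correlation.

```r
# Check R^2 from its definition against lm() and against cor()^2.
set.seed(1)                       # arbitrary seed, for reproducibility only
n   <- 1000                       # sample size (an assumption for this sketch)
rho <- 0.5
x1  <- rnorm(n)
x2  <- rnorm(n)
y   <- rho * x1 + sqrt(1 - rho^2) * x2

fit    <- lm(y ~ x1)
ss_res <- sum((y - fitted(fit))^2)    # SSres: residual sum of squares
ss_tot <- sum((y - mean(y))^2)        # SStot: total sum of squares
r2_manual <- 1 - ss_res / ss_tot

r2_lm  <- summary(fit)$r.squared      # R^2 as reported by lm()
r2_cor <- cor(x1, y)^2                # squared sample correlation

print(c(manual = r2_manual, lm = r2_lm, cor_squared = r2_cor, target = rho^2))
```

The first three numbers should agree with each other, and they should be close to (but generally not equal to) the target rho^2 = 0.25, because the sample is finite.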
> set.seed(2014)
> N=500
> x = rnorm(N)
> error = rnorm(N)
> rho = sqrt(0.5)
> y= rho*x + sqrt(1-rho^2)*error #rho is the slope
> summary(lm(y ~ x ))
Call:
lm(formula = y ~ x)
Residuals:
     Min       1Q   Median       3Q      Max
-1.85255 -0.47818  0.04374  0.46947  2.17727
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.00915    0.03101  -0.295    0.768
x            0.69389    0.03165  21.921   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6915 on 498 degrees of freedom
Multiple R-squared: 0.4911, Adjusted R-squared: 0.49
F-statistic: 480.5 on 1 and 498 DF, p-value: < 2.2e-16
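The fit above gives R-squared = 0.4911 rather than exactly 0.5, because x and error are only uncorrelated on average, not in any particular sample. If an exact sample value is wanted, one possible variation (not part of the recipe above, and the names x_std, e_std and y_exact are made up for this sketch) is to standardize x with its sample moments and to replace error by its residuals from a regression on x, so that the two pieces are exactly uncorrelated in the sample.

```r
# Variation: force the sample R^2 to equal rho^2 by orthogonalizing the error.
set.seed(2014)
N     <- 500
x     <- rnorm(N)
error <- rnorm(N)
rho   <- sqrt(0.5)

x_std <- (x - mean(x)) / sd(x)      # sample mean 0, sample sd 1
e     <- resid(lm(error ~ x))       # mean zero and uncorrelated with x in this sample
e_std <- e / sd(e)                  # rescale to sample sd 1

y_exact <- rho * x_std + sqrt(1 - rho^2) * e_std
print(summary(lm(y_exact ~ x))$r.squared)   # should be 0.5 up to floating-point rounding
```

The trade-off is that y_exact is no longer an independent draw from the population model; it is tuned to this particular sample, which may or may not be what a simulation needs.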
Generate non-standardized random numbers (x has mean 2 and standard deviation 4, so it is standardized as (x-2)/4 before rho is applied)
> set.seed(2014)
> x=rnorm(1000)*4+2
> error = rnorm(1000)
> rho=sqrt(0.5)
> y = rho*(x-2)/4 + sqrt(1-rho^2)*error
> summary(lm(y~x))
Call:
lm(formula = y ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-2.1879 -0.4414 -0.0138  0.4289  2.7802
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.37723    0.02428  -15.53   <2e-16 ***
x            0.18475    0.00546   33.83   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6751 on 998 degrees of freedom
Multiple R-squared: 0.5342, Adjusted R-squared: 0.5338
F-statistic: 1145 on 1 and 998 DF, p-value: < 2.2e-16
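> y2 = y*4+2*rho #rescale y: y2 = rho*x + 4*sqrt(1-rho^2)*error, so the expected slope is rho and the expected intercept is 0; R-squared is unchanged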
> summary(lm(y2~x))
Call:
lm(formula = y2 ~ x)
Residuals:
    Min      1Q  Median      3Q     Max
-8.7515 -1.7656 -0.0552  1.7155 11.1207
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.09471    0.09713  -0.975     0.33
x            0.73899    0.02184  33.834   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.701 on 998 degrees of freedom
Multiple R-squared: 0.5342, Adjusted R-squared: 0.5338
F-statistic: 1145 on 1 and 998 DF, p-value: < 2.2e-16
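The two fits above have the same R-squared (0.5342) because R-squared does not change when y is shifted or rescaled linearly. That suggests wrapping the recipe in a small helper. The function below is a sketch only: the name make_correlated and its arguments are hypothetical, it standardizes x with its sample mean and sd (the session above uses the known values 2 and 4 instead), and it only produces a positive correlation.

```r
# Hypothetical helper: draw y with a chosen mean, sd and population R^2 against x.
make_correlated <- function(x, r_squared, mean_y = 0, sd_y = 1) {
  stopifnot(r_squared >= 0, r_squared <= 1, sd_y > 0)
  rho   <- sqrt(r_squared)              # positive correlation only
  z     <- (x - mean(x)) / sd(x)        # standardize x using sample moments
  error <- rnorm(length(x))             # independent standard normal noise
  mean_y + sd_y * (rho * z + sqrt(1 - rho^2) * error)
}

set.seed(2014)
x   <- rnorm(1000) * 4 + 2
y   <- make_correlated(x, r_squared = 0.5, mean_y = 10, sd_y = 3)
fit <- lm(y ~ x)
print(summary(fit)$r.squared)   # close to 0.5, not exactly 0.5
print(coef(fit))                # slope should be near sd_y * sqrt(r_squared) / sd(x)
```

As with the runs above, the sample R-squared fluctuates around the target; a larger sample narrows that spread.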
Labels: ***, R, random numbers, Rsquared, star, statistics