Friday, November 15, 2013

*** How to generate correlated random numbers with a specified R-squared


R^2 = 1 - SSres / SStot
SSres = sum of (y_obs - fitted)^2
SStot = sum of (y_obs - y_mean)^2
 
For y = rho * x1 + sqrt(1-rho^2)*x2, the residual is y - rho*x1 = sqrt(1-rho^2)*x2, so
rho^2 = 1 - (y - x1*rho)^2 / (x2^2)
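
The identity above holds point by point. A short derivation of why the regression R^2 then comes out near rho^2 may help. It is a sketch that assumes x1 and x2 are independent with mean 0 and variance 1, and that |rho| <= 1; these assumptions are implied by the standardized case below but are not stated in the formulas above. Under them the population slope of y on x1 is rho, so the residual is y - rho*x1:

```latex
\begin{aligned}
\operatorname{Var}(y) &= \rho^2 \operatorname{Var}(x_1) + (1-\rho^2)\operatorname{Var}(x_2) = \rho^2 + (1-\rho^2) = 1,\\
\operatorname{Var}(y - \rho x_1) &= (1-\rho^2)\operatorname{Var}(x_2) = 1-\rho^2,\\
R^2 &= 1 - \frac{\operatorname{Var}(y-\rho x_1)}{\operatorname{Var}(y)} = 1 - \frac{1-\rho^2}{1} = \rho^2.
\end{aligned}
```

This is a population statement; in a finite sample SSres / SStot only approximates (1 - rho^2), so the fitted R^2 scatters around rho^2.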


For standardized random numbers (mean 0, sd 1):
x1 = rnorm(100)
x2 = rnorm(100)
rho = 0.5
y = rho * x1 + sqrt(1-rho^2)*x2

http://www.sitmo.com/article/generating-correlated-random-numbers/

> set.seed(2014)
> N=500
> x = rnorm(N)
> error = rnorm(N)
> rho = sqrt(0.5)
> y= rho*x + sqrt(1-rho^2)*error  # rho is the slope; target R-squared is rho^2 = 0.5
> summary(lm(y ~ x ))

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max
-1.85255 -0.47818  0.04374  0.46947  2.17727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.00915    0.03101  -0.295    0.768   
x            0.69389    0.03165  21.921   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6915 on 498 degrees of freedom
Multiple R-squared: 0.4911,    Adjusted R-squared:  0.49
F-statistic: 480.5 on 1 and 498 DF,  p-value: < 2.2e-16 
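
The fitted R-squared above (0.4911) is close to the 0.5 target but not equal to it, because x and error are only approximately standardized and uncorrelated in a finite sample. If the sample R-squared has to match the target exactly, one possible approach (not from the linked article; the names target_r2, xs and es are made up for this sketch) is to standardize x in the sample and make the noise exactly orthogonal to it before mixing:

```r
# Sketch: force the *sample* R-squared to equal the target.
set.seed(2014)
N <- 500
target_r2 <- 0.5          # assumed target, same as in the example above
rho <- sqrt(target_r2)

x <- rnorm(N)
error <- rnorm(N)

# Standardize x using its sample mean and sd.
xs <- as.numeric(scale(x))

# Remove from the noise whatever is linearly explained by xs,
# then rescale the remainder to sample mean 0 and sd 1.
es <- as.numeric(scale(resid(lm(error ~ xs))))

# Same mixing formula as before, now with exactly orthogonal components.
y <- rho * xs + sqrt(1 - rho^2) * es

r2 <- summary(lm(y ~ x))$r.squared
print(r2)
```

The trade-off is that y is no longer built from independent draws: the noise has been adjusted using x, which is fine for demonstrations but may not be what a simulation study wants.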



Generate non-standardized random numbers (x has mean 2 and sd 4, so standardize it as (x-2)/4 before mixing)
> set.seed(2014)
> x=rnorm(1000)*4+2
> error = rnorm(1000)
> rho=sqrt(0.5)
> y = rho*(x-2)/4 + sqrt(1-rho^2)*error
> summary(lm(y~x))


Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1879 -0.4414 -0.0138  0.4289  2.7802

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.37723    0.02428  -15.53   <2e-16 ***
x            0.18475    0.00546   33.83   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6751 on 998 degrees of freedom
Multiple R-squared: 0.5342,    Adjusted R-squared: 0.5338
F-statistic:  1145 on 1 and 998 DF,  p-value: < 2.2e-16

>
> y2 = y*4+2*rho  # rescale y so its slope on x is rho; R-squared is unchanged
> summary(lm(y2~x))

Call:
lm(formula = y2 ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-8.7515 -1.7656 -0.0552  1.7155 11.1207

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.09471    0.09713  -0.975     0.33   
x            0.73899    0.02184  33.834   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.701 on 998 degrees of freedom
Multiple R-squared: 0.5342,    Adjusted R-squared: 0.5338
F-statistic:  1145 on 1 and 998 DF,  p-value: < 2.2e-16 
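
Both fits report the same R-squared (0.5342) because y2 is just a linear rescaling of y, and R-squared does not change under a linear transformation with nonzero scale. The steps can be wrapped in a small function; this is a sketch, and make_correlated with its arguments x_mean and x_sd is a hypothetical helper, not something from the original session. It assumes the true mean and sd of x are known:

```r
# Hypothetical helper: build y whose population R-squared against x is r2,
# given the known mean and sd of x.
make_correlated <- function(x, r2, x_mean, x_sd) {
  rho <- sqrt(r2)
  z <- (x - x_mean) / x_sd                      # standardize x
  rho * z + sqrt(1 - rho^2) * rnorm(length(x))  # mix with fresh unit noise
}

set.seed(2014)
x <- rnorm(1000) * 4 + 2
y <- make_correlated(x, r2 = 0.5, x_mean = 2, x_sd = 4)

# Put y back on the scale of x, as in the y2 step above.
y2 <- y * 4 + 2 * sqrt(0.5)

r2_y  <- summary(lm(y ~ x))$r.squared
r2_y2 <- summary(lm(y2 ~ x))$r.squared

# The two R-squared values agree up to floating-point error.
print(all.equal(r2_y, r2_y2))
```

If the mean and sd of x are not known, the sample values mean(x) and sd(x) could be passed instead, at the cost of a slightly different interpretation of the target.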
