Skip to contents

Implementation of equation 6 (arbitrary residual data) and equation 7 (residual correlations) in the paper. The alpha limit is calculated (equation 9). The limit is used when alpha =1 cannot be chosen (warning produced). In output, alpha is attribute.

Usage

RegSDCadd(y, resCorr = NULL, x = NULL, yStart = NULL, ensureIntercept = TRUE)

Arguments

y

Matrix of confidential variables

resCorr

Required residual correlations (possibly recycled)

x

Matrix of non-confidential variables

yStart

Arbitrary data whose residuals will be used. Will be calculated from resCorr when NULL.

ensureIntercept

Whether to ensure/include a constant term. Non-NULL x is subjected to EnsureIntercept

Value

Generated version of y with alpha as attribute

Details

Use epsAlpha=NULL to avoid calculation of alpha. Use of alpha (<1) will produce a warning. Input matrices are subjected to EnsureMatrix.

Author

Øyvind Langsrud

Examples

x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)
yOut <- RegSDCadd(y, c(0.1, 0.2, 0.3), x)

# Correlations between residuals as required
diag(cor(residuals(lm(y ~ x)), residuals(lm(yOut ~ x))))
#> [1] 0.1 0.2 0.3

# Identical covariance matrices
cov(y) - cov(yOut)
#>              [,1]         [,2]         [,3]
#> [1,] 1.776357e-15 7.105427e-15 5.329071e-15
#> [2,] 7.105427e-15 1.421085e-14 1.243450e-14
#> [3,] 5.329071e-15 1.243450e-14 7.105427e-15
cov(residuals(lm(y ~ x))) - cov(residuals(lm(yOut ~ x)))
#>               [,1]          [,2]          [,3]
#> [1,] -6.661338e-16 -1.110223e-15  9.992007e-16
#> [2,] -1.110223e-15  0.000000e+00 -1.471046e-15
#> [3,]  9.992007e-16 -1.471046e-15  1.110223e-15

# Identical regression results
summary(lm(y[, 1] ~ x))
#> 
#> Call:
#> lm(formula = y[, 1] ~ x)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -1.6541 -0.8349 -0.3027  0.8195  2.3333 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   0.3089     0.8918   0.346 0.737999    
#> x             0.9642     0.1437   6.709 0.000151 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.306 on 8 degrees of freedom
#> Multiple R-squared:  0.8491,	Adjusted R-squared:  0.8302 
#> F-statistic: 45.01 on 1 and 8 DF,  p-value: 0.0001514
#> 
summary(lm(yOut[, 1] ~ x))
#> 
#> Call:
#> lm(formula = yOut[, 1] ~ x)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.94212 -0.86987 -0.01549  0.30785  2.05468 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   0.3089     0.8918   0.346 0.737999    
#> x             0.9642     0.1437   6.709 0.000151 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.306 on 8 degrees of freedom
#> Multiple R-squared:  0.8491,	Adjusted R-squared:  0.8302 
#> F-statistic: 45.01 on 1 and 8 DF,  p-value: 0.0001514
#> 

# alpha as attribute
attr(yOut, "alpha")
#> [1] 1

# With yStart as input and alpha limit in use (warning produced)
yOut <- RegSDCadd(y, NULL, x, 2 * y + matrix(rnorm(30), 10, 3))
#> Warning: alpha =  0.359609563673981
attr(yOut, "alpha")
#> [1] 0.3596096

# Same correlation for all variables
RegSDCadd(y, 0.2, x)
#>            [,1]     [,2]     [,3]
#>  [1,] 0.1494604 10.52564 20.53769
#>  [2,] 1.5386025 11.05852 21.64399
#>  [3,] 3.6344120 11.44332 24.19438
#>  [4,] 5.8993091 13.54563 24.33327
#>  [5,] 6.1946294 14.88324 27.17731
#>  [6,] 5.6022459 15.83345 24.14188
#>  [7,] 5.4877111 18.83235 27.40053
#>  [8,] 9.9434197 16.74912 26.96492
#>  [9,] 8.7776129 17.83100 29.35738
#> [10,] 8.8948941 20.20814 31.12093
#> attr(,"alpha")
#> [1] 1
# But in this case RegSDCcomp is equivalent and faster
RegSDCcomp(y, 0.2, x)
#>            [,1]     [,2]     [,3]
#>  [1,]  2.353808 10.18794 22.51758
#>  [2,]  1.740246 11.78502 20.46485
#>  [3,]  3.035642 11.37871 22.73457
#>  [4,]  3.612985 12.99010 24.39668
#>  [5,]  5.849152 14.20320 24.54176
#>  [6,]  5.826319 16.41682 28.20489
#>  [7,]  7.287550 18.21807 26.55755
#>  [8,]  6.627010 17.79353 26.99387
#>  [9,]  7.299075 19.66443 30.20654
#> [10,] 12.490511 18.27258 30.25402


# Make nearly collinear data
y[, 3] <- y[, 1] + y[, 2] + 0.001 * y[, 3]
# Not possible to achieve correlations. Small alpha with warning.
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
#> Warning: alpha =  0.00540603699649462
#>            [,1]      [,2]     [,3]
#>  [1,]  1.085687  9.701785 10.80909
#>  [2,]  4.048861 11.359769 15.43114
#>  [3,]  4.122992 11.887615 16.03209
#>  [4,]  2.903179 13.800613 16.72684
#>  [5,]  4.633005 15.090374 19.75092
#>  [6,]  5.144869 15.641939 20.81319
#>  [7,]  5.088251 18.428749 23.54311
#>  [8,]  7.872356 17.877153 25.77843
#>  [9,]  9.804360 16.880807 26.71378
#> [10,] 11.418736 20.241600 31.69099
#> attr(,"alpha")
#> [1] 0.005406037
# Exact collinear data
y[, 3] <- y[, 1] + y[, 2]
# Zero alpha with warning
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
#> Warning: Could not calculate alpha. 0 returned
#> Warning: alpha =  0
#>            [,1]      [,2]     [,3]
#>  [1,]  2.287637  9.135401 11.42304
#>  [2,]  1.947004 11.091959 13.03896
#>  [3,]  2.419759 12.981869 15.40163
#>  [4,]  4.186765 13.098180 17.28494
#>  [5,]  3.387029 16.122526 19.50955
#>  [6,]  7.728568 15.967254 23.69582
#>  [7,]  6.121133 17.886376 24.00751
#>  [8,]  9.040144 16.392123 25.43227
#>  [9,] 10.483649 18.810724 29.29437
#> [10,]  8.520609 19.423993 27.94460
#> attr(,"alpha")
#> [1] 0