
Regression-based SDC Tools - Synthetic addition with residual correlation control
Source:R/RegSDCadd.R
RegSDCadd.Rd
Implementation of equation 6 (arbitrary residual data) and equation 7 (residual correlations) in the paper. The alpha limit is calculated (equation 9). The limit is used when alpha =1 cannot be chosen (warning produced). In output, alpha is attribute.
Arguments
- y
Matrix of confidential variables
- resCorr
Required residual correlations (possibly recycled)
- x
Matrix of non-confidential variables
- yStart
Arbitrary data whose residuals will be used. Will be calculated from resCorr when NULL.
- ensureIntercept
Whether to ensure/include a constant term. Non-NULL x is subjected to
EnsureIntercept
Details
Use epsAlpha=NULL to avoid calculation of alpha. Use of alpha (<1) will produce a warning. Input matrices are subjected to EnsureMatrix
.
Examples
x <- matrix(1:10, 10, 1)
y <- matrix(rnorm(30) + 1:30, 10, 3)
yOut <- RegSDCadd(y, c(0.1, 0.2, 0.3), x)
# Correlations between residuals as required
diag(cor(residuals(lm(y ~ x)), residuals(lm(yOut ~ x))))
#> [1] 0.1 0.2 0.3
# Identical covariance matrices
cov(y) - cov(yOut)
#> [,1] [,2] [,3]
#> [1,] 1.776357e-15 7.105427e-15 5.329071e-15
#> [2,] 7.105427e-15 1.421085e-14 1.243450e-14
#> [3,] 5.329071e-15 1.243450e-14 7.105427e-15
cov(residuals(lm(y ~ x))) - cov(residuals(lm(yOut ~ x)))
#> [,1] [,2] [,3]
#> [1,] -6.661338e-16 -1.110223e-15 9.992007e-16
#> [2,] -1.110223e-15 0.000000e+00 -1.471046e-15
#> [3,] 9.992007e-16 -1.471046e-15 1.110223e-15
# Identical regression results
summary(lm(y[, 1] ~ x))
#>
#> Call:
#> lm(formula = y[, 1] ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.6541 -0.8349 -0.3027 0.8195 2.3333
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.3089 0.8918 0.346 0.737999
#> x 0.9642 0.1437 6.709 0.000151 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 1.306 on 8 degrees of freedom
#> Multiple R-squared: 0.8491, Adjusted R-squared: 0.8302
#> F-statistic: 45.01 on 1 and 8 DF, p-value: 0.0001514
#>
summary(lm(yOut[, 1] ~ x))
#>
#> Call:
#> lm(formula = yOut[, 1] ~ x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.94212 -0.86987 -0.01549 0.30785 2.05468
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.3089 0.8918 0.346 0.737999
#> x 0.9642 0.1437 6.709 0.000151 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 1.306 on 8 degrees of freedom
#> Multiple R-squared: 0.8491, Adjusted R-squared: 0.8302
#> F-statistic: 45.01 on 1 and 8 DF, p-value: 0.0001514
#>
# alpha as attribute
attr(yOut, "alpha")
#> [1] 1
# With yStart as input and alpha limit in use (warning produced)
yOut <- RegSDCadd(y, NULL, x, 2 * y + matrix(rnorm(30), 10, 3))
#> Warning: alpha = 0.359609563673981
attr(yOut, "alpha")
#> [1] 0.3596096
# Same correlation for all variables
RegSDCadd(y, 0.2, x)
#> [,1] [,2] [,3]
#> [1,] 0.1494604 10.52564 20.53769
#> [2,] 1.5386025 11.05852 21.64399
#> [3,] 3.6344120 11.44332 24.19438
#> [4,] 5.8993091 13.54563 24.33327
#> [5,] 6.1946294 14.88324 27.17731
#> [6,] 5.6022459 15.83345 24.14188
#> [7,] 5.4877111 18.83235 27.40053
#> [8,] 9.9434197 16.74912 26.96492
#> [9,] 8.7776129 17.83100 29.35738
#> [10,] 8.8948941 20.20814 31.12093
#> attr(,"alpha")
#> [1] 1
# But in this case RegSDCcomp is equivalent and faster
RegSDCcomp(y, 0.2, x)
#> [,1] [,2] [,3]
#> [1,] 2.353808 10.18794 22.51758
#> [2,] 1.740246 11.78502 20.46485
#> [3,] 3.035642 11.37871 22.73457
#> [4,] 3.612985 12.99010 24.39668
#> [5,] 5.849152 14.20320 24.54176
#> [6,] 5.826319 16.41682 28.20489
#> [7,] 7.287550 18.21807 26.55755
#> [8,] 6.627010 17.79353 26.99387
#> [9,] 7.299075 19.66443 30.20654
#> [10,] 12.490511 18.27258 30.25402
# Make nearly collinear data
y[, 3] <- y[, 1] + y[, 2] + 0.001 * y[, 3]
# Not possible to achieve correlations. Small alpha with warning.
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
#> Warning: alpha = 0.00540603699649462
#> [,1] [,2] [,3]
#> [1,] 1.085687 9.701785 10.80909
#> [2,] 4.048861 11.359769 15.43114
#> [3,] 4.122992 11.887615 16.03209
#> [4,] 2.903179 13.800613 16.72684
#> [5,] 4.633005 15.090374 19.75092
#> [6,] 5.144869 15.641939 20.81319
#> [7,] 5.088251 18.428749 23.54311
#> [8,] 7.872356 17.877153 25.77843
#> [9,] 9.804360 16.880807 26.71378
#> [10,] 11.418736 20.241600 31.69099
#> attr(,"alpha")
#> [1] 0.005406037
# Exact collinear data
y[, 3] <- y[, 1] + y[, 2]
# Zero alpha with warning
RegSDCadd(y, c(0.1, 0.2, 0.3), x)
#> Warning: Could not calculate alpha. 0 returned
#> Warning: alpha = 0
#> [,1] [,2] [,3]
#> [1,] 2.287637 9.135401 11.42304
#> [2,] 1.947004 11.091959 13.03896
#> [3,] 2.419759 12.981869 15.40163
#> [4,] 4.186765 13.098180 17.28494
#> [5,] 3.387029 16.122526 19.50955
#> [6,] 7.728568 15.967254 23.69582
#> [7,] 6.121133 17.886376 24.00751
#> [8,] 9.040144 16.392123 25.43227
#> [9,] 10.483649 18.810724 29.29437
#> [10,] 8.520609 19.423993 27.94460
#> attr(,"alpha")
#> [1] 0