In Section 7 of the paper, Z = t(X) %*% Y, where X is a dummy matrix. Some elements of Y can be read directly from Z, and the corresponding rows of X are removed. After these rows are removed, some columns contain only zeros, and these columns are removed as well.
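The reduction described above can be sketched in base R on a toy example. The data, the column names and the variable names (`unit_cols`, `y_known`, `x_red`, `z_red`) are made up for illustration. The sketch makes a single pass over the unit columns and assumes each known cell is matched by exactly one unit column. It is not the package's implementation.

```r
# Hypothetical toy data: five inner cells (rows of x), four published sums (columns of x)
y <- matrix(c(5, 3, 2, 7, 4), ncol = 1,
            dimnames = list(paste0("cell", 1:5), "freq"))
x <- cbind(Total  = c(1, 1, 1, 1, 1),
           groupA = c(1, 1, 1, 0, 0),
           groupB = c(0, 0, 0, 1, 1),
           single = c(1, 0, 0, 0, 0))
rownames(x) <- rownames(y)
z <- t(x) %*% y

# A column with a single 1 publishes one element of y unchanged
unit_cols <- colSums(x) == 1
y_known <- rowSums(x[, unit_cols, drop = FALSE]) > 0

# Remove the rows of the known cells, then the columns left with only zeros
x_red <- x[!y_known, , drop = FALSE]
keep <- colSums(x_red) != 0
x_red <- x_red[, keep, drop = FALSE]

# Known y values read from z (assumes one unit column per known cell)
y_from_z <- x[y_known, unit_cols, drop = FALSE] %*% z[unit_cols, , drop = FALSE]

# Reduced z: subtract the contribution of the known cells from the kept sums
z_red <- z[keep, , drop = FALSE] -
  t(x[y_known, keep, drop = FALSE]) %*% y_from_z
```

In this toy case only `cell1` is known, the `single` column is dropped, and `z_red` equals `t(x_red)` multiplied by the remaining elements of `y`, so the reduced system can be formed from `x` and `z` alone.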

Usage

ReduceX(x, z = NULL, y = NULL, digits = 9)

Arguments

x

X as a matrix

z

Z as a matrix

y

Y as a matrix

digits

When non-NULL and y is NULL in input, estimated y values close to whole numbers are rounded, with digits passed as input to RoundWhole.
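The idea of rounding only values that are numerically close to whole numbers can be illustrated in base R. The helper `round_near_whole` below is hypothetical and is only a sketch of the concept. It assumes that "close" means the value equals a whole number after rounding to `digits` decimals, and it may differ from what RoundWhole actually does.

```r
# Hypothetical helper: snap values to whole numbers only when they are
# already whole after rounding to `digits` decimals; leave the rest untouched
round_near_whole <- function(v, digits = 9) {
  near <- round(v, digits) == round(v)
  v[near] <- round(v[near])
  v
}

# The first value carries floating-point noise, the others are true fractions
v <- c(12.0000000000003, 4.173913, -2.695652)
v_rounded <- round_near_whole(v)
```

With this rule, numerical noise in an estimate such as 12.0000000000003 is removed, while genuinely fractional estimates are kept as they are.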

Value

A list of four elements:

x

Reduced x

z

Corresponding reduced z or NULL when no z in input

yKnown

Logical vector specifying elements of y that can be found directly as elements in z

y

As y in input (not reduced) or estimated y when NULL y in input

Details

To estimate Y, this function finds some values directly from Z and other values by running Z2Yhat on reduced versions of X and Z.
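The two-step estimate can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the toy data and all names (`pinv`, `y_hat`, `x_red`, `z_red`) are hypothetical, and the second step assumes that the remaining cells are estimated by the least-squares fitted values X (X'X)^- Z of the reduced system, which is an assumption about Z2Yhat that this page does not state.

```r
# Hypothetical toy data: five inner cells, four published sums
y <- matrix(c(5, 3, 2, 7, 4), ncol = 1,
            dimnames = list(paste0("cell", 1:5), "freq"))
x <- cbind(Total  = c(1, 1, 1, 1, 1),
           groupA = c(1, 1, 1, 0, 0),
           groupB = c(0, 0, 0, 1, 1),
           single = c(1, 0, 0, 0, 0))
rownames(x) <- rownames(y)
z <- t(x) %*% y   # only x and z are used below

# Hypothetical helper: Moore-Penrose pseudo-inverse via SVD
pinv <- function(a, tol = 1e-10) {
  s <- svd(a)
  pos <- s$d > tol * max(s$d)
  s$v[, pos, drop = FALSE] %*% (t(s$u[, pos, drop = FALSE]) / s$d[pos])
}

# Step 0: reduce x (single pass over unit columns)
unit_cols <- colSums(x) == 1
y_known <- rowSums(x[, unit_cols, drop = FALSE]) > 0
x_red <- x[!y_known, , drop = FALSE]
keep <- colSums(x_red) != 0
x_red <- x_red[, keep, drop = FALSE]

# Step 1: known cells are read directly from z
y_hat <- matrix(NA_real_, nrow(x), 1, dimnames = list(rownames(x), "freq"))
y_hat[y_known, ] <- x[y_known, unit_cols, drop = FALSE] %*%
  z[unit_cols, , drop = FALSE]

# Step 2: remaining cells are fitted from the reduced x and z
z_red <- z[keep, , drop = FALSE] -
  t(x[y_known, keep, drop = FALSE]) %*% y_hat[y_known, , drop = FALSE]
y_hat[!y_known, ] <- x_red %*% pinv(crossprod(x_red)) %*% z_red
```

In this toy case `cell1` is recovered exactly, while the other cells receive the mean of the remaining total in their group, since that is all the reduced sums determine.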

Author

Øyvind Langsrud

Examples

# Same data as in the paper
z <- RegSDCdata("sec7z")
x <- RegSDCdata("sec7x")
y <- RegSDCdata("sec7y")  # Now z is t(x) %*% y 

a <- ReduceX(x, z, y)
b <- ReduceX(x, z)
d <- ReduceX(x, z = NULL, y)  # No z in output

# Identical output for x and z
identical(a$x, b$x)
#> [1] TRUE
identical(a$x, d$x)
#> [1] TRUE
identical(a$z, b$z)
#> [1] TRUE

# Same y in output as input
identical(a$y, y)
#> [1] TRUE
identical(d$y, y)
#> [1] TRUE

# Estimate of y (yHat) when NULL y input
b$y
#>                freq
#> row1_col1  4.173913
#> row2_col1  4.521739
#> row3_col1 12.000000
#> row4_col1 13.304348
#> row1_col2  9.826087
#> row2_col2 10.173913
#> row3_col2 22.000000
#> row4_col2 19.000000
#> row1_col3 32.000000
#> row2_col3  8.304348
#> row3_col3  6.695652
#> row4_col3 16.000000
#> row1_col4 30.000000
#> row2_col4  8.000000
#> row3_col4 -2.695652
#> row4_col4  7.695652

# These elements of y can be found directly in z
y[a$yKnown, , drop = FALSE]
#>           freq
#> row3_col1   12
#> row3_col2   22
#> row4_col2   19
#> row1_col3   32
#> row4_col3   16
#> row1_col4   30
#> row2_col4    8
# They can be found by searching for unit colSums
colSums(x)[colSums(x) == 1]
#> row1:col3 row1:col4 row2:col4 row3:col1 row3:col2 row4:col2 row4:col3 
#>         1         1         1         1         1         1         1 

# These trivial data rows can be omitted when processing data
x[!a$yKnown, ]
#>           Total:Total Total:col1 Total:col2 Total:col3 Total:col4 row1:Total
#> row1_col1           1          1          0          0          0          1
#> row2_col1           1          1          0          0          0          0
#> row4_col1           1          1          0          0          0          0
#> row1_col2           1          0          1          0          0          1
#> row2_col2           1          0          1          0          0          0
#> row2_col3           1          0          0          1          0          0
#> row3_col3           1          0          0          1          0          0
#> row3_col4           1          0          0          0          1          0
#> row4_col4           1          0          0          0          1          0
#>           row1:col3 row1:col4 row2:Total row2:col4 row3:Total row3:col1
#> row1_col1         0         0          0         0          0         0
#> row2_col1         0         0          1         0          0         0
#> row4_col1         0         0          0         0          0         0
#> row1_col2         0         0          0         0          0         0
#> row2_col2         0         0          1         0          0         0
#> row2_col3         0         0          1         0          0         0
#> row3_col3         0         0          0         0          1         0
#> row3_col4         0         0          0         0          1         0
#> row4_col4         0         0          0         0          0         0
#>           row3:col2 row4:Total row4:col2 row4:col3
#> row1_col1         0          0         0         0
#> row2_col1         0          0         0         0
#> row4_col1         0          1         0         0
#> row1_col2         0          0         0         0
#> row2_col2         0          0         0         0
#> row2_col3         0          0         0         0
#> row3_col3         0          0         0         0
#> row3_col4         0          0         0         0
#> row4_col4         0          1         0         0
# Several columns can now be omitted since their colSums are zero
colSums0 <- colSums(x[!a$yKnown, ]) == 0
# The resulting matrix is output from the function
identical(x[!a$yKnown, !colSums0], a$x)
#> [1] TRUE

# Output z can be computed from this output x
identical(t(a$x) %*% y[!a$yKnown, , drop = FALSE], a$z)
#> [1] TRUE