Presentation of Census Results by Interactive Statistical Models

How accurate is it?

The primary purpose of the estimated model is to reproduce the statistical properties of the original census data. The model should therefore reproduce the empirical frequencies of different properties as precisely as possible. To verify its accuracy, we compared the empirical frequencies of many statistically relevant combinations of responses with the estimates derived from the statistical model, and did the same for a 10% subset of the microdata. We found that the 10% microdata subset reproduces the empirical frequencies only marginally better than the statistical model.
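The comparison can be illustrated with a minimal sketch. It assumes that the relative error is the absolute difference between an estimate and the empirical frequency, divided by the empirical frequency, and that a 10% sample estimates a frequency by multiplying the sample count by 10. The function names and all numbers are hypothetical and are not taken from the census data.

```python
def relative_error_percent(estimate, empirical):
    """Relative error of an estimated frequency, in percent (assumed definition)."""
    if empirical <= 0:
        raise ValueError("empirical frequency must be positive")
    return 100.0 * abs(estimate - empirical) / empirical


def sample_estimate(sample_count, inverse_fraction=10):
    """Scale a count observed in a 1/inverse_fraction sample up to the full population."""
    return sample_count * inverse_fraction


# Hypothetical numbers, for illustration only.
empirical = 2000                           # empirical frequency in the full census data
model_estimate = 2090                      # estimate derived from the statistical model
microdata_estimate = sample_estimate(193)  # 193 matching records in a 10% sample

model_error = relative_error_percent(model_estimate, empirical)
microdata_error = relative_error_percent(microdata_estimate, empirical)
print(model_error, microdata_error)
```

Averaging such errors over all combinations whose empirical frequency falls into the same size interval gives one mean relative error per interval and per method.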

Mean relative error according to subpopulation size

Interval  Lower bound  Upper bound  Number of combinations  Model error (%)  Microdata error (%)
1                1612         3000                 7688027             6.10                 5.16
2                3000         5000                 5011625             4.88                 3.86
3                5000         7500                 3220931             4.04                 3.07
4                7500        10000                 1906156             3.50                 2.58
5               10000        15000                 2213787             3.04                 2.17
6               15000        30000                 2695817             2.38                 1.67
7               30000        50000                 1296118             1.80                 1.23
8               50000       100000                 1075615             1.37                 0.94
9              100000       150000                  372570             1.03                 0.70
10             150000       300000                  358112             0.78                 0.55
11             300000       500000                  125103             0.55                 0.39
12             500000      1000000                   71104             0.39                 0.28
13            1000000      1500000                   15324             0.29                 0.20
14            1500000      3000000                    8511             0.22                 0.14
15            3000000      5000000                    1349             0.12                 0.08
16            5000000     10300000                     200             0.02                 0.04

Distribution of the relative errors of the estimates according to the empirical frequency N(xC) (subpopulation size), comparing the statistical model with the 10% subset of the microdata. The lower-bound and upper-bound columns specify the intervals of empirical frequency. The number-of-combinations column gives the number of properties (combinations of responses) whose empirical frequency falls into the given interval. The last two columns contain the corresponding mean relative errors for the statistical model and for the microdata subset, respectively.
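The grouping behind such a table can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the figures above: the interval edges are copied from the table, each interval is assumed to include its lower bound and exclude its upper bound, the relative error is assumed to be the absolute difference divided by the empirical frequency, and the function name and sample pairs are hypothetical.

```python
from bisect import bisect_right

# Interval edges copied from the table: interval i spans EDGES[i] to EDGES[i + 1].
EDGES = [1612, 3000, 5000, 7500, 10000, 15000, 30000, 50000, 100000,
         150000, 300000, 500000, 1000000, 1500000, 3000000, 5000000, 10300000]


def mean_relative_error_by_interval(pairs, edges=EDGES):
    """Group (empirical, estimate) pairs by empirical frequency and average
    the relative error (in percent) within each interval.

    Assumption: intervals are half-open, lower bound included, upper excluded.
    Pairs outside the covered range are ignored.
    """
    n_intervals = len(edges) - 1
    sums = [0.0] * n_intervals
    counts = [0] * n_intervals
    for empirical, estimate in pairs:
        idx = bisect_right(edges, empirical) - 1
        if idx < 0 or idx >= n_intervals:
            continue  # empirical frequency lies outside all intervals
        sums[idx] += 100.0 * abs(estimate - empirical) / empirical
        counts[idx] += 1
    # One row per interval: (lower, upper, count, mean error or None if empty).
    return [
        (edges[i], edges[i + 1], counts[i],
         sums[i] / counts[i] if counts[i] else None)
        for i in range(n_intervals)
    ]


# Hypothetical (empirical, estimate) pairs, for illustration only.
pairs = [(2000, 1900), (2500, 2600), (4000, 4100), (1000, 900)]
rows = mean_relative_error_by_interval(pairs)
for lower, upper, count, mean_error in rows[:3]:
    print(lower, upper, count, mean_error)
```

Running the same function once on the model estimates and once on the estimates from the microdata subset would yield the two error columns side by side.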

Details are described in the paper.