Suggested Citation:"5 Delivery of Government Services." National Academies of Sciences, Engineering, and Medicine. 2020. 2020 Census Data Products: Data Needs and Privacy Considerations: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/25978.

– 5 –
Delivery of Government Services

Michael McDonald (University of Florida) moderated a session on the uses of decennial census data for planning and delivery of government services at the state and local levels. The session included four speakers: Clifford Cook (City of Cambridge, Massachusetts), who presented several use cases of interest to city planners; Keith Wiley (Housing Assistance Council), who looked at the effects of differential privacy on characteristics of rural and occupied housing; Beth Jarosz (Population Reference Bureau), who discussed the implications of differential privacy for regional planning in California; and Abraham Flaxman (University of Washington), who spoke on anticipating the effect of differential privacy on evidence-based public policy at the state and local level. Floor discussion followed their presentations.

5.1 PRIVATIZED DATA IN CITY PLANNING

Clifford Cook (City of Cambridge, Massachusetts) began with some context about his experience as an urban planner in a city of 105,000 people covering an area of 6.4 square miles (the 10th densest incorporated municipality in the U.S.) and containing 32 census tracts, 88 block groups, and 1,100 blocks. He noted that planners used decennial census data for a variety of purposes: understanding the composition of their community; understanding the dynamics of community change; evaluating the potential effects of private sector development and the provision of public goods, particularly sewers, parks, highways, and transit systems; and modeling the effects of changes in dynamic systems such as transportation and population. What planners brought to the discussion of census data uses and privacy, Cook said, was the geographic spatial dimension.

5.1.1 School-Age Children

For his first use case, Cook examined the data for children ages 5 to 17, an important group because of their school attendance and consumption of parks and recreation services, among other things. He examined the percentage change for this age group at the census tract level from the originally released 2010 Summary File 1 (SF1) compared with the 2010 Demonstration Data Products (DDP); see Figure 5.1(a). The preponderance of kids in Cambridge were in the western part of the city in SF1, which accorded with school enrollment data. The DDP added kids to the central and eastern parts of the city, which some people would like because they want parks there. However, the 2010 DDP was not representative of the actual composition of the city.

Cook noted that the 2010 DDP added 800 5–17-year-olds for the city compared with SF1, which is a number big enough to justify another elementary school. On the other side, the DDP removed 350 kids from the cohort of children ages 0 to 4, which did not square with vital statistics and school enrollment information. In addition, the number of households with children increased from 7,000 to 10,000, yet the number of kids ages 0–17 only increased by 450. The net result was to lower the ratio of kids per household with kids from 1.73 to 1.22, which is a large change. Finally, married couples with kids increased by over 50 percent, from 4,700 to 7,100. While definitions of the married couple group have now changed to include same-sex married couples, that cannot explain the difference, as there simply were not that many of them.

5.1.2 Seniors Living Alone

For his second use case, Cook examined the data for people ages 65 and older living alone, a group that often requires special services that are brought to them rather than provided in a more central location. At the block group level, it proved hard to see a pattern in the percentage differences between the numbers for this group in SF1 compared to the DDP (see Figure 5.1(b)). Decreases tended to be somewhat more pronounced in the western half of the city. Cook also found that of the 18 or so block groups that had clusters of residents aged 65 and older in public housing, privately provided housing, or assisted living housing, over half lost elderly households while only one showed a gain. The net result was that these block groups lost more than half of their elder households, who were redistributed around the city. This was clearly not representative of actual living patterns.

5.1.3 Vacancy Rates

Cook said that census data on vacancy rates matter because it is hard to get information from other sources, unlike the case for many kinds of housing data. Cambridge has purchased commercial statistics on vacancies, but the data have never been satisfactory, so the city must rely on SF1 and, in between censuses, the American Community Survey (ACS). The data are important, given that vacancies are a contentious issue in a market like Cambridge, which has a very low vacancy rate.

Looking at blocks, Cook found that the percentage of blocks with a zero percent vacancy rate doubled in the DDP compared with SF1. Excluding blocks without housing, the percentage quadrupled. Other blocks showed an increase in vacancy rates. Turning to block groups, it appeared that three block groups absorbed the bulk of the added vacant housing (see Figure 5.1(c)). Two of the three block groups include large affordable housing developments, and the third is home to a large student housing development at Harvard University, which functions very much like affordable housing. The biggest increase in vacant housing was in one of these block groups (in the far northwestern part of the city), which consists of one small private development, one large public housing development, and two even larger privately owned affordable housing developments. The vacancy rate there went from roughly 2 to 3 percent in SF1, which made sense knowing the organizations that run those developments, to roughly 29 percent. That was not a figure Cook would like to have to defend: it is neither credible nor reflective of what was occurring on the ground.

Cook concluded that while the differential privacy procedure was intended to be random in its application across the city, the results showed it was not applied equitably. Across the Boston area and across the country, differential privacy, as implemented in the 2010 DDP, would produce situations where areas of large affordable housing would have artificially high vacancy rates because of their urban design. They are, in other words, superblock structures that are not common in suburban areas or older urban areas. Cook said that these artificially high vacancy rates could be used to feed into a narrative about the “failure” of public housing. Other areas would have artificially low vacancy rates.

5.1.4 Household Size

For his fourth use case, Cook looked at average household size for block groups (see Figure 5.1(d)). In SF1, the average was much the same across block groups as one would expect. In the DDP, most changes were minor, but some changes were not credible. For example, in one block group, average household size increased from one person per household in SF1 to 27 people per household in the DDP. In other block groups, the change went the other way, that is, a substantial decrease in average household size. These block groups tended to be dominated by group quarters. The DDP added households but not people to them, so the average household size went down. The scale of changes in some block groups was what might occur across a decade or more, but not in one run of the data to the next.

As a general point, Cook said that areas dominated by group quarters are just not demographically suited to absorb additional household population or households. There ought to be a rule that if a block group or other geography exceeded a certain percentage of group quarters, then the algorithm should not add much, if any, household population there.

Moreover, geography matters, Cook said. Not employing some sort of geographic boundary to redistribute cases in reasonably close proximity would severely undermine the ability to use the data. Cook said he was not a mathematician or computer scientist, but he thought that the spatial dimension needed to be included in the differential privacy algorithm in some way.

Cook also suggested that there needed to be more invariants in the algorithm, perhaps at the census tract level, even though other users might want invariants for even finer-grained geography. Having the household count and the count of vacant units invariant at the tract level was important as well; otherwise, the data would be moved around in ways that would not be tenable.

Cook said that he had already stressed the importance of controlling the spatial distribution. Protecting the relationship between persons and household data was also important, as was treating geographies dominated by group quarters appropriately and placing limits to avoid absurd results like going from 1 person per household to 27.
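As an illustration of the kind of safeguards Cook described, the following is a minimal sketch of a user-side plausibility screen that compares one block group before and after noise is applied. It is not part of the Census Bureau's algorithm. The thresholds, the `BlockGroup` fields, and the example figures are all hypothetical, chosen only to echo the 1-to-27 persons-per-household case.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen only for illustration; the source names none.
GQ_SHARE_LIMIT = 0.50   # share of population living in group quarters
MAX_SIZE_RATIO = 3.0    # largest tolerated change in persons per household


@dataclass
class BlockGroup:
    household_pop: int   # people living in households
    gq_pop: int          # people living in group quarters
    households: int      # occupied housing units


def gq_share(bg: BlockGroup) -> float:
    """Share of total population living in group quarters."""
    total = bg.household_pop + bg.gq_pop
    return bg.gq_pop / total if total else 0.0


def avg_household_size(bg: BlockGroup) -> float:
    """Persons per household; 0.0 when there are no households."""
    return bg.household_pop / bg.households if bg.households else 0.0


def screen(original: BlockGroup, noisy: BlockGroup) -> list[str]:
    """Return plain-language warnings for one block group."""
    warnings = []
    # Rule 1: do not add households where group quarters dominate.
    if gq_share(original) > GQ_SHARE_LIMIT and noisy.households > original.households:
        warnings.append("households added to a group-quarters-dominated area")
    # Rule 2: cap the relative change in average household size.
    before, after = avg_household_size(original), avg_household_size(noisy)
    if before > 0 and after > 0:
        if max(before, after) / min(before, after) > MAX_SIZE_RATIO:
            warnings.append("implausible change in average household size")
    return warnings


# Made-up figures: 40 households holding 40 people become 40 households holding 1,080.
sf1 = BlockGroup(household_pop=40, gq_pop=900, households=40)
ddp = BlockGroup(household_pop=1080, gq_pop=900, households=40)
print(screen(sf1, ddp))
```

A screen like this only flags results for review; where the thresholds should sit is a policy choice the presentation left open.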

Cook cautioned that if the data were not defensible, the user community would have a hard time working with them and a really hard time supporting the Census Bureau. Cook wanted to be able to stand up for the data. If the data were deemed unreliable, people would start to turn to other sources, including privately owned datasets, which would make it hard to understand the nature of those data. Cook reported a conversation with a transportation modeler in Massachusetts, who said that if his community could not rely on the decennial census data, they would grab another state database like the one for driver's licenses. They were not thinking about differential privacy in the way they were doing their work.

5.2 DECENNIAL CENSUS, RURAL HOUSING DATA, AND DIFFERENTIAL PRIVACY

Keith Wiley (Housing Assistance Council) said that he worked in the Washington, DC, area for a small nonprofit organization focused on rural housing. For his presentation, he looked at occupied rural housing in five states with significant rural populations: Kentucky, Mississippi, New Mexico, South Dakota, and West Virginia. He focused on race and ethnicity because of the high-need populations or high-need regions in these states, for example, Native American populations in South Dakota and New Mexico, the border colonias in New Mexico, the Mississippi Delta in Mississippi, and central Appalachia in Kentucky and West Virginia.

Wiley said that the highlight of his day came when people called with problems or issues on rural housing, so he endeavored to think about how these people worked with census data. His organization also was planning a report on rural housing using census data, which figured into his thinking for his presentation.

Wiley raised one more important background element, namely, the many different definitions of rural areas. Wiley used a census tract-based definition that his colleague Lance George developed. It categorizes areas on the basis of housing density and also accounts for commuting patterns using the Rural-Urban Commuting Area (RUCA) codes developed by the Economic Research Service, U.S. Department of Agriculture. It classifies census tracts into three categories: rural and small town, suburban and exurban, and urban. If differential privacy were to alter housing unit counts, this definition of rurality would be affected, as would some other definitions such as the one used in the Duty to Serve regulation promulgated by the Federal Housing Finance Agency, which applies to low-income rural housing.

5.2.1 Analysis of Differences

Wiley proceeded to describe the results of his analysis comparing the original 2010 SF1 data and the 2010 DDP. He first aggregated census tracts to states. For total population and total housing units, there was little difference between the originally published 2010 data and the data in the DDP. Turning to rural areas, the results were not so comforting. For each state, the aggregate estimates of census tracts for rural areas were larger in the DDP by 2 to 5 percent (the latter in New Mexico). In contrast, the DDP resulted in smaller populations in suburban, exurban, and urban areas. This would be problematic in that a community that was actually losing population could appear to be gaining population.

Aggregating occupied units in census tracts to counties showed that the differences between the originally published data and the DDP were less than 10 percent for about three-quarters of counties. On the other hand, the differences were between 11 and 20 percent for another 20 percent of counties, and greater than 20 percent for the rest. The results for census tracts were even worse.

As an example, Wiley pointed to Pendleton County, West Virginia, for which the originally published total for occupied units was 3,200, while the total from the DDP was over 3,800 units. If the error was as large as this (500-odd units) for total occupied units, Wiley expressed concern that the error could cause problems in the case of occupied housing by tenure. For example, many rural areas have a small number of rental units, and the application of differential privacy could double or triple the number, similar to what Cook showed for Cambridge.
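The county comparison above amounts to computing a percentage difference and sorting it into a band. The sketch below shows one way to do that, using the rounded Pendleton County totals quoted in the text as input. The function names and the exact band boundaries are assumptions of this example; the text gives only approximate bands.

```python
def percent_difference(original: float, altered: float) -> float:
    """Absolute difference as a percentage of the original count."""
    if original == 0:
        raise ValueError("original count must be nonzero")
    return abs(altered - original) / original * 100.0


def difference_band(pct: float) -> str:
    """Assign a percentage difference to one of three bands.

    Boundary handling (<= 10, <= 20) is an assumption of this sketch.
    """
    if pct <= 10.0:
        return "10 percent or less"
    if pct <= 20.0:
        return "over 10 to 20 percent"
    return "over 20 percent"


# Rounded totals from the Pendleton County example: 3,200 versus 3,800 units.
pct = percent_difference(3200, 3800)
print(f"{pct:.2f} percent -> {difference_band(pct)}")
```

Run over every county, the same two functions would reproduce the kind of tabulation Wiley reported, given the underlying SF1 and DDP counts.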

Wiley argued that using differential privacy for the 2020 Census could also hurt the continuity of housing estimates over time. For example, if it increased (or decreased) the numbers, one could get a different view of what had been going on in Tucker County, West Virginia, the Mississippi Delta, or the border colonias over time. Assessments of chronic conditions such as chronic, persistent poverty would be hard to conduct. It would be like changing a question in a longitudinal survey, resulting in different data over time.

5.2.2 Two Use Cases: Affordable Rural Housing and Lending to Underserved Markets

The USDA Section 515 Rural Rental Housing loan program supports the construction of affordable housing in predominantly rural areas. After these loans reach maturity and are paid off, the owners are no longer obliged to maintain the units as affordable rental housing. Given that many of the Section 515 loans will reach maturity over the next 20 years—and the concern that many of those units will no longer represent affordable housing—the share of Section 515 units in local areas is emerging as a policy concern. Wiley said that the Housing Assistance Council has done studies to try to assess this risk to rural communities, comparing Section 515 housing units to the balance of occupied renter housing stock. If differential privacy added 400 to 500 units in a place like Pendleton County, West Virginia, this type of analysis would be useless. The community would have to spend resources to try to confirm whether its census number was accurate or not. This kind of analysis, while simple, was important for communities to ensure that they were using their resources effectively to target properties that had loans maturing and warranted investment to protect their affordable housing stock.

Another issue the Housing Assistance Council has been working on concerns mortgage lending in the border colonias, related to the Duty to Serve regulation issued by the Federal Housing Finance Agency. This regulation requires Fannie Mae and Freddie Mac to increase liquidity in high-need markets, some of which are in the colonias along the U.S.–Mexico border. For a crude estimate, the Housing Assistance Council took ratios of loans to Hispanic households as a share of owner-occupied housing units in census tracts in colonias compared with nearby tracts not in colonias. This analysis required an assumption that the owner-occupied housing unit count was reliable. This might not be the case with differential privacy.

Wiley emphasized that he appreciated privacy and data protection concerns, yet the needs of data users are also important. The DDP was useful, but it would have been even more useful for his purposes if housing tenure information had been included. Moreover, given that the ACS contains almost all of the housing data that used to be in the census, the application of differential privacy to those data would have even more ramifications for data users. Wiley understood that usefulness would have to be impaired in order to increase privacy protection. For rural areas and areas with small populations, a seemingly small change to the data could be a big deal and make a difference between getting something right or wrong.

5.3 IMPORTANCE OF DECENNIAL CENSUS FOR REGIONAL PLANNING IN CALIFORNIA

Beth Jarosz (Population Reference Bureau) said she was going to echo and expand on some of the points made by previous presenters. Her topic was the importance of decennial census data for regional planning, specifically in California.

As background, she noted that regional planning was a federally mandated activity originally established by the Federal Highway Act of 1973 and reauthorized through successive acts since then. Today, there are more than 400 regional planning organizations nationwide including 18 in California that represent about 35 million people.

In her role at PRB, Jarosz assisted the Association of Monterey Bay Area Governments (AMBAG) in their planning work. For more than a decade, she worked for a regional planning organization and organized a conference for regional planning and modeling specialists from around the country. Jarosz also described how she invited input on why regional planning needs good census data from three colleagues: Gina Schmidt at AMBAG, Tina Glover at the Sacramento Area Council of Governments (SACOG), and Rachel Cortes at the San Diego Association of Governments (SANDAG).

Jarosz noted another key attribute of regional planning organizations: although they have a transportation planning mandate, they also do other planning work in many states, such as habitat conservation planning, public safety, and in many cases housing planning. Jarosz said she would feature housing in most of her presentation and then turn to transportation.

5.3.1 Housing Affordability

Jarosz noted that news headlines have made clear the housing affordability problem in California. One way in which California has tried to deal with the issue is through state legislation and the state process called Regional Housing Needs Allocation (RHNA). Generally speaking, the process involves the state determining need within each of its metropolitan regions, after which each region assigns need to each of its jurisdictions (in census terminology, incorporated places). The jurisdictions then must find specific sites within their boundaries where they can zone for or build affordable housing. Jarosz observed that this housing planning in California is a fraught process, almost always ending up in court. Thus, having good data at every level of this process is critical.

Figure 5.2 Rate of vacant housing units in California counties, 2010 Census published data and 2010 Demonstration Data Products.
NOTE: Presenter’s original graphics, using “DP” to denote the Demonstration Data Products, not “DDP.”
SOURCE: Beth Jarosz workshop presentation.

Jarosz said, simplifying for brevity, that the RHNA process uses population data, particularly demographic characteristics, and the characteristics of householders to look at households by age, sex, race, and ethnicity. The process also uses data on housing and vacancies.

Jarosz agreed with Cook that vacancy rates are important. Cook showed examples of problematic rates in the DDP by census tract, and Jarosz found that the problem persisted at the county level; see Figure 5.2. The original 2010 data showed no 0 percent vacancy rates in any California county, but that was not the case with the DDP data. In fact, Alpine County had a 72 percent vacancy rate in the original 2010 data and a 0 percent rate in the DDP. The problem was compounded for smaller geographies like places or jurisdictions within counties.

Jarosz walked through an example of the RHNA process for the City of Monterey. In the original 2010 data, the city had 13,584 housing units with just over a 10 percent vacancy rate and about 2 people per occupied household. One thing the RHNA process tries to achieve is a higher vacancy rate, which tends to bring down home prices and promote affordability. If the City of Monterey was trying to get to an 11.4 percent vacancy rate, they would need about 168 more housing units, which would be about 2 or 3 large apartment complexes. The city would have to go through the courts several times to get that done, but that would probably be manageable. If, in making calculations using the DDP, the vacancy rate were found to be only 1.7 percent and not 10.3 percent, and the target rate was 11.4 percent, the city would be looking to build nearly 1,500 housing units. That many more units would greatly complicate the community and legal challenges.
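The Monterey arithmetic can be reproduced with a short calculation. The sketch below assumes that the number of occupied units stays fixed, that every added unit is vacant, and that the total of 13,584 housing units is the same in both data sets; these are simplifications made for the example, not details of the RHNA methodology. The function name `units_needed` is illustrative.

```python
def units_needed(total_units: int, vacancy_rate: float, target_rate: float) -> float:
    """Additional units required to lift the vacancy rate to target_rate.

    Assumes occupied units stay fixed and every added unit is vacant.
    Rates are fractions, e.g. 0.103 for 10.3 percent.
    """
    if not 0.0 <= vacancy_rate < 1.0 or not 0.0 <= target_rate < 1.0:
        raise ValueError("rates must be fractions in [0, 1)")
    occupied = total_units * (1.0 - vacancy_rate)
    # Total stock at which the same occupied units imply the target rate.
    required_total = occupied / (1.0 - target_rate)
    return max(0.0, required_total - total_units)


TOTAL_UNITS = 13_584   # City of Monterey, original 2010 data
TARGET = 0.114         # target vacancy rate used in the example

from_published = units_needed(TOTAL_UNITS, 0.103, TARGET)  # 10.3 percent vacancy
from_ddp = units_needed(TOTAL_UNITS, 0.017, TARGET)        # 1.7 percent vacancy

print(f"published data: about {from_published:.0f} more units")
print(f"DDP:            about {from_ddp:.0f} more units")
```

Under these assumptions the two inputs give results close to the "about 168" and "nearly 1,500" figures in the text, which shows how strongly the target depends on the starting vacancy rate.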

5.3.2 Housing and Population Consistency

Jarosz said that accuracy and internal consistency were key for planning. Planners look at several mathematical identities in assessing local demographics, for instance: (1) population must equal household population plus group quarters population; (2) occupied units must equal all housing units minus vacant units; and (3) average household size must be at least one, and household size multiplied by the number of occupied units in a size category gives the population in that category. None of these identities held for every jurisdiction in the DDP.

Alpine County, admittedly small, had 0.6 people per household in the DDP. The problem compounded at the place level. California has just over 1,500 places, and 63 of those had more households than population available to fill them in the DDP. Jarosz asserted that this was a critical problem to fix for 2020.
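The identities Jarosz listed are simple enough to check mechanically for any geography. The following is a minimal sketch of such a check; the `Area` fields and the example figures are hypothetical and are chosen only to produce a 0.6 persons-per-household result like the one described for Alpine County.

```python
from dataclasses import dataclass


@dataclass
class Area:
    total_pop: int
    household_pop: int
    gq_pop: int
    housing_units: int
    occupied_units: int
    vacant_units: int


def identity_failures(a: Area) -> list[str]:
    """List the basic housing and population identities that an area violates."""
    failures = []
    if a.total_pop != a.household_pop + a.gq_pop:
        failures.append("population != household population + group quarters")
    if a.occupied_units != a.housing_units - a.vacant_units:
        failures.append("occupied units != housing units - vacant units")
    # At least one person is needed for every occupied unit.
    if a.occupied_units > 0 and a.household_pop < a.occupied_units:
        failures.append("fewer than one person per occupied unit")
    return failures


# Made-up figures: 300 people in 500 occupied units is 0.6 persons per household.
example = Area(total_pop=310, household_pop=300, gq_pop=10,
               housing_units=500, occupied_units=500, vacant_units=0)
print(identity_failures(example))
```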

In the analysis depicted in Table 5.1, Jarosz identified another important mathematical identity, namely that total household population must equal the sum of the population in each household size category. In prior censuses, one of the ways of protecting privacy was to “top-code” variables, and the largest reported household size in the data was typically set at “seven or more” persons. While the planner would never know how many people were actually in households of seven or more people, the planner would know that if there were 10 households in each size category, from one to seven or more, then there must be at least 280 people living in households to fill those households appropriately. Again, with the DDP, this mathematical formula or identity did not always hold. In one of the three counties where Jarosz works, the formula with the DDP data produced at least 382,000 people in households but the DDP showed 396,000, which meant the top household size category (7+) must include a sizable number of households with more than seven people. For San Benito County and Santa Cruz County, that was not the case. In San Benito, there should have been at least 60,000 people, but the DDP only had 55,000. In Santa Cruz, there should have been at least 252,000 people, but there were only 251,000.

Table 5.1 Mathematical Impossibilities: Households by Size, Select California Counties

                         Household Size                                               Household Population
County                   1        2       3       4       5       6       7+      Calculated (min)  2010 DDP Reported
Monterey County, CA      125,936  27,353  35,122  18,916  18,056  11,630  6,554   382,178           396,315
San Benito County, CA    17,870   2,612   4,637   2,740   3,195   2,051   1,291   60,295            54,837
Santa Cruz County, CA    95,317   25,064  30,819  14,677  12,108  6,069   3,319   252,251           251,339

NOTES: DDP, Demonstration Data Products. For counts \(\mathrm{HH}_i\) in household size categories of i = 1 person, 2 persons, through 7+ persons, the calculated (minimum) household population must be

\[ \mathrm{HHP} \ge (\mathrm{HH}_1 \times 1) + (\mathrm{HH}_2 \times 2) + \cdots + (\mathrm{HH}_{7+} \times 7). \]

SOURCE: Beth Jarosz workshop presentation.
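The minimum-population check can be replayed against the figures printed in Table 5.1. One caution applies: the printed "Calculated (min)" values are reproduced exactly only if the first figure in each row is read as total households, the next six as sizes one through six, and the 7+ count as the remainder. That reading is an assumption of this sketch, inferred from the arithmetic; the table as extracted does not state it.

```python
# Nine figures per county, in the order printed in Table 5.1.
ROWS = {
    "Monterey County, CA": (125936, 27353, 35122, 18916, 18056, 11630, 6554, 382178, 396315),
    "San Benito County, CA": (17870, 2612, 4637, 2740, 3195, 2051, 1291, 60295, 54837),
    "Santa Cruz County, CA": (95317, 25064, 30819, 14677, 12108, 6069, 3319, 252251, 251339),
}


def minimum_household_population(counts_by_size):
    """Lower bound on household population from counts for sizes 1, 2, ..., 7+."""
    return sum(size * count for size, count in enumerate(counts_by_size, start=1))


def check(row):
    """Return (minimum population, printed minimum, reported, identity holds)."""
    # ASSUMPTION: first figure is total households; 7+ is the remainder.
    total_households, *sizes_1_to_6, printed_min, reported = row
    size_7_plus = total_households - sum(sizes_1_to_6)
    minimum = minimum_household_population([*sizes_1_to_6, size_7_plus])
    return minimum, printed_min, reported, reported >= minimum


for county, row in ROWS.items():
    minimum, printed_min, reported, holds = check(row)
    status = "consistent" if holds else "impossible"
    print(f"{county}: minimum {minimum:,}, reported {reported:,} -> {status}")
```

Under that reading, Monterey County passes the check while San Benito and Santa Cruz Counties report fewer people than their households require, matching the discussion in the text.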

5.3.3 Transportation Planning

Jarosz noted that transportation planning models are activity-based models that look at individual trips. They are so complicated for large regions that they can take 48 hours to run. No transportation modeler of whom Jarosz is aware has run their model with the DDP, so Jarosz shared some higher-level concerns.

Transportation planning requires good estimates and projections of population by age, sex, race, ethnicity, group quarters by type (also by age, sex, and race and ethnicity), household size and structure, housing tenure, and headship. For funding allocations, population totals are also needed. At the block, block group, or tract level, total population must at a minimum be right, because transportation modelers are looking at very small levels of geography to figure out where people live and where they are going to work during the day. If the population numbers are wrong at the block or block group level, then the transportation model calibration is off.

Group quarters by type is important because children who are hospitalized and people who are in prisons are not going to be taking trips, while college students who are living in dorms are going to be on the bus, walking, or driving, and military personnel living in barracks are going to be walking and driving. Having accurate group quarters counts by type for small geographic levels is important to transportation planning. Housing counts, occupancy and vacancy characteristics, and household size and structure all need to be accurate for very small levels of geography for transportation planning.

5.3.4 Concluding Comments

Jarosz suggested that housing occupancy and vacancy either be invariant or have a larger portion of the privacy budget. Ideally, occupancy and vacancy would be invariant at the block level. Jarosz pointed out that occupancy and vacancy characteristics are not necessarily private. One can look at Google Street View to infer vacancy status; there are data vendors that sell vacancy data; and the U.S. Postal Service in collaboration with the U.S. Department of Housing and Urban Development makes vacancy data available for census tracts. Information on a housing unit’s occupancy status is basically in the public domain. In addition, preserving household-person joins is critical. For example, there needs to be at least one full person for every occupied household.

For transportation planning purposes, Jarosz suggested keeping not just the number of group quarters facilities invariant but also making the group quarters population invariant at the block level, particularly by type if that is possible. If the block level proves infeasible, then having these data invariant at the block group or tract level should be done.

Jarosz concluded that population ideally would be invariant at the block level and absolutely must be accurate for places. Considering population size, zeroing out existing populations would be problematic for transportation and housing planning.

5.4 DIFFERENTIAL PRIVACY IN 2020: ANTICIPATING IMPACTS ON EVIDENCE-BASED PUBLIC POLICY AT THE STATE AND LOCAL LEVEL

Abraham Flaxman (University of Washington) said his topic is about anticipating the impact of differential privacy for 2020 Census data on evidence-based public policy at the state and local level. His training is in mathematics, and he has been at the medical school for the last decade, so he has a public health perspective but has discussed census data use with decision makers in other areas. He thanked Jan Vink (Cornell University), Sam Petti (Georgia Tech), David Van Riper (University of Minnesota), and Mike Mohrman (State of Washington), who made it possible for him to access and understand the data.

Flaxman said that when he thinks about what is needed for evidence-based public policy, it is of great importance that the population count be invariant at the block level, not only for voting rights analysis, but also for other census data uses. He emphasized that he is talking about people, and not housing, which has been the subject of previous presentations in the session. (The block housing count is invariant in the DDP but not the population count.) Adopting a kind of Hegelian dialectic, Flaxman described the two concerns he has heard from the Census Bureau: whether there will still be sufficient privacy protection if the total count is invariant at the block level, and whether this would compromise the accuracy of other statistics.

Flaxman believed that the most relevant concern of public health decision makers with regard to differential privacy for 2020 Census data was quality. He cited a recently completed qualitative study by colleagues at the University of Washington that asked public health decision makers about data capacity building and training needs. While their results were focused on rural health, inequities, and issues relating to the northwest United States, Flaxman thought the findings were generally applicable. Public health decision makers had three concerns: (1) the limited availability of data or access to data; (2) issues with data quality; and (3) limited staff with the expertise and resources necessary to analyze the data. Concerning quality, as the respondents put it, data perceived as unreliable or inaccurate are often deemed to be unusable. Outdated data were also viewed as a problem.

Figure 5.3 Median absolute error as a function of empirical privacy loss for county-level stratified counts, Census TopDown Algorithm versus simple random sampling.
SOURCE: Abraham Flaxman workshop presentation.

Flaxman discussed variability and census data. He graphed (see Figure 5.3) empirical privacy loss for county-level population counts by median absolute numerical error as a function of ϵ. The graph shows that error decreases and privacy loss increases as a function of ϵ, but the slope flattens when ϵ exceeds 1.0. For comparison, an ϵ of 1.0 produces about the same amount of error as a 50 percent sample of the population, which is substantial relative to census data to which no noise has been added by a differential privacy algorithm. He developed these kinds of graphs to explain the variability introduced by differential privacy in terms that local policy makers could understand, by comparing it with a survey conducted with simple random sampling. Similarly, he developed histograms that used shading to illustrate the spread in error, and whether the error was biased in a positive or negative direction, for the different aggregations that users need, such as census tracts, cities, counties, states, and blocks with group quarters populations, in total and by type (such as nursing homes or prisons).
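The comparison with simple random sampling can be illustrated with a short calculation. The sketch below is not Flaxman's code, and it does not reproduce the TopDown Algorithm or the values in Figure 5.3. It only shows, under a normal approximation, how a median absolute error for a count could be restated as the sampling fraction of a simple random sample with roughly the same error. The function names, the population size, and the stratum share are hypothetical.

```python
import math
from statistics import NormalDist

# Median of |Z| for a standard normal variable (about 0.6745).
_MEDIAN_ABS_Z = NormalDist().inv_cdf(0.75)


def srs_median_abs_error(population, share, fraction):
    """Approximate median absolute error of an estimated stratum count
    from a simple random sample drawn without replacement.

    population: total number of people (N)
    share: true proportion of people in the stratum (p)
    fraction: sampling fraction (n / N), greater than 0 and at most 1
    """
    n = fraction * population
    # Variance of the scaled-up estimate, with finite population correction.
    variance = (population ** 2 * share * (1 - share) * (population - n)
                / (n * (population - 1)))
    # Normal approximation: median |error| = 0.6745 * standard deviation.
    return _MEDIAN_ABS_Z * math.sqrt(variance)


def equivalent_sampling_fraction(population, share, median_abs_error):
    """Sampling fraction whose approximate median absolute error equals
    the given error (the algebraic inverse of srs_median_abs_error)."""
    c = (median_abs_error / _MEDIAN_ABS_Z) ** 2
    spread = population ** 2 * share * (1 - share)
    n = population * spread / (c * (population - 1) + spread)
    return n / population


# Hypothetical example: a county of 10,000 people, 20 percent in the stratum.
error = srs_median_abs_error(population=10_000, share=0.2, fraction=0.5)
print(round(error, 1))
print(equivalent_sampling_fraction(10_000, 0.2, error))
```

Given a median absolute error measured for a privatized count, `equivalent_sampling_fraction` returns the sample size, as a share of the population, that would be about as noisy. That is the kind of statement ("about the same error as a 50 percent sample") that the comparison is meant to support.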

From his discussions with census data users at the state and local levels, Flaxman found that a common question concerned the number of people in a particular city or county, and therefore what share of the state revenue it would correspond to in the following year. This budgeting need is what led Flaxman to conclude that the total population count should be invariant at the block level. He found that the more homogeneous a census tract, measured by the number of race and ethnicity groups in the tract with zero population, the larger the difference between the DDP and SF1.
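The homogeneity measure described here can be sketched in a few lines. This is an illustration only: the tract identifiers and counts are invented to exercise the functions and are not results, and the function names are hypothetical. The sketch also assumes that homogeneity is computed from the SF1 counts and that the difference of interest is the signed difference in tract total population, neither of which this summary specifies.

```python
def empty_group_count(group_counts):
    """Number of race and ethnicity groups with zero population."""
    return sum(1 for count in group_counts.values() if count == 0)


def total_difference(ddp_counts, sf1_counts):
    """Signed difference in total population, DDP minus SF1."""
    return sum(ddp_counts.values()) - sum(sf1_counts.values())


# Invented example data: counts by group for two hypothetical tracts.
sf1 = {
    "tract_A": {"group_1": 950, "group_2": 0, "group_3": 0, "group_4": 50},
    "tract_B": {"group_1": 400, "group_2": 300, "group_3": 200, "group_4": 100},
}
ddp = {
    "tract_A": {"group_1": 930, "group_2": 6, "group_3": 4, "group_4": 48},
    "tract_B": {"group_1": 401, "group_2": 298, "group_3": 201, "group_4": 101},
}

# For each tract: (number of empty groups in SF1, DDP total minus SF1 total).
summary = {
    tract: (empty_group_count(sf1[tract]),
            total_difference(ddp[tract], sf1[tract]))
    for tract in sf1
}
print(summary)
```

Plotting the second value against the first across all tracts in a state would give one view of the relationship described above.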

Flaxman said that the kinds of problems he identified could be addressed by making population counts invariant at small geographic levels. He experimented with making enumeration district population counts invariant in the 1940 data released by the Census Bureau, with a differential privacy algorithm applied to introduce noise, and obtained a good result. He repeated his suggestion that the population count be made invariant at the block level or that bias be addressed in some other way. He also suggested that the Census Bureau publish the microdata files (after privatization) for counties for research use.
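One way to picture an invariant population count is as a post-processing constraint: noisy subgroup counts for a small area are adjusted so that they are nonnegative integers that sum to the exact total. The sketch below uses proportional scaling with largest-remainder rounding, a stand-in chosen for brevity. It is not the Census Bureau's post-processing and not the method Flaxman used in his 1940 experiment, and the function name and example values are hypothetical.

```python
import math


def enforce_invariant_total(noisy_counts, invariant_total):
    """Adjust noisy subgroup counts to nonnegative integers that sum
    exactly to invariant_total (an integer held fixed by assumption)."""
    # Negative counts are impossible, so clip them to zero first.
    clipped = [max(0.0, x) for x in noisy_counts]
    clipped_sum = sum(clipped)
    if clipped_sum == 0:
        # No information left after clipping: split the total evenly.
        shares = [invariant_total / len(clipped)] * len(clipped)
    else:
        # Rescale so the real-valued shares sum to the invariant total.
        shares = [x * invariant_total / clipped_sum for x in clipped]
    # Round down, then hand the leftover units to the largest remainders.
    result = [math.floor(s) for s in shares]
    leftover = invariant_total - sum(result)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - result[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        result[i] += 1
    return result


# Hypothetical noisy counts for four subgroups in an area of exactly 10 people.
adjusted = enforce_invariant_total([3.4, -1.2, 6.9, 0.3], 10)
print(adjusted, sum(adjusted))
```

The subgroup counts still carry noise after this step, but the total that drives a budgeting formula is reproduced exactly, which is the property the suggestion is aimed at.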

5.5 FLOOR DISCUSSION

Elizabeth Hardy (Arlington County, Virginia, government) thanked the presenters and stressed the fact that block-level census data were very important for county governments. The county needed accurate data on households, average household size, and population—at the block level—because the data were used for forecasting in regional transportation and just about everything the county does. When the county reviewed the DDP, the differences from the originally published data at the block level were huge. She wondered if the Census Bureau had given any thought to something like the Count Question Resolution (CQR) Program for jurisdictions to contest the results from the implementation of the privacy algorithm.1 The DDP put people into areas such as Arlington Cemetery, which would be a huge issue if it were to occur in 2020. Jarosz replied that every census has had some error in it. She heard in a meeting in Portland, Oregon, that the 2010 Census published data showed 47 people “residing” in a road roundabout that was only about 15 feet wide. Nonetheless, planners need the best data available at the smallest geographic level available.

Alexis Santos-Lozada (Pennsylvania State University) asked if organizations would need to invest more money in getting accurate data to enable them to do their jobs. Wiley answered that his organization would not likely have the extra money. He also wondered how many small rural communities could muster the resources to contest the 2020 data, if it came to it. Cook answered that, if the decennial census data were not considered accurate enough for modelers, they would look for other data sources, which could include private data sources. Yet private data sources that can be purchased are often black boxes, in contrast to the greater familiarity of census data gathering and tabulation processes. Cook hoped that it would be possible to determine roughly how much noise will be infused into the 2020 census data products. Jarosz echoed Cook’s point. There would be equity considerations for census data use from the perspective of a city’s size and finances. Some jurisdictions would be able to go after additional data sources, and others would not. Those that could not would end up with transportation plans that might not be as high in quality or be able to stand up to federal or local court scrutiny in the same way as plans developed by bigger, better-funded jurisdictions.

___________________

1 A final quality-check operation in recent U.S. censuses that is scheduled to be repeated in the 2020 Census round, Count Question Resolution is a program that allows officials from state, local, and tribal areas to challenge certain census counts on the grounds of processing or tabulation errors such as the incorrect placement of a geographic boundary or an error in calculation.

McDonald asked what jurisdictions would do if they found that other data they had were at odds with the census data produced using a differential privacy algorithm. Cook said that, speaking as a local government person, he had not given the matter a lot of thought. Perhaps a government could adjust 2020 Census data that were clearly off base, say for children, by looking at birth statistics and school enrollment data, but still, the jurisdiction would need the resources to do that. Cook said he was fortunate to work for a city of about 100,000 that has about 50 planners, in stark contrast to the city next door that has about 30,000 people and only three or four planners. Cook thought that the neighboring city's situation was much more typical, and that many jurisdictions had even less capability. Most likely, planners would just accept the data and use them without understanding the shortcomings, and would be hard pressed to explain why things had shifted so much from 2010 to 2020.

Eddie Hunsinger (California Department of Finance) wanted to raise one very discouraging possibility, which was that local jurisdictions would keep their models based on 2010 data and go from there, viewing the 2020 data as unreliable. Hunsinger thought that was a scary scenario. Perhaps private sector vendors could step up, but he was not excited about the prospect of dealing with private sector vendors in place of the U.S. decennial census. Wiley emphasized that, in many instances, administrative data for rural areas (e.g., loan data) had a lot of limitations such as a lack of universal reporting requirements and data production delays. There would be problems, then, in trying to use other data sources.


The Committee on National Statistics of the National Academies of Sciences, Engineering, and Medicine convened a 2-day public workshop from December 11-12, 2019, to discuss the suite of data products the Census Bureau will generate from the 2020 Census. The workshop featured presentations by users of decennial census data products to help the Census Bureau better understand the uses of the data products and the importance of these uses and help inform the Census Bureau's decisions on the final specification of 2020 data products. This publication summarizes the presentation and discussion of the workshop.
