Page 261 Cite

Suggested Citation:"10 Measurement of Race and Ethnicity." National Academies of Sciences, Engineering, and Medicine. 2023. Assessing the 2020 Census: Final Report. Washington, DC: The National Academies Press. doi: 10.17226/27150.

– 10 –

Measurement of Race and Ethnicity

The quality of the race and ethnicity data collected in the 2020 Census is of central concern—these data are among the most important a U.S. census collects, and the census historically has not counted all racial and ethnic groups equally well. In this chapter, we provide a brief history of race and ethnicity measurement in the decennial census, assess the quality of the race and ethnicity data for 2020, and make recommendations for 2030. To provide a comprehensive picture of race and ethnicity measurement in the 2020 Census, the chapter excerpts findings from other chapters that address specific quality measures. Quality attributes covered in this chapter include:

Effects of changes in the race question format, data capture, and coding conventions for 2020 on race reporting—compared with 2010, these changes appeared to contribute to a substantial increase in the multiracial population and a decrease in the White Alone group;
Coverage errors in the census for major race and ethnic population groups (see Chapter 4 for more detail);
Rates of imputations for missing and inconsistent responses to the race and ethnicity questions; and
Effects of the use of differential privacy-based algorithms to protect the confidentiality of 2020 Census responses on the timeliness and accuracy of race and ethnicity data (see Chapter 11 for more detail).

It is important to note that “race” and “ethnicity” are social-political constructs, often associated with national origins. The concepts are also fluid, depending in part on the format of the survey or census questions¹ and on who the respondent is (i.e., the person themself, another household member, or a proxy).

10.1 OVERVIEW OF RACE AND ETHNICITY MEASUREMENT

Measuring race has been a core function of the constitutionally mandated U.S. decennial census since 1790 (that census distinguished free White persons, all other free persons, and slaves). According to the U.S. Census Bureau’s web site:²

Information on race is required for many Federal programs and is critical in making policy decisions, particularly for civil rights. States use these data to meet legislative redistricting principles. Race data also are used to promote equal employment opportunities and to assess racial disparities in health and environmental risks.

The 1970 Census was the first census to include a specific question on ethnicity, or what was often termed Spanish origin, in its 5-percent long-form sample.³ The 1980 Census and all subsequent censuses asked the ethnicity question of everyone.

10.1.1 U.S. Office of Management and Budget Standards and Census Race and Ethnicity Questions

The 1977 and 1997 OMB Standards

In 1977, the U.S. Office of Management and Budget (OMB; known as the Bureau of the Budget until 1970) issued Statistical Policy Directive No. 15, which specified standard race and ethnicity categories for use across the federal government. The legal authority for the directive stems from the 1942 Federal Reports Act, which mandated a “forms clearance process” for all forms used by the federal government to standardize classifications, avoid duplication, and lessen respondent burden. The Paperwork Reduction Act of 1980 and its reauthorizations in 1986 and 1995 updated and expanded the authority of OMB, specifically through the office of a chief statistician, to coordinate the federal statistical system and develop and oversee the implementation of standards and guidance (see National Academies of Sciences, Engineering, and Medicine, 2021:App. A).⁴

___________________

¹ A prime example of format likely influencing responses on race and ethnicity is the 2000 Census question on Hispanic origin, which, unlike the questions used in 1990, 2010, and 2020, did not list specific examples in the “Other Spanish/Hispanic/Latino” category to prompt people when providing write-in answers. (The 2020 version provided six examples, such as Salvadoran and Dominican.) Responding to user concerns that the lack of examples led respondents to provide generic responses, such as “Spanish,” thereby undercounting specific groups, Census Bureau researchers conducted an analysis that supported this hypothesis (Cresce and Ramirez, 2003).

² See “About the Topic of Race,” https://www.census.gov/topics/population/race/about.html.

³ Prior censuses used different ways of estimating ethnicity, such as estimates of people of Puerto Rican origin in states and metropolitan areas with sufficiently large numbers of such people, and people with Spanish surnames in the Southwest.

Table 10.1 U.S. Office of Management and Budget Standard Race and Ethnicity Categories in Statistical Policy Directive No. 15

1977—Separate Questions Preferred	1997—Separate Questions Preferred
Race Categories:	Race Categories:
White	White
Black	Black
American Indian or Alaska Native (AIAN)	AIAN
Asian or Pacific Islander	Asian
	Native Hawaiian or Other Pacific Islander (NHOPI)
Ethnicity Categories:	Ethnicity Categories:
Hispanic	Hispanic/Latino
Not Hispanic	Not Hispanic/Latino

SOURCE: 1997, https://www.bls.gov/bls/statistical-policy-directive-15.pdf. 1977, https://obamawhitehouse.archives.gov/omb/fedreg_race-ethnicity, Appendix.

OMB issued an update of Directive No. 15 in 1997 that, for the first time, stipulated that respondents should have the option to check more than one race category. The directive permits agencies to include additional race categories as long as they can be aggregated to the principal categories. The Census Bureau has traditionally collected expanded detail for race and ethnicity beyond that described in the OMB standard. “Some Other Race” is not a listed category in Directive No. 15, but Congress has required the Census Bureau to include it in the decennial census and the American Community Survey.⁵ Table 10.1 provides the OMB categories as of 1997 along with the 1977 categories for comparison.

Census Race and Ethnicity Questions, 1970–2020

In addition to the major change in the census race question in 2000 to a “check more than one” format, each census has seen minor changes in format, wording, and other features. Table 10.2 provides the categories for race (10.2(a)) and ethnicity (10.2(b)), respectively, in the 1970–2020 Censuses.

___________________

⁴ The chief statistician heads the Statistical and Science Policy Office in the Office of Information and Regulatory Affairs, established by the 1980 Paperwork Reduction Act, in OMB.

⁵ In 2009, Congress mandated in appropriations language that “none of the funds provided in this or any other Act for any fiscal year may be used for the collection of census data on race identification that does not include ‘some other race’ as a category” (123 Stat. 3115), in response to a proposal to delete the category from the 2010 Census questionnaire. The statutory notes

Table 10.2 Response Categories to Race and Ethnicity Questions, 1970–2020 Censuses

(a) Race Question
1970	1980	1990	2000 (mark one or more races)	2010 (mark one or more races)	2020 (mark one or more races and print origins)
White	White	White	White	White	White (print origins)
Negro or Black	Black or Negro	Black or Negro	Black, African American or Negro	Black, African American or Negro	Black or African American (print origins)
Indian (American) (print tribe)	Indian (American) (print tribe) Eskimo Aleut	Indian (American) (print enrolled or principal tribe) Eskimo Aleut	American Indian or Alaska Native (print enrolled or principal tribe)	American Indian or Alaska Native (print enrolled or principal tribe)	American Indian or Alaska Native (print enrolled or principal tribes)
Chinese Japanese Filipino Korean	Chinese Japanese Filipino Korean Vietnamese Asian Indian	Chinese Japanese Filipino Korean Vietnamese Asian Indian	Chinese Japanese Filipino Korean Vietnamese Asian Indian Other Asian (print race)	Chinese Japanese Filipino Korean Vietnamese Asian Indian Other Asian (print race)	Chinese Japanese Filipino Korean Vietnamese Asian Indian Other Asian (print origins)
Hawaiian	Hawaiian Guamanian Samoan	Hawaiian Guamanian Samoan Other Asian or Pacific Islander (print race)	Native Hawaiian Guamanian or Chamorro Samoan Other Pacific Islander (print race)	Native Hawaiian Guamanian or Chamorro Samoan Other Pacific Islander (print race)	Native Hawaiian Chamorro Samoan Other Pacific Islander (print origins)
Other (print race)	Other (specify)	Other race (print race)	Some other race (print race)	Some other race (print race)	Some other race (print race/origin)

SOURCE: Inspection of questionnaires, various online sites.

(b) Ethnicity Question
1970 (5% sample long form only)	1980 (Hispanic question after race)	1990 (Hispanic question after race)	2000 (Hispanic question first)	2010 (Hispanic question first)	2020 (Hispanic question first)
No, none of these (last category shown)	No, not Spanish/Hispanic	No, not Spanish/Hispanic	No, not Spanish/Hispanic/Latino	Not Hispanic, Latino or Spanish origin	Not Hispanic, Latino or Spanish origin
Mexican	Mexican, Mexican Am., or Chicano	Mexican, Mexican Am., or Chicano	Mexican, Mexican Am., or Chicano	Mexican, Mexican Am., or Chicano	Mexican, Mexican Am., or Chicano
Puerto Rican	Puerto Rican	Puerto Rican	Puerto Rican	Puerto Rican	Puerto Rican
Cuban	Cuban	Cuban	Cuban	Cuban	Cuban
Other Spanish	Other Spanish/Hispanic	Other Spanish/Hispanic (print one group)	Other Spanish/Hispanic/Latino (print group)	Other Hispanic, Latino or Spanish origin (print origin)	Other Hispanic, Latino or Spanish origin (print origin)
Central or South American	—	—	—	—	—

SOURCE: Inspection of questionnaires, various online sites.

Research conducted as early as 1997 and throughout the next two decades, culminating in the 2015 National Content Test, demonstrated that a single combined race/ethnicity, check-more-than-one question would be preferable to the two-question format in several respects. A single question would be more intelligible to respondents, reduce the extent of missing data, and practically eliminate the Some Other Race (SOR) category, which about 40% of Hispanic people have typically checked as their race (Mathews et al., 2017). The 2015 National Content Test also considered the effect of adding a separate Middle Eastern or North African (MENA) race category, finding that people reporting relevant national or ethnic backgrounds chose the MENA category in a reinterview when it was available, and that reporting of MENA responses in the Some Other Race category decreased from 12% without a MENA category to 3% with a MENA category (Mathews et al., 2017).

In September 2016, OMB issued a Federal Register notice (81 FR 67398) requesting comments on four aspects of Statistical Policy Directive No. 15: (1) moving to a single combined check-more-than-one race and ethnicity question with Hispanic as a category; (2) the addition of a separate MENA race category, which had been considered for the 1997 revision but not approved pending further research; (3) clarifying that agencies are not limited to the minimum set of race and ethnicity categories; and (4) clarifying race and ethnicity terminology to keep up with changing societal usage. The notice identified a range of uses of federal data on race and ethnicity and the need to consider the effects of proposed changes to the categories and question format on data quality and utility (see Box 10.1).

The Census Bureau expected that OMB would approve the move to a single question and the addition of a separate MENA race category. OMB never gave its approval, however, and the Census Bureau announced in December 2017 that it would continue with the two-question format without a MENA category.⁶

Format and Processing Changes to the 2020 Race and Ethnicity Questions

The Census Bureau decided to make significant format and processing changes to the 2020 race and ethnicity questions. These changes included:

Adding space to write in origins under the White and Black checkboxes, as was already done for several other checkbox categories in the race question;⁷
Expanding data capture of write-in responses from only 30 characters in 2010 to 200 characters in 2020;
Expanding the coding of write-ins from only two detailed categories in 2010 to up to six detailed categories in 2020;
Expanding the list of codes for 2020 based on input from experts on specific race and ethnicity categories;
Combining the race and ethnicity code lists; and
Coding from left to right, instead of prioritizing Hispanic responses for the ethnicity question and race responses for the race question as was done in 2010.

___________________

accompanying 13 U.S.C. § 5, on the nature of Census Bureau questionnaires, indicate that similar language was inserted into appropriations acts earlier in 2009, as well as in 2007, 2005, and 2004.

⁶ MENA responses are currently treated as White in accordance with the OMB race category definitions.

⁷ See Marks and Rios-Vargas (2021) and U.S. Census Bureau (2021d:Appendix B, Definitions of Subject Characteristics).

Box 10.1 Request for Comment on Proposed Race and Ethnicity Measurement Standards

Federal Uses of Race and Ethnicity Data: When providing comment regarding proposed areas for possible revision, it may be helpful to keep in mind how the standard is used. The standard not only guides information collected and presented from the decennial census and numerous other statistical collections, but also is used by Federal agencies for civil rights enforcement and for program administrative reporting. These include, among others:

Enforcing the requirements of the Voting Rights Act;
reviewing State congressional redistricting plans;
collecting and presenting population and population characteristics data, labor force data, education data, and vital and health statistics;
establishing and evaluating Federal affirmative action plans and evaluating affirmative action and discrimination in employment in the private sector;
monitoring the access of minorities to home mortgage loans under the Home Mortgage Disclosure Act;
enforcing the Equal Credit Opportunity Act;
monitoring and enforcing desegregation plans in the public schools;
assisting minority businesses under the minority business development programs; and
monitoring and enforcing the Fair Housing Act.

To most effectively promote information quality, the intended uses of data on race and ethnicity should be considered when changes to the standards are contemplated. Additionally, the possible effects of any proposed changes on the quality and utility of the resulting data must be considered.

SOURCE: Excerpted from “Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity,” 81 FR 67398, September 30, 2016.
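The character and code limits in the list of 2020 format and processing changes can be illustrated with a toy capture function. This is a minimal sketch under stated assumptions: the write-in is a comma-separated string, `known_terms` is a hypothetical set of detailed origins, and entries are coded left to right in both settings. It therefore ignores the 2010 prioritization rules and the expanded 2020 code list, and isolates only the effect of 30 characters and two codes versus 200 characters and six codes.

```python
def capture_codes(write_in: str, known_terms: set[str],
                  max_chars: int, max_codes: int) -> list[str]:
    """Keep up to max_codes recognized entries from the first max_chars characters."""
    captured = write_in[:max_chars]  # characters beyond the capture limit are lost
    codes: list[str] = []
    for entry in captured.split(","):  # code entries left to right
        term = entry.strip()
        if term.lower() in known_terms:
            codes.append(term)
            if len(codes) == max_codes:
                break
    return codes


# Hypothetical set of detailed origins, for illustration only.
KNOWN = {"german", "irish", "italian", "polish", "lebanese"}
text = "German, Irish, Italian, Polish, Lebanese"

print(capture_codes(text, KNOWN, max_chars=30, max_codes=2))   # 2010-style limits
print(capture_codes(text, KNOWN, max_chars=200, max_codes=6))  # 2020-style limits
```

Under the 2010-style limits only the first two origins survive, while the 2020-style limits keep all five; a tighter character limit can also cut an entry mid-word so that it no longer matches any code.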

Looking specifically at the Hispanic-origin question, for 2020 the Census Bureau decided to code people as Hispanic if they responded “No, not Hispanic” but provided a Hispanic category (e.g., Mexican) in one of the race question write-in spaces. The Census Bureau also coded people as not Hispanic if they checked “Other Hispanic” in the Hispanic-origin question but provided only non-Hispanic write-ins (this edit was also done in 2000 and 2010). Both edits overrode the actual response to the Hispanic-origin question.⁸
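The two Hispanic-origin edits just described amount to a small rule function. The sketch below is a simplified illustration, not the Census Bureau’s processing code: the `Response` record, the `edit_hispanic_origin` function, and the tiny `HISPANIC_TERMS` list are hypothetical, and it assumes the “Other Hispanic” edit examines only the write-ins entered with the Hispanic-origin question.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily abbreviated list of write-in terms treated as
# Hispanic origins; the actual 2020 code list is far larger.
HISPANIC_TERMS = {"mexican", "puerto rican", "cuban", "salvadoran", "dominican"}


@dataclass
class Response:
    hispanic_box: str  # "no", "mexican", "puerto_rican", "cuban", or "other_hispanic"
    hispanic_write_ins: list[str] = field(default_factory=list)
    race_write_ins: list[str] = field(default_factory=list)


def is_hispanic_term(text: str) -> bool:
    return text.strip().lower() in HISPANIC_TERMS


def edit_hispanic_origin(r: Response) -> bool:
    """Return the edited Hispanic-origin status (True means coded Hispanic)."""
    if r.hispanic_box == "no":
        # 2020 edit: "No, not Hispanic" is overridden by a Hispanic entry
        # in any race write-in space.
        return any(is_hispanic_term(w) for w in r.race_write_ins)
    if r.hispanic_box == "other_hispanic" and r.hispanic_write_ins:
        # Edit also made in 2000 and 2010: "Other Hispanic" with only
        # non-Hispanic write-ins is recoded as not Hispanic.
        return any(is_hispanic_term(w) for w in r.hispanic_write_ins)
    # Any other Hispanic checkbox response is left as reported.
    return True


print(edit_hispanic_origin(Response("no", race_write_ins=["Mexican"])))
```

In the usage line, a person who answered “No, not Hispanic” but wrote Mexican in a race write-in space is coded Hispanic, which is exactly the override the text describes.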

Looking at the race question, the Census Bureau assigned people to the specific race(s) corresponding to each checkbox they marked, even if write-ins did not support the categorization. The Census Bureau might also assign other specific races depending on write-ins, including assigning a race from a write-in when the person did not check the corresponding box. For example, people who checked White and wrote in an American Indian tribe but did not check the American Indian or Alaska Native (AIAN) box were coded White and AIAN, while people who checked White and wrote in a Hispanic origin (e.g., Mexican) but did not check the SOR box were coded White and SOR. Write-ins were coded the same regardless of where they were written; for example, a write-in of Jamaican was coded as Black no matter where it was written (e.g., under White or SOR). Write-ins under the SOR checkbox were used where possible to assign one or more specific races and delete the SOR categorization.
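The race-coding conventions in the preceding paragraph can be sketched the same way. This is a minimal illustration under stated assumptions: `WRITE_IN_RACE` is a hypothetical stand-in for the Bureau’s much larger code list, each write-in is tagged with the checkbox space it was written under, and SOR is dropped only when the person checked SOR, at least one SOR-space write-in codes to a specific race, and no write-in anywhere codes to SOR.

```python
# Checkbox keys mapped to race codes.
CHECKBOX_RACE = {"white": "White", "black": "Black", "aian": "AIAN",
                 "asian": "Asian", "nhopi": "NHOPI", "sor": "SOR"}

# Hypothetical, tiny stand-in for the write-in code list.
WRITE_IN_RACE = {
    "german": "White",
    "jamaican": "Black",
    "navajo": "AIAN",   # illustrative tribe entry
    "korean": "Asian",
    "mexican": "SOR",   # Hispanic origins written in a race space code to SOR
}

ORDER = ["White", "Black", "AIAN", "Asian", "NHOPI", "SOR"]


def code_race(checked: set[str], write_ins: list[tuple[str, str]]) -> list[str]:
    """Code race from checked boxes and (space, text) write-ins."""
    # Every marked box yields its race, even if the write-ins do not support it.
    races = {CHECKBOX_RACE[box] for box in checked}
    sor_from_write_in = False
    sor_space_specific = False
    for space, text in write_ins:
        code = WRITE_IN_RACE.get(text.strip().lower())
        if code is None:
            continue  # term not in this toy list
        races.add(code)  # coded the same whichever space it was written in
        if code == "SOR":
            sor_from_write_in = True
        elif space == "sor":
            sor_space_specific = True
    # A specific race written under the SOR box replaces the SOR checkbox code.
    if "sor" in checked and sor_space_specific and not sor_from_write_in:
        races.discard("SOR")
    return [r for r in ORDER if r in races]


print(code_race({"white"}, [("white", "Navajo")]))
print(code_race({"white"}, [("white", "Mexican")]))
print(code_race({"sor"}, [("sor", "Jamaican")]))
```

The three usage lines mirror the examples in the text: a tribe written under White yields White and AIAN, a Hispanic origin under White yields White and SOR, and Jamaican under SOR replaces SOR with Black.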

A New OMB Standard?

In January 2023, OMB issued a Federal Register notice calling for public comments on proposed updates to Statistical Policy Directive No. 15, with the intent of finalizing a new standard by summer 2024 (88 FR 5375, January 27, 2023).⁹ The core proposals, suggested by an interagency working group, are to implement a single combined question for race and ethnicity, adding “Hispanic or Latino” and MENA as new main categories; the proposal also invites comment on the feasibility of requiring detailed race and ethnicity categories by default (i.e., check-box examples and write-in responses) rather than the minimum, main category headings. As of August 2023, the notice had drawn more than 20,000 comments.

10.1.2 Race and Ethnicity Distributions from 1970–2020—A More Diverse Nation

Figure 10.1 graphs the distributions of race and ethnicity from 1970 to 2020. The bottom six groups (from White Alone to Two or More Races) come from responses to the race question and add up to 100%. The top group is the Hispanic population, from responses to the separate ethnicity question; Hispanic people may be of any race. Table 10.3 tabulates the percentage distribution of race and ethnicity from 1970 to 2020 for the non-Hispanic population (top half of table) and the Hispanic population (bottom half of table). Focusing on changes from 2010 to 2020, the White Alone category decreased somewhat among non-Hispanic people (from 76.2% to 71.2%) and markedly among Hispanic people (from 53.0% to 20.3%); these decreases are denoted in red. Conversely, the Two or More Races category increased somewhat among non-Hispanic people (from 2.3% to 5.0%) and substantially among Hispanic people (from 6.0% to 32.7%); these increases are denoted in green.

___________________

⁸ This paragraph and the next rely on notes from Jeffrey Passel, Pew Hispanic Research Center, based on his conversations with Census Bureau staff.

⁹ See also the Census Bureau’s notice of the public comment period at https://www2.census.gov/about/ombraceethnicityitwg/omb-proposals-english.pdf. The Federal Register notice requested comments by April 27, 2023.

**Figure 10.1** Percentage distribution of the population by race and ethnicity, 1970–2020 Censuses.
NOTES: AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander.

SOURCES: 1970: Gibson and Jung (2002:Table 1) and U.S. Census Bureau (1973:Table 2). 1980: Tabulation at https://censusscope.org/us/chart_race.html. 1990: U.S. Census Bureau (2001:Table 4). 2000: Grieco and Cassidy (2001:Table 10). 2010, 2020: U.S. Census Bureau (2021g:Table 4).

By age, people under age 18 were more racially and ethnically diverse than people ages 18 and over in both 2010 and 2020, and racial and ethnic diversity increased between 2010 and 2020 for both age groups (see Figure 10.2). Specifically, Figure 10.2(a) shows the increase in the Hispanic population over time and its larger share among younger people than among older people, while Figure 10.2(b) similarly shows the decrease in the White Alone population and the increase in the Two or More Races population. Additional age detail from the 2020 Census to examine racial and ethnic diversity among generations was not available until the release of the 2020 Demographic and Housing Characteristics file in May 2023 (see Chapter 11 and Section 10.5).

Table 10.3 Percentage Non-Hispanic/Hispanic of Total Population and Percentage Distribution of the Non-Hispanic/Hispanic Population by Race, 1970–2020 Censuses

Census/Category	1970	1980	1990	2000	2010	2020
Non-Hispanic Population (1,000s)	194,229	211,933	226,356	246,116	258,268	269,369
Percent Non-Hispanic of Total (%)	95.5	93.6	91.0	87.5	83.7	81.3
Percent of Non-Hispanic:	100.0	100.0	100.0	100.0	100.0	100.0
White Alone	—	85.1	83.1	79.1	76.2	71.2
Black Alone	—	12.5	12.9	13.8	14.6	14.8
AIAN Alone	—	0.7	0.8	0.8	0.9	0.8
Asian/NHOPI Alone	—	1.6	3.1	4.2	5.8	7.5
Some Other Race Alone	—	0.3	0.1	0.2	0.2	0.6
Two or More Races	N.A.	N.A.	N.A.	1.9	2.3	5.0
Hispanic Population (1,000s)	9,073	14,609	22,354	35,306	50,478	62,080
Percent Hispanic of Total (%)	4.5	6.4	9.0	12.5	16.3	18.7
Percent of Hispanic:	100.0	100.0	100.0	100.0	100.0	100.0
White Alone	93.3	—	51.7	47.9	53.0	20.3
Black Alone	5.0	—	3.4	2.0	2.5	1.9
AIAN Alone	0.3	—	0.7	1.2	1.4	2.4
Asian/NHOPI Alone	0.4	—	1.4	0.4	0.5	0.5
Some Other Race Alone	1.0	—	42.7	42.2	36.7	42.2
Two or More Races	N.A.	N.A.	N.A.	6.3	6.0	32.7

NOTES: AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander; N.A., not available; —, could not locate or not published. Major decreases from 2010 to 2020 are rendered in red, major increases from 2010 to 2020 are rendered in green.

SOURCES: 1970: Gibson and Jung (2002:Table 1) and U.S. Census Bureau (1973:Table 2). 1980: Tabulation at https://censusscope.org/us/chart_race.html. 1990: U.S. Census Bureau (2001:Table 4). 2000: Grieco and Cassidy (2001:Table 10). 2010, 2020: U.S. Census Bureau (2021g:Table 4).

**Figure 10.2** Percentage Hispanic and percentage distribution by race for people over and under age 18, 2010 and 2020 Censuses.
NOTES: AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander.

SOURCES: 2020 and 2010 Redistricting Files, Tables P3, P4, accessed from data.census.gov.

10.2 NEW WRITE-INS, DATA CAPTURE, AND CODING FOR 2020

10.2.1 Effects on Race and Ethnicity Estimates from the Census and the American Community Survey

The provision for new write-ins and expanded data capture and coding undoubtedly contributed to the dramatic changes in the distribution of race in the 2020 Census compared with 2010—see Table 10.4, which also compares the 2020 American Community Survey (ACS) with the 2019 ACS.¹⁰ There was a substantial drop in the White Alone population (shown in red) and a substantial increase in the Two or More Races population (shown in green) in both the 2020 Census and 2020 ACS, mostly but not entirely concentrated among people who answered Hispanic to the ethnicity question (see Table 10.3). The coding change that appears to account for this dramatic swing among Hispanic people was a combination of allowing write-ins under the White checkbox and coding any Hispanic entry in the White write-in space (or any other race category write-in space) as SOR, thereby recategorizing Hispanic people who only checked White (and who would have been White Alone in 2010 and previous censuses) as Two or More Races (White and SOR).
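To make the mechanism concrete, the sketch below codes one hypothetical respondent, who checks only White and writes “Mexican” on the line beneath it, under simplified 2010 and 2020 rules. Both functions and the `HISPANIC_ORIGINS` set are illustrative assumptions rather than Census Bureau code; the 2010 function simply reflects that the 2010 form had no write-in line under White, so only the checkbox was captured.

```python
from typing import Optional

# Toy subset of Hispanic origins; the actual 2020 code list is far larger.
HISPANIC_ORIGINS = {"mexican", "puerto rican", "cuban", "salvadoran", "dominican"}


def race_2010(checked: list[str], white_write_in: Optional[str]) -> list[str]:
    # The 2010 form had no write-in line under White, so anything the person
    # might have written there was never captured: only checkboxes count.
    return list(checked)


def race_2020(checked: list[str], white_write_in: Optional[str]) -> list[str]:
    races = list(checked)
    # 2020: a Hispanic origin written in the White space is coded as SOR.
    if white_write_in and white_write_in.strip().lower() in HISPANIC_ORIGINS:
        if "SOR" not in races:
            races.append("SOR")
    return races


def tabulation_category(races: list[str]) -> str:
    if not races:
        raise ValueError("no race coded")
    return f"{races[0]} Alone" if len(races) == 1 else "Two or More Races"


for scheme in (race_2010, race_2020):
    print(scheme.__name__, tabulation_category(scheme(["White"], "Mexican")))
```

The same answers land in White Alone under the 2010 rules and in Two or More Races (White and SOR) under the 2020 rules, which is the recategorization described above.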

In addition to the substantial increase in Two or More Races among Hispanic people in the 2020 Census (20.3 million up from 3.0 million in 2010), there was an increase in Two or More Races among non-Hispanic people (13.5 million up from 6.0 million in 2010). The largest groups contributing to this increase of 7.5 million were:

White and AIAN non-Hispanic people—increase of 2.3 million;
White and SOR non-Hispanic people—increase of 2.1 million;
White and Black non-Hispanic people—increase of 1.2 million; and
White and Asian non-Hispanic people—increase of 1 million.
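The counts in the preceding paragraphs follow from Table 10.3 by multiplying each population (in thousands) by its Two or More Races percentage. The short calculation below reproduces them; the values are transcribed from Table 10.3 and the helper name is illustrative. Because the table’s percentages are rounded, the reconstructed 2010 non-Hispanic count comes out near 5.9 million rather than the 6.0 million cited in the text, but the 2010 to 2020 non-Hispanic increase still rounds to 7.5 million, and the four listed groups (6.6 million together) account for roughly 88% of it.

```python
# Values transcribed from Table 10.3: population in thousands and the
# percentage of that population reporting Two or More Races.
TABLE_10_3 = {
    ("Hispanic", 2010): (50_478, 6.0),
    ("Hispanic", 2020): (62_080, 32.7),
    ("Non-Hispanic", 2010): (258_268, 2.3),
    ("Non-Hispanic", 2020): (269_369, 5.0),
}


def two_or_more_millions(group: str, year: int) -> float:
    """Approximate count (millions) of people reporting Two or More Races."""
    pop_thousands, pct = TABLE_10_3[(group, year)]
    return pop_thousands * pct / 100 / 1_000


# Non-Hispanic increase from 2010 to 2020.
nh_increase = (two_or_more_millions("Non-Hispanic", 2020)
               - two_or_more_millions("Non-Hispanic", 2010))

# The four largest contributing groups listed in the text (millions).
contributors = {"White and AIAN": 2.3, "White and SOR": 2.1,
                "White and Black": 1.2, "White and Asian": 1.0}
listed_total = sum(contributors.values())

for group in ("Hispanic", "Non-Hispanic"):
    for year in (2010, 2020):
        print(group, year, round(two_or_more_millions(group, year), 1))
print("Non-Hispanic increase:", round(nh_increase, 1))
print("Four listed groups:", round(listed_total, 1),
      f"({listed_total / nh_increase:.0%} of the increase)")
```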

The increase in non-Hispanic White and AIAN people could be due to more people checking both White and AIAN. It could also be due to people who checked only White (and no other race including AIAN) and provided an AIAN origin in the White write-in space. Whether these people intended to provide two races or to provide a more detailed origin has not been analyzed. The increase in non-Hispanic White and SOR people does not have a ready explanation, although there are non-Hispanic SOR codes in the coding scheme, including Brazilian and Caribbean.¹¹

___________________

¹⁰ The 2020 ACS used the 2020 Census race and ethnicity format and coding, while the 2019 ACS used the 2010 Census format and coding.

¹¹ Brazilians, as noted, are supposed to be coded as non-Hispanic under the OMB 1997 standard, regardless of their responses to the Hispanic-origin question. Through an error in the 2020 ACS, Brazilians who said they were Hispanic were not recoded as non-Hispanic. This error also occurred for people who checked Hispanic and wrote in Belize or the Philippines. Consequently, 2020 ACS data show an out-of-proportion increase in Hispanics among these groups—most noticeably, fully

Table 10.4 Percentage Distribution of the Population by Race and Ethnicity, 2010–2020 Censuses and 2019–2020 1-Year American Community Survey Estimates

Census/ACS Year/Percent in Category	2010 Census	2020 Census	2019 ACS	2020 ACS
Total Population (1,000s)	308,746	331,449	328,240	329,484
Percent of Total by Race (%)
White Alone	72.4	61.6	72.0	62.7
Black Alone	12.6	12.4	12.8	12.1
AIAN Alone	0.9	1.1	0.9	1.0
Asian Alone	4.8	6.0	5.7	5.7
NHOPI Alone	0.2	0.2	0.2	0.2
Some Other Race Alone	6.2	8.4	5.0	6.8
Two or More Races	2.9	10.2	3.4	11.5
Percent of Total by Hispanic Origin (%)
Hispanic	16.3	18.7	18.4	18.6
Non-Hispanic	83.7	81.3	81.6	81.4

NOTES: ACS, American Community Survey; AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander. Major decreases from 2010/2019 to 2020 are rendered in red, major increases from 2010/2019 to 2020 are rendered in green.

SOURCE: 2010, 2020 Censuses: U.S. Census Bureau (2021g:Table 1 (numbers), Table 2 (percentages)). 2019 ACS: Table B03002 (Hispanic or Latino Origin by Race), accessed via data.census.gov. 2020 ACS: 2020 ACS 1-Year Experimental Data Tables XK200201 (Race) and XK2003 (Hispanic origin) at https://www.census.gov/programs-surveys/acs/data/experimental-data/1-year.html.
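The similarity of the Census and ACS shifts can be made explicit by computing percentage-point changes from Table 10.4 for both sources side by side. The sketch below transcribes the table’s percentages; the function names are illustrative. It shows, for example, that White Alone fell by 10.8 points between the 2010 and 2020 Censuses and by 9.3 points between the 2019 and 2020 ACS, while Two or More Races rose by 7.3 and 8.1 points, respectively. Each column sums to 100% within rounding.

```python
# Table 10.4 percentages: (2010 Census, 2020 Census, 2019 ACS, 2020 ACS).
TABLE_10_4 = {
    "White Alone": (72.4, 61.6, 72.0, 62.7),
    "Black Alone": (12.6, 12.4, 12.8, 12.1),
    "AIAN Alone": (0.9, 1.1, 0.9, 1.0),
    "Asian Alone": (4.8, 6.0, 5.7, 5.7),
    "NHOPI Alone": (0.2, 0.2, 0.2, 0.2),
    "Some Other Race Alone": (6.2, 8.4, 5.0, 6.8),
    "Two or More Races": (2.9, 10.2, 3.4, 11.5),
}


def point_changes(table):
    """Percentage-point change: (Census 2010 to 2020, ACS 2019 to 2020)."""
    return {
        cat: (round(c20 - c10, 1), round(a20 - a19, 1))
        for cat, (c10, c20, a19, a20) in table.items()
    }


def column_totals(table):
    """Sum of each column, as a rounding check."""
    return [round(sum(col), 1) for col in zip(*table.values())]


for cat, (census, acs) in point_changes(TABLE_10_4).items():
    print(f"{cat:<24}{census:+6.1f}{acs:+6.1f}")
print(column_totals(TABLE_10_4))
```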

10.2.2 Possible Causes

The Census Bureau has not produced tabulations for the same people (e.g., a sample of the 2020 records) using both the 2010 and 2020 race question formats and data-capture and coding schemes, nor has it examined whether the race identification of people included in both censuses remained the same or changed. Consequently, it is not possible to determine how much of the increase in the Two or More Races category shown in Table 10.4 stems from the new schemes and how much stems from demographic change (e.g., more multiracial children) or changes in self-identification. The dramatic changes in the race distributions between the 2019 and 2020 ACS (similar in magnitude to those between the 2010 and 2020 Censuses) suggest, however, that the new format (write-in spaces for the White and Black checkboxes) and the expanded data capture and coding played a major role.
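Both analyses that the Bureau has not done reduce to the same structure: a cross-tabulation of paired categories for the same people, either one record coded under two schemes or one person linked across two censuses. The sketch below builds such a transition table; the functions are hypothetical and the four records are invented toy values, not census data.

```python
from collections import Counter

# A pair is (category under scheme or census A, category under B) for one person.
Pair = tuple[str, str]


def transition_table(pairs: list[Pair]) -> Counter:
    """Count how often each (A, B) combination occurs."""
    return Counter(pairs)


def share_changed(pairs: list[Pair]) -> float:
    """Fraction of people whose category differs between A and B."""
    if not pairs:
        raise ValueError("no pairs supplied")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def outflows(table: Counter, origin: str) -> dict[str, int]:
    """Destinations under B for people coded `origin` under A."""
    return {b: n for (a, b), n in table.items() if a == origin}


# Invented toy records, for illustration only.
toy = [
    ("White Alone", "White Alone"),
    ("White Alone", "Two or More Races"),
    ("White Alone", "Two or More Races"),
    ("Black Alone", "Black Alone"),
]
table = transition_table(toy)
print(outflows(table, "White Alone"))
print(share_changed(toy))
```

Applied to real paired records, the off-diagonal cells of such a table would separate coding-driven reclassification from genuine change in how people identify.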

___________________

70% of Brazilians were coded as Hispanic in the 2020 ACS compared with about 3–4% in the 2019 and 2021 ACS. This footnote was revised after release of the prepublication version of the report to correct the use of “White” versus “non-Hispanic” and more precisely reflect the findings in Passel and Krogstad (2023).

**Figure 10.3** Percentage distribution of response modes for race and ethnic groups, 2020 Census.
NOTE: SOR, Some Other Race; NHOPI, Native Hawaiian or Other Pacific Islander; AIAN, American Indian or Alaska Native; NRFU, Nonresponse Followup. Includes people who reported their race and ethnicity themselves (or their households reported for them), excluding responses from proxies, imputations, and administrative records. Other Ops primarily includes smaller operations outside of Self-Response and NRFU responses. For the 2020 Census, this category included Update Enumerate, Remote Alaska, Enumeration of Transitory Locations, Coverage Improvement, and Self-Response Quality Assurance operations.

SOURCE: Census Bureau tabulations from 2020 Census Edited File. See Disclosure Review Statement; CBDRB-FY23-0179.

The Census Bureau provided the panel with limited tabulations that suggest areas for further research. Figure 10.3 shows the response modes in 2020 for people who provided their own race and ethnic identification (i.e., their responses were not obtained from a proxy interview in Nonresponse Followup (NRFU), imputed due to a missing response, or obtained from administrative records), by race and ethnic group.¹² The groups are ordered by the percentage who responded via the internet, from a high of 79% (non-Hispanic Asian Alone people) to a low of 49% (non-Hispanic AIAN Alone people). Among Hispanic people, 63% responded via the internet.

An examination of response modes by race and ethnicity illustrates that the propensity to write in responses varied by mode (see Figure 10.4).¹³ Specifically, for every internet respondent who checked one or more major race category boxes but did not provide any write-ins, there were 8.7 write-ins by internet respondents who provided at least one write-in. For every Hispanic internet respondent who checked one or more major category boxes with no write-ins, there were 11.8 write-ins by Hispanic internet respondents who provided at least one write-in. These ratios were lower for other response modes, although in every mode the ratio was higher for Hispanic people than for the total population.¹⁴ Had Hispanic people responded via the internet at the same rate as the total population, the White Alone population would likely have declined even more, and the Two or More Races population increased even more, than actually occurred in 2020.

**Figure 10.4** Ratio of write-in race responses (for people who provided at least one write-in) to checkbox-only responses by response mode, total and Hispanic population, 2020 Census.
NOTES: NRFU, Nonresponse Followup. Includes people who reported their race and ethnicity themselves (or their household reported for them), excluding responses from proxies, imputations, and administrative records. Checkbox-Only indicates that one or more major race categories were checked and no write-ins were provided. See also notes to Figure 10.3.

SOURCE: Census Bureau tabulations from 2020 Census Edited File. See Disclosure Review Statement; CBDRB-FY23-0179.

___________________

¹² For many “self” reporters of race and ethnicity, the actual respondent was a member of the household who filled out the census questionnaire for all household members.

¹³ Overall, there was an explosion of write-ins for the race question in 2020 compared with 2010, from almost 38 million in 2010 to almost 336 million in 2020 (write-ins under the Hispanic-origin question declined from 17 million in 2010 to 15 million in 2020). See Jones et al. (2021:Slide 9).

¹⁴ For some people in large households, some paper questionnaires did not provide space to record race or ethnicity. Some write-in responses were removed through the editing process.

10.3 COVERAGE ERROR

Every census misses people and at the same time duplicates people and makes other erroneous enumerations, the balance of which determines whether the census experienced a net undercount or overcount. More concerning, net undercounts and overcounts differ among population groups, which is why the paramount question about the race and ethnicity results in any census is the extent of differential net undercount. Results from the Census Bureau’s two main methods to estimate coverage errors—a Post-Enumeration Survey (PES) and Demographic Analysis (DA)—were released in March–May 2022. The DA coverage error results by race (for Black people and all others) could not be released, however, because of delays due to implementation of the new 2020 Disclosure Avoidance System (DAS)—see Section 10.5.2.

Chapter 4 presents and discusses PES estimates of coverage errors in 2010 and 2020 by race and ethnicity cross-classified with other characteristics, including age, sex, and housing tenure (owner, renter). Summarizing the findings:

Net overcounts were substantially higher in 2020 compared with 2010 for White people and Asian people, while net undercounts were substantially higher for people in the categories of Black, AIAN on reservations, Native Hawaiian or Other Pacific Islander (NHOPI), SOR, and Hispanic.
The differential coverage rate (the difference in net coverage rates between two groups), which is critical for equitable allocation of legislative seats and federal and state funds, widened between non-Hispanic White Alone people and other groups from 2010 to 2020. The difference from non-Hispanic White Alone people increased from 2.9 to 4.9 percentage points for Black Alone or in Combination people, from 5.7 to 7.3 percentage points for AIAN Alone or in Combination people on Reservations, and from 2.4 to 6.6 percentage points for Hispanic people.
Non-Hispanic White Alone people were almost always overcounted in both censuses, although the percentages are not large except for men and women ages 50 and over in rental housing, who were overcounted in 2020 by double the rate of 2010.
Asian people were predominantly overcounted in both censuses but especially so if they lived in rental housing in 2020.
Black people were generally undercounted in both censuses and particularly so if they lived in rental housing; net undercount rates were somewhat higher for this group in 2020 compared with 2010.
Hispanic people under age 50 were generally undercounted in both censuses, with much higher rates in 2020 compared with 2010 and even higher rates if they lived in rental housing.
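Because the differential coverage rate is a simple difference of two net coverage rates, the widening described above can be checked directly from the reported gaps. The Python sketch below is illustrative only: the gap values are the ones quoted in this section, while the function names, the sign convention (positive for a net overcount, negative for a net undercount), and the example rates of +1.6% and -3.3% are assumptions chosen for illustration.

```python
from __future__ import annotations

# Gaps (percentage points) between non-Hispanic White Alone people and each
# group, as quoted in the text for 2010 and 2020.
GAPS = {
    "Black Alone or in Combination": (2.9, 4.9),
    "AIAN Alone or in Combination on Reservations": (5.7, 7.3),
    "Hispanic": (2.4, 6.6),
}


def differential_coverage(reference_rate: float, group_rate: float) -> float:
    """Difference in net coverage error rates (percent) between two groups.

    Assumed convention: positive rates are net overcounts and negative rates
    are net undercounts, so a positive result means the reference group was
    counted more completely than the comparison group.
    """
    return round(reference_rate - group_rate, 1)


def widening(gaps: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Change in each group's gap from 2010 to 2020, rounded to 0.1 point."""
    return {group: round(g2020 - g2010, 1) for group, (g2010, g2020) in gaps.items()}


if __name__ == "__main__":
    # Illustrative inputs only: a +1.6% net overcount vs. a -3.3% net undercount.
    print(differential_coverage(1.6, -3.3))  # 4.9
    print(widening(GAPS))
```

On the quoted figures, the Hispanic gap widened the most (about 4.2 percentage points), compared with about 2.0 points for Black people and 1.6 points for AIAN people on reservations.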

Table 10.5 Percentage of People with at Least One Item Reported Who Had Ethnicity Imputed and Percentage Who Had Race Imputed, by Response Mode, 2010–2020 Censuses

| Response Mode | 2010 Ethnicity (%) | 2020 Ethnicity (%) | 2010 Race (%) | 2020 Race (%) |
|---|---|---|---|---|
| Total Population | 4.5 | 8.7 | 4.1 | 9.2 |
| Self-Response | 4.8 | 2.4 | 4.2 | 2.8 |
| Enumerator Return | 3.3 | 26.8 | 3.7 | 29.0 |
| People in Group Quarters | 25.7 | 46.2 | 19.0 | 32.9 |

NOTES: Total Population excludes whole-person imputations for which only the existence of a person is known and not any of their characteristics; Self-Response includes internet, paper, and telephone; Enumerator Returns include NRFU and other operations such as Update Enumerate, and (for 2020) exclude administrative records enumerations.

SOURCE: DeJesus and Konya (2023:Table N).

Looking at components of error (duplications, other types of erroneous enumerations such as counting a baby born after Census Day, whole-person imputations, and omissions), the 2020 Census clearly experienced significant problems with the enumeration of Black people, AIAN people on reservations, and Hispanic people. There were also problems with the enumeration of NHOPI people. A striking finding is that the already-high rates of omissions for Hispanic people and those checking SOR Alone or in Combination (mostly Hispanic people) in 2010 (8.6 and 7.7%, respectively) were higher still in 2020 (9.9 and 10.5%, respectively).

10.4 IMPUTATIONS FOR MISSING AND INCONSISTENT DATA

Not every household provides answers for all items in the census questionnaire. Some responses also turn out to be inconsistent with other responses. The Census Bureau imputes values for missing and inconsistent responses based on other information for the same person or household (which could include a previous census or ACS response), termed “assignment,” or by “allocating” a response from a person in a nearby household (or group quarters, GQ).¹⁵ Table 10.5 provides 2010 and 2020 imputation rates for ethnicity and race by type of enumeration—self, enumerator, and GQ.¹⁶

___________________

¹⁵ “Allocation” is typically referred to as “hot deck imputation” in the survey literature (see Citro, 2011).

¹⁶ Whole-person imputations, which require imputing all characteristics and are not treated as item imputations, increased from 2% of the population in 2000 and 2010 to 3.4% of the population in 2020, primarily due to an increase in the percentage of households that only provided the population count (Khubba et al., 2022:Table 3).

Item imputation rates for ethnicity and race among self-responses were lower in 2020 than in 2010, but the rates were higher overall, and markedly so for enumerator returns and for people in GQs. Census tracts with low self-response, which are characterized by low income and low educational attainment among other characteristics, likely had high missing data rates for race and ethnicity and hence high imputation rates.¹⁷ Overall, non-Hispanic Asian, White, and Two or More Races people had high self-response rates while other groups had lower rates and therefore likely higher imputation rates (see Figure 10.3). Looking at enumerator returns, proxy NRFU enumerations had higher imputation rates than other types of enumerator returns in both 2010 and 2020, and markedly higher rates in 2020 compared with 2010.¹⁸

To fill in missing race and ethnicity, the Census Bureau used one of the following assignment methods when possible: accept a Hispanic origin provided in the race question (for people missing ethnicity) or a race category provided in the ethnicity question (for people missing race); accept a person’s 2010 Census or ACS response if such were available; or accept the race and Hispanic origin of another household member. If assignment was not feasible, as a last resort, the Census Bureau used allocation or hot deck imputation (essentially, accept the race and Hispanic origin of a neighbor). Assignments and allocations were used about equally for ethnicity and race imputations. For enumerator returns, which had high rates of imputation in 2020, 14% and 13% of ethnicity responses were assignments and allocations, respectively, as were 15% and 14% of race responses (DeJesus and Konya, 2023:Table E).
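The assignment-then-allocation order described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the Census Bureau's production edit: the `Person` fields are hypothetical, only the race item is shown (ethnicity would follow the same pattern), and the hot deck is the simplest sequential form, borrowing the nearest preceding reported value in a file assumed to be sorted geographically.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Hypothetical person record; field names are illustrative only."""
    pid: int
    household: int
    race: Optional[str] = None                    # race as reported (None if missing)
    race_in_ethnicity_item: Optional[str] = None  # race written in the ethnicity item
    prior_race: Optional[str] = None              # linked 2010 Census or ACS response
    method: str = "reported"


def impute_race(people: list[Person]) -> list[Person]:
    """Fill missing race by assignment where possible, else by hot-deck allocation.

    `people` is assumed to be sorted geographically, so the most recent
    reported race seen in the loop acts as a nearby donor.
    """
    # Reported races by household, for the "another household member" rule.
    household_race: dict[int, str] = {}
    for p in people:
        if p.race is not None:
            household_race.setdefault(p.household, p.race)

    last_donor: Optional[str] = None
    for p in people:
        if p.race is not None:
            last_donor = p.race  # only reported values become hot-deck donors
            continue
        # Assignment, in order: other item, prior response, household member.
        for value in (p.race_in_ethnicity_item, p.prior_race,
                      household_race.get(p.household)):
            if value is not None:
                p.race, p.method = value, "assigned"
                break
        else:
            # Allocation (hot deck): borrow the nearest preceding reported value.
            if last_donor is not None:
                p.race, p.method = last_donor, "allocated"
    return people
```

Restricting donors to reported values is one design choice that keeps imputed values from being copied onward to later records.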

10.5 2020 DISCLOSURE AVOIDANCE SYSTEM (DAS)

The new DAS for 2020, using differential privacy-based algorithms (see Chapter 11), has had and continues to have significant adverse effects on the timeliness of release of 2020 data products. It also has forced the Census Bureau’s DA program to delay release of net coverage results by race, and forced the population estimates program to continue to use race and ethnicity distributions derived from the 2010 Census, in what is termed a “blended base.” Finally, the DAS introduced noise into small population groups and small governmental jurisdictions, which calls into question the usability of the data for important purposes.

___________________

¹⁷ O’Hare and Lee (2021:Figure 2), using census tract characteristics from the 2015–2019 ACS.

¹⁸ Based on comparisons of missing data rates (which are similar to but usually 1–3 percentage points lower than imputation rates). Proxy enumerations had missing data rates of 17% (ethnicity) and 15% (race) in 2010 compared with 38% (ethnicity) and 41% (race) in 2020 (U.S. Census Bureau, 2021c).

Box 10.2 Content of the 2020 Redistricting Data File

Table P1: Race for total population—six major race categories alone, plus two races (15 combinations, such as White-Black, Black-American Indian and Alaska Native, etc.), three races (20 combinations), four races (15 combinations), five races (6 combinations), and six races, for a total of 63 race categories (71 cells including subtotals)
Table P2: Hispanic origin and race for non-Hispanic people—same detail as total population (73 cells)
Table P3: Race for total population ages 18 and older—same detail as total population (71 cells)
Table P4: Hispanic origin and race for non-Hispanic people ages 18 and older—same detail as total population (73 cells)
Table P5: Group quarters population by major type of group quarters (10 cells)
Table H1: Occupancy status for housing units (3 cells)

SOURCE: Technical documentation accompanying 2020 Census State Redistricting Data (Public Law 94-171) Summary File.
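The race counts in Box 10.2 follow from choosing one through six of the six major race categories, which gives 2^6 - 1 = 63 categories in all. The Python sketch below checks this. The subtotal layout used to reach 71 cells (a total, a subtotal for each number of races reported, and a "two or more races" subtotal), and the assumption that P2 adds two Hispanic-origin rows to reach 73, are an illustrative accounting rather than a quotation from the technical documentation.

```python
from itertools import combinations
from math import comb

MAJOR_RACES = (
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some Other Race",
)


def race_categories(races=MAJOR_RACES):
    """Every nonempty combination of the major race categories."""
    return [combo for k in range(1, len(races) + 1)
            for combo in combinations(races, k)]


def p1_cell_count(n_races=len(MAJOR_RACES)):
    """Cells in a P1-style race table under the assumed layout: a total,
    one subtotal per number of races reported (one through six), a
    "two or more races" subtotal, and the detailed categories."""
    detailed = 2 ** n_races - 1
    return 1 + n_races + 1 + detailed


if __name__ == "__main__":
    print([comb(6, k) for k in range(1, 7)])  # [6, 15, 20, 15, 6, 1]
    print(len(race_categories()))              # 63
    print(p1_cell_count())                     # 71
```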

10.5.1 Delays

A major consequence of the new DAS has been a series of unprecedented delays in the release of 2020 Census data products. Subject-matter and geographic detail have also been cut back in the 2020 data products compared with earlier censuses.

Regarding timing, until May 2023 the only dataset released (in addition to the 50 state population counts for reapportionment) was the 2020 P.L. 94-171 Redistricting File, released in August 2021. Box 10.2 lists its content, which essentially comprises race and ethnicity information for the total and voting-age population for all geographic units down to the block level.

In May 2023, the Census Bureau released the Demographic and Housing Characteristics (DHC) File, the successor to Summary File 1 (SF1) from previous censuses and a workhorse for data users. The DHC File contains cross-tabulations of race/ethnicity with sex, detailed age, housing tenure, and household relationship, with more detail available for higher levels of geography. The DHC File excludes some of the tables that were in SF1 or provides less geographic disaggregation for them—either due to confidentiality concerns or because the TopDown Algorithm used for the Redistricting File and the DHC File cannot handle complex household/family-person “joins,” such as the number of households by average size, tenure, and race/ethnicity of the household head (see Chapter 11).

For the 2010 Census, SF1 was delivered on a flow basis, state by state, from June to August 2011 (a national file was released in October 2011), and earlier censuses met the same schedule. The 2020 DHC File was not released until May 25, 2023. This is unprecedentedly late, even allowing an additional 5–6 months because of the delay in census operations due to the COVID-19 pandemic. The 2020 DAS requires that a differential privacy-based algorithm be applied data product by data product, in contrast to previous censuses, for which disclosure avoidance was implemented at the source.

Following release of the DHC File will be three files that use different types of differential privacy-based algorithms:¹⁹

Detailed DHC-A (DDHC-A): Population counts and sex and age statistics for 300 detailed race and ethnic groups and 1,187 American Indian/Alaska Native/Native Hawaiian tribes and villages; released September 21, 2023.
Detailed DHC-B (DDHC-B): Household type and tenure information for detailed race and ethnic groups and AIAN tribes and villages. As of this writing, DDHC-B is planned to be released in September 2024.
Supplemental DHC (S-DHC): Household/family-person joins, including average household size by age and tenure, average family size, household/family type for children under 18 years, and total population in households by tenure. Some tables will be iterated by major race and ethnicity categories (White Alone, Black Alone, etc.). As of this writing, S-DHC is planned to be released in September 2024.

DDHC-A and DDHC-B were collectively intended to be the equivalent of Summary File 2 (SF2) in previous censuses. The 2010 SF2 was released on a flow basis, state by state, from December 2011 through April 2012, with a national file released in May 2012. DDHC-A was not released until September 2023 and DDHC-B is not scheduled for release before September 2024. Moreover, the subject matter detail in both the A and B Files has been reduced compared with the 2010 SF2, and S-DHC will contain a greatly reduced number of person-household join variables compared with SF1 in previous censuses (see Chapter 11).

10.5.2 Effects of Delays on Race and Ethnicity for Demographic Analysis and Population Estimates

Although it would have been normal practice to release DA results by race (Black and all other races) in the same timeframe as the PES results, the April 2022 DA release only covered age and sex.²⁰ To permit race comparisons, 2020 Census race data need to be modified—specifically, people in the SOR Alone category need to have a specific race imputed for them because the DA estimates

___________________

¹⁹ See the “About 2020 Census Data Products” listing at https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/release/about-2020-data-products.html.

²⁰ See U.S. Census Bureau (2022d).

derive from vital records and other sources that historically have not always had an SOR category. The Census Bureau therefore distributes SOR Alone responses to specific race groups in a Modified Race File (people classified as SOR and a specific race are assigned the specific race category and removed from the Two or More Races category). The method used in previous censuses essentially searched for a donor (someone in the household or neighboring area who reported one or more specific races and the same Hispanic origin, yes or no, as the person who checked SOR) and assigned the race(s) of the donor to that person.²¹ Because of the need to develop an appropriate differential privacy-based algorithm, the Census Bureau was unable to create a 2020 Modified Race File on the same schedule as in 2010. The Census Bureau hoped to release the 2020 file in summer 2023 but has not done so as of this writing.
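The Modified Race File logic described above can be sketched as follows. The record layout, the household-then-area search order, the use of a single `area` key to stand in for "neighboring area," and keeping SOR when no donor is found are all assumptions made for illustration; the Census Bureau's actual procedure is described in its own documentation.

```python
from dataclasses import dataclass
from typing import Optional

SOR = "Some Other Race"


@dataclass
class Record:
    """Hypothetical census record; `area` stands in for a neighboring area."""
    household: int
    area: int
    hispanic: bool
    races: frozenset


def find_donor(target: Record, records: list) -> Optional[Record]:
    """Nearest donor who reported only specific races and the same Hispanic
    origin, searching the target's household first and then its area."""
    candidates = [d for d in records
                  if d is not target and d.hispanic == target.hispanic
                  and d.races and SOR not in d.races]
    for same_place in (lambda d: d.household == target.household,
                       lambda d: d.area == target.area):
        for d in candidates:
            if same_place(d):
                return d
    return None


def modify_races(records: list) -> list:
    """Race sets with SOR dropped (SOR plus a race) or replaced (SOR Alone)."""
    modified = []
    for r in records:
        specific = r.races - {SOR}
        if specific:
            # SOR plus a specific race: keep only the specific race(s),
            # which removes single-race cases from Two or More Races.
            modified.append(specific)
            continue
        donor = find_donor(r, records)
        # SOR Alone: take the donor's races; keep SOR if no donor is found.
        modified.append(donor.races if donor else r.races)
    return modified
```

A production method would widen the search rather than leave a record as SOR; the fallback here only keeps the sketch short.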

The lack of a Modified Race File for 2020 has also affected the annual population estimates, mandated in Title 13. These estimates provide vital inputs to federal formula allocations for states and localities, provide controls for the ACS and other household surveys, serve as denominators for vital rates and per-capita time series and as indicators of recent demographic changes, among many other uses.²² The estimates typically derive from the most recent census, with race modified to allocate SOR Alone to specified races, updated with administrative records and survey data (birth and death records, Internal Revenue Service and Social Security data, Medicare records, ACS data, and other sources). Among the regularly produced series are estimates of single years of age by sex, race, and Hispanic origin for the nation, states, and counties.

The delays in the 2020 Census, concerns about data quality, and the delays in developing an appropriate differential privacy-based algorithm for the Modified Race File led to an unprecedented decision at the Census Bureau to continue the estimates going forward from the 2010 Census base, with some modifications (U.S. Census Bureau, 2021f). What the Census Bureau calls the “blended base” uses:

2020 Census population counts for the nation, states, and counties;
Age and sex estimates from the 2020 DA (which does not exhibit the extreme age heaping evident in 2020 Census data and corrects for the large census undercount of young children—see Chapter 4) as a national control; and
Race/ethnicity by age and sex for the nation, states, and counties from the 2010-based population estimates updated through 2020 and forward.
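The blended base combines three inputs at different levels of detail. The exact reconciliation procedure is not described here, so the Python sketch below shows one common way such inputs could be combined, iterative proportional fitting (raking), as an assumption rather than the Census Bureau's documented method. A county-by-age-sex table of 2010-based estimates is raked to 2020 Census county totals (rows) and national DA age-sex totals (columns), with the DA age-sex distribution assumed to be scaled to the census national total so both margins share one grand total. Each fitted cell is then split using 2010-based race/ethnicity shares. The function names are hypothetical.

```python
def rake(table, row_targets, col_targets, iterations=100):
    """Iterative proportional fitting (raking) of a positive 2-D table.

    Rows are scaled to `row_targets`, then columns to `col_targets`, and
    the cycle repeats; both target vectors must have the same total.
    """
    if abs(sum(row_targets) - sum(col_targets)) > 1e-9 * sum(row_targets):
        raise ValueError("row and column targets must have the same total")
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            fitted[i] = [v * target / row_sum for v in fitted[i]]
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= target / col_sum
    return fitted


def split_by_race(cell_total, race_shares):
    """Distribute one county-by-age-sex cell using 2010-based race shares."""
    return {race: cell_total * share for race, share in race_shares.items()}
```

In this sketch, the 2010-based estimates determine the internal pattern of the table, while the 2020 Census and DA determine its margins, which mirrors the division of roles in the list above.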

Table 10.6 shows the 2020 population distribution from: (1) the 2020 population estimates based on updating the 2010 Census and released in 2020; (2) the 2020 population estimates using the blended base and released in 2021;

___________________

²¹ For the 2010 method, see U.S. Census Bureau, Population Division (2012).

²² See https://www.census.gov/programs-surveys/popest.html.

Table 10.6 Percentage Distribution of the Population by Race/Ethnicity, 2020 Population Estimates, 2020 Census

| Race-Ethnicity Group (%) | (1) 2020 Population Estimates (2010 Base) | (2) 2020 Population Estimates (July) (Blended Base) | (3) 2020 Census, as Reported (with Some Other Race) | (4) 2020 Census, with Some Other Race Reallocated |
|---|---|---|---|---|
| Total | 100.0 | 100.0 | 100.0 | 100.0 |
| White Alone | 76.0 | 75.9 | 61.6 | 74.4 |
| Black Alone | 13.5 | 13.5 | 12.4 | 13.3 |
| AIAN Alone | 1.3 | 1.3 | 1.1 | 1.7 |
| Asian Alone | 6.3 | 6.1 | 6.0 | 6.3 |
| NHOPI Alone | <0.1 | 0.3 | 0.2 | 0.3 |
| SOR Alone | N.A. | N.A. | 8.4 | N.A. |
| Two or More Races | 2.9 | 2.9 | 10.2 | 4.0 |
| Hispanic | 18.6 | 18.7 | 18.7 | 18.7 |

NOTES: AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander; SOR, Some Other Race. The Hispanic population can be of any race and is not included in the race percentages, which add up to 100%.

SOURCES: Column (1): Monthly Postcensal Resident Population table for January 1, 2020, to June 1, 2020, at https://www.census.gov/programs-surveys/popest/technical-documentation/research/evaluation-estimates/2020-evaluation-estimates/2010s-national-detail.html. Column (2): Table NC-T2021-SR11H, Annual Estimates of the Resident Population by Sex, Race, and Hispanic Origin for the United States: April 1, 2020 to July 1, 2021, at https://www.census.gov/data/tables/time-series/demo/popest/2020s-national-detail.html. Column (3): See Table 10.3. Column (4): Estimates for race groups developed by Citro (2021).

(3) the 2020 Census as reported (with SOR); and (4) the 2020 Census with SOR Alone reallocated to specific race groups, by applying the percentage allocations of SOR Alone to specific race groups used in 2010 to 2020 SOR counts. The blended base population estimates for 2020 (column 2) closely resemble the original 2020 estimates carried forward from 2010 (column 1). They are dramatically different from the 2020 Census counts (column 3). This outcome is seen to some extent in every census given that the census has an SOR category and the population estimates do not; for 2020, the differences are exaggerated because of the new, expanded coding rules for the 2020 Census, which generated many fewer people in the White Alone category and many more in the Two or More Races category. Using a rough allocation method based on the 2010 Census, the 2020 Census estimates with SOR reallocated (column 4) are in between the 2020 blended base population estimates and the 2020 Census as reported counts (columns 2 and 3, respectively).
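Column (4) applies 2010 percentage allocations of SOR Alone to the 2020 SOR Alone share and, as the much lower Two or More Races figure suggests, also moves people who reported SOR with one specific race into that race. The Python sketch below performs that arithmetic. The 2010 allocation shares are not given in this chapter, so the shares and the size of the Two or More Races adjustment in the test are hypothetical, and the function name is illustrative.

```python
def reallocate_sor(dist, sor_alone_shares, sor_combo_moves=None):
    """Return a new distribution with Some Other Race (SOR) Alone redistributed.

    dist: percentage distribution by race group, including "SOR Alone".
    sor_alone_shares: fraction of SOR Alone assigned to each specific race
        (must sum to 1), e.g., shares observed in an earlier census.
    sor_combo_moves: optional percentage points moved from "Two or More
        Races" to a single race (people who reported SOR plus one race).
    """
    if abs(sum(sor_alone_shares.values()) - 1.0) > 1e-9:
        raise ValueError("SOR Alone shares must sum to 1")
    out = dict(dist)
    sor_alone = out.pop("SOR Alone")
    for race, share in sor_alone_shares.items():
        out[race] = out.get(race, 0.0) + sor_alone * share
    for race, points in (sor_combo_moves or {}).items():
        out["Two or More Races"] -= points
        out[race] = out.get(race, 0.0) + points
    return out
```

Because every point removed from SOR Alone or Two or More Races is added to a specific race, the distribution's total is unchanged, which is why column (4) still sums to 100%.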

10.5.3 Noise in the 2020 Redistricting File

The Census Bureau warned that block data in the Redistricting File would need to be aggregated because of the new method for injecting noise into the data to protect confidentiality (see Chapter 11). The Census Bureau estimated, however, that once a block group's population reached the 450–499 range or higher, the share of its population in the largest race/ethnicity group in the publicly released Redistricting File would be within ±5 percentage points of the share in the actual census counts at least 95% of the time. The same statement would apply to minor civil divisions and places with populations in the 200–249 range or higher. This assessment was based on comparing the 2010 Redistricting File (which used swapping for confidentiality protection) with a “demonstration” 2010 Redistricting File that used the production parameters for the 2020 Census confidentiality-protection method (Wright and Irimata, 2021:1). Nonetheless, the 2020 redistricting data are quite noisy for some types of areas and population groups.
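The accuracy criterion above can be expressed as a small computation over pairs of original and privacy-protected tabulations for the same units. In the Python sketch below, the largest group is identified from the original counts and the error is the percentage-point difference in that group's share; both choices, and the function names, are assumptions for illustration rather than the Census Bureau's published evaluation code.

```python
def largest_group_share_error(original: dict, protected: dict) -> float:
    """Percentage-point error in the share held by the originally largest group."""
    group = max(original, key=original.get)
    original_share = 100 * original[group] / sum(original.values())
    protected_share = 100 * protected[group] / sum(protected.values())
    return protected_share - original_share


def meets_criterion(units, tolerance=5.0, coverage=0.95) -> bool:
    """True if at least `coverage` of (original, protected) unit pairs
    have a largest-group share error within +/- `tolerance` points."""
    within = sum(abs(largest_group_share_error(o, p)) <= tolerance
                 for o, p in units)
    return within / len(units) >= coverage
```

Applied separately to units grouped by population size, a check like this would show the smallest size class at which the criterion is met.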

Figure 10.5 graphs the percentages of counties, incorporated places, and census tracts with differences of 10% or more for race and Hispanic population groups between the 2010 Redistricting File protected with the 2020 DAS and the original 2010 file protected with a technique called data swapping. With the exception of the White Alone population and, for counties, the Hispanic population, anywhere from more than 10% to more than 60% of units have differences of 10% or more. Clearly, these data are quite noisy, particularly for small race groups such as NHOPI, and generally for incorporated places.²³ Because the nation’s 19,483 incorporated places do not always nest neatly within counties and are typically smaller in population than even census tracts (the median population in 2020 was 1,129 for incorporated places, compared with 3,775 for census tracts), it is difficult to inject noise and retain accuracy for incorporated places. Small population sizes also make it difficult to reconcile accuracy and confidentiality protection for American Indian reservations and Alaska Native villages, which had median populations of 400–500 people in 2020.²⁴ Yet incorporated places and AIAN jurisdictions are communities with governmental functions that have historically depended on accurate data from the decennial census.
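The statistic graphed in Figure 10.5 can be sketched in a similar way. The Python function below treats a "difference of 10% or more" as a relative difference from the original (swapped) count and skips units whose original count is zero; both are assumptions, since the exact definition used in the detailed summary metrics is not given here, and the function name is hypothetical.

```python
def percent_units_with_large_difference(pairs, threshold=0.10) -> float:
    """Percent of units whose protected count differs from the original by
    at least `threshold`, measured as a fraction of the original count.

    `pairs` holds (original, protected) counts for one population group;
    units with an original count of zero are skipped.
    """
    eligible = [(o, p) for o, p in pairs if o > 0]
    if not eligible:
        return 0.0
    flagged = sum(abs(p - o) / o >= threshold for o, p in eligible)
    return 100 * flagged / len(eligible)
```

Because a fixed amount of noise is a larger fraction of a small count, a relative measure like this one flags small groups and small places far more often, consistent with the pattern described above.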

___________________

²³ Final 2010 Demonstration File Detailed Summary Metrics file at https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/01-Redistricting_File--PL_94-171/2021-06-08_ppmf_Production_Settings/2021-06-08-data-metrics-tables_production-settings.xlsx, using the June 8, 2021, production settings for the application of the TopDown Algorithm to the 2020 Census Redistricting File.

²⁴ See Census Geographies Project (2022) for population size data. See also Liebler (2022).

**Figure 10.5** Percentage of counties, incorporated places, and census tracts with differences of 10% or more between the 2020 Disclosure Avoidance System privacy-protected and original 2010 redistricting files in estimates for race and Hispanic populations.
NOTES: AIAN, American Indian or Alaska Native; NHOPI, Native Hawaiian or Other Pacific Islander. Groups are ordered from left to right by their percent of total U.S. population.

SOURCE: Final 2010 Demonstration File Detailed Summary Metrics file at https://www2.census.gov/programs-surveys/decennial/2020/program-management/data-product-planning/2010-demonstration-data-products/01-Redistricting_File--PL_94-171/2021-06-08_ppmf_Production_Settings/2021-06-08-data-metrics-tables_production-settings.xlsx, using the June 8, 2021, production settings for the application of the TopDown differentially private algorithm to the 2020 Census Redistricting File.

10.6 RACE AND ETHNICITY MEASUREMENT IN 2020—CONCLUSIONS

Conclusion 10.1: The 2020 Census depicted a more diverse nation than did the 2010 Census. The White Alone population declined significantly, particularly among Hispanic people; the Hispanic population increased; and the Two or More Races population increased substantially, particularly among Hispanic people. The extent to which these changes were due to demographic change (e.g., more multiracial children), changes in self-identification, the provision of write-in origins (perhaps not intended to indicate racial identification) made possible by the format and processing changes to the 2020 Census race question, or other effects of those format and processing changes is unknown.

Conclusion 10.2: The 2020 Census had poorer quality data on race and ethnicity compared with the 2010 Census in terms of coverage error (net overcounts and undercounts), rates of missing and imputed responses, and the noise infused by the 2020 Census Disclosure Avoidance System.

The 2020 Census exhibited larger coverage errors compared with 2010 for race and ethnic groups. There were increases in net overcounts for some groups, particularly non-Hispanic White Alone people and Asian people, and increases in net undercounts for other groups, particularly Black people, Hispanic people, and American Indians and Alaska Natives. Consequently, differences among groups widened significantly in 2020, with adverse implications for uses of census data to allocate fixed resources, such as representation, funding, and services.
The 2020 Census had higher imputation rates (to account for missing and inconsistent responses) for race and ethnicity compared with 2010, particularly for responses not provided by a household member via the internet, a mailed-back paper questionnaire, or telephone. Moreover, the 2020 Census had much higher imputation rates for race and ethnicity for people in group quarters compared with the already high rates in 2010.
The 2020 Disclosure Avoidance System injected noise of considerable magnitude into the Redistricting File for small race and ethnicity groups in small governmental jurisdictions, such as incorporated places and American Indian tribal lands and Alaska Native villages.

10.6.1 Improvements to Collection of Race and Ethnicity Information

Our suggestions for improving the collection of race and ethnicity data in the 2030 Census and beyond follow two basic tracks: first, continuing and extending lines of research on the decennial census and the ACS, and second, addressing the implications of possible revisions in federal standards for collecting race and ethnicity data.

Understanding how the changes in format and processes for the race and ethnicity questions in the 2020 Census and the ACS beginning in 2020 affected distributions may well be moot should the Census Bureau’s preferred single combined race/origin question be approved for the 2030 Census. Yet, research on the matter is important for users to understand the implications for time series and for the Census Bureau to understand how respondents think of the responses they provide—for example, as a current self-identification of race or ethnicity or as an identification of long-past origins.

Types of research that could be valuable for users and the Census Bureau to understand the effects of the 2020 changes in format, data capture, and coding on race and ethnicity reporting include:

Qualitative research with respondents varying in age, sex, race, and ethnicity. Research could include focus groups and one-on-one cognitive interviewing to ascertain how people viewed and used major category check boxes and write-in spaces.
Tabular analysis of age, sex, race, and ethnicity reporting, comparing the 2010 Census, 2020 Census, the ACS before and after 2020, and the 2015 National Content Test to illuminate any trends, such as in multirace reporting by generations (e.g., baby boom, millennial).
Simulation of the use of 2010 race and ethnicity questions and response processing for people in the 2020 Census, using a sufficient sample of age, sex, race, and ethnicity groups for efficient and statistically significant estimation.
Simulation of the implications for the race distribution in states, counties, and other geographies of the higher frequency of write-in responses via the internet compared with other response modes.

Recommendation 10.1: The U.S. Census Bureau should conduct research to determine how the changes in format and processes that were made in the 2020 Census and in the American Community Survey beginning in 2020 affected the distributions of race and ethnicity. Such research should use qualitative, quantitative, and simulation methods to ascertain: how respondents viewed and used the 2010 and 2020 formats; trends in multirace reporting by age, sex, race, and ethnicity; how samples of 2020 respondents would have been categorized using the 2010 format, data capture, and coding; and the implications of differences in write-ins by response mode (e.g., more write-ins for internet responses) on race distributions among geographic areas. The Census Bureau should communicate the results of this research to data users to help them understand the implications of the changes for time series, and should also use the findings to inform 2030 Census planning.

Should OMB revise Statistical Policy Directive No. 15 to adopt a single, combined, check-all-that-apply question for race and ethnicity, with a MENA category added and Hispanic as a category, what might be the effects on race and ethnicity distributions in the 2030 Census? Table 10.7 suggests a possible answer: a combined question, with write-in spaces under the White and Black checkboxes as in 2020 and the other 2020 data-capture and coding changes, would likely produce a marked break in series from 2020, just as 2020 was a marked break in series from 2010. The SOR category would dwindle with a combined question, while the increase in the Two or More Races category and the decrease in the White Alone category (relative to 2010) would be somewhat less pronounced. Moreover, a combined question would likely show many fewer Hispanic people in the Two or More Races category (less than 5% in the 2015 National Content Test),²⁵ compared with one-third of Hispanic people who were so coded in 2020.

Table 10.7 Percentage Distribution of the Population by Race/Ethnicity, 2010–2020 Censuses and Combined Race-Ethnicity Question

Race/Ethnicity Group	2010 Census	2020 Census	Combined Race-Ethnicity Question
Total (%)	100.0	100.0	98.8
White Alone	72.4	61.6	63.0
Black Alone	12.6	12.4	9.6
AIAN Alone	0.9	1.1	0.4
Asian Alone	4.8	6.0	5.2
MENA Alone	N.A.	N.A.	0.2
NHOPI Alone	0.2	0.2	0.1
SOR Alone	6.2	8.4	0.3
Two or More Races	2.9	10.2	9.3
Hispanic	16.3	18.7	10.8

NOTES: AIAN, American Indian or Alaska Native; MENA, Middle Eastern or North African; NHOPI, Native Hawaiian or Other Pacific Islander; SOR, Some Other Race. The Hispanic population in the 2010 and 2020 Census columns can be of any race and is not included in the race percentages, which add up to 100%; the Hispanic population in the combined-question column is Hispanic Alone. The last column adds up to 98.8%; the remaining 1.2% were invalid or missing responses in the 2015 National Content Test (see Sources). For comparison with the Hispanic estimates of 16.3% (2010) and 18.7% (2020), the comparable combined-question estimate is 14% for Hispanic Alone and in Combination.

SOURCES: 2010–2020 Census: See Table 10.4. Combined Race-Ethnicity Question: Mathews et al. (2017:Table H7; all interview modes, row for a combined question with write-in areas).
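The comparisons drawn from Table 10.7 are simple percentage-point arithmetic, sketched below in Python with the table's published figures transcribed as-is (the variable and function names are illustrative). One detail worth noticing: summing the rounded entries gives 99.9 for the 2020 column and 98.9 for the combined-question column (including Hispanic Alone), versus the printed totals of 100.0 and 98.8, so differences of about 0.1 point reflect rounding. Because the Hispanic population is handled differently in the last column, the differences are indicative rather than strictly comparable.

```python
# Percentages transcribed from Table 10.7: (2010 Census, 2020 Census,
# combined race-ethnicity question in the 2015 National Content Test).
# None stands for "N.A." (no MENA category in the 2010 and 2020 questions).
TABLE_10_7 = {
    "White Alone": (72.4, 61.6, 63.0),
    "Black Alone": (12.6, 12.4, 9.6),
    "AIAN Alone": (0.9, 1.1, 0.4),
    "Asian Alone": (4.8, 6.0, 5.2),
    "MENA Alone": (None, None, 0.2),
    "NHOPI Alone": (0.2, 0.2, 0.1),
    "SOR Alone": (6.2, 8.4, 0.3),
    "Two or More Races": (2.9, 10.2, 9.3),
}
# Any race in the 2010/2020 columns; Hispanic Alone in the combined column.
HISPANIC = (16.3, 18.7, 10.8)
CENSUS_2010, CENSUS_2020, COMBINED = 0, 1, 2


def column_total(col, include_hispanic=False):
    """Sum the rounded entries in one column, skipping N.A. cells."""
    total = sum(row[col] for row in TABLE_10_7.values() if row[col] is not None)
    if include_hispanic:
        total += HISPANIC[col]
    return round(total, 1)


def pp_change(group, start, end):
    """Percentage-point change for one group between two columns."""
    return round(TABLE_10_7[group][end] - TABLE_10_7[group][start], 1)


for group in ("White Alone", "SOR Alone", "Two or More Races"):
    print(group,
          pp_change(group, CENSUS_2010, CENSUS_2020),
          pp_change(group, CENSUS_2010, COMBINED))
```

Relative to 2010, the White Alone share falls by 10.8 points under the 2020 question but by 9.4 points under the combined question, and the Two or More Races share rises by 7.3 versus 6.4 points, which is the "somewhat less pronounced" shift described in the text.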

Other implications of the move to a combined question would need attention. First, the January 2023 Federal Register notice's request for feedback on requiring the collection of detailed (write-in) race and Hispanic-origin data by default, whenever feasible, raises a very important issue. Many federal, state, and local administrative agencies (e.g., school districts, police departments, health departments, vital records offices) may not want, or may simply be unable, to use the detailed write-ins and additional codes that the Census Bureau has adopted. Instead, they may need or want to use only the minimum, main categories (White, Black, AIAN, Asian, Native Hawaiian and Pacific Islander,²⁶ Hispanic, MENA) to simplify data collection and processing. Even at the minimum category level, it could take considerable time for agencies to switch over to a combined question with Hispanic and MENA categories (e.g., all states would have to adopt the new categories for vital records on births and deaths). Consequently, there will likely be periods when numerators (e.g., race and ethnicity of enrolled school children or newborns) do not correspond exactly to denominator populations derived from the census or ACS. For this reason, crosswalks would be essential for bridging series from the old to the new standards, as called for in the initial proposals for revision.

___________________

²⁵ See Mathews et al. (2017:Table H13).

²⁶ The suggested changes in the Federal Register notice also include dropping “Other” from the NHOPI category (88 FR 5375, January 27, 2023).

It would also be important to consult as soon as possible with state legislators, redistricting commissions, political parties, consultants, and others involved in the redistricting process. In this consultation, it would be useful to focus on ways to streamline the number of categories compared with the 2000–2020 redistricting files. A combined question could increase the number of cells in Tables P1 (race or ethnicity) and P3 (race or ethnicity for 18 and older) because of the two added categories of Hispanic and MENA, but it would eliminate the need for two tables in the current list (P2, Hispanic and non-Hispanic by race, and P4, Hispanic and non-Hispanic by race for 18 and older; refer to Box 10.2). SOR could be ignored except for a single cell of SOR Alone, and redistricters might agree that ending the enumeration with a single cell for, say, four or more races, instead of continuing to six races (or eight under the single-question format), would be acceptable. The conversation could also address whether small blocks (the median block population size in 2020 was 28 people) could be combined. Streamlining and combining would decidedly help to reconcile privacy protection with accuracy, whatever protection method is used.
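A rough sense of how much streamlining could save comes from counting category combinations. The Python sketch below assumes six race categories under the current question (White, Black, AIAN, Asian, NHOPI, SOR) and eight under a combined question (adding Hispanic and MENA), and assumes that "ending with four or more races" means listing every combination of one to three categories individually and collapsing all larger combinations into one cell. It counts only combination cells; total and subtotal rows and the SOR simplification mentioned above are not modeled, and the function names are illustrative.

```python
from math import comb


def full_enumeration_cells(n_categories):
    """Cells needed to list every nonempty combination of categories."""
    return 2 ** n_categories - 1


def capped_cells(n_categories, cap=4):
    """Cells needed if combinations of 1 to cap-1 categories are listed
    individually and every combination of cap or more categories shares
    a single 'cap or more' cell."""
    listed = sum(comb(n_categories, k) for k in range(1, cap))
    return listed + (1 if n_categories >= cap else 0)


for n in (6, 8):  # six categories today; eight with Hispanic and MENA added
    print(n, full_enumeration_cells(n), capped_cells(n))
```

Under these assumptions, full enumeration needs 63 cells for six categories and 255 for eight, while capping at "four or more" needs 42 and 93. Every cell published at the block level is another quantity the disclosure-avoidance system must protect, which is why fewer cells can help reconcile privacy protection with accuracy.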

Recommendation 10.2: To improve the quality of reporting on race and ethnicity in the census, American Community Survey (ACS), and other federal data collections, the U.S. Office of Management and Budget (OMB) should revise Statistical Policy Directive No. 15 to adopt a combined, check-more-than-one race/ethnicity classification with both Hispanic and Middle Eastern or North African as main categories, in addition to White, Black, American Indian and Alaska Native, Asian, and Native Hawaiian and Other Pacific Islander. OMB should formalize any changes to the race and ethnicity reporting standards as soon as practicable, to permit the U.S. Census Bureau to implement the new race/ethnicity question in the ACS and in the 2030 Census. OMB should allow agencies to collect data only for the major categories in the combined question when added detail (as in the 2020 Census) would impose undue administrative burdens. A decision by the Census Bureau on including added detail and write-in spaces, along with expanded data capture and coding procedures, for a combined race/ethnicity question for the ACS and the 2030 Census should await completion of research on such matters as how respondents viewed the 2020 question format and the effects of response-mode differences in write-in responses on diversity among geographic areas.

Recommendation 10.3: The U.S. Census Bureau should produce a crosswalk or bridge between the 2010 and 2020 Census race and ethnicity questions and responses. Similarly, if a combined race/ethnicity question (with Hispanic and Middle Eastern or North African categories) is adopted as the standard by the U.S. Office of Management and Budget, the Census Bureau should produce a crosswalk or bridge between the 2020 version and revised race and ethnicity questions and responses, as soon as the revisions are implemented in the American Community Survey and then in the 2030 Census. The Census Bureau should involve data users in this important work.
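Recommendation 10.3 leaves the bridging method open. To make the idea of a bridge concrete, the Python sketch below implements equal-fractions allocation, one of the simplest bridging techniques discussed when the 1997 standards first allowed people to report more than one race: a person who reports k categories contributes 1/k to each. This sketch shows only that allocation mechanic. It is not presented as the method the Census Bureau would or should use; a real 2020-to-revised-question crosswalk would likely rely on richer, model-based approaches. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def equal_fractions_bridge(responses):
    """Split each respondent equally across the categories reported.

    A person reporting k categories contributes 1/k to each, so the bridged
    single-category counts still sum to the number of respondents."""
    totals = defaultdict(Fraction)  # Fraction() == 0
    for cats in responses:
        if not cats:
            continue  # missing responses would be imputed first in practice
        share = Fraction(1, len(cats))
        for cat in cats:
            totals[cat] += share
    return dict(totals)


people = [{"White"}, {"White", "Asian"}, {"Black", "White", "AIAN"}]
bridged = equal_fractions_bridge(people)
for cat, count in sorted(bridged.items()):
    print(cat, count)
```

Exact fractions are used so that the bridged counts sum exactly to the number of respondents; a production bridge would more likely carry weights as floating-point numbers and round only at publication.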

Recommendation 10.4: The U.S. Census Bureau should consult as early as possible with the redistricting community, which includes state legislators, redistricting commissions, political parties, and political consultants, among others, to determine the optimum set of tabulations to include on the 2030 Redistricting File, whether the current race and ethnicity categories are retained or the proposed revisions take effect. The consultation should include consideration of streamlining the number of race/ethnicity categories in the file and combining blocks with small populations (using input from localities) to maximize the ability to protect confidentiality while maintaining accuracy.
