National Academies Press: OpenBook

Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement (2009)

Chapter: 3 Defining Categorization Needs for Race and Ethnicity Data

Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.

3
Defining Categorization Needs for Race and Ethnicity Data

The collection of data in the Office of Management and Budget (OMB) race and Hispanic ethnicity categories is improving across a variety of health care entities, but not all entities yet collect or report data using these categories. Moreover, disparities within the broad groups represented by these categories support the case for collection of granular ethnicity data beyond the OMB categories. Given variations in locally relevant populations, no single national set of additional ethnicity categories is best for all entities that collect these data. Collection of data in the OMB race and Hispanic ethnicity categories, supplemented by more granular ethnicity data, is recommended, with tailoring of the latter through locally relevant categories chosen from a standardized national set. In most cases, rolling up the data on granular ethnicities to the OMB categories will be possible, but care will be necessary because certain ethnicities do not correspond to any one race. However, when questions about race and granular ethnicity are both answered, rollup is not necessary.
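The rollup logic described above can be sketched in a few lines. The following is only an illustration under stated assumptions: the mapping entries and the names `ROLLUP_TO_OMB` and `omb_category` are hypothetical and are not part of any national standard set. An ethnicity that is absent from the mapping is treated as having no single corresponding OMB category, and a directly reported OMB category always takes precedence, so that no rollup is attempted.

```python
from typing import Optional

# Illustrative (not official) mapping from a granular ethnicity to one OMB category.
# Ethnicities that do not correspond to a single category are deliberately left out.
ROLLUP_TO_OMB = {
    "Chinese": "Asian",
    "Vietnamese": "Asian",
    "Samoan": "Native Hawaiian or Other Pacific Islander",
    "Puerto Rican": "Hispanic or Latino",
    "Cuban": "Hispanic or Latino",
}


def omb_category(granular_ethnicity: str, reported_omb: Optional[str] = None) -> Optional[str]:
    """Return an OMB category for a record, or None when no single category applies."""
    # If the OMB question was answered directly, rollup is not necessary.
    if reported_omb:
        return reported_omb
    # Otherwise roll up only when the mapping is unambiguous; never guess.
    return ROLLUP_TO_OMB.get(granular_ethnicity)
```

Returning `None` rather than a default category keeps records that cannot be rolled up visible for separate handling.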

Collecting and maintaining demographic data in medical records and enrollment files allows for analyses stratified by race and ethnicity to identify needed improvements in health care, and for identification of individuals or population groups that might be the focus of interventions designed to address health care needs. The resultant analyses can be used, for example, to plan specific features of interventions (e.g., the use of culturally relevant content in outreach communications about preventive services) and to compare the quality of care being provided by various entities serving similar populations. The primary reason for standardizing categories for the variables of race and ethnicity is to enable consistent comparison or aggregation of the data across multiple entities (e.g., state-level analyses of providers under Medicaid or a health plan’s analysis of disparities in multiple states where it is operating). At the same time, standardized categories must enable persons to self-identify with the categories and increase the utility of the data to the entity collecting them.

Both federal and state agencies (e.g., the Social Security Administration and state Medicaid programs) classify individuals by their race or ethnicity to obtain useful information for health and health care purposes (Mays et al., 2003). Other entities, such as health plans, health professionals, hospitals, community health centers, nursing homes, funeral directors, public health departments, and the public, play roles in categorizing, collecting, reporting, and using these data for quality improvement purposes. Coordinating efforts of these stakeholders to ensure accurate collection and reporting of uniformly categorized race and ethnicity data could lead to more powerful analyses of aggregated data (Sequist and Schneider, 2006). While progress has been made in the past few years to incorporate the existing national standard set of categories promulgated by OMB (see Table 3-1) into the collection and presentation of data, many data collection efforts still do not fully employ these basic standard categories.

TABLE 3-1 OMB Race and Hispanic Ethnicity Categories According to a One- and Two-Question Format

| Question format | Response categories |
|---|---|
| Responses for Hispanic ethnicity in two-question format | Hispanic or Latino; Not Hispanic or Latino |
| Responses for race in two-question format | American Indian or Alaska Native; Asian; Black or African American; Native Hawaiian or Other Pacific Islander (NHOPI); White |
| Responses to a single question combining race and Hispanic ethnicity (one-question format) | American Indian or Alaska Native; Asian; Black or African American; Hispanic or Latino; Native Hawaiian or Other Pacific Islander (NHOPI); White |

SOURCE: OMB, 1997b.

Not all health and health care entities are required to collect data on race and ethnicity, but for those that do, the OMB categories are the minimum that a federal agency or recipient of federal funds must include in its categorization and reporting. The OMB standards have acknowledged imperfections, though. The categories are often, as shown by the literature review in Chapter 2, too broad for effectively identifying and targeting disparities in health and health care. Additionally, a substantial portion of Hispanics do not relate to the race options, leading to many Hispanics being reported in Census data as “Some other race” because they do not choose any of the five OMB race categories (del Pinal et al., 2007; NRC, 2006; OMB, 1997a). While OMB allows two formats for the race and Hispanic ethnicity questions (one combining race and Hispanic ethnicity in a single question, and the other asking about them in two separate questions, with the Hispanic ethnicity question asked first; see Table 3-1), OMB explicitly prefers the two-question format (OMB, 1997b). As discussed later in the chapter, the format used may have implications for Hispanic response rates (Baker et al., 2006; Laws and Heckscher, 2002; Taylor-Clark, 2009).

This chapter examines approaches to categorizing race and ethnicity by (1) reviewing the current state of standardized collection of race and ethnicity data, with a focus on the sufficiency of the OMB categories and their uptake in various areas of health care data collection; (2) examining the utility of the continued use of the current OMB categories; and (3) considering how the OMB race and Hispanic ethnicity categories can be combined with locally tailored, more detailed ethnicity categories selected from a national standard set, with standardized coding and rollup procedures, to capture important variations among ethnic groups. The chapter concludes by exploring approaches to eliciting responses on race, Hispanic ethnicity, and granular ethnicity, and reviewing models for data collection.

CURRENT STATE OF STANDARDIZED COLLECTION OF RACE AND ETHNICITY DATA

As previously noted, a variety of entities, many of which fall under the purview of the Department of Health and Human Services’ (HHS’) 1997 inclusion policy, collect race and ethnicity data for a variety of purposes. The HHS inclusion policy mandates the collection of at least OMB race and Hispanic ethnicity data in specific circumstances, such as in administrative records, surveys, research projects, and contract proposals associated with direct federal service programs. While the policy does not state which specific categories should be collected in addition to the OMB categories, it encourages the collection and reporting of subgroup data (HHS Data Council, 1999).

Exploring the current state of data categorization provides insight into the challenges faced by health- and health care-related entities in categorizing and collecting the data. Table 3-2 shows the categories used by various federally funded health surveys, state birth records, and cancer registries. Many of these data sources are national-level collection systems designed, among other purposes, to make comparisons across time, providers, and geographic areas (Madans, 2009). These surveys collect race and Hispanic ethnicity data in the six categories specified by OMB and a largely common set of 9 to 12 additional ethnicity categories. For example, the National Health Interview Survey (NHIS), National Survey on Drug Use and Health (NSDUH), and Medical Expenditure Panel Survey (MEPS) all include the OMB categories plus Mexican, Cuban, Puerto Rican, Asian Indian, Chinese, Filipino, Japanese, Korean, and Vietnamese categories, among others. These categories generally correspond to the 15 response check-off boxes included in Census 2000, Census 2010, and intercensal American Community Survey (ACS) questions on race and ethnicity (see Table 3-2).

Despite HHS’ inclusion policy, some HHS agencies have not collected even the minimum OMB categories (e.g., Medicare enrollment files). In general, HHS-funded or -sponsored surveys collect the minimum OMB categories, and often additional categories, but not all categories are necessarily reported or analyzed because of small sample sizes. As specific stratifying variables are applied to survey data, for example, the pool of applicable respondents gets smaller (e.g., receipt of diabetes care services by age and race), which may make the number of cases in small racial or ethnic groups too small for analysis. In contrast to surveys, most national administrative datasets are case-rich, meaning they may contain enough data to allow for analyses of even small ethnic groups. For example, the Medicare databases contain a large number of cases and thereby could play an important role in stratifying data by race and ethnicity.
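A minimal sketch of how stratification shrinks the pool of respondents follows. The field names, the helper names, and the minimum cell size of 30 are assumptions chosen for illustration; they are not thresholds taken from any of the surveys discussed.

```python
from collections import Counter

MIN_CELL_SIZE = 30  # hypothetical reporting threshold, for illustration only


def stratum_counts(records, keys):
    """Count records in each stratum defined by the given fields."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def strata_too_small(records, keys, min_size=MIN_CELL_SIZE):
    """Return the strata whose counts fall below min_size, sorted for stable output."""
    counts = stratum_counts(records, keys)
    return sorted(stratum for stratum, n in counts.items() if n < min_size)
```

Adding a second stratifying field can push a group that was reportable on its own below the threshold, which is the situation the paragraph above describes.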

Race and Ethnicity Categorization in Medicare Data

Medicare, a large source of quality improvement data, has limited race and ethnicity data in the enrollment files for its 44.8 million beneficiaries. Because of the history of how race and ethnicity data have been captured (Reilly, 2009), the available race and ethnicity data are often of low accuracy and quality (Bilheimer and Sisk, 2008; Bonito et al., 2008; Eicheldinger and Bonito, 2008; Ford and Kelly, 2005; U.S. House Committee on Ways and Means Subcommittee on Health, 2008). Analyses of Medicare administrative enrollment data found that while the validity of individual data on race and ethnicity was high for Whites and Blacks (the sensitivity was 97 and 96 percent, respectively), only 52 percent of Asian, 33 percent of Hispanic or Latino, and 33 percent of American Indian or Alaska Native beneficiaries were correctly identified (McBean, 2006). Medicare has historically relied on the race and ethnicity data individuals provided when they applied for a Social Security number (SSN). Before 1980, the SSN application form limited respondents to choosing Black, White, or Other. Since most people age 65 and older today received an SSN prior to 1980, their racial and ethnic identifiers were limited to these responses unless the individual changed enrollment to a specific health plan. The current SSN application combines race and ethnicity into a single question and includes only five of the six OMB categories.[1] Consequently, Medicare data have been of limited use in studying differences in patterns of care for populations identified by the OMB categories (Bilheimer and Sisk, 2008; Bonito et al., 2008; Eicheldinger and Bonito, 2008; Ford and Kelly, 2005; U.S. House Committee on Ways and Means Subcommittee on Health, 2008).
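Sensitivity in this sense is the share of beneficiaries who belong to a group according to a reference standard and who are also placed in that group by the enrollment file. A minimal sketch follows; the function name and the example counts are hypothetical, and self-reported data are assumed to serve as the reference standard.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of true group members that the administrative file identifies correctly."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no members of the group in the reference data")
    return true_positives / total


# Hypothetical example: of 100 beneficiaries who self-identify with a group,
# the enrollment file places 52 in that group and 48 elsewhere.
example = sensitivity(true_positives=52, false_negatives=48)  # 0.52
```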

The limitations of the Medicare data for race and Hispanic ethnicity have been acknowledged by Centers for Medicare and Medicaid Services (CMS) officials, and CMS is actively working to improve its coding of race and ethnicity data by working with the Social Security Administration (SSA) to ensure the capture of data according to the OMB minimum standards (Reilly, 2009). CMS has also explored a variety of indirect estimation techniques to improve analyses of race and ethnicity differentials among individuals currently in the Medicare data system (Bonito et al., 2008; Wei et al., 2006).[2] Under the Medicare Improvements for Patients and Providers Act of 2008,[3] CMS is required to address quality reporting by race and ethnicity. A report by CMS detailing its proposed actions is due to Congress in January 2010.

[1] The OMB-approved SSA Application for a Social Security Card instructs applicants to “Check one only”: Asian, Asian-American or Pacific Islander; Hispanic; Black (Not Hispanic); American Indian or Alaska Native; or White (Not Hispanic). These five categories do not correspond to the 1997 OMB standards, which split Asians and Pacific Islanders into separate categories, nor do the instructions to “Check one only” allow multirace individuals to “Mark one or more.”

TABLE 3-2 Race and Ethnicity Categories Collected by Various Data Sources

| Categories Collected | Census 2010 and ACS (2009) | NHIS (2008) | NIS (2007) | NSDUH (2008) | MEPS (2006) | NAMCS (2008) | NHAMCS (2009) | Application for an SSN (update unknown) | CMS Nursing Home Minimum Data Set (updated 2000) | Standard Certificate of Birth and Death (updated 2003) | SEER (updated 2008) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Responses to combined race and Hispanic ethnicity format |   |   |   |   |   |   |   |   |   |   |   |
| Asian, Asian-American or Pacific Islander |   |   |   |   |   |   |   | X |   |   |   |
| Asian/Pacific Islander |   |   |   |   |   |   |   |   | X |   |   |
| Hispanic |   |   |   |   |   |   |   | X | X |   |   |
| Black (Not Hispanic) |   |   |   |   |   |   |   | X | X |   |   |
| North American Indian or Alaskan Native |   |   |   |   |   |   |   | X |   |   |   |
| American Indian/Alaskan Native |   |   |   |   |   |   |   |   | X |   |   |
| White (Not Hispanic) |   |   |   |   |   |   |   | X | X |   |   |
| Responses for Hispanic ethnicity question in two-question format |   |   |   |   |   |   |   |   |   |   |   |
| Yes/Hispanic or Latino |   | X | X | X | X | X | X |   |   |   |   |
| No/Not Hispanic or Latino/Not of Spanish, Hispanic, Latino origin | X | X | X | X | X | X | X |   |   | X | X |
| Puerto Rican | X | X | X | X | X |   |   |   |   | X | X |
| Cuban/Cuban American | X | X | X | X | X |   |   |   |   | X | X |
| Dominican (Republic) |   | X |   | X | X |   |   |   |   |   | X |
| Mexican |   | X | X |   | X |   |   |   |   |   | X |
| Mexican American |   | X | X |   | X |   |   |   |   |   |   |
| Mexican/Mexican American/Mexicano/Chicano | X |   |   | X |   |   |   |   |   | X |   |
| Central or South American |   | X |   | X | X |   |   |   |   |   | X |
| Central American |   |   | X |   |   |   |   |   |   |   |   |
| South American |   |   | X |   |   |   |   |   |   |   |   |
| Spanish (from Spain) |   |   |   | X |   |   |   |   |   |   |   |
| Spanish |   |   |   |   |   |   |   |   |   |   | X |
| Spanish-Caribbean |   |   | X |   |   |   |   |   |   |   |   |
| Other Latin American |   | X |   |   | X |   |   |   |   |   |   |
| Other Hispanic/Latino/Spanish, Specify | X | X |   |   |   |   |   |   |   | X |   |
| Other Hispanic/Latino, Specify |   |   |   | X | X |   |   |   |   |   |   |
| Other Spanish/Hispanic, Specify |   |   | X |   |   |   |   |   |   |   | X |
| Refused |   | X | X | X | X |   |   |   |   |   |   |
| Don’t know/Unknown |   | X | X | X | X |   |   |   |   |   | X |
| Responses to race question in two-question format |   |   |   |   |   |   |   |   |   |   |   |
| White | X | X | X | X | X | X | X |   |   | X | X |
| Black/African American |   | X | X | X | X | X | X |   |   | X |   |
| Black |   |   |   |   |   |   |   |   |   |   | X |
| Black, African Am., or Negro | X |   |   |   |   |   |   |   |   |   |   |
| American Indian |   | X | X |   |   |   |   |   |   |   |   |
| Alaska Native |   | X | X |   |   |   |   |   |   |   |   |
| American Indian or Alaska Native |   |   |   | X | X | X | X |   |   |   |   |
| American Indian or Alaska Native – Print name of enrolled or principal tribe | X |   |   |   |   |   |   |   |   | X |   |
| American Indian, Aleutian, Alaskan Native, or Eskimo |   |   |   |   |   |   |   |   |   |   | X |
| Native Hawaiian | X | X | X | X |   |   |   |   |   | X | X |
| Native Hawaiian or Other Pacific Islander |   |   |   |   | X | X | X |   |   |   |   |
| Samoan | X | X |   |   |   |   |   |   |   | X | X |
| Guamanian or Chamorro | X | X |   |   |   |   |   |   |   | X | X |
| Micronesian |   |   |   |   |   |   |   |   |   |   | X |
| Polynesian |   |   |   |   |   |   |   |   |   |   | X |
| Tahitian |   |   |   |   |   |   |   |   |   |   | X |
| Tongan |   |   |   |   |   |   |   |   |   |   | X |
| Melanesian |   |   |   |   |   |   |   |   |   |   | X |
| Fiji Islander |   |   |   |   |   |   |   |   |   |   | X |
| New Guinean |   |   |   |   |   |   |   |   |   |   | X |
| Other Pacific Islander, Specify | X | X | X | X |   |   |   |   |   | X | X |
| Asian |   |   | X | X | X | X | X |   |   |   |   |
| Asian Indian | X | X |   | X | X |   |   |   |   | X |   |
| Asian Indian, Pakistani |   |   |   |   |   |   |   |   |   | X |   |
| Chinese | X | X |   | X | X |   |   |   |   | X | X |
| Filipino | X | X |   | X | X |   |   |   |   | X | X |
| Japanese | X | X |   | X | X |   |   |   |   | X | X |
| Korean | X | X |   | X | X |   |   |   |   | X | X |
| Vietnamese | X | X |   | X | X |   |   |   |   | X | X |
| Other Asian: Specify | X | X |   | X | X |   |   |   |   | X | X |
| Laotian |   |   |   |   |   |   |   |   |   |   | X |
| Hmong |   |   |   |   |   |   |   |   |   |   | X |
| Kampuchean (including Khmer and Cambodian) |   |   |   |   |   |   |   |   |   |   | X |
| Thai |   |   |   |   |   |   |   |   |   |   | X |
| Some other race, Specify | X | X | X | X | X |   |   |   |   | X | X |
| Refused |   | X | X |   | X |   |   |   |   |   |   |
| Don’t know/Unknown |   | X | X |   | X |   |   |   |   |   | X |

Race and Ethnicity Categorization in State-Administered Programs

Much, but not all, of the collection of standardized data at the state level is done under federally funded programs, including Medicaid and the Children’s Health Insurance Program (CHIP). Other state data collection systems, such as hospital discharge data systems and cancer registries, aim to use race and ethnicity data categories that are consistent with nationally collected denominator data (Friedman et al., 2000; Laws and Heckscher, 2002). States face difficulties, though, in consistently collecting accurate and reliable data that are uniformly classified.

Medicaid and CHIP

The Children’s Health Insurance Program Reauthorization Act of 2009,[4] signed into law in February 2009, stipulates the development, by January 2011, of quality measures designed to identify and eliminate racial and ethnic disparities in child health and health care. This legislation has the potential to improve measurement of disparities for children in federally funded programs as it specifies that “data required for such measures is [sic] collected and reported in a standard format that permits comparison of quality and data at a State, plan, and provider level.” A national standard set of race and ethnicity categories is necessary to stratify and compare these quality metrics across the nation.

Although states are mandated to submit Medicaid claims data electronically to CMS, there are anomalies in the submitted data (CMS, 2009). For example, in 2003, race and Hispanic ethnicity were listed as “unknown” for more than 20 percent of enrollees in New York, Rhode Island, and Vermont (McAlpine et al., 2007). A 2004 survey noted that while the majority of states were collecting self-reported race and Hispanic ethnicity from their Medicaid and CHIP beneficiaries, most commonly during the enrollment process (Llanos and Palmer, 2006), few states were collecting the six OMB minimum categories (Palmer, 2004). Many states were including Hispanic as an option in the race question instead of asking a separate question about ethnicity (McAlpine et al., 2007); as noted earlier, OMB permits this format but explicitly prefers the two-question format. The subcommittee’s research indicates that some progress has been made in the past six years on the collection of Medicaid data using the OMB standards. The subcommittee examined state Medicaid and CHIP application forms and found improved standardization, most notably in collecting the Asian and Native Hawaiian or Other Pacific Islander (NHOPI) categories (Table 3-3).

TABLE 3-3 Race and Hispanic Ethnicity Categories Used by State Medicaid and CHIP Programs

| OMB Race and Hispanic Ethnicity Categories | 2004: State Medicaid Programs Using (out of 21) [a] | 2009: State Medicaid Programs Using (out of 33) [b] | 2009: State CHIP Programs Using (out of 38) [c] |
|---|---|---|---|
| White | 20 | 32 | 37 |
| American Indian or Alaska Native | 20 | 31 | 37 |
| Black or African American | 19 | 32 | 37 |
| Hispanic or Latino | 19 [d] | 32 [e] | 35 [f] |
| Asian | 16 | 32 | 37 |
| Native Hawaiian or Other Pacific Islander | 14 | 30 | 36 |
| Other | 9 | 5 | 8 |

[a] SOURCE: Palmer, 2004.
[b] 37 state applications were available online. Four states provided space to write-in a free-text response, so they are not included in the denominator. Of the remaining 33 states, all applications except one solicited race and ethnicity information with specific category choices.
[c] 45 state applications were available online. Seven states provided space to write-in a free-text response, so the categories collected by these states are not included. Of the remaining 38 states, all applications except one solicited race and ethnicity information with specific category choices.
[d] Seven of the 19 states also collected data on Not Hispanic or Latino, indicating differences in using the one-question versus two-question format.
[e] 18 of the 32 states also collected data on Not Hispanic or Latino, indicating about an equivalent number of states using the one-versus two-question format.
[f] 18 of the 35 states also collected data on Not Hispanic or Latino.

[2] A 2009 white paper by the U.S. Senate Finance Committee presented proposals to improve patient care and health delivery. One proposal included a comprehensive database required of CMS to expand existing data sources, data sharing, and matching across federal and state claims and payment data, including HHS; SSA; the Departments of Veterans Affairs (VA), Defense (DOD), and Justice (DOJ); and the Federal Employees Health Benefit Program (FEHBP) (U.S. Senate Finance Committee, 2009). The results of this and other proposals to revise payment systems and policies in the Medicare program remain to be seen.

[3] Medicare Improvements for Patients and Providers Act of 2008, Public Law 110-275 § 118, 110th Cong., 2d sess. (July 15, 2008).

[4] Children’s Health Insurance Program Reauthorization Act of 2009, Public Law 111-3, 111th Cong., 1st sess. (February 4, 2009).

Vital Statistics Data

Failure to use standard categories and nonreporting or misreporting of data complicate efforts to calculate national and state birth, mortality, and morbidity rates by the OMB race and Hispanic ethnicity categories or for more detailed categories. The National Vital Statistics System (NVSS), hospital discharge data, and state registries provide data needed to calculate these rates, but the data may not be collected and reported according to the OMB categories or may be of poor quality. While the standard birth, death, and fetal death certificates now include the OMB categories plus 13 other categories,[5] not all jurisdictions have adopted these standard certificates. As of April 1, 2009, 32 jurisdictions (56 percent) had adopted the 2003 standard birth and death certificates, and 22 jurisdictions (39 percent) had adopted the 2003 standard fetal death report. The percentage of these vital events covered by the states that have adopted the 2003 standard certificates is higher, however, because they are states with larger populations.[6]

Death certificates provide the numerator for calculating death rates, while Census data provide the denominator. A deceased individual’s race and ethnicity are often identified by the funeral director relying on his or her own observation, which is often inaccurate, particularly for racial and ethnic groups with a large number of multiracial individuals (Arias et al., 2008; Durch and Madans, 2001). For example, an individual who may self-identify as White and American Indian or Alaska Native may be categorized as only White by a funeral director, resulting in undercounting of deaths in the American Indian or Alaska Native population. Misclassification on death certificates produces a substantial net underestimate of mortality rates for Hispanic, Asian, American Indian or Alaska Native, and NHOPI populations (Arias et al., 2008; Durch and Madans, 2001). An assessment of the quality of death rates found them to be understated by 11 percent for both Asians and Pacific Islanders and about 21 percent for American Indians and Alaska Natives (Rosenberg et al., 1999).
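The effect of numerator misclassification on a death rate can be illustrated with simple arithmetic. The sketch below assumes that “understated by 11 percent” means the observed rate equals the true rate multiplied by (1 - 0.11); the function names and the example figures are hypothetical and are not taken from the cited studies.

```python
def death_rate_per_100k(deaths: int, population: int) -> float:
    """Crude death rate: death-certificate numerator over Census denominator."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def adjust_for_understatement(observed_rate: float, understatement: float) -> float:
    """Estimate the true rate when the observed rate is understated by a known fraction."""
    if not 0 <= understatement < 1:
        raise ValueError("understatement must be in [0, 1)")
    return observed_rate / (1 - understatement)
```

Under this assumption, an observed rate of 89 per 100,000 that is understated by 11 percent corresponds to a true rate of about 100 per 100,000.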

Hospital Discharge Data

Hospital discharge records sometimes lack race and ethnicity information (Gold et al., 2008; Schoenman et al., 2005) because hospitals either are not required to collect and report this information or choose not to do so (Romano et al., 2003). As of May 2009, at least 39 states included some race and ethnicity data in their discharge data reporting requirements. These data fields, however, are often added without additional resources to support complete and consistent reporting. Consequently, collection and coding practices vary, and data quality may be poor.[7]

[5] The categories collected on the standard death certificate are included in Table 3-2.

[6] Personal communication, J. Madans, National Center for Health Statistics, April 17, 2009.

BOX 3-1 Race and Ethnicity Categories in the HCUP Databases

The Healthcare Cost and Utilization Project (HCUP), a family of health care databases sponsored by the Agency for Healthcare Research and Quality (AHRQ), relies on the voluntary participation of 40 states to submit hospital discharge data. HCUP databases contain clinical and nonclinical information, including patient demographics, diagnoses, procedures, discharge status, and charges for all patients, regardless of payer (e.g., persons covered by Medicare, Medicaid, and private insurance, as well as no insurance). One HCUP data element contains source-specific information about the race and ethnicity of the patient: “race” retains information on the race of the patient as provided by the data source, and “Hispanic” retains information on Hispanic ethnicity as provided by the data source.

Only 31 of the 40 participating states provide race and ethnicity data to HCUP. Some states report on all the OMB standard categories (e.g., Arizona, Missouri), some states (e.g., Hawaii, Massachusetts, New Jersey) collect more detailed ethnicity data, and some states do not report on the minimum OMB categories (e.g., Arkansas, North Carolina, Utah). HCUP recodes the data into the race and Hispanic ethnicity categories by which it analyzes and stratifies data: White, Black, Hispanic, Asian or Pacific Islander, Native American, and Other. These categories are similar to but do not in totality mirror the OMB standards.

SOURCES: AHRQ, 2006; Fraser and Andrews, 2009.

Forty states voluntarily participate in the HCUP databases, but only 31 of these provide HCUP with race and ethnicity data. Of these 31 states, several do not report data using the minimum OMB race and Hispanic ethnicity categories, and others report the data in different categories that HCUP must recode to allow multistate and national-state comparisons (see Box 3-1) (AHRQ, 2006).
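The kind of recoding HCUP performs can be sketched as a lookup from the OMB categories to the six analytic categories listed in Box 3-1. This is only an illustration: the function and table names are hypothetical, and giving Hispanic ethnicity precedence over race is an assumption made here, not a rule confirmed by the sources cited in this chapter.

```python
from typing import Optional

# Hypothetical lookup from OMB race categories to the analytic categories in Box 3-1.
OMB_TO_HCUP = {
    "White": "White",
    "Black or African American": "Black",
    "Asian": "Asian or Pacific Islander",
    "Native Hawaiian or Other Pacific Islander": "Asian or Pacific Islander",
    "American Indian or Alaska Native": "Native American",
}


def recode_for_hcup(omb_race: Optional[str], hispanic: bool) -> Optional[str]:
    """Collapse an OMB race plus Hispanic flag into one analytic category."""
    # Assumption: Hispanic ethnicity takes precedence over any reported race.
    if hispanic:
        return "Hispanic"
    # Missing race stays missing rather than being counted as "Other".
    if omb_race is None:
        return None
    return OMB_TO_HCUP.get(omb_race, "Other")
```

Note that the recode is lossy: Asian and NHOPI records become indistinguishable once combined, which is one reason the resulting categories do not fully mirror the OMB standards.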

Cancer Registries

State cancer registries collect, classify, consolidate, and link information on new cancer cases from hospital reports, medical records, pathology reports, hospital discharge data, and death certificates (CDC, 2009). Cancer registries operate in 45 states, the District of Columbia, Puerto Rico, and the U.S. Pacific Islands, providing surveillance capabilities for identifying patterns, trends, and variation in disease burden and care among racial and ethnic groups. Difficulties may arise, however, in coding race and ethnicity from such disparate sources including, for example, the hand-written observations of physicians (Izquierdo and Schoenbach, 2000).

The National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) Program coding manual includes two of the OMB categories directly (i.e., White and Black) and more granular ethnicity categories that correspond to the other OMB standard categories (e.g., instead of a broad Hispanic ethnicity category, SEER asks more specifically whether a person is Puerto Rican or Cuban) (see Table 3-2 for the categories coded by SEER); altogether there are 34 categories. Because SEER stratifies the data whenever possible by more discrete groups, registries are instructed to categorize a patient’s ancestry by one of the 34 categories (Johnson and Adamo, 2008). SEER supplements and improves the data it receives from states by, for example, linking with the Indian Health Service to improve American Indian or Alaska Native data (see Box 3-2). SEER also uses an indirect estimation algorithm based on Spanish surnames and birthplace to improve Hispanic classification, and an algorithm based on surnames and birthplace to improve data on Asian and NHOPI ethnic groups (Edwards, 2009).

[7] Personal communication, D. Love, National Association of Health Data Organizations, June 5, 2009.

BOX 3-2 The Use of Data Linkages to Improve Data Coverage and Quality in Cancer Registries

The American Indian or Alaska Native population makes up just over one percent of the U.S. population and is dispersed throughout the country. This complicates the collection and aggregation of data on cancer incidence, an especially important task because unique circumstances of culture, locale, history, and health care produce unusual patterns of cancer occurrence among American Indian or Alaska Native populations (Cobb et al., 2008). Alaska Natives, for example, have rates of lung, colon, and breast cancer five times higher than those of Southwestern Indians.

Studies have demonstrated that many American Indians or Alaska Natives are misclassified as another race in cancer registry data, and dividing these numerators by population denominators from the Bureau of the Census has the effect of underestimating cancer rates for American Indians or Alaska Natives. To address this problem, SEER cancer registries (which cover 26 percent of the total U.S. population and 42 percent of the American Indian or Alaska Native population) have been linked with Indian Health Service (IHS) beneficiary records using LinkPlus, a probabilistic linkage software program developed by the Centers for Disease Control and Prevention (CDC), to identify records representing the same individual in the IHS and registry databases (Espey et al., 2008).

Review of the State of Standardization

This review of categories currently used in various data collection activities highlights that there are substantial efforts nationally, by a number of states, and by various health care organizations to collect race and Hispanic ethnicity data according to the OMB standards. However, not all of these efforts have yet achieved that level of categorization, and national surveys, nationally standardized birth and death certificates, and cancer registries have found it useful to use more fine-grained categorizations beyond the basic OMB categories. Efforts to standardize categorization and collection will eliminate some of the problems with comparability among data collected by disparate systems.

CONTINUED USE OF THE OMB CATEGORIES

The OMB race and Hispanic ethnicity categories were deemed to represent the country’s broad population groups most necessary or useful for a variety of reporting and analytic purposes not specific to health care. The 1997 Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity were developed over a 4-year period during which an interagency taskforce weighed public input, expert testimony, and other evidence to consider whether and how to modify OMB’s 1977 standards (OMB, 1977, 1997b). OMB has no plans to change its current standards (Wallman, 2009).

Chapter 2 documented important variations in health and health care that may be masked when data are analyzed using only the OMB race and Hispanic ethnicity categories. Notwithstanding this limitation, a large body of studies has revealed disparities in health and health care among the groups represented by those categories. Thus, use of the OMB categories yields important data for quality improvement analyses and reporting efforts (AHRQ, 2008; Cohen, 2008; Flores and Tomany-Korman, 2008; IOM, 2008; Kaiser Family Foundation, 2009). Additionally, because OMB-level reporting is required by various federal agencies and recipients of federal funds, the OMB categories serve as a denominator for many comparisons related to health and health care. Thus, the OMB categories are useful for high-level analysis, reporting, and policy intervention (e.g., in the National Healthcare Disparities Report), as well as more local uses. If all entities were to collect race and ethnicity data using the OMB categories, the process of combining or comparing data across reporting entities (e.g., hospitals in states contributing to HCUP or health plans’ Healthcare Effectiveness Data and Information Set [HEDIS] data stratified by race and ethnicity) would be greatly facilitated. While the OMB categories do not define more specific subgroups and do not address how to include all difficult-to-categorize groups, they provide a useful common minimum platform for analyzing disparities in health care.

Past Activities to Improve the Collection of Data in the OMB Categories

One assumption underlying self-identified race and ethnicity data collection is that the categories and designations are recognized and accepted by the populations questioned (CDC, 1993; Lin and Kelsey, 2000). Improving the likelihood that respondents can identify with the races and ethnicities offered as response options is therefore essential to the quality of the data collected. Challenges in capturing accurate and reliable OMB-level data include the lack of detailed categories to which individuals can relate and the format of the questions used to elicit Hispanic ethnicity.

Categorizing Diverse Populations

A wide range of cultures, languages, and health-related behaviors are encompassed by each of the six OMB race and Hispanic ethnicity categories. For example, the Asian category blurs ancestry distinctions and vast cultural and geographic diversity (Holup et al., 2007). As a result, the Asian race identification may not resonate with all individuals of Pakistani, Vietnamese, or Filipino descent, for example, who might prefer to self-identify according to their ancestry (see Box 3-3) (Laws and Heckscher, 2002).

Similarly, the Black or African American, White, American Indian or Alaska Native, and NHOPI populations consist of heterogeneous groups and persons within these groups may not identify with the broader race categories (Bailey, 2001; Mays et al., 2003). The Census Bureau has recognized that check-off boxes that represent more detailed categories in addition to the broad OMB categories resonate better with respondents. The Census includes several ancestry options on the Hispanic origin question and several Asian and NHOPI ancestries on the race question (see Figure 3-1). Additionally, the inclusion of space to write in a free-text response permits individuals who do not identify with any of the provided check-off boxes to self-identify.

In Census 2000, about 15.4 million respondents were classified in the “Some other race” alone category, which was added to the OMB categories; this represents 5.5 percent of the total U.S. population.8 More than 97 percent of those who chose this category were Hispanic (Rothenberg, 2006), and the remaining write-in responses included a range of answers, such as German and Guyanese. As Table 3-4 illustrates, 42.2 percent of the 35.2 million Hispanic respondents identified with the response category “Some other race.” High rates of reporting “Some other race” on the Census may indicate that Dominicans, for example, are uncomfortable with saying “I am Black,” or “I am White,” and instead prefer to identify with a separate, distinct group (Bailey, 2001).9

Hispanics (discussed below) dwarf the other ethnicities in the “Some other race” category by virtue of their numbers, but individuals of other ethnicities, such as Cape Verdeans and Guyanese, also often do not self-identify with any of the OMB race and Hispanic ethnicity categories (Hernandez-Ramdwar, 1997; Laws and Heckscher, 2002; Model and Fisher, 2008). Consequently, these individuals, as well as many people of Filipino descent, among others, may not respond to the race question or may check “Some other race” if the option is available. The subcommittee concludes that making this option available in addition to the OMB categories would allow individuals who do not identify with one of the OMB race categories to respond (see Recommendation 3-1 below).

8 The 2005 Omnibus Appropriations Bill, at the urging of Congressman José E. Serrano (D-NY), directed that any collection of Census data on race identification must include “Some other race” as a response category. In previous censuses, the Census Bureau had sought and received OMB approval to include “Some other race” as a response category (U.S. Census Bureau, 2002b).

9 Dominicans (58 percent) were the group most likely to self-identify as “Some other race” in Census 2000 (NRC, 2006; Tafoya, 2004).

BOX 3-3 The Challenge of Categorizing Filipino Respondents

The Philippines consist of over 7,000 islands set in the western Pacific Ocean. The OMB standards define persons of Filipino descent as Asian. To evaluate Asian subgroup responses to race and ethnicity inquiries, Holup and colleagues (2007) asked a subset of adults participating in the Hemochromatosis and Iron Overload Screening Study to complete both the OMB-minimum and the expanded race and ethnicity measure used in the National Health Interview Survey (NHIS). The expanded measure used in the NHIS includes response categories for Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, and Other Asian. While 89 percent of single-heritage Filipinos marked Asian in the OMB-minimum categorization, the remaining 11 percent marked primarily NHOPI. Filipinos have also been known to categorize themselves as Spanish (Mays et al., 2003), Pacific Islander, Asian American, or, if multiracial, White (Yu and Liu, 1992). Holup and colleagues note that while OMB’s decision to separate the Asian and Pacific Islander category in the 1997 OMB revisions was a positive step, specification or provision of definitions when using the minimum OMB categories is “prudent.”

FIGURE 3-1 Reproduction of questions on race and Hispanic origin from Census 2000.

TABLE 3-4 Hispanic and Non-Hispanic Population Distribution by Race for the United States: 2000

Race                                                  Hispanic or Latino Ethnicity (%)   Not Hispanic or Latino Ethnicity (%)
One race
  White                                               47.9                               79.1
  Black or African American                           2.0                                13.8
  American Indian or Alaska Native                    1.2                                0.8
  Asian                                               0.3                                4.1
  Native Hawaiian or Other Pacific Islander (NHOPI)   0.1                                0.1
  Some other race                                     42.2                               0.2
Two or more races                                     6.3                                1.9

SOURCE: Grieco and Cassidy, 2001.

Format of the Race and Hispanic Ethnicity Questions

One of the principal challenges in capturing race and ethnicity data for purposes of improving health care is determining how best to capture the Hispanic or Latino population, a population comprising groups that vary widely in their characteristics (McKenney and Bennett, 1994; NRC, 2006). Many Hispanic individuals, including persons of Mexican, Puerto Rican, and Cuban heritage, prefer to self-identify using their specific ancestry as opposed to the general category Hispanic or Latino (Bowman, 1994; Gimenez, 1989; Hayes-Bautista and Chapa, 1987). The term “Hispanic” may not resonate with immigrants, in particular, because it is not used outside the United States (NRC, 2006). Many Hispanics choose “Some other race” instead of the OMB race options when given the opportunity to do so, or refuse to answer the race question when it is asked (Hasnain-Wynia et al., 2008). In a study of birth certificate data, for example, approximately two-thirds of the 15,074 mothers of Hispanic ethnicity reported their race as “Some other race” (Buescher et al., 2005). Research indicates that children of immigrants may be even more likely than their parents to self-identify as “Some other race” (NRC, 2006; Portes and Rumbaut, 2001).

As previously stated, the OMB standards encourage, “whenever feasible,” the separation of questions on race and Hispanic ethnicity, a distinction stemming from a 1976 law requiring documentation of the size and growth of the Hispanic population.10 Some research prior to the 1997 OMB revisions indicated that the separate, two-question format in which Hispanic ethnicity is elicited before race11 best identifies an OMB race category for as many Hispanic individuals as possible and allows analyses of combined race and Hispanic ethnicity categories (e.g., Hispanic Black and non-Hispanic Black). The two-question format may capture important health differences among groups. A 2006 study, for example, found that non-Hispanic Blacks have higher risks of developing coronary disease (5.8 percent) than Hispanic Blacks (4.7 percent, P = 0.017) (Lancaster et al., 2006). Additionally, a yet-to-be-released study of data from the NHIS indicates that Hispanic Blacks have a different health services and health status profile from that of either Hispanics or Blacks (Austin et al., 2009). However, the need for the dual categorization of Hispanic ethnicity and race for health care improvement purposes is not well studied.

At the same time, some research suggests that Hispanic respondents better identify with questions on race and Hispanic ethnicity when a one-question instead of a two-question format is used (Baker et al., 2006; Laws and Heckscher, 2002; Taylor-Clark, 2009). For example, the Census Bureau’s 1996 Racial and Ethnic Targeted Test (RAETT), which was administered to a sample of households in preparation for Census 2000, experimented with combining race and Hispanic ethnicity into a single question. Nonresponse to the one-question format was significantly lower than nonresponse to the two-question format. However, in the one-question format, many people who had identified as Hispanic and White or Black in the two-question format changed their response to only Hispanic, despite being permitted to “Select one or more” categories (Bennett et al., 1997).12 Yet while conventional wisdom indicates that the combined format maximizes response among Hispanics (Hirschman et al., 2000; OMB, 1997a; Tucker et al., 1996; U.S. Census Bureau, 1996a), survey research has been inconclusive regarding the best way to capture information on race and Hispanic ethnicity among this population. Continued testing of a combined-question format during the 2010 Census may reveal additional information on this issue (Humes, 2009; NRC, 2009).

10 Joint resolution relating to the publication of economic and social statistics for Americans of Spanish-origin or descent, Public Law 94-311 (15 U.S.C. 1516a), 94th Cong. (June 16, 1976).

11 Non-response to the Hispanic origin question decreased to 5.2 percent from 8.6 percent when the Hispanic origin question was asked before rather than after the race question (U.S. Census Bureau, 1996b).

Legislative efforts are under way to increase the options on the Census 2020 forms to include Caribbean, Dominican, and other populations. In the first session of the 111th Congress, Representative Charles Rangel (D-NY) and Senator Kirsten Gillibrand (D-NY) introduced bills HR 1504 and SB 1084, respectively, to require that in Census questionnaires, a check-off box be included so that respondents may indicate Dominican ethnicity. Also in the first session, Representative Yvette D. Clarke (D-NY) and Senator Charles Schumer (D-NY) introduced bills HR 2071 and SB 1083, respectively, to include a Caribbean check-off box on all future Census forms. These efforts indicate a continued call for more detailed ethnicity data. The need for more detailed data and concerns about Hispanic response may require OMB to review its standards. Most important, the subcommittee concludes there is a need for an assessment of the extent to which lack of identification with the OMB categories interferes with accurate data collection for use in quality improvement efforts (see Recommendation 3-3 below).

Identification of Multiracial Individuals

The 1997 OMB standards require that respondents be allowed to report more than one race and recommend “Mark one or more” and “Select one or more” as the included instruction. Approximately 2.4 percent of the country’s population (6.8 million persons) reported multiple races in Census 2000 (U.S. Census Bureau, 2000); this percentage can be expected to increase in the coming years (Edmonston et al., 2000). The largest percentage of multirace responses are from Hispanics; in Census 2000, Hispanics were more than three times as likely as non-Hispanics to self-identify with multiple race responses (NRC, 2006). As a result, like the “Some other race” category, multirace reporting is expected to increase with the growth of the Hispanic population. Additionally, in some areas of the country, the proportion of the population self-identifying as multiracial is substantial. In Census 2000, there were 14 states where the multiracial population was above the nationwide average of 2.4 percent. For example, the multiracial population in Hawaii totaled 21 percent, followed at a distance by Alaska at 5.4 percent (Jones and Smith, 2001).

In analysis and reporting, organizations often collapse reported multiracial combinations into an aggregate “more than one race” or a “multiracial” category because the sample sizes for the individual combinations are usually too small for analysis. The Census’ 1996 RAETT found that the option to “Select one or more” captures the same number of individuals as a single, multiracial/biracial category (Hirschman et al., 2000). The former instruction, though, allows for the identification of specific races, whereas the latter does not. Where possible, information on specific combinations of races and ethnicities should be preserved so the data can be aggregated over enough reporting units or periods to provide more informative analyses and the basis for targeted interventions. A single category labeled “multiracial” or “more than one race” may mask valuable information that could be used in analyses. More accurate analyses may require detail on each category selected by a respondent.

Some health information technology (HIT) systems are unable to support the collection and reporting of data in a “Select one or more” manner.13 OMB guidance stipulates that civil rights enforcement agencies must include the four “double-race” combinations most frequently reported. The U.S. Department of Housing and Urban Development, for example, tabulates respondents by the five OMB race categories and four specific multiple-race combinations:

  • American Indian or Alaska Native and White

  • Asian and White

  • Black or African American and White

  • American Indian or Alaska Native and Black or African American

12 Sutter Health collects the five OMB race categories with a Hispanic/Non-Hispanic notation. For example, an individual may self-identify as Black/Hispanic or Black/Non-Hispanic (Personal communication, T. Van, Sutter Health, July 22, 2009). This is another way to capture these data in accordance with the OMB standards.

13 Taking all possible combinations of the six OMB categories yields 64 combinations.

A sampling of the local service population or an examination of applicable Census data could reveal the most common combinations that an organization might want to capture if its information system does not allow all combinations under the “Select one or more” option.

Counting multiracial individuals as members of each individual race they select (e.g., counting individuals who self-identify as Black and Native Hawaiian in both the Black and NHOPI categories) may double-count respondents and inflate the number of respondents in denominator data. Therefore, this practice may come “at the expense of misstating disparities in the health of specific racial/ethnic groups” (Mays et al., 2003, p. 89), especially among populations in which the ratio of responses involving multiple races to a single race is high (e.g., American Indian or Alaska Native and NHOPI populations). On the other hand, this practice allows analyses to include all those who identify with a specific group.

To avoid double-counting, prioritization schemes, commonly referred to as trumping rules, recategorize multiracial individuals into a single race category and facilitate comparison of the data with data from systems that allow only single-race categories. For example, OMB guidelines stipulate that when addressing civil rights claims, “responses that combine one minority race and white are allocated to the minority race” (OMB, 2000).
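The difference between the two tallying approaches can be illustrated with a minimal Python sketch. The sample responses are invented for illustration. Only the rule that a response combining one minority race and White is allocated to the minority race comes from the OMB guidance quoted above; the fallback label "Two or more races" for every other combination is an assumption of this sketch, not an OMB rule.

```python
from collections import Counter

WHITE = "White"

def count_each_race(responses):
    """Count a respondent once in every race selected (totals may exceed respondents)."""
    counts = Counter()
    for races in responses:
        counts.update(set(races))
    return counts

def allocate_single_category(races):
    """Assign one category per respondent.

    One minority race plus White goes to the minority race (rule quoted
    from OMB, 2000). Any other multiple-race response is labeled
    "Two or more races", which is an assumption of this sketch.
    """
    selected = set(races)
    if len(selected) == 1:
        return next(iter(selected))
    if len(selected) == 2 and WHITE in selected:
        (minority,) = selected - {WHITE}
        return minority
    return "Two or more races"

def count_single_category(responses):
    """Count each respondent exactly once."""
    return Counter(allocate_single_category(r) for r in responses)

# Hypothetical responses from five respondents.
responses = [
    ["White"],
    ["Black or African American"],
    ["Black or African American", "White"],
    ["Black or African American", "Native Hawaiian or Other Pacific Islander"],
    ["Asian", "White"],
]

each = count_each_race(responses)          # sums to 8 for 5 respondents
single = count_single_category(responses)  # sums to 5
```

Counting every selected race inflates the total relative to the number of respondents, while the allocation rule keeps the total equal to the number of respondents at the cost of discarding part of each multiracial response.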

Prioritization schemes reflect a lack of consideration of multiracial respondents’ preference, aversion, or indifference to identifying primarily with one race. The NHIS and the California Health Interview Survey (CHIS) ask respondents who report more than one race whether there is a category with which they most identify, providing an opportunity to categorize individuals in a way that most closely matches their preferred self-identification. Those responses then can be used to inform the assigning of multiracial individuals to single-race categories in a manner more informative than arbitrary prioritization schemes (Holup et al., 2007). However, while many multiracial individuals identify with one race (Mays et al., 2003), some multiracial individuals may hesitate to choose one racial identity over another. Asking such a question also requires the collection and coding of data on an additional variable, which may be burdensome for some data systems. The subcommittee concludes that retaining specific combinations or codes for more common combinations in data systems allows for more thorough analysis and reporting. Different ways of aggregating multiracial categories may be appropriate for different purposes; therefore, the subcommittee does not endorse any single analytic approach but concludes that, whenever possible, each race an individual selects on a collection form be available for analysis.

NEED FOR LOCALLY RELEVANT GRANULAR ETHNICITY CATEGORIES

As noted earlier, the OMB categories, when used alone, can mask important within-group variations in quality of care (Blendon et al., 2007; Jerant et al., 2008; Read et al., 2005; Shah and Carrasquillo, 2006). While the OMB standards include only two ethnicity categories (Hispanic and not Hispanic), many other ethnicities exist. Assessing and reducing disparities within the broad race and Hispanic ethnicity categories requires ethnicity data at a greater level of detail than is mandated by the OMB standards.

The subcommittee evaluated the necessary level of ethnicity detail beyond Hispanic ethnicity and considered whether it should include national origin, place of birth, and ancestry. The Supreme Court has interpreted national origin to refer to “the country where a person was born, or, more broadly, the country from which his or her ancestors came.”14 Thus, a person may identify with a national origin if he or she shares physical, cultural, or linguistic characteristics with the group. This terminology, however, may indicate only country of birth to some respondents. Therefore, the subcommittee determines that ancestry, which the Census Bureau defines as “a person’s ethnic origin or descent, ‘roots,’ or heritage, or the place of birth of the person or the person’s parents or ancestors before their arrival in the United States,” is the ethnicity concept most encompassing of the detail necessary in health care settings (U.S. Census Bureau, 2008). To distinguish the definition of ethnicity adopted by OMB (i.e., Hispanic ethnicity) from this more encompassing definition, the subcommittee refers to the latter concept as granular ethnicity.

14 Espinoza v. Farah Mfg. Co., 414 U.S. 86, 88 (1973).

Importance of Flexibility in Choosing Locally Relevant Categories

The subcommittee considered whether to recommend the OMB race and Hispanic ethnicity categories plus a uniform set of 10 to 15 additional ethnicity categories (i.e., an “OMB Plus” set similar to the categories used in national surveys outlined in Table 3-2). Demographic distributions confirm, however, that a uniform set beyond the OMB categories would include groups not relevant to all communities. The subcommittee concludes that, to allow for better understanding and serving of local populations, the categories collected and analyzed need to accurately reflect the population served. Thus, a fixed “OMB Plus” set of categories would be less desirable than local selection of ethnicity categories in addition to the OMB categories.

Ethnicity data must be specific and appropriate to the communities in which health care providers operate (Bilheimer and Sisk, 2008). Clustering of racial and ethnic groups in specific communities, such as a relatively large population of White persons of French descent in Maine or a large population of White persons of Armenian descent in Southern California, requires the use of locally relevant granular ethnicity categories. Figure 3-2 shows the county-level distribution of the country’s Asian population, revealing that there are higher concentrations of Asians in broad geographic regions (e.g., the West Coast and Northeast Corridor), as well as clustered within specific counties or metropolitan areas (e.g., Collin County, Texas; Atlanta, Georgia). In areas with larger and more diverse Asian populations, discrete categorizations are more useful than a single broad category for data collection. Even in the state of Minnesota, which has a reasonably average concentration of Asians (3.5 percent), the broad OMB Asian category masks the fact that a large portion of Asians in the state are Hmong, an important consideration for locally tailored health care interventions. Similarly, a health care provider may care for a large number of persons who belong to an ethnic group whose significant presence is masked even by county-level data in the aggregate OMB categories.

Ethnicity Categories on Data Collection Instruments

Health care entities must determine an approach to collecting granular ethnicity data that allows all individuals, if they desire, to self-identify and at the same time is feasible, given that the population of their service area may include hundreds of granular ethnicities. Individual self-identification enables entities to learn about the composition of their service population so they can decide which ethnicity categories will yield the most responses on data collection instruments, and can be used in analyses to generate information on where to target interventions. Additionally, such individualized data collection has the potential benefit of preserving small subgroup identities that might be of interest for analytic studies (assuming preservation of the specific identifiers during data transfer) at the state, health plan, or national level but that might prove too small to reveal any group-specific quality issues at the local level (e.g., higher cancer mortality among persons of Samoan descent). Of course, such aggregation presumes standardization of categories across entities.

Presenting respondents with a list of hundreds of categories (see Appendix E) poses logistical challenges. Models exist for the collection of data on highly diverse populations; Kaiser Permanente, for example, collects data using approximately 260 categories of granular ethnicity through a separate question in addition to collecting the OMB minimum categories (see Appendix G). Similarly, Contra Costa Health Plan uses 133 ethnicity categories (see Appendix H). Both entities manage these lengthy lists through software applications that recognize keystrokes to present the most pertinent categories on screen. The Contra Costa software first identifies the 15 most frequently encountered ethnicities. Both organizations ask about granular ethnicity after asking a single question to solicit the OMB race and Hispanic ethnicity categories.
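A keystroke-driven lookup of this kind might work roughly as in the following Python sketch. It is not a description of either organization's software. The function name, the category counts, and the ordering rule (most frequently recorded first, then alphabetical) are all hypothetical; only the default limit of 15 echoes the Contra Costa example above.

```python
def suggest(prefix, frequencies, limit=15):
    """Return up to `limit` categories that start with `prefix`, most frequent first.

    `frequencies` maps a category label to how often it has been recorded
    locally. An empty prefix returns the most frequent categories overall.
    """
    needle = prefix.strip().casefold()
    matches = [name for name in frequencies if name.casefold().startswith(needle)]
    # Sort by descending frequency, breaking ties alphabetically.
    matches.sort(key=lambda name: (-frequencies[name], name))
    return matches[:limit]

# Hypothetical local counts, for illustration only.
local_frequencies = {
    "Filipino": 420,
    "Fijian": 35,
    "Finnish": 12,
    "Hmong": 180,
    "Mexican": 950,
    "Samoan": 60,
}

typed_fi = suggest("fi", local_frequencies)
top_three = suggest("", local_frequencies, limit=3)
```

With these invented counts, typing "fi" narrows six categories to three, and an empty entry shows the locally most common categories first, which is the behavior that makes a list of hundreds of categories workable on screen.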

FIGURE 3-2 Geographic distribution of the Asian population.

SOURCE: Barnes and Bennett, 2002.

Respondents may find the task of self-identification from a lengthy list daunting or unreasonable when faced with a paper-based form. Likewise, it would not be feasible for staff to read through such lengthy lists when collecting the data by phone, for example, during preregistration for hospitalization. Instead, some health care entities ask patients to provide a response to an open-ended question and present no preselected response options, while others provide patients and staff with a short list of categories, often accompanied by an “Other, please specify:__” option. This latter response option is also open-ended, meaning individuals or staff can write in a self-identification if it is not included on the local list of response categories. Similarly, state or national surveys could have a limited list of categories, but also present the open-ended response option.

There are advantages and disadvantages to both open-ended and closed-ended question formats. For example, questions that list examples or check-off boxes may bias respondents to the given response options (Chesnut et al., 2007). Census research has indicated higher response rates for the ethnicities listed as examples, indicating that this question format may skew responses (Cresce et al., 2004; del Pinal et al., 2007). Traditionally, closed-ended questions have been used to elicit race and Hispanic ethnicity data. But open-ended questions may have advantages for some entities collecting granular ethnicity data, including that this format reduces the amount of space needed on paper data collection forms or electronic screens. However, collecting open-format data for hundreds of thousands of enrollees or respondents on a survey can make it difficult to use the data unless resources are devoted to coding those responses according to standardized categories. One of the difficulties with open-ended questions is that respondents may leave the item blank. Census studies have indicated that this may be the result of perceived redundancy when the open-ended ancestry question follows questions on race and Hispanic ethnicity (del Pinal, 2004; Martin et al., 1990). Open-ended questions often provide examples so respondents know what type of response is desired; for example, the Medi-Cal instruction sheet includes a list of nine examples of ethnicity (e.g., Hispanic, Cambodian, Asian Indian).

The subcommittee finds no positive evidence from a health care quality improvement standpoint to support conclusions about requiring multiple responses to a question about granular ethnicity (i.e., “Select one or more”) for each individual. Additionally, the subcommittee acknowledges the potential HIT challenges of having multiple granular ethnicity responses. It is feasible and indeed required by OMB that entities collecting race and Hispanic ethnicity data according to the OMB standards allow individuals to “Select one or more,” and these few categories can yield 64 combinations. However, the number of possible combinations from a list of several hundred granular ethnicities may increase the analytic burden, and multiple ethnicity combinations will result in small cell sizes and thus may not be useful for identifying patterns of care in all circumstances. Furthermore, response variation, which occurs when individuals intentionally or inadvertently make inconsistent choices over time (Snipp, 1989), increases when individuals have a greater number of choices with which to self-identify (Snipp, 2003). Kaiser Permanente’s initiative to capture race, Hispanic ethnicity, and granular ethnicity does not currently allow multiple granular ethnicity responses because of collection and analytic considerations. However, there may be some communities where combinations of ethnicities may regularly occur, and health entities would find these combinations useful to collect.
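The arithmetic behind the 64 combinations, and the reason multiple responses scale poorly for granular ethnicity, can be checked with a short Python sketch. The figure of 64 equals 2 to the 6th power, which counts every subset of the six OMB categories including the case in which nothing is selected. The 260-category figure is the approximate size of the Kaiser Permanente list cited earlier; counting only two-way combinations is a simplification chosen for this illustration.

```python
from itertools import combinations
from math import comb

OMB_CATEGORIES = [
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
    "Hispanic or Latino",
]

def all_selections(categories):
    """Return every subset of the categories, including the empty selection."""
    return [
        frozenset(combo)
        for size in range(len(categories) + 1)
        for combo in combinations(categories, size)
    ]

selections = all_selections(OMB_CATEGORIES)
total = len(selections)                      # 2 ** 6 subsets
non_empty = sum(1 for s in selections if s)  # excludes "nothing selected"

# Two-way combinations alone from a list of about 260 granular ethnicities.
pairs_from_260 = comb(260, 2)
```

Six categories give 64 subsets (63 if a blank response is excluded), whereas a list of about 260 granular ethnicities gives 33,670 possible pairs before any three-way or larger combination is considered, which is why small cell sizes become a concern.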

Definition of a Standard National Set with Local Choices

To ensure standardized collection of race and ethnicity data, locally relevant choices of response categories should be selected from a national standard set, with appropriate coding to facilitate sharing of the data. The national standard set of categories needs to be comprehensive enough to capture changing demographic trends, geographically isolated subgroups, and groups relevant to the provision of culturally and linguistically appropriate care. While several organizations provide lists of granular ethnicities (see Table 3-5), none of these include all of the granular ethnicity categories required for a national set. Merging these sets, as is done in Appendix E, provides a starting point from which a national standard set could be developed. These sets are further explored in this section to identify the strengths and weaknesses of each.

The Centers for Disease Control and Prevention (CDC)/Health Level 7 (HL7) Race and Ethnicity Code Set 1.0 was developed to clarify the relationship of granular ethnicities to the broad OMB categories and to facilitate data exchange and analysis. In formulating this set, CDC worked with HL7 and X12, the leading standards-setting organizations for data interactions and for administrative transactions, respectively. The CDC/HL7 Code Set, which was introduced in 2000, incorporates ethnicity categories derived from write-in responses to the Census questions on race and Hispanic ethnicity, not responses to the Census ancestry question. Each ethnicity is assigned a permanent five-digit unique numerical code as well as a hierarchical code to associate with race or Hispanic ethnicity.

The CDC/HL7 Code Set, which has been under the jurisdiction of the National Center for Public Health Informatics, will be updated based on Census 2010 write-ins on the race and Hispanic ethnicity questions.15 The addition of categories beyond those currently specified on the Census form (see Figure 3-1), however, requires respondents to give free-text responses on lines provided under Hispanic or Latino, Asian, American Indian or Alaska Native, and “Some other race.” Thus, for example, the granular ethnicities of African immigrants who simply check “Black or African American” may not be represented in the CDC/HL7 Code Set. The current ethnicity list, for instance, notably does not include groups such as Somalis, Russians, Cape Verdeans, or Brazilians.

15

Personal communication, S. Ganesan, Centers for Disease Control and Prevention, June 3, 2009.

TABLE 3-5 Comparison of Granular Ethnicity Categorization and Coding Systems

| Category and Code Set | Total Number of Categories | Estimated Breakdown of Categories by OMB Race and Hispanic Ethnicity Category |
| --- | --- | --- |
| CDC/HL7 Race and Ethnicity Code Set 1.0 (2000) | Over 925 categories | Over 800 American Indian or Alaska Native categories; 21 White categories; 19 Black or African American categories; 24 Asian categories; 23 NHOPI categories; 38 Hispanic or Latino categories |
| Census Ancestry Codes | 993 categories | 212 broad “ancestry descriptions”; approximately 780 more detailed response categories |
| Massachusetts Superset | 173 categories | 31 major ethnicity categories; 140 sub-ethnicity categories |
| Kaiser Permanente Granular Ethnicity (2009) | 268 categories | 59 American Indian or Alaska Native categories; 206 additional ethnicities |
| Wisconsin Cancer Reporting System Code Manual (2008) | 648 categories | 371 American Indian or Alaska Native categories; 129 White categories; 37 Black or African American categories; 41 NHOPI categories; 14 Other Race categories |
| Contra Costa Health Plan Race and Ethnicity | 143 categories | 130 categories from the CDC/HL7 Code Set; 9 additional ethnicity categories: American, Bosnian, Brazilian, Kurdish, Mixtec, Portuguese, Punjabi, Russian, and Yao, Mien |

NOTE: The estimated categories in the third column may not equal the total number of categories in the middle column due to additional response and coding options such as Unknown, Declined, and Unavailable.

SOURCES: CDC, 2000; Kaiser Permanente, 2009; Taylor-Clark, 2009; Tiutin, 2009; U.S. Census Bureau, 2005; Wisconsin Cancer Reporting System, 2008.

The U.S. Census Bureau, in addition to cataloging write-in responses to questions on race and Hispanic ethnicity, asks a separate ancestry question for which respondents are asked to write in their ancestry or ethnic origin; thus, a person might identify with an individual country (e.g., French), a region within a country (e.g., Corsican or Breton), or a broader category (e.g., European).16 The Census maintains lists of write-in responses with corresponding three-digit numerical codes for its questions on race, Hispanic origin, and ancestry. The codes for each of these lists differ, although the lists overlap with many of the same categories. For example, 101 is the code for White on the Census Race Code List, the code for “Not Spanish/Hispanic” in the Hispanic or Latino Origin Code List, and the code for Azerbaijani in the Census Ancestry Code List (U.S. Census Bureau, 2002a). Korean is coded as 620 on the Census Race Code List and 750 on the Census Ancestry Code List.
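Because the same three-digit number means different things on different Census lists, a stored code is only interpretable together with the list it came from. The sketch below illustrates one way to keep that pairing explicit. The five entries are the ones cited in the paragraph above; the list names used as keys (`race`, `hispanic_origin`, `ancestry`) and the function name are hypothetical labels chosen for illustration, not Census identifiers.

```python
# Keys pair a code list with a numeric code so that identical numbers
# from different lists can never be confused with one another.
CENSUS_CODES = {
    ("race", 101): "White",
    ("hispanic_origin", 101): "Not Spanish/Hispanic",
    ("ancestry", 101): "Azerbaijani",
    ("race", 620): "Korean",
    ("ancestry", 750): "Korean",
}


def decode(code_list: str, code: int) -> str:
    """Return the category label for a code on a specific code list."""
    try:
        return CENSUS_CODES[(code_list, code)]
    except KeyError:
        raise KeyError(f"no entry for code {code} on list {code_list!r}") from None


# The same number decodes differently depending on the list.
labels_for_101 = [decode(name, 101) for name in ("race", "hispanic_origin", "ancestry")]
```

Keeping the list name in the key is a design choice that avoids silently misreading a bare code when data from several sources are merged.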

The Massachusetts Division of Health Care Finance and Policy and the Massachusetts Quality and Cost Council mandated that the state’s acute care hospitals and health plans, respectively, report uniform race and ethnicity data (Weinick et al., 2007). These requirements spurred development of an ethnicity categorization and coding list by the Brookings Institution. Entities responsible for the list’s development considered recommending the CDC/HL7 Code Set but found it did not accurately capture all relevant population groups.17 The category and coding list developed by the Brookings Institution includes 31 ethnicity categories and additional “sub-ethnicities” that are not required for reporting but that an organization can collect, if useful. Acute care hospitals and health plans are required to report (i.e., have the fields and categories available in their HIT systems) the basic OMB race categories along with the 31 ethnicity categories (Massachusetts Executive Office of Health and Human Services, 2009a, 2009b). When an organization collects any of the “sub-ethnicity” categories, it is required to roll that category up to one of the 31 broader ethnicity categories for reporting. The Massachusetts Superset, which is intended to serve as a guide for health plans and hospitals when they collect granular ethnicity beyond the 31 required categories, includes most of the CDC/HL7 categories and 87 additional categories representing African nations (e.g., Sudanese, Somali), synonyms for existing CDC categories (e.g., La Raza, Chicano), Middle Eastern nations (e.g., Saudi Arabian, Jordanian), and other ethnicities (e.g., Cape Verdean, Brazilian, Guyanese) (Taylor-Clark et al., 2009).

16

The separate ancestry question was included only on the Census “long form.” This form was sent to one in six households. The American Community Survey (ACS), an annual survey sent to a sample of households, has replaced the Census “long form” and includes a question about ancestry.

17

Personal communication, K. Taylor-Clark, The Brookings Institution, January 15, 2009.

Similarly, Contra Costa Health Plan and the Wisconsin Cancer Reporting System (WCRS) developed their own categorization and coding schemes (Tiutin, 2009; Wisconsin Cancer Reporting System, 2008). Contra Costa’s code set is based on the CDC/HL7 Code Set, but includes nine additional granular ethnicities, including American and Russian, which are two of Contra Costa’s top 15 response categories, but are not included in the CDC/HL7 Code Set (see Appendix H).

In 2004, Kaiser Permanente began collecting member race and ethnicity data using the OMB categories and a limited number of detailed ethnicity groups. After implementation, Kaiser determined a need for more granular ethnicity categories to allow for better self-identification and analyses of health care data. As a result, Kaiser developed a list of granular ethnicities that could be used for self-reporting separately from the OMB race and Hispanic ethnicity categories. The code set includes 268 categories, and continual review is planned to ensure alignment with immigration trends and relevance to health care (Kaiser Permanente, 2009). Appendix G provides more detail on Kaiser Permanente’s collection of data on race, ethnicity, and language need.

“Unavailable,” “declined,” and “unknown” codes, variations of which are included in the HRET Toolkit’s suggested format, the Massachusetts Superset, the Contra Costa Health Plan code list, and the Kaiser Permanente code list, are frequently used in survey analysis. These codes are not presented as response options, but are recorded by registration/eligibility clerks or surveyors, for example, so that data systems can track the number of persons for whom the organization has attempted to collect race and ethnicity data. The subcommittee suggests that such categories be provided for individuals who have not responded (unavailable), refuse to answer (declined), or do not know (unknown). The “unavailable” category allows data collectors to see that the respondent has not yet provided the information, so the information should be solicited at a future point of contact with that individual. In contrast, the “declined” category indicates the individual should not be asked again. In some instances, the “unknown” category provides a response option if the respondent is adopted, for example, and does not know his/her race and ethnicity (Taylor-Clark, 2009).
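The three non-response codes carry different follow-up implications, which a data system can encode directly. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the choice not to re-ask after an "unknown" response is an assumption, since the text specifies follow-up behavior only for "unavailable" (ask at a future contact) and "declined" (do not ask again).

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Union


class NoResponse(Enum):
    UNAVAILABLE = "unavailable"  # respondent has not yet provided the information
    DECLINED = "declined"        # respondent refused to answer
    UNKNOWN = "unknown"          # respondent does not know (e.g., adopted)


Recorded = Union[str, NoResponse]


def should_ask_again(value: Recorded) -> bool:
    """Re-solicit only when the information has not yet been provided.

    Treating UNKNOWN like DECLINED here is an assumption of this sketch.
    """
    return value is NoResponse.UNAVAILABLE


def tally(values: Iterable[Recorded]) -> Counter:
    """Count reported values versus each non-response code."""
    return Counter(
        v.value if isinstance(v, NoResponse) else "reported" for v in values
    )
```

A tally of this kind lets an organization distinguish people it has not yet reached from people who were asked and chose not to answer.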

SELECTION OF LOCAL GRANULAR ETHNICITY CATEGORIES

The list of granular ethnicities in Appendix E provides a baseline template for a national standard set of granular ethnicity categories. An entity can decide, based on local circumstances, whether to use 10 or 100 categories from the national standard list for collection and/or analysis. If the entity sees an increase in the use of the “Other, please specify:__” option, it should consider adding categories to its local list. If an organization chooses not to have a preset list of categories, it will need to compile responses according to the national standard list to ensure comparability with data collected by other entities.
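Monitoring the “Other, please specify:__” option can be as simple as counting write-ins that are not yet on the local list. The sketch below is illustrative only: the function name, the normalization step, and the threshold of 25 responses are hypothetical, and a real implementation would first map write-ins to the national standard list rather than rely on matching capitalization.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def suggest_new_categories(
    write_ins: Iterable[str],
    local_list: Iterable[str],
    min_count: int = 25,  # hypothetical threshold, to be set locally
) -> List[Tuple[str, int]]:
    """Return write-in responses frequent enough to merit a listed category."""
    # Normalize whitespace and capitalization so " russian " and "Russian" match.
    counts = Counter(w.strip().title() for w in write_ins if w and w.strip())
    listed = {c.strip().title() for c in local_list}
    return [
        (name, n)
        for name, n in counts.most_common()
        if n >= min_count and name not in listed
    ]
```

For example, an organization whose local list lacks a Russian option and that receives a few dozen "Russian" write-ins would see that category returned as a candidate for addition.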

Determining which locally relevant categories to include may initially require subjective judgments about subgroups believed to be present in large numbers. However, some organizations may not realize the diversity of their service population and thus may not understand the need to collect the OMB categories and granular ethnicity data (see Box 3-4). Therefore, specific, locally relevant categories can be determined using population estimates from geographic-based Census data, school enrollment data that identify newer and growing populations in service areas, indirect estimation techniques, or surveying. However, even constructing a survey may require some knowledge of persons in the service area; Anthem Blue Cross, for example, solicited through a mailed survey the race and ethnicity of its California members, but focused on the six OMB race and Hispanic ethnicity categories and 61 additional ethnicity categories considered most pertinent to its enrollees.18 As all granular ethnicity lists should also include an “Other, please specify:__” option, the write-in responses may help organizations evaluate and expand as necessary the granular ethnicity response options provided. If an organization is receiving numerous write-in responses of “Russian,” for example, it may consider adding a Russian response option.

BOX 3-4

Realizing the Necessity of Collecting Data: The University of Mississippi Medical Center

When informed they were to begin collecting race, ethnicity, and language data from patients, employees at University of Mississippi Medical Center (UMMC) almost uniformly indicated that patients would believe this information would be used to segregate services and would create racial tensions. In fact, the director in charge of implementing the data collection was convinced that UMMC and the organizations funding and administering the data collection initiative (The Robert Wood Johnson Foundation and The George Washington University through an Expecting Success project) were “taking gasoline and pouring it on a blazing fire.”

The registration department initially thought registration staff were already asking for the patient’s race. The director discussed this with staff and found out they were not asking the patients but were looking at patients to determine their race. Staff informed management that patients might be offended or become indignant when asked for the information. Observer report was indicating approximately 180 Hispanic patients per year registered at UMMC. So what was the point of collecting additional race and ethnicity data for a reasonably homogenous patient population?

With funding and support from Expecting Success, UMMC implemented a staff training program to ensure patients would be asked directly their race, ethnicity, and language need. Within months of implementation, UMMC learned it was registering approximately 600 Hispanic individuals per month (approximately 1.5 percent of the 40,000 individuals registered per month) and the patient population was found to be less homogenous than initially believed. Approximately 500 patients per month were from subgroups the medical center did not even realize existed in their service area (e.g., Japanese and Russian). UMMC found that between 3 and 4 percent of the population preferred to talk to a physician in a language other than English. UMMC now has three full-time Spanish interpreters (where they previously had none) and switched vendors to ensure their interpreter phone system could handle the types and numbers of interpreter services required. In-house physicians and researchers have begun to utilize the race, ethnicity, and language data to stratify quality measures.

SOURCE: Personal communication with Richard Pride, UMMC, June 3, 2009.

A variety of entities participate in the health care system, and while each has roles to play in capturing race and ethnicity data, not all currently collect these data and those that do so may not use uniform methods or categories. There are other entities that collect and report detailed data in ways that comply with the OMB standards and produce data useful to local and national quality improvement efforts. The subcommittee’s task is to provide standardized categories “for entities wishing to assess and report on quality of care.” The subcommittee aims to accomplish this by imposing the least possible data collection burden and without hindering the progress and processes of entities already collecting detailed data.

The subcommittee focuses its recommendations on care delivery sites and public and private insurers, as these health care entities are involved in measuring and improving quality, as well as on data collection activities that provide information about equity in care, care outcomes, quality of care, or utilization of care (e.g., health surveys asking about health care). Some public health activities involve delivery of care, but others do not. Because vital statistics and other public health surveillance systems are organized and supported for purposes beyond health care quality improvement, these collection activities may require different considerations. All entities related to health and health care, though, are encouraged to collect race, Hispanic ethnicity, and granular ethnicity data in accordance with the subcommittee’s recommendations.

18

Personal communication, G. H. Ting, Wellpoint, Inc., February 19, 2009.

The subcommittee considered a stepwise approach to collecting race and ethnicity data, where entities would first emphasize collecting the data according to the OMB standards and then gradually implement granular ethnicity data collection over time. However, as discussed in Chapter 2, granular ethnicity data are useful for improving health care quality in many settings, and thus the collection of these data should not be considered a secondary aim in those settings. While the subcommittee recognizes that full implementation of its recommendations may require HIT and process changes for some entities (see Chapter 5), race, Hispanic ethnicity, and granular ethnicity data are all necessary to effectively and efficiently target health care quality improvement to groups that are at risk of suboptimal care.

Recommendation 3-1: An entity collecting data from individuals for purposes related to health and health care should:

  • Collect data on granular ethnicity using categories that are applicable to the populations it serves or studies. Categories should be selected from a national standard list (see Recommendation 6-1a) on the basis of health and health care quality issues, evidence or likelihood of disparities, or size of subgroups within the population. The selection of categories should also be informed by analysis of relevant data (e.g., Census data) on the service or study population. In addition, an open-ended option of “Other, please specify:__” should be provided for persons whose granular ethnicity is not listed as a response option.

  • Elicit categorical responses consistent with the current OMB standard race and Hispanic ethnicity categories, with the addition of a response option of “Some other race” for persons who do not identify with the OMB race categories.

Consistent Rollup of Granular Ethnicity to OMB Categories

While systems for rolling granular ethnicity categories up to broader categories have been developed by CDC/HL7 and the Commonwealth of Massachusetts, among others, an agreed-upon rollup strategy for granular ethnicities has not been determined or reviewed for its applicability nationwide and across the health care system. For example, the Massachusetts Superset aggregates its set of granular ethnicities to 31 mid-level aggregations whereas the CDC/HL7 Code Set aggregates its ethnicity categories to only the OMB race and Hispanic ethnicity categories. A process for rolling granular ethnicity categories up to the OMB categories is key to achieving two potentially contradictory objectives: on the one hand, consistency and standardization in analysis and reporting, and on the other hand, data collection tailored to local circumstances. Rollup procedures will need to be employed only when a person does not check off an OMB race or Hispanic ethnicity and only provides a granular ethnicity response or when only granular ethnicities are collected; however, the subcommittee prefers separate collection of granular ethnicity from OMB race and Hispanic ethnicity. The subcommittee chose not to define mid-level aggregations between granular ethnicity and the OMB categories.

Rollup Issues

The CDC/HL7 Code Set was designed in a hierarchical fashion such that each ethnicity category corresponds to one of the OMB race or Hispanic ethnicity categories (see Figure 3-3). This rollup scheme can be used when reporting is required to conform to the OMB categories or when an analyst needs a consistent set of minimum categories to make comparisons across systems reporting race and ethnicity at different levels of detail. For the vast majority of individuals, mapping from ethnicity to race categories is not problematic. As discussed in Chapter 1, however, ethnicity and race are two different concepts. Individuals who self-identify as Brazilian may also identify as White, Black, or some combination of races, or may see themselves as falling into no category beyond Brazilian. As a result, a rollup scheme that assumes all respondents who self-identify as Brazilian are White could wrongly assign a race to a number of individuals.

FIGURE 3-3 CDC ethnicities rolled up to the OMB minimum categories for race and Hispanic ethnicity with subcommittee annotations.

Figure 3-3 highlights some problems with current CDC rollup procedures. For example, Brazilians may not be considered Hispanic because they speak Portuguese rather than Spanish. Additionally, several national origins correspond to two or more major racial populations. For instance, the population of Madagascar is of mixed African, Malayo-Indonesian, and Arab ancestry. This means that rolling up Madagascan to Asian, as recommended by the CDC rollup scheme, would misclassify Africans of Madagascan descent as Asian. Rollup schemes are further complicated by misclassifications introduced by the use of geographic boundaries. While the CDC rollup scheme considers Afghanistan to be Middle Eastern and consequently categorizes Afghans as White, the Census ancestry list classifies Afghanistan as an Asian country. Additionally, the WCRS coding manual notes that descriptions of religious affiliation should be “used with caution” when determining corresponding races.19

The above discussion highlights some of the difficulties inherent in rolling up some ethnicities because (1) ethnicities can include two or more major racial populations, (2) the geographic boundaries used to distinguish major groups in different classification schemes are arbitrary, and (3) many individuals may not associate with a specific race for cultural or other reasons. Thus, an individual’s race cannot always be presumed based on his or her ethnicity. For this reason, the rollup assignment of a self-reported ethnicity to an OMB category should not be placed in an individual’s health record or supersede a person’s direct self-report. Analysts should understand that making an assignment using a 90 percent (or any other percentage) threshold or an assignment based solely on geography incurs a higher probability that the rollup assignment misclassifies individuals relative to how they would self-identify their race. The rates of misclassification, even for granular ethnicities meeting a 90 percent threshold, underscore the fact that rollup schemes provide only probabilistic assignments useful for analysis at the group or population level.

Granular Ethnicities with an Indeterminate Race or Hispanic Ethnicity Classification

Various methods are used to distinguish ethnic groups that cannot be rolled up to a specific race category. For example, in Census 2010, the Census Bureau will use OMB’s geographic definitions when it reclassifies ethnic responses in the race question to an OMB race category (e.g., all entries reflecting a sub-Saharan African nation will be counted as “Black”). In Census 2000, the Census Bureau applied a 90 percent rule to reclassify write-in responses on the race question according to the OMB race categories (del Pinal et al., 2007).20 Single-ancestry responses were cross-tabulated by race responses, and if 90 percent or more of respondents in a specific ancestry group selected a particular race, that race was assigned to respondents who gave that ethnic response in the race question.

To determine whether groups included on the CDC, Census, Massachusetts, and WCRS category lists can be rolled up to a specific OMB race category with some degree of certainty, the subcommittee evaluated 2000 Public Use Microdata Samples (PUMS) data and used the methodology of the Census Bureau’s 90 percent rule. The subcommittee cross-tabulated write-in responses on ancestry with the “alone or in combination with one or more other races” variable for each OMB race group. If fewer than 90 percent of respondents of a specific ancestry group selected an OMB race either alone or in combination with another race, the ancestry group was identified as being problematic for rolling up. The subcommittee did not have sufficient data on some granular ethnicity groups to apply the 90 percent rule to each ancestry subgroup (see Appendix F). The subcommittee finds some granular ethnicities could not be rolled up to an OMB race category with greater than 90 percent certainty. The difficult-to-categorize granular ethnicity groups are included in Appendix F.
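The 90 percent rule described above reduces to a share calculation on a cross-tabulation. The following is a minimal sketch, assuming the counts have already been tabulated; the function name and the example counts are hypothetical and are not PUMS results. Because the counts use the "alone or in combination" variable, one respondent can be counted under more than one race, so the shares need not sum to 1.

```python
from typing import Dict, Tuple

INDETERMINATE = "no determinate OMB race classification"


def rollup_by_threshold(
    race_counts: Dict[str, int],
    total_respondents: int,
    threshold: float = 0.90,
) -> Tuple[str, float]:
    """Assign an ancestry group to an OMB race only if enough respondents chose it.

    race_counts maps each OMB race to the number of respondents in the ancestry
    group who selected it, alone or in combination with other races.
    """
    if total_respondents <= 0 or not race_counts:
        raise ValueError("need at least one respondent and one race count")
    race, count = max(race_counts.items(), key=lambda item: item[1])
    share = count / total_respondents
    return (race if share >= threshold else INDETERMINATE, share)


# Hypothetical ancestry groups of 1,000 respondents each.
group_a = rollup_by_threshold({"White": 950, "Asian": 40, "Black or African American": 30}, 1000)
group_b = rollup_by_threshold({"White": 600, "Black or African American": 450}, 1000)
```

In this illustration the first group clears the threshold and the second does not, so the second would be flagged as problematic for rolling up.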

The subcommittee suggests that those ethnicities that do not meet the 90 percent threshold be classified as “no determinate OMB race classification.” This classification differs from the “Some other race” category because “Some other race” is a response option used by individuals who do not identify with a specific OMB race category. The “no determinate OMB race classification” would be used to identify entire ethnic groups that cannot be assumed to comprise one specific racial group. None of the granular ethnicities associated with the Hispanic ethnicity category can be assigned to an OMB race category with greater than 90 percent certainty. Granular ethnicities that cannot easily be rolled up to the OMB Hispanic ethnicity category include individuals identifying a granular ethnicity associated with the non-Spanish-speaking territories in South and Central America (Guyana, Suriname, Brazil, and Belize); additionally, these granular ethnicities should be considered “no determinate OMB race classification” because they do not meet the 90 percent rule. Appendix F highlights some additional difficult-to-categorize granular ethnicity groups, including persons of Moroccan, Brazilian, Cape Verdean, Dominican, Guyanese, and South African descent.

19

The Census list of categories does not include religiously affiliated ancestries (e.g., Ashkenazi Jewish) because of the Bureau’s constitutionally rooted decision not to identify or count religious populations. For health care purposes, religion may be coded as a separate variable from race and ethnicity. For example, the HL7 EHR System Functional Model states that systems shall provide the ability to capture, present, maintain, and make available for clinical decisions patient preferences such as language, religion, spiritual practices, and culture (Fischetti et al., 2007).

20

Write-in responses to the questions on race and Hispanic ethnicity were allocated to an OMB race or Hispanic ethnicity category using the 90 Percent Rule only in the Census’ Modified Race-Age-Sex (MARS) file. The MARS file is used by other agencies seeking denominators consistent with numerators collected in systems in which “Some other race” is not an option. Otherwise, write-in responses to “Some other race” are reported as they were received in all data released and published by the Bureau.

Rollup Schemes

For interventions aimed at quality improvement and reduction of disparities at the local level, mapping granular ethnicities to the OMB race categories may be unnecessary. Locally tailored quality improvement activities may target subgroups without needing to relate those subgroups to a single OMB race category. Collecting race, Hispanic ethnicity, and granular ethnicity data separately allows reporting of the OMB categories when necessary without requiring rollup of the granular ethnicities, provided that individuals respond to all the questions asked.

Nonetheless, the subcommittee recognizes that some circumstances will require the use of a rollup scheme to link granular ethnicities to broader categories to allow comparison or data aggregation. The Massachusetts Superset was developed to guide health plans toward a uniform set of ethnicities; this set avoids rolling up granular ethnicities to races and instead aggregates granular ethnicities into broader groups of ethnicities. Such an ethnicity rollup scheme is useful when the sample of a granular ethnicity group is too small for analysis and needs to be aggregated with others.

The subcommittee merged several ethnicity lists into a template of granular ethnicity categories. These categories are mapped to the OMB race and Hispanic ethnicity categories (see Appendix E). National agreement needs to be reached on a rollup scheme, recognizing that all ethnicities do not necessarily map to an OMB race category, so that some respondents will have “no determinate OMB classification.” The locus of responsibility for the development of a national standard set of ethnicity categories and a national rollup scheme is addressed in Chapter 6.

Recommendation 3-2: Any entity collecting data from individuals for purposes related to health and health care should collect granular ethnicity data in addition to data in the OMB race and Hispanic ethnicity categories and should select the granular ethnicity categories to be used from a national standard set. When respondents do not self-identify as one of the OMB race categories or do not respond to the Hispanic ethnicity question, a national scheme should be used to roll up the granular ethnicity categories to the applicable broad OMB race and Hispanic ethnicity categories to the extent feasible.
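The precedence implied by this recommendation, self-report first and rollup only as a fallback, can be sketched as follows. This is an illustration under stated assumptions, not a national scheme: the two-entry lookup table is a hypothetical stand-in for an agreed-upon national rollup, the names are invented for the example, and treating an unlisted ethnicity as indeterminate is a choice made here. The result is tagged with its source so that a derived assignment is used for analysis and is never mistaken for the person's own answer.

```python
from dataclasses import dataclass
from typing import Optional

INDETERMINATE = "no determinate OMB race classification"

# Hypothetical excerpt of a rollup table; a real one would come from a national scheme.
ROLLUP = {
    "Korean": "Asian",
    "Brazilian": INDETERMINATE,
}


@dataclass(frozen=True)
class ReportedRace:
    category: str
    source: str  # "self-report" or "rollup"


def race_for_reporting(
    self_reported_race: Optional[str],
    granular_ethnicity: Optional[str],
) -> Optional[ReportedRace]:
    """Prefer the person's own answer; roll up only when no race was given."""
    if self_reported_race:
        return ReportedRace(self_reported_race, "self-report")
    if not granular_ethnicity:
        return None  # nothing to report and nothing to roll up
    # Unlisted ethnicities are left indeterminate rather than guessed.
    return ReportedRace(ROLLUP.get(granular_ethnicity, INDETERMINATE), "rollup")
```

A person who answers both questions is therefore never reassigned, which matches the principle that a rollup should not supersede direct self-report.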

ELICITING RESPONSES ON RACE, HISPANIC ETHNICITY, AND GRANULAR ETHNICITY

The ways in which entities inquire about an individual’s race and ethnicity vary based on the setting in which the questions are asked. For example, paper survey forms use minimal words in questions and category descriptions to solicit race and ethnicity information from respondents. In contrast, surveys administered via an in-person interview can solicit more detailed information and explain the types of responses desired. Table 3-6 highlights ways in which race and ethnicity data are captured and illustrates how the questions may be tailored to specific contexts in health care.

Eliciting accurate and reliable race, Hispanic ethnicity, and granular ethnicity data depends on the ways in which the questions are asked, the instructions provided to respondents (e.g., “Select one or more”), and the format of the questions (i.e., one-question versus two-question format). As previously noted, this latter concern is especially relevant to accurately classifying individuals who self-identify as Hispanic. Ensuring that as many respondents as possible answer questions regarding their race and ethnicity will improve data quality. Pilot projects and further study can help determine the best ways to elicit accurate data that are useful for health care quality improvement and will guide current and future data collection systems.

Recommendation 3-3: To determine the utility for health and health care purposes, HHS should pursue studies on different ways of framing the questions and related response categories for collecting race and ethnicity data at the level of the OMB categories, focusing on completeness and accuracy of response among all groups.

  • Issues addressed should include use of the one- or two-question format for race and Hispanic ethnicity, whether all individuals understand and identify with the OMB race and Hispanic ethnicity categories, and the increasing size of populations identifying with “Some other race.”

  • The results of such studies, together with parallel studies by the Census Bureau and other agencies, may reveal the need for an OMB review across all agencies to determine the best format for improving response among all groups.

MODELS FOR DATA COLLECTION

Figure 3-4 shows models for the collection of data on race, Hispanic ethnicity, and granular ethnicity, taking into account that the capacity of information systems may limit the number of questions that can be asked. This report emphasizes the importance of collecting granular ethnicity data in addition to the OMB race and Hispanic ethnicity questions. OMB’s preferred approach asks two separate questions about Hispanic ethnicity and race; asking additionally about granular ethnicity therefore requires collecting three separate variables, whether collection is paper-based or electronic (Model A). For organizations constrained to two data fields, one field would collect responses to the OMB combined race and Hispanic ethnicity question, and a second field would collect granular ethnicity data (Model B).
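The two models can be pictured as record layouts. The following is a minimal sketch in Python, not a format specified in this report: the class names, field names, and the to_model_b helper are hypothetical and introduced only for illustration. It assumes that folding a Model A record into Model B means listing "Hispanic or Latino" alongside the selected races in the combined field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelARecord:
    """Three variables: Hispanic ethnicity, race, and granular ethnicity."""
    hispanic_ethnicity: Optional[str]  # "Hispanic or Latino" / "Not Hispanic or Latino"
    races: List[str] = field(default_factory=list)  # "Select one or more"
    granular_ethnicities: List[str] = field(default_factory=list)


@dataclass
class ModelBRecord:
    """Two variables: combined race and Hispanic ethnicity, and granular ethnicity."""
    race_and_hispanic_ethnicity: List[str] = field(default_factory=list)
    granular_ethnicities: List[str] = field(default_factory=list)


def to_model_b(record: ModelARecord) -> ModelBRecord:
    """Fold a three-field record into the two-field layout (illustrative only)."""
    combined = list(record.races)
    if record.hispanic_ethnicity == "Hispanic or Latino":
        # In the combined question, Hispanic or Latino is one of the listed choices.
        combined.insert(0, "Hispanic or Latino")
    return ModelBRecord(combined, list(record.granular_ethnicities))
```

In this sketch the conversion is lossy: a Model B record cannot distinguish a respondent who answered "Not Hispanic or Latino" from one who left the Hispanic ethnicity question blank.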

A distinction needs to be made between limits on the collection of coded response information and limits on its storage in HIT systems. Some organizations are limited in storage capacity by their legacy HIT systems but could recode responses from multiple inputs to occupy fewer fields. For example, if an individual self-identified as non-Hispanic, White, and Russian on a paper form, the organization could store this information using one code in its HIT system. Doing so would, of course, introduce a very large number of possible combinations for which the organization would need to have codes.21 Ultimately, to achieve compatibility across data systems, it may be necessary for organizations to upgrade their data collection and HIT systems to ensure the ability to collect, report, and use data as recommended in this report.
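The count given in footnote 21 can be checked directly. The sketch below assumes that the "six OMB categories" are the five OMB race categories plus Hispanic or Latino, and that the 64 combinations include the case in which no category is selected (2^6 = 64). The bitmask scheme and the function names are illustrative assumptions, not a coding system described in this report.

```python
from itertools import combinations

# Assumed reading of "the six OMB categories": five races plus Hispanic or Latino.
OMB_CATEGORIES = [
    "Hispanic or Latino",
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
]


def encode(selected):
    """Pack a set of selected categories into a single integer code."""
    code = 0
    for name in selected:
        code |= 1 << OMB_CATEGORIES.index(name)
    return code


def decode(code):
    """Recover the selected categories, in list order, from a single code."""
    return [name for bit, name in enumerate(OMB_CATEGORIES) if code & (1 << bit)]


# Every subset of the six categories, from none selected to all six selected.
all_combinations = [
    subset
    for size in range(len(OMB_CATEGORIES) + 1)
    for subset in combinations(OMB_CATEGORIES, size)
]
```

Under this scheme each additional selectable category doubles the number of possible codes, which illustrates why folding granular ethnicities into a single combined code quickly becomes unwieldy.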

SUMMARY

This chapter has explained the subcommittee’s rationale for recommending continued use of the OMB race and Hispanic ethnicity categories, supplemented by locally relevant granular ethnicity categories. The health and health care needs of all racial and ethnic groups can be best addressed through comprehensive strategies that recognize the importance of documenting and addressing variations among and within the locally relevant groups, and that further provide procedures for aggregating data to provide regional or national profiles.

To collect OMB race and ethnicity data, entities should use either the one-question or two-question format, depending on their system’s field capacity. In accordance with OMB guidance, when the two-question format is used, the Hispanic ethnicity question should be first, and a “Select one or more” instruction should be included; OMB has indicated a preference for the two-question format. The recording of specific multiracial combinations (e.g., American Indian or Alaska Native and Black) is preferred by the subcommittee over assigning a single “multiracial” category to all persons of mixed race. A “Some other race” response category should be included for questions on race for respondents who do not identify with any of the OMB race categories. The minimum OMB categories to be collected are, then:

  • Hispanic or Latino (in the two-question format, this is a separate question, having the choice of Hispanic or Latino and Not Hispanic or Latino)

  • Black or African American

  • White

  • Asian

  • American Indian or Alaska Native

  • Native Hawaiian or Other Pacific Islander (NHOPI)

  • Some other race

21 All possible combinations of just the six OMB categories result in 64 combinations. Introducing granular ethnicities would drastically increase the number of possible combinations.

TABLE 3-6 Examples of Instructions, Phrasing, and Terminology to Capture Race and Ethnicity Data

Source of Questions: OMB’s preferred format

  • Hispanic Ethnicity Question: Separate questions shall be used wherever feasible. Ethnicity shall be collected first. Response options: Hispanic or Latino; Not Hispanic or Latino

  • Race Question: Respondents shall be offered the option of selecting one or more racial designations. Recommended forms for the instruction are “Mark one or more” and “Select one or more.” Response options: American Indian or Alaska Native; Asian; Black or African American; Native Hawaiian or Other Pacific Islander (NHOPI); White

  • Granular Ethnicity Question: (none)

Source of Questions: Census 2000 long form (paper form)

  • Hispanic Ethnicity Question: Is Person 1 of Hispanic, Latino, or Spanish origin? Response options: Not of Hispanic, Latino, or Spanish origin; Mexican, Mexican Am., Chicano; Puerto Rican; Cuban; Another Hispanic, Latino, or Spanish origin

  • Race Question: What is Person 1’s race? Mark one or more boxes. Response options: Five OMB race options plus six additional Asian origins, three additional NHOPI origins, and an option for “Some other race”

  • Granular Ethnicity Question: What is this person’s ancestry or ethnic origin? Response option: write-in response

Source of Questions: HRET Toolkit (in-person interview)

  • Hispanic Ethnicity Question: Do you consider yourself Hispanic/Latino? Response options: Yes; No; Declined; Unavailable/Unknown

  • Race Question: Which category best describes your race? Response options: American Indian or Alaska Native; Asian; Black or African American; NHOPI; White; Multiracial; Declined; Unavailable/Unknown

  • Granular Ethnicity Question: “I would like you to describe your race or ethnic background. You can use specific terms such as Korean, Mexican, Haitian, Somali.” Response option: free-text response

Source of Questions: National Health Interview Survey (NHIS) (in-person interview)

  • Hispanic Ethnicity Question: Do you consider yourself to be Hispanic or Latino? Response options: Yes; No; Refused; Don’t know. Follow-up: Please give me the number of the group that represents your Hispanic origin or ancestry. You may choose up to five, if applicable. Response options: Puerto Rico; Cuban/Cuban American; Dominican (Republic); Mexican; Mexican American; Central or South American; Other Latin American; Other Hispanic/Latino/Spanish; Refused; Don’t know

  • Race Question: What race or races do you consider yourself to be? Please select one or more of these categories. Response options: White; Black/African American; Indian (American); Alaska Native; Guamanian; Samoan; Other Pacific Islander; Asian Indian; Chinese; Filipino; Japanese; Korean; Vietnamese; Other Asian; Some other race; Refused; Don’t know. Follow-up: (If more than one race entered, which of these groups would you say best represents your race?)

  • Granular Ethnicity Question: (none)

Source of Questions: National Ambulatory Medical Care Survey (paper form)

  • Hispanic Ethnicity Question: Ethnicity. Response options: Hispanic or Latino; Not Hispanic or Latino

  • Race Question: Race, mark one or more. Response options: White; Black/African American; Asian; NHOPI; American Indian or Alaska Native

  • Granular Ethnicity Question: (none)

Source of Questions: Application for a Social Security Card (paper form)

  • Hispanic Ethnicity Question: (none)

  • Race Question: Race/ethnic description (check one only). Response options: Asian, Asian-American or Pacific Islander; Hispanic; Black (Not Hispanic); North American Indian or Alaskan Native; White (Not Hispanic)

  • Granular Ethnicity Question: (none)

Source of Questions: U.S. Standard Certificate of Death (paper form)

  • Hispanic Ethnicity Question: Decedent of Hispanic origin? Response options: No, not Spanish/Hispanic/Latino; Mexican, Mexican American, Chicano; Puerto Rican; Cuban; Other Spanish/Hispanic/Latino (specify)

  • Race Question: Decedent’s race (check one or more boxes to indicate what the decedent considered himself or herself to be). Response options: Five OMB race options plus six additional Asian origins, three additional NHOPI origins, and other (specify)

  • Granular Ethnicity Question: (none)

The categories used for the collection of granular ethnicity should be locally relevant and selected from a national standard list. Each set of categories should include an “Other, please specify:__” option to allow individuals to self-identify if their category is not on the prespecified list. Similarly, state or national surveys might limit the number of listed categories, but should also present the “Other, please specify:__” response option. An open-ended approach with no prespecified granular ethnicity response categories is acceptable in lieu of a specified list, but requires subsequent coding of responses according to the national standard set. The granular ethnicity question, whether presented as a closed- or open-ended question, should be separate from the question(s) involving the OMB categories.

Organizations may also want to use codes for tracking the current response status of individuals from whom they have attempted to collect race and ethnicity data, indicating unavailable (no response), declined (refused to answer), or unknown (respondent does not know) for those who fail to select a category.
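As a rough illustration of how write-in responses and response status codes might be handled together, consider the following sketch. The category list is a hypothetical five-item excerpt standing in for a national standard set, and the function name, return shape, and exact-match lookup are assumptions made for this example rather than procedures defined in this report.

```python
from enum import Enum
from typing import Optional, Tuple


class ResponseStatus(Enum):
    ANSWERED = "answered"
    UNAVAILABLE = "unavailable"  # no response
    DECLINED = "declined"        # refused to answer
    UNKNOWN = "unknown"          # respondent does not know


# Hypothetical excerpt; a real system would load the full national standard list.
STANDARD_GRANULAR_ETHNICITIES = {
    "korean": "Korean",
    "mexican": "Mexican",
    "haitian": "Haitian",
    "somali": "Somali",
    "russian": "Russian",
}


def code_response(
    raw: Optional[str],
    reason: ResponseStatus = ResponseStatus.UNAVAILABLE,
) -> Tuple[ResponseStatus, Optional[str], Optional[str]]:
    """Return (status, coded category, retained write-in text).

    `reason` records why no category was selected when `raw` is empty.
    """
    if raw is None or not raw.strip():
        return reason, None, None
    text = raw.strip()
    category = STANDARD_GRANULAR_ETHNICITIES.get(text.casefold())
    if category is None:
        # Not on the list: keep the respondent's own words under "Other".
        return ResponseStatus.ANSWERED, "Other", text
    return ResponseStatus.ANSWERED, category, None
```

Retaining the original write-in text under "Other" leaves open the possibility of recoding it later if the standard list is expanded.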

FIGURE 3-4 Models for data collection instruments to collect race, Hispanic ethnicity, and granular ethnicity data.

REFERENCES

AHRQ (Agency for Healthcare Research and Quality). 2006. Databases and related tools from HCUP: Fact sheet. Rockville, MD: AHRQ.

———. 2008. National Healthcare Disparities Report. Rockville, MD: AHRQ.

Arias, E., W. Schauman, K. Eschbach, P. Sorlie, and E. Backlund. 2008. The validity of race and Hispanic origin reporting on death certificates in the United States. Hyattsville, MD: National Center for Health Statistics.

Austin, C. J., R. J. Thorpe, C. Bell, and T. A. LaVeist. 2009. Are Black Hispanics Black or Hispanic? Understanding disparities at the intersection of race and ethnicity. Paper to be presented at the American Public Health Association Annual Meeting and Expo, Philadelphia, PA, on November 10, 2009.

Bailey, B. 2001. Dominican-American ethnic/racial identities and United States social categories. International Migration Review 35(3):677-708.

Baker, D. W., K. A. Cameron, J. Feinglass, J. A. Thompson, P. Georgas, S. Foster, D. Pierce, and R. Hasnain-Wynia. 2006. A system for rapidly and accurately collecting patients’ race and ethnicity. American Journal of Public Health 96(3):532-537.

Barnes, J. S., and C. E. Bennett. 2002. The Asian population: 2000. Washington, DC: U.S. Census Bureau.

Bennett, C., M. de la Puente, D. Griffin, B. Harris-Kojetin, R. Harrison, J. Hill, J. Hilton, T. Leslie, and E. Paisano. 1997. Population Division Working Paper No. 18: Results of the 1996 Race and Ethnic Targeted Test. Washington, DC: U.S. Bureau of the Census.

Bilheimer, L. T., and J. E. Sisk. 2008. Collecting adequate data on racial and ethnic disparities in health: The challenges continue. Health Affairs 27:383-391.

Blendon, R. J., T. Buhr, E. F. Cassidy, D. J. Perez, K. A. Hunt, C. Fleischfresser, J. M. Benson, and M. J. Herrmann. 2007. Disparities in health: Perspectives of a multi-ethnic, multi-racial America. Health Affairs 26(5):1437-1447.

Bonito, A. J., C. Bann, C. Eicheldinger, and L. Carpenter. 2008. Creation of new race-ethnicity codes and socioeconomic status (SES) indicators for Medicare beneficiaries. Final report, sub-task 2. Rockville, MD: RTI International.

Bowman, K. H. 1994. What we call ourselves. The Public Perspective May/June:29-31.

Buescher, P. A., Z. Gizlice, and K. A. Jones-Vessey. 2005. Discrepancies between published data on racial classification and self-reported race: Evidence from the 2002 North Carolina live birth records. Public Health Reports 120(4):393-398.

CDC (Centers for Disease Control and Prevention). 1993. Use of race and ethnicity in public health surveillance (Report RR-10). Atlanta, GA: Centers for Disease Control and Prevention.

———. 2000. Race and ethnicity code set version 1.0. Atlanta, GA: Centers for Disease Control and Prevention.

———. 2009. National program of cancer registries facts, 2008/2009. Atlanta, GA: Centers for Disease Control and Prevention.

Chesnut, J., J. Woodward, and E. Wilson. 2007. A comparison of closed- and open-ended question formats for select housing characteristics in the 2006 American Community Survey Content Test. Washington, DC: U.S. Bureau of the Census.

CMS (Centers for Medicare and Medicaid Services). 2009. Physician quality reporting initiative. http://www.cms.hhs.gov/pqri (accessed May 22, 2009).

Cobb, N., P. A. Wingo, and B. K. Edwards. 2008. Introduction to the supplement on cancer in the American Indian and Alaska Native populations in the United States. Cancer 113(Suppl):1113-1116.

Cohen, L. L. 2008. Racial/ethnic disparities in hospice care: A systematic review. Journal of Palliative Medicine 11(5):763-768.

Cresce, A. R., A. D. Schmidley, and R. R. Ramirez. 2004. Identification of Hispanic ethnicity in Census 2000: Analysis of data quality for the question on Hispanic origin. Washington, DC: U.S. Census Bureau.

del Pinal, J. 2004. Census 2000 testing, experimentation, and evaluation, program topic report No. 9, TR-9: Race and ethnicity in Census 2000. Washington, DC: U.S. Census Bureau.

del Pinal, J., E. Martin, C. Bennett, and A. Cresce. 2007. Overview of results of new race and Hispanic origin questions in Census 2000. Washington, DC: U.S. Census Bureau.

Durch, J. S., and J. H. Madans. 2001. Methodological issues for vital rates and population estimates: The 1997 OMB standards for data on race and ethnicity. Vital Health Statistics 4(31).

Edmonston, B., S. M. Lee, and J. S. Passell. 2000 September 22-23. Recent trends in intermarriage and immigration and their effects on the future racial composition of the U.S. Population. Paper presented at Multiraciality: How Will the New Census Data be Used?, Bard College, Annandale-on-Hudson, New York.

Edwards, B. K. 2009. NCI surveillance research program: SEER, standards for collection of race/ethnicity, measuring health disparities in cancer surveillance. NCI Surveillance Research Program. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint Presentation.

Eicheldinger, C., and A. Bonito. 2008. More accurate racial and ethnic codes for Medicare administrative data. Health Care Financing Review 29(3):27-42.

Espey, D. K., C. L. Wiggins, M. A. Jim, B. A. Miller, C. J. Johnson, and T. M. Becker. 2008. Methods for improving cancer surveillance data in American Indian and Alaska Native populations. Cancer 113(5 Suppl):1120-1130.

Fischetti, L., D. Mon, J. Ritter, and D. Rowlands. 2007. HL7 EHR TC: Electronic health record system functional model, Release 1. Ann Arbor, MI: Health Level Seven.

Flores, G., and S. C. Tomany-Korman. 2008. Racial and ethnic disparities in medical and dental health, access to care, and use of services in US children. Pediatrics 121(2):e286-e298.

Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×

Ford, M. E., and P. A. Kelly. 2005. Conceptualizing and categorizing race and ethnicity in health services research. Health Services Research 40(5):1658-1675.

Fraser, I., and R. Andrews. 2009. HCUP data in the National Healthcare Quality & Disparities Reports: Current strengths and potential improvements. Agency for Healthcare Research and Quality. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 10, 2009. Washington, DC. PowerPoint Presentation.

Friedman, D. J., B. B. Cohen, A. R. Averbach, and J. M. Norton. 2000. Race/ethnicity and OMB Directive 15: Implications for state public health practice. American Journal of Public Health 90:1714-1719.

Gimenez, M. E. 1989. Latino/’Hispanic’ who needs a name: The case against a standardized terminology. International Journal of Health Services 19:557-571.

Gold, M., A. H. Dodd, and M. Neuman. 2008. Availability of data to measure disparities in leading health indicators at the state and local levels. Journal of Public Health Management Practice (Suppl):S36-S44.

Grieco, E. M., and R. C. Cassidy. 2001. Overview of race and Hispanic origin. Washington, DC: U.S. Census Bureau.

Hasnain-Wynia, R., R. Kang, M. B. Landrum, C. Vogeli, D. W. Baker, and J. S. Weissman. 2008 June 8. Disparities within and between hospitals for inpatient quality of care: Targeting resources to close the gap. Paper presented at the Academy Health 2008 Annual Research Meeting, Washington, DC.

Hayes-Bautista, D. E., and J. Chapa. 1987. Latino terminology: Conceptual bases for standardized terminology. American Journal of Public Health 77:61-68.

Hernandez-Ramdwar, C. 1997. Multiracial identities in Trinidad and Guyana: Exaltation and ambiguity. Latin American Issues 13(4). http://webpub.allegheny.edu/group/LAS/LatinAmIssues/Articles/LAI_vol_13_section_IV.html (accessed June 18, 2009).

HHS Data Council. 1999. Improving the collection and use of racial and ethnic data in health and human services. Washington, DC: HHS.

Hirschman, C., R. Alba, and R. Farley. 2000. The meaning and measurement of race in the U.S. Census: Glimpses into the future. Demography 37(3):381-393.

Holup, J. L., N. Press, W. M. Vollmer, E. L. Harris, T. M. Vogt, and C. Chen. 2007. Performance of the U.S. Office of Management and Budget’s revised race and ethnicity categories in Asian populations. International Journal of Intercultural Relations 31(5):561-573.

Humes, K. 2009. Remarks by Karen Humes, Assistant Division Chief for Special Population Statistics in the Population Division of the U.S. Census Bureau. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC.

IOM (Institute of Medicine). 2008. Challenges and successes in reducing health disparities: Workshop summary. Washington, DC: The National Academies Press.

Izquierdo, J. N., and V. J. Schoenbach. 2000. The potential and limitations of data from population-based state cancer registries. American Journal of Public Health 90(5):695-698.

Jerant, A., R. Arellanes, and P. Franks. 2008. Health status among US Hispanics: Ethnic variation, nativity, and language moderation. Medical Care 46(7):709-717.

Johnson, C. H., and M. Adamo. 2008. The SEER program coding and staging manual 2007. Bethesda, MD: National Cancer Institute.

Jones, N. A., and A. S. Smith. 2001. The two or more races population: 2000. Washington, DC: U.S. Census Bureau.

Kaiser Family Foundation. 2009. Putting women’s health care disparities on the map: Examining racial and ethnic disparities at the state level. Menlo Park, CA: The Henry J. Kaiser Family Foundation.

Kaiser Permanente. 2009. Evolution of data collection on race, ethnicity, and language. Oakland, CA: Kaiser Permanente.

Lancaster, K. J., S. O. Watts, and L. B. Dixon. 2006. Dietary intake and risk of coronary heart disease differ among ethnic subgroups of Black Americans. Journal of Nutrition 136:446-451.

Laws, M. B., and R. A. Heckscher. 2002. Racial and ethnic identification practices in public health data systems in New England. Public Health Reports 117(1):50-61.

Lin, S. S., and J. L. Kelsey. 2000. Use of race and ethnicity in epidemiologic research: Concepts, methodological issues, and suggestions for research. Epidemiologic Reviews 22(2):187-202.

Llanos, K., and L. Palmer. 2006. Using data on race and ethnicity to improve health care quality for Medicaid beneficiaries. Hamilton, NJ: Center for Health Care Strategies.

Madans, J. H. 2009. Race/ethnic data collection: Population surveys and administrative records. National Center for Health Statistics. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint Presentation.

Martin, E., T. J. Demaio, and P. C. Campanelli. 1990. Context effects for Census measures of race and Hispanic origin. Public Opinion Quarterly 54:551-566.

Massachusetts Executive Office of Health and Human Services. 2009a. 129 CMR 2.00: Uniform reporting system for health care claims data sets. Boston, MA: Massachusetts Health Care Quality and Cost Council.

———. 2009b. FY2007 inpatient hospital discharge database documentation manual. Boston, MA: Division of Health Care Finance and Policy.

Mays, V. M., N. A. Ponce, D. L. Washington, and S. D. Cochran. 2003. Classification of race and ethnicity: Implications for public health. Annual Review of Public Health 24:83-110.

McAlpine, D. D., T. J. Beebe, M. Davern, and K. T. Call. 2007. Agreement between self-reported and administrative race and ethnicity data among Medicaid enrollees in Minnesota. Health Services Research 42(6p2):2373-2388.

McBean, A. 2006. Improving Medicare’s data on race and ethnicity, Medicare brief. Washington, DC: National Academy of Social Insurance.

Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×

McKenney, N. R., and C. E. Bennett. 1994. Issues regarding data on race and ethnicity: The Census Bureau experience. Public Health Reports 109(1):16-25.

Medi-Cal. 2009. Apply for Medi-Cal, individuals and families. http://www.dhcs.ca.gov/services/medi-cal/Pages/MCIndividual.aspx (accessed June 2009).

Model, S., and G. Fisher. 2008. Penalized for race, penalized for ethnicity: The earnings of Cape Verdean immigrants. Paper presented at American Sociological Association Annual Meeting, Sheraton Boston and the Boston Marriott Copley Place, Boston, MA.

NRC (National Research Council). 2006. Multiple origins, uncertain destinies: Hispanics and the American future. Edited by M. Tienda and F. Mitchell. Washington, DC: The National Academies Press.

———. 2009. Experimentation and evaluation plans for the 2010 Census: Letter report. Washington, DC: The National Academies Press.

OMB (Office of Management and Budget). 1977. Statistical policy directive No. 15, race and ethnic standards for federal statistics and administrative reporting. http://wonder.cdc.gov/wonder/help/populations/bridged-race/Directive15.html (accessed August 3, 2009).

———. 1997a. Recommendations from the Interagency Committee for the Review of the Racial and Ethnic Standards to the Office of Management and Budget concerning changes to the standards for the classification of federal data on race and ethnicity. Federal Register (3110-01):36873-36946.

———. 1997b. Revisions to the standards for the classification of federal data on race and ethnicity. Federal Register 62:58781-58790.

———. 2000. OMB bulletin No. 00-02. http://www.whitehouse.gov/omb/bulletins/b00-02.html (accessed January 14, 2009).

Palmer, L. 2004. Learning from each other: Assessing how Medicaid agencies collect racial/ethnic data. Hamilton, NJ: Center for Health Care Strategies. PowerPoint Presentation.

Portes, A., and R. G. Rumbaut. 2001. Legacies: The story of the immigrant second generation. Berkeley, CA: University of California Press and Russell Sage Foundation.

Read, J. G., M. O. Emerson, and A. Tarlov. 2005. Implications of Black immigrant health for U.S. racial disparities in health. Journal of Immigrant Health 7(3):205-212.

Reilly, T. 2009. Data improvement efforts: Centers for Medicare & Medicaid Services. Centers for Medicare & Medicaid Services. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint Presentation.

Romano, P. S., J. J. Geppert, S. Davies, M. R. Miller, A. Elixhauser, and K. M. McDonald. 2003. A national profile of patient safety in U.S. hospitals. Health Affairs 22(2):154-166.

Rosenberg, H. M., J. D. Maurer, P. D. Sorlie, N. J. Johnson, M. F. MacDorman, D. L. Hoyert, J. F. Spitler, and C. Scott. 1999. Quality of death rates by race and Hispanic origin: A summary of current research, 1999. Vital Health Statistics 2(128):1-13.

Rothenberg, P. S. 2006. Race, class, and gender in the United States: An integrated study, seventh edition. New York: Macmillan.

Schoenman, J. A., J. P. Sutton, S. Kintala, D. Love, and R. Maw. 2005. The value of hospital discharge databases. Rockville, MD: AHRQ.

Sequist, T. D., and E. C. Schneider. 2006. Addressing racial and ethnic disparities in health care: Using federal data to support local programs to eliminate disparities. Health Services Research 41(4p1):1451-1468.

Shah, N. S., and O. Carrasquillo. 2006. Twelve-year trends in health insurance coverage among Latinos, by subgroup and immigration status. Health Affairs 25(6):1612-1619.

Snipp, C. M. 1989. American Indians: The first of this land. New York: Russell Sage.

———. 2003. Racial measurement in the American Census: Past practices and implications for the future. Annual Review of Sociology 29:563-588.

Tafoya, S. 2004. Shades of belonging: Latinos and racial identity. Washington, DC: Pew Hispanic Center.

Taylor-Clark, K. 2009. Race/ethnicity/language data collection and reporting. The Brookings Institution. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint Presentation.

Taylor-Clark, K., A. B. Anise, Y. Joo, and M. Chin. 2009. Massachusetts Superset. Washington, DC: The Brookings Institution.

Tiutin, O. 2009. Language assistance data base (LADB), based on CDC race/ethnicity codes and ISO language codes. Martinez, CA: Contra Costa Health Plan.

Tucker, C., R. McKay, B. Kojetin, R. Harrison, M. de la Puente, L. Stinson, and E. Robinson. 1996. Testing methods of collecting racial and ethnic information: Results of the Current Population Survey Supplement on Race and Ethnicity. Washington, DC: Bureau of Labor Statistics.

U.S. Census Bureau. 1996. Population division working paper No. 16: Findings on questions on race and Hispanic origin tested in the 1996 National Content Survey. Washington, DC: U.S. Census Bureau.

———. 2000. Census 2000 summary file 1: 100-percent data. Washington, DC: U.S. Census Bureau.

———. 2002a. Census 2000 summary file 3: Technical documentation. Washington, DC: U.S. Census Bureau.

———. 2002b. Modified race data summary file: 2000 Census of population and housing, technical documentation. http://www.census.gov/popest/archives/files/MRSF-01-US1.html#fig1 (accessed February 25, 2009).

———. 2005. ACS 1-year PUMS ancestry code list. Washington, DC: U.S. Census Bureau.

———. 2008. Ancestry. http://www.census.gov/population/www/ancestry/ancoverview.html (accessed May 24, 2009).

U.S. House Committee on Ways and Means, Subcommittee on Health. 2008. Addressing disparities in health and healthcare: Issues for reform. Washington, DC: U.S. House Committee on Ways and Means.

U.S. Senate Finance Committee. 2009. Description of policy options, transforming the health care delivery system: Proposals to improve patient care and reduce health care costs. Washington, DC: U.S. Senate Committee on Finance.

Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×

Wallman, K. K. 2009. Current and future federal standards for race/ethnicity/language data. U. S. Office of Management and Budget. Presentation to the IOM Committee on Future Directions for the National Healthcare Quality and Disparities Reports, February 9, 2009. Washington, DC. PowerPoint Presentation.

Wei, I. I., B. A. Virnig, D. A. John, and R. O. Morgan. 2006. Using a Spanish surname match to improve identification of Hispanic women in Medicare administrative data. Health Services Research 41(4):1469-1481.

Weinick, R. M., J. M. Caglia, E. Friedman, and K. Flaherty. 2007. Measuring racial and ethnic health care disparities in Massachusetts. Health Affairs 26(5):1293-1302.

Wisconsin Cancer Reporting System. 2008. WCRS abstract code manual, 2ndedition. Madison, WI: Division of Public Health, Wisconsin Department of Health Services.

Yu, E., and W. Liu. 1992. U.S. national health data on Asian Americans and Pacific Islanders: A research agenda for the 1990s. American Journal of Public Health 82(12):1645-1652.

×
Page 78
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 79
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 80
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 81
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 82
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 83
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 84
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 85
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 86
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 87
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 88
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 89
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 90
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 91
Suggested Citation:"3 Defining Categorization Needs for Race and Ethnicity Data." Institute of Medicine. 2009. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: The National Academies Press. doi: 10.17226/12696.
×
Page 92

The goal of eliminating disparities in health care in the United States remains elusive. Even as quality improves on specific measures, disparities often persist. Addressing these disparities must begin with the fundamental step of bringing the nature of the disparities and the groups at risk for those disparities to light by collecting health care quality information stratified by race, ethnicity and language data. Then attention can be focused on where interventions might be best applied, and on planning and evaluating those efforts to inform the development of policy and the application of resources. A lack of standardization of categories for race, ethnicity, and language data has been suggested as one obstacle to achieving more widespread collection and utilization of these data.

Race, Ethnicity, and Language Data identifies current models for collecting and coding race, ethnicity, and language data; reviews challenges involved in obtaining these data; and makes recommendations for a nationally standardized approach for use in health care quality improvement.
