Health Statistics: Past, Present, and Future*
Dorothy P. Rice, Professor Emeritus, Institute for Health and Aging, University of California, San Francisco
The organization, delivery, and financing of health care services in the United States are complex, comprising an interdependence of the private and government sectors of the economy. This pluralistic health care economy, with its pragmatic mix of public and private organizations, has produced a wide range of databases that enable us to monitor the health of the nation.
Health care expenditures have been rising rapidly in the United States and claiming a larger share of national resources during the past three decades. In 1965, $41.1 billion was spent for health care, comprising 5.7 percent of the gross domestic product (National Center for Health Statistics, 1999). In 1998, health care expenditures in the United States totaled $1.1 trillion, an average of $4,094 per person, comprising 13.5 percent of the nation's gross domestic product (Levit et al., 2000). Almost 11.5 million civilians were employed in the health services industry in 1998, comprising 8.8 percent of employed civilians (National Center for Health Statistics, 1999).
*The keynote address given at the workshop is presented here in its entirety. Rice is former Director of the National Center for Health Statistics and a former member of the Committee on National Statistics. The author extends a special note of thanks to Mary Grace Kovar, Harry Rosenberg, and Samuel Marcus, who offered helpful comments on earlier drafts.
The growth of the health care industry in the United States has been accompanied by significant achievements in public health, including advances in prevention and significant declines since 1950 in death rates for diseases of the heart (56 percent) and stroke (70 percent) (Morbidity and Mortality Weekly Report, 1999). We have been successful in monitoring these and other morbidity and mortality trends through the growth and development of our health data systems.
Health care is a pressing social, political, and economic issue in the United States. The American pluralistic health care economy presents special problems for data collection, analysis, and dissemination. Health statistics systems have grown rapidly with the growth of the industry and the expansion of private health insurance coverage and public health care programs.
There is general agreement that data are needed to monitor the health of the nation; to plan and develop better health services; to deliver those services in an effective, efficient, and equitable manner; to measure their effectiveness; to make decisions on resource allocation; and to conduct research. Data also are needed to facilitate effective policy making, planning, management, and evaluation. Private organizations of health professionals, health service providers, health insurance, and many others have important interests in the collection and use of health data. The federal government needs a variety of data to support its major role in improving health and medical care delivery systems throughout the nation. State and local government agencies also have key roles in disease prevention, delivery of health services, and health planning and evaluation that require timely and reliable health statistics.
This paper presents a brief historical review of how the health statistics system has evolved to its present configuration and the lessons to be learned that might guide the future evolution of the system. This review will focus on the changes during the past 35 years in the types and uses of health statistics, the constituencies, and changes in technologies supporting the health statistics system. Gaps in health statistics, as well as several cross-cutting issues, will be discussed. Special focus will be on the federal health statistical system, especially as it relates to the production, use, and need for health data at the federal, state, and local levels. The paper concludes with challenges for the future in producing a health statistics system for the twenty-first century.
The statistical needs of the American pluralistic health care economy have grown enormously in the past 35 years since the enactment of the Medicare and Medicaid programs in 1965, the rapid growth of private health insurance, the expansion of the health care industry, and the concomitant public health, medical, and technological advances to meet the needs of a growing population. The rapid aging of the population, the emergence of chronic illnesses to replace infectious diseases as the leading causes of morbidity and mortality, and the growing health care needs of subpopulation groups (i.e., minorities, uninsured, immigrants, and persons with disabilities and low incomes) are current phenomena that require close monitoring in the future.
Health statistics often are obtained via sample surveys conducted through telephone, mail, or in-person interviews of individuals and/or households. Health surveys go back to the Hagerstown morbidity studies conducted by the Public Health Service in the early 1920s. However, sample surveys did not become dominant until the rise of probability sampling in the 1930s (Frankel and Stock, 1969). The Public Health Service conducted the first National Health Survey in 1935–1936, funded by the Works Projects Administration (Duncan and Shelton, 1978). In 1953 the National Opinion Research Center began a series of surveys, separated by five-year intervals, on consumers' use of medical care, the degree of health insurance protection, and expenditures for care (Andersen and Anderson, 1967).
In October 1953, a subcommittee of the U.S. National Committee on Vital and Health Statistics (NCVHS) recommended that a national health survey be established on a permanent basis. The passage of the National Health Survey Act of 1956 called for a continuing survey and special studies on the nation's health. It also provided for studying methods and survey techniques for obtaining this statistical information and for disseminating results of these surveys and studies. The National Health Survey, later renamed the National Health Interview Survey (NHIS), began in 1957. In 1960, the National Center for Health Statistics (NCHS) was created by combining the National Health Survey and the National Office of Vital Statistics. Responsibility for vital statistics had been transferred to the Public Health Service from the Bureau of the Census.
NCHS is the federal government’s principal health statistics agency (National Research Council, 1992; Office of Management and Budget, 1998). The NCHS congressional mandate addresses the full spectrum of concerns in the health field from birth to death, including overall health status, environmental, social and other health hazards, the onset and diagnosis of illness and disability, health resources, and the use, cost, and financing of health care. NCHS also has the mandated responsibility for assisting the states and local health agencies in meeting their costs of data collection.
Although NCHS is considered the main health statistics agency, many other federal agencies also have significant responsibilities for health data collection. For example, included in the statistical budget of the National Institutes of Health (NIH) are activities that support the design and implementation of epidemiologic studies, clinical trials, biomedical research, and laboratory investigations conducted by the various institutes. Other DHHS components, such as the Centers for Disease Control and Prevention (CDC), the Substance Abuse and Mental Health Services Administration (SAMHSA), the Office of the Assistant Secretary for Planning and Evaluation (OASPE), the Agency for Health Care Policy and Research (AHCPR), the Health Care Financing Administration (HCFA), and the Health Resources Services Administration (HRSA), also actively collect health data. The Office of Management and Budget (OMB), which reviews agencies’ budgets and tracks the amount allocated toward “statistical activities,” reports that 13 agencies of DHHS had direct statistical budgets amounting to $804 million in fiscal year 1999; of this total, NCHS’s budget comprised only 10.7 percent (Office of Management and Budget, 1998). By comparison, the statistical budget for NIH, $347.7 million, comprised more than two-fifths (43.2 percent) of the total DHHS statistical budget.
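The budget shares quoted above can be checked with simple arithmetic. The short Python sketch below uses only the figures given in the text; the function names are illustrative, and the NCHS dollar amount is merely what the rounded 10.7 percent share implies, not a figure reported by OMB.

```python
# Figures quoted in the text for fiscal year 1999, in millions of dollars.
DHHS_STATISTICAL_BUDGET = 804.0
NIH_STATISTICAL_BUDGET = 347.7
NCHS_SHARE_PERCENT = 10.7


def share_percent(part, total):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


def amount_from_share(share, total):
    """Return the amount corresponding to a percentage share of `total`."""
    return total * share / 100.0


nih_share = share_percent(NIH_STATISTICAL_BUDGET, DHHS_STATISTICAL_BUDGET)
# Implied by the rounded share in the text; an approximation, not a reported figure.
nchs_amount = amount_from_share(NCHS_SHARE_PERCENT, DHHS_STATISTICAL_BUDGET)

print(f"NIH share of DHHS statistical budgets: {nih_share:.1f} percent")
print(f"Implied NCHS statistical budget: about ${nchs_amount:.0f} million")
```

On these figures, the NIH statistical budget is roughly four times the amount implied for NCHS.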
Outside of DHHS, other federal agencies collect health-related data as part of their programs, such as Bureau of the Census, Bureau of Labor Statistics, Department of Veterans Affairs, Department of Agriculture, Department of Defense, Department of Commerce, Department of Transportation, and others.
The many federal agencies, with their special needs related to their health programs, clearly use considerable resources each year on multiple, decentralized program-related health surveys and statistical activities. Most federal health data systems have traditionally been developed independently of each other. Despite the large amount of money and staff resources devoted to these statistical activities, we lack the information necessary to adequately assess the health status of the population and the determinants of risks to health, as well as the ability to relate data analytically across these areas.
Recognizing the inadequate coordination, and the inefficient and overlapping statistical activities within DHHS, the agency targeted the improvement of the analytic capacity of health and human services programs through Reinventing Government, Part II. A survey consolidation working group was formed in early 1995 to develop a consensus plan for consolidation of surveys (HHS Survey Consolidation Working Group, 1995). The recommendations of this group will be discussed following a review of the major types of health statistics collected and key constituents of these data.
Birth, death, and fetal death statistics constitute the National Vital Statistics System (NVSS) of NCHS. This program, together with the Bureau of the Census decennial census and immigration and emigration data, provides information on the dynamics of population, its growth, and changes in its composition. These data furnish the basic information for making population projections, analyzing fertility, planning for health services, projecting school needs, and other purposes. They are essential in the teaching and application of demography, epidemiology, sociology, medicine, and public health.
Vital statistics are provided through state-operated registration systems. Standard forms for the collection of data and model procedures for the uniform registration of events are developed and recommended for use through cooperative activities of the states and NCHS. NCHS shares the costs incurred by the states in providing vital statistics for national use. Additional programs related to the NVSS include the Linked Birth and Infant Death Data Set, the National Maternal and Infant Survey, the National Mortality Followback Survey, and the National Death Index.
The historical roots of the vital registration system go back to the earliest American settlements, when such colonies as Massachusetts and Virginia, following the English custom, required that records be kept of christenings, weddings, and burials. In time, these records shifted to the more meaningful categories of births, marriages, and deaths. Lemuel Shattuck, the leading proponent of registration, demonstrated that the health of the residents of the city of Boston was deteriorating, as measured by mortality levels. The Report of the Sanitary Commission of Massachusetts recommended the creation of a state board of health based on complete registration and vital statistics (Rice, 1981). By 1933, all states were registering births and deaths. Currently, the marriage and divorce statistics program is limited to the publication by NCHS of monthly counts of marriages and divorces registered in each state. Detailed abortion statistics are not collected by NCHS due to budget constraints; aggregate statistics are available through CDC’s abortion surveillance system established in 1969.
In the early 1970s, NCHS pioneered the development of automated systems to process cause-of-death data through its creation of the Automated Classification of Medical Entities (ACME) system. The purpose was to apply computer systems to the complex logic for selecting the underlying cause of death from among conditions that physicians reported on death certificates, which was a costly and complicated manual coding process. Beginning with data for 1968, all U.S. death statistics were based on the application of that computer algorithm at either the state or national level. These systems produced more consistent data as well as much greater detail than ever before with the exact diagnoses reported by certifying physicians. The effectiveness of the U.S. automated systems was affirmed by the growing adoption of the systems on an international basis.
The evolution of the vital statistics program is regarded by some as an example of a most successful program, providing full counts of births and deaths at the local, state, and federal geographic levels. Except for the important issue of timeliness, the reports emanating from the vital statistics program did an excellent job of meeting the demands of users (Committee to Evaluate the National Center for Health Statistics, 1973). The electronic availability of data in recent years has gone a long way toward improving the timeliness of vital statistics data, thereby enhancing their usefulness.
The introduction of the National Death Index (NDI), a computerized index of death record information beginning with 1979 deaths, has made enormous contributions to more efficient epidemiologic and other health studies in which researchers can go to one source, NCHS, to obtain mortality information on their study participants. Prior to the establishment of the NDI, each state had to be contacted separately for such information on file in the state vital statistics offices.
PUBLIC HEALTH SURVEILLANCE
Public health surveillance is the “ongoing systematic collection, analysis, and interpretation of data on specific health events affecting a population, closely integrated with the timely dissemination of these data to those responsible for prevention and control” (Thacker et al., 1996:633). A feature of surveillance is the ability to identify individuals and groups of individuals for further action on prevention and treatment. This construct raises issues of privacy and confidentiality, rapidly evolving issues that reflect the complex interplay of personal rights, ethical concerns, legal responsibility, and societal interest in the general welfare of the population and public health. Privacy and confidentiality will be discussed further below.
The National Notifiable Disease Surveillance System illustrates traditional surveillance in which physicians, clinical laboratories, and other health care providers are required by state law to report all cases of health conditions, mainly infectious in origin, that are specified as being notifiable. The Council of State and Territorial Epidemiologists determines which notifiable conditions should be reported by state health departments to the CDC. The CDC and other federal agencies are involved in the collection of surveillance data, including, but not limited to the following:
the National Institute for Occupational Safety and Health has maintained a sentinel health event verification system for occupational risk;
the Food and Drug Administration conducts postmarketing surveillance of adverse reactions to drugs;
the National Cancer Institute conducts the Surveillance, Epidemiology, and End Results (SEER) Program;
the Behavioral Risk Factor Surveillance System, a telephone survey conducted in each of the 50 states, and supported in part by the CDC, provides data on health behaviors; questions can be added by the individual states;
the Pregnancy Risk Assessment Monitoring System was developed by CDC to collect information on maternal behaviors that occur before, during, and shortly after pregnancy among women who deliver a live-born infant; and
the Consumer Product Safety Commission conducts surveillance on product-related injuries.
Surveillance data vary in their quality and often are incomplete and unrepresentative, and they may vary in sensitivity and specificity (Stroup and Teutsch, 1998). Although the current programs provide essential data to monitor the incidence of communicable diseases and some chronic diseases, the system relies on voluntary physician reporting, which has been demonstrated to be variable and inconsistent. States differ in their authority to require physician reporting. Developing greater standardization in reporting from state to state and obtaining improved physician cooperation are areas that need further exploration.
Population-based registries and national sample surveys have also been used for surveillance purposes. Registries are established to identify cases through several sources (e.g., schools, hospitals, and laboratories). Registries require extensive confirmation of cases, leading to longer lag times between a health event and the reporting of such an event. The National Cancer Institute SEER program covers about 10 percent of the U.S. population; it provides data that are used to monitor long-term trends of cancer incidence and mortality. Currently, approximately 30 states have population-based registries, but they may be limited by both under-registration and selection bias (Stroup and Teutsch, 1998).
HEALTH STATUS, HEALTH CARE UTILIZATION, AND MEDICAL CARE COSTS
Statistics abound on health status and use of medical care services at the federal, state, and local levels. The National Health Interview Survey (NHIS) and the National Health and Nutrition Examination Survey (NHANES) are the major national surveys for assessment of health status in the United States and are sponsored by NCHS. NHIS is a primary source of information on the health of the civilian, noninstitutionalized population of the United States. Conducted continuously since 1957, it provides national data on the annual incidence of acute illness and accidental injuries, the prevalence of chronic conditions and impairments, the extent of disability, the utilization of health care services, and other health-related topics. To provide data on special topic areas in addition to the basic NHIS data, extensive supplements have been conducted annually. Topics covered in the supplements vary from year to year. For example, in 1995 the supplements included questions on the following: immunization; children’s and adults’ disability; follow-up on persons with disabilities interviewed in the prior year; family resources (access to care, health care coverage, income and assets); year 2000 objectives (tobacco use, nutrition, clinical preventive services, physical activity and fitness); and AIDS knowledge and attitudes (National Center for Health Statistics, 1998). The NHIS sample design includes about 40,000 households interviewed, resulting in a sample of about 102,000 individuals, with oversampling of black and Hispanic persons.
NHANES was established in 1971 to collect the kinds of health data best obtained by direct physical examinations and physiological and biochemical measurements. NHANES is the cornerstone of the National Nutrition Monitoring and Related Research Program, providing data needed for nutrition monitoring, food fortification policy, establishing dietary guidelines, and assessing government programs and initiatives such as the Healthy People 2000 and 2010 objectives of DHHS. In the past, researchers sometimes had to wait as long as 10 years after data collection before gaining access to data based on the entire 6-year sample. Now, NHANES is a continuing, annual survey, linked to the NHIS, and data are being collected from a representative sample of the U.S. population, newborns and older, every year.
NHIS and NHANES are only two of the many national federal surveys that collect data on health status, medical care utilization, and insurance coverage. Other important federal surveys collect similar data as well as data on medical care expenditures:
The National Immunization Survey (NIS) is a continuing nationwide telephone sample survey to gather data on children 19–35 months of age. In 1997, data were obtained for 32,742 children to provide estimates of vaccine-specific coverage for national, state, and 28 urban areas considered to be high risk for under-vaccination.
The Medical Expenditure Panel Survey (MEPS) conducted by the Agency for Health Care Policy and Research (AHCPR) is a study of approximately 9,000 households. MEPS is a subsample of NHIS participants, providing health status and other data for enhanced analytical capacity. Use of NHIS data in concert with the data collected in the 1996 MEPS provides the capacity for longitudinal analysis. Each sample panel is interviewed a total of five times over 30 months to yield annual use and expenditure data for two calendar years. The 1996 MEPS household component reflects an oversampling of households with Hispanics and blacks (Cohen et al., 1999). MEPS also has an institutional component.
The National Household Survey on Drug Abuse (NHSDA), conducted by SAMHSA, focuses on the incidence, prevalence, consequences, and patterns of substance use and abuse. In 1997, the NHSDA was expanded from 18,000 respondents to about 25,000 respondents to generate estimates for the nation and for two states (California and Arizona). In 1999, the NHSDA was further expanded to 70,000 respondents to generate estimates for all 50 states.
The Medicare Current Beneficiary Survey, conducted by HCFA, is an ongoing rotating panel survey of approximately 12,000 aged and disabled Medicare beneficiaries, consisting of four overlapping panels of Medicare beneficiaries surveyed each year. Each panel contains a nationally representative sample of beneficiaries who are interviewed 12 times in the community or a long-term care facility to collect three complete years of utilization data. The survey provides comprehensive data on health and functional status, use of medical services, covered and noncovered health care expenditures, and health insurance for Medicare beneficiaries.
The National Health Care Survey is a family of NCHS surveys that measure the utilization of health services by collecting data from providers. Included are hospitals (National Hospital Discharge Survey), physicians (National Ambulatory Medical Care Survey), emergency and outpatient departments (National Hospital Ambulatory Medical Care Survey), ambulatory care centers (National Survey of Ambulatory Surgery), nursing homes (National Nursing Home Survey), and health agencies providing home health care services and hospice care (National Home and Hospice Care Survey).
The National Survey of Family Growth (NSFG) is a periodic survey of women ages 15 to 44 years. The purpose of the survey is to provide national data on factors affecting birth and pregnancy rates, adoption, and maternal and infant health. In 1995, for the first time, the sample was obtained from households that had been interviewed in the NHIS. A total of 10,847 women were interviewed, and Hispanic and black women were oversampled. Cycle 6 of the NSFG will include a sample of men for the first time.
The Healthcare Cost and Utilization Project, conducted by AHCPR, consists of the State Inpatient Database (SID) and the Nationwide Inpatient Sample (NIS). SID contains all hospitals and all discharges from 22 participating states. AHCPR receives the data from each statewide data organization, processes the data into a uniform format, and then returns the uniform SID files to the statewide data organization. The NIS database contains a sample of hospitals selected from SID. The NIS comes with weights that can be used to produce national estimates, regional estimates, and state estimates for participating states.
The Current Population Survey (CPS) is a monthly sample survey of about 50,000 households conducted by the U.S. Bureau of the Census for the Bureau of Labor Statistics. The CPS is the primary source of information on labor force characteristics of the U.S. population. Monthly estimates from the CPS include employment, unemployment, earnings, hours of work, and other indicators. The annual March supplement produces national and state estimates of health insurance coverage, including private health insurance, Medicare, Medicaid, the Civilian Health and Medical Program of the Uniformed Services (CHAMPUS), and military health care.
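Several of the surveys listed above, such as the Nationwide Inpatient Sample, supply weights so that sample records can be scaled up to national, regional, or state estimates. The Python sketch below illustrates the general idea with a simple expansion estimator, in which each sampled hospital's count is multiplied by the number of hospitals it is taken to represent. The hospitals, weights, and names are invented for illustration, and the actual weighting procedure used by any of these surveys is not described here and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SampledHospital:
    """One hospital in a hypothetical sample (all values are made up)."""
    state: str
    discharges: int
    weight: float  # number of universe hospitals this sampled hospital represents


def weighted_total(sample: Iterable[SampledHospital],
                   state: Optional[str] = None) -> float:
    """Expansion estimate of total discharges, optionally for a single state."""
    return sum(
        h.weight * h.discharges
        for h in sample
        if state is None or h.state == state
    )


# Hypothetical sample: three hospitals drawn from two states.
sample = [
    SampledHospital(state="CA", discharges=12000, weight=5.0),
    SampledHospital(state="CA", discharges=3000, weight=4.0),
    SampledHospital(state="NY", discharges=8000, weight=5.0),
]

national_estimate = weighted_total(sample)
california_estimate = weighted_total(sample, state="CA")

print(f"Estimated discharges, all sampled states: {national_estimate:,.0f}")
print(f"Estimated discharges, CA only: {california_estimate:,.0f}")
```

The same weights serve for national and state totals; a state estimate simply restricts the sum to that state's sampled hospitals, which is why such estimates are available only for participating states.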
In addition to the federal health statistics surveys and programs briefly discussed above, each of the 50 states and the private sector maintain data systems and conduct many surveys of hospitals, health professionals, and health care organizations. The private health sector includes organizations of health service providers, health professionals, health insurance payers, consumers, industry, and private philanthropy. Many national and state data collection activities are conducted by these private organizations, but their quality is variable. The result of all these statistical efforts is a set of duplicative and overlapping data systems in the public and private sectors. Hospital inpatient data, for example, are collected in the public sector by NCHS, AHCPR, HCFA, SAMHSA, the Veterans Administration, and others. Most states have their own hospital discharge data systems conducted by state rate-setting, planning, and health systems agencies. In the private sector, hospital data are collected by the American Hospital Association, many abstracting organizations, Blue Cross, professional standard review organizations, and health maintenance organizations. It is recognized that hospital data are necessary to understand, monitor, and evaluate programs related to hospital-based delivery of health care. The reporting burden on hospitals, however, is great; recording, storing, abstracting, and processing medical records is expensive for both the institution and the users. These overlapping and duplicative collections of hospital inpatient data are difficult to justify.
USERS OF HEALTH STATISTICS
Few people now dispute the need for data as a basic requirement for the development of policy in every area of our national life, whether it be health, environment, education, employment, delinquency and crime, defense, agriculture, transportation, welfare, housing, or any other important area (Hauser, 1975). In the health area, the need to plan for appropriate levels of health resources, to protect the public from hazards in the workplace and environment, and to evaluate the effectiveness of health programs led to increased demands for data on health status, health resources and their utilization, costs, and financing. The HHS Survey Consolidation Working Group (1995) listed the following key customers and constituents of HHS:
HHS agencies that use data for program management, actuarial projections, health policy (see Kronick, 1999), and evaluation;
HHS researchers who use data in epidemiologic studies, identification of hypotheses for biomedical research, demographic studies, etc.;
analysts in other executive and legislative organizations, including the Office of Management and Budget, the Congressional Budget Office, and the General Accounting Office;
state and local governments involved in public health and health care financing;
academic researchers working independently and with HHS support;
advocacy groups, including advocates of public health, disease-specific medical research, children, the aged, Americans with disabilities, and those advocating strategies for health reform;
the private sector, including providers, managed care organizations, and associations.
The following may be added to the above (Hauser, 1975):
patients, families, and community residents in general;
social service workers;
payers, purchasers, and regulators with interest in and responsibility for evaluating quality and outcomes of care;
community and consumer organizations, charitable and volunteer groups in the health area;
legal professionals bringing criminal or civil charges, and others.
GAPS IN HEALTH STATISTICS
Many federal health data systems have evolved from specific program needs as the federal role in health care has expanded in the last three decades. The needs for health data at the state and local levels, however, have not adequately been met. The Cooperative Health Statistics System, a nationwide cooperative network of public and private agencies linked together to meet their respective needs for health statistics, is no longer supported by the federal government (National Center for Health Statistics, 1980). While the responsibilities of the states for administering health care programs are being expanded greatly, there has not been a commensurate increase in resources devoted to their statistical functions and their needs for health data. For example, the growing racial and ethnic diversity within the states raises important issues with respect to monitoring the health status and health care costs and financing of minority groups. However, sample sizes in national data sets are not large enough to disaggregate at the state level. In addition, there is a wide range of ability and capacity among state statistical programs to collect, analyze, and interpret health data.
Health Statistics for Subpopulation Groups and Minorities
As efforts continue to reduce health disparities among special population groups of low-income persons, racial and ethnic minorities, and persons with disabilities, it is recognized that data are needed to monitor our progress toward eliminating these disparities. Except for the data derived from the decennial census and from the vital registration system (birth and death statistics), the existing sources of health data do not permit examination of socioeconomic differences for any but the three largest race and ethnic categories: non-Hispanic white persons, non-Hispanic black persons, and persons of Hispanic or Mexican origin. Data shown for broad groupings usually mask significant differences among subgroups. For example, “Asian or Pacific Islander” includes persons with ancestry in such countries as China, Vietnam, the Philippines, Japan, and Samoa, while “Hispanic” combines persons whose origins were in Cuba, Puerto Rico, Mexico, or any other country of Central or South America. These subgroups often have very diverse health status and risk behavior. It is essential that our health statistical systems at the national and state levels capture this diversity.
The surveys, surveillance, and vital statistics programs meet many of the current needs for health data. The cross-sectional survey data give a “snapshot” at a point in time of the health status of people at different stages in their lives and allow periodic examinations of changes over time. Still needed, however, are large-scale longitudinal efforts that record in sequence the health events of life. Longitudinal efforts in the health area are limited. Recent examples of relatively short-term follow-ups of survey participants include the NHANES I Epidemiologic Followup Study, the Longitudinal Study on Aging, NHIS Disability Supplement, MEPS, and the Medicare Current Beneficiary Survey (MCBS).
The need for additional longitudinal studies has been specifically addressed in at least 10 of the 62 reports dealing with some aspects of health data and data systems published by the National Research Council and Institute of Medicine since 1985 (Jane Durch, personal communication, September 9, 1999). One such report, for example, recommended that NCHS develop and implement a continuous, longitudinal survey of health care utilization and expenditures, and their health care providers, using cohorts of individuals selected from among NHIS survey respondents (National Research Council and Institute of Medicine, 1992).
Other Identified Gaps
Testimony to the many other identified gaps in current health statistics and the needs for specific health data is documented in the recent compilation of the conclusions and recommendations regarding health data and data systems in the published reports of the National Research Council and Institute of Medicine since 1985 (Jane Durch, personal communication, September 9, 1999). A total of 62 studies made 249 recommendations relating to health data and data systems, averaging about 4 recommendations per study. The number of recommendations ranged from 1 per study (14 studies) to 17 in one study, Toward a National Health Care Survey: A Data System for the 21st Century (National Research Council and Institute of Medicine, 1992). The specifics of these recommendations are too many to enumerate. The recommendations are varied, including the proposed establishment of a surveillance system to detect, monitor, and warn of adverse effects in the recipients of blood and blood products (Institute of Medicine, 1995); the collection of person-based longitudinal information in the National Health Care Survey, expanding the data collected to include, but not be limited to, information on the health care received, costs and gross expenditures for health care, and outcomes (National Research Council and Institute of Medicine, 1992); recommendations for additional resources for current or new data systems, and many others.
On one hand, implementation of the many health data recommendations enumerated in the National Research Council reports would proliferate the existing fragmented data systems, especially at the federal level; on the other hand, there are many identified gaps in the existing data systems and many needs for improved health statistics to fill them.
Overlap and Consolidation
In 1995, DHHS recognized that it had a considerable investment in surveys and other data systems to support broad analytic and program objectives, and that the operation of these data systems and surveys was decentralized with limited central strategic planning and direction, resulting in overlaps with respect to populations of interest, analytic capabilities, sample and questionnaire designs, and collection efforts. Through Reinventing Government Part II, DHHS formed an interagency survey consolidation working group that on April 11, 1995, reported its HHS Plan for Consolidation of Surveys (HHS Survey Consolidation Working Group, 1995). The plan comprised 15 specific proposals, among them establishing the capability to use NHIS as a sampling frame for other surveys such as MEPS, NHANES, NSFG, and NHSDA. The move to this consolidated, annual household data collection effort has significantly expanded and enhanced the analytical capabilities of these surveys. The NHIS household interview core questionnaire provides population-based statistics on health status and health care utilization with sufficient sample size to allow for analyses based on disaggregation of detailed age, race, sex, income, and other sociodemographic characteristics, and allows for the collection of data on a broad range of topics provided by NHIS.
Implementation of some of the above recommendations has been a significant step forward in data collection and a very exciting development in health statistics. In 1996, the MEPS sample was composed of the fourth-quarter NHIS sample. When MEPS data are linked to NHIS data, the microanalytical potential for studies of health status, prevalence of chronic conditions, health care coverage, and utilization of and expenditures for medical care services is greatly enhanced. Likewise, NHANES and NSFG also are now using the sampling frame of the NHIS.
Unfortunately, no progress has been made on the coordination of the National Household Survey on Drug Abuse (NHSDA) with NHIS. As noted earlier, the sample size of the NHSDA has been expanded to 90,000 persons in 1999 to enable the production of the state estimates. The Survey Consolidation Working Group had recommended that a design framework in the NHSDA could be consolidated with NHIS in several respects, including (1) closer coordination between the questionnaires of the two surveys; (2) using NHIS as the sampling frame for NHSDA; and (3) conducting NHSDA as a supplement to the NHIS. One wonders why there are sufficient resources for the expansion of NHSDA, but not for NHIS, to provide state estimates.
Other important proposals for consolidation included: (1) merging the National Nursing Home Survey and the MEPS institutional component into an integrated, periodic survey of nursing home capacity, services, utilization, and expenditures; (2) closely coordinating the Medicare Current Beneficiary Survey with MEPS through greater questionnaire coordination and analytical linkages; and (3) designing a state-level telephone survey to obtain basic health status, access to care, insurance, and expenditure data of importance for national policy analysis, performance evaluation, and modeling. By using an expanded NHIS sample and questionnaires from the consolidated national surveys, state-level data can be obtained efficiently and will be directly comparable to national data. Information on access to health care and health insurance coverage is needed by the states. To date, such information has been available from the CPS March supplement. Questions may be raised as to why such important health data at the state level have to come from a labor force survey rather than from a health survey.
It is clear that considerable progress has been made within DHHS in survey consolidation. But we have a long way to go to eliminate fragmentation and overlap in DHHS surveys as well as in other federal and state agencies and the private sector.
The information sector of society is rapidly changing with the evolution of computer technology. The development and widespread use of the computer unquestionably has been one of the great technological changes in the past 50 years. One of its effects on statistics has been very large reductions in clerical personnel requirements and consequent reductions in total costs. The most pervasive effects of computers on health statistics have been in the dramatic changes in all aspects of data collection, data analysis, and data dissemination. We now have the ability to do things in all these areas that could not be done at all without computers, either because they could not be done in time to be useful or because they would have cost too much to be practical.
We have witnessed significant changes in methods of sample survey data collection in recent years, from personal household interview surveys to random-digit-dialing telephone surveys, from computer-assisted telephone interview surveys to computer-assisted personal interviews. Expanded access to the Internet and more powerful computing hardware for management and processing of data have had positive effects on the accuracy of data collected and disseminated as well as on the timeliness of available data. Access to health data via computers has clearly increased, thereby changing and expanding the user pool and thus the uses of the data.
The amount of data about individuals and their use of the health delivery system has grown exponentially as a direct result of advances in computer technology. The ability to capture and retain information on individual records and the use and disclosure of personally identifiable medical information have been the subjects of substantial discussion by government agencies, professional associations, and others. This issue is discussed below as it relates to health statistics.
Privacy Protection, Confidentiality, and Data Sharing
The conflict between freedom of information and invasion of privacy in relation to data collection has received increasing attention in recent years. A balance must be struck between the public’s right to know and the right of individuals and institutions to protect their privacy. Even in those programs where strong legal safeguards and technical procedures protect the confidentiality of the information collected, there remains a persistent fear that this vast complex of information might be used as an instrument of social control, if not for commercial purposes.
Advances in technology and the increasing collection of personal data for public and private decision making are raising concerns among many Americans about the confidentiality of the information they provide for use in government surveys. Both individuals and businesses are questioning how the information is used and who has access to it. At the same time, data users, especially those outside government, are increasingly frustrated by limits on the amount of detailed information they can obtain from statistical agencies.
A CNSTAT report on confidentiality and accessibility of government statistics offered principles and specific recommendations for managing data for research and policy making and the confidentiality of information (National Research Council, 1993). An Institute of Medicine report offered recommendations related to public disclosure of quality-of-care information and protection of the confidentiality of personal health information (Institute of Medicine, 1994). A more recent study, For the Record: Protecting Electronic Health Information (National Research Council, 1997), dealt with the need for the health care industry to create the infrastructure necessary to support the privacy and security of electronic health information. These reports recognized that diverse groups of researchers, business leaders, and policy makers have developed databases to permit increasingly sophisticated analyses of community health needs, practice patterns, costs, and quality of care. Greatly enhanced electronic capabilities for data management create opportunities for easy linkage of health data files, resulting in concerns about misuse of the information and how well the privacy and confidentiality of personal health information will be guarded.
Data Sharing and Data Linkage
It is beyond the scope of this paper to deal with ways to protect patient- and person-identifiable health and medical data collected in the public and private sectors. One aspect of the privacy and confidentiality issue, however, that could have a significant impact on reduction of duplicative and overlapping reporting systems is data sharing and data linkage among government agencies. It has long been recognized that the development of comprehensive data systems concerning the interrelations among various aspects of social and economic patterns sometimes requires that various data sets be combined. Recommendations have been made for exchange of statistical data under legislatively mandated “protected enclaves” for selected statistical and research agencies within the federal government (Office of Federal Statistical Policy and Standards, 1978). The House Government Reform Committee approved H.R. 2885, the Statistical Efficiency Act, which designates eight federal statistical agencies as statistical data centers and allows for limited sharing of statistical information by other agencies with these data centers and sharing among the statistical data centers. If approved by the Senate, this Act will go a long way to facilitate sharing of data among federal agencies. For example, NCHS could use business data from the Census Bureau or the Bureau of Labor Statistics to construct sampling frames for surveys of employers or health providers, or use census data from the Census Bureau to augment population samples.
Linking public and private data is an area with tremendous potential for analysis if issues of confidentiality can be overcome. Linking such data is especially important for medical effectiveness and outcomes research, which examines the effects of alternative treatments of a given medical condition on the eventual outcomes realized by the patients (Agency for Health Care Policy and Research, 1991).
Data integration among current large data collection activities should be carried out to maximize the results of separate efforts. Linkage of data files should be encouraged when there is good reason to believe that the results of a specific linkage program will be sufficiently complete for the specific purpose and that biases and limitations of linkage studies will not be so severe as to vitiate results. Linkages should be carried out only when there is a hypothesis that can be investigated through linked data, when suitable safeguards of confidentiality can be applied, and when the research benefits exceed any potential risks to subjects.
Quality and Reliability of Data
Many organizations, especially federal statistical agencies, such as NCHS, have made and continue to make considerable and commendable efforts to maintain and improve the quality of their major statistical series. In the private sector, however, the quality and reliability of data are uneven and unknown in many ongoing databases. Survey results are subject to sampling, reporting, processing, and nonresponse errors; the data cannot be fully understood and properly used unless these errors are reported. Standard errors are routinely reported in federal statistical reports on survey results but are unavailable in most reports emanating from facility and manpower surveys conducted in the private sector. The improvement of the quality and reliability of health statistics in the private sector is most urgently needed.
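To illustrate what reporting a standard error involves, the following is a minimal sketch in Python of the sampling error of an estimated proportion. It assumes simple random sampling, which is a simplification: federal surveys such as NHIS use complex multistage designs whose standard errors require design-based methods. The function names and the figures (an estimated 20 percent prevalence from 400 respondents) are hypothetical and are not drawn from any survey discussed here.

```python
import math


def proportion_standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion, assuming simple random sampling."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


def confidence_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate (normal-theory) confidence interval; z=1.96 gives about 95 percent."""
    se = proportion_standard_error(p_hat, n)
    # Clip to the valid range for a proportion.
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical example: 20 percent of 400 respondents report a condition.
p_hat, n = 0.20, 400
se = proportion_standard_error(p_hat, n)
low, high = confidence_interval(p_hat, n)
print(f"estimate={p_hat:.3f}  standard error={se:.3f}  95% CI=({low:.3f}, {high:.3f})")
```

A report that publishes only the point estimate gives a reader no way to judge whether a difference between two such figures reflects a real difference or sampling variability alone.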
Standardization of Data Elements, Uniform Definitions and Coding, Minimum Data Sets
Health data are collected by many organizations and at multiple geopolitical levels for a variety of uses. Standardization of data elements across programs is necessary to permit comparisons and to avoid duplicative efforts. Considerable progress has been made at the federal level by the Office of Management and Budget in providing standards for data collection, analysis, and distribution. For example, a standard classification for race and ethnicity has been promulgated and implemented by federal agencies in all their data collection activities. In subject areas of health insurance and disability, it is essential that questions be standardized across federal surveys. Currently, the surveys often produce different estimates because of the lack of standardization.
Some progress has been made in the development of uniform minimum data sets under the auspices of the NCVHS, but this effort must be continued as the needs for data at the state and local levels continue to grow.
The Health Insurance Portability and Accountability Act (HIPAA) of 1996 includes a section on administrative simplification aimed at the reduction of administrative costs and burdens in the health care industry. It requires the DHHS to adopt national uniform standards for electronic transmission of certain health information. It is understood that the adoption of uniform national standards for electronic processing of insurance claims and related transactions will improve information flow and help generate significant savings, while improving efficiency and enhancing the quality of health care services. The law may remove some roadblocks that now impede access to some data or make it difficult to link benefits, services, and outcomes.
HIPAA also requires DHHS to develop a unique health identifier for each individual, employer, health plan, and health care provider. The NCVHS is charged with offering technical support and advice to the Secretary of DHHS on the development of this unique identifier. The Committee is currently working with standard-setting organizations to identify, define, agree on, and then implement uniform standards. At the same time, insurers and providers will have to review and revise existing data infrastructures. Also, important and difficult issues relating to privacy of individually identifiable health information are being addressed. With the implementation of the Administrative Simplification subtitle of HIPAA, for the first time in its history the United States will have the means to monitor the health, health care, and health care costs of its entire population (Pollock and Rice, 1997).
CHALLENGES FOR THE FUTURE
From this brief historical review of health statistics in the United States, we reluctantly conclude that despite improvements, health statistics production in this country presents a picture of fragmented data collection, lack of common definitions and uniformity of reporting, duplicative and overlapping systems, and resistance to data sharing. We are encouraged that progress has been made along some fronts, but we have a long way to go to fill the data gaps and to provide the health statistics needed for the twenty-first century.
As former director of one of the federal statistical agencies, I believe that the best way to provide objective high-quality information on the demographic, economic, social, and health characteristics of our population, and trends in those characteristics, is through agencies specifically established for that purpose. These agencies have “no axe to grind,” can usually guarantee confidentiality to respondents, and hence are able to produce unbiased quality information acceptable to a wide array of users both within and outside government. However, even in the best of economic times it is difficult to obtain adequate budgets to support the necessary data collection and analysis activities. Recognizing the philosophy that the federal government should only be in the business of doing things that cannot be adequately done by states and/or the private sector, it may be necessary to reassess the core programs of the federal statistical system.
Regardless of what changes must be made in the core programs, we must ensure that an information base continues to be available that will provide baseline data, be useful for monitoring trends, and have the ability to quickly detect any changes or aberrations in the economic, social, or health characteristics of the nation. The appropriate federal role in statistics is to produce national-level data useful for those purposes as well as to provide norms to which subnational data can be compared. The data must be of high quality, produced in a timely manner, and relevant to issues of the day.
Federal statistical agencies must assume responsibility for activities that cannot reasonably or feasibly be assumed by individual states, local governments, and the private sector. The federal role must include the development and promulgation of standards and procedures for assuring the validity, reliability, comparability, and quality of statistical products and the provision of technical assistance in these areas. Federal statistical agencies also must anticipate future needs for information and design today’s systems to meet those needs.
In considering future prospects for improved health statistics to meet the needs of the twenty-first century, we must recognize that resources will not grow parallel to demands for data and services. The demands for health data are greater than our ability to produce them. Budgetary pressures are requiring assessment of current data collection and dissemination procedures. Statistical agencies must make choices between data collection, research, and analysis, and among needed data sets.
As we move closer to our objective of a national and systematic approach to meeting the information needs for health policy development and program evaluation, we also need to coordinate our data collection activities, both within the federal establishment and between government and the private sector. Although considerable progress has been made in coordination, we must continue to avoid unnecessary and costly duplication, to encourage comparability of information collected by different systems, and to use the ongoing data collection programs to provide specific information for many organizations. More effort is needed to provide essential data, yet reduce the burden on individual and institutional respondents. We must develop and articulate a twenty-first century vision for health statistics.
REFERENCES
Agency for Health Care Policy and Research 1991 Report to Congress: The Feasibility of Linking Research-Related Data Bases to Federal and Non-Federal Medical Administrative Data Bases. AHCPR Pub. No. 91–0003. Washington, DC: U.S. Government Printing Office.
Andersen, R., and O.W. Anderson 1967 A Decade of Health Services: Social Survey Trends in Use and Expenditures. Chicago, IL: The University of Chicago Press.
Cohen, S.B., R. DiGaetano, and H. Goksel 1999 Estimation Procedures in the 1996 Medical Expenditure Panel Survey Household Component. MEPS Methodology Report No. 5: May. AHCPR Pub. No. 99–0027. Rockville, MD: Agency for Health Care Policy and Research.
Committee to Evaluate the National Center for Health Statistics 1973 Health Statistics Today and Tomorrow: A Report of the Committee to Evaluate the National Center for Health Statistics. Vital and Health Statistics, Series 4, No. 15. DHEW Publication No. (HRA) 74–1452: September.
Duncan, J.W., and W.C. Shelton 1978 Revolution in United States Government Statistics. Office of Federal Statistical Policy and Standards, U.S. Department of Commerce. Washington, DC: U.S. Government Printing Office.
Frankel, L.R., and J.S. Stock 1969 On the sample survey of unemployment. Journal of American Statistical Association; December 1969:77–80.
Hauser, P.M. 1975 Social Statistics in Use. New York: Russell Sage Foundation.
HHS Survey Consolidation Working Group. 1995 HHS Plan for Consolidation of Surveys: April 11, 1995. Washington, DC: U.S. Department of Health and Human Services.
Institute of Medicine 1994 Health Data in the Information Age: Use, Disclosure, and Privacy. Committee on Regional Health Data Networks. Molla S. Donaldson and Kathleen N. Lohr, editors. Division of Health Care Services. Washington, DC: National Academy Press.
1995 HIV and the Blood Supply: An Analysis of Crisis Decisionmaking. Committee to Study HIV Transmission Through Blood and Blood Products. Lauren B. Leveton, Harold C. Sox, Jr., and Michael A. Stoto, editors. Division of Health Promotion and Disease Prevention. Washington, DC: National Academy Press.
Kronick, R. 1999 Numbers We Need: Health Statistics and Health Policy. Paper presented at the Committee on National Statistics Workshop, Toward a Health Statistics System for the 21st Century, November 4, Washington, DC. Available: <http://www.ncvhs.hhs.gov/hsvision/visiondocuments.html>. [July 12, 2001]
Levit, K., C. Cowan, H. Lazenby, A. Sensenig, P. McDonnell, J. Stiller, A. Martin, and the Health Accounts Team 2000 Health Spending in 1998: Signals of Change. Health Affairs 19:124–132.
Morbidity and Mortality Weekly Report 1999 Achievements in Public Health, 1900–1999. August 6, 1999. 48(30):649–656.
National Center for Health Statistics 1980 Directions for the ‘80s: Final Report of the Panel to Evaluate the Cooperative Health Statistics System. DHHS Pub. No. (PHS) 80–1204. Hyattsville, MD: U.S. Public Health Service.
1998 Current Estimates From the National Health Interview Survey, 1995. Vital and Health Statistics 10(109). DHHS Pub No. (PHS) 98–1527. Washington, DC: U.S. Government Printing Office.
1999 Health United States, 1999 With Health and Aging Chartbook. DHHS Pub. No. (PHS) 99–1232. Washington, DC: U.S. Government Printing Office.
National Research Council 1992 Principles and Practices for a Federal Statistical Agency. Committee on National Statistics. Margaret E. Martin and Miron L. Straf, editors. Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
1993 Private Lives and Public Policies: Confidentiality and Accessibility of Government Statistics. Panel on Confidentiality and Data Access. George T. Duncan, Thomas B. Jabine, and Virginia A. de Wolf, editors. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
1997 For the Record: Protecting Electronic Health Information. Committee on Maintaining Privacy and Security in Health Care Applications of the National Information Infrastructure. Computer Science and Telecommunications Board, Commission on Physical Sciences, Mathematics, and Applications. Washington, DC: National Academy Press.
National Research Council and Institute of Medicine 1992 Toward a National Health Care Survey: A Data System for the 21st Century. Panel on the National Health Care Survey. Gooloo S. Wunderlich, editor. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education, and Division of Health Care Services, Institute of Medicine. Washington, DC: National Academy Press.
Office of Federal Statistical Policy and Standards 1978 A Framework for Planning U.S. Federal Statistics for the 80s. U.S. Department of Commerce. Washington, DC: U.S. Government Printing Office.
Office of Management and Budget 1998 Statistical Programs of the United States Government: Fiscal Year 1999. Washington, DC: Executive Office of the President.
Pollock, A.M., and D.P. Rice 1997 Monitoring health care in the United States: A challenging task. Public Health Reports 112:108–113.
Rice, D.P. 1981 Health statistics: Past and present. Editorial. The New England Journal of Medicine 305(4):219–220.
Stroup, D.F., and S.M. Teutsch 1998 Statistics in Public Health: Quantitative Approaches to Public Health Problems. New York: Oxford University Press.
Thacker, S.B., D.F. Stroup, R.G. Parrish, and H.A. Anderson 1996 Surveillance in environmental public health: Issues, systems, and sources. American Journal of Public Health 86(5):633–638.