Improving Data Sources

Over the course of the workshop, the discussion converged on important implications for data collection, and the participants made numerous suggestions—some highly detailed and others more general—for improving data sources on the elderly and aging. Participants agreed that current data collection efforts are highly fragmented and are administered by different agencies and investigators, each with its own substantive interests, distinctive histories, and constituencies.

Furthermore, budget constraints make it impossible for surveys to capture every desirable piece of information on elderly health, employment and economic status, care arrangements, family structure, and the like. Therefore, participants suggested both priorities for data collection efforts and strategies for achieving greater efficiency and coordination among existing data sets. With these shared concerns about fragmentation and resource constraints as a point of departure, six common themes regarding problems and suggestions for improving data on older Americans emerged from the workshop discussions.10

OVERSAMPLING SUBGROUPS OF THE ELDERLY

The changing demographics of the elderly population, as well as shifting policy concerns, warrant inclusiveness in data collection efforts. In particular,

10   Of course, not all participants agreed with every priority; for instance, some urged detailed data collection at the state and local levels, while others called for more effective use of available data.







Improving Data on America's Aging Population: Summary of a Workshop

workshop participants noted that new strategies for oversampling currently underrepresented subgroups of older men and women would be valuable. The older population is growing increasingly diverse, and these diverse subgroups experience their later years in very different ways. As noted above, economic status varies widely by race, ethnicity, gender, and residential pattern. Workshop participants also pointed out that while gains in health and life expectancy at older ages are expected to continue in the aggregate, this optimistic scenario might not hold true for all populations, especially less educated and poor people.

Most national surveys contain samples that are too small to be statistically representative of older age groups by sex and race or ethnicity. Several participants noted that the subgroups one wishes to use for analytic purposes may not necessarily be defined by the traditional demographic characteristics of race, ethnicity, or gender. Rather, characteristics such as residential status, urban or rural location, or early disability status may be important correlates of the outcome variables of interest. Other participants suggested that factors such as regional climate conditions, support for local services, and social interactions with younger populations may be important correlates of physical and emotional health. Therefore, where possible, oversampling the older population, and especially underrepresented subgroups, may be needed.

Similarly, participants noted that sufficient numbers of new immigrants may not be captured in current data sets. One proposed solution is for longitudinal studies to periodically replenish their samples, so that "new" Americans can be included in the analyses. Appendix B lists populations that are oversampled in existing data sets used for the study of aging.
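
One practical consequence of oversampling is that pooled estimates must be reweighted, since an oversampled subgroup would otherwise count for more than its population share. The following is a minimal sketch of that adjustment, assuming simple stratified sampling with design weights equal to population size divided by sample size. The strata, population counts, and means are invented for illustration and do not come from any survey discussed at the workshop.

```python
# Hypothetical strata: every number here is invented for illustration.
STRATA = [
    {"name": "majority group", "population": 9000, "sample_size": 300, "sample_mean": 10.0},
    {"name": "oversampled subgroup", "population": 1000, "sample_size": 300, "sample_mean": 20.0},
]


def unweighted_mean(strata):
    """Pool all respondents as if each one represented the same number of people."""
    total_n = sum(s["sample_size"] for s in strata)
    return sum(s["sample_size"] * s["sample_mean"] for s in strata) / total_n


def weighted_mean(strata):
    """Pool respondents using design weights (population / sample size per stratum)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for s in strata:
        # Each respondent in this stratum stands in for this many people.
        weight = s["population"] / s["sample_size"]
        weighted_sum += weight * s["sample_size"] * s["sample_mean"]
        total_weight += weight * s["sample_size"]
    return weighted_sum / total_weight


if __name__ == "__main__":
    print("unweighted:", unweighted_mean(STRATA))
    print("weighted:  ", weighted_mean(STRATA))
```

In this invented case the subgroup is 10 percent of the population but half of the sample, so the unweighted mean overstates its influence. The subgroup itself can still be analyzed with its full 300 cases, which is the point of oversampling.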

Another dominant theme at the workshop was that data should be collected on multiple cohorts and at multiple points in time for each cohort. Participants noted that between-cohort comparisons can be more useful than simply studying the experience of one birth cohort. Moreover, social change can be assessed by contrasting the experiences of diverse cohorts. Caution must be taken in extrapolating from the experience of earlier cohorts, however; each birth cohort arguably has unique characteristics, and newer cohorts of the old, especially the post-World War II baby boomers, may have very different health, education, work history, family arrangement, and benefits profiles than earlier cohorts.

Participants who called for detailed cohort data also noted that many characteristics of the elderly, especially mental and physical health, disability, and the demand for health services, are the result of cumulative experiences throughout life. For example, cohort studies by Elo and Preston (1992) demonstrate that susceptibility to certain diseases is established early in life and persists throughout adulthood. Since a poor health environment in childhood is often associated with low levels of schooling and occupational attainment, the direct effects of schooling and occupational status on adult health status can be overestimated without proper data on earlier experiences. To understand such connections would require that longitudinal data be obtained on multiple cohorts. For these reasons, workshop participants stressed the importance of repeated panel data and longitudinal data on different cohorts of the old, sufficiently large to capture diversity.

Such efforts are currently under way; HRS, for one, will add members of the 1924–1930 birth cohort during its next wave of data collection (Juster and Suzman, 1995). These data, combined with HRS data on the birth cohorts of 1931–1941 and the AHEAD data on cohorts of 1923 and earlier, will ensure the coverage of every cohort of the current elderly population. This coverage will allow for between-cohort comparisons and will allow researchers to study period effects, such as economic conditions, on the elderly.

CROSS-AGENCY COORDINATION AND DATA INTEGRATION

The need for improved cross-agency planning and coordination, as well as coordination and data integration across multiple data sources, emerged frequently as a point of discussion.11 Since the early 1980s, the availability of large longitudinal and cross-sectional data sets on aging has increased dramatically. In the last 15 years, NIA has invested heavily in new data collection efforts, and various governmental departments have either supplemented existing surveys or developed new surveys to track the aging population.
The burst of activity included the development of new NIA-funded data sets such as the AHEAD and the National Long-Term Care Survey, the introduction of questions on functional status of the elderly to the 1990 census, and the addition of the Supplement on Aging to the National Health Interview Survey (NHIS) of the National Center for Health Statistics (NCHS).12 As statistical data on the elderly population have accumulated, the data bases from which these data derive have become increasingly specialized in nature. Furthermore, even within single federal agencies, supplements, topical modules, and new surveys have proliferated. Integrating or linking data from various sources offers the advantage of increasing the utility of existing data bases at a relatively modest cost (Agency for Health Care Policy and Research, 1991). In particular, six kinds of data integration or linkages were discussed:13

integration of public and private data sources, such as data collected by the Robert Wood Johnson Foundation;
integration of large national survey data with local-level data;
integration of population-level data sources with administrative record data, such as those maintained by the Health Care Financing Administration (HCFA) and Social Security Administration;
integration of population-level data with firm- or employer-level data;
integration of population-level data with (benefit) provider-level data; and
integration of population-level data with demonstration data, such as those from the Social Health Maintenance Organization.14

11   A report by the Committee on National Statistics Panel on Retirement Income Modeling (Citro and Hanushek, 1997) discusses the needs for improved cross-agency planning and coordination and for integration of data sources on topics that relate to retirement income security.

12   This is a very abbreviated listing of the numerous data collection efforts that have occurred in the last decade. A detailed list of data sets for studying the aged is provided in Appendix B.

13   The six kinds are not necessarily mutually exclusive: for instance, firm- and employer-level data can also be considered a private data source.

14   The Social Health Maintenance Organization (SHMO) is a demonstration project funded by HHS that combines the delivery of acute and long-term care with adult day care services and transportation. Medicare beneficiaries pay slightly higher monthly premiums than elsewhere; Medicaid recipients pay nothing. The SHMO demonstration will help test whether comprehensive health services, linking acute and chronic care under an integrated financing scheme, can be provided at a cost that does not exceed the public costs of Medicare and Medicaid.

Linkages between data sets can be accomplished either through matches of records on the same individuals from different data bases, known as exact matches, or through matches of records on different individuals who are identified as identical or similar in important respects, known as statistical matches (see Gilford, 1988). Both types of matches serve to provide a broader set of data on individual cases, without undertaking new data collections, by merging two or more data sets that supply different sets of characteristics.

Many strides have been made in the last 10 years, yet workshop participants noted that data linkage systems, for the most part, are still in their formative stages. The most successful linkages have occurred through the horizontal integration of multiple population-level data sets and the integration of population-level data sets with administrative record data.

The use of data linkages can be enhanced by active coordination among agencies. One such program is currently under way. In 1995 NCHS launched the HHS survey integration plan, a major restructuring of the health surveys sponsored by HHS intended to fill major data gaps, improve analytic utility, and create greater operational efficiencies. The plan addresses a range of linkages and consolidation approaches, including the integration of survey samples. A cornerstone of the plan is the restructuring of HHS health surveys so that the redesigned NHIS is the sampling "nucleus" for most HHS population surveys. The 1996 Medical Expenditures Panel Study, for one, will use the 1995 NHIS as a sampling frame, thus enhancing the NHIS data by providing detailed data on expenditure-related topics such as insurance and utilization.
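
The distinction between exact and statistical matches described in this section can be made concrete with a small sketch. It assumes a shared identifier for the exact match and, for the statistical match, a deliberately crude similarity rule (the donor record agreeing on the most matching variables). All field names and records are hypothetical, and real statistical matching procedures use more careful distance measures and constraints than this illustration.

```python
# Hypothetical records: field names and values are invented for illustration.
SURVEY = [
    {"person_id": 1, "age_group": "65-74", "sex": "F", "adl_limits": 0},
    {"person_id": 2, "age_group": "75-84", "sex": "M", "adl_limits": 2},
]
ADMIN = [
    {"person_id": 2, "annual_earnings": 18000},
    {"person_id": 3, "annual_earnings": 25000},
]
DONORS = [
    {"age_group": "65-74", "sex": "F", "pension_plan": "defined benefit"},
    {"age_group": "75-84", "sex": "M", "pension_plan": "none"},
]


def exact_match(survey, admin, key="person_id"):
    """Link records on the SAME individuals through a shared identifier."""
    admin_by_id = {rec[key]: rec for rec in admin}
    linked = []
    for rec in survey:
        partner = admin_by_id.get(rec[key])
        if partner is not None:  # unmatched survey records are dropped
            linked.append({**rec, **partner})
    return linked


def statistical_match(recipients, donors, match_vars):
    """Link each recipient to a DIFFERENT individual who is similar on match_vars."""
    linked = []
    for rec in recipients:
        # Pick the donor agreeing on the most match variables (ties: first donor).
        best = max(donors, key=lambda d: sum(rec[v] == d[v] for v in match_vars))
        extra = {k: v for k, v in best.items() if k not in rec}
        linked.append({**rec, **extra})
    return linked


if __name__ == "__main__":
    print(exact_match(SURVEY, ADMIN))
    print(statistical_match(SURVEY, DONORS, ["age_group", "sex"]))
```

Both functions widen each record without new data collection, but only the exact match carries information about the same person, which is why it also raises the confidentiality concerns discussed later in this chapter.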

The availability of health care provider-level data, insurer-level data, and employer-level data and the capacity to link population-level data with such sources are limited, however, according to many workshop participants. Efforts have been under way at HHS to remedy several of these limitations. The department has begun to build on and coordinate three surveys on health care providers: the National Nursing Home Survey, an NCHS survey; the National Nursing Home Expenditure Survey, a survey of the Agency for Health Care Policy and Research (AHCPR); and the Medicare Current Beneficiary Survey, a HCFA survey. In addition, plans are under way to integrate the National Employer Health Insurance Survey and the Health Insurance Provider Survey, both of which obtain insurance information from employers. Plans are also being made for restructuring all HHS provider-based surveys.

Participants stressed the importance of obtaining data from detailed insurance claims records and characteristics of insurance and pension plans, as well as the cost, type, and effectiveness of medical and personal care being provided to individuals. They also acknowledged that protecting individuals' privacy must be an equally important objective. Some participants noted, however, that reliance on Medicare data for information on expenditures and utilization may be thwarted by the growth of managed care, a system that does not provide the same level of detailed data.

Progress on linkages between individual-level and employer-level data has been less substantial. Workshop participants noted that employer-level data are crucial for understanding the labor force behavior of future cohorts. Factors such as employers' willingness to retain older workers and the type of health benefits afforded to early retirees will be critical for policies related to retirement age and eligibility for Social Security benefits.
Several participants said that more employer-level data should be collected and linked to population-level data.

Private foundations and corporations were frequently mentioned as potential data "partners" of large population-level surveys. For example, the Robert Wood Johnson Foundation, a national philanthropy devoted to health and health care research, is collecting extensive data on health care systems in 60 communities. One way to expand the utility of such data sets is to sample respondents from the same primary sampling units as used in "partner" population-level data sets. Although privately funded and collected data sources can provide valuable information on health care providers and characteristics of the care recipients, ethical and procedural issues surrounding the integration of public and private data sources have not yet been resolved.

Perhaps the greatest strides in linkages have occurred with the integration of individual-level survey data with administrative data. Administrative records are invaluable sources of data: for example, the National Death Index provides information on the date, location, and cause of death, while Social Security Administration records provide detailed information on individual earnings. A tracking system has been developed whereby all NCHS surveys will be linked to the National Death Index and HCFA data bases. Many federally funded surveys currently have, or will have, the capacity to be linked to administrative records, including the National Death Index and Social Security Administration earnings data (see Appendix B). Linkages of administrative data with survey data could provide a powerful mechanism to explore the relationship between services provided and resulting health status in the geographic area served by health services organizations, for instance (Newacheck and Starfield, 1995). Workshop participants cautioned, however, that in cases of statistical uses of administrative records, particular care must be taken to safeguard traditional assurances of confidentiality of statistical data.

ETHICAL CHALLENGES

Throughout the workshop, participants grappled with the conflict between maximizing the usefulness of data sources and abiding by the ethics of confidentiality and informed consent. Moreover, participants noted that a host of ethical uncertainties surround newly developing data collection practices, such as genetic testing on the physiological samples (usually blood) obtained from survey participants, linking population-level data with local-level data, and integrating public and private data sources.

Data bases that draw samples directly or indirectly from the decennial census (e.g., the Current Population Survey) must comply with U.S. Code Title 13, which prohibits the release of any microdata that theoretically could result in identification of respondents. For instance, SIPP, which uses an address list that derives from the decennial census, has "confidential" files that include linked administrative data that can only be analyzed by designated Census Bureau employees.
Linkage to local-level data runs the risk of violating the federal rule that no data will be available in "identifiable form," defined as "any representation of information that permits information concerning a specific respondent to be reasonably inferred by either direct or indirect means" (U.S. Office of Management and Budget, 1996:2878). Definitive solutions to the tension between data linkage and the use of small-area and administrative data have not yet been found. One action has been proposed by a working group formed under the National Information Infrastructure: the idea of a "privacy ombudsman" or "czar." Survey respondents would have access to a central person who could answer their questions and provide additional information and assurance about the confidentiality of their responses.

Researchers are only beginning to tackle the ethical issues involved with the collection and analysis of genetic data. Among the ethical and consent issues raised during the workshop were informed consent for future genetic studies not yet planned; the ownership of genetic information; the disposition of genetic material and registry data, as well as implications for commercial processes; notification of study participants of new or unexpected findings; the implications for family members of genetic findings; the access of insurance companies to research findings; and resolving whether participants really understand the "informed" consent information (see Clayton et al., 1995, for a review of related issues).

Representatives from NCHS presented workshop participants with a current ethical dilemma. As part of the extensive phlebotomy protocol of the National Health and Nutrition Examination Survey (NHANES III), a small sample of blood was collected that could be used for analysis of DNA and cell lines. Since the avalanche of genome information expected to accrue over the next decade has led to concerns regarding the potential misuse of such information, this situation requires resolution of complex privacy, informed consent, and related ethical issues. Options range from completely anonymous testing to recontacting and obtaining additional information from NHANES III respondents.

For an anonymous research design, data on a limited number of categories, such as age (grouped by broad categories), race or ethnicity, gender, and education, could be identified so that a randomly chosen subset of each cell could be selected, assigned a new random identification number, and tested. Using these limited variables, anonymous testing can provide information on the frequency of genotypes in target populations or case-control studies. A recontact, on the other hand, would allow researchers to obtain additional informed consent, thereby allowing genetic information to be linked to the remaining, extensive NHANES III observations. Longitudinal and family medical history data could also be obtained through recontact. As of the time of the workshop, NCHS had not determined what course of action to take.
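
The anonymous design described above (group respondents into cells, draw a random subset of each cell, assign new random identification numbers) can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the `per_cell` parameter, and the identifier range are hypothetical and do not describe any actual NCHS procedure.

```python
import random
from collections import defaultdict

# Hypothetical cell-defining variables, following the categories named in the text.
CELL_VARS = ("age_group", "race_ethnicity", "gender", "education")


def anonymized_subsample(records, per_cell, seed=None):
    """Draw up to per_cell records from each cell and relabel them with new random IDs.

    Only the cell variables are retained; the original survey identifier and all
    other fields are dropped, so results cannot be linked back to respondents.
    """
    rng = random.Random(seed)

    # Group respondents into cells defined by the limited set of variables.
    cells = defaultdict(list)
    for rec in records:
        cells[tuple(rec[v] for v in CELL_VARS)].append(rec)

    # Randomly choose a subset of each cell, keeping only the cell variables.
    selected = []
    for cell, members in cells.items():
        for _ in rng.sample(members, min(per_cell, len(members))):
            selected.append(dict(zip(CELL_VARS, cell)))

    # Shuffle so that the order of new IDs reveals nothing about cell membership.
    rng.shuffle(selected)
    new_ids = rng.sample(range(1_000_000), len(selected))
    return [{"anon_id": new_id, **rec} for new_id, rec in zip(new_ids, selected)]


if __name__ == "__main__":
    # Invented respondents: eight people in two age cells.
    respondents = [
        {"survey_id": i, "age_group": "65-74" if i < 4 else "75-84",
         "race_ethnicity": "A", "gender": "F", "education": "HS"}
        for i in range(8)
    ]
    for row in anonymized_subsample(respondents, per_cell=2, seed=0):
        print(row)
```

In an actual study the specimen would be relabeled with the new identifier before testing. Very small cells could still permit indirect identification, which is one reason the text specifies broad categories.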

Yet another set of concerns about ethics and procedural issues surrounds the proposed practice of integrating national, public survey data with privately collected data sources, such as the Robert Wood Johnson Foundation data on health utilization patterns at the community level. Questions about the possible loss of nonprofit tax status and the possibility of private, for-profit organizations using publicly funded surveys for market research or for-profit purposes were among the concerns raised. Participants agreed that innovative data linkages and data collection efforts will inevitably continue to raise new ethical challenges.

ATTENTION TO STATISTICAL METHODOLOGY

Although most discussions at the workshop focused on data collection efforts, participants agreed that data collection is only the first step in successfully studying the aging population. The need for research and policy-oriented analysis on patterns of aging and their consequences was also underscored. Such investigations will require sophisticated use of existing statistical methodology and, in some cases, the development of new methods. For instance, the collection of additional longitudinal data may be most fruitful when accompanied by the further development and assessment of stochastic models for temporal processes. Moreover, projections of population characteristics based on traditional assumptions about the aging process, especially continued declines in morbidity, will have to be treated cautiously. Suggestions were also made for additional model building, with an emphasis on interactions among changes in health status, economic status, and living arrangement.

Actuarial, economic, demographic, and epidemiologic models are among the tools available to forecast the health and behavior (e.g., service utilization and benefit levels) of future cohorts of the elderly (Manton, Singer, and Suzman, 1993). In general, federal government programs have employed actuarial forecasting methods, which traditionally have been used to anticipate the future fiscal risk of programs based on well-defined past experiences, while academic researchers have developed more sophisticated models of health, functioning, and life expectancy that describe such outcomes as a function of individual states and characteristics (Freedman and Soldo, 1994).

Workshop participants also observed that important factors in many projections concerning population size, composition, and health status are the future levels of age-specific mortality. Projections should not be based solely on optimistic assumptions about aging and life expectancy; a reverse in life expectancy gains is also a possibility, especially given the anticipated heterogeneity among future cohorts of the elderly.

Participants also noted that American researchers could benefit from examining cross-national comparative data. Such an effort is important for understanding how similarities and differences in both initial conditions and policy responses in the United States compare to those in other nations. For example, because most western European nations are experiencing an aging population similar to that in the United States (albeit at a slightly earlier time), data from these nations may inform predictions and assumptions about the U.S. aging population.

DATA FOR STATE-LEVEL AND LOCAL-LEVEL ESTIMATES

Recognizing the current debate about a move toward block grants, the state and local locus of many policies and programs for the elderly, and growing interest in the effects of state variation in benefit levels, service structures, and other key features of social policy programs, workshop participants noted the critical need for data at the state and local level with respect to nearly every policy area discussed at the workshop. For example, most major federal health data bases were originally designed to provide national or regional estimates, but state-level data will be critical in evaluating health care reforms since reform is likely to be implemented differently in each state (and perhaps only in selected states). Consequently, state-level data represent a key component in studying health and health care of the elderly.

All states maintain vital records, and many maintain data collection systems for hospital discharge information. Although claims payment systems are maintained by every state Medicaid agency, not all such systems lend themselves to analysis. Moreover, there are large state-level differences in Medicaid funding: the dollar expenditure per recipient varies by a factor of three between the highest and lowest paying states. Consequently, participants stressed the need for data collection and retrieval to be coordinated so that data collected at the local level can be aggregated to state and national levels and so that data collected at state and national levels can be related to data collected at local levels: doing so may permit comparison of effects across communities and states. Pressing needs include the selection of primary sampling units that are consistent with state-level estimation, inclusion of a sufficient number of cases for each state to permit accurate estimation, and making better use of claims data and other administrative records. The value of linking qualitative and local data to national health surveys in order to interpret local health care trends, processes, costs, and effects on the elderly was also discussed throughout the workshop.

Several efforts are under way to both enhance local-level data and link state-level data with national population surveys. One part of the HHS data integration project involves the development of a modular design that will facilitate state-level estimates and provide a mechanism for states to "buy into" national survey efforts to meet their own needs. The Centers for Disease Control is taking the lead in establishing baseline data at the state level; its plans include the development of a comprehensive, integrated, and flexible state-level telephone survey that can provide ongoing interviewing infrastructure to address a variety of issues, including access to care.

Participants expressed concern about the ethical issues and inherent tension between data linkage and the use of small-area data.
Clearly, available data sources would be enhanced by appending neighborhood or regional-based data, especially those variables measuring local resources, economic opportunities, aggregate poverty levels, and other local-area characteristics. Likewise, understanding the effects of health care reforms requires outcome data that can be aggregated at the patient-provider, plan, or system of care and community levels.15 At the same time, however, the confidentiality of respondents' reports must be assured.

Participants also expressed caution about the use of local-level data when studying the elderly. When elderly survey respondents live in nursing homes or institutions, it is unlikely that the economic and social characteristics of the surrounding neighborhoods are significant for the elderly residents themselves.

15   One source that provides county-level data on health care services availability is the Area Resource File (ARF), maintained by HHS's Bureau of Health Professions. The ARF is a county-based file summarizing secondary data from a wide variety of sources into a single file to facilitate health analysis. The file contains over 7,000 elements for most counties in the United States (except Alaska). Data elements include county descriptor codes, and data on the number of health professionals in the county, health facilities, population, the training of health professionals, medical expenditures, hospital expenditures, Medicare enrollments and reimbursements, and economic characteristics of the county.

ACCURATE AND STANDARD CONCEPTS AND MEASURES

As progress is made in studying the aging process, researchers have begun to develop more precise and applicable measures of elderly health status, living arrangements, and care arrangements. Workshop participants pointed out that progress needs to continue in measuring each of the following: the timing, severity, and progression of chronic and acute health conditions; functional status, including ADLs and IADLs; psychological variables, such as denial and proclivity toward help-seeking behavior; and long-term care settings and characteristics. Measurement can be improved by calibrating survey measures and by striving for item comparability across surveys.

It was emphasized that single-time measures of the presence of illness are inadequate. Rather, longitudinal data sets should be used to time the onset of diseases and to track the course of one's illnesses and health conditions. Also, longitudinal studies need to obtain information on events and conditions that occur during the time period between the data collection times, not just at the two times. While acute conditions, such as heart attacks or strokes, can be dated, it is much more difficult to ascertain the onset of chronic conditions, such as high blood pressure, bronchitis, or Alzheimer's disease. Tracking health and illness should begin prior to old age; data sets with information on people at midlife should be used to focus on at-risk subgroups, such as the disabled and the poor. Moreover, survey items should capture the severity of illness, measured by indicators such as the degree of interference in a person's daily life imposed by the illness.

Participants debated the utility of the ADL and IADL measures as indicators of disability.
Some argued that these measures are good broad demarcators of independence and dependence status and relatively robust predictors of later mortality. Others countered that the ADL and IADL measures are not appropriate for predicting a broad range of outcomes and do not capture the broad continuum of disability status. An additional limitation of the ADL and IADL questions is that the question wordings sometimes confound supply of care with need for care. Specifically, some IADL and ADL questions ask sample members whether they received help with a variety of tasks. An affirmative response could reflect the availability of care providers in addition to a need for personal care.

The difficulty of disentangling the need for health care and access to health care was discussed several times during the workshop. Participants pointed to numerous studies that showed that an increase in the availability of health care services was accompanied by an increase in reported ill health. It was cautioned that survey data could erroneously reveal decreases in morbidity if future generations of older persons have reduced access to health care. Consequently, some workshop participants noted, there is a need to improve measures of the effects of not receiving the health care one needs.

Participants called for improved measures of psychological characteristics and more fine-grained measures of care arrangements. Emotional or subjective health measures, such as "denial" about one's health status, might be good predictors of seeking and following medical care and advice. Likewise, measurements of care arrangements warrant elaboration and improvement. AHCPR representatives reported that long-term care settings are changing and will continue to change. These settings now include subacute care, home health care, assisted living, and integrated care; both the level and type of care provided vary widely among these diverse settings. Moreover, the sociodemographic characteristics of persons utilizing these services vary widely. Current data do not differentiate among these arrangements. Participants called for longitudinal data on long-term care: care arrangements vary by cost, intensity, and setting over time, and such changes need to be captured if researchers are to adequately understand long-term care arrangements.

Finally, general suggestions were offered to improve measurement in surveys. Suggestions included standardization of items across surveys and calibration of survey measures when possible. Standardization of survey items was raised as a possible goal, yet one that may not be applicable to all research topics. For example, questionnaire items obtaining information on "health insurance provider" may not lend themselves to standardization over time and across surveys, as shifts occur in insurance provision arrangements.
A decade ago, the majority of health insurance was provided by fee-for-service indemnity plans; now, health maintenance organizations (HMOs)—and more generally, managed care providers—are becoming the most common health insurance provider. Participants noted that with the increasingly widespread use of computer-assisted personal interviewing and computer-assisted telephone interviewing techniques, researchers may be able to easily and effectively arrange questionnaire items using a sequence of "unfolding" questions, which are contingent on answers to prior questions.
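
The "unfolding" question sequences mentioned above, in which each question depends on answers to prior questions, can be represented as a simple routing table. The sketch below is illustrative only: the question wording, answer codes, and routing are hypothetical and are not drawn from any survey instrument discussed at the workshop.

```python
# Hypothetical instrument: each question maps an answer to the next question key,
# or to None when the sequence ends for that respondent.
QUESTIONS = {
    "covered": {
        "text": "Do you currently have health insurance?",
        "next": {"yes": "plan_type", "no": None},
    },
    "plan_type": {
        "text": "Is the plan fee-for-service or managed care?",
        "next": {"fee-for-service": None, "managed care": "hmo"},
    },
    "hmo": {
        "text": "Is the managed care plan an HMO?",
        "next": {"yes": None, "no": None},
    },
}


def run_interview(questions, first, answer_fn):
    """Ask questions in an order that unfolds from the respondent's own answers.

    answer_fn(key, text) returns the respondent's answer; in computer-assisted
    interviewing this would be the interviewer's keyed entry.
    """
    asked = []
    current = first
    while current is not None:
        answer = answer_fn(current, questions[current]["text"])
        asked.append((current, answer))
        # Unrecognized answers end the sequence rather than guessing a route.
        current = questions[current]["next"].get(answer)
    return asked


if __name__ == "__main__":
    scripted = {"covered": "yes", "plan_type": "managed care", "hmo": "yes"}
    for key, answer in run_interview(QUESTIONS, "covered", lambda k, t: scripted[k]):
        print(f"{QUESTIONS[key]['text']} -> {answer}")
```

Because the routing lives in data rather than in the questionnaire text, new insurance arrangements could be accommodated by adding branches without rewording the questions that earlier respondents answered.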
