4
Methodological Issues in the Collection and Use of Data About Dying

End-of-life researchers and others working more broadly with large datasets have identified several methodological issues that require consideration as we look at the use of existing datasets to better understand quality of life and care at the end of life. Small sample sizes are always an obstacle, particularly when using nationally collected health data that are not focused on people who are dying. For adults, with sufficient accumulation of years of data, large enough samples can be assembled for many kinds of analysis of topics related to dying. For children and young adults, however, that is not the case; there have been no studies to date that have focused closely enough on younger people to provide any useful aggregate information about those approaching the end of life. This chapter describes the most pervasive methodologic issues that make end-of-life research difficult. Some of these issues are specific to research on the dying, but many also apply to other areas of health research. Both the specific and the general are included in this chapter, as they apply to research on death and dying.

METHODOLOGICAL ISSUES IDENTIFIED BY END-OF-LIFE RESEARCHERS

End-of-life researchers have described many challenges associated with efforts to monitor quality of life and quality of care at the end of life, as discussed below: (1) obtaining information from the perspectives of the person dying, the person’s loved ones, and health providers; (2) coping
with variations in the quality of existing data; (3) coping with the difficulties in collecting data from dying people and their loved ones; (4) characterizing the quality of end-of-life care; and (5) defining the period to be considered the “end of life.”

A more general issue is the lack of standardized terminology in this field. The various conceptual models described in Chapter 2 of this report (which use the original terms, as reported by authors) begin to suggest the variety of language used. The relatively small community of researchers working in this field does interact closely, but its members should be encouraged to begin converging on a set of defined terms that can then be used in surveys and other types of research to improve comparability among datasets.

Obtaining information from varying perspectives. A full evaluation of the quality of life and quality of care at the end of life requires information from several perspectives: (1) that of the person dying, (2) that of the dying person’s loved ones who are intimately involved in the person’s death, and (3) that of health care providers who are in a unique position to judge the quality of care with respect to current science and professional standards and to report the services used. Most clinicians and researchers acknowledge the importance of the patient’s perspective in understanding quality of care, as well as the importance of recognizing the family as the target of care (Donaldson and Field, 1998; Stewart et al., 1999; Teno et al., 1999). Gathering data in a systematic way that captures the perspectives of patients, their loved ones, and health providers is challenging, however, and at present it is essentially not done.

Coping with variations in the quality of existing data. A second challenge is dealing with the variation in the quality (i.e., completeness, reliability, and validity) of available data. Information about quality of life and quality of care at the end of life is important to clinicians and to researchers, to individuals and organizations interested in internal quality improvement efforts, and to agencies concerned with external inspections. As this field of inquiry grows and develops, there will be variations in the precision of the data collection and in acceptable standards for reliability and validity. Clearly, some early pilot studies will have to use tools that are not validated, but one aim of that type of research will be to learn about the tools and determine whether they can be used widely. Similarly, as we look to large datasets for information, we must weigh the advantages of developing a portrait using the relatively crude tools that exist against the risk of being misled by the use of data originally designed for a wholly different purpose.

Coping with the difficulties in collecting data from dying people and their loved ones. A third important challenge is coping with the difficulties of collecting data from dying people and their loved ones.
People who are sick enough to die or struggling to cope with the physical and emotional challenges of caring for a dying loved one are much less likely to be able to respond to surveys or interviews than are people without those burdens. Investigators estimate that one in three dying persons is unable to be interviewed close to death because of somnolence, other cognitive deficits, asthenia, or other medical reasons (Teno et al., 2000). Consequently, missing data and the proportion of proxy responses will present significant challenges to those analyzing these data.

Characterizing the quality of end-of-life care. Yet another challenge is characterizing the quality of end-of-life care. Describing quality involves capturing overuse of medical resources as well as underuse, in addition to documenting poor skills and performance on the part of health care providers (Donaldson and Field, 1998). Judgments in these areas are value laden and must be made with careful consideration of the current evidence base. Nonetheless, research suggests that expert panels can adequately identify the appropriateness and quality of care (Brook et al., 1986; Schuster et al., 1998). Assessing continuity and coordination of care is also important to quality, but, again, methods to do this are very limited at this time.

Defining the period to be considered the “end of life.” Defining a period of time to be called the “end of life” is problematic. The changing face of death in this country requires an acknowledgment of the chronic nature of eventually fatal illnesses, such as congestive heart failure and end-stage renal disease, as well as a better understanding of the trajectory of dying from complications associated with dementia and frailty in old age (Lunney et al., 2001; Lynn, 2001; Teno et al., 2000). This suggests that initial information gathering be done tentatively, with a wide net that can be progressively narrowed as we begin to better understand these various pathways to death.
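The idea of a wide net that is progressively narrowed can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the records, the window lengths (365, 180, 90, and 30 days), and the function names are hypothetical and do not come from this report. It counts how many recorded care events per decedent would be captured under each candidate definition of the “end of life.”

```python
from datetime import date, timedelta

# Hypothetical toy records: (person_id, date_of_death, dates of recorded care events).
DECEDENTS = [
    ("A", date(2001, 6, 30), [date(2000, 8, 1), date(2001, 3, 15), date(2001, 6, 20)]),
    ("B", date(2001, 9, 10), [date(2001, 1, 5), date(2001, 9, 1)]),
]

# Candidate "end of life" definitions in days before death, widest first.
# These lengths are assumptions chosen for illustration only.
WINDOWS_DAYS = [365, 180, 90, 30]


def events_in_window(death, events, days):
    """Return the events that fall within `days` days before death (inclusive)."""
    start = death - timedelta(days=days)
    return [e for e in events if start <= e <= death]


def window_counts(decedents, windows=WINDOWS_DAYS):
    """Count the events captured per decedent under each candidate window."""
    return {
        pid: {days: len(events_in_window(death, events, days)) for days in windows}
        for pid, death, events in decedents
    }


counts = window_counts(DECEDENTS)
for pid, by_window in counts.items():
    print(pid, by_window)
```

Comparing the counts across windows shows how much information each narrower definition would discard, which is one way to decide how far the net can be narrowed for a given pathway to death.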
METHODOLOGICAL ISSUES RELATED TO THE USE OF EXISTING DATASETS

There are several methodological issues associated with the use of large datasets created for purposes other than end-of-life research. These generally stem from the methods and sampling strategies used to collect the data for these datasets.

Design and scope of the original study. If an existing dataset is the result of a specific survey, the usefulness of the data for research on the quality of life and care at the end of life will depend in part on the design and scope of the original study. Information from cross-sectional health surveys of the general population may enhance our understanding
of quality of life and care at the end of life if the sample was large enough to allow the identification of a sufficient number of decedents. The National Health Interview Survey, for example, collects information from 30,000 respondents each year. Matching information from this survey to the National Death Index has yielded 54,534 decedents over the period between 1986 and 1994 (Schoeni, 2002), allowing very detailed analyses. However, very few surveys are conducted on such a scale. Data from longitudinal panel studies of the elderly or of others at high risk of death are also valuable, provided that the measurement interval is short and likely to result in data collected close to death.

Sampling strategies. Another methodological issue that warrants attention when existing datasets are used to study quality of life and care at the end of life is the sampling strategy that was used to collect the data. Large surveys based on household sampling strategies will have limited value for end-of-life study, because findings cannot be generalized to institutionalized populations and may not fully represent the aging population. Many people at the end of life are elderly and/or chronically ill, and likely to be institutionalized or heavily concentrated in age- or income-restricted housing. List-based sampling strategies, such as the use of Medicare beneficiary rolls, better tap these key populations, but they also result in data that cannot be generalized to the full spectrum of the terminally ill population.

Unit of analysis. Careful consideration needs to be given to the unit of analysis of existing datasets. Institutional-level data that include information on individual clients could be an important source of information for end-of-life research. Pooling data from multiple institutions, however, requires careful consideration of the original sampling strategies and timing of data collection.
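The two conditions described above under design and scope (enough identified decedents, and interviews that fall close to death) can be checked with a short sketch. The Python below is a minimal illustration, not a description of how any actual survey is matched to a death index: the records are hypothetical, the exact match on a shared respondent identifier is a simplifying assumption, and the 180-day threshold is arbitrary.

```python
from datetime import date

# Hypothetical toy data: interview dates keyed by respondent ID, and a
# death-index extract keyed by the same ID (an assumption; real matching
# typically relies on several identifying fields rather than one shared key).
INTERVIEWS = {
    "r1": date(1990, 3, 1),
    "r2": date(1991, 7, 15),
    "r3": date(1992, 1, 10),
    "r4": date(1993, 5, 20),
}
DEATH_INDEX = {
    "r1": date(1990, 5, 1),
    "r3": date(1994, 2, 1),
}


def link_decedents(interviews, death_index):
    """Return {respondent_id: days from interview to death} for matched records."""
    gaps = {}
    for rid, interviewed in interviews.items():
        died = death_index.get(rid)
        if died is not None and died >= interviewed:
            gaps[rid] = (died - interviewed).days
    return gaps


def close_to_death(gaps, max_gap_days):
    """Keep only decedents who were interviewed within `max_gap_days` of death."""
    return sorted(rid for rid, gap in gaps.items() if gap <= max_gap_days)


gaps = link_decedents(INTERVIEWS, DEATH_INDEX)
near_death = close_to_death(gaps, max_gap_days=180)
print(len(gaps), "decedents identified;", len(near_death), "interviewed within 180 days of death")
```

In this toy example two of four respondents are identified as decedents, but only one was interviewed near death, which is the kind of attrition that makes short measurement intervals important.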
METHODOLOGICAL ISSUES RELATED TO SURVEY METHODS

Several aspects of survey methods are particularly troublesome for the purpose of studying individuals who are elderly or sick enough to die.

Variation in responses among people in different age groups. Various age groups are known to derive different meanings from the same question, and elderly people are less comfortable than younger people with questions that require drawing comparisons or psychological self-description (Herzog and Rodgers, 1992). Elderly respondents also have difficulty making time tradeoff or utility judgments and completing visual analog scales. Furthermore, age, cognitive ability, and health status
affect the likelihood, as well as the quality, of responses to questionnaires and interviews.

Reliability and validity of administrative data. The reliability and validity of administrative data are an additional concern. For example, coding error rates within the Healthcare Cost and Utilization Project’s National Inpatient Sample were found to vary widely across states, hospitals within states, geographic location, and hospital characteristics (Berthelsen, 2000). Furthermore, information about comorbidity is seriously underreported (Green and Wintfeld, 1993; Iezzoni et al., 1992), especially for patients with life-threatening disorders (Jencks et al., 1988).

Use of proxy respondents. The use of proxy respondents warrants special attention because of the critical role they play in collecting information about people who are cognitively impaired or too sick to be studied directly, and because proxy respondents represent the key source of information after a death has occurred. It is important to understand that these proxies are very likely to have been affected by the decedent’s dying process and death. They may be asked to report on their own distress as well as to provide information about the decedent. Despite this caution, studies generally report fairly good agreement between subjects and proxies in many types of assessments. A recent review of clinical studies comparing proxy data with other sources of information for adults (Neumann et al., 2000) found the following:

Spouses, children, or other close family members tend to be capable proxies, although proxy reports may be influenced by caregiving burden.
Proxy and subject reports are often comparable in describing levels of functioning, although proxies tend to identify more impairment.
Proxies and subjects generally agree on overall health, chronic physical conditions, and physical symptoms.
Relatively little is known about the comparability of proxy reports regarding health care utilization.
There is low to moderate agreement between proxies and subjects regarding depressive symptoms and psychological well-being, with proxies describing more problems.
Proxies are often in agreement with subjects on reports of cognitive status, although proxies may overestimate cognitive abilities.

Variation in agreement between subjects and proxies. Variation limits the validity of some data and hampers the comparison of large datasets that contain differing proportions of proxy responses. When possible, it is important to adjust for proxy responses by developing a predictive model based on a subset of data for which both patient and proxy responses are available for comparison.

Difficulties in measuring complex variables. The measurement of complex variables inevitably introduces another difficult issue: sources of measurement variation among surveys and within rounds of the same survey. The measurement of a person’s disability or functional status is a useful example of this issue. First, there are multiple accepted ways of measuring function, including (1) questionnaires or observations of activities of daily living (ADLs) or instrumental activities of daily living (IADLs); (2) questions about physical activity, exercise, or recreation; (3) questions about mobility, range of motion, strength, and endurance; and (4) clinically administered performance batteries. Within just one approach (self-administered questionnaires concerning ADLs), seemingly minor differences in the structure and wording of a questionnaire result in major differences in prevalence estimates of disability (Freedman and Martin, 2000; Picavet and van den Bos, 1996; Wiener et al., 1990). Many of the dimensions of quality at the end of life are equally complex and prone to measurement variation.

ANALYTIC ISSUES IN USING DATA FROM IDENTIFIED DECEDENTS

When using data from identified decedents, important analytic issues arise:

Approaches to handling incomplete data. Handling incomplete data is challenging because variables are likely to be missing for specific reasons, and their absence may bias overall results. Those most in pain or most disabled are perhaps the least likely to provide data points near the end of life. Careful consideration should be given to techniques to reduce nonresponse biases (Lemke and Drube, 1992).

Approaches to describing health and functional status.
Describing the health and functional status of people sick enough to die is especially difficult because of the complexity of the conditions that interact to produce a given state and then change at individually varying rates over time. Multivariate statistical approaches are essential, but even then, careful consideration should be given to capturing individual heterogeneity within complex health states (Manton and Woolson, 1992). This observation implies that subjective, narrative (i.e., expansive) data are needed to supplement the categorical data usually collected in large datasets.

Approaches to operationalizing longevity. Operationalizing longevity has been approached in various ways. The majority of studies take
the approach of a direct comparison of decedents with survivors at the end of the follow-up period, but other studies incorporate survival time either through methods that compare sample mortality to population mortality (e.g., Cox proportional hazards model for the statistical analysis of failure time data) or by basing measures on observed mortality (e.g., Palmore’s longevity quotient).

Addressing the interaction between length of life and outcomes of care. The interaction of length of life and outcomes of care raises another thorny problem. Most variables relevant to health tend to get “worse” near death (for example, pain, costs of care, caregiver burden, and disability). Consequently, care strategies that prolong survival, which themselves may be considered poor outcomes if they only prolong the dying process, may appear to produce poorer outcomes because of the larger numbers of subjects who then fall within this prolonged stage of worsening outcomes. On the other hand, strategies that appear to improve valued outcomes may do so only by shortening lifespan, thus compressing the period of observation during which these negative outcomes tend to be observed.

ISSUES RELATED TO PRIVACY, CONFIDENTIALITY, AND LINKING DATASETS

The idea of linking records about a single individual that were created for different purposes, and possibly at different times and places, is not new. A recent report of the U.S. General Accounting Office (GAO, 2001) cites a paper from the 1950s that discusses linking data from different sources through matching names. Since that time, it has become increasingly clear that linking data sources can create information not available from single sources, and has the potential to reduce both the costs of data collection and the burden on respondents. As part of the effort to “reinvent government,” the U.S. Department of Health and Human Services (HHS) has undertaken a major planning effort to restructure its health surveys. Multiple HHS data collection efforts are now linked analytically through the use of common core questionnaires, common sampling frames, and common definitions and terms. This linkage reduces the overall burden imposed on survey respondents, increases the efficiency of data collection, and vastly improves the analytic capabilities of HHS surveys (HHS, 1995).

This report recognizes the value of data linkage as well as the issues it raises. The committee made no recommendations specifically about privacy issues because, as described in this section, these issues receive a great deal of attention from various public and private
entities, which have been working successfully toward both increasing linkages and safeguarding privacy and confidentiality.

Linked records do not raise entirely new privacy issues. They are extensions of the privacy issues that have been and continue to be debated, legislated, and regulated in relation to single data sources. But linkage does raise some additional issues. The GAO (2001) identified five examples:

Consent to linkage. In some cases data linkage requires that subjects give consent to the linkage, but in other cases, they may be unaware that, in essence, new information about them is being created. Some linkages require data sharing between agencies, and when this occurs, certain laws and policies concerning disclosure and consent are relevant. Notably, the Privacy Act generally requires consent for disclosure from one agency to another, but there are exceptions.

Data sharing. In order to compile the information needed for record linkage and “make the link,” agencies must often share identifiable person-specific data. But traditionally, data have been kept separately, and various statutes have been enacted to prohibit or control certain kinds of data sharing. Privacy concerns stem from a desire to control information about oneself and a perceived potential for inappropriate government use (e.g., to pursue criminal charges). Security risks could also arise during data transfer.

Reidentification risks. Some datasets are linked using a code-number procedure or are stripped of explicit identifiers as soon after the linkage as possible; nevertheless, reidentification of at least some data subjects may be possible through a deductive process, so only controlled use would be appropriate. To facilitate broader access to statistical and research data, agencies have created more fully “deidentified” public-use datasets (ICDAG, 1999). Although many linked datasets are not made available for public use, some are, and concerns about the reidentification risks associated with these datasets are increasing.

Potential sensitivity. The potential sensitivity of data (risk of harm to data subjects) cuts across all other privacy issues. This is true for linked data as well as for single-source datasets. However, linkage may heighten the sensitivity of data that, taken by themselves, appear to be relatively innocuous.

Security of linked data. Security is crucial to protecting stored data. For linked data, this is especially true because a linked dataset may be more detailed or more sensitive than its components.

In response to these concerns, and as data linkages have become more common, techniques for protecting privacy and improving “data stewardship” have also developed, through the leadership of various groups, including the Office of Management and Budget and its Interagency Council on Statistical Policy, the Federal Committee on Statistical Methodology, and the Confidentiality and Data Access Committee; and the DHHS Data Council [1] and the DHHS Office for Human Research Protections. The National Research Council and its Committee on National Statistics, and the IOM, have also issued reports dealing with these issues (e.g., NAS, 2000; NCPB, 2000).

Laws and Regulations Affecting Linked Data

The Privacy Act of 1974 establishes governmentwide policies for the disclosure of data by federal agencies and requires agencies to safeguard identifiable information. Under the act, agencies may not disclose identifiable information to third parties without the individual’s prior consent. The act contains 12 categories of exceptions to the consent requirement, intended to accommodate legitimate needs for identifiable information, such as conducting research and statistical activities that involve record linkage.

In addition, there are certain federal regulations, most notably the Federal Policy for the Protection of Human Subjects, known as the Common Rule, that govern certain research projects that involve human subjects or personal information on them; these projects may include record linkage. Under the Common Rule (HHS regulations codified at Title 45, Part 46, Subpart A of the Code of Federal Regulations), research supported or regulated by any of 17 federal agencies is subject to certain federal oversight requirements. In accordance with the Common Rule, organizations have established local institutional review boards (IRBs), made up of both scientists and nonscientists, to approve or disapprove research projects depending on such factors as whether researchers minimize the risks to research subjects and obtain their informed consent.
[1] The HHS Data Council consists of HHS officials who have a direct reporting relationship to the Secretary, the HHS Privacy Advocate, and the Senior Advisor on Health Statistics. The Council coordinates HHS data collection and analysis activities, including privacy policy activities.

Tools to Ensure Privacy of Linked Records

GAO (2001) identified the tools relevant to assuring the privacy and confidentiality of linked records, which include techniques for masked data sharing, procedures for reducing reidentification risks (including safer data and safer settings), and techniques to reduce the sensitivity of the data being linked.

Techniques for masked sharing or linkage include list inflation, third-party models, and grouped linkage.

Secure transfer is aided by techniques such as encryption, as well as physically secure transfer vehicles (e.g., secure data lines).

Safeguard reviews can help ensure that security measures are being followed in another agency.

Obtaining consent or providing the ability to “opt out” may be necessary for at least some linkage projects. One approach, used in the Health and Retirement Study, is an explicit consent form, which asks the respondent’s permission for specific records to be transferred from the Social Security Administration (SSA) to the University of Michigan for the purpose of linkage.

The basic physical and electronic security approaches that are used to protect any data stored electronically also are relevant for information resulting from record linkage. These include access controls, audit trails, and storage strategies (NRC, 1997; IRS, 2000; Jabine, 1993; GAO, 2001b; 2000a,b,c; 1999; 1998).

Stewardship involves compliance with relevant laws as well as the coordination of project-by-project decisions (which may include whether or not to conduct a specific linkage or whether to release linked data), systems for accountability, and organizational culture.
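As a rough illustration of what masked linkage can look like in practice, the sketch below has two data holders replace a shared identifier with a keyed hash before records are joined, so the party performing the join never sees the identifier itself. This is only one possible approach and is not presented as any of the specific techniques named by GAO; the identifiers, payloads, key, and function names are all hypothetical. It uses only the Python standard library (`hmac` and `hashlib`).

```python
import hashlib
import hmac


def mask(identifier: str, key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a normalized identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def masked_records(records, key):
    """Replace the identifier in each (identifier, payload) pair with its mask."""
    return {mask(ident, key): payload for ident, payload in records}


def link(masked_a, masked_b):
    """Join two masked datasets on the masked identifier only."""
    return {m: (masked_a[m], masked_b[m]) for m in masked_a.keys() & masked_b.keys()}


# Hypothetical toy data. In this sketch the key is agreed between the two data
# holders and is assumed never to be given to the party doing the linkage.
SHARED_KEY = b"example-key-not-for-real-use"
survey = [("ID-001", {"pain_score": 4}), ("ID-002", {"pain_score": 7})]
benefits = [("id-002 ", {"enrolled": True}), ("ID-003", {"enrolled": False})]

linked = link(masked_records(survey, SHARED_KEY), masked_records(benefits, SHARED_KEY))
print(len(linked), "record(s) linked without exposing raw identifiers")
```

Masking of this kind addresses only the data-sharing step. The linked payloads may still allow deductive reidentification, so the controls on access, storage, and release discussed above would still apply.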