Read "Effectiveness and Outcomes in Health Care: Proceedings of an Invitational Conference" at NAP.edu



Suggested Citation:"14 Issues in the Use of Large Data Bases for Effectiveness Research." Institute of Medicine. 1990. Effectiveness and Outcomes in Health Care: Proceedings of an Invitational Conference. Washington, DC: The National Academies Press. doi: 10.17226/1631.



14 Issues in the Use of Large Data Bases for Effectiveness Research

Stephen F. Jencks

We have an enormous opportunity to move forward with outcomes analysis, particularly outcomes analysis based on claims and other large data sets. At the same time, I think there is a real risk of promising more than these approaches can deliver and compromising the future of this research.

DEFINING LARGE DATA SETS

I begin by explaining what a large data set is because I think the term has been too narrowly construed at times. Certainly, size is a feature of a large data set, but two other characteristics may be more important.

POPULATION BASE

First, a large data set usually contains, in some sense, data for a population or a random sample of a population. This can mean payer administrative data, such as claims data from the Medicare program (and thus all the data about services available from that source). It can also mean:

· All hospitalizations occurring in a state. A number of states have all-payer data bases, and some of these are developing considerable clinical richness.
· All persons with a given disease in certain geographic areas. An example is the SEER (Surveillance, Epidemiology, and End Results) data bases maintained by the National Cancer Institute.
· All persons born or dying in a state. State vital record systems are rich with data, and the National Mortality Registry provides an index for those state death records.
· All persons in a random sample. Examples are the National Medical Expenditure Survey, the National Long-Term Care Survey, and the Uniform Clinical Data Set of the Health Care Financing Administration (HCFA) (which will include a random sample of Medicare discharges).
· Various complex populations, such as Medicaid data sets, where people wander in and out of eligibility in complicated ways but nevertheless compose a population.

ORIGINAL PURPOSE

Large data bases are typically collected for some purpose other than that for which researchers wish to use them, and they often lack, therefore, some features or data researchers want. They are typically strong on size, on longitudinal detail, and on linkability to other data sets, but they are particularly likely to be thin on clinical detail and on functional status. Large data sets, then, tend to be unbiased pictures of patients and practice in the real world, but they rarely have just what researchers want and they are not randomized for treatment.

In large data bases, many descriptors of health events are fairly good. There are errors in assigning codes to health events, but overall the data are highly usable. The researchers can find surgeries, hospitalizations, office visits, and many kinds of health events, such as myocardial infarctions.

Outcomes data tend to be pretty good for certain outcomes and not so good for others. It is usually possible to get some information about outcomes other than death, costs, and resource utilization from billing data. These include:

· Morbid events, such as rehospitalization, extended stay, and complications that are indicated by diagnoses and procedures;
· Kinds of service utilization that indicate health status;
· Information on nursing home tenure (Medicare data, for example, include not only bills for care in skilled nursing facilities, but also physician bills, which indicate, by the location of service code, that the patient was in a nursing home); and
· Causes of death (these data from death registries can be hard to obtain and difficult to link, but investigators at HCFA's Office of Research and elsewhere have succeeded in doing so).

Risk adjustors tend to be weak. Although previous diagnoses and use of services are available, and multiple concurrent diagnoses are available for hospital care, physiological risk adjustors are rarely available. On the other hand, there are data sets emerging, such as that being created in Pennsylvania, with substantial physiological data for inpatients, and there will be HCFA's Uniform Clinical Data Set (1).

Functional status data are almost unobtainable in large data sets. There has been intensive discussion about whether one can define a functional status instrument that should be collected on every patient, but this issue has not been resolved.

USES OF LARGE DATA BASES

What does one use a large data set for? The Office of Research at HCFA has a triple agenda in the area of effectiveness, namely, to look at the closely linked issues of the comparative effectiveness of providers, of procedures, and of payment systems. Large data sets have a variety of applications in these areas.

SAMPLING FRAMES

Large data bases are valuable as sampling frames for more intensive studies. The Office of Research and Demonstrations used the Medicare hospital discharge file in this way when it developed the Medicare mortality predictor system. We chose discharges from the discharge file and then went back and pulled those records and got supplementary information in order to develop risk adjustment tools.

RATES AND OUTCOMES SURVEILLANCE

Elliot Fisher and John Wennberg have described an example of how informative this kind of surveillance can be (2). The Office of Research is using Medicare data to study diagnosed complications, rehospitalizations for apparently related conditions, and mortality for eight major surgical procedures. These analyses will be broken down by race, locality, and age. We are not analyzing these data by hospital, both because there are scientific problems involved and because the response of the professional community might well be so hostile as to interfere with effective use of the data. We will, however, be consulting with a number of groups about how to make the data more useful.

VARIATIONS IN OUTCOMES

There are three issues in variation of outcomes:

1. the amount of variation in outcomes across providers doing the same procedure;
2. variations in outcomes among different procedures for "similar" patients, such as those described by Fisher and Wennberg for transurethral versus open prostatectomy; and
3. variations in the effectiveness of different providers (for example, comparison of rates of various outcomes).

As we move along this spectrum, methodological problems multiply and our ability to be confident in the conclusions we can draw from large data sets becomes progressively weaker.

LINKING LARGE DATA SETS

An important feature of large data sets is that they can often be linked so as to increase information about a patient or an event. Data sets with Social Security numbers on them can be linked to one another, and many data sets have or easily could have Social Security numbers. Linkages can broaden many kinds of research. The following are examples.

· The Office of Research is linking Medicare files to the SEER registry in an effort to increase information about what happens to patients who are diagnosed with cancer (the registries contain information on stage and treatments). This is a powerful way to enrich a smaller data base: although the SEER data base is not small by most definitions, it is small compared to the Medicare data base.
· Katherine Kahn, Robert Brook, Emmett Keeler, and others at The RAND Corporation have been studying the impact of the Medicare Prospective Payment System on quality of care. In that study, the Medicare claims data base has been linked to individual cases selected at random from hospitals in order to provide information on rehospitalization and mortality; this information would otherwise be very expensive, perhaps even unobtainable.

In summary, the range of uses for large data sets is extraordinarily broad. We should be careful not to limit our thinking to analyses of Medicare hospital claims data, which are only a very thin slice of the pie.

LIMITATIONS OF LARGE DATA BASES

Large data sets have obvious and less obvious limitations.

DATA QUALITY

Many of the quality issues in large data sets will be familiar to any investigator who has used such secondary data. One feature of secondary data sets, however, requires special emphasis: unused data tend to be useless.
Unless the people who create a large data set make use of an item in a way that provides feedback to those who collect it, the risk is very high that the item will contain so much error as to be unusable. We have found this to be true for items ranging from Social Security number to discharge destination. Thus, careful coordination between creators of data sets and investigators can be critical.
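As a minimal sketch of the linkage idea described under LINKING LARGE DATA SETS, and of why a poorly maintained identifier undermines it, the following Python fragment joins two hypothetical record sets on a nine-digit identifier and sets aside records whose identifier is malformed. The field names (`ssn`, `stage`, `rehospitalized`), the records, and the validity rule are illustrative assumptions, not features of the Medicare or SEER files.

```python
import re

# Hypothetical registry records: identifier plus cancer stage (invented values).
registry = [
    {"ssn": "123456789", "stage": "II"},
    {"ssn": "987654321", "stage": "IV"},
    {"ssn": "12345", "stage": "I"},  # malformed identifier
]

# Hypothetical claims records: identifier plus an outcome flag (invented values).
claims = [
    {"ssn": "123456789", "rehospitalized": True},
    {"ssn": "555555555", "rehospitalized": False},
]

# Assumed validity rule for this sketch: exactly nine digits.
VALID_ID = re.compile(r"^\d{9}$")


def link(left, right, key="ssn"):
    """Exact-match join on `key`.

    Returns (linked, unusable): merged records for identifiers found in both
    sets, and the left-hand records whose identifier fails the validity rule.
    """
    index = {r[key]: r for r in right if VALID_ID.match(r[key])}
    linked, unusable = [], []
    for rec in left:
        if not VALID_ID.match(rec[key]):
            unusable.append(rec)
        elif rec[key] in index:
            linked.append({**rec, **index[rec[key]]})
    return linked, unusable


linked, unusable = link(registry, claims)
print(len(linked), "linked;", len(unusable), "with unusable identifier")
```

In this toy example one registry record links to a claim, one has no matching claim, and one cannot be linked at all because its identifier is unusable; real linkage work would also have to consider duplicate and mistyped identifiers, which an exact match does not address.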

TIMELINESS

Because they are collected for other purposes, large data sets tend not to be available in a timely fashion. This is a special problem in using them for assessing individual providers because it is hard to get hospitals and physicians interested in data that are basically archival.

ACCESS

There are three kinds of problems in getting at the data: administrative access, processing, and understanding.

· There is a perfectly straightforward way of getting the income data that Barbara McNeil discusses (3) by linking Internal Revenue Service (IRS) data from tax returns using the Social Security numbers. The problem with this elegant solution is that, by law, IRS cannot release the data. To come a little closer to the possible, one can link to employment data in the Social Security Administration files, again using Social Security numbers. That is technically feasible and has been done, but because of privacy rules it can only be done by the people at Social Security, which means they must invest staff effort. That requirement really restricts what researchers can do with that large data set. The National Mortality Registry records the fact that a death certificate exists for an individual, but you have to deal with each state's vital records officer to obtain information from the death certificates. That process costs blood, sweat, tears, and money. The problem could be solved by legislation or some other means; such a solution would promote important research.
· The greatest access problems, however, are not getting copies of a data set or getting computer time, despite the costs of spinning 20 to 200 reels of tape. Access includes learning how to use these data well once one has them. The big access problem is understanding the intricacies, flaws, quirks, and limitations of these data. It is knowing that a frequency code for a procedure is generally good but that it is unreliable in Illinois in one year because the carrier counted all of the rejected claims when figuring out how many cases were done. There is a lot of detail that is terribly nit-picky, but ignorance can lead to the wrong conclusions. That kind of mastery is hard to acquire.

METHODOLOGICAL ISSUES

The real controversies in using large data sets are methodological. They focus on whether large data sets can be used to assess the effectiveness of a procedure or a provider or the relative effectiveness of procedures or providers.

The fundamental theorem in using risk-adjusted data to examine effectiveness is that one can infer the relative effectiveness of two treatments from the risk-adjusted difference in outcomes. This requires not only that one know the outcome, but also that one be able to adjust for the risk. Our limited capability for risk adjustment, relative ignorance about how providers select procedures, and ignorance about the interactions between providers and procedures create very serious difficulties when we try to employ this fundamental theorem in the real world.

RISK ADJUSTMENT

Our best risk adjustment instruments account for less than 30 percent of variation in mortality among individuals with the same condition, which leaves 70 percent or more to be explained by other factors. We can attribute this 70 percent to "luck" or define it more focally as some combination of:

· things we do not measure about patients,
· things we do not measure about the care we give,
· things we do not know about the care we give, and
· the various mistakes we make in providing care that we do not quantify very well and rarely record.

Accounting for 30 percent of the variance might be sufficient to allow us to apply the fundamental theorem if we were confident that the remaining sources of variation were not different among patients getting different treatments or treated by different providers. But we often do not know how good the adjustors are in terms of the kinds of variation we might see among the treatment groups.

Our risk adjustments are probably not much better than clinical judgment. Expert systems can do a bit better than experts, but not much better. We as clinicians cannot say very accurately which patient will live or die or which patient will be bed-ridden a year after hip surgery. Our instruments are probably weakest for outcomes other than death.

Risk adjustment tools are extremely interesting.
I spend a lot of my time working on them, developing them, and assessing them, but I think that they are still not fully developed medical technologies. Indeed, considering the very limited evidence we have for risk adjustment systems as tools for identifying ineffective procedures or ineffective providers, I doubt if the Food and Drug Administration would let them be marketed if they were drugs or medical devices. This analogy is appropriate because these systems are being used in settings where they may have a major impact on the health care system. They may be good, but we do not have sufficient evidence yet, and we ought to be generally cautious.
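One common way to put a risk-adjusted comparison into practice is to compare each provider's observed deaths with the number expected from a risk model (an observed-to-expected ratio). The Python sketch below assumes the predicted probabilities already exist; the provider labels, probabilities, and outcomes are invented for illustration. The calculation says nothing about whether unmeasured risk differs between providers, which is exactly the caution raised in this section.

```python
from collections import defaultdict

# Each tuple: (provider, predicted probability of death from some risk model,
# died flag 0/1). All values are invented for illustration.
cases = [
    ("A", 0.10, 0), ("A", 0.20, 1), ("A", 0.30, 0), ("A", 0.40, 1),
    ("B", 0.05, 0), ("B", 0.05, 0), ("B", 0.10, 1), ("B", 0.20, 0),
]


def observed_expected(cases):
    """Return {provider: (observed deaths, expected deaths, O/E ratio)}.

    Expected deaths are the sum of the predicted probabilities for the
    provider's cases; this sketch assumes that sum is greater than zero.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for provider, p_death, died in cases:
        observed[provider] += died
        expected[provider] += p_death
    return {
        p: (observed[p], expected[p], observed[p] / expected[p])
        for p in observed
    }


results = observed_expected(cases)
for provider, (o, e, ratio) in sorted(results.items()):
    print(f"{provider}: observed={o} expected={e:.2f} O/E={ratio:.2f}")
```

With so few cases per provider the ratios here carry almost no information; the point of the sketch is only the arithmetic, and a ratio above 1 is a reason to look further rather than a finding about quality.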

TREATMENT SELECTION

To make inferences from these observational data sets, we must understand something about how treatments are selected, particularly about the unmeasured risk factors that physicians may consider when they select treatments. Such factors would confound an assessment of outcomes.

PROVIDER-PROCEDURE INTERACTION

The effectiveness of a procedure is inextricably linked to the effectiveness of the provider who performs it. Both may be influenced by the payment system under which the procedure is performed. Let me give an example of how that interaction might be important. Suppose we had done the recent trial of antiarrhythmic drugs in myocardial infarction using claims-based data; suppose those data were infinitely supplemented so that we had perfect risk adjustment. We would, I think, have found that most uses of these drugs occur in more sophisticated and advanced settings, such as teaching hospitals. If those sophisticated and advanced settings generally have better outcomes for their patients, yet patients on antiarrhythmic drugs experienced worse results in those settings, that worse outcome would have been confounded by the general pattern of better results in those settings. Although multivariate techniques may control for this effect, the problem requires further study.

This is not a selection phenomenon resulting from unmeasured variables used by the physician in choosing a treatment for a patient. This selection phenomenon involves interaction between the competence of the people who perform a procedure and the effectiveness of the procedure.

STATISTICAL ISSUES

Without becoming highly technical, I wish to note two statistical issues that are important in using large data sets. One relates to evaluating providers, the other to evaluating procedures.
Multiple Hypotheses

If one uses large data sets to examine outcomes for individual providers, the sheer number of providers and tests can create problems of interpretation. The Medicare Hospital Mortality Information release, for example, examines about 20 categories in more than 5,000 hospitals, a total of about 100,000 outcomes. Although HCFA's Health Standards and Quality Bureau has taken a number of steps to deal with evaluating so many results, such as publishing three years' data and using sophisticated statistical techniques, the best way to take advantage of these data remains unclear.

Processes in Control

If one wants to assess a procedure, especially if one wants to compare two procedures, it is necessary to have a process that is in statistical control. This means that the variation in outcomes is highly predictable and is distributed in a statistically predictable fashion. Available evidence suggests that, in routine practice, procedures are not in such control and that, for example, outcomes are different for different providers.

RANDOMIZED CONTROLLED TRIALS VERSUS ANALYSIS OF LARGE DATA SETS

Few people think that randomized controlled trials (RCTs) alone or analyses of large data sets alone are sufficient to meet research needs. What we really need to know is how large data sets can complement RCTs. It is this complementary role, in which large data sets are used to extend and replace RCTs, that we must pursue by analyzing large data sets and comparing the results to those from RCTs. Inferring the relative effectiveness of procedures from large data sets alone is risky at our present level of understanding.

It is important to realize that, although the results of many studies with large data sets will not be definitive, the data that clinicians are working with at the moment are not definitive either. Analysis of large data sets can add probabilistic data to RCTs, thus bringing clinicians closer, in a Bayesian mode, to smart clinical choices. From that point of view, what can be done with large data sets is exceedingly important.

Data almost never speak with great clarity. There is almost always a substantial confidence interval around the results, a lack of certainty as to whether the investigators really did the study exactly correctly. There are conflicting data from other studies. There is a constant problem in evaluating immediate clinical evidence, whether the decision rules to be applied to that evidence come from RCTs or large data sets. Large data bases introduce more problems in evaluating data, but these problems arise in evaluating data that would not otherwise be available to clinicians at all.

PROSPECTS

What can we clearly use large data sets for now, and how might we expand those uses in the future?

First, these sets are clearly very good as sampling frames.

Second, if one either supplements the large data sets or uses large data sets to supplement other data sets, one can obtain very powerful information about risk, physiology, and disease process.

Third, one can learn from relatively crude outcome rates. I think that John Wennberg, Elliot Fisher, and Noralou and Leslie Roos have really done a signal service here. The outcomes we are going to be publishing for a variety of surgical procedures will follow the direction they have set. Their argument is that the rates of not-so-good events or bad outcomes are important because those rates are much higher than the literature suggests and much higher than physicians and patients believe. They argue that better understanding of the real risks of procedures would lead to more conservative and better practice, and their argument seems very reasonable to me. Therefore, these large data sets have immediate practical importance.

Fourth, large data sets are useful for looking at certain adverse events that we cannot study in other ways. Consider Wayne Ray's study, which showed an association of hip fracture with the use of various psychotropic drugs in the elderly. Given the strong suspicion that a lot of that use is inappropriate, we could not ethically mount an RCT to examine this relationship. So, we have to look to large data sets for such evidence. There are many other kinds of data about practices and procedures that can only be obtained from large data sets.

AN AGENDA

Let me try briefly to set out an agenda.

First, we need to validate the input, that is, the diagnostic and other data that are in these large data sets. We have some information about validity, but it is lying around in funny places, and we need to bring it together.
Investigators need access to the results of administrative examinations of the diagnoses recorded in the Medicare data, but further validation is also needed: we need to know how well procedures and complications are recorded. Although we may reasonably infer that a patient admitted with a hip infection after a total hip replacement has developed that infection as a result of surgery, it is much more speculative to link other subsequent events or nonevents to procedures.

Second, we need to increase access to large data sets. This includes creating data centers, changing rules in some cases, and making the data sets easier to understand and use.

Third, we need to do a lot of linking of various kinds of data sets. The HCFA Office of Research has been experimenting with SEER, has worked with the Social Security Administration, and has done a bit with mortality registries. We have to think more carefully about this. If there is some linkage that would really improve health research and that linkage requires a change in the law, let us find some way of preserving the confidentiality of the data and go to Congress to ask for a change in the law.

Fourth is the creation of new public data sets. I am very ambivalent about proposing this because I fear that unmeetable promises are being made to promote some state data bases. Nevertheless, I think the data sets that are being created in Pennsylvania, Colorado, and Iowa are extremely interesting sources of information. Investigators should be thinking now and talking to state people now about how to use them. I will give you an example of the importance of this thinking and talking. Pennsylvania is collecting the entire MedisGroups data set of more than 200 items, but current plans are, I understand, to provide access only to a summary score. Earlier communication might have made it easier to change the situation so researchers would have access to that entire data set. The Uniform Clinical Data Set is an even more interesting and flexible source of data.

Fifth is the creation of public reference data sets that have been carefully validated. If researchers are going to try to determine the functional status of people after surgical procedures, there is a lot to be said, for example, for selecting certain centers, whether randomly assigned or recruited, from which these data will be collected and in which special efforts will be made to guarantee data quality.

Sixth, we need to learn much more about risk adjustment, and I do not mean just better instruments. For example, there is some evidence that one can do fairly accurate risk adjustment for routine elective surgery from diagnosis and previous treatment data. We need to know how true that is and when it is sufficient. We also need to know when the variations across providers will be adequately measured by the risk adjustment tools we have and when there are major variations that those tools cannot get to.

Seventh, we need to look at when risk adjustment can help us to identify the relative effectiveness of providers or procedures. Two examples follow.

· HCFA is presently designing a study to determine when risk-adjusted mortality can be used to screen for cases where peer review will find problems with care. There are many other areas in which the validity of large data bases must be determined before research on them can be validated. Specifically, we need to know when risk adjustment can replace randomization in evaluating either a procedure or the relative effectiveness of two procedures.
· A possible validation study would be to expand a clinical trial by asking the physician to record, before opening the assignment envelope, the treatment he or she would have selected had there not been a randomization process. With such a design one can ask, "How well would risk adjustment have been able to correct for the selection bias that physician judgment would have introduced in a retrospective, risk-adjusted study?"

There are probably a lot of other useful approaches, and the Institute of Medicine might make helpful suggestions in this area. What is really needed is empirical evidence, not people saying, "This isn't randomized, therefore it isn't truth," and not people saying, "We have controlled for the relative risk, so it is true."

Finally, we need to develop some consensus on how large data sets can be used. This problem extends beyond how studies using these data should be carried out and exactly what should be done in the studies. For example, we need some consensus on how the HCFA mortality data can be used. Major health organizations are beginning to work toward such a consensus, and developing that consensus may be an important step in working toward consensus on how to use data from other large data bases.

Cutting across all these issues is the challenge of doing as much as we can without promising more than we can deliver. I hope the issues discussed in this chapter will move us forward in the narrow but important path between the risks of promising too much and attempting too little.

REFERENCES

1. Krakauer, H. The Uniform Clinical Data Set. Pp. 120-133 in Effectiveness and Outcomes in Health Care. Heithoff, K.A. and Lohr, K.N., eds. Washington, D.C.: National Academy Press, 1990.
2. Fisher, E.S. and Wennberg, J.E. Administrative Data in Effectiveness Studies: The Prostatectomy Assessment. Pp. 80-93 in Effectiveness and Outcomes in Health Care. Heithoff, K.A. and Lohr, K.N., eds. Washington, D.C.: National Academy Press, 1990.
3. McNeil, B.J. Claims Data and Effectiveness: Acute Myocardial Infarction and Other Examples. Pp. 65-70 in Effectiveness and Outcomes in Health Care. Heithoff, K.A. and Lohr, K.N., eds. Washington, D.C.: National Academy Press, 1990.

Collection of Primary Data: Introduction

Harold C. Sox, Session Moderator

In the context of the Effectiveness Initiative, primary data are those obtained from sources other than administrative claims data sets. Thus, the purpose of primary data collection is to supplement the information that is obtained from the administrative data sets. There are several reasons, which were discussed earlier in this volume, why claims-based data are often not adequate for medical research.

· To attribute an improved outcome to an intervention, patients who had the intervention should have been identical prior to the intervention to patients who did not have the intervention. There are multivariate statistical methods for adjusting for baseline differences between the intervention group and those who did not have the intervention. Administrative data sets typically do not have sufficiently detailed clinical information for this purpose. This information can sometimes be obtained by reviewing the patients' hospital records.
· Studying an intervention in a subset of patients may reveal effects that are not observed in the entire population. To create useful subsets of patients, one must have clinical information that is often not available in administrative data sets.
· The range of outcomes that can be measured with administrative data sets is limited. Administrative data sets have information about whether a patient is alive or dead, as well as whether the patient was rehospitalized or required an intervention. Information about disease status, functional status, or the patient's preferences must usually be obtained by other means, such as reviewing the patient's hospital record or interviewing the patient.

John E. Ware is a senior scientist at the Institute for Improvement of Health and Medical Care at the New England Medical Center. His chapter focuses on gathering data directly from patients and emphasizes practical issues in primary data collection, as well as issues of precision, reliability, and validity.

Henry Krakauer is Director of the Office of Program Assessment and Information in the Health Standards and Quality Bureau (HSQB) of the Health Care Financing Administration (HCFA). In 1987, HSQB began a complex project to develop a data set for use by Medicare Peer Review Organizations (PROs) and the wider research community. The data set was intended to contain far more detailed clinical data than were available heretofore in the HCFA data files. Dr. Krakauer discusses the part of this project known as the Uniform Clinical Data Set.


