14
Issues in the Use of
Large Data Bases for
Effectiveness Research
Stephen F. Jencks
We have an enormous opportunity to move forward with outcomes analysis,
particularly outcomes analysis based on claims and other large data sets. At
the same time, I think there is a real risk of promising more than these
approaches can deliver and compromising the future of this research.
DEFINING LARGE DATA SETS
I begin by explaining what a large data set is because I think the term has
been too narrowly construed at times. Certainly, size is a feature of a large
data set, but two other characteristics may be more important.
POPULATION BASE
First, a large data set usually contains, in some sense, data for a population
or a random sample of a population. This can mean payer administrative
data, such as claims data from the Medicare program (and thus all the
data about services available from that source). It can also mean:
· All hospitalizations occurring in a state. A number of states have all-payer
data bases, and some of these are developing considerable clinical
richness.
· All persons with a given disease in certain geographic areas. An
example is the SEER (Surveillance, Epidemiology, and End Results) data
bases maintained by the National Cancer Institute.
· All persons born or dying in a state. State vital record systems are
rich with data, and the National Mortality Registry provides an index for
those state death records.
· All persons in a random sample. Examples are the National Medical
Expenditure Survey, the National Long-Term Care Survey, and the Uniform
Clinical Data Set of the Health Care Financing Administration (HCFA)
(which will include a random sample of Medicare discharges).
· Various complex populations, such as Medicaid data sets, where people
wander in and out of eligibility in complicated ways but nevertheless compose
a population.
ORIGINAL PURPOSE
Large data bases are typically collected for some purpose other than that
for which researchers wish to use them, and they often lack, therefore, some
features or data researchers want. They are typically strong on size, on
longitudinal detail, and on linkability to other data sets, but they are particularly
likely to be thin on clinical detail and on functional status. Large data sets,
then, tend to be unbiased pictures of patients and practice in the real world,
but they rarely have just what researchers want and they are not randomized
for treatment. In large data bases, many descriptors of health events are
fairly good. There are errors in assigning codes to health events, but overall
the data are highly usable. The researchers can find surgeries, hospitalizations,
office visits, and many kinds of health events, such as myocardial
infarctions.
Outcomes data tend to be pretty good for certain outcomes and not so
good for others. It is usually possible to get some information about outcomes
other than death, costs, and resource utilization from billing data. These
include:
· Morbid events, such as rehospitalization, extended stay, and complications
that are indicated by diagnoses and procedures;
· Kinds of service utilization that indicate health status;
· Information on nursing home tenure (Medicare data, for example, include
not only bills for care in skilled nursing facilities, but also physician bills,
which indicate, by the location of service code, that the patient was in a
nursing home); and
· Causes of death (these data from death registries can be hard to obtain
and difficult to link, but investigators at HCFA's Office of Research and
elsewhere have succeeded in doing so).
Risk adjustors tend to be weak. Although previous diagnoses and use of
services are available, and multiple concurrent diagnoses are available for
hospital care, physiological risk adjustors are rarely available. On the other
hand, there are data sets emerging, such as that being created in Pennsylvania,
with substantial physiological data for inpatients, and there will be HCFA's
Uniform Clinical Data Set (1).
Functional status data are almost unobtainable in large data sets. There
has been intensive discussion about whether one can define a functional
status instrument that should be collected on every patient, but this issue
has not been resolved.
USES OF LARGE DATA BASES
What does one use a large data set for? The Office of Research at HCFA
has a triple agenda in the area of effectiveness, namely, to look at the
closely linked issues of the comparative effectiveness of providers, of procedures,
and of payment systems. Large data sets have a variety of applications in
these areas.
SAMPLING FRAMES
Large data bases are valuable as sampling frames for more intensive
studies. The Office of Research and Demonstrations used the Medicare
hospital discharge file in this way when it developed the Medicare mortality
predictor system. We chose discharges from the discharge file and then
went back and pulled those records and got supplementary information in
order to develop risk adjustment tools.
RATES AND OUTCOMES SURVEILLANCE
Elliot Fisher and John Wennberg have described an example of how
informative this kind of surveillance can be (2). The Office of Research is
using Medicare data to study diagnosed complications, rehospitalizations
for apparently related conditions, and mortality for eight major surgical
procedures. These analyses will be broken down by race, locality, and age.
We are not analyzing these data by hospital, both because there are scientific
problems involved and because the response of the professional community
might well be so hostile as to interfere with effective use of the data. We
will, however, be consulting with a number of groups about how to make
the data more useful.
VARIATIONS IN OUTCOMES
There are three issues in variation of outcomes:
1. the amount of variation in outcomes across providers doing the same
procedure;
2. variations in outcomes among different procedures for "similar" patients,
such as those described by Fisher and Wennberg for transurethral versus
open prostatectomy; and
3. variations in the effectiveness of different providers (for example,
comparison of rates of various outcomes).
As we move along this spectrum, methodological problems multiply and
our ability to be confident in the conclusions we can draw from large data
sets becomes progressively weaker.
LINKING LARGE DATA SETS
An important feature of large data sets is that they can often be linked so
as to increase information about a patient or an event. Data sets with Social
Security numbers on them can be linked to one another, and many data sets
have or easily could have Social Security numbers. Linkages can broaden
many kinds of research. The following are examples.
· The Office of Research is linking Medicare files to the SEER registry
in an effort to increase information about what happens to patients who are
diagnosed with cancer (the registries contain information on stage and treatments).
This is a powerful way to enrich a smaller data base: although the SEER
data base is not small by most definitions, it is small compared to the
Medicare data base.
· Katherine Kahn, Robert Brook, Emmett Keeler, and others at The
RAND Corporation have been studying the impact of the Medicare Prospective
Payment System on quality of care. In that study, the Medicare claims data
base has been linked to individual cases selected at random from hospitals
in order to provide information on rehospitalization and mortality; this information
would otherwise be very expensive, perhaps even unobtainable.
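The linkage idea in the examples above can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the field names (person_id, stage, treatment, event), and the helper link_records are all hypothetical and do not reflect the actual layout of the Medicare or SEER files. The sketch shows a smaller registry being enriched with events found in a larger claims file through a shared identifier.

```python
# Hypothetical records; field names and values are invented for illustration
# and do not reflect the real Medicare or SEER file layouts.
registry = [
    {"person_id": "A001", "stage": "II", "treatment": "surgery"},
    {"person_id": "A002", "stage": "III", "treatment": "radiation"},
    {"person_id": "A003", "stage": "I", "treatment": "surgery"},
]
claims = [
    {"person_id": "A001", "event": "rehospitalization"},
    {"person_id": "A001", "event": "office visit"},
    {"person_id": "A003", "event": "office visit"},
    {"person_id": "A999", "event": "hospitalization"},  # no registry match
]


def link_records(small, large, key):
    """Attach to each record in `small` the matching records from `large`."""
    # Index the large file once by the shared identifier.
    index = {}
    for row in large:
        index.setdefault(row[key], []).append(row)
    # Copy each small-file record and add whatever matches were found.
    linked = []
    for row in small:
        enriched = dict(row)
        enriched["linked"] = index.get(row[key], [])
        linked.append(enriched)
    return linked


linked = link_records(registry, claims, "person_id")
for row in linked:
    print(row["person_id"], row["stage"], len(row["linked"]))
```

A registry record with no matching claims keeps an empty list rather than being dropped, so the analyst can see how many records failed to link. In practice, errors in the identifier itself would have to be checked before any such match could be trusted.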
In summary, the range of uses for large data sets is extraordinarily broad.
We should be careful not to limit our thinking to analyses of Medicare
hospital claims data, which are only a very thin slice of the pie.
LIMITATIONS OF LARGE DATA BASES
Large data sets have obvious and less obvious limitations.
DATA QUALITY
Many of the quality issues in large data sets will be familiar to any
investigator who has used such secondary data. One feature of secondary
data sets, however, requires special emphasis: unused data tend to be useless.
Unless the people who create a large data set make use of an item in a way
that provides feedback to those who collect it, the risk is very high that the
item will contain so much error as to be unusable. We have found this to be
true for items ranging from Social Security number to discharge destination.
Thus, careful coordination between creators of data sets and investigators
can be critical.
TIMELINESS
Because they are collected for other purposes, large data sets tend not to
be available in a timely fashion. This is a special problem in using them for
assessing individual providers because it is hard to get hospitals and physicians
interested in data that are basically archival.
ACCESS
There are three kinds of problems in getting at the data: administrative
access, processing, and understanding.
· There is a perfectly straightforward way of getting the income data
that Barbara McNeil discusses (3) by linking Internal Revenue Service (IRS)
data from tax returns using the Social Security numbers. The problem with
this elegant solution is that, by law, IRS cannot release the data. To come a
little closer to the possible, one can link to employment data in the Social
Security Administration files, again using Social Security numbers. That is
technically feasible and has been done, but because of privacy rules it can
only be done by the people at Social Security, which means they must
invest staff effort. That requirement really restricts what researchers can do
with that large data set.
· The National Mortality Registry records the fact that a death certificate
exists for an individual, but you have to deal with each state's vital records
officer to obtain information from the death certificates. That process costs
blood, sweat, tears, and money. The problem could be solved by legislation
or some other means; such a solution would promote important research.
· The greatest access problems, however, are not getting copies of a
data set or getting computer time, despite the costs of spinning 20 to 200
reels of tape. Access includes learning how to use these data well once one
has them. The big access problem is understanding the intricacies, flaws,
quirks, and limitations of these data. It is knowing that a frequency code
for a procedure is generally good but that it is unreliable in Illinois in one
year because the carrier counted all of the rejected claims when figuring out
how many cases were done. There is a lot of detail that is terribly nit-picky,
but ignorance can lead to the wrong conclusions. That kind of
mastery is hard to acquire.
METHODOLOGICAL ISSUES
The real controversies in using large data sets are methodological. They
focus on whether large data sets can be used to assess the effectiveness of a
procedure or a provider or the relative effectiveness of procedures or providers.
The fundamental theorem in using risk-adjusted data to examine effectiveness
is that one can infer the relative effectiveness of two treatments from
the risk-adjusted difference in outcomes. This requires not only that one
know the outcome, but also that one be able to adjust for the risk. Our
limited capability for risk adjustment, relative ignorance about how providers
select procedures, and ignorance about the interactions between providers
and procedures create very serious difficulties when we try to employ this
fundamental theorem in the real world.
RISK ADJUSTMENT
Our best risk adjustment instruments account for less than 30 percent of
variation in mortality among individuals with the same condition, which
leaves 70 percent or more to be explained by other factors. We can attribute
this 70 percent to "luck" or define it more focally as some combination of:
· things we do not measure about patients,
· things we do not measure about the care we give,
· things we do not know about the care we give, and
· the various mistakes we make in providing care that we do not quantify
very well and rarely record.
Accounting for 30 percent of the variance might be sufficient to allow us
to apply the fundamental theorem if we were confident that the remaining
sources of variation were not different among patients getting different treatments
or treated by different providers. But we often do not know how good the
adjustors are in terms of the kinds of variation we might see among the
treatment groups.
Our risk adjustments are probably not much better than clinical judgment.
Expert systems can do a bit better than experts, but not much better. We as
clinicians cannot say very accurately which patient will live or die or which
patient will be bed-ridden a year after hip surgery. Our instruments are
probably weakest for outcomes other than death.
Risk adjustment tools are extremely interesting. I spend a lot of my time
working on them, developing them, and assessing them, but I think that
they are still not fully developed medical technologies. Indeed, considering
the very limited evidence we have for risk adjustment systems as tools for
identifying ineffective procedures or ineffective providers, I doubt if the
Food and Drug Administration would let them be marketed if they were
drugs or medical devices. This analogy is appropriate because these systems
are being used in settings where they may have a major impact on the health
care system. They may be good, but we do not have sufficient evidence yet,
and we ought to be generally cautious.
TREATMENT SELECTION
To make inferences from these observational data sets, we must understand
something about how treatments are selected, particularly about the unmeasured
risk factors that physicians may consider when they select treatments. Such
factors would confound an assessment of outcomes.
PROVIDER-PROCEDURE INTERACTION
The effectiveness of a procedure is inextricably linked to the effectiveness
of the provider who performs it. Both may be influenced by the payment
system under which the procedure is performed.
Let me give an example of how that interaction might be important.
Suppose we had done the recent trial of antiarrhythmic drugs in myocardial
infarction using claims-based data; suppose those data were infinitely
supplemented so that we had perfect risk adjustment. We would, I think,
have found that most uses of these drugs occur in more sophisticated and
advanced settings, such as teaching hospitals. If those sophisticated and
advanced settings generally have better outcomes for their patients, yet
patients on antiarrhythmic drugs experienced worse results in those settings,
that worse outcome would have been confounded by the general pattern of
better results in those settings. Although multivariate techniques may control
for this effect, the problem requires further study.
This is not the kind of selection phenomenon that results from unmeasured
variables used by the physician in choosing a treatment for a patient. It is
a selection phenomenon that involves interaction between the competence of
the people who perform a procedure and the effectiveness of the procedure.
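A small numerical sketch may make the antiarrhythmic example concrete. All counts below are invented for illustration and are not taken from any trial or claims file. They assume that teaching hospitals have lower baseline mortality, that most drug use occurs there, and that the drug adds two percentage points of mortality in either setting.

```python
# Hypothetical counts, invented for illustration: (patients, deaths)
# by hospital setting and by whether the drug was used.
counts = {
    ("teaching", "drug"): (900, 90),      # 10% mortality
    ("teaching", "no drug"): (300, 24),   # 8% mortality
    ("other", "drug"): (100, 14),         # 14% mortality
    ("other", "no drug"): (700, 84),      # 12% mortality
}


def rate(cells):
    """Pooled mortality rate over a list of (patients, deaths) cells."""
    patients = sum(n for n, _ in cells)
    deaths = sum(d for _, d in cells)
    return deaths / patients


def crude_rate(arm):
    """Mortality for one arm, ignoring where patients were treated."""
    return rate([v for (_, a), v in counts.items() if a == arm])


def setting_rate(setting, arm):
    """Mortality for one arm within a single hospital setting."""
    return rate([counts[(setting, arm)]])


crude_difference = crude_rate("drug") - crude_rate("no drug")
within_setting = {
    s: setting_rate(s, "drug") - setting_rate(s, "no drug")
    for s in ("teaching", "other")
}

print(f"crude difference: {crude_difference:+.3f}")
for s, d in within_setting.items():
    print(f"difference within {s} hospitals: {d:+.3f}")
```

Under these assumed numbers the crude comparison makes the drug look slightly protective, while the comparison within each setting shows higher mortality with the drug. Stratifying by setting recovers the harm here only because the setting is recorded; the sketch says nothing about cases where the relevant provider characteristic is unmeasured.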
STATISTICAL ISSUES
Without becoming highly technical, I wish to note two statistical issues
that are important in using large data sets. One relates to evaluating providers,
the other to evaluating procedures.
Multiple Hypotheses
If one uses large data sets to examine outcomes for individual providers,
the sheer number of providers and tests can create problems of interpretation.
The Medicare Hospital Mortality Information release, for example, examines
about 20 categories in more than 5,000 hospitals, a total of about 100,000
outcomes. Although HCFA's Health Standards and Quality Bureau has
taken a number of steps to deal with evaluating so many results, such as
publishing three years' data and using sophisticated statistical techniques,
the best way to take advantage of these data remains unclear.
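The scale of the multiple-hypotheses problem can be shown with simple arithmetic. The sketch below uses the round figures from the text (20 categories, 5,000 hospitals); the 0.05 significance level and the Bonferroni correction are assumptions introduced here for illustration and are not described in the text as HCFA's method.

```python
# Round figures from the text; alpha is an assumed conventional level.
categories = 20
hospitals = 5000
alpha = 0.05

# Each hospital-by-category cell is one comparison against expectation.
comparisons = categories * hospitals

# If no hospital truly differed from expectation, this many cells would
# still be flagged on average at the chosen significance level.
expected_false_flags = comparisons * alpha

# One simple (and conservative) correction: divide alpha by the number
# of comparisons so the chance of any false flag stays near alpha.
bonferroni_threshold = alpha / comparisons

print(f"comparisons: {comparisons}")
print(f"expected false flags at alpha={alpha}: {expected_false_flags:.0f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold:.1e}")
```

A per-test threshold this strict would leave little power to detect real differences in small hospitals, which is one reason the best use of such data remains an open question.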
Processes in Control
If one wants to assess a procedure, especially if one wants to compare
two procedures, it is necessary to have a process that is in statistical control.
This means that the variation in outcomes is highly predictable and is distributed
in a statistically predictable fashion. Available evidence suggests that, in
routine practice, procedures are not in such control and that, for example,
outcomes are different for different providers.
RANDOMIZED CONTROLLED TRIALS VERSUS
ANALYSIS OF LARGE DATA SETS
Few people think that randomized controlled trials (RCTs) alone or analyses
of large data sets alone are sufficient to meet research needs.
What we really need to know is how large data sets can complement
RCTs. It is this complementary role, in which large data sets are used to
extend and replace RCTs, that we must pursue by analyzing large data sets
and comparing the results to those from RCTs. Inferring the relative effectiveness
of procedures from large data sets alone is risky at our present level of
understanding.
It is important to realize that, although the results of many studies with
large data sets will not be definitive, the data that clinicians are working
with at the moment are not definitive either. Analysis of large data sets can
add probabilistic data to RCTs, thus bringing clinicians closer, in a Bayesian
mode, to smart clinical choices. From that point of view, what can be done
with large data sets is exceedingly important.
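One way to picture the Bayesian mode mentioned above is a beta-binomial update, sketched below. Everything in it is an assumption made for illustration: the trial and data-set counts are invented, the uniform starting prior is a convenience, and the weight applied to the observational data is a hypothetical device for expressing reduced confidence in nonrandomized evidence, not a method proposed in the text.

```python
def beta_update(prior_a, prior_b, events, non_events, weight=1.0):
    """Update a Beta(prior_a, prior_b) belief about an event rate.

    `weight` below 1.0 discounts the new data (a hypothetical way to
    express less trust in nonrandomized evidence).
    """
    return prior_a + weight * events, prior_b + weight * non_events


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Start from a uniform prior and add a hypothetical small trial:
# 12 complications among 200 patients.
a, b = beta_update(1.0, 1.0, events=12, non_events=188)
rct_mean = beta_mean(a, b)

# Add a hypothetical large data set (150 complications among 2,000
# patients), counted at one quarter of its nominal weight.
a2, b2 = beta_update(a, b, events=150, non_events=1850, weight=0.25)
combined_mean = beta_mean(a2, b2)

print(f"estimate from trial alone: {rct_mean:.3f}")
print(f"estimate after adding discounted large-data evidence: {combined_mean:.3f}")
```

The combined estimate falls between the trial estimate and the rate seen in the large data set, and how far it moves depends entirely on the assumed weight. Choosing that weight is exactly the kind of judgment that comparisons between large data sets and RCTs would need to inform.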
Data almost never speak with great clarity. There is almost always a
substantial confidence interval around the results, a lack of certainty as to
whether the investigators really did the study exactly correctly. There are
conflicting data from other studies. There is a constant problem in evaluating
immediate clinical evidence, whether the decision rules to be applied to
that evidence come from RCTs or large data sets. Large data bases introduce
more problems in evaluating data, but these problems arise in evaluating
data that would not otherwise be available to clinicians at all.
PROSPECTS
What can we clearly use large data sets for now, and how might we be
expand those uses in the future?
First, these sets are clearly very good as sampling frames.
Second, if one either supplements the large data sets or uses large data
sets to supplement other data sets, one can obtain very powerful information
about risk, physiology, and disease process.
Third, one can learn from relatively crude outcome rates. I think that
John Wennberg, Elliot Fisher, and Noralou and Leslie Roos have really
done a signal service here. The outcomes we are going to be publishing for
a variety of surgical procedures will follow the direction they have set.
Their argument is that the rates of not-so-good events or bad outcomes are
important because those rates are much higher than the literature suggests
and much higher than physicians and patients believe. They argue that
better understanding of the real risks of procedures would lead to more
conservative and better practice, and their argument seems very reasonable
to me. Therefore, these large data sets have immediate practical importance.
Fourth, large data sets are useful for looking at certain adverse events
that we cannot study in other ways. Consider Wayne Ray's study, which
showed an association of hip fracture with the use of various psychotropic
drugs in the elderly. Given the strong suspicion that a lot of that use is
inappropriate, we could not ethically mount an RCT to examine this relationship.
So, we have to look to large data sets for such evidence. There are many
other kinds of data about practices and procedures that can only be obtained
from large data sets.
AN AGENDA
Let me try briefly to set out an agenda.
First, we need to validate the input, that is, the diagnostic and other data
that are in these large data sets. We have some information about validity,
but it is lying around in funny places, and we need to bring it together.
Investigators need access to the results of administrative examinations of
the diagnoses recorded in the Medicare data, but further validation is also
needed: we need to know how well procedures and complications are recorded.
Although we may reasonably infer that a patient admitted with a hip infection
after a total hip replacement has developed that infection as a result of
surgery, it is much more speculative to link other subsequent events or
nonevents to procedures.
Second, we need to increase access to large data sets. This includes
creating data centers, changing rules in some cases, and making the data
sets easier to understand and use.
Third, we need to do a lot of linking of various kinds of data sets. The
HCFA Office of Research has been experimenting with SEER, has worked
with the Social Security Administration, and has done a bit with mortality
registries. We have to think more carefully about this. If there is some
linkage that would really improve health research and that linkage requires
a change in the law, let us find some way of preserving the confidentiality
of the data and go to Congress to ask for a change in the law.
Fourth is the creation of new public data sets. I am very ambivalent
about proposing this because I fear that unmeetable promises are being
made to promote some state data bases. Nevertheless, I think the data sets
that are being created in Pennsylvania, Colorado, and Iowa are extremely
interesting sources of information. Investigators should be thinking now
and talking to state people now about how to use them. I will give you an
example of the importance of this thinking and talking. Pennsylvania is
collecting the entire MedisGroups data set of more than 200 items, but
current plans are, I understand, to provide access only to a summary score.
Earlier communication might have made it easier to change the situation so
researchers would have access to that entire data set. The Uniform Clinical
Data Set is an even more interesting and flexible source of data.
Fifth is the creation of public reference data sets that have been carefully
validated. If researchers are going to try to determine the functional status
of people after surgical procedures, there is a lot to be said, for example, for
selecting certain centers, whether randomly assigned or recruited, from which
these data will be collected and in which special efforts will be made to
guarantee data quality.
Sixth, we need to learn much more about risk adjustment, and I do not
mean just better instruments. For example, there is some evidence that one
can do fairly accurate risk adjustment for routine elective surgery from
diagnosis and previous treatment data. We need to know how true that is
and when it is sufficient. We also need to know when the variations across
providers will be adequately measured by the risk adjustment tools we have
and when there are major variations that those tools cannot get to.
Seventh, we need to look at when risk adjustment can help us to identify
the relative effectiveness of providers or procedures. Two examples follow.
· HCFA is presently designing a study to determine when risk-adjusted
mortality can be used to screen for cases where peer review will find problems
with care. There are many other areas in which the validity of large
data bases must be determined before research on them can be validated.
Specifically, we need to know when risk adjustment can replace randomization
in evaluating either a procedure or the relative effectiveness of two procedures.
· A possible validation study would be to expand a clinical trial by
asking the physician to record, before opening the assignment envelope, the
treatment he or she would have selected had there not been a randomization
process. With such a design one can ask, "How well would risk adjustment
have been able to correct for the selection bias that physician judgment
would have introduced in a retrospective, risk-adjusted study?" There are
probably a lot of other useful approaches, and the Institute of Medicine
might make helpful suggestions in this area. What is really needed is
empirical evidence, not people saying, "This isn't randomized, therefore it
isn't truth," and not people saying, "We have controlled for the relative
risk, so it is true."
Finally, we need to develop some consensus on how large data sets can
be used. This problem extends beyond how studies using these data should
be carried out and exactly what should be done in the studies. For example,
we need some consensus on how the HCFA mortality data can be used.
Major health organizations are beginning to work toward such a consensus,
and developing that consensus may be an important step in working toward
consensus on how to use data from other large data bases.
Cutting across all these issues is the challenge of doing as much as we
can without promising more than we can deliver. I hope the issues discussed
in this chapter will move us forward in the narrow but important path
between the risks of promising too much and attempting too little.
REFERENCES
1. Krakauer, H. The Uniform Clinical Data Set. Pp. 120-133 in Effectiveness and
Outcomes in Health Care. Heithoff, K.A. and Lohr, K.N., eds. Washington, D.C.:
National Academy Press, 1990.
2. Fisher, E.S. and Wennberg, J.E. Administrative Data in Effectiveness Studies:
The Prostatectomy Assessment. Pp. 80-93 in Effectiveness and Outcomes in Health
Care. Heithoff, K.A. and Lohr, K.N., eds. Washington, D.C.: National Academy
Press, 1990.
3. McNeil, B.J. Claims Data and Effectiveness: Acute Myocardial Infarction
and Other Examples. Pp. 65-70 in Effectiveness and Outcomes in Health Care. Heithoff,
K.A. and Lohr, K.N., eds. Washington, D.C.: National Academy Press, 1990.
Collection of Primary Data:
Introduction
Harold C. Sox, Session Moderator
In the context of the Effectiveness Initiative, primary data are those obtained
from sources other than administrative claims data sets. Thus, the
purpose of primary data collection is to supplement the information that is
obtained from the administrative data sets. There are several reasons, which
were discussed earlier in this volume, why claims-based data are often not
adequate for medical research.
· To attribute an improved outcome to an intervention, patients who had
the intervention should have been identical prior to the intervention to patients
who did not have the intervention. There are multivariate statistical methods
for adjusting for baseline differences between the intervention group and
those who did not have the intervention. Administrative data sets typically
do not have sufficiently detailed clinical information for this purpose. This
information can sometimes be obtained by reviewing the patients' hospital
records.
· Studying an intervention in a subset of patients may reveal effects that
are not observed in the entire population. To create useful subsets of patients,
one must have clinical information that is often not available in administrative
data sets.
· The range of outcomes that can be measured with administrative data
sets is limited. Administrative data sets have information about whether a
patient is alive or dead, as well as whether the patient was rehospitalized or
required an intervention. Information about disease status, functional status,
or the patient's preferences must usually be obtained by other means, such
as reviewing the patient's hospital record or interviewing the patient.
John E. Ware is a senior scientist at the Institute for Improvement of
Health and Medical Care at the New England Medical Center. His chapter
focuses on gathering data directly from patients and emphasizes practical
issues in primary data collection, as well as issues of precision, reliability,
and validity.
Henry Krakauer is Director of the Office of Program Assessment and
Information in the Health Standards and Quality Bureau (HSQB) of the
Health Care Financing Administration (HCFA). In 1987, HSQB began a
complex project to develop a data set for use by Medicare Peer Review
Organizations (PROs) and the wider research community. The data set was
intended to contain far more detailed clinical data than were available heretofore
in the HCFA data files. Dr. Krakauer discusses the part of this project
known as the Uniform Clinical Data Set.