| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
PART IV
Methodological Issues and
Work in Progress
Use of Large Data Bases:
Introduction
Emmett B. Keeler, Session Moderator
Although it is not entirely clear what is meant by large data bases, we know that
to administer its programs, the Health Care Financing Administration (HCFA)
collects enormous amounts of data that contain information on the location and use
of medical services, both inpatient and outpatient, and information on everyone
covered by Medicare and Medicaid, including mortality. To keep the
costs of administration down, HCFA does not collect all the clinical detail that
researchers might want. However, the data are fairly universal in scope, and there
are lots of possibilities for using them as a resource: linking them to outside data,
putting together different HCFA files (such as hospital records with outpatient
records), and so forth. Used creatively, they are an invaluable resource for any-
body interested in studying what is actually occurring in the United States.
Janet B. Mitchell is president of the Center for Health Economics Re-
search in Needham, Massachusetts. She and her institute are both well
known for their studies of payment mechanisms and their effects on physi-
cians. Dr. Mitchell gives a general methodological overview of the things
that can be done with administrative data sets.
Elliot S. Fisher is a physician at Dartmouth Medical School and was
involved in the large data set analysis of the Wennberg study, which is the
prototype for effectiveness research. (John Wennberg is director of the
Center for Evaluative Clinical Sciences.) Dr. Fisher and Dr. Wennberg
highlight the problems and achievements of the original study and describe
the use of administrative data in the ongoing assessment of treatments for
benign prostatic hyperplasia.
Stephen F. Jencks is a physician and chief scientist at the Office of
Research in HCFA. Dr. Jencks has extensive experience in sponsoring,
critiquing, and performing a number of studies looking at postadmission
mortality. He discusses the uses and limitations of claims data for out-
comes research.
The Role of Large Data Bases in
Effectiveness Research
Janet B. Mitchell
The first question in any consideration of the use of large data bases in
effectiveness research is: what is a "large data base"? Usually, it refers to
administrative records, or insurance claims data, regarding patients receiving
various treatments. The nice thing about using claims for research purposes
is that someone else actually collects the data, namely, providers filling out
the claims forms. By the time the researcher receives the claims, the data
are already computerized in a consistent format.
SIZE OF LARGE DATA BASES
One of the major difficulties in working with these data bases is that they
are indeed large; enormous or gargantuan might be more appropriate descriptors!
It is not uncommon to work with millions of claims on hundreds of reels of
tape. I am sure many of you have conducted clinical research involving
hundreds of patients, and you may be wondering why I or anyone else
would want to get involved with millions of records in the first place. The
reason, of course, is that these records do not represent individual patients,
but rather pieces of information describing the medical services received by
each patient. These pieces of information need to be put together in order
to obtain a picture of an episode of care. During a single inpatient episode,
for example, a patient might incur anywhere from a dozen to a hundred
bills. For longer periods of care, the number of records would be consider-
ably larger, especially for sicker patients.
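The assembly step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (beneficiary identifier, service date, service description) and all sample values are hypothetical, not the layout of any actual claims file.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim records: (beneficiary_id, service_date, service).
claims = [
    ("A001", date(1988, 3, 2), "hospital stay"),
    ("A001", date(1988, 3, 3), "chest X-ray"),
    ("B002", date(1988, 5, 9), "office visit"),
    ("A001", date(1988, 3, 2), "surgeon fee"),
]

def group_by_patient(records):
    """Collect every claim under its patient and sort each list by date."""
    by_patient = defaultdict(list)
    for beneficiary_id, service_date, service in records:
        by_patient[beneficiary_id].append((service_date, service))
    for services in by_patient.values():
        services.sort()  # chronological order within each patient
    return dict(by_patient)

episodes = group_by_patient(claims)
```

Each patient's sorted list is the raw material for an episode of care; deciding where one episode ends and the next begins is a separate analytic choice that this sketch does not make.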
Why so many claims? In Medicare, for instance, inpatient hospital and
skilled nursing facility stays are billed using a single claim, but physician
and other Part B services are billed individually. Thus, there will be a
claim for every discrete service: for every surgical procedure, for every
visit, for every X-ray, for every laboratory test. The detailed nature of
these claims data bases is one of their greatest strengths; the creative researcher
can use them in an almost infinite variety of ways.
USES OF LARGE DATA BASES
Probably the most common use of claims data for effectiveness research
is to follow patients with a specific diagnosis or patients receiving a specific
therapy. Diagnoses are available on institutional claims; procedures are
documented on all physician bills. For example: What happens to patients
receiving percutaneous transluminal angioplasty? What services do those
patients receive afterwards and in what kinds of settings? Some services
will suggest that complications have arisen, say, if the procedure is followed
closely by repeat angioplasty or bypass surgery.
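One way to flag such cases might look like the following sketch. The 30-day window is an assumed reading of "followed closely," and the procedure labels are hypothetical stand-ins for the procedure codes a real claims file would carry.

```python
from datetime import date

# Hypothetical procedure labels; real files would carry procedure codes.
INDEX = "angioplasty"
FOLLOW_UPS = {"angioplasty", "bypass surgery"}
WINDOW_DAYS = 30  # assumed meaning of "followed closely"

def possible_complication(patient_claims):
    """Return True if a follow-up procedure occurs 1 to WINDOW_DAYS days
    after the patient's first angioplasty (same-day claims are ignored)."""
    ordered = sorted(patient_claims)
    index_dates = [d for d, proc in ordered if proc == INDEX]
    if not index_dates:
        return False
    first = index_dates[0]
    return any(
        proc in FOLLOW_UPS and 0 < (d - first).days <= WINDOW_DAYS
        for d, proc in ordered
    )

# Hypothetical patients: lists of (service_date, procedure).
patient_a = [(date(1988, 1, 10), "angioplasty"),
             (date(1988, 1, 25), "bypass surgery")]
patient_b = [(date(1988, 1, 10), "angioplasty"),
             (date(1988, 6, 1), "office visit")]
```

A flag of this kind only suggests a complication; it cannot confirm one without clinical detail.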
Outcomes, such as readmission and mortality rates, can also be studied.
Besides supporting studies of individual patients or episodes of care, claims
data can also be used to evaluate effectiveness at the level of individual
providers, such as hospitals. Thus they provide an opportunity to examine
questions such as whether mortality rates for a given procedure depend in
part on a hospital's surgical volume.
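A provider-level tabulation of this kind could be sketched as follows. The hospital identifiers and the 30-day mortality flag are hypothetical, and the sketch reports crude rates only; any real comparison would also need adjustment for patient mix.

```python
from collections import defaultdict

# Hypothetical records: (hospital_id, died_within_30_days), one per procedure.
cases = [
    ("H1", False), ("H1", False), ("H1", True), ("H1", False),
    ("H2", True), ("H2", False),
]

def volume_and_mortality(records):
    """Return {hospital_id: (volume, crude_mortality_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # [cases, deaths]
    for hospital_id, died in records:
        counts[hospital_id][0] += 1
        counts[hospital_id][1] += int(died)
    return {h: (n, deaths / n) for h, (n, deaths) in counts.items()}

summary = volume_and_mortality(cases)
```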
MEDICARE DATA BASES
Medicare claims files are particularly valuable, for several reasons. First,
every beneficiary has a unique identification number based on his or her
Social Security number. Because this number is attached to every Part A
and Part B claim, it is easy to construct episodes of care for individual
patients. Sometimes, however, these numbers are slightly different on the
Part A and the Part B claims. Fortunately, there are fairly straightforward
algorithms that can be used to equate them.
Second, the Health Care Financing Administration (HCFA) maintains
claims data on samples of patients for research purposes. Patients are
selected for these samples on the basis of their identification numbers and
remain in the data base until they die. This enables researchers to follow the same
patients over a period of years. In addition, HCFA maintains eligibility
files that include information on dates of death. Because of the need to
prevent Social Security checks from being mailed to deceased beneficiaries,
these deaths are verified and the dates are believed to be reasonably valid.
Historically, researchers have primarily used Part A hospital records to
study effectiveness issues. Only relatively recently have they discovered
the value of Part B claims, either in their own right or as supplements to
Part A data. One major limitation of hospital claims for effectiveness research
is the absence of detailed information on what was actually done to the
EFFECTIVENESS AND OUTCOMES IN HEALTH CARE
patient in the hospital. Part A claims do include information on surgical
procedures, but this information is generally limited to procedures that affect
assignment to diagnosis-related groups (DRGs); thus, many diagnostic sur-
geries are missing. The only data available on ancillary diagnostic tests,
furthermore, are simply charges per revenue center, that is, charges for
radiology with no indication of how many X-rays were performed or which
ones. There is also no information on physician visits and consultations.
Except for some services performed by residents, however, every physi-
cian service will show up as a Part B bill. These bills provide the researcher
with an in-depth look at the mix of services provided during the hospital
stay. Because each physician bill includes the date of service, we can also
look at the timing of various tests. This can be useful in trying to infer the
clinical decision-making process that took place during the hospitalization.
The Part B detail can also be used to define the universe of patients
receiving a specific therapy of interest. Not all patients undergoing coronary
bypass surgery will be identified through DRGs 106 and 107, for example;
a surprising number will show up in other DRGs, such as those involving
valve replacements. This is important, as geographic variation has been
found in the frequency with which bypass operations are combined with
other open-heart surgery. Thus, how a study sample is selected could have
profound effects on the research findings.
Anesthesiologists and assistant surgeons frequently report a different procedure
than that billed by the surgeon. Usually, they are reporting an operation in
the same general anatomic area, but not always. My rule has always been
to assume that the primary surgeon is right and use what this surgeon reports
to define the sample.
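The rule is simple enough to state in code. In this sketch the "role" field is an assumption made for illustration; how a given file identifies the primary surgeon will vary.

```python
# Hypothetical Part B line items for one operation: (role, procedure).
bills = [
    ("anesthesiologist", "valve replacement"),
    ("primary surgeon", "coronary bypass"),
    ("assistant surgeon", "valve replacement"),
]

def defining_procedure(line_items):
    """Return the procedure reported by the primary surgeon, or None
    if no primary surgeon bill is present."""
    for role, procedure in line_items:
        if role == "primary surgeon":
            return procedure
    return None

sample_procedure = defining_procedure(bills)
```

Returning None rather than falling back to another physician's report keeps ambiguous cases visible for manual review.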
Using claims data to examine outcomes associated with ambulatory epi-
sodes of care is more problematic because of the absence of diagnostic
information on the Part B claims. Thus it is not possible to determine the
reason for a given office visit or to trace referral patterns accurately. Beginning
this year, however, physicians are being required to assign diagnoses a code
number from the International Classification of Diseases (ICD-9-CM) and
to include those numbers on their claims, so it is possible that such analyses
will be feasible in the future.
It is possible to identify specific illnesses indirectly, using the procedure
codes on the Part B claims. Services provided under Medicare Part B are
billed using the Current Procedural Terminology (CPT-4) or, in the case
of nonphysician services, a system developed by HCFA known as HCPCS
(HCFA Common Procedure Coding System). There are over 10,000 codes
available for billing purposes. This wealth of codes is the despair of many
policymakers, who feel it helps fuel the inflation in physician spending.
However, it is a boon to researchers.
Unlike the ICD-9-CM procedure codes, which are often vague concerning
the precise nature of the surgical procedure or diagnostic test, CPT-4 records
that information in excruciating detail. We can tell, for example, not just
that a patient received a total hip replacement, but whether it was an origi-
nal replacement, whether it was a conversion of previous hip surgery to a
total hip replacement, or whether it was a revision of an earlier replace-
ment. In the latter instance, we also know whether the revision involved the
acetabular part of the hip, the femoral component, or both. Some examples
of identifying outpatient treatments through the procedure codes would include
hemodialysis for end-stage renal disease patients and chemotherapy for can-
cer patients.
A particular interest of many researchers is how the utilization of services
varies around the country. Unfortunately, only the institutional claims include
information on exactly where the service was provided. The only geographic
identifiers on Part B claims are the carrier (which generally corresponds to
a state) and the reasonable charge locality. The reasonable charge locality
is a fairly arbitrary geographic entity used by the carriers to determine
allowed charges. It provides a finer breakdown than the state, but it is still
fairly crude. In fact, for 16 states, only a single statewide locality is used.
The Part B claims also lack any information on where the patient lives.
This means that population-based measures of utilization and outcomes can
be easily created only for hospital services. The researcher who wants to
study the utilization of ambulatory services must obtain information on the
patient's residence from HCFA's eligibility files and merge it.
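The merge itself is a lookup on the beneficiary identifier, as in this minimal sketch. The identifiers, county names, and record layouts are all hypothetical.

```python
from collections import Counter

# Hypothetical eligibility file: beneficiary_id -> county of residence.
eligibility = {"A001": "County X", "B002": "County Y"}

# Hypothetical Part B claims: (beneficiary_id, service).
part_b = [("A001", "office visit"), ("A001", "lab test"),
          ("B002", "office visit"), ("C003", "office visit")]

def services_by_residence(claims, residence):
    """Count services by the patient's area of residence; claims with no
    matching eligibility record are tallied under None."""
    return Counter(residence.get(bene_id) for bene_id, _ in claims)

counts = services_by_residence(part_b, eligibility)
```

Tallying unmatched claims under None, rather than dropping them, shows how much utilization would be lost to a failed merge.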
Let me mention here an additional consideration when analyzing Part B
claims data. Although Medicare is a national program, each carrier has
considerable flexibility in how it actually processes and pays claims. These
idiosyncrasies can lead the unwary researcher astray.
Permanent pacemaker insertion is a good example of the potential prob-
lems that can be encountered. A number of physicians use the team approach
to pacemaker insertion; a surgeon makes the pocket to hold the device and a
cardiologist inserts the electrodes. Carriers have attempted to recognize the
team approach and reimburse it in a number of different ways. In some
states, each physician submits a bill for pacemaker insertion without any
indication that another physician was involved. The carrier knows which
physicians practice in this way and pays each physician less than if he or
she had performed the procedure independently. The researcher cannot tell
this from the claims data, however, and it will appear as if twice the number
of pacemakers were inserted in that area.
One carrier has dealt with the team approach by having one physician
bill for the insertion, while the other physician bills for pacemaker repair. If
a researcher did not know this ahead of time, it would appear that there
were a lot of pacemaker failures in that particular state.
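For the first of these carrier practices, a researcher who knows about it might guard against double counting along the following lines. The sketch assumes, hypothetically, that two insertion bills for the same patient on the same day describe one team procedure.

```python
from datetime import date

# Hypothetical insertion claims: (beneficiary_id, service_date, physician).
insertions = [
    ("A001", date(1988, 4, 1), "surgeon S"),
    ("A001", date(1988, 4, 1), "cardiologist C"),
    ("B002", date(1988, 4, 7), "surgeon T"),
]

def count_insertions(claims):
    """Count one insertion per patient per day, however many physicians billed."""
    return len({(bene_id, day) for bene_id, day, _ in claims})

raw_count = len(insertions)
adjusted_count = count_insertions(insertions)
```

No comparable mechanical fix exists for the carrier that records the second physician under pacemaker repair; that case can be handled only by knowing the carrier's billing rules in advance.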
So far, I have been talking about Part B physician and Part A hospital
claims, but Medicare claims are also available for other types of services,
such as skilled nursing facility and home health care. These claims can be
particularly valuable for examining rehabilitative treatment; one example
might be to look at the care received following hip fracture.
MEDICAID DATA BASES
To date, most research has focused on Medicare patients, for two reasons.
The Medicare program is consuming an increasingly large share of the
federal budget, and the claims data have been readily available (more or
less) from HCFA. Because of problems in data acquisition, the services
received by Medicaid patients have historically received less attention. HCFA
is working on some new data bases that will eventually provide Medicaid
claims in a consistent format for all states. I believe data from about a half-
dozen states are available at the present time.
There are several advantages in using Medicaid claims to analyze effec-
tiveness, either in conjunction with or in place of Medicare claims. For
one, the Medicaid-eligible population encompasses a much wider age range,
thus permitting study of pregnancy and pediatric illnesses. In addition,
there are other important conditions whose incidence is simply not sufficient
to study in the Medicare population. Substance abuse is one example;
another is AIDS. Although the permanently and totally disabled are also
eligible for Medicare coverage, most AIDS patients simply do not survive
long enough to qualify for benefits. A large number do become eligible for
Medicaid, often early in the disease process, and Medicaid claims can be
used to help track the effectiveness of various treatment regimens.
Another advantage of Medicaid claims is that the Medicaid program
covers a wider range of benefits than does Medicare, especially in the areas
of long-term care and prescription drugs. A major disadvantage of Medicare
claims has been that, although the program serves the elderly, it covers only
a small part of long-term care: only 150 days of nursing home care per
year, and that care must be in a skilled nursing facility. This means that
studies of patients with chronic conditions requiring ongoing custodial care
(for example, Alzheimer's disease, stroke, or spinal cord injury) will be
able to paint only a partial picture of health care use. Because state Medicaid
programs do cover these services, however, Medicaid claims can be used to
fill some important gaps.
Similarly, because Medicaid pays for most prescription drugs, these claims
can be used to evaluate alternative treatments or to identify a sample of
patients undergoing a given treatment regimen: for example, all AIDS patients
receiving AZT. Data on prescription drugs can be used in many ways. An
obvious one is to compare the effectiveness of drug therapy to surgical
intervention. Another is to look at adverse or unintended consequences of
specific medications. One researcher, for example, examined the incidence
of hip fracture in patients receiving psychotropic drugs.
One of the main disadvantages of Medicaid claims is that Medicaid recipients
are not representative of the population at large. This is in contrast to
Medicare recipients: a sample of Medicare patients with myocardial infarction
is virtually synonymous with a sample of elderly persons with myocardial
infarction. Another disadvantage is that, unlike Medicare beneficiaries,
Medicaid patients are not always continuously eligible for care. This is
particularly true of recipients of Aid to Families with Dependent Children,
who may be eligible for only some months in a year.
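A study that requires complete claims histories might therefore restrict its sample to continuously eligible recipients, as in this sketch. The month-level eligibility layout and the recipient identifiers are hypothetical.

```python
# Hypothetical eligibility data: recipient_id -> set of eligible months (1-12).
eligible_months = {
    "R1": set(range(1, 13)),
    "R2": {1, 2, 3, 7, 8},
}

def continuously_eligible(months_by_recipient, first=1, last=12):
    """Return recipients eligible in every month from first through last."""
    required = set(range(first, last + 1))
    return sorted(r for r, months in months_by_recipient.items()
                  if required <= months)

full_year = continuously_eligible(eligible_months)
first_quarter = continuously_eligible(eligible_months, 1, 3)
```

Such a restriction has its own cost: recipients who move in and out of eligibility may differ systematically from those who do not.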
The Medicare Catastrophic Coverage Act passed by Congress last year
would have given Medicare many of Medicaid's data advantages, and thus
research advantages, by expanding coverage. Both the skilled nursing facility
benefit and the home health care benefit were extended, for example, providing
more data on these components of postacute care. Screening mammography
was a brand-new benefit. Most important, the legislation expanded Medicare
coverage to outpatient prescription drugs. Repeal of the Act in late 1989
deprived researchers of the opportunity to broaden the questions that could
be addressed using Medicare claims data and thus expand effectiveness and
outcomes research.