Measuring Prices and Quantities of Medical Care: Improving Medical Care Price Indexes
A national medical care account requires measures of the quantity of medical care services, as does a national health account. Medical care services are the outputs of the medical care sector, and thus also of the medical care account, and they are one of the inputs into the production of health.
In this report, we have taken the position that the units of medical care production are defined by treatments of disease, or treated episodes of illness, which is also our measure of medical services. Some medical services are not associated directly with the treatment of disease; nevertheless, the disease-based metric provides the framework for measuring most medical services. The many difficulties in estimating expenditures by disease have been discussed in Chapter 3. Here we discuss estimating the quantity of services produced by the medical care sector, as well as the associated measure of medical care inflation.
When estimating the output of medical care, the fundamental problem is measuring the price and the quantity of treatments for a disease—the treatment of heart attacks, for example. Once that is accomplished across the range of diseases, it will be necessary to form aggregate measures of medical services and of medical care inflation. The exposition of this chapter benefits from beginning with a discussion of the second step, forming the aggregations.
ALTERNATIVE ESTIMATIONS OF QUANTITY GROWTH IN MEDICAL CARE SERVICES: THE ROLE OF PRICE AND QUANTITY INDEXES
Two methodologies exist for aggregating quantity changes of medical treatments. The first, “deflation,” borrows the methodology conventionally used by compilers of national accounts: the change in expenditures for some category of goods and services is divided by a price index (a measure of inflation) for that category to obtain the quantity measure.
The alternative method is to aggregate the quantities directly. To do this, a quantity index of treatments is computed (e.g., for heart attacks) using the numbers and types of treatments and their costs, and taking into account differences in severities and in modes of treatment. This approach is less frequently used in the computation of national accounts. However, for countries in which the government provides medical care (so the price charged is not relevant), the direct quantity index approach is an attractive one. Even for the United States, the direct approach has advantages and should be considered, although the data needed to estimate a direct quantity index for medical care are not easy to compile, owing to fragmented systems and the difficulty of accounting for variation in severity.
Deflation is the standard national accounts approach to obtaining quantity measures. The term “deflation” describes a process in which the central step is dividing the change in expenditures between two periods by a price index. Deflation results in a measure of the growth of quantities.1 As a medical care example, expenditures on treating heart attacks could be divided by a price index for that service. The result yields a quantity measure—it is an index number—indicating the change in number of treatments for heart attacks. The index also shows the rate of growth in medical services for this medical ailment. Generally accepted procedures for deflation are presented in the system of national accounts (Fisher, 1993, Chapter 16).
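The United States and Canada use the Fisher index number system for estimating gross domestic product (GDP). Note, first, that the expenditure on any item equals its price times its quantity, so the expenditure for any group of items equals ΣPQ. Deflation under the Fisher index system takes the following form: change in expenditures / Fisher price index = Fisher quantity index. Algebraically, for periods 0 and 1, the equation is as follows:

$$\frac{\left[\dfrac{\sum P_1 Q_1}{\sum P_0 Q_0}\right]}{\left\{\left[\dfrac{\sum P_1 Q_0}{\sum P_0 Q_0}\right]\left[\dfrac{\sum P_1 Q_1}{\sum P_0 Q_1}\right]\right\}^{1/2}} = \left\{\left[\dfrac{\sum P_0 Q_1}{\sum P_0 Q_0}\right]\left[\dfrac{\sum P_1 Q_1}{\sum P_1 Q_0}\right]\right\}^{1/2} \tag{4.1}$$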
The Fisher index number is the geometric mean of Paasche and Laspeyres index numbers. Thus, in equation 4.1, the first term in square brackets, [ΣP1Q0/ΣP0Q0], is a Laspeyres price index, and the second a Paasche price index. The Paasche and Laspeyres indexes in square brackets are multiplied together and then the square root taken to get the corresponding Fisher indexes, which are represented by the terms within the curly brackets. Thus, the denominator term on the left-hand side of equation 4.1 is equal to the Fisher price index, which equals the square root of the Laspeyres price index times the Paasche price index.
The Fisher price index in equation 4.1 functions as the “deflating price index” or the “deflator.” The Fisher quantity index (the right-hand side of equation 4.1) is a parallel construction.
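The deflation identity in equation 4.1 can be checked numerically. The following is a minimal sketch, not BEA's production method: the two treatments, their prices, and their quantities are hypothetical, and the function names are introduced here for illustration only.

```python
from math import sqrt

def value(prices, quantities):
    """Sum of price times quantity over all items (the ΣPQ of the text)."""
    return sum(p * q for p, q in zip(prices, quantities))

def fisher_indexes(p0, q0, p1, q1):
    """Return (Fisher price index, Fisher quantity index) for periods 0 and 1."""
    laspeyres_price = value(p1, q0) / value(p0, q0)
    paasche_price = value(p1, q1) / value(p0, q1)
    laspeyres_quantity = value(p0, q1) / value(p0, q0)
    paasche_quantity = value(p1, q1) / value(p1, q0)
    # Each Fisher index is the geometric mean of its Laspeyres and Paasche forms.
    return (sqrt(laspeyres_price * paasche_price),
            sqrt(laspeyres_quantity * paasche_quantity))

# Hypothetical prices and quantities for two treatments (illustrative only).
p0, q0 = [10000.0, 4000.0], [100.0, 300.0]
p1, q1 = [11000.0, 4100.0], [120.0, 290.0]

fisher_price, fisher_quantity = fisher_indexes(p0, q0, p1, q1)

# Left-hand side of equation 4.1: change in expenditures / Fisher price index.
expenditure_change = value(p1, q1) / value(p0, q0)
deflated_quantity = expenditure_change / fisher_price

print(f"Fisher price index:      {fisher_price:.4f}")
print(f"Fisher quantity index:   {fisher_quantity:.4f}")
print(f"Deflated expenditures:   {deflated_quantity:.4f}")
```

Under these assumptions, the deflated expenditure change and the directly computed Fisher quantity index agree up to floating-point rounding, which is the content of equation 4.1.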
The Bureau of Economic Analysis (BEA) uses the Fisher index number system for GDP as a whole and for sectoral measures. For example, the output of North American Industry Classification System (NAICS) 62 (medical care and social services) in the BEA industry accounts is a Fisher quantity index.
Note that prices are the weights in the Paasche and Laspeyres quantity indexes and hence in the Fisher quantity index. That is, aggregation of the different treatments—for circulatory disease, digestive disease, and so forth—proceeds by valuing the treatments using their prices. Price weighting of the quantity index on the right-hand side of equation 4.1 is the result of the algebra of the deflation method.
The theoretically correct way to aggregate output is by marginal cost (MC). By the usual competitive assumption, MCs equal prices, which are the weights in the quantity index (the right-hand side of equation 4.1). Thus, deflation preserves (approximately) the theoretical aggregation condition.
The usual national accounts justification for deflation (instead of estimating the quantities directly) is the presumption that prices within a category move more nearly together than do the quantities. Thus, the prices of, for example, different kinds of meat or different grades of beef steak are assumed to move together, although the quantities may not. A small sample of prices can therefore yield an accurate price index for the category.2 A small sample of the quantities, however, is likely to be invalidated by the variance.
The assumption that prices and marginal cost measures are equal is tenuous for any industry.3 However, it is especially problematic in medical care, for which many prices have remote connections to costs. Even in the United States, therefore, the usual national accounts deflation methodology is problematic when applied to medical care. In other countries, the United Kingdom for example, government-provided medical care means the price index for medical care has no relevance. For this reason, unlike in the United States, quantities of medical care in the UK national accounts are not estimated using deflation methods.4
One might question as well whether Hicksian aggregation, the presumption that prices within a category move together, holds for medical care. That prices of alternative heart attack treatments move together over time appears unlikely. For this reason, as well, the deflation approach to estimating medical care output is problematic. Movements in quantities may be equally diverse across alternative treatments for the same disease. However, collecting large samples of quantities of medical treatments is at least as feasible as collecting large samples of their prices and possibly less expensive.
Direct Quantity Indexes
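Instead of indirectly computing the right-hand side of equation 4.1 using price indexes (justified by the assumption that P = MC), one could calculate the right-hand side of the equation directly, using quantities of treatments and cost estimates. That is, using the Fisher quantity index with marginal costs (MC) in place of prices as weights:

$$\left\{\left[\dfrac{\sum MC_0 Q_1}{\sum MC_0 Q_0}\right]\left[\dfrac{\sum MC_1 Q_1}{\sum MC_1 Q_0}\right]\right\}^{1/2}$$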
For example, an index would be computed from quantities of treatments; classified by case severity, type of treatment, and so forth; and weighted by the costs (not the prices) of the various treatments. The correct measure is marginal cost; in practice, average cost is more likely to be obtainable and may provide an adequate approximation to marginal cost.
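A direct quantity index of this kind can be sketched in a few lines. The example below is illustrative only: the treatment and severity cells, the case counts, and the average costs are hypothetical, and average cost stands in for marginal cost, as the text suggests may be necessary in practice.

```python
from math import sqrt

# Hypothetical data: (treatment type, severity) -> (number of cases, average cost per case).
period0 = {
    ("angioplasty", "high"): (40, 30000.0),
    ("angioplasty", "low"): (60, 22000.0),
    ("drug therapy", "high"): (30, 9000.0),
    ("drug therapy", "low"): (120, 5000.0),
}
period1 = {
    ("angioplasty", "high"): (55, 31000.0),
    ("angioplasty", "low"): (70, 22500.0),
    ("drug therapy", "high"): (25, 9500.0),
    ("drug therapy", "low"): (110, 5200.0),
}

def cost_weighted_fisher_quantity(base, current):
    """Fisher quantity index with average costs (not prices) as weights."""
    cells = sorted(set(base) & set(current))  # cells observed in both periods

    def total(cost_period, quantity_period):
        # Sum over cells of (unit cost from one period) x (cases from another).
        return sum(cost_period[c][1] * quantity_period[c][0] for c in cells)

    laspeyres = total(base, current) / total(base, base)      # base-period cost weights
    paasche = total(current, current) / total(current, base)  # current-period cost weights
    return sqrt(laspeyres * paasche)

print(f"Direct quantity index: {cost_weighted_fisher_quantity(period0, period1):.4f}")
```

Classifying cases by severity before weighting is what keeps a shift toward more severe (and more costly) cases from being counted as if each case were the same unit of output.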
Several European and Oceanic countries are experimenting with direct quantity indexes of medical care. Schreyer (2008) lists Australia, Finland, France, Germany, the Netherlands, New Zealand, Norway, Sweden, and the United Kingdom in this group. In some of these cases, however, the indexes are not computed on a disease basis. Yet price indexes by disease classification (needed for the deflation alternative in section 4.2.1) are relatively rare as well.
It seems advisable to pursue both strategies identified above. We encourage BEA to estimate direct quantity indexes when data can be developed to make their estimation practical. One reason for doing so is that the direct quantity index is the way other countries are likely to proceed, so carrying out similar computations for the United States is important for assessing international comparability. A second reason is that it is by no means clear how long the United States will stand apart from other countries in its method for delivery of medical care. Prices are already not all that relevant to the U.S. medical sector. Costs, however, are always relevant and medical care reform is likely to increase the need for efforts to measure them more accurately.5
Recommendation 4.1: The Bureau of Economic Analysis should experiment with estimating direct quantity indexes of medical care in addition to its usual deflated measures.
Indexes at Different Levels of Aggregation
We have alluded to the fact that indexes, whether price indexes or quantity indexes, will be computed at different levels of aggregation. An index is necessary for medical care overall, because the sector involves an aggregation over many different medical conditions. Similarly, an index number is needed to construct a measure for a broad disease category, for example, circulatory diseases. Such an intermediate-level measure would aggregate treatments for heart attacks, strokes, and so forth.
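The idea of building an index in stages can be illustrated with a small sketch. For simplicity it uses a Laspeyres price index rather than the Fisher index discussed earlier, because the Laspeyres form is exactly consistent in aggregation while the Fisher form is only approximately so. All disease groups, prices, and quantities are hypothetical.

```python
# Hypothetical data: disease -> list of (base price, base quantity, current price),
# one tuple per treatment.
circulatory = {
    "heart attack": [(20000.0, 50, 21000.0), (8000.0, 100, 8800.0)],
    "stroke": [(15000.0, 40, 15300.0), (5000.0, 80, 5500.0)],
}

def base_expenditure(items):
    """Base-period expenditure on a set of treatments."""
    return sum(p0 * q0 for p0, q0, _ in items)

def laspeyres_price(items):
    """Laspeyres price index: current prices valued at base-period quantities."""
    return sum(p1 * q0 for _, q0, p1 in items) / base_expenditure(items)

# Stage 1: one index per disease (the lowest level in this example).
disease_index = {d: laspeyres_price(items) for d, items in circulatory.items()}

# Stage 2: combine the disease indexes using base-period expenditure shares.
total_base = sum(base_expenditure(items) for items in circulatory.values())
two_stage = sum(
    base_expenditure(items) / total_base * disease_index[d]
    for d, items in circulatory.items()
)

# Single stage: one index over all treatments at once.
all_items = [item for items in circulatory.values() for item in items]
one_stage = laspeyres_price(all_items)

print(f"Two-stage index:   {two_stage:.6f}")
print(f"Single-stage index: {one_stage:.6f}")
```

The two results coincide up to rounding, which shows how an intermediate-level index for circulatory diseases can be assembled from lower-level indexes for heart attacks and strokes.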
Even at the lowest feasible level of measurement, indexes are still necessary. For example, many treatments for heart attacks exist; any aggregation of them requires constructing an index number (price or quantity) for this treatment category. An index number is a way of controlling for heterogeneity—differences in treatments, in severity, or in demographics of the patient, as well as other factors. It is hard to conceive of any medical care output that is sufficiently homogeneous that a straight count of the number of treatments, or a simple average of treatment prices (in the price measurement literature known as a unit value index), would be adequate. Either one will be contaminated in the time series by differences in the mix of treatments or severities or other factors that influence both the treatment undertaken and its cost or price (see further discussion of this point in section 4.3).
Put another way, all feasible measurements in medical care are aggregates, because they encounter heterogeneity (in some cases extreme) in the units they encompass. The heterogeneity must be allowed for and controlled in the measurement. Otherwise, a movement toward more effective, but more expensive, treatments may be mistakenly interpreted as an increase in the price of medical care, when it should instead be interpreted as an increase in the quantity of care.
It is for this reason that this chapter opens with an exposition of index number aggregation schemes. Even though in practice one begins by measuring prices and quantities of treatments for particular diseases, they will still be index numbers. These lower level measurements have similar properties to the indexes that describe GDP or any other economic aggregate.
PRICE INDEXES FOR MEDICAL CARE: THE PRODUCER PRICE INDEXES AND THEIR USE IN BEA ACCOUNTS
BEA currently uses components of the Bureau of Labor Statistics (BLS) Producer Price Index (PPI), supplemented with some other price measures, to deflate the medical care components of the National Income and Product Accounts. The PPI indexes are used to estimate output for the existing BEA industry accounts for NAICS 62 and its subsectors. As noted in the previous section, when the receipts of a NAICS industry or sector are deflated by a medical care price index, the result is a quantity index of medical services produced by the industry or sector (equation 4.1).
For the PPI, BLS tracks the changes over time in prices received by U.S. domestic producers for their goods and services. As described in Chapter 2, NAICS provides the organizational structure; in the case of medical care, separate indexes are generated for general hospitals, psychiatric and substance-abuse hospitals, other specialty hospitals, physicians’ offices (with component indexes by specialty), diagnostic imaging centers, medical labs, nursing care facilities, residential mental retardation facilities, home health care, and blood and organ banks (all of these are NAICS 3-digit subsectors or 4-digit or 5-digit industries).
For medical care PPIs, the recorded price includes reimbursements to the medical care providers from all sources, including the patient, insurance, Medicare, and Medicaid. Unlike the Consumer Price Index (CPI), government payments and payments by insurance companies are included in the PPI medical care indexes (Catron and Murphy, 1996; Murphy and Topel, 2006). The pricing unit varies by industry. Examples of pricing units are a patient’s stay in a hospital for a treatment of a specified diagnosis, from admission to discharge, or the services provided by a physician in one patient visit. For nursing homes, the unit is essentially a day of nursing care.
The PPI hospital methodology, initiated in 1992 with its then-new hospital price indexes, marked a significant advance over what had been done historically. Previously, only the CPI had a medical care component. The PPI covered mostly goods-producing industries. In the old CPI, the unit of measurement was defined by such items as the cost of a day in a hospital or of a visit to the doctor.
The PPI moved to the episodes-of-treatment concept, in which a diagnosis for an in-hospital treatment is priced out for the duration of the inpatient stay. Then, in subsequent periods, BLS asks the hospital what it would charge to treat a diagnosis with the same characteristics—the same severity, the same demographics, and other conditions. Notice that BLS selects a diagnosis in its PPI hospital index sample; it does not sample the treatment, which could change.
When the treatment for the same diagnosis does change, BLS asks the hospital for the cost difference between the new and the old treatments. The cost difference provides the basis for a quality adjustment in the index. For example, suppose that in the initial period the hospital provided a figure of $3,000 as the cost of the existing treatment for a diagnosis. Then, in the subsequent period, a different treatment is used for this diagnosis, and the hospital gives $3,600 as the new cost figure. The hospital, however, also reports that the cost difference between the two treatments (considered at the same time, that is, in a period when both were in use) was $500. In this example, the PPI would record $100—not the full $600—as the price increase for treating the diagnosis.
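The arithmetic of this example can be written out as a short sketch. The function name is hypothetical, and expressing the result as a price relative (adjusted current price over the initial price) is an assumption made here for illustration, not a description of BLS's actual computation.

```python
def quality_adjusted_change(old_price, new_price, cost_difference):
    """Price change left after removing the reported cost difference
    between the new and the old treatments."""
    return (new_price - old_price) - cost_difference

# Figures from the example in the text.
old_price = 3000.0        # initial-period cost of the existing treatment
new_price = 3600.0        # subsequent-period cost of the new treatment
cost_difference = 500.0   # reported cost gap between the two treatments

change = quality_adjusted_change(old_price, new_price, cost_difference)

# Assumed presentation: the adjusted change as a price relative for the diagnosis.
price_relative = (old_price + change) / old_price

print(f"Recorded price increase: ${change:,.0f}")
print(f"Price relative: {price_relative:.4f}")
```

Of the $600 observed difference, $500 is attributed to the change in treatment and only the remaining $100 enters the index as a price increase.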
The PPI index for “general medical and surgical hospitals” is published with category detail that corresponds to chapter and subchapter headings of the International Classification of Diseases (ICD).6 Of the 20 disease groupings that have been published since 1992, index values in June 2008 ranged from 147.1 for infectious and parasitic diseases (1992 = 100) to 229.6 for diseases of the blood and blood-forming organs and immunology.7 Hospital costs have not risen at the same rates across diseases. This provides one demonstration why more detail is necessary for analyzing medical care costs and services than is typically provided by the “hospital” aggregate that is published in the National Health Expenditure Accounts.
Several new hospital disease indexes have been added recently, including treatment of trauma and of HIV. Additionally, there are PPI indexes for specialized psychiatric hospitals and substance-abuse hospitals, as well as for residential mental retardation facilities. These indexes map onto the mental health chapter of the ICD.
The improved methodology in the post-1992 PPI hospital index caused it to grow less rapidly than an index using the older method. Catron and Murphy (1996) suggest several reasons for this in addition to improved methodology. For one, their evidence indicated that hospital charges to insurance and other third-party payers advanced more slowly than hospital charges to individual payers, who alone are represented in the CPI (charges and transaction prices can differ wildly and charges certainly need bear no relation to MC). Even after the CPI shifted over to a modification of the PPI methodology, PPI hospital indexes have continued to advance more slowly than CPI indexes. Nevertheless, pricing a diagnosis, instead of simply collecting the hospital’s daily charge and ancillary charges, was a substantial improvement, and it seems certain that the improvement removed upward bias from the measure of medical care inflation.
6 PPI categories can be found in the table for the net output of selected industries and their products on the BLS PPI web page (see http://www.bls.gov/ppi/ppitable05.pdf). The PPI hospital component also contains an index for “other receipts,” which includes revenue from nonmedical operations, such as gift shops.
7 All these indexes were subsequently rebased to June 2008.
As a pragmatic matter, BLS was able to base its PPI hospital indexes on an episode-of-disease concept because hospitals are compensated by both Medicare and many commercial insurers on the basis of diagnosis-related groups. For this reason, hospitals have the data that the PPI hospital index requires.
Other PPI Medical Care Components
The PPI hospital indexes, including an index for specialty hospitals, are estimated and published using a disease classification. Hospitals account for about 45 percent of the national accounts medical care sector. However, in the past, nonhospital PPIs have not been collected on an episode-of-disease concept. Some nonhospital PPIs (for example, physicians’ offices of obstetrics and gynecology and offices of mental health practitioners) can be readily mapped into the cost-of-disease framework. Similarly, mental hospitals fit naturally within an ICD chapter.
For other segments of the health care sector, data are not yet available that would work in a cost-of-disease framework. However, BLS recently tested collecting prices from other medical care industries according to the cost-of-disease concept used for the PPI hospital index. These include physicians’ offices, medical laboratories, and diagnostic imaging centers. Pharmaceuticals have also been coded by the cost-of-disease classification. These are very positive steps.
BLS plans on grouping these components to achieve ICD indexes that report change in the cost of disease on a wherever-treated basis. That is, in addition to publishing PPIs by industry (e.g., physicians’ offices), they will publish indexes by disease (e.g., circulatory disease). At present, they do not plan to publish cost-of-disease indexes for each industry, so there will not be cost-of-disease indexes for, say, physicians’ offices, comparable to the ICD-level detail currently published for hospitals.
Recommendation 4.2: If sample sizes permit, the Bureau of Labor Statistics (BLS) should publish not only aggregated (over industries) cost-of-disease indexes but also indexes for each industry (that is, hospitals, physicians’ offices, and so forth) by cost-of-disease classification, so that users can examine the components of disease cost and do their own analysis. The panel is encouraged by BLS plans to extend the cost-of-disease framework used in its hospital Producer Price Index to other medical care indexes, including doctors’ offices and other ambulatory care industries. This initiative should be carried forward and given appropriate funding.
Deflation with PPI Indexes
Even though PPI hospital indexes have been available with a cost-of-disease classification for more than 15 years, BEA does not deflate at the disease level. No expenditure data are available by cost of disease, so there is nothing for BEA to deflate. For that reason, BEA uses the aggregate PPI for hospitals as the deflator. This situation underscores how inadequate the data for the medical care sector are, compared with those for other sectors of the economy. In other sectors, PPI indexes usually provide less detail than is available for expenditures, so it is typically the availability of PPI indexes that limits deflation detail, not the availability of expenditure data.
As noted in Chapter 3, the 2007 Economic Census collected revenue from hospitals and from certain other medical care industries by the same diagnosis-related groups classification system used in the PPI. This welcome development was the result of the North American Product Classification System (NAPCS), which developed harmonized product classification systems for use in Canada, Mexico, and the United States. For the United States, the NAPCS was also intended to harmonize classifications used by the Census Bureau and BLS in order to facilitate analysis by users of government statistics and deflation by BEA. The NAPCS specified product detail in medical care that is consistent with the ICD classification.
It is obvious from equation 4.1 that the units of analysis must be the same for the price and expenditure data used in the equation. Lack of product classification consistency across U.S. statistical agencies has in the past forced various expedients upon the agency responsible for deflation. In the immediate future this problem will be resolved by new data from the Census Bureau and from the PPI. Provided that yearly extrapolators of the 2007 census product detail are available, the U.S. statistical system will have gone a long way toward providing expenditures and price indexes for medical care that are harmonized around a cost-of-disease framework.
Recommendation 4.3: The Census Bureau should give high priority to providing annual data for hospitals and other medical care industries, grouped by a cost-of-disease system that matches the one used in the 2007 Economic Census and in the Producer Price Index. The panel notes the welcome provision of new data on receipts categorized by disease that accompany publication of the 2007 Economic Census, though a considerable amount of work remains to be done to bring the quality of these data up to the standards needed for use in official statistics.
Specifically, a concerted effort will be needed to increase the number of census respondents willing to provide information at the disease level of detail.
WHAT IS NEEDED IN A PRICE INDEX FOR MEDICAL CARE?: EVALUATION OF THE PPI
The emergence of price indexes and expenditure (receipts) data that are grouped by a cost-of-disease system makes it essential to examine the existing price indexes, to assess their adequacy for measuring output and inflation in the medical care sector.
BLS has announced its intention to extend the PPI hospital methodology to other indexes. Thus, rather than assessing the existing BLS nonhospital indexes, we examine issues that have arisen in the PPI hospital index. The discussion will apply by extension to indexes for other areas of care or to their development.
Synthetic or “Model” Prices
For most parts of the PPI, BLS uses a fixed-specification pricing methodology in which price comparisons are made only for units that are matched in two periods. For the product or service that was selected in the initiation period (generally by probability sampling), BLS obtains a price in subsequent periods for the exact specification, in order to hold product or service quality constant in comparing prices. For medical care, however, it is very unlikely that a condition selected (on a probability basis) in an earlier period will present itself subsequently in exactly the same form. The sampled price for heart attack treatment one month may be for a 55-year-old male with a particular form of heart attack; the next month it may be for an 80-year-old with another form. Thus, strict application of the usual fixed-specification pricing method will not work well for pricing medical care.
In order to get repeat price quotes for a fixed diagnosis, BLS obtains a synthetic price for the hospital index. That is, it asks the hospital what it would have charged in the current period for the exact diagnosis that BLS selected in the initiation period, controlling for severity, demographic profile, and other relevant factors (Murphy workshop presentation, in National Research Council, 2009). The method currently used does not adjust for quality by tracking outcomes (although the framework itself would be amenable to that), and it could allow for changing technologies. The synthetic price approach employed for the hospital index (which will presumably be used for the doctor’s office and other medical care indexes planned for the future) was adapted from the “model price” method developed by Statistics Canada for construction price indexes.8 BLS has also applied the model price method to various services industries such as engineering, for which finding the exact item to price in subsequent periods is impossible.
Note that BLS could have estimated a unit value index for, say, heart attacks, but it did not. That is, it could have taken a simple average of the charges for all heart attack treatments in the hospital in the initiation period, another simple average of the charges for all heart attack treatments in the subsequent period, and let the ratio of those two simple averages form the price index. Such a ratio is called a unit value index in the price index literature. A unit value index implies that all treatments are homogeneous or that any heterogeneity among them can be neglected. Changes in the mix of treatments or in the characteristics of the patient either do not matter or are small enough to be ignored.
On the quantity side, the unit value index implies that a simple count of the number of, for example, heart attack treatments suffices. Bearing in mind the definition of the unit value, (ΣPQ/n), where n is the number of treatments in the period, the unit value index is

$$\frac{\sum P_1 Q_1 / n_1}{\sum P_0 Q_0 / n_0}$$

where n0 and n1 are the numbers of treatments in periods 0 and 1, respectively. Deflation by the unit value index gives the following:9

$$\frac{\sum P_1 Q_1}{\sum P_0 Q_0} \bigg/ \frac{\sum P_1 Q_1 / n_1}{\sum P_0 Q_0 / n_0} = \frac{n_1}{n_0}$$
Thus, if the unit value index is valid, computing the quantity index is very simple: it is necessary only to count the number of treatments, without controlling for patient mix, severity, or any other characteristic that might affect treatment. Nor is there any need to collect the costs, either for deflation or for weighting the quantity index: if the number of treatments is available in both periods (needed in any case to compute the unit value index), there is no need to deflate by the unit value index; one can just use the change in the number of treatments.10
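The implications of the unit value index can be illustrated numerically. In the sketch below, which uses hypothetical prices and counts for a standard and an intensive heart attack treatment, prices are held fixed while the mix shifts toward the more expensive treatment; the Laspeyres quantity index is included only as a price-weighted comparison.

```python
# Hypothetical data: (price per treatment, number of treatments) for a
# standard and an intensive treatment. Prices are the same in both periods;
# only the mix of treatments changes.
period0 = [(10000.0, 80), (25000.0, 20)]
period1 = [(10000.0, 60), (25000.0, 40)]

def expenditure(period):
    return sum(p * q for p, q in period)

def count(period):
    return sum(q for _, q in period)

def unit_value(period):
    """Simple average charge per treatment, ΣPQ/n."""
    return expenditure(period) / count(period)

unit_value_index = unit_value(period1) / unit_value(period0)

# Deflating the expenditure change by the unit value index returns the
# change in the simple count of treatments, n1/n0.
deflated_quantity = (expenditure(period1) / expenditure(period0)) / unit_value_index
count_ratio = count(period1) / count(period0)

# Price-weighted (Laspeyres) quantity index, for comparison.
laspeyres_quantity = (
    sum(p0 * q1 for (p0, _), (_, q1) in zip(period0, period1)) / expenditure(period0)
)

print(f"Unit value index:            {unit_value_index:.4f}")
print(f"Deflated by unit value:      {deflated_quantity:.4f}")
print(f"Count ratio n1/n0:           {count_ratio:.4f}")
print(f"Price-weighted quantity idx: {laspeyres_quantity:.4f}")
```

With these numbers no price has changed, yet the unit value index rises and the count of treatments shows no growth, whereas the price-weighted quantity index registers the shift toward the more intensive treatment as an increase in the quantity of care.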
Medical treatments, however, are extremely heterogeneous. From a treatment perspective, rarely do two patients have exactly the same illness. And the distribution of patient severity may change over time. For example, as some surgical cases have shifted out of the hospital to ambulatory surgery, the remaining inpatient cases have been more severely ill. Heterogeneity makes problematic both the normal BLS fixed-specification method used for other goods and services and the unit value index method. By default, therefore, BLS adopted the approach of estimating a synthetic price for the exact specification that was chosen in the initiation period.
The PPI synthetic price method worked, in the sense that it proved to be practical and gave plausible indexes. However, no real evaluation of the method has ever been carried out, and it is warranted. How do hospitals answer the BLS hypothetical question? Do they just quote from some standard charge list? Do they mark up costs in some way? Do they take shortcuts, such as looking at changes in compensation of personnel? In short, how valid are hospitals’ responses? It seems impossible to make any kind of judgment about the validity of the PPI medical price indexes without knowing more about how the respondents determine their answers to the BLS question.
Recommendation 4.4: The Bureau of Labor Statistics (BLS) should undertake a special study of how hospital respondents estimate their current charge for the detailed specification BLS chooses in the initiation period.
BLS survey statisticians have in the past performed “response analysis surveys,” which examine how respondents reported information to BLS. An example of such a study is Goldenberg, Butani, and Phipps (1993). The proposed study should also encompass the way hospitals have perceived and responded to BLS instructions about changing treatments, to determine why few treatment changes have been reported, along the lines suggested in section 4.3.3.
Quality Change Adjustments for Improved Treatments
It is generally accepted, both in the price index literature and in medical economics,11 that price indexes need to be adjusted to reflect improved medical treatments or (as it is usually called in the price index literature) quality change. Fisher and Shell (1972) and Triplett (1983) show that the appropriate way to make a quality adjustment in an index of output (or in the output price index) when markets are competitive is by the cost difference between the older and the improved treatments. BLS uses the change in cost as a quality adjustment when changed treatments are encountered in the PPI hospital index. An example has already been given in section 4.2.
Although the BLS quality adjustment method corresponds to the dictates of the theory, we note that the conditions of the theory are very restrictive. The restrictions make it problematic when applied to medical care. The theory of the output price index, adopted as the theoretical framework of the PPI, is the theory of a constant input, fixed technology price index.12 It is isomorphic to the theory of the cost-of-living index, which is a constant utility, fixed preference function index. In that fixed input, fixed technology world, the theoretical ideal is to use the difference in production costs between the old treatment and the new treatment to make a quality adjustment in the index (because otherwise the index would not hold inputs fixed).
An input-cost adjustment is warranted if the quality change does not involve a shift in the underlying production technology. Some quality changes fit this model. For example, computers have over some periods used the same underlying technologies, but improvements make machines faster. Putting greater hospital resources into cleaning and sanitation in order to reduce in-hospital infections fits the fixed technology model, as do some of the quality improvements in treatments distinguished in the Hospital Compare Project (administering aspirin to heart attack patients, for example).
In medical care, however, many treatment changes involve new technologies. The cataract surgery that moved to a sutureless procedure is a good example (see the quality change section below). It was not a constant technology innovation. This innovation not only improved the treatment but actually reduced its cost, so basing the quality adjustment on the cost change is inappropriate. In addition, it probably improved treatment outcomes further by reducing the likelihood of mistakes and complications from putting in and removing the suture. Treating complications adds to cost, which should be picked up in the price index used for the accounts, although to do so would require that the cost of complications from a procedure be linked to the original episode of treatment. When these costs are eliminated, it should count as a productivity improvement, which would have to enter the accounts through an adjustment to the price index.
In principle one could ask, what would it cost to produce the characteristics or outcome of the new treatment in the old technology? However, the outcome of the new procedure often cannot be produced using the old technology. One could also in principle ask what it would cost to produce the characteristics of the old treatment with the new technology, but that is a nonsense question in the cataract surgery case, since the new sutureless cataract surgery technology so completely dominates the old.
Cases such as those described above demonstrate that the only clear alternative is to revert to medical outcome measures, even though outcomes do not, strictly speaking, comport with the theory that underlies the PPI. This is not an easy solution, either, since outcome measures are few. The role of medical outcomes in national health accounting is twofold: (1) it is required for tracking quality change in the output (the treatments) for the medical care account,13 and (2) it is (as we discuss in Chapter 6) central to the broad health account, since changes in population health need to be attributed categorically to specific treatments of disease.
Frequency of New Treatment Encounters in the PPI
BLS has said that few hospitals report treatment changes for the hospital indexes (Fixler and Ginsburg, 2001).14 This is somewhat puzzling, given the frequent and often major changes that have taken place in medical practices over the years. One might expect that PPI collection would encounter treatment changes frequently, not infrequently. One possibility is that the BLS pricing mechanism discourages reports of changed treatments. Because BLS collects a synthetic price, as described in section 4.3.1, the hospital need not report a treatment change to continue to report to BLS. In principle, that is, BLS tracks a diagnosis, not a treatment, but the hospital may find it easier to report a fixed treatment, despite BLS instructions. However, little is known about this beyond the useful discussion in Fixler and Ginsburg (2001). Clarifying what the hospital is reporting and how it interprets and responds to BLS instructions is part of the research in Recommendation 4.2; the study should shed light on whether the low rate of treatment changes reported by hospitals to BLS is erroneous.
A second possibility combines to an extent with the first. BLS selects for its hospital index a sample of diagnoses from a hospital, and retains the sample for 7 years. Perhaps the BLS repricing cycle is too short to detect many treatment changes or (more likely) too short for an old treatment to disappear fully, even if a new one has been adopted. If the old treatment is not completely supplanted by a new one, the hospital may find it easier to report data on the old one (the respondent likely knows that if a treatment change is reported, that person will be asked for more information, which means more work).
Third, the encounter rate for new treatments that BLS should expect is unknown. It is known that, over a long time period, treatments for nearly every medical condition change, as does the mix of characteristics and ailments of the patients being treated (Fuchs, 1999). But what proportion of treatments might be expected to change in a relatively short interval? If diffusion studies of new treatments, comparable to the famous study of hybrid corn (Griliches, 1957), were available, they would be a better basis for judging whether the rate at which BLS encounters new treatments is really low.15
Recommendation 4.5: The Bureau of Labor Statistics (perhaps with the help of outside researchers) should evaluate the implications of the apparently low rate of encounters with new treatments in its hospital indexes. Examining data from reimbursement protocols could provide a benchmark for how rapidly treatments are changing in medical practices. It is likely that a detailed sample of some medical care components is an appropriate approach. Selecting those International Classification of Diseases chapters or subchapters (such as the circulatory diseases chapter and the nervous system and sense organs chapter) for which existing research has shown rapid technological changes in some treatments has the advantage of placing the benchmark in the context of the economic research that is most often cited.
WHEN TREATMENTS MOVE ACROSS INDUSTRIES OR ACROSS ESTABLISHMENTS
We note in Chapter 2 that completed treatments may extend across industry lines—hospitals, doctors’ offices, clinics, laboratories, and skilled nursing facilities are all in different NAICS industries. That mix of facilities poses problems in obtaining the cost of a completed treatment, although we concluded that the problems were relatively manageable.
However, another type of shift between and among different medical establishments is more problematic. Shapiro, Shapiro, and Wilcox (2001) explain that cataract surgeries were once performed in hospitals but are now an outpatient treatment, performed at much lower cost. Moreover, in terms of outcome, the lower cost treatment gives as good an improvement in vision as the more expensive hospital treatment of earlier days, and the authors contend that the less expensive treatment is undoubtedly better for having fewer side effects, particularly discomfort in recovery. Changes in the treatment of cataracts provide an archetype for a broad class of changes in medical treatments that pose severe problems for measuring medical care output and inflation.
Under current PPI procedures for pricing cataract treatments, BLS might sample surgeries that take place in a hospital; the resulting price index for cataract surgery would feed into the published hospital PPI for ICD Chapter 6, Diseases of the Nervous System and Sense Organs (code 366 is cataracts). If surgeries shift to a clinic, then another cataract surgery price index would be obtained, which would be, in principle, a component of the ambulatory care index for ICD Chapter 6.16 Suppose neither hospital nor clinic changed its price. If patients switch from the more expensive hospital surgery to a less expensive ambulatory care facility or a doctor’s office, and if quality remains constant (i.e., if treatment outcomes are comparable), then the cost saving to the patient would be missed in the PPI, even if, as BLS is now planning, the hospital and ambulatory cataract surgery operations are aggregated into an overall index for ICD Chapter 6.
The PPI records prices received by each provider, and, in the example, no provider’s price changed. But if we are interested in the price paid by the patient, by the insurance company, or by Medicare, the price has fallen. The PPI would not pick up this price reduction.
The PPI is constructed on its own concept, an industry price index, and, in the example, none of the industry price indexes is missing anything. But as Berndt et al. (2000) point out, the United States has no national price index for medical care. When a cheaper treatment becomes available in another industry and patients (or their insurance companies) cross industry lines, an aggregation of PPI indexes will not produce an accurate national measure for medical care. What is needed is an index that is capable of capturing the facility shift as a decrease in price.
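The arithmetic behind this point can be sketched in a few lines. The figures below are hypothetical (a hospital price of $3,000, a clinic price of $1,500, and a shift from 90 percent to 30 percent of surgeries performed in the hospital) and are not estimates from any data source. The sketch compares a revenue-weighted aggregate of each provider's own price relative, used here as a stand-in for combining industry indexes, with the change in the average price paid per surgery.

```python
# Hypothetical prices per cataract surgery; neither provider changes its price.
prices_1 = {"hospital": 3000.0, "clinic": 1500.0}
prices_2 = {"hospital": 3000.0, "clinic": 1500.0}

# Hypothetical shares of surgeries performed in each setting.
shares_1 = {"hospital": 0.9, "clinic": 0.1}
shares_2 = {"hospital": 0.3, "clinic": 0.7}


def aggregated_provider_index(p1, p2, q1):
    """Base-period revenue-weighted average of each provider's own price relative."""
    revenue = {k: p1[k] * q1[k] for k in p1}
    total = sum(revenue.values())
    return sum(revenue[k] / total * (p2[k] / p1[k]) for k in p1)


def average_price_paid(p, q):
    """Average price per surgery across settings, weighted by share of surgeries."""
    return sum(p[k] * q[k] for k in p)


provider_index = aggregated_provider_index(prices_1, prices_2, shares_1)
payer_index = average_price_paid(prices_2, shares_2) / average_price_paid(prices_1, shares_1)

print(f"aggregate of provider price relatives: {provider_index:.3f}")
print(f"average price paid, period 2 / period 1: {payer_index:.3f}")
```

The first figure stays at 1 because no provider's price moved, while the second falls because patients moved to the cheaper setting. Treating the whole of that fall as a price decline rests on the assumption that outcomes are equivalent in the two settings.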
The cataract surgery problem has been described as a substitution, and substitutions are well known in the price index literature. Another example is that of ulcer treatment moving from a surgery solution to an antibiotics treatment. In either case, the key to understanding the problem is recognition that this substitution is toward the same product at a lower cost; it is not the usual price index “substitution bias” created by shifting demands toward products that have lower relative prices over time.
When medical treatments cross industry lines (the prototypical cataract surgery problem) the problem posed for medical care price indexes is identical, conceptually, to the “discount store problem” much discussed in the CPI literature: pricing products within a store, as BLS does, misses price changes that a consumer experiences from shopping at different outlets (Reinsdorf, 1993; Reinsdorf and Moulton, 1997; Feenstra and Shapiro, 2003). In this case, BLS must compare prices across retail outlets in a way such that the portion of the cost reduction (if any) that is a true price change and the portion (if any) associated with a change in retail services can be estimated. Similarly, for BLS to compare medical treatments across providers in the PPI requires a way to estimate whether the medical outcome is the same when the shift across providers occurs, to partition any reduction in cost into reduction in price, if any, and reduction in outcome, if any. It is clear that some reductions (and possibly some increases) in medical care costs stem from changes in medical care services and are not price decreases.
Some context is needed. In medical care examples, such as the cataract surgery case, the shift in treatment is often a shift away from establishments in one industry (hospitals, NAICS 622) toward others in the ambulatory care subsector of the NAICS (for example, NAICS 621493, freestanding ambulatory surgical and emergency centers). But it need not be. The PPI would also miss price change if one hospital began to offer cataract surgery on an outpatient basis and patients shifted away from hospitals that offered only the conventional treatment with an in-hospital stay.17 The phenomenon requires a shift in treatments across establishments toward lower cost establishments. They need not be in different NAICS industries, although in the medical cases they often are.
The cataract surgery problem has been much discussed recently, but usually in anecdotal terms. Missing so far are quantitative measures. How much of the change in medical care is characterized by treatments moving toward cheaper yet equivalent treatments? For example, it is not even known how extensive these types of substitution changes are in ICD Chapter 6 (code 366). And how fast are the changes proceeding, and over what interval? A shift that proceeds within a short time period has much different measurement implications than one that proceeds over many years. How many of these shifts occur in institutional settings where conventional collections miss them? Shifts toward lower cost treatments within hospitals, for example, are picked up in principle by BLS sampling methods for the PPI. Are there also treatment shifts that are cost saving but, unlike the cataract surgery archetype, have worse outcomes?
Approaches to the Cross-Industry Problem
Two empirical approaches seem available. The researcher might gather data from individuals or from claims, computing the total cost of a treatment in two periods. Alternatively, an adjustment might be applied to PPI price quotes across industries or in aggregating PPI indexes across industries. The two alternatives have offsetting empirical advantages and disadvantages.
Suppose a new outpatient treatment becomes available and is potentially a replacement for a hospital inpatient treatment. Minimally invasive surgery is an example. If BLS staff knew about the change (see the discussion in section 4.3.3 of the low rate of treatment improvements reported to the PPI), they could in principle collect the price of the new outpatient treatment and match it to the former in-hospital treatment. Thus, using i and j to designate the patients in the two periods and k and m to designate the establishments doing the surgery, and using 1 and 2 to designate the periods, the price relative18 would be $p_{2 j_2 m_2} / p_{1 i_1 k_1}$. The notation emphasizes that not only do the periods and the establishments differ, but so do the patient and the patient’s characteristics.
BLS would need, in addition, information about the outcomes of the two treatments, because they would need to make a quality adjustment for any direct comparison across different establishments. Hence, the correct price comparison is
where μ is the outcome measure (valued in dollar units and adjusted for any changes in the severity of the patients being treated), subscripted for 1 (old treatment) and 2 (new treatment). Most of the time, the values of μ are unknown or must be assumed. If it were known that the minimally invasive surgery was at least as good as the in-hospital surgery, for example, then a direct price comparison without an outcome measure would provide an upper bound on the index, that is, it would understate the price decline. The cataract surgery study by Shapiro, Shapiro, and Wilcox (2001) used this assumption.
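As a sketch of what such a quality-adjusted comparison might look like, assume that each price is expressed per unit of outcome. This form and the symbol $I^{\text{adj}}$ are assumptions made here for illustration, not necessarily the report's own expression:

```latex
\[
I^{\text{adj}}
\;=\; \frac{p_{2 j_2 m_2} / \mu_2}{p_{1 i_1 k_1} / \mu_1}
\;=\; \frac{p_{2 j_2 m_2}}{p_{1 i_1 k_1}} \cdot \frac{\mu_1}{\mu_2}.
\]
```

Under this assumed form, if $\mu_2 \ge \mu_1$ then $\mu_1/\mu_2 \le 1$, so the unadjusted price relative is at least as large as the adjusted one, which is the upper bound property described in the text. With hypothetical values of 0.8 for the unadjusted relative and 1.1 for $\mu_2/\mu_1$, the adjusted relative would be $0.8/1.1 \approx 0.727$.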
It has sometimes been stated that omission of the medical outcome measure may reduce the accuracy of the index, but at least the price change is in the proper direction. But it is clear from manipulation of equation 4.5 that whether the index moves in the right direction depends on whether:
One can count on the “proper direction” presumption only if condition 4.6.b obtains.
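The dependence on outcomes can be sketched under the assumption that the quality-adjusted relative equals the unadjusted relative multiplied by $\mu_1/\mu_2$ (an assumed form, used here only for illustration). Writing $r$ for the unadjusted price relative, a hypothetical shorthand:

```latex
\[
r \cdot \frac{\mu_1}{\mu_2} < 1
\quad\Longleftrightarrow\quad
r < \frac{\mu_2}{\mu_1}.
\]
```

On this reading, an observed price decline ($r < 1$) corresponds to a quality-adjusted decline only when $r < \mu_2/\mu_1$. That always holds when outcomes are no worse ($\mu_2 \ge \mu_1$), but it fails when outcomes deteriorate proportionally more than the price fell. With hypothetical values $r = 0.9$ and $\mu_2/\mu_1 = 0.8$, the adjusted relative is $0.9/0.8 = 1.125$, a price increase.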
Suppose, however, that BLS ignores the introduction of the minimally invasive techniques and does not compare prices across industry lines (the current situation) because it lacks measures of μ. Suppose, additionally, that a researcher found or estimated industry values for $p_{2 i_2 j_2}$, $p_{1 i_1 j_1}$, $\mu_2$, and $\mu_1$; that is, the researcher has estimated how much cheaper the minimally invasive surgery was than the in-hospital kind and how much better or worse its outcome was. In principle, this information could be used to correct for the BLS omission when the indexes for industries $j_1$ and $j_2$ are aggregated into the price index for medical care.19
This implies a major research agenda. Figuring out how to do the quality adjustment requires much scientific and medical information. Moreover, in many cases an improvement in medical outcomes will need to be traced to multiple sources that apply to different industries. For example, people with bad hips are better off today than 25 years ago, probably mostly because of innovation in the inpatient setting (including devices), but also probably because of improvement in the surgeon’s techniques (which does not show up as a hospital charge), better anesthetic technique, and better rehabilitation techniques after discharge.
In a workshop presentation by Bonnie Murphy of BLS (see National Research Council, 2009), it was noted that the PPI program can, in principle, handle treatment shifts within providers. So, if a cataract surgery was changed from a sutured to a sutureless procedure, and it was performed by the same kind of provider (say, in the hospital, even on an outpatient basis), that shift could be captured. If the sutureless cataract surgery was performed in a physician’s office, the shift would not be captured, as current BLS index sampling procedures cannot accommodate changed treatments that cross providers. Murphy stated that BLS would in principle want to be able to measure price change associated with these kinds of treatment changes.
Using Claims or Household Data
Much of the academic literature has relied on patient claims data to provide a picture of price trends for treating specific conditions. In these studies, the good (or service) has been defined, as in this report, as a completed episode. For example, for a heart attack patient, this may involve time and expenditures on a series of initial treatments plus those that take place during the recovery period. At the end of that episode, an estimate of all dollars spent for a patient over the entire period is collected; this forms the basis for pricing a completed episode.
Claims data minimize many of the difficulties with provider-side data. Provided patient links are retained across providers, claims data record the cost of the whole patient episode, and there is no need to join together data from different providers; that is, in the notation used previously, one starts with the observation $\sum_j c_{ij}$.
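A minimal sketch of that starting point follows. It assumes claims arrive as records with hypothetical fields for patient, provider type, and cost, that i indexes patients and j indexes providers, and that each patient has a single episode in the period. The records themselves are invented for illustration.

```python
from collections import defaultdict

# Hypothetical claims: (patient id i, provider type j, cost c_ij in dollars).
claims = [
    ("A", "hospital", 12000.0),
    ("A", "physician", 1800.0),
    ("A", "rehab", 2200.0),
    ("B", "hospital", 9000.0),
    ("B", "physician", 1500.0),
]


def episode_costs(records):
    """Sum costs over providers j for each patient i, giving sum_j c_ij."""
    totals = defaultdict(float)
    for patient, _provider, cost in records:
        totals[patient] += cost
    return dict(totals)


costs = episode_costs(claims)
print(costs)
```

The sum is only as good as the patient link: if a provider's claims cannot be matched to the patient, that part of the episode cost drops out.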
Moreover, claims data are available in very large sample sizes. BEA has been working with a claims data set from insurance companies with 700 million observations. The heart attack study by Cutler et al. (2001) used over 1.7 million Medicare records (they had additional records from a major teaching hospital). In contrast to those magnitudes, the PPI samples appear minuscule: Fixler and Ginsburg reported that the PPI hospital index sample in 1997 consisted of 1,602 price quotes from 209 establishments (down through attrition from 358 at initiation). At the same date, there were 761 quotes in the PPI physicians’ index (Fixler and Ginsburg, 2001, pp. 231, 240). BLS drew new samples in 2001, but no information is available on sample sizes. Clearly, the PPI samples are not nearly so comprehensive as those drawn from claims data.20
Offsetting their undeniable advantages, claims data usually require computing unit values as price indexes. For some databases, there is no basis for matching observations in adjacent periods; however, some longitudinal claims databases do exist, and more may be created. Thus, researchers may increasingly have opportunities to compute the ratio of average prices per patient for two periods.
UNIT VALUES COMPARED WITH SPECIFICATION PRICING
A unit value is simply the value for some category divided by a count of the number of units in that category. For example, unit values for imported autos are derived from the total value of autos imported in some time period divided by the number of cars imported; the unit value index is the change in unit values from one period to the next.
A specification price index for imported autos (for example, the BLS import price index) is based on matching cars in samples for two periods, so that the price change is measured only for a fixed specification. A change in the mix of cars imported will affect the unit value index (more economy cars would lower the unit value index, fewer of them would raise it) but would not affect the fixed-specification index. Of course, the fixed-specification index is not without its own problems: if the car in the sample changes its specifications (quality change), some adjustment must be applied in order to factor out price change from value change because of changes in quality. But few international economists would favor the unit value index over the fixed-specification price index, even if they point to measurement problems with the latter (a recent paper on measurement problems in BLS import and export price indexes is Feenstra, Diewert, and U.S. Office of Prices and Living Conditions, 2001).
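The contrast can be illustrated with a small sketch using hypothetical figures for two model types. The fixed-specification index is implemented here as a base-quantity (Laspeyres) index over matched models, which is an assumption made for illustration and not a description of the BLS formula.

```python
# Hypothetical imported-auto data: price per car and number of cars, by model type.
prices_1 = {"economy": 20_000.0, "luxury": 50_000.0}
prices_2 = {"economy": 20_400.0, "luxury": 51_000.0}  # each price rises 2 percent
units_1 = {"economy": 100, "luxury": 100}
units_2 = {"economy": 150, "luxury": 50}  # mix shifts toward economy cars


def unit_value(prices, units):
    """Total value of imports divided by the number of cars imported."""
    total_value = sum(prices[m] * units[m] for m in prices)
    return total_value / sum(units.values())


def fixed_specification_index(p1, p2, base_units):
    """Price change for matched models, holding base-period quantities fixed."""
    numerator = sum(p2[m] * base_units[m] for m in p1)
    denominator = sum(p1[m] * base_units[m] for m in p1)
    return numerator / denominator


uv_index = unit_value(prices_2, units_2) / unit_value(prices_1, units_1)
spec_index = fixed_specification_index(prices_1, prices_2, units_1)

print(f"unit value index: {uv_index:.3f}")
print(f"fixed-specification index: {spec_index:.3f}")
```

In this example every model's price rises by 2 percent, yet the unit value index falls because the mix moves toward economy cars. The fixed-specification index reports the 2 percent rise.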
For collections from patients, it is almost certainly the case that price indexes will be unit value based. Repeat collections of the price for treating a heart attack, with specified conditions including case severity, demographics, and so forth, are the norm for the PPI index for hospitals. Repeat collections from heart attack patients are not a feasible collection strategy since the same patient typically does not go in for a heart attack treatment each period that a price index is constructed (rather, it will be a different person with a heart attack in the next period), so unit value indexes are the only options. Like other unit value indexes, those created for medical conditions will change with the mix of patient characteristics.
We noted earlier (section 4.3.1) the disadvantages of unit value price indexes—that they treat all observations as equivalent and thus variation among observations is ignored. The implied quantity measure is simply the (equally weighted) change in the number of treatments.
When samples are very large, it may be possible to stratify by some variables, such as severity, for example, to attenuate this problem—although claims data generally lack the clinical detail that is often necessary for an adequate adjustment. One need not compute a unit value index for all heart attacks; if data were available, classes of heart attacks could reduce heterogeneity to an extent. Indeed, a BEA study (Aizcorbe and Nestoriak, 2008) distinguished nine heart attack indexes, depending on the type of treatment received (e.g., acute myocardial infarction with coronary artery bypass graft, acute myocardial infarction with angioplasty). Nevertheless, a unit value index implies that the quantity change within the stratum is simply the change in (the count of) the number of treatments.
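Stratification can be sketched as follows, using two hypothetical treatment strata with invented expenditures and counts. The stratified index shown here weights each stratum's unit value relative by its base-period expenditure share. That weighting is an assumption for illustration and is not taken from the BEA study.

```python
# Hypothetical claims totals by treatment stratum: (expenditures, number of treatments).
period_1 = {"with_bypass": (4_000_000.0, 100), "with_angioplasty": (2_000_000.0, 100)}
period_2 = {"with_bypass": (3_360_000.0, 80), "with_angioplasty": (2_520_000.0, 120)}


def unit_value(expenditures, count):
    """Average cost per treatment."""
    return expenditures / count


def pooled_unit_value_index(d1, d2):
    """Ratio of average cost per treatment, ignoring strata."""
    uv1 = sum(e for e, _ in d1.values()) / sum(n for _, n in d1.values())
    uv2 = sum(e for e, _ in d2.values()) / sum(n for _, n in d2.values())
    return uv2 / uv1


def stratified_unit_value_index(d1, d2):
    """Base-period expenditure-weighted average of within-stratum unit value relatives."""
    total_1 = sum(e for e, _ in d1.values())
    return sum(
        (d1[s][0] / total_1) * (unit_value(*d2[s]) / unit_value(*d1[s]))
        for s in d1
    )


pooled = pooled_unit_value_index(period_1, period_2)
stratified = stratified_unit_value_index(period_1, period_2)

print(f"pooled unit value index: {pooled:.3f}")
print(f"stratified unit value index: {stratified:.3f}")
```

Here the cost per treatment rises 5 percent in each stratum, but the pooled index falls because treatments shift toward the cheaper stratum. Stratifying removes that mix effect across strata. Within each stratum, the quantity measure is still a simple count of treatments.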
How serious is unit value bias?21 What can be said about the empirical liability of the unit value index, compared with the empirical limitations of the provider-side alternatives? In thinking about these questions, recourse to other situations in which unit value indexes are used is valuable. Houses, like medical treatments, cannot usually be priced with the repeat-sale, matched specification methodology employed in price indexes. The change in the average sale price for houses is frequently published. That is a unit value index. It provides valuable and suggestive information and is often based on large samples, but it is well known that the average sale price varies with changes in the mix of houses sold, even when stratified tightly by geographic area. A hedonic price index for houses is an alternative to the unit value index. The hedonic index controls for changes in the mix of houses sold by holding constant the characteristics of the houses, thus providing a more accurate price index. Hedonic indexes for houses are published in many contexts in economics and property appraisers’ professional literatures.
Pharmaceutical research provides a medical example. Patricia Danzon, in a presentation to the March 2008 workshop (National Research Council, 2009), noted that prices for pharmaceuticals have often been estimated simply by dividing expenditure by number of units. This amounts to a unit value index. In cross-national comparisons, such unit value data have led to the inference that pharmaceutical prices are much higher in the U.S. health care system.
Pharmaceuticals are precisely defined—they are measured at the level of the mechanism of action, the strength, the pack, the manufacturer, etc. This allows researchers to calculate accurate geographic comparisons of utilization and price differences. Danzon and Furukawa (2006) found that a significant portion of the expenditure difference across countries is explained by variation in the drugs being used—the formulations have quality dimensions to them. Simply dividing expenditures by number of prescriptions can vastly overstate price differences. Using the number of prescriptions in the denominator essentially imputes all the expenditure difference across countries to a price change, whereas much of it is in fact attributable to new drugs or new formulations and to generics. Even with very large samples, international comparisons of prices using unit value indexes will give misleading results. Thus, pharmaceuticals are a good example of why accounting for quality differences is important in medical care price indexes.
There is justifiable concern about the price index bias that arises from missing treatments for disease that cross industry lines (e.g., the cataract surgery example). At this point, not enough is known about the prevalence of such cases, especially in the short run, to do more than speculate about the magnitude of this bias. Even so, moving to alternative data sets that minimize the cross-industry bias is attractive.22 The more severe the potential bias is from this source, the more likely that one would accept a method that suffers from unit value bias.
Recall the distinction drawn in Chapter 2 between treatments that may bridge several industries and may shift among them and the cataract surgery type of problem in which the same treatment, or one with equivalent outcomes, shifts from a more expensive to a less expensive industry provider. The distinction is equivalent to the distinction between the normal CPI substitution bias (which is resolved with superlative indexes and current weighting) and the CPI discount store bias.
Aizcorbe and Nestoriak (2008) present results that they interpret as estimates of cross-industry bias. They compute unit value indexes that treat shifts among industries (e.g., in-hospital treatment compared with doctor’s office visits) as price change and compare them with unit value indexes in which the distribution of patients across industries is held fixed. The former rose 1.7 percentage points less than the latter over a 2.5 year period, and for cardiology cases the difference was much larger—17.5 percent over the entire period. It is likely, however, that this estimate includes both the cataract surgery kind of effect and the normal substitution on the part of insurance companies across industries.
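The comparison of the two kinds of index can be sketched with hypothetical numbers. The settings, costs, and patient shares below are invented for illustration and do not reproduce the data or the exact formulas of the study.

```python
# Hypothetical average cost per episode by setting, and share of patients in each setting.
cost_1 = {"hospital": 10_000.0, "office": 2_000.0}
cost_2 = {"hospital": 10_500.0, "office": 2_100.0}  # each setting's cost rises 5 percent
share_1 = {"hospital": 0.5, "office": 0.5}
share_2 = {"hospital": 0.4, "office": 0.6}  # patients shift toward the office setting


def average_cost(cost, share):
    """Average cost per episode across settings, weighted by patient shares."""
    return sum(cost[s] * share[s] for s in cost)


# Shifts among settings count as price change: each period uses its own shares.
shifting_index = average_cost(cost_2, share_2) / average_cost(cost_1, share_1)

# Distribution of patients held fixed at its period 1 values.
fixed_index = average_cost(cost_2, share_1) / average_cost(cost_1, share_1)

gap = fixed_index - shifting_index
print(f"index with shifts: {shifting_index:.3f}")
print(f"index with fixed distribution: {fixed_index:.3f}")
print(f"gap: {gap:.3f}")
```

The gap measures how much of the change in average cost comes from patients moving between settings. As the text cautions, the calculation alone cannot separate the cataract surgery kind of effect from ordinary substitution across industries.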
It would be useful for BLS, BEA, and others to assess quantitatively how frequently and how fast treatments move into cross-industry treatment situations, as well as to assess the magnitude of unit value bias in treatment price indexes. Both of these research questions involve examining the characteristics of treatments and of their changes, so synergy is likely between them.
Recommendation 4.6: Research on the magnitudes and rates of cross-industry shifts in treatments should be undertaken by the statistical agencies or other researchers, along with research on the magnitude of unit value bias in medical care indexes.
THE WAY FORWARD: STRATEGIES FOR IMPROVING MEDICAL CARE DEFLATORS
The most obvious way to improve medical care price indexes, through the development of cost of disease deflators, is to work to refine the current PPIs. The top priorities should be to:
convert the other medical care industries to cost-of-disease price indexes, when appropriate and practical,
devote more resources to work on the treatment outcomes issue, and
improve methods for adjusting for, or estimating the size of, the cross-industries problem.
This approach could incorporate academic research where appropriate (BEA has never been bound to use unadjusted PPIs if it thinks it has new or better information). The idea will be to produce new deflators that incorporate some combination of the strengths of BLS indexes and those emerging from outside researchers’ knowledge, along the lines of the prescriptions for improving mental health accounts in Triplett (2001).
Another option is to investigate alternative databases that could yield price indexes, which might mean abandoning the PPI. Research indexes frequently cited include those developed by Aizcorbe and Nestoriak in a project at BEA, along with some experimental CPI work that has been carried out and presented at various times (see National Research Council, 2009, for a summary).
The state of data is such that it seems promising to encourage both approaches, if resources are sufficient. Knowledge gained from the second approach will help the implementation of the first. And there is no reason to abandon the PPIs; they are likely to be the building blocks for a medical care account in the immediate future, if for no other reason than that there is nothing else. Success in building a medical care account will give medical economists what economists in other specialties have: a price/output data set that they can improve.