A brief overview of the conduct of health services research, and certain cautions about inferences from such studies, are warranted before individual studies in these areas are reviewed.

Evaluating the Strength of Evidence From Health Services Research

Important questions about the way in which aspects of health care delivery affect outcomes usually cannot be answered with the most powerful research design, the randomized clinical trial. It would, for example, be unacceptable to most individuals recently diagnosed with cancer to be assigned randomly to an insurance plan, hospital, or doctor, although such studies are not impossible. An experiment was conducted in the 1970s, for example, to assess how insurance plans and cost sharing affected health care and outcomes among the general population (Newhouse, 1998). Most of the time, however, considerations of cost and practicality lead health services researchers to conduct observational rather than experimental studies. Often cancer patients are identified retrospectively, through cancer registry data or hospital discharge records, and outcomes are compared across different health care settings or processes of care. Alternatively, individuals with cancer in different settings may be identified shortly after diagnosis and followed prospectively with systematic measurement tools designed to assess different outcomes (e.g., quality-of-life measures).

Although they are generally less costly and easier to conduct, nonexperimental studies are subject to a number of potential biases that can make findings difficult to interpret. Any differences observed between study groups could be due to underlying differences in group membership rather than to the intervention or condition being evaluated. Individuals enrolled in health maintenance organizations (HMOs), for example, tend to be younger and healthier than those insured through fee-for-service (FFS) plans. Comparisons of groups that vary by insurance coverage must therefore control for differences in age and health status. However, the information needed to "adjust" the analysis so that like individuals are compared across groups is not always available and may not capture all of the underlying differences.
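The confounding problem can be made concrete with a small sketch. All figures below are invented purely for illustration: the HMO group appears to fare better in a crude comparison only because its enrollees are younger, while age-stratified comparisons show identical survival within each age group.

```python
# Hypothetical illustration of confounding by age in an observational
# comparison of insurance plans. Every number here is invented.
# Each entry maps (plan, age_group) -> (patients, 5-year survivors).
cohort = {
    ("HMO", "young"): (800, 640),   # 80% survival
    ("HMO", "old"):   (200, 100),   # 50% survival
    ("FFS", "young"): (200, 160),   # 80% survival
    ("FFS", "old"):   (800, 400),   # 50% survival
}

def crude_survival(plan):
    """Survival proportion ignoring age -- the naive comparison."""
    total = sum(n for (p, _), (n, s) in cohort.items() if p == plan)
    surv = sum(s for (p, _), (n, s) in cohort.items() if p == plan)
    return surv / total

def stratum_survival(plan, age):
    """Survival proportion within a single age stratum."""
    n, s = cohort[(plan, age)]
    return s / n

# Crude comparison: HMO looks better, 0.74 vs. 0.56 ...
print(crude_survival("HMO"), crude_survival("FFS"))
# ... but within each age group the plans are identical (0.80 and 0.50),
# because the HMO simply enrolled more young, healthy people.
print(stratum_survival("HMO", "young"), stratum_survival("FFS", "young"))
print(stratum_survival("HMO", "old"), stratum_survival("FFS", "old"))
```

Case-mix adjustment attempts exactly this kind of stratified (or modeled) comparison; the text's caution is that the variables needed to form the strata are often missing or incomplete.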

Some unique features of cancer and its diagnosis make these "case-mix" adjustments very important, but difficult (Dent, 1998). Differential use of screening and diagnostic tests, for example, can bias the results of comparative studies of cancer survival. In a classic study of survival following treatment for lung cancer, Feinstein and colleagues (1985) found higher survival rates for individuals treated in 1977 than for those treated from 1953 to 1964. Survival was better for the entire group and for subgroups in each of the three main TNM (tumor-node-metastasis) stages. The more recent cohort, however, had undergone many new diagnostic imaging procedures, which resulted in "stage migration": many patients who previously would have been classified in a "good" stage were assigned to a "bad" stage. The use of new diagnostic techniques allows patients with unobserved metastases to "migrate" from TNM stages with a better prognosis (e.g., Stage I or II) into those with a worse prognosis (Stages II and III). The migration would improve survival in the lower stages, because fewer patients with metastases are assigned to them. Migration would also improve survival in the higher stages, since the metastases in the newly added patients were silent rather than overt. This bias was called the Will Rogers phenomenon, after the humorist-philosopher who is said to have observed that when the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.
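The arithmetic behind stage migration can be sketched with a small hypothetical example (all patient counts and survival figures are invented): reassigning patients with silent metastases from Stage I to Stage III raises the measured survival rate in both stages, yet overall survival is exactly the same under either staging system.

```python
# Hypothetical illustration of the Will Rogers phenomenon. Every number
# here is invented. Groups are (patients, 5-year survivors).
localized = (90, 72)           # truly localized disease, 80% survival
silent_metastases = (10, 3)    # metastases missed by the old workup, 30% survival
overt_metastases = (100, 20)   # metastases obvious either way, 20% survival

def rate(groups):
    """Pooled survival proportion for a list of (patients, survivors) groups."""
    patients = sum(n for n, s in groups)
    survivors = sum(s for n, s in groups)
    return survivors / patients

# Old staging: silent metastases go undetected, so those patients count as Stage I.
old_stage1 = rate([localized, silent_metastases])         # 75/100 = 0.75
old_stage3 = rate([overt_metastases])                     # 20/100 = 0.20

# New staging: better imaging "migrates" them to Stage III.
new_stage1 = rate([localized])                            # 72/90  = 0.80
new_stage3 = rate([silent_metastases, overt_metastases])  # 23/110 ~ 0.21

# Stage-specific survival improves in BOTH stages ...
print(old_stage1, new_stage1)   # Stage I:   0.75 -> 0.80
print(old_stage3, new_stage3)   # Stage III: 0.20 -> ~0.21
# ... while overall survival is unchanged: the same 95 of 200 patients survive.
print(rate([localized, silent_metastases, overt_metastases]))  # 0.475
```

This is why apparent stage-specific survival gains between eras, or between settings that use imaging differently, cannot by themselves be attributed to better treatment.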

Copyright © National Academy of Sciences. All rights reserved.