Suggested Citation:"2. The Use of Diagnostic Tests: A Probabilistic Approach." Institute of Medicine. 1989. Assessment of Diagnostic Technology in Health Care: Rationale, Methods, Problems, and Directions. Washington, DC: The National Academies Press. doi: 10.17226/1432.

The Use of Diagnostic Tests: A Probabilistic Approach

Diagnostic tests and the information that they convey are too often taken for granted by both physicians and patients. The most important error is to assume that the test result is a true representation of what is really going on. Most diagnostic information is imperfect; although it changes the physician's perception of the patient, he or she remains uncertain about the patient's true state. As an example, consider a hypothetical test. With this test, 10 percent of patients who have the disease will have a negative result (a false-negative result), and 10 percent of the patients who do not have the disease will have an abnormal result (a false-positive result). Thus, when the result is abnormal, the clinician cannot be certain that the patient has the disease: abnormal results occur in patients who have the disease and in patients who do not. There is similar uncertainty if the test result is negative. As long as tests are imperfect, this uncertainty is intrinsic to the practice of medicine.

The physician who acknowledges the imperfections of a diagnostic test will ask, "In view of this test result, how uncertain should I be about this patient?" Fortunately, there is a method for answering this question: the theory of probability. This chapter is a primer for applying probability theory to the interpretation of test results and deciding when to do a test rather than treat or do nothing.1 It is divided into six parts: (1) first principles; (2) interpreting test results: the posttest probability; (3) estimating the pretest probability; (4) measuring test performance; (5) expected-value decisionmaking; and (6) the choice among testing, starting treatment, or doing nothing.

1 This chapter is adapted from an article written by one of the authors (Sox 1986). The material is covered in greater depth in standard textbooks (Sox et al. 1988, Weinstein 1980).

FIRST PRINCIPLES

The way in which one decides to do a diagnostic test is based on two principles.

PRINCIPLE I: Probability Is a Useful Representation of Diagnostic Uncertainty.

Uncertainty is unavoidable. How can we best respond to it? A starting point is to adopt a common language. Some express their uncertainty as the probability that the patient has a specified disease. By using probability rather than ambiguous terms such as "probably" or "possibly," the clinician expresses uncertainty quantitatively. More important, probability theory allows one to take new information and use Bayes' theorem to calculate its effect on the probability of disease. These advantages are compelling, and our approach to test evaluation is based on providing the information required to use probability theory to interpret and select diagnostic tests.

Example: In a patient with chest pain, past history is very useful when trying to decide whether he or she has coronary artery disease. Patients whose pain is typical of angina pectoris and is also closely linked to overexertion are said to have "typical angina pectoris." Over 90 percent of men with this history have coronary artery disease. When anginal pain is less predictably caused by exertion, the patient is said to have "atypical angina." About two-thirds of men with this history have coronary artery disease. Physicians who are uncertain about the meaning of a patient's chest pain often ask the patient to undergo an exercise test. The probability of coronary artery disease after a positive exercise test may be calculated with Bayes' theorem. If the history is typical angina, the probability after a positive test is nearly 1.0. If the history is atypical angina, the probability after a positive test is about 0.90.

Comment: Estimating the probability of coronary artery disease helps to identify the situations in which the probability of disease will be altered dramatically by an abnormal test.
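The two figures quoted in the example can be checked with a short calculation. The sketch below is illustrative only: it assumes the exercise-test sensitivity (0.70) and false-positive rate (0.15) that the chapter reports later, takes "over 90 percent" as a pretest probability of 0.90 and "about two-thirds" as 2/3, and uses a hypothetical function name.

```python
def posttest_probability_positive(pretest, tpr, fpr):
    """Bayes' theorem for a positive result: P(disease | positive test)."""
    true_positives = pretest * tpr            # diseased and test positive
    false_positives = (1.0 - pretest) * fpr   # nondiseased and test positive
    return true_positives / (true_positives + false_positives)


# Assumed test characteristics, taken from the exercise-test figures
# quoted later in the chapter.
TPR, FPR = 0.70, 0.15

typical = posttest_probability_positive(0.90, TPR, FPR)        # typical angina
atypical = posttest_probability_positive(2.0 / 3.0, TPR, FPR)  # atypical angina

print(f"typical angina, positive test:  {typical:.2f}")
print(f"atypical angina, positive test: {atypical:.2f}")
```

Under these assumptions the typical-angina result is close to 1.0 and the atypical-angina result is close to 0.90, in line with the example.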

PRINCIPLE II: A Diagnostic Test Should Be Obtained Only When Its Outcome Could Alter the Management of the Patient.

A test should be ordered only when forethought shows that it could lead to a change in patient management. How does one decide if a test will alter the management of a patient? There are several considerations.

The effect of a test result on the probability of disease. If the probability of disease after the test will be very similar to the probability of disease before the test, the test is unlikely to affect management. The posttest probability of disease can be calculated by using Bayes' theorem, as discussed later in this section.

Example: The probability of coronary artery disease in a person with typical angina pectoris is 0.90. If an exercise test result is abnormal, the probability of disease is 0.98. If the result is normal, the probability of disease is 0.76. Many physicians would conclude that the effect of the results is too small to make the test worthwhile for diagnostic purposes.

The threshold model of decisionmaking. This approach is based on the concept that a test is judged by its effect on the probability of disease (Pauker and Kassirer 1975, 1980). The model postulates a treatment threshold probability, below which treatment is withheld and above which it is offered. In this situation, a test is only useful if, after it is performed, the probability of disease has changed so much that it has crossed from one side of the treatment threshold probability to the other. If the posttest probability were on the same side of the threshold as the pretest probability, the decision of whether or not to treat would be unaffected by the test results, and the test should not be ordered. One must estimate the benefits and the harmful effects of treatment in order to set the treatment threshold probability.
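The threshold rule can be written as a small check: a test is worth ordering only if at least one of its possible results would move the probability of disease across the treatment threshold. This is a minimal sketch, not a clinical tool. The function names are hypothetical, the test characteristics (0.70 and 0.15) are the exercise-test figures given later in the chapter, and both threshold values are invented for illustration.

```python
def posttest(pretest, tpr, fpr, positive):
    """Posttest probability of disease after a positive or negative result."""
    if positive:
        diseased = pretest * tpr
        nondiseased = (1.0 - pretest) * fpr
    else:
        diseased = pretest * (1.0 - tpr)           # false negatives
        nondiseased = (1.0 - pretest) * (1.0 - fpr)  # true negatives
    return diseased / (diseased + nondiseased)


def could_alter_management(pretest, tpr, fpr, threshold):
    """True if either result puts the probability on the other side of the threshold."""
    treat_before = pretest >= threshold
    for positive in (True, False):
        treat_after = posttest(pretest, tpr, fpr, positive) >= threshold
        if treat_after != treat_before:
            return True
    return False


# Typical angina example: pretest probability 0.90.
print(round(posttest(0.90, 0.70, 0.15, positive=True), 2))   # abnormal result
print(round(posttest(0.90, 0.70, 0.15, positive=False), 2))  # normal result

# Hypothetical treatment thresholds, chosen only to show both outcomes.
print(could_alter_management(0.90, 0.70, 0.15, threshold=0.50))
print(could_alter_management(0.90, 0.70, 0.15, threshold=0.85))
```

With a threshold of 0.50, neither result crosses the threshold, so the test would not change the decision. With a threshold of 0.85, a normal result would.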
Example: Some patients with suspected pulmonary embolism are allergic to the contrast agents that are used to perform a pulmonary arteriogram, the definitive test for a pulmonary embolism. Many physicians say that if faced with this situation, they would start anticoagulation if they thought that the patient had as little as a 5 to 10 percent chance of having a pulmonary embolism. Thus, their treatment threshold probability is 0.05 to 0.10.

Effect of test results on clinical outcomes. Even if a test result leads to a change in management, if the patient will not benefit, the test should not have been done. Thus, one is concerned not only with the test itself but also the efficacy of the actions that are taken when its result is abnormal.

Example: Investigators have calculated the average improvement in life expectancy that results from the management changes following coronary arteriography in patients with stable angina pectoris. The analysis shows that middle-aged men will gain, on average, approximately one year from undergoing coronary arteriography and coronary bypass surgery if severe disease is present (Stason and Fineberg 1982). This test does have an effect on clinical outcomes.

Marginal cost-effectiveness of the test. This measure of test performance is a way to characterize the efficiency with which additional resources (dollars) are translated into outcomes (longevity). It takes into account the increased costs from doing a test and the incremental benefit to the patient. A test result may lead to a good outcome, such as improved longevity, but the increase in cost for each unit of increase in longevity may be so high that there is a consensus that the test should not be done.

INTERPRETING TEST RESULTS: THE POSTTEST PROBABILITY

The interpretation of a test result is an important part of technology assessment. A test with many false-negative and false-positive results will be interpreted with far more caution than a test with few such misleading results. Therefore, measuring the performance characteristics of a test is important, because the clinician must know them in order to interpret the result.

Important Definitions

The probability of disease after learning the results of a test is called the posttest probability of disease. It is the answer to the question, "What does this test result mean?" One calculates the posttest probability with Bayes' theorem, which is derived from the first principles of probability and requires both the pretest probability of disease and two measures of the accuracy of the test. One measure is called the sensitivity of the test (true-positive rate, or TPR). It represents the likelihood of a positive test in a diseased person, as is shown in the following equation:

\[ \text{Sensitivity} = \frac{\text{number of diseased patients with positive test}}{\text{number of diseased patients}} \]

2 See also the Glossary of Terms at the end of this chapter.

Example: There have been many studies of the exercise electrocardiogram. In these studies, a patient with chest pain undergoes both the exercise electrocardiogram and a definitive test for coronary artery disease, the coronary arteriogram. About 70 percent of patients who had a positive arteriogram also had a positive exercise electrocardiogram (as defined by the presence of at least 1 mm of horizontal or downsloping ST segment depression). Thus, according to this result, the sensitivity of an exercise electrocardiogram for coronary artery disease is 0.70.

The second measure of test accuracy is its false-positive rate, the likelihood of a positive result in a patient without disease. Specificity, the true-negative rate (TNR), is 1 minus the false-positive rate.

\[ \text{False-positive rate} = \frac{\text{number of nondiseased patients with positive test}}{\text{number of nondiseased patients}} \]

Example: The studies of the exercise electrocardiogram have shown that about 15 percent of patients who did not have coronary artery disease nonetheless did have an abnormal exercise electrocardiogram. Thus, the false-positive rate of the exercise electrocardiogram for coronary artery disease is 0.15.

Likelihood ratio. The likelihood ratio is a measure of how much the result alters the probability of disease.

\[ \text{Likelihood ratio} = \frac{\text{probability of result in diseased patients}}{\text{probability of result in nondiseased patients}} \]

We can use this definition to define a positive test result and a negative test result. A positive test result raises the probability of disease, and its likelihood ratio is greater than 1.0. The likelihood ratio for a positive test result is abbreviated as LR(+). A negative test result lowers the probability of disease, and its likelihood ratio is between 0.0 and 1.0. The likelihood ratio for a negative test result is abbreviated LR(-).

Example: If an exercise test is positive, the likelihood ratio is 0.70 divided by 0.15, or 4.667. The odds of having coronary artery disease increase by a factor of 4.667 if an exercise test is abnormal. (Odds are defined in the Glossary of Terms.) If an exercise test is negative, the likelihood ratio is 0.30 divided by 0.85, or 0.35. When an exercise test is negative, the odds of having coronary artery disease are 0.35 times the pretest odds.

Bayes' theorem uses data on test performance in the following way. In these formulas, TPR (true-positive rate) is used in place of sensitivity, and FPR is used to denote false-positive rate. TNR denotes the true-negative rate, and FNR denotes the false-negative rate (these terms are defined in the Glossary). The pretest probability of disease is represented by p(D).

\[ \text{Probability of disease if test is positive} = \frac{p(D) \times \text{TPR}}{p(D) \times \text{TPR} + [1 - p(D)] \times \text{FPR}} \]

\[ \text{Probability of disease if test is negative} = \frac{p(D) \times \text{FNR}}{p(D) \times \text{FNR} + [1 - p(D)] \times \text{TNR}} \]

The probability of a positive test result equals the probability of a true-positive result plus the probability of a false-positive result.

\[ \text{Probability of positive test result} = p(D) \times \text{TPR} + [1 - p(D)] \times \text{FPR} \]

Bayes' theorem can be written in a simplified way that facilitates calculation. This form is called the odds-ratio form of Bayes' theorem.

\[ \text{Posttest odds} = \text{pretest odds} \times \text{likelihood ratio} \]
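The probability form and the odds-ratio form of Bayes' theorem give the same answer, which a few lines of code can confirm. The sketch below uses the exercise-test figures from the text (TPR 0.70, FPR 0.15). The pretest probability of 0.50 and all function names are assumptions made for illustration.

```python
def likelihood_ratios(tpr, fpr):
    """Return (LR(+), LR(-)) from the true- and false-positive rates."""
    lr_pos = tpr / fpr                    # P(positive | D) / P(positive | no D)
    lr_neg = (1.0 - tpr) / (1.0 - fpr)    # FNR / TNR
    return lr_pos, lr_neg


def probability_to_odds(p):
    return p / (1.0 - p)


def odds_to_probability(odds):
    return odds / (1.0 + odds)


def posttest_via_odds(pretest, lr):
    """Odds-ratio form: posttest odds = pretest odds x likelihood ratio."""
    return odds_to_probability(probability_to_odds(pretest) * lr)


def posttest_if_positive(pretest, tpr, fpr):
    """Probability form of Bayes' theorem for a positive result."""
    return pretest * tpr / (pretest * tpr + (1.0 - pretest) * fpr)


def posttest_if_negative(pretest, tpr, fpr):
    """Probability form of Bayes' theorem for a negative result."""
    fnr, tnr = 1.0 - tpr, 1.0 - fpr
    return pretest * fnr / (pretest * fnr + (1.0 - pretest) * tnr)


TPR, FPR = 0.70, 0.15
PRETEST = 0.50  # hypothetical pretest probability

lr_pos, lr_neg = likelihood_ratios(TPR, FPR)
print(f"LR(+) = {lr_pos:.3f}, LR(-) = {lr_neg:.3f}")
print(f"positive: {posttest_via_odds(PRETEST, lr_pos):.4f} "
      f"vs {posttest_if_positive(PRETEST, TPR, FPR):.4f}")
print(f"negative: {posttest_via_odds(PRETEST, lr_neg):.4f} "
      f"vs {posttest_if_negative(PRETEST, TPR, FPR):.4f}")
```

The odds form is convenient because the likelihood ratio is computed once per test result and then reused for any pretest probability.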

Example: A clinician is planning to use an exercise test with a sensitivity (TPR) of 0.7 and a false-positive rate (FPR) of 0.15. Suppose the pretest probability of disease, p(D), is 0.30:

\[ \text{Probability of disease if test positive} = \frac{p(D) \times \text{TPR}}{p(D) \times \text{TPR} + [1 - p(D)] \times \text{FPR}} = \frac{0.30 \times 0.70}{0.30 \times 0.70 + 0.70 \times 0.15} = \frac{0.21}{0.21 + 0.105} = 0.667 \]

The pretest odds are 0.30/0.70 = 0.43 to 1.0. The likelihood ratio for the test is 0.70/0.15 = 4.667.

\[ \text{Posttest odds} = \text{pretest odds} \times \text{likelihood ratio} = 0.43 \times 4.667 = 2.0 \text{ to } 1.0 \]

Odds of 2.0 to 1.0 are equivalent to a probability of 0.667.

The importance of Bayes' theorem in interpreting a test is that it defines the relationship between pretest probability and posttest probability, which is shown in Figure 2.1. The relationship between these two entities has several implications.

The interpretation of a test result depends on the pretest probability of disease. If a result is positive, the posttest probability increases as the pretest probability increases (Figure 2.1a). If the result is negative, the posttest probability decreases as the pretest probability decreases (Figure 2.1b). The consequence of this relationship is that one cannot properly interpret the meaning of a test result without taking into account what was known about the patient before doing the test. This statement is inescapably true, because it is based on first principles of probability theory.

The effect of a test result depends on the pretest probability. The vertical distance between the 45-degree line in Figure 2.1 and the curve is the difference between the pretest and the posttest probability. When the clinician is already quite certain of the patient's true state, the probability of a disease is either very high or very low. When the pretest probability is very low, a negative test has little effect, and a positive test has a large effect. When the probability is very high, a negative test has a considerable effect, and a positive test has little effect. This example shows that a test result that confirms one's prior judgment has little effect on the probability of disease. Tests have large effects when the probability of disease is intermediate, which corresponds to clinical situations in which the physician is quite uncertain. Tests can also be useful when their result does not confirm the prior clinical impression, for example, a negative result in a patient who is thought very likely to have a disease.

FIGURE 2.1 Relationship between pretest probability and posttest probability of disease.

Figure 2.1a The posttest probability of disease corresponding to a positive test result was calculated with Bayes' theorem for all values of the pretest probability. The sensitivity and specificity of the hypothetical test were both assumed to be 0.90.

The pretest probability affects the probability that a positive or negative test result will occur. The higher the pretest probability, the more likely one is to experience a positive test. Conversely, a negative test is less likely as the pretest probability increases.

Figure 2.1b The posttest probability of disease corresponding to a negative test result was calculated with Bayes' theorem for all values of the pretest probability. The sensitivity and specificity of the test were both assumed to be 0.90.

The posttest probability depends on the sensitivity and the false-positive rate of the diagnostic test. This relationship is one reason to be concerned about measuring test performance accurately.

The Assumptions of Bayes' Theorem

Bayes' theorem is derived from first principles of probability theory. Therefore, when it is used correctly, the result is reliable. Errors in using Bayes' theorem can occur when people ignore several assumptions.

One assumption of Bayes' theorem is that sensitivity and specificity are constant, regardless of the pretest probability of disease. This assumption can be false. A test may be less sensitive in detecting a disease in an early stage, when the pretest probability is low, than it would be in an advanced stage, when there are many signs and symptoms and the pretest probability is high. This error may be avoided by dividing the study population into subgroups that differ in the extent of clinical evidence for disease (Weiner et al. 1979).

A second assumption is that the sensitivity and false-positive rate of a test are independent of the results of other tests. This conditional independence assumption is important when Bayes' theorem is used to calculate the probability of disease after a sequence of tests. The posttest probability after the first test in a sequence is used as the pretest probability for the second test. In an ideal study of two tests, both tests in the sequence and a definitive diagnostic procedure have been performed on many patients. The sensitivity and specificity of the second test in the sequence are calculated twice: in patients with a positive result on the first test and in patients with a negative result on the first test. If the sensitivity and specificity of the second test are the same, they are said to be conditionally independent of the results of the first test, and the conditional independence assumption is valid. In practice, the conditional independence assumption is seldom tested, and the clinician should be cautious about using recommendations for sequences of tests.

THE PRETEST PROBABILITY OF DISEASE

Why is the pretest probability of disease an important concept in understanding the assessment of diagnostic technology? The pretest probability is required to calculate the posttest probability of disease, and thus to interpret a diagnostic test; it is also the cornerstone of the decision whether to treat, to test, or to do nothing. A patient's pretest probability of disease encodes the individual's own clinical findings and is one of the ways in which a decision can be tailored to the patient.
Knowing how to estimate the pretest probability is an essential clinical skill and is described in the section that follows.

When is testing particularly useful? Patients are particularly unlikely to benefit from testing when the pretest probability is very high or very low. If the pretest probability is very high, the physician is likely to treat the patient unless a negative result raises doubts about the diagnosis. The posttest probability of disease after a negative test may be so high that treatment is still indicated.

Example: In a patient with typical angina pectoris, the posttest probability after a negative exercise test is 0.76. Most physicians would begin medical treatment for coronary artery disease even if the probability of disease were considerably less than 0.76. For these physicians, the decision to treat would not be affected by the normal exercise test result.

If the pretest probability is very low, as occurs in screening asymptomatic individuals, the clinician is likely to do nothing unless a positive test result raises concern. If, for example, the pretest probability is less than 0.001, the posttest probability may be less than 0.01. In this situation, a change in management is not indicated.

Figure 2.1 shows that the greatest benefit from testing is likely to occur when the pretest probability of disease is intermediate. This corresponds to a clinical situation in which there is uncertainty about the patient's true state. Patients are also likely to benefit from testing when the pretest probability is close to a treatment threshold probability. At this point, it requires only a small change in the probability of disease to cross the threshold and alter management.

Physicians customarily use their intuition to estimate the probability of disease. The two principal influences on probability estimates are personal experience and the published literature.

Using personal experience to estimate probability. To estimate probability, the physician should recall patients with characteristics similar to the patient in question, and then try to recall what proportion of these patients had disease. This cognitive task is forbiddingly difficult. In practice, the assignment of a probability to a clinical situation is largely guesswork.

There are several cognitive principles for estimating probability (Tversky and Kahneman 1974). These principles are called heuristics. A clinician is using the representativeness heuristic when he or she operates on the principle that "If the patient looks like a typical case, he probably has the disease."
Thus, if a patient has all the findings of Cushing's disease, he is thought very likely to have the disease itself. The representativeness heuristic can be misleading, because it leads the physician into ignoring the overall prevalence of a disease. It can also lead to error if the patient's findings are poor predictors of disease or if the physician overestimates probability when there are many redundant predictors. Additionally, the clinician's internal representation of the disease may be incorrect because it is based on a small, atypical personal experience.

Clinicians are using the availability heuristic when they judge the probability of an event by the ease with which similar events are remembered. This heuristic is usually misleading.

Individuals often adjust from an initial probability estimate (the anchor) to take account of unusual features of a patient. The anchoring and adjustment heuristic is an important principle. It is equivalent to someone planning a trip by public transportation; the first step is to identify the subway station that is closest to the destination. The person then walks through the neighborhood of the station to the final destination. Bayes' theorem is the best way to make adjustments from the initial anchor point.

Published experience. The reported prevalence of a disease in a clinical population is a useful starting point for estimating the probability of the disease. The physician can then modify this initial estimate to take into account the patient's clinical findings. Most published studies have two important shortcomings.

The first drawback is that these studies usually lack the data required to estimate probability. A typical description will report the prevalence of a clinical finding in patients with a disease, rather than the prevalence of various diseases in patients with a clinical symptom. The anchor point for estimating probability is the prevalence of a disease in patients with a particular clinical finding or diagnostic problem. Thus, a typical study will report the prevalence of weakness in patients with Cushing's disease, when what is needed is the prevalence of Cushing's disease in people complaining of weakness.

Published studies fall short in another way. They report the prevalence of a finding in patients with a disease, which is the sensitivity of the finding; they do not report its prevalence in patients who do not have the disease, which is the false-positive rate of the finding.
The most useful type of study also reports the prevalence of a finding in patients who were initially suspected of having the disease but were proven not to have it. One reason for these shortcomings is that studies are often done by specialists who report on patients referred to them with a disease. Studies should be done by primary care physicians who keep track of everyone in their practice with a particular clinical complaint, eventually identifying all patients as either having or not having a particular disease.

Clinical prediction rules. These are derived from systematic study of patients with a diagnostic problem, and they define how combinations of clinical findings may be used to estimate probability (Wasson et al. 1985). One well-known rule is designed to help a preoperative consultant estimate the probability that a person scheduled for surgery will have a cardiac complication during surgery (Goldman et al. 1977). The rule designates the clinical findings that are the best predictors of a complication and assigns a numerical weight to each. The clinician measures the "preoperative score" by taking the sum of the numerical weights of each finding. He or she then estimates the probability of a complication by noting the frequency of complications in prior studies of patients with similar scores.

FIGURE 2.2 Effect of test sensitivity and specificity on posttest probability.

Figure 2.2a As seen in the upper family of curves, the false-positive rate (denoted by FP) of a test is an important factor in determining the posttest probability after a positive test. The false-positive rate, however, has a very small effect on the posttest probability after a negative test result, as seen in the lower family of curves.

Figure 2.2b Test sensitivity (denoted by TP) has relatively little effect on the posttest probability after a positive test, as seen in the upper family of curves. It does affect the posttest probability after a negative test, however (lower family of curves), particularly when the pretest probability is high.

MEASURING THE PERFORMANCE CHARACTERISTICS OF DIAGNOSTIC TESTS

This section describes what many would regard as the central issue in the assessment of diagnostic tests: how to measure their performance characteristics. As shown in Figure 2.2, the sensitivity and specificity of a test determine its effect on the probability of disease and, therefore, how the test should be interpreted. Studies that measure the sensitivity and false-positive rate of a test are important, but they are difficult to perform. Many apply only to a narrow spectrum of patients, and studies of the same test in different institutions may lead to discrepant results.

Example: Computed tomography (CT) is often used to determine the extent of a newly discovered lung cancer and thus whether removing the cancer has any chance of curing the patient. As shown in Table 2.1, a survey of studies of CT in lung cancer patients showed wide variation in the results.

TABLE 2.1 True-Positive Rate and False-Positive Rate of Computed Tomography for Detecting Mediastinal Metastases from Lung Cancer

Study   True-Positive Rate   False-Positive Rate   Likelihood Ratio (+)   Likelihood Ratio (-)
1       .29                  .54                   .5                     1.5
2       .51                  .14                   3.6                    .6
3       .54                  .32                   1.7                    .7
4       .57                  .15                   3.8                    .5
5       .61                  .19                   3.2                    .5
6       .74                  .02                   37.0                   .3
7       .80                  .24                   3.3                    .3
8       .85                  .11                   7.7                    .2
9       .88                  .06                   14.7                   .1
10      .94                  .37                   2.5                    .1
11      .95                  .36                   2.6                    .08
12      .95                  .41                   2.3                    .08
13      .95                  .32                   3.0                    .07

SOURCE: Inouye and Sox 1986.

Figure 2.3 shows the consequences of this wide variation in measured test performance characteristics: the probability of mediastinal metastases if the CT scan is abnormal and if it is normal. The data used to calculate the posttest probability, for a pretest probability of 0.50, were taken from two of the studies in Table 2.1. Depending on which study is used, the interpretation of the test varies widely. In one case, one may interpret a test result as indicating that disease is present if the test is positive and absent if the test is negative. Using data from another study, one cannot conclude anything from a test result, because the probability of disease is changed very little by the test results. This example shows forcefully how much clinical decisions can depend on high-quality studies of test performance.

The discussion of the measurement of test performance characteristics starts with a description of some of the terms used in describing and interpreting studies of test performance. The design of a typical study is as follows. A series of patients undergo the test under study and a second test that is assumed to be a perfect indicator of the patient's true state. The results are displayed in Table 2.2.
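The spread among the CT studies in Table 2.1 can be made concrete by recomputing, for each study, the likelihood ratios and the posttest probabilities at the pretest probability of 0.50 used for Figure 2.3. This is a sketch under stated assumptions: the helper names are hypothetical, the false-positive rate for study 12 is read as .41 (the value consistent with that study's reported likelihood ratios), and no attempt is made to identify which two studies the chapter calls Study A and Study B.

```python
# (study, true-positive rate, false-positive rate) as listed in Table 2.1.
# Study 12's false-positive rate is read as 0.41, an assumption based on
# its reported likelihood ratios.
STUDIES = [
    (1, 0.29, 0.54), (2, 0.51, 0.14), (3, 0.54, 0.32), (4, 0.57, 0.15),
    (5, 0.61, 0.19), (6, 0.74, 0.02), (7, 0.80, 0.24), (8, 0.85, 0.11),
    (9, 0.88, 0.06), (10, 0.94, 0.37), (11, 0.95, 0.36), (12, 0.95, 0.41),
    (13, 0.95, 0.32),
]
PRETEST = 0.50  # pretest probability used for Figure 2.3


def summarize(tpr, fpr, pretest=PRETEST):
    """Return (LR(+), LR(-), posttest if abnormal, posttest if normal)."""
    lr_pos = tpr / fpr
    lr_neg = (1.0 - tpr) / (1.0 - fpr)
    pretest_odds = pretest / (1.0 - pretest)
    odds_pos = pretest_odds * lr_pos   # odds-ratio form of Bayes' theorem
    odds_neg = pretest_odds * lr_neg
    return lr_pos, lr_neg, odds_pos / (1.0 + odds_pos), odds_neg / (1.0 + odds_neg)


RESULTS = {study: summarize(tpr, fpr) for study, tpr, fpr in STUDIES}

print("study  LR(+)  LR(-)  P(mets | abnormal)  P(mets | normal)")
for study, (lr_pos, lr_neg, post_pos, post_neg) in RESULTS.items():
    print(f"{study:5d}  {lr_pos:5.1f}  {lr_neg:5.2f}  {post_pos:18.2f}  {post_neg:16.2f}")
```

With these inputs, an abnormal scan in study 6 gives a posttest probability of about 0.97, whereas an abnormal scan in study 1 gives about 0.35, which is below the pretest value of 0.50.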

FIGURE 2.3 Posttest probability of mediastinal metastases. The pretest probability is 0.5. Study A and Study B denote two studies of the performance of the CT scan in detecting mediastinal metastases. The posttest probability was calculated with Bayes' theorem, using the true-positive rate and false-positive rate from each of the two studies.

TABLE 2.2 Test Performance Measurement

Test Result   Disease Present   Disease Absent
Positive      A                 B
Negative      C                 D
Total         A + C             B + D

NOTE: True-positive rate (sensitivity) = A/(A + C); false-negative rate = C/(A + C); false-positive rate = B/(B + D); true-negative rate (specificity) = D/(B + D).

Designing Studies of Test Performance

The principal problem with most studies of the operating characteristics of tests is that the clinically relevant population differs from the study population (for definitions of unfamiliar terms, see Glossary of Terms). Selective referral may result in as few as 3 percent of the clinically relevant population who received the index test being referred for the gold-standard test (Philbrick et al. 1982). Those who design studies of test performance need to ask the following questions, which are based on the work of Philbrick and Feinstein (Philbrick et al. 1980).

Do the Patients in the Study Population Closely Resemble the Patients in the Clinically Relevant Population?

Early in the history of a test, the discrepancy between these two groups may be particularly striking. Often the nondiseased subjects are normal volunteers, for whom the false-positive rate of the test will be lower than expected in the clinically relevant population. Often the diseased patients are very sick indeed, because an early goal of study is to be sure the test can detect disease. If only the sickest patients are included, the true-positive rate will be higher than in the clinically relevant population.

Was an Abnormal Result on the Index Test a Criterion for Referring the Patient for the Gold-Standard Test?

Ideally, the answer is no, and the index test is obtained routinely on patients who have been referred to have the gold-standard test for other reasons. Referring physicians are much more apt to refer patients with an abnormal index test result and are unlikely to refer patients with a negative index test, because the latter is seen as presumptive evidence against the disease. When the index test is a referral criterion (workup bias), the true-positive rate and the false-positive rate will both be higher than would be expected in the clinically relevant population.
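The direction of workup bias can be seen with a small expected-value calculation. The sketch below assumes that referral for the gold-standard test depends only on the index test result, with the same referral fractions in diseased and nondiseased patients. The referral fractions (0.90 after a positive index test, 0.10 after a negative one) and the function name are invented for illustration, and the true rates are the exercise-test figures from earlier in the chapter.

```python
def observed_rates(tpr, fpr, refer_if_positive, refer_if_negative):
    """Rates measured among referred patients only.

    Assumes referral depends only on the index test result.
    """
    # Among diseased patients: referred positives / all referred.
    observed_tpr = (tpr * refer_if_positive) / (
        tpr * refer_if_positive + (1.0 - tpr) * refer_if_negative
    )
    # Among nondiseased patients: referred positives / all referred.
    observed_fpr = (fpr * refer_if_positive) / (
        fpr * refer_if_positive + (1.0 - fpr) * refer_if_negative
    )
    return observed_tpr, observed_fpr


TRUE_TPR, TRUE_FPR = 0.70, 0.15

# Hypothetical selective referral: 90% of positives and 10% of negatives referred.
biased_tpr, biased_fpr = observed_rates(TRUE_TPR, TRUE_FPR, 0.90, 0.10)
print(f"observed TPR: {biased_tpr:.2f} (true {TRUE_TPR:.2f})")
print(f"observed FPR: {biased_fpr:.2f} (true {TRUE_FPR:.2f})")

# When referral does not depend on the index test, the true rates are recovered.
unbiased_tpr, unbiased_fpr = observed_rates(TRUE_TPR, TRUE_FPR, 0.50, 0.50)
print(f"unselected referral: TPR {unbiased_tpr:.2f}, FPR {unbiased_fpr:.2f}")
```

Under these assumed referral fractions, both measured rates are inflated, as the text predicts: the study population is enriched for positive index tests in diseased and nondiseased patients alike.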
If the Index Test or the Gold-Standard Test Required Visual Interpretation, Was the Observer Blinded to All Other Information About the Patient?

When the observer's interpretation of one test is influenced by knowledge of the results of the other test, the concordance between the two results is likely to increase. Test-review bias refers to the situation in which the index test is interpreted by someone who knows the results of the gold-standard test. Diagnosis-review bias refers to the opposite situation, in which the gold-standard test is interpreted by someone who knows the results of the index test. Both of these biases increase the true-positive rate and reduce the false-positive rate.

Were the True-Positive Rate and False-Positive Rate of the Test Measured in Clinically Relevant Subgroups of Patients?

Most study populations contain a spectrum of patients, whose disease state varies in clinical severity and in anatomic extent. An average figure for true-positive rate and false-positive rate may conceal clinically important differences among subgroups. The true-positive rate may be higher, for example, in patients with extensive disease than in those with early or mild disease. The ideal study provides the true-positive rate and false-positive rate in clinically defined subgroups and in subgroups defined by anatomic extent of disease.

Was Interobserver Disagreement Measured?

Experts often disagree on the interpretation of images or tracings. Two clinicians can provide different answers to the same question. Which interpretation is to be believed? The study protocol should provide for independent interpretation of study data by two or more observers. Interobserver disagreement should be calculated.

Is the Gold-Standard Procedure an Accurate Measure of the True State of the Patient?

The sensitivity and false-positive rate should be measures of a test's ability to predict the patient's true state. In fact, they are measures of the index test's ability to predict the results of the gold-standard test. If the gold standard does not reflect the patient's true state perfectly, one will be unable to interpret the results of a test as a measure of disease.
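The question on interobserver disagreement above says that disagreement should be calculated but does not name a statistic. One common choice, offered here as an assumption rather than as the chapter's method, is Cohen's kappa, which compares the observed agreement between two readers with the agreement expected by chance. The readings below are invented for illustration.

```python
def agreement_and_kappa(reader_a, reader_b):
    """Observed agreement and Cohen's kappa for two readers.

    Each argument is a list of 0/1 interpretations (1 = abnormal) of the
    same cases, in the same order. Kappa is undefined when the agreement
    expected by chance equals 1 (both readers give every case the same
    single label); that case is not handled in this sketch.
    """
    n = len(reader_a)
    observed = sum(x == y for x, y in zip(reader_a, reader_b)) / n
    frac_a = sum(reader_a) / n   # fraction called abnormal by reader A
    frac_b = sum(reader_b) / n   # fraction called abnormal by reader B
    expected = frac_a * frac_b + (1 - frac_a) * (1 - frac_b)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical independent readings of ten tracings.
reader_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
reader_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]

observed, kappa = agreement_and_kappa(reader_a, reader_b)
print(round(observed, 2), round(kappa, 2))
```

Here the readers agree on 8 of 10 cases, but because each calls 60 percent of cases abnormal, about half of that agreement would be expected by chance, and kappa is about 0.58.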

Is the Study Population Described Carefully Enough to Allow Comparison to the Clinically Relevant Population?

The demographic and clinical characteristics of the study population must be presented in enough detail to permit a determination of the applicability of the findings to the patients in a particular clinical setting.

Choosing a definition of an abnormal result. Most studies of test performance define sensitivity and specificity in relation to a single cutoff value of a continuous variable. Much information may be lost when test results are defined as dichotomous variables, such as "positive" and "negative." Many test results are expressed as a continuous variable, such as the serum concentration of creatine phosphokinase. A very high serum concentration of creatine phosphokinase is much more indicative of myocardial infarction than a serum concentration that is just above the upper limit of normal. When sensitivity and specificity are known for each point on a continuous scale, the posttest probability can be calculated for any test result.

The relationship between the true-positive rate and the false-positive rate of a series of cutoff points may be expressed graphically. The graph is called a receiver operating characteristic (ROC) curve. The ROC curve was first used to express the performance of radar systems in distinguishing warplanes from other objects on the radar screen. Figure 2.4 shows a ROC curve for the exercise electrocardiogram. The ROC curve expresses graphically a basic rule: as you adjust the cutoff point to detect more diseased patients, you inevitably label more nondiseased patients as having disease.

Example: The ROC curve in Figure 2.4 shows that there are very few false-positive results when one chooses 2.5-mm ST-segment depression as the definition of an abnormal result. Nevertheless, few patients with coronary artery disease have such an extreme result on the exercise electrocardiogram, and there would be many false-negative results if this cutoff point were chosen. By instead choosing 1-mm ST-segment depression to define an abnormal result, one detects many more patients with disease, but there are far more false-positive results than when 2.5-mm ST-segment depression was chosen.
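The trade-off in this example can be reproduced with a short sketch that computes one ROC point per cutoff. The ST-segment values below are invented for illustration and are not the data behind Figure 2.4; the function name and the convention that a result at or above the cutoff counts as abnormal are also assumptions.

```python
def roc_points(diseased, nondiseased, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) tuples.

    A result is called abnormal when it is at or above the cutoff.
    Cutoffs are processed from the strictest (largest) to the most lenient.
    """
    points = []
    for cutoff in sorted(cutoffs, reverse=True):
        tpr = sum(value >= cutoff for value in diseased) / len(diseased)
        fpr = sum(value >= cutoff for value in nondiseased) / len(nondiseased)
        points.append((cutoff, fpr, tpr))
    return points


# Hypothetical ST-segment depression (mm) in eight patients per group.
diseased = [0.2, 0.8, 1.1, 1.4, 1.6, 2.1, 2.6, 3.0]
nondiseased = [0.0, 0.1, 0.3, 0.4, 0.6, 0.9, 1.2, 2.2]

points = roc_points(diseased, nondiseased, cutoffs=[0.5, 1.0, 1.5, 2.0, 2.5])
for cutoff, fpr, tpr in points:
    print(f"cutoff {cutoff} mm: FPR {fpr:.3f}, TPR {tpr:.3f}")
```

With these made-up values, the 2.5-mm cutoff produces no false-positive results but detects only a quarter of the diseased patients, whereas the 1-mm cutoff detects three quarters of them at the price of a false-positive rate of 0.25. Lowering the cutoff never lowers either rate.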

[Figure 2.4: ROC curve plotting true-positive rate against false-positive rate, with points labeled by the ST-segment depression cutoff in millimeters.]

FIGURE 2.4 A ROC curve for the exercise electrocardiogram as a predictor of significant coronary artery disease. The numbers represent the amount of ST-segment depression (measured in millimeters) that is used to define an abnormal exercise test.

How does one choose the optimum cutoff point on the ROC curve? The optimum point is determined by the pretest probability of disease (p[D]) and by the ratio of the costs of treating nondiseased patients as if they had disease (C) to the benefits of treating diseased patients (B) (Metz 1978).

\[ \text{Slope of ROC curve at the optimum operating point} = \frac{1 - p[D]}{p[D]} \times \frac{C}{B} \]

The slope of the ROC curve is relatively steep for points that are close to the origin, where both the true-positive rate and the false-positive rate are low. The clinician should choose a cutoff point near the origin when the disease is rare or the treatment is dangerous; this choice will serve to minimize both the number of false-positive results and the danger to nondiseased patients. The slope of the ROC curve is flat near the upper right-hand corner, where the true-positive rate is very high and false-negative results are uncommon. The clinician should choose a cutoff point in this area when the patient is very likely to have disease or when the treatment is safe and effective. This choice will minimize false-negative results in a situation where they would be very harmful.

EXPECTED-VALUE DECISIONMAKING

Expected-value decisionmaking is the central idea behind quantitative approaches to decisionmaking when the outcomes are uncertain. Physicians cannot be right all the time. Given our limited understanding of the biologic factors that underlie the response to treatment, some patients will always have done better if they had been treated differently. Since the physician cannot always make the right recommendation for an individual patient, he or she should choose a decisionmaking strategy that will maximize the number of good outcomes that are seen during a lifetime of making decisions. This strategy is called expected-value decisionmaking (see footnote 3). The decisionmaker chooses the option that has the largest benefit when averaged over all patients.

Expected-value decisionmaking is a straightforward concept. The problem lies in applying it to patient care. How does one calculate an average value for a management alternative? How does one place numerical values on the outcomes of an illness? These questions are best answered by example.

Example: Consider a treatment decision for a patient with chronic pancreatitis. The patient himself and the internists caring for him favored operating on the patient's pancreas. The surgeons were not enthusiastic, citing the high mortality of the operation. In making their case, the internists decided to calculate the expected value of medical therapy and surgery. They represented the choice between surgery and continued medical management by the decision tree shown in Figure 2.5. The first node (represented by a square) represents the decision to operate or not, and there are two branches, one for the surgery option and one for the medical treatment option.

Footnote 3: We use "value" as a general term for that which one tries to maximize in decisionmaking. Strictly speaking, one might speak of expected-outcome decisionmaking, in which the outcome could be life expectancy or a measure of preference for the outcome states.

[Figure 2.5: decision tree. Surgery branch: postoperative death (p = .05) or survival, followed by pain resolved or pain unchanged. Medical treatment branch: pain unchanged.]

FIGURE 2.5 A decision tree for deciding between surgery and medical management of chronic pancreatitis. The square represents a decision node. The circles represent chance nodes. The rectangles represent terminal nodes for the various outcome states, and the numbers enclosed within the rectangles are the products of the length of life and the quality of life in the outcome state.

Setting Up the Decision Tree

Surgery: The first node on the surgery branch is a chance node (represented by a circle), which represents uncertainty about whether the patient would survive the operation. The patient may survive or may die, but the true outcome of the operation is unknown and can only be represented by a probability. On average, the mortality rate of the operation is 5 percent, which seemed a reasonable representation for this patient, who was otherwise well. The next uncertainty was the outcome of treatment. Only about 60 percent of patients obtain relief of pain after surgery. The possible outcomes are represented by terminal nodes (shown as rectangles). Each outcome is assigned a quantitative measure, such as the life expectancy in that outcome state. This patient's life expectancy was 20 years.

Medical treatment: Because management associated with the medical option does not change, there are no chance nodes, and the patient's life expectancy is 20 years.
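Once each terminal node carries a value, a tree like this one can be averaged out mechanically. The sketch below is one way to do so and is not taken from the chapter. The probabilities (5 percent operative mortality, 60 percent pain relief) and the 20-year life expectancy come from the text above; the value of 12 healthy years for life with unchanged pain is the patient's own time trade-off judgment, which the chapter describes in the subsection that follows. The tree representation and function name are assumptions of the example.

```python
def expected_value(node):
    """Average out a decision-tree branch.

    A terminal node is a number (here, healthy years). A chance node is a
    list of (probability, child) pairs whose probabilities sum to 1.
    """
    if isinstance(node, (int, float)):
        return float(node)
    return sum(probability * expected_value(child)
               for probability, child in node)


# Outcome after surviving surgery: pain resolved (20 healthy years) or
# pain unchanged (20 years weighted to 12 healthy years by the patient).
after_surgery = [(0.60, 20.0), (0.40, 12.0)]

# Surgery branch: operative death (0 years) or survival.
surgery = [(0.05, 0.0), (0.95, after_surgery)]

# Medical treatment: pain unchanged, no chance nodes.
medical = 12.0

ev_if_survive = expected_value(after_surgery)
ev_surgery = expected_value(surgery)
ev_medical = expected_value(medical)
print(round(ev_if_survive, 2), round(ev_surgery, 2), ev_medical)
```

Averaging over the outcomes of surgery in a survivor gives 16.8 healthy years, the figure the chapter reports for surgery; multiplying through by the 95 percent chance of surviving the operation, as the full tree requires, gives about 16.0 years. Either figure exceeds the 12 healthy years expected with medical management, so the preferred option is the same.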

Weighing the Outcomes for Quality of Life

The patient's life expectancy was 20 years if he survived the operation, and it was thought to be the same regardless of whether he experienced chronic pain or was pain-free. The patient pointed out that 20 years of life with chronic pain was equivalent to 12 years of being pain-free. In other words, to be free of pain he was willing to give up eight years of life with chronic pain. This method for weighing the length of life in a certain state of health by a factor that represents the quality of life in that state is called the "time trade-off" method. It is described in standard textbooks (Weinstein et al. 1980, Sox et al. 1988).

Calculating the Expected Value of the Treatment Options

The average (or expected) outcome is calculated by taking the product of all the probabilities along a path to a terminal node and multiplying it by the value assigned to the terminal node. The management alternative with the highest expected value is usually the preferred choice. In this case, the expected length of life, measured in healthy years, was 16.8 years for surgery in a patient who survives the operation (about 16.0 years when the 5 percent operative mortality is included in the calculation) and 12 years for medical management. The surgeons were convinced by this analysis and scheduled the patient for surgery.

Note that expected-value decisionmaking allows one to balance the risks and benefits of treatment. These factors are usually considered intuitively. By assigning a value to each outcome and weighing it by the chance that it will occur, expected-value decisionmaking allows one to integrate risks and benefits.

THE CHOICE AMONG DOING NOTHING, TESTING, OR STARTING TREATMENT

The art of medicine is making good decisions with inadequate data. Physicians often must start treatment when still uncertain about whether the patient has the disease for which the treatment is intended. If treatment is started, there is a risk of causing harm to a person who does not have the disease, as well as the prospect of benefiting the person who does. If treatment is withheld, a person who is diseased will be denied a chance at a rapid, effective cure. This situation is often unavoidable, and the physician has three choices.

• Do nothing: the chance of disease is low, treatment is either harmful or ineffective (or both), and a false-positive result might occur and lead to harmful treatment for someone who does not have disease.
• Get more information: do a test or observe the patient's course in the hope that the correct choice will become apparent.
• Start treatment now: the chance of disease is relatively high, treatment is safe and effective, and a false-negative result might lead to withholding useful treatment from someone with disease.

The method for solving this problem analytically is called the threshold model of medical decisionmaking (Pauker and Kassirer 1975, 1980; Doubilet 1983). The threshold model is an example of expected-value decisionmaking that is applied to a particular type of decision. The key idea is the treatment threshold probability, which is the probability of disease at which one is indifferent between treating and not treating.

The basic principle of the threshold model is the following dictum: Do a test only if the probability of disease could change enough to cross the treatment threshold probability.

Three steps are required to translate this idea into action.

Step 1: Estimate the pretest probability of disease.

Step 2: Set the treatment threshold probability. This step is difficult because it requires the clinician to express the balance of the risks and benefits of treatment in a single number. One can use clinical intuition to set the treatment threshold probability. This task is made easier by the following relationship:

\[ \text{Treatment threshold} = \frac{C}{C + B} \]

where C is the cost of treating nondiseased patients, and B is the benefit of treating diseased patients (Pauker and Kassirer 1975). The cost and the benefit must be expressed in the same units, which can be dollars, life expectancy, or a measure of the patient's attitudes toward treatment and the disease.
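As a minimal sketch of this relationship, the function below computes the treatment threshold from a cost and a benefit expressed in the same units. The function name, the input checks, and the example numbers are assumptions made for illustration.

```python
def treatment_threshold(cost, benefit):
    """Treatment threshold probability, C / (C + B).

    cost:    net harm of treating a patient who does not have the disease
    benefit: net gain from treating a patient who does have the disease
    Both must be nonnegative and in the same units.
    """
    if cost < 0 or benefit < 0:
        raise ValueError("cost and benefit must be nonnegative")
    if cost + benefit == 0:
        raise ValueError("cost and benefit cannot both be zero")
    return cost / (cost + benefit)


# Hypothetical values in years of life expectancy.
equal_stakes = treatment_threshold(cost=1.0, benefit=1.0)
safe_treatment = treatment_threshold(cost=1.0, benefit=9.0)
risky_treatment = treatment_threshold(cost=3.0, benefit=1.0)
print(equal_stakes, safe_treatment, risky_treatment)
```

In this illustration a treatment whose benefit to diseased patients is nine times its cost to nondiseased patients has a threshold of 0.1, so treatment is favored even when disease is fairly unlikely, whereas a treatment whose cost is three times its benefit has a threshold of 0.75.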
Note that when the costs of treating nondiseased patients equal the benefits of treating diseased patients, the treatment threshold is 0.5. For many treatments, the benefit to diseased patients exceeds the cost of treating nondiseased patients, so the treatment threshold probability will be less than 0.50. For a safe, beneficial treatment, such as antibiotics for community-acquired pneumonia, the treatment threshold probability may be less than 0.10.

If there is good reason to suspect disease, the pretest probability will be above the treatment threshold. In deciding whether to perform a test, the clinician must ask whether the posttest probability after a negative test result would be below the treatment threshold probability. This question is answered by taking Step 3, described below.

One can also use analytic methods to set the treatment threshold probability (Sox et al. 1988). Consider the decision tree in Figure 2.6, which shows a hypothetical problem in which treatment must be chosen despite uncertainty about whether the patient has the disease for which the treatment is intended.

To use the decision tree to estimate the treatment threshold, recall that this threshold is the probability of disease at which one is indifferent between treating and not treating. First, one assigns values to each of the probabilities and outcome states except for the probability of disease. Second, one calculates the expected value of the two options, leaving the probability of disease as an unknown. Third, one sets the expression for the expected value of the treatment option equal to that of the nontreatment option. Fourth, one solves for the probability of disease.

To use a decision tree, one must assign a probability to each chance node and a numerical value to each outcome state. The latter value could be life expectancy. Alternatively, as shown in Figure 2.6, one could assign each outcome state a utility, which is a quantitative measure of relative preference. A utility of 1.0 is assigned to the best outcome, and a utility of 0.0 to the worst. The utility of each intermediate outcome state is then assessed on this scale of 0.0 to 1.0.
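The four steps can be sketched for the simplest possible tree, in which each option leads to one of two outcome states depending on whether disease is present. The four utilities below are hypothetical and are not the values of Figure 2.6; in a fuller tree, each would itself be the expected utility of a subtree (for example, one that includes operative death).

```python
def expected_utility(p_disease, u_if_disease, u_if_no_disease):
    """Step 2: expected utility of one option, with p_disease left free."""
    return p_disease * u_if_disease + (1 - p_disease) * u_if_no_disease


def threshold_from_utilities(u_treat_d, u_treat_nd, u_none_d, u_none_nd):
    """Steps 3 and 4: equate the two expected utilities and solve for p.

    p*u_treat_d + (1-p)*u_treat_nd = p*u_none_d + (1-p)*u_none_nd
    rearranges to p = C / (C + B), with
      B = u_treat_d - u_none_d    (benefit of treating the diseased)
      C = u_none_nd - u_treat_nd  (cost of treating the nondiseased)
    """
    benefit = u_treat_d - u_none_d
    cost = u_none_nd - u_treat_nd
    return cost / (cost + benefit)


# Step 1: hypothetical utilities on the 0.0 to 1.0 scale.
u_treat_d, u_treat_nd = 0.60, 0.95   # treated: disease / no disease
u_none_d, u_none_nd = 0.25, 1.00     # untreated: disease / no disease

p_star = threshold_from_utilities(u_treat_d, u_treat_nd, u_none_d, u_none_nd)
eu_treat = expected_utility(p_star, u_treat_d, u_treat_nd)
eu_none = expected_utility(p_star, u_none_d, u_none_nd)
print(round(p_star, 3), round(eu_treat, 5), round(eu_none, 5))
```

With these assumed utilities the threshold is 0.125, and at that probability of disease the two options have the same expected utility, which is what indifference means. The algebra in the comment also shows why solving the tree gives the same C/(C + B) form used for the treatment threshold.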
When utility is used as the measure of outcome, the alternative with the highest expected utility should be the preferred alternative.

Step 3: Use Bayes' theorem to calculate the posttest probability of disease. If the pretest probability is above the treatment threshold, one must calculate the probability of disease if the test is negative. If the pretest probability is below the treatment threshold, one must calculate the probability of disease if the test is positive. If the pretest probability is far enough above or below the treatment threshold, a test result will not affect management because the posttest probability will be on the same side of the treatment threshold as the pretest probability.

[Figure 2.6: decision tree with a SURGERY branch and a NO SURGERY branch; each branch leads to tumor or no tumor with probability p[tumor], and the surgery branch includes operative death (p = .03, U = 0). The remaining probabilities and utilities are not legible in this copy.]

FIGURE 2.6 A decision tree for choosing between treatment and no treatment when the clinician does not know whether the patient has the disease for which treatment is indicated.

[Figure 2.7: a series of probability scales showing, for increasing pretest probabilities, the posttest probability relative to the treatment threshold, with the regions labeled DON'T TEST, TOSS-UP, and TEST.]

FIGURE 2.7 Illustration of how to set the no treat-test threshold. As p0, the pretest probability, is gradually increased, the posttest probability is first below the treatment threshold, then equal to it, and finally above it. At the point where the posttest probability equals the treatment threshold, one should be indifferent between not treating and testing. The pretest probability at this point is the no treat-test threshold.

There is a pretest probability for which the posttest probability after a positive test result is exactly the treatment threshold probability; at this pretest probability one is indifferent between not treating and testing. Below this pretest probability, called the no treat-test threshold (Pauker and Kassirer 1980), a positive test result could not increase the probability of disease enough to cross the treatment threshold, and both testing and treatment should be withheld. Above this threshold, the posttest probability after a positive result will exceed the treatment threshold, and testing is indicated. These concepts are illustrated in Figure 2.7.
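Step 3 and the no treat-test threshold can be sketched as follows. The sketch makes a simplifying assumption that should be stated loudly: the test itself is treated as free of cost and risk, so the threshold follows from Bayes' theorem alone by setting the posttest probability after a positive result equal to the treatment threshold and solving for the pretest probability. The full model of Pauker and Kassirer also accounts for the cost of the test, which this example omits. Function names and numbers are hypothetical.

```python
def posttest_probability(pretest, tpr, fpr, positive=True):
    """Bayes' theorem for a positive or negative test result."""
    if positive:
        numerator = pretest * tpr
        denominator = numerator + (1 - pretest) * fpr
    else:
        numerator = pretest * (1 - tpr)
        denominator = numerator + (1 - pretest) * (1 - fpr)
    return numerator / denominator


def no_treat_test_threshold(threshold, tpr, fpr):
    """Pretest probability at which a positive result just reaches the
    treatment threshold (assumes a costless, riskless test)."""
    return threshold * fpr / (threshold * fpr + (1 - threshold) * tpr)


def result_could_change_management(pretest, threshold, tpr, fpr):
    """Step 3: could the relevant test result cross the treatment threshold?"""
    if pretest >= threshold:
        # Treatment is favored now; only a negative result could change that.
        return posttest_probability(pretest, tpr, fpr, positive=False) < threshold
    # Treatment is not favored now; only a positive result could change that.
    return posttest_probability(pretest, tpr, fpr, positive=True) > threshold


# Hypothetical test and treatment threshold.
tpr, fpr, threshold = 0.90, 0.10, 0.25

p_low = no_treat_test_threshold(threshold, tpr, fpr)
check = posttest_probability(p_low, tpr, fpr, positive=True)
worth_testing_at_50 = result_could_change_management(0.50, threshold, tpr, fpr)
worth_testing_at_02 = result_could_change_management(0.02, threshold, tpr, fpr)
print(round(p_low, 4), round(check, 4), worth_testing_at_50, worth_testing_at_02)
```

Under these assumptions the no treat-test threshold is about 0.036. At a pretest probability of 0.02 even a positive result leaves the probability of disease below 0.25, so testing cannot change management, while at a pretest probability of 0.50 a negative result lowers the probability to 0.10 and testing is worthwhile.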

[Figure 2.8: probability-of-disease scale from 0 to 1.0 divided into three zones (do nothing, test, and treatment), with the treatment threshold and the pretest p[D] marked.]

FIGURE 2.8 Using the treatment threshold probability to help decide whether to do a test.

One can use the same approach to calculate the point at which one should be indifferent between testing and treating (the test-treatment threshold). Both of these thresholds are a function of the true-positive rate of the test, the false-positive rate of the test, the treatment threshold, and a measure of what experiencing the test means to the patient (the cost of the test). Figure 2.8 shows the three zones of the probability scale. Using Figure 2.8, one needs only to estimate the pretest probability to know whether testing is the preferred action or whether one should treat or do nothing.

SUMMARY

The purpose of this chapter is to provide a working knowledge of how probability theory and expected-value decisionmaking are used to help make decisions about diagnostic testing. Past studies of diagnostic tests have measured only test performance. A complete evaluation should provide information about the treatment threshold and how to estimate the pretest probability of disease. With this information, the clinician can decide when a test will alter management and can use the results to choose the action that will most benefit the patient.

GLOSSARY OF TERMS

Bayes' theorem: an algebraic expression for calculating the posttest probability of disease if the pretest probability of disease [p(D)] and the sensitivity and specificity of a test are known.

Clinically relevant population: the patients on whom a test is normally used.

Cost-effectiveness analysis: comparison of clinical policies in terms of their cost for a unit of outcome.

Marginal cost-effectiveness: the increase in cost of a policy for a unit increase in outcome.

False-negative rate: the likelihood of a negative test result in a diseased patient (abbreviated FNR).

False-negative result: a negative result in a patient with a disease.

False-positive rate: the likelihood of a positive test result in a patient without a disease (abbreviated FPR).

False-positive result: a positive result in a person who does not have the disease.

Gold-standard test: the test or procedure that is used to define the true state of the patient.

Index test: the test for which performance is being measured.

Likelihood ratio: a measure of discrimination by a test result. A test result with a likelihood ratio >1.0 raises the probability of disease and is often referred to as a "positive" test result. A test result with a likelihood ratio <1.0 lowers the probability of disease and is often called a "negative" test result.

\[ \text{Likelihood ratio} = \frac{\text{probability of result in diseased persons}}{\text{probability of result in nondiseased persons}} \]

Negative test result: a test result that occurs more frequently in patients who do not have a disease than in patients who do have the disease.

Odds: the probability that an event will occur divided by the probability that it will not occur.

\[ \text{Odds} = \frac{\text{probability of event}}{1 - \text{probability of event}} \]

Positive test result: a test result that occurs more frequently in patients with a disease than in patients who do not have the disease.

Posttest probability: the probability of disease after the results of a test have been learned (synonyms: posterior probability, posttest risk).

Predictive value negative: probability of the absence of the disease if a test is negative.

Predictive value positive: probability of a disease if a test is positive.

Pretest probability: the probability of disease before doing a test (synonyms: prior probability, pretest risk).

Probability: an expression of opinion, on a scale of 0.0 to 1.0, about the likelihood that an event will occur.

Sensitivity: the likelihood of a positive test result in a diseased person (synonym: true-positive rate; abbreviated TPR).

\[ \text{Sensitivity} = \frac{\text{number of diseased patients with positive test}}{\text{number of diseased patients}} \]

Specificity: the likelihood of a negative test result in a patient without disease (synonym: true-negative rate; abbreviated TNR).

\[ \text{Specificity} = \frac{\text{number of nondiseased patients with negative test}}{\text{number of nondiseased patients}} \]

Study population: the patients for whom test performance is measured (usually a subset of the clinically relevant population).

Treatment threshold probability: the probability of disease at which the clinician is indifferent between withholding treatment and giving treatment. Below the threshold probability, treatment is withheld; above the threshold, treatment is given.

True-negative result: a negative test result in a person who does not have the disease.

True-positive result: a positive test result in a person with a disease.

REFERENCES

Doubilet, P. A mathematical approach to interpretation and selection of diagnostic tests. Medical Decision Making 3:177-195, 1983.

Goldman, L., Caldera, D.L., Nussbaum, S., et al. Multifactorial index of cardiac risk in non-cardiac surgical procedures. New England Journal of Medicine 297:845-850, 1977.

Griner, P.F., Mayewski, R.J., Mushlin, A.I., and Greenland, P. Selection and interpretation of diagnostic tests and procedures: Principles and applications. Annals of Internal Medicine 94(part 2):553-560, 1981.

Inouye, S.K., and Sox, H.C. Standard and computed tomography in the evaluation of neoplasms of the chest. Annals of Internal Medicine 105:906-924, 1986.

Metz, C.E. Basic principles of ROC analysis. Seminars in Nuclear Medicine 8:283-298, 1978.

Pauker, S.G., and Kassirer, J.P. The threshold approach to clinical decision making. New England Journal of Medicine 302:1109-1117, 1980.

Pauker, S.G., and Kassirer, J.P. Therapeutic decision making: A cost-benefit analysis. New England Journal of Medicine 293:229-234, 1975.

Philbrick, J.T., Horwitz, R.I., Feinstein, A.R., Langou, R.A., and Chandler, J.P. The limited spectrum of patients studied in exercise test research: Analyzing the tip of the iceberg. Journal of the American Medical Association 248:2467-2470, 1982.

Philbrick, J.T., Horwitz, R.I., and Feinstein, A.R. Methodology problems of exercise testing for coronary artery disease: Groups, analysis, and bias. American Journal of Cardiology 46:807-812, 1980.

Ransohoff, D.F., and Feinstein, A.R. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. New England Journal of Medicine 299:926-930, 1978.

Sox, H.C., Blatt, M.A., Higgins, M.C., and Marton, K.I. Medical Decision Making. Boston, Butterworths, 1988.

Sox, H.C. Probability theory in the use of diagnostic tests: An introduction to critical study of the literature. Annals of Internal Medicine 104:60-66, 1986.

Stason, W.B., and Fineberg, H.V. Implications of alternative strategies to diagnose coronary artery disease. Circulation 66(Suppl. III):80-86, 1982.

Tversky, A., and Kahneman, D. Judgment under uncertainty: Heuristics and biases. Science 185:1124-1131, 1974.

Wasson, J.H., Sox, H.C., Neff, R.K., and Goldman, L. Clinical prediction rules: Applications and methodologic standards. New England Journal of Medicine 313:793-799, 1985.

Weiner, D.A., Ryan, T.J., McCabe, C.H., et al. Exercise stress testing: Correlation among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). New England Journal of Medicine 301:230-235, 1979.

Weinstein, M.C., Fineberg, H.V., Elstein, A.S., Frazier, H.S., Neuhauser, D., Neutra, R.R., and McNeil, B.J. Clinical Decision Analysis. Philadelphia, W.B. Saunders, 1980.

Weinstein, M.C., and Stason, W.B. Foundations of cost-effectiveness analysis for health and medical practices. New England Journal of Medicine 296:716-721, 1977.


Assessment of Diagnostic Technology in Health Care: Rationale, Methods, Problems, and Directions (1989)

Chapter: 2. The Use of Diagnostic Tests: A Probabilistic Approach
