Appendix A
INTERPRETATION OF DIAGNOSTIC RESULTS

Establishing and Comparing Test Performance

Properly establishing and then understanding the performance of diagnostic tests is crucial to the design and operation of infection control and eradication campaigns incorporating these tests. Most diagnostic tests are imperfect, so a test result often cannot be interpreted with certainty, and properly establishing test performance is technically demanding and expensive. This is particularly true for many Johne’s disease (JD) tests, a difficulty that has undoubtedly impeded JD control and eradication efforts.

A test result is often classified into a category, such as positive or negative. For bacterial culture, a positive test may be the appearance, within the appropriate timeframe, of colonies on the medium with a morphology consistent with that of the bacterium of interest. A negative test would be the failure of any colonies with the appropriate morphology to appear by the time incubation is ended. For a serologic test such as Enzyme-Linked
Immunosorbent Assay (ELISA), a positive test result might be the appearance of a particular color marker of sufficient intensity (optical density) that it is above a cutoff value. In this example, a negative test result would be a color intensity below the cutoff value. Further, because of the many variables affecting the test mechanics, the actual cutoff value for each run usually has to be calculated from control results within that test run. The mechanics of many of these tests are sufficiently complex and resource intensive that their performance is often laboratory- and operator-dependent, making protocol standardization difficult at best.

When test outcomes are classified as positive or negative, present or absent, the result can be wrong in two ways. A positive result can be a true positive (correct) or a false positive (wrong). A false positive occurs when the condition being tested for is actually not present in the animal, but the test indicates that it is. For example, in the case of fecal bacterial culture, a false positive could occur when organisms in contaminated feed or water are consumed by an animal and then pass through the animal rather than infect it (e.g., Sweeney et al., 1992a). Alternatively, false positive cultures could result from accidental laboratory contamination or other error. In the case of a serologic test such as ELISA, a false positive result could occur because the animal responded to an antigen in its environment that is immunologically similar to the target antigen from Map that is used in the test.

A negative test result can be a true negative or a false negative. A false negative test means that the condition is present in the animal but the test indicates that it is not. In the case of fecal culture, a false negative could occur because the number of organisms in the fecal specimen is too low to be detected, even though the animal is infected (e.g., Whitlock et al., 2000b). In the case of serologic tests, the test could be a false negative because an infected animal has not mounted the particular immune response that the test detects, or has mounted a weak immune response that is below the threshold of detection as a positive result.

The measure that is most useful when interpreting an uncertain test result is the predictive value, which is the likelihood that a test result is correct. The positive predictive value (PPV) is defined as the likelihood that a positive test result is a true positive, and the negative predictive value (NPV) is defined as the likelihood that a negative test result is a true negative (Last, 1995). These predictive values depend on the prevalence of the condition in the population being tested, meaning that both PPV and NPV differ when the test is used in an uninfected group compared to when it is used in an infected group with a high prevalence. Intuitively, a positive test is more likely to be a false positive in a herd with no history of animals diagnosed with the infection. Similarly, a negative test is more likely to be a false negative in a herd with a history of many animals with confirmed disease. As a consequence, predictive values are not useful for comparing test performance across groups of animals with significantly different infection prevalences.
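
The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below assumes a test whose probability of a positive result in an infected animal (sensitivity) and probability of a negative result in an uninfected animal (specificity) are fixed, and applies Bayes' theorem at several prevalences. The function name and the numeric values (sensitivity 0.50, specificity 0.98, and the three prevalences) are hypothetical and chosen only for illustration; they are not estimates for any actual JD test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given true prevalence.

    All arguments are proportions between 0 and 1.
    """
    # Expected fractions of the tested group in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # PPV: share of positive results that are true positives.
    positives = true_pos + false_pos
    ppv = true_pos / positives if positives > 0 else float("nan")

    # NPV: share of negative results that are true negatives.
    negatives = true_neg + false_neg
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test characteristics, for illustration only.
    for prevalence in (0.01, 0.10, 0.40):
        ppv, npv = predictive_values(0.50, 0.98, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the test characteristics held constant, PPV rises and NPV falls as prevalence increases, which is the pattern described above for herds with and without a history of infection.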

Because of the dependence of predictive values on the actual infection prevalence (as opposed to the apparent test prevalence) in the group being tested, they are generally not useful as a basis for comparing test performance even though they are often reported. Instead, tests are most usefully compared on the basis of their respective epidemiologic sensitivity and specificity values. Sensitivity is defined as the probability that an animal with the infection will test positive (true positive), and specificity is defined as the probability that an animal without the infection will test negative (true negative) (Last, 1995).

It is important to note that test sensitivity depends on the spectrum of disease in the group of individuals in which the test is being used. Because the disease process is more advanced in an individual with advanced clinical disease, a test will usually be more sensitive in that animal than in an individual that was recently infected and in which pathological changes are less advanced. Similarly, a test will work better in a herd with long-established disease than in a herd with recently introduced disease, since the long-infected herd will have both a higher prevalence and more individuals with advanced disease. In addition to changing prevalence, prior testing with removal based on the test results can markedly change the spectrum of disease, and thus test sensitivity and specificity, in that group.

Bayes' theorem relates predictive values and prevalence to sensitivity and specificity (Last, 1995). Understanding these relationships is critical to making appropriate testing decisions, both in deciding which tests to employ and in deciding how to interpret positive and negative test results.

Note that although the concepts are related, epidemiologic sensitivity and specificity are different from analytic sensitivity and specificity.
Analytic sensitivity is the ability of the test procedure to detect an analyte that is the result of the target infection, such as an antigen or an antibody. Analytic specificity is the ability of the procedure to discriminate against other analytes that are not the result of the target infection but are closely related, such as cross-reacting antigens or their antibodies. For example, the analytic sensitivity of conventional culture procedures is the probability of detecting a given number of organisms per gram of a particular type of sample. The probability of detection increases with the number of organisms per gram. The type of diagnostic sample, such as feces versus lymph node tissue, has a major impact on analytic sensitivity because feces have many more competing organisms and contaminants than do lymph node tissues. Because the number of organisms in feces is quite variable and may be low enough to have a very low probability of detection in a given fecal sample, even in clinical animals, the epidemiologic sensitivity of fecal culture in clinical animals is less than 100 percent. Epidemiologic sensitivity will be even lower in subclinically infected animals because of their lower level of shedding.

Establishing epidemiologic sensitivity and specificity values properly is an expensive process. First, a sufficient number of animals in two groups, the infected and the infection-free, must be studied to provide reasonable confidence intervals on the estimates. Second, the disease spectrum of the study animals must be representative of the disease spectrum in the animals to which the test will be applied. For establishing sensitivity, the disease spectrum for JD ranges
from the clinically affected animals, which are usually older, to animals that are silently infected with little pathological change or immunological response, which tend to be younger. Typically, there are far more of the latter than the former. If the group of study animals is biased toward those that are more severely affected, the estimate of test sensitivity will be biased upward. For establishing specificity, the infection-free animals must have been exposed to the cross-reacting or competing conditions at the same frequency as the groups in which the test will be used. For example, for immunologically based tests such exposures may include related environmental mycobacteria or other bacteria with cross-reacting antigens. One of the difficulties of establishing test specificity is that the exposures that lower specificity are likely to differ among geographic regions and livestock species.

Third, the appropriate gold standard, a definitive reference testing procedure with very high sensitivity and specificity, must be used to establish the infection status of the study animals. At present, the gold standard is necropsy followed by extensive culture and histological examination of multiple sections of lower small intestine and associated lymph nodes to reliably establish the infection status of each study animal. Based on the assumption that acquisition of infection as an adult is rare, an alternative approach is to follow previously tested animals to slaughter, allowing the natural history of the disease to progress to a more advanced and thus easier to detect state. Often, rather than such an intensive, expensive investigation, other strategies are used, such as combinations of antemortem tests. The problem with this compromise is that it changes the assessment of test performance from an absolute measure to a relative measure of unknown bias. Given that the spectrum of disease is weighted toward those subclinical animals that are difficult to detect, this approach likely biases estimates of test performance upward. As a result, estimates of conventional test performance have moved downward as technology has improved.

It is also important to note that unless the gold standard test is applied, the resulting prevalence estimate is an apparent prevalence. Deriving a valid estimate of true prevalence from an apparent prevalence requires that both the disease spectrum and competing condition exposure be the same in the tested group as in the group from which the estimates of test performance were derived. At best, these are tenuous assumptions.
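
One commonly used way to make this adjustment is the Rogan-Gladen estimator, which is not named in the text above and is shown here only as an illustration. The sketch assumes that the sensitivity and specificity supplied are valid for the tested group, which is exactly the tenuous assumption just described. The function name and the example values are hypothetical.

```python
def true_prevalence(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from apparent prevalence.

    All arguments are proportions between 0 and 1. The estimate is only as
    good as the assumption that sensitivity and specificity apply to the
    group that was tested.
    """
    # The adjustment is undefined unless the test is better than chance.
    discrimination = sensitivity + specificity - 1.0
    if discrimination <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")

    estimate = (apparent_prevalence + specificity - 1.0) / discrimination

    # Sampling error can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical example: 6.8 percent of animals test positive with a test
    # assumed to have sensitivity 0.50 and specificity 0.98.
    print(f"{true_prevalence(0.068, 0.50, 0.98):.3f}")
```

If the tested group contains a larger share of early, subclinical infections than the group used to validate the test, the assumed sensitivity is too high and the adjusted figure will still underestimate true prevalence.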