APPENDIX C

PITFALLS IN DESIGN, ANALYSIS, AND INTERPRETATION

This appendix provides detail on some of the basic issues involved in the design, analysis, and interpretation of contemporary epidemiologic studies on air pollution.

BIAS, CONFOUNDING, AND CHANCE

The three most formidable influences on study validity are bias, confounding, and chance.

BIAS

Bias is the consequence of any technique used in a study that produces results that tend systematically to be on one side or the other of "true" values. It is sometimes referred to as systematic or nonrandom error. Random error, in contrast, falls equally on both sides of true values. Bias can arise during any phase of study activity--design, data collection, analysis, or interpretation. Texts and review articles describe the many types of bias that epidemiologists have identified.12 Chapter 2 and Chapter 3 provide the basis for an understanding of biases that arise from the improper assessment of either exposure or effect. An instrument used for measurement, such as a questionnaire or a sampling device, obviously can produce a systematic error. In air pollution research, far more serious biases are likely to result from improper understanding (or "modeling") of exposure or effect, which can also produce estimates for individual subjects that are consistently too high or too
low. For instance, an acute effect might be underestimated if the influence of concurrent respiratory infection is ignored or the appropriate lag time between exposure and response is miscalculated. Ozone exposure might be systematically overestimated for a person who spends all day in an air-conditioned building. The major practical impact of bias is the misclassification of study subjects with respect to the extent of exposure, the effect, or both. Whether biases in the data lead to overestimates or underestimates of risk depends on how these classification errors are distributed among groups that are assumed to have various exposures. If there is a tendency to assign greater values to effects in subjects with greater exposure, misclassification can increase the likelihood of a false-positive result, or overestimation of risk. If there is a tendency to assign lower values to effects in subjects with greater exposure, misclassification can increase the likelihood of a false-negative result. Finally, if the degree and direction of the bias in measurement are indifferent to the presumed magnitude of exposure, subjects will be randomly misclassified. In the latter case, what is a nonrandom error on the level of individual measurement behaves like a random error at the level of population measurement. The net effect will be a bias toward negative studies or underestimation of risk, because the real contrasts between exposure groups in the study will be diluted, black and white will become gray, and the magnitude of the association between exposure and effect will be underestimated--or the association might disappear altogether. A true relative risk of 12 (risk of disease in exposed group is 12 times risk in nonexposed or less-exposed group) might be estimated as a relative risk of 2. In air pollution studies, where true relative risks tend to be rather low, a relative risk of 2 could disappear.
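The dilution described above can be sketched with simple arithmetic. In this hypothetical calculation (all counts, risks, and classification accuracies are invented for illustration), exposure is misclassified with the same sensitivity and specificity among cases and noncases--the nondifferential case discussed in the text--and the observed relative risk collapses toward 1:

```python
# Sketch: nondifferential exposure misclassification diluting a true
# relative risk of 12 toward the null. All numbers are hypothetical.

def observed_rr(n_exp, n_unexp, risk_exp, risk_unexp, sens, spec):
    """Relative risk observed when exposure status is classified with the
    given sensitivity and specificity, identically for cases and noncases
    (nondifferential misclassification)."""
    cases_exp = n_exp * risk_exp
    cases_unexp = n_unexp * risk_unexp
    # Subjects (and their cases) shuffled into *apparent* exposure groups.
    app_exp_n = sens * n_exp + (1 - spec) * n_unexp
    app_exp_cases = sens * cases_exp + (1 - spec) * cases_unexp
    app_unexp_n = (1 - sens) * n_exp + spec * n_unexp
    app_unexp_cases = (1 - sens) * cases_exp + spec * cases_unexp
    return (app_exp_cases / app_exp_n) / (app_unexp_cases / app_unexp_n)

# True relative risk: 0.12 / 0.01 = 12.
rr_poor = observed_rr(1000, 1000, 0.12, 0.01, sens=0.6, spec=0.6)
rr_fair = observed_rr(1000, 1000, 0.12, 0.01, sens=0.8, spec=0.8)
print(round(rr_poor, 2), round(rr_fair, 2))  # roughly 1.41 and 3.06
```

With 60% classification accuracy the true twelvefold risk appears as about 1.4, small enough to be dismissed as noise; even 80% accuracy leaves an estimate of about 3.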
It is clearly preferable to eliminate or reduce bias during the design phase, but, with proper planning, it can sometimes be estimated (even after data collection), and its potential distortion of study results assessed. The techniques for measuring some types of bias include focused substudies (e.g., comparison of telephone interviews with a subsample of personal interviews), use of multiple control groups (e.g., comparison of lung-cancer cases with both other-cancer and noncancer
controls), and comparison of baseline data on subjects lost to followup with those on persons followed up.

CONFOUNDING

Confounding variables are factors that vary with exposure and are causally related to the outcome of interest. They can vary directly or inversely with exposure, and the causal relation can be positive or negative. The net impact of a confounding variable on a study's result is determined by the direction and magnitude of its associations with exposure and effect. Cigarette smoking, for example, is usually positively correlated with exposure to ambient air pollution (heavier smokers tend to live in more polluted areas) and with many forms of respiratory illness.9 Failure to control for the effects of cigarette smoking will tend to produce a false or spurious association between air pollution and illness. Other uncontrolled confounders can interfere with the detection of a true association. Confounding can ultimately be viewed as a form of bias in which, by definition, errors are unevenly distributed among exposure groups. In a specific study, this bias might cause either a false-positive or false-negative result. Epidemiologists use three basic techniques to control for the impact of confounding variables or unevenly distributed biases. First, study populations can be selected so as to reduce or avoid these factors. For example, a study of lung cancer in nonsmokers or a study of lung function in young children would completely avoid confounding by cigarette smoking; a study conducted in a nondeveloped country might avoid biases due to air conditioning. Second, matching or stratification can be used during the design phase to make the prevalence of a confounding variable similar among groups to be compared. Third, when data have been collected, subjects can be analyzed in groups stratified by particular characteristics, such as age or smoking, or the effect of confounding can be adjusted for with multivariate analysis.
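The third technique, stratified analysis, can be illustrated with a hypothetical data set constructed so that smoking is associated with both exposure and disease. The counts below are invented for the sketch; the point is that the crude (pooled) risk ratio suggests an effect of exposure, while the stratum-specific ratios show there is none:

```python
# Sketch: stratification on a confounder (smoking). Counts are hypothetical
# and built so that smokers are both more exposed and more diseased.

def risk_ratio(cases_e, n_e, cases_u, n_u):
    """Risk in exposed group divided by risk in unexposed group."""
    return (cases_e / n_e) / (cases_u / n_u)

# Each stratum: (cases_exposed, n_exposed, cases_unexposed, n_unexposed)
smokers = (80, 800, 20, 200)     # disease risk 0.10 in both exposure groups
nonsmokers = (4, 200, 16, 800)   # disease risk 0.02 in both exposure groups

# Crude analysis pools the strata and finds a spurious association.
crude = risk_ratio(80 + 4, 800 + 200, 20 + 16, 200 + 800)
print(round(crude, 2))           # about 2.33, although exposure does nothing
print(risk_ratio(*smokers))      # 1.0 within smokers
print(risk_ratio(*nonsmokers))   # 1.0 within nonsmokers
```

The crude ratio of about 2.3 is entirely an artifact of the uneven distribution of smokers across exposure groups; within each stratum the ratio is exactly 1.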
CHANCE

Epidemiologists must be sensitive to the role of chance in influencing the results of a study. Chance operates in several fundamentally different ways, of which the most important are sampling error (choice of subjects), random error in measuring exposure or effects, and random error of inference regarding overall study results. Some random error in measuring exposure, effect, or a confounding variable is unavoidable, even with the most precise instruments. To minimize it, we can increase the precision of the instrument or take more samples so that the average measurement more closely approaches the truth. Random error does not constitute a dominant problem in air pollution epidemiology, compared with the problems presented by nonrandom errors. There is always some probability that the results of a given study are due substantially to chance (sampling variability). The field of statistics has developed a set of techniques for estimating and expressing this probability. First, if the result of a study is negative, the probability that this result is due to chance alone is assessed by calculating the study's statistical power. The higher the statistical power, the smaller the likelihood of a false-negative result--i.e., the more likely that the lack of an effect is genuine. Power is a function of sample size, magnitude of the association sought, and prevalence of exposure in the study populations. It is important to note that statistical power, as conventionally thought of, is not affected by nonrandom error, so a very "powerful" study could be insensitive because of nonrandom errors in the data. Power can be calculated after a study to assess the validity of a negative result. But it must be considered during study design. The values of parameters of a power calculation can be estimated during study planning, and the minimal sample needed to detect an effect of a given size with acceptable certainty can be determined.
The calculations must take into account the need for adequate numbers of individuals within critical groups in the population; the numbers must be large enough to allow
stable rates and average values for each group, whether defined by race, sex, day of testing, or extent of exposure. In some instances, the sample size will be fixed in advance or have an upper limit, because of the established number of persons in an exposed cohort or of cases available for study. By definition, exposure to ubiquitous air pollutants is widespread, and the size of the exposed population is rarely a major constraint. Matters of power and sample size should be explicitly addressed during the design phase, both to make it possible to terminate planning of work that has little likelihood of success and to ensure that negative findings, if they occur, will be properly interpreted. Some negative findings, of course, are of great value and should be pursued with vigor (and large samples). The likelihood that a positive study result could be due to chance alone is usually estimated by calculation of the statistical significance, or the p (probability) value. The p value reflects the effects of both random error and nonrandom error (bias), but statistical theory (as well as tabulations of critical cutoff values) assumes that only random error is present. Consequently, calculation of p values explores only a part of the real likelihood that a particular study result is due to chance. The wider use of confidence intervals instead of p values in reporting the results of air pollution research would add useful information about the size, direction, and certainty of any departure from the null hypothesis. Several reviews of the generic problems involved in using p values are available. It suffices here to say that future air pollution studies will benefit from the recognition that statistical significance is far easier to calculate than it is to interpret.
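The design-phase power reasoning above can be sketched with the standard normal approximation for comparing two proportions. The baseline risks and the target relative risk below are illustrative assumptions, not values from any particular study:

```python
# Sketch: minimal per-group sample size to detect a given risk difference
# with 80% power at two-sided alpha = 0.05, using the normal approximation
# for two proportions. 1.96 and 0.84 are the usual standard-normal critical
# values for alpha = 0.05 (two-sided) and power = 0.80.
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    pbar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * pbar * (1 - pbar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Detecting a relative risk of 2 at a 1% baseline risk (p1=0.02, p2=0.01)
print(n_per_group(0.02, 0.01))   # thousands of subjects per group
# The same relative risk at a 10% baseline needs far fewer subjects.
print(n_per_group(0.20, 0.10))
```

The contrast between the two answers shows why rare outcomes, common in air pollution work, drive sample sizes up so sharply.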
GROUP DATA VERSUS INDIVIDUAL DATA

A critical part of study design involves decisions as to whether individual exposures and individual outcomes will be estimated or will be available only in aggregated form (e.g., by family, by city block, or by community). Aggregation, or grouping of data on individuals, generally reduces the sensitivity of a study design. When a group of individuals has been assigned to a given exposure or effect category, important variations among them are
obscured and information is lost. Lack of awareness of the individual variation can lead to misclassification; for instance, some members of a group with small average exposure might actually have had moderate or large individual exposures. In air pollution studies, this problem occurs typically when exposure data from central stations, rather than individual monitors, are used. As pointed out in Chapter 3, individual or personal monitoring is not feasible or necessary in every study. Some studies, designed to answer very important questions, require large numbers of people. Cohort studies designed to provide data on the effect of ambient air pollution on changes in pulmonary function over time are an example. Even larger populations must be studied if the health outcome variable is a morbidity event. Such events as hospital admissions for asthma, although common in a medical context, are relatively rare among the population as a whole. Therefore, one has to observe a very large population if one is to detect an impact of air pollution on the number of these events within a reasonable period. The Ontario Air Pollution Study, for example, detected a daily excess of 22 asthma admissions due to air pollution among nearly 6 million exposed and observed people.1 The large population studied provided the needed statistical power, but also dictated the use of aggregate, rather than individual, exposure data. This tradeoff between cohort size and precision of information is unavoidable. Both small-cohort and large-cohort studies are needed, because they provide answers to fundamentally different research questions. If personal monitors are available and sufficiently accurate and if the cost of their maintenance does not require reductions in sample size that outweigh gains in statistical precision, they can be quite valuable even in large-cohort studies.
Personal monitoring can verify and "anchor" the aggregated measurements from central monitoring stations in the community. If serious discrepancies appear between personal-monitor data and central data, one can try to reconcile them and determine the causes and extent of misclassification errors. If this kind of work is done during a pilot study, it can also help to determine the sample size required in the definitive study. Deviation of individual exposure from aggregate exposure might be critical when effects are nonlinear. Consider, for example, a case in which the centrally
measured value is just below some threshold, but some persons have had exposures substantially higher or lower than the average. Personal monitoring of every subject is not required for these purposes; a subsample might be sufficient. Although the unnecessary aggregation of data is undesirable, aggregate data are in some ways the preferred type of epidemiologic data for prevention and "community diagnosis." Aggregate data on indexes of respiratory health allow us to compare the health of one community with that of another and to assess the community-wide impact of preventive strategies. Moreover, ambient air pollution is regulated on the basis of aggregate data on exposure. Consequently, epidemiologic studies that use such data are directly interpretable in relation to national air quality standards.

SOURCES OF RANDOM VARIATION

Epidemiologic studies of air pollution must be designed with an awareness that random variation can have many sources and that individual observations are often not fully independent. For example, a study of respiratory symptoms in 50 subjects in each of two cities recorded once a month for a year might be regarded as having a sample size of 1,200 (observations) or 100 (subjects) or 2 (cities). If results are to be used to characterize other cities that vary appreciably in important ways, the relevant sample size might be closer to 2 than to 100 or 1,200. When such considerations are apparent during the design of a study, the distribution of observations allowed by the study budget can be adjusted (e.g., 10 subjects in each of 10 cities, rather than 50 in each of two). There is generally no substitute for adequate sampling and study of each potentially relevant source of random variation, such as city. A common design in air pollution epidemiology involves a comparison of respiratory health in a "polluted" town with that in one or two less polluted or unpolluted towns.
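The "closer to 2 than to 100" intuition can be made concrete with a standard design-effect calculation. The intraclass correlation used below is an illustrative assumption, not a measured value:

```python
# Sketch: effective sample size under clustering by city. With m subjects
# per city and a between-city intraclass correlation rho, the design effect
# is 1 + (m - 1) * rho, and the effective sample size is n / deff.
# rho = 0.05 here is purely illustrative.

def effective_n(n_total, n_clusters, rho):
    m = n_total / n_clusters          # subjects per city
    deff = 1 + (m - 1) * rho          # design effect
    return n_total / deff

print(round(effective_n(100, 2, 0.05), 1))   # 50 subjects in each of 2 cities
print(round(effective_n(100, 10, 0.05), 1))  # 10 subjects in each of 10 cities
```

The same 100 subjects carry the information of only about 29 independent observations when packed into two cities, but about 69 when spread over ten, which is the rationale for redistributing the study budget.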
The pairs of towns compared are assumed to be representative of all similar pairs of towns with similar contrasts in pollution. That assumption is not necessarily valid, inasmuch as the next two or three pairs, selected at random from those with identical pollution
contrasts, might yield different associations between air pollution and health. The selection of towns or groups for comparison is thus a source of random variation distinct from the random variation related to selection of individual subjects. Small sample size for this group component of variation can increase the uncertainty attached to a study's results and limit generalization from them. For that reason, a recent large epidemiologic study of air pollution in France used aggregate exposure data for study subjects scattered throughout 28 towns or districts.3

SELECTION OF STUDY POPULATION AND ROLE OF SENSITIVE POPULATIONS

Study subjects are ordinarily selected because they are exposed to air pollution of different magnitude or because they do or do not have some outcome of interest. If exposure is the basis for selection, exposure patterns in the population must be adequately characterized in advance; if effects are the basis, the quality of pertinent diagnostic information is an issue. Selection of study populations is an integral part of a research strategy and of study design, and strategic concerns sometimes dominate the selection process. The air pollution effect that is sought might be virtually undetectable in the presence of more powerful risk factors, which could make it appropriate to study individuals or groups that lack these factors. Examples of this approach include the study of young children to avoid confounding by smoking or by occupational exposures and the study of long-term residents to avoid confounding by residential mobility. Exclusion of (and thus, control for) confounding factors through subject selection eliminates the possibility of gaining information about interactions between these factors (e.g., smoking) and the ambient air pollutants under study.
If air pollution and other factors, such as smoking, interact synergistically (i.e., multiply each other's effects), smokers might well be the group in which the most cases of pollution-related disease occur and therefore the group in which such disease is easiest and most important to detect. The use of sensitive populations in a study can also be deliberate. Human sensitivity to deleterious factors
tends to vary widely, and our ability to detect effects when they are present is considerably sharpened when we can identify and focus on a population that has a steep dose-response curve. The response of highly susceptible people to specific magnitudes of air pollution is assumed to predict the response of average people to greater magnitudes. Asthmatics, bronchitics, children, the elderly, and subjects with cardiopulmonary diseases have long been considered to be sensitive to air pollutants. As Chapter 2 pointed out, the biologic nature of this presumed hypersusceptibility is not well understood, and much more information from all types of studies is needed for the characterization of sensitive populations. Investigators must know which sensitive populations are needed for which research questions and must have ways of determining whether a given person is a member of the population needed. Initial screening of a population might even be necessary to ensure that the desired sensitivity characteristic is present, particularly if the characteristic--such as atopy (a hereditary predisposition toward developing allergic reactions) or bronchial hyperreactivity--cannot easily be identified with routine clinical data.6

VOLUME OF DATA AND DATA REDUCTION

It is characteristic of air pollution epidemiology that large amounts of data are produced. These data must be reduced to manageable proportions, and that requires attention early and often in each major phase of a study--design, performance, analysis, and reporting. Methods of data reduction are diverse and include selection of smaller subsets for some studies, making continuous data discrete (e.g., creating a dichotomy), averaging (e.g., of repeated measures across time), and statistical modeling (including rate-of-change models).
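As a minimal sketch of the averaging form of reduction, a day of semicontinuous monitor readings can be collapsed to two summary numbers. The hourly values below are invented for illustration:

```python
# Sketch: reducing hourly ozone readings (hypothetical values, ppm) to a
# peak and a time-weighted average.
hourly_ozone = [0.02, 0.02, 0.03, 0.04, 0.06, 0.09,
                0.12, 0.14, 0.13, 0.10, 0.06, 0.03]

peak = max(hourly_ozone)
# Readings of equal duration make the time-weighted average a simple mean.
twa = sum(hourly_ozone) / len(hourly_ozone)
print(peak, round(twa, 3))
```

Dozens of readings become two numbers; whichever summary is kept determines which hypotheses (peak-driven or cumulative-dose) the reduced data can still address.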
The following are examples of data reduction in air pollution studies:

• Reduction of semicontinuous ozone data to peak exposures and time-weighted averages for each year of study (reduction could still leave several dozen ozone measurements).
• Setting aside* of data on nonasthmatics (persons presumed to have low sensitivity to the agent in question)--i.e., reduction of their influence on the results to zero.

• Reduction of complex cigarette-smoking histories to numbers of pack-years.

• Reduction of asthma data to number of episodes per year.

* We note a preference for the idea of setting aside data not used in a particular analysis, as opposed to discarding such data. Data are in fact rarely discarded (nor should they be) as long as there is still some thought that they might be useful for some other, future purpose.

Profound and difficult issues must be addressed in deciding which data items to collect and, once they are collected, which to reduce (and how) and which to set aside. Inappropriate data reduction can negate the careful selection and matching of exposure, confounder, and effect variables. Those who make decisions about data reduction should be thoroughly familiar with the overall design and purposes of their study. Large volumes of data inevitably entail dilution of resources and attention, so data quality might be inversely related to data quantity. Even if the cost of collecting another item or two on each subject is small, the possible erosion of quality militates against the collection of any data that cannot be used to advance the main objectives of the study. Of course, changes in budgets and new research reports that arrive during the study period can dictate changes in the basic data to be collected.

MULTIPLE ANALYSES

Multiple tests and large data volume in air pollution studies also entail multiple statistical procedures and drawing of multiple conclusions. Two serious and related problems in data analysis and interpretation are multiple
comparisons and after-the-fact hypotheses. The problem of multiple comparisons is well recognized. When a large series of statistical tests of significance are performed, the probability that one or more of them will be outside the arbitrary level of statistical significance (usually < 0.05) will be larger than that arbitrary level itself--perhaps much larger. The p value represents the probability that a positive result could have been achieved by chance alone. A p value of 0.05 corresponds to a 5% chance of such a false-positive result; if one conducts 100 statistical tests on the data, about 5 positive results can be expected, regardless of the true biologic relationships involved. The multiple-comparisons problem is present whether one is testing the same hypotheses over and over or testing many hypotheses once each. The problem of multiple comparisons is compounded because the multiple tests might not all be reported in the same publication, and many might not be reported at all. Some might not even be formally and explicitly performed, if quick inspection shows that results are not likely to be "interesting." One might perform tests only if preliminary study shows a likelihood of "positive" results. The problem of untested or unreported hypotheses, sometimes referred to as the "file-drawer problem," can plague the interpretation of air pollution studies. One who is attempting to evaluate a reported positive result of an epidemiologic study is usually unaware of whether the positive relationship stemmed from a hypothesis developed before or after preliminary analysis. Given the risks of false-positive results due to chance, a before-the-fact hypothesis that yields a positive result obviously carries more weight than one that appears after exhaustive review of the data. By specifying in advance the major hypotheses that they wish to test, investigators can greatly add to the confidence in their positive results.
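The growth of the false-positive risk with the number of tests can be sketched directly. Assuming independent tests, each conducted at alpha = 0.05 on data with no true associations:

```python
# Sketch: familywise false-positive probability for k independent tests,
# each at alpha = 0.05, when every null hypothesis is true.

def p_any_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 10, 100):
    print(k, round(p_any_false_positive(k), 3))
# Expected number of false positives among 100 null tests: 100 * 0.05 = 5.
```

Ten tests already carry a 40% chance of at least one spurious "significant" finding, and 100 tests make one or more all but certain, which is why an unreported multiplicity of tests so badly undermines a single reported p value.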
The purpose of the foregoing discussion is not to discourage the thorough exploration of large data sets, nor the reanalysis of old ones, but rather to point out to those concerned with air pollution epidemiology that multiple analyses of large data sets carry certain perils
to interpretation and that these perils can be mitigated during study planning.

MULTIVARIATE MODELS

Contemporary epidemiologic studies of air pollution usually take a multivariate approach to analysis, because several confounding variables need to be controlled for simultaneously. Great advances have been made in the last 20 years in the application of powerful multivariate statistical techniques to epidemiologic data. However, major problems can be encountered when these approaches are used, particularly with large volumes of data. There is sometimes a tendency to use these sophisticated techniques in an unsophisticated way--i.e., to compensate for inadequacies in the data. In general, a multivariate model can be no better than the data put into it; nonrandom error is likely to occur in air pollution studies and is especially resistant to statistical treatment. Collinearity--the correlation of important variables with each other--leads to other problems. The presence of large correlations between variables lessens the opportunity to attribute effects to any particular predictor variable, including exposure to specific air pollutants. Concentrations of specific pollutants tend to vary together. Loading of models with too many variables, especially ones that do not convey information not inherent in the others, can sometimes cause overcorrection, or overadjustment, and keep a real effect from being detectable. This can be understood as a result of mincing the data into so many small stratified bits that the statistical power of the study is seriously diminished. Statistical techniques alone cannot compensate for limitations imposed by collinearity in an air pollution data set. Fortunately, new techniques can be used at least to determine the extent of the collinearity problem in a particular study.
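One standard diagnostic for the extent of the problem is the variance inflation factor. For the simple two-predictor case it depends only on the correlation between the predictors; the correlations below are illustrative, not drawn from any data set:

```python
# Sketch: variance inflation factor (VIF) for two correlated predictors.
# VIF = 1 / (1 - r^2), where r is the correlation between the predictors;
# it multiplies the variance of each estimated coefficient.

def vif(r):
    return 1 / (1 - r ** 2)

for r in (0.0, 0.5, 0.9, 0.99):
    print(r, round(vif(r), 1))
```

Pollutant concentrations that "vary together" with r = 0.9 make each coefficient roughly five times noisier than independent predictors would; at r = 0.99 the inflation is fiftyfold, and attribution of effect to any single pollutant becomes essentially hopeless.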
Multivariate models can be used in two distinct ways: to explain the relationships among exposure, effect, and confounding variables within the data set; and to extrapolate beyond the range of the data, to permit predictions for situations with different magnitudes or conditions of exposure. In the latter use, with no actual data for guidance, the results of extrapolation
depend on the nature of the chosen model. Even within a study, adoption of a biologically incorrect model can lead to false inferences about the presence or absence of an association. The choice of possible models is usually wide. Some, such as the multistep theory of carcinogenesis, are based on presumptions about biologic mechanisms of action; others, such as models that relate pollutant concentration to tissue dose, are concerned with toxicokinetics. Most model formulations have no biologic basis other than intuitive plausibility. The biologic uncertainty is usually so great that many epidemiologists are beginning to perform analyses on several models routinely. The differences in estimates of association or risk that are produced reflect an important source of uncertainty. Sensitivity analysis provides a new technique for estimating the sensitivity of a given epidemiologic result to model specification.8 Multivariate analyses of air pollution data have not always searched for effects due to interactions between variables, as opposed to effects due to single variables. In the simplest case, two pollutant measures, or one pollutant and one confounder, can produce an effect when present together, but no effect when present alone. Such interactions might be important in air pollution studies and should be tested through statistical modeling whenever possible. Goldsmith and co-workers have recently used path analysis, a more highly structured type of linear regression, to analyze data on pollution concentrations and daily emergency-room visits in Los Angeles.2 This technique requires the investigators to create models that specify the relations among variables in a more detailed and structured manner than conventional multivariate analysis. It could lead to improvements in separating the effects of specific exposure variables, particularly in ecologic studies.
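The "simplest case" of interaction mentioned above can be written out with hypothetical risks, in which neither factor alone raises risk but the two together do:

```python
# Sketch: a pure interaction. Risks are invented for illustration.
risks = {
    (0, 0): 0.01,   # neither pollutant nor cofactor present
    (1, 0): 0.01,   # pollutant only
    (0, 1): 0.01,   # cofactor (e.g., smoking) only
    (1, 1): 0.04,   # both present
}

baseline = risks[(0, 0)]
for combo, r in risks.items():
    print(combo, round(r / baseline, 2))   # relative risks: 1, 1, 1, 4
```

A model containing only single-variable (main-effect) terms would dilute the fourfold joint effect across both marginals and could miss it entirely; an explicit product (interaction) term is needed to represent it.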
COMBINING RESULTS FROM INDEPENDENT STUDIES

Epidemiologists have a special interest in the combination of results from independently conducted studies. Results of individual studies are not likely to be definitive on a particular research question and require interpretation within the context of similar work. In addition, several studies often show a negative result or a statistically nonsignificant association, but
by themselves lack the statistical power (usually owing to insufficient sample size) to detect small effects. Sometimes, multiple analyses produce inconsistent conclusions. An inconclusive result and a positive result are generally not difficult to reconcile, but two conclusive results in "opposite" directions can present serious problems--in policy and regulation, as well as in science. Epidemiologic studies, when they are based on observation of natural phenomena, are not repeatable in a strict sense. The data analyst must attempt to reconcile conflicting results by further study and usually more convoluted methods of analysis. Alternatively, there are forums--such as criteria documents, committee reports, and review articles--that allow for expert subjective judgment on the strengths and weaknesses of individual studies and the overall direction of the evidence. Environmental health researchers have become interested in formal techniques for combining evidence. Some are based on Bayesian principles of subjective probability and analyses of decision under uncertainty.14 Meta-analysis, a more formal procedure, treats results from groups of independently conducted studies in a statistical manner.10,13 Although meta-analysis is seductively simple, it contains serious perils when applied to most epidemiologic studies, and its quantitative nature can mask serious flaws in data. In essence, meta-analysis assumes that the results of studies can themselves be treated as random variables with predictable distributions. That assumption might be reasonable for experiments repeated under very similar conditions, but it is rarely so for epidemiologic studies, in which extraneous factors are harder to control and nonrandom errors dominate the random ones. Weighting studies according to such features as data quality can also be treacherous.
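The mechanics being critiqued here are simple, which is part of their seductiveness. A fixed-effect, inverse-variance pooling of relative risks can be sketched as follows; the three (RR, standard error) pairs are invented to show the arithmetic and are not taken from real studies, and nothing in the calculation addresses the nonrandom errors or unretrieved studies discussed in the text:

```python
# Sketch: fixed-effect, inverse-variance pooling of log relative risks.
# Study inputs are hypothetical: (relative risk, standard error of log RR).
import math

studies = [(1.3, 0.20), (1.1, 0.15), (1.6, 0.30)]

weights = [1 / se ** 2 for _, se in studies]
pooled_log = (sum(w * math.log(rr) for (rr, _), w in zip(studies, weights))
              / sum(weights))
pooled_se = math.sqrt(1 / sum(weights))
print(round(math.exp(pooled_log), 2))   # pooled relative risk
```

The pooled estimate of about 1.2 looks reassuringly precise, but its validity rests entirely on the assumption that the three studies estimate the same quantity and differ only by random error.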
Moreover, for meta-analysis to work, it must be assumed to include all or a representative sample of all studies on a particular question. The possible existence of an unknown number of multiple comparisons, in file drawers or in the minds of researchers, makes this a dubious assumption as well, although procedures have been developed to calculate the number of unretrieved studies that would be required to alter a combined-probability estimate.11 The pooling of observations from independent studies to increase sample size is questionable in environmental epidemiology, because it ignores the differences between studies
altogether. Numeric combination of results from different studies still has only a small role in epidemiology.

INDIVIDUAL REGRESSIONS

Logit, or individual regression analysis, a powerful new approach to the analysis of small-cohort (panel) data on air pollution and disease occurrence over time, uses a multiple logistic model to predict each subject's probability of having an asthmatic attack or some other adverse event in a given period. Previous methods, most commonly used in asthma studies, had relied on the panel attack rate as the outcome variable and had constructed models to predict this rate. The panel attack rate is defined as the number of panelists reporting an attack on a given day divided by the total number of panelists reporting. Interindividual differences in response to pollutants are such that this grouped outcome variable obscures the useful information embedded in the observations. Whittemore and Korn reanalyzed part of a previously collected data set and, on the basis of individual probabilities, improved the adjustment for confounders and increased the power of the data set for detecting small effects.15 It was also possible to adjust for temporal autocorrelation, in which the best predictor of an asthmatic attack on a given day is the occurrence of an attack on the previous day. Much of the power of this approach stems from each person's constituting an experiment over time while serving as his or her own control. The incorporation of individual, rather than grouped or aggregate, data into these types of studies, with refinements in the exposure and effect data themselves, will yield some promising avenues for research. It might be possible, given enough days of observation, to conduct highly sensitive epidemiologic studies on acute-effect questions with only a small sample of subjects.
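The general form of such an individual-level model can be sketched as follows. The coefficients here are invented for illustration (a real analysis would estimate them from panel data), but the structure--a logistic model with a pollution term and a previous-day term for temporal autocorrelation--matches the approach described above:

```python
# Sketch: multiple logistic model for a subject's daily attack probability.
# Coefficients are hypothetical, chosen only to illustrate the structure.
import math

def p_attack(pollution, attack_yesterday, b0=-3.0, b_poll=0.008, b_prev=1.5):
    """P(attack) = 1 / (1 + exp(-z)), z = b0 + b_poll*pollution +
    b_prev*attack_yesterday. The b_prev term captures temporal
    autocorrelation; attack_yesterday is 0 or 1."""
    z = b0 + b_poll * pollution + b_prev * attack_yesterday
    return 1 / (1 + math.exp(-z))

# Same pollution level; yesterday's attack dominates the prediction.
print(round(p_attack(100, 0), 3))
print(round(p_attack(100, 1), 3))
```

Because each subject contributes a predicted probability for every day of observation, the model uses the full day-by-day record rather than collapsing it into a panel attack rate.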
INFERENCE AND CAUSALITY

Once questions have been formulated, the practice of epidemiology is a sequential process of assembling bits of information into a body of evidence and using inductive reasoning to draw inferences from it. A few points about
the nature of epidemiologic reasoning, particularly the inference of causality, must be understood. Epidemiologic reasoning has two fundamental characteristics: it deals with units of observation (rates of disease in populations) that are larger and less perceptible than events that are directly observable, such as cancer seen through a microscope, a spike on a chart recorder, or a color in a test tube; and it is normally based on the observation of natural events, rather than on the results of experimental manipulation. The first characteristic is not peculiar to epidemiologic inquiry. Most branches of science have progressed far beyond the point where two events must be directly observed if they are to be causally linked. Hume recognized 250 years ago that causality in science is, in a sense, always inferred: "We are never able, in a single instance, to discover any power or necessary connection, any quality which binds the effect to the cause, and renders one the infallible consequence of the other."5 Positive (or negative) findings in either observational or experimental research are therefore statistical associations, which can always be explained in any of three ways: they reflect causality, they reflect a noncausal relation, or they reflect chance. Epidemiologic reasoning does not stop at the demonstration of a statistical association, but goes on to "the practical purpose of discovering (causal) relations which offer possibilities of disease prevention."7 (Bias, confounding, and chance--the three main determinants of the truth, or validity, of a given study--have already been discussed.) Noncausal or spurious associations that emerge from epidemiologic studies must be distinguished from indirect associations. Indirect associations occur when a third factor is an intermediate or intervening variable in the causal chain between the two primary variables. 
Indirect associations are the rule, rather than the exception, in epidemiology (particularly air pollution epidemiology), because the exposure, as measured, is rarely the ultimate or direct cause of the adverse effect. Advances in knowledge tend to reveal successively deeper layers of causality, like the wooden Russian dolls that nest within one another. To take a hypothetical example, an association between adverse health effects and particles might
be found to be due to aerosol sulfates, then to hydrogen ions, and eventually to interactions between hydrogen ions and ozone. We must emphasize that an indirect association can be a basis for preventive action, as long as changing the indirect causal factor results in the desired change in the occurrence of disease.

Over the last 25 years, epidemiologists have developed a set of guidelines for judging whether statistical associations derived from observational studies truly reflect causality. These guidelines, which apply equally to inference from a single study and to inference from a set of similar studies, are as follows:

• Strength (magnitude) of the association: The larger the calculated relative risk, the greater the likelihood that the observed association is causal. Relative risks of illness due to air pollution are likely to be small.

• Consistency (reproducibility): Causal inferences are strengthened when a variety of investigators have reproduced the findings under a variety of circumstances.

• Sequence of events: Inferences are strengthened when it is clear that exposure preceded illness, rather than vice versa.

• Biologic gradient of the association (dose-response relationship): If an exposure is causally related to an illness, the risk of developing the illness, or its severity, should be related in a graded fashion to the magnitude of exposure. Occasionally, a confounding variable may be so closely linked with exposure that the two share the same dose-response relation to the illness and cannot easily be separated.

• Specificity of the association: Causality is more likely if a particular exposure is associated with only one illness, and vice versa. This guideline rarely applies to air pollution research, in which all the diseases of major concern are multifactorial. 
• Biologic plausibility of the association: An epidemiologic inference of causality is greatly strengthened when it conforms to knowledge concerning the biologic behavior of a toxin and its mechanism of
action. This evidence may be obtained from clinical research or toxicologic studies.

The greatest possible assurance of a causal relationship can be obtained by finding that the rates of illness in a population change in a consistent manner after a change in exposure. Opportunities to observe such related changes sometimes occur naturally and in rare instances can be studied by experiment or intervention. Sometimes, however, too great a distinction is made between causal inference from observational studies and that from experimental studies--as though the evidence provided by the two were qualitatively different. The difference between the two lies in the greater ability of experimental research to control extraneous variables, not in the inherent quality of the associations it finds.

The final step in the inferential process in epidemiology requires the extension of a study's results to persons, populations, or settings not specifically included in the study. The confidence with which this is done for positive results is usually based implicitly on how successful the investigators have been in identifying and handling the factors that produce or influence the pollution-effect association they have observed, including sampling variation. Whether and how the conditions studied are special can then be judged. Inference from negative studies must be a good deal more cautious, because, as is sometimes said, it takes a nearly infinitely large sample to conclude that a particular association cannot be found somewhere, under some set of circumstances.4

REFERENCES

1. Bates, D.V., and B. Sizto. Relationship between air pollutant levels and hospital admissions in southern Ontario. Can. J. Pub. Health 74:117-122, 1983.

2. Goldsmith, J.R., H.L. Griffith, R. Detels, S. Beeser, and L. Neumann. Emergency room admissions, meteorologic variables, and air pollutants: A path analysis. Am. J. Epidemiol. 118:759-778, 1983.
3. Groupe Cooperatif PAARC. Air pollution and chronic or repeated respiratory diseases: II. Results and discussion. Bull. Eur. Physiopathol. Respir. 18:101-116, 1982.

4. Hernberg, S. Evaluation of epidemiologic studies in assessing the long-term effects of occupational noxious agents. Scand. J. Work Environ. Health 6:163-169, 1980.

5. Hume, D. A Treatise of Human Nature. L.A. Selby-Bigge, Ed. Oxford: Clarendon, 1896. 709 pp.

6. Jones, R.N., B.T. Butcher, Y.Y. Hammad, J.E. Diem, H.W. Glindmeyer, III, S.B. Lehrer, J.M. Hughes, and H. Weill. Interaction of atopy and exposure to cotton dust in the bronchoconstrictor response. Br. J. Ind. Med. 37:141-146, 1980.

7. MacMahon, B., and T.F. Pugh. Epidemiology: Principles and Methods. Boston, Mass.: Little, Brown, 1970. 376 pp.

8. Osborn, J.F., and P. Armitage. Statistical Methods in Medical Research. New York: Blackwell Scientific Publications, 1979. 154 pp.

9. Pengelly, L.D., A.T. Kerigan, C.H. Goldsmith, and E.M. Inman. The Hamilton study: Distribution of factors confounding the relationship between air quality and respiratory health. J. Air Pollut. Control Assoc. 34:1039-1043, 1984.

10. Rosenthal, R. Combining results of independent studies. Psychol. Bull. 85:185-193, 1978.

11. Rosenthal, R. The "file drawer problem" and tolerance for null results. Psychol. Bull. 86:638-641, 1979.

12. Sackett, D.L. Bias in analytic research. J. Chron. Dis. 32:51-63, 1979.

13. Strube, M.J., and D.P. Hartmann. A critical appraisal of meta-analysis. Br. J. Clin. Psychol. 21:129-139, 1982.
14. Thomas, D., J. Siemiatycki, R. Dewar, J. Robins, M. Goldberg, and B. Armstrong. The problem of multiple inference in studies designed to generate hypotheses. Am. J. Epidemiol. 120:503, 1984.

15. Whittemore, A.S., and E.L. Korn. Asthma and air pollution in the Los Angeles area. Am. J. Pub. Health 70:687-696, 1980.