often, these data are available only at the scale of the municipality or the county. Analyses of disease incidence patterns and assessments of proximity to possible sources of exposure require individual-level data that include residential addresses or other specific residence locations. With the advent of geographic information systems, such analyses are becoming increasingly easy to conduct (Guthe et al., 1992; Rushton et al., 1995, 1996; Wartenberg, 1992; Wartenberg et al., 1993; Wartenberg, 1994), although in most places such data are either unavailable or insufficiently accurate for meaningful analysis. Other data would also be useful, such as cancer incidence and birth defects data, as well as data on less severe outcomes that might be affected by air pollution, such as asthma incidence and the results of pulmonary function tests. Finally, data on confounding variables, such as behavioral and lifestyle characteristics, would allow more rigorous evaluations. Several preepidemiologic methods allow adjustment for confounders when such data are available. These adjustments may reveal hidden associations or explain associations that had been erroneously attributed to exposure to environmental contaminants. If such data are not made available, substantial resources will be needed to undertake investigations.
Third, the statistical properties of preepidemiologic methods need more systematic evaluation. Investigators need to know which method to use in which situation and what each method can and cannot detect. The statistical power studies conducted to date do not provide that type of systematic evaluation. Such an evaluation could be facilitated by developing a protocol for comparisons that specifies the data sets, hypothetical and real, to be used and the test results to be reported. Data or computer programs should then be made available to the scientific community for more comprehensive testing of existing methods and for the development and testing of new methods. Finally, by compiling the results, one may begin to understand how to use these tools and what interpretations to draw from their results.
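As an illustration only, the kind of simulation study such a protocol might specify can be sketched as follows. The sketch assumes a single hypothetical region with 10 expected cases and uses a one-sided exact Poisson test for excess incidence as a deliberately simple stand-in; it is not any particular preepidemiologic method from the literature, and all function names are hypothetical. Counts are simulated under several hypothetical relative risks and the rejection rate is reported: at a relative risk of 1 it estimates the false-positive rate, and above 1 it estimates power.

```python
import math
import random


def poisson_sf(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu) by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def critical_count(expected, alpha):
    """Smallest count c with P(X >= c | no excess risk) <= alpha."""
    c = 0
    while poisson_sf(c, expected) > alpha:
        c += 1
    return c


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulated_power(expected, relative_risk, alpha=0.05, n_sim=10_000, seed=1):
    """Fraction of simulated counts that reach the critical value.

    relative_risk == 1 estimates the false-positive rate (test size);
    relative_risk > 1 estimates power against that hypothetical excess.
    """
    rng = random.Random(seed)
    c = critical_count(expected, alpha)
    hits = sum(
        sample_poisson(rng, expected * relative_risk) >= c for _ in range(n_sim)
    )
    return hits / n_sim


# Hypothetical scenario: 10 expected cases, three assumed relative risks.
for rr in (1.0, 1.5, 2.0):
    print(f"RR={rr:.1f}  rejection rate={simulated_power(10, rr):.3f}")
```

The simulated rates should fall close to the exact values for this simple test (roughly 0.05 at a relative risk of 1 and roughly 0.84 at a relative risk of 2). Simulation is used here because it carries over to methods that have no closed-form power calculation; a shared protocol would fix the scenarios, seeds, and reported quantities so that different methods can be compared on the same footing.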
Fourth, one must consider the trade-offs involved in interpreting statistical significance in preepidemiologic studies. Researchers have argued in different contexts that traditional significance testing is both too liberal and too conservative. The types of problems for which these methods are used (e.g., identification of new etiologies, identification of specific exposures or excess incidence of disease, replication of observed historical excess incidences of disease, and development of public policy) need to be examined, and guidelines for interpretation need to be developed for each use. Simply applying the current practice of declaring a result statistically significant when p < 0.05 does not adequately address the disparity of needs or the variation in the severity of the consequences of false-positive and false-negative results. In epidemiology, results identifying a new etiology become credible only after substantial replication, regardless of the p value.
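The trade-off can be made concrete with a small worked calculation. This is a minimal sketch under stated assumptions: a single hypothetical region with 10 expected cases, a true (hypothetical) doubling of risk, and a one-sided exact Poisson test used as a simple stand-in for a preepidemiologic method. The function names are hypothetical. For each significance threshold the code computes the exact false-positive rate (rejecting when there is no excess) and the exact false-negative rate (failing to reject when risk is doubled).

```python
import math


def poisson_sf(k, mu):
    """Return P(X >= k) for X ~ Poisson(mu) by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= mu / i  # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def critical_count(expected, alpha):
    """Smallest count c with P(X >= c | no excess risk) <= alpha."""
    c = 0
    while poisson_sf(c, expected) > alpha:
        c += 1
    return c


def error_rates(expected, relative_risk, alpha):
    """Exact error rates of the one-sided test at threshold alpha.

    Returns (critical count, false-positive rate, false-negative rate).
    """
    c = critical_count(expected, alpha)
    false_positive = poisson_sf(c, expected)
    false_negative = 1.0 - poisson_sf(c, expected * relative_risk)
    return c, false_positive, false_negative


# Hypothetical scenario: 10 expected cases, true relative risk of 2.
for alpha in (0.10, 0.05, 0.01):
    c, fp, fn = error_rates(10, 2.0, alpha)
    print(f"alpha={alpha:.2f}  reject if >= {c} cases  "
          f"false-positive={fp:.3f}  false-negative={fn:.3f}")
```

In this hypothetical scenario, tightening the threshold from 0.10 to 0.01 lowers the false-positive rate from about 0.08 to under 0.01 but raises the false-negative rate from about 0.10 to about 0.38. Which of these errors is more acceptable depends on the use to which the result will be put.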
Failure to identify an exposure that is causing an excess incidence of disease (a false-negative result) will likely lead to additional cases of disease and possibly a high cost to society, while false confirmation of a hazard (a false-positive