Suggested Citation:"5 Data Systems and Opportunities for Advances." National Research Council. 1997. Environmental Epidemiology, Volume 2: Use of the Gray Literature and Other Data in Environmental Epidemiology. Washington, DC: The National Academies Press. doi: 10.17226/5804.

5
Data Systems and Opportunities for Advances

This chapter describes some existing data systems that permit the characterization of personal exposure and health status. Given the scarcity of resources for studies in environmental epidemiology, researchers need to make the best use of existing data. It is beyond the scope of this chapter to cover all the pertinent data systems or to describe systems in detail. Rather, the focus is on classes of data-collection systems, some of the major systems in each class and their important features, and their use. The emphasis will be on data systems that are publicly available (often from the federal government). For a more-comprehensive list of federal data systems related to environmental exposure, see EPA et al. (1992); for other discussions of state and local data systems see, for instance, Health Officers Association of California (1986), National Governors' Association (1989), Frisch et al. (1990), and Sexton et al. (1992, 1994). There is a need for greater dissemination of the knowledge of the existence and availability of federal, state, and local systems. Many of these are limited in size, coverage, end points, completeness, or accuracy, but where they meet the investigator's needs, they can save much time and expense. A geographic information system can be very useful in investigation by providing an organizing framework for data on exposure and outcomes.

Introduction

The interest of the American public in environmental pollution seems to be driven primarily by concerns about health. People ask, "Have we
been exposed?" "Have we been affected?" "Will we be affected later?" They might well ask, also, "Do our management programs have any effect on the health of the public?" Table 5-1 outlines some epidemiologic research strategies that address these concerns. It shows that many types of epidemiologic studies and data can be used to determine the relation between the environment and human health. Although experimental studies of animals and laboratory studies of humans do provide some answers to these questions, epidemiologic research is essential to their resolution. Often, however, epidemiologic studies are neither available nor possible, and policy must be based on toxicologic evidence and animal studies.

Considerations of cost, urgency, and limited special expertise often require that officials rely on analyses of existing data that were gathered for other purposes. Epidemiologic studies of the classical kind involve the measurement of both the health status and the environmental exposure (or internal dose) of the persons being studied. However, such measurements cannot always be obtained. For instance, if historical exposures were not measured, the investigator may have to estimate them from other, less reliable, information. On the other hand, the exposure might be so extensive that no suitable control population remains. Individualized measures of exposure and health can also be infeasible or too expensive when a health effect occurs so infrequently that adequate study would require that a large number of exposed persons be evaluated in detail; such problems require other epidemiologic methods or the use of secondary data.
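The rare-outcome problem described above can be made concrete with a little arithmetic. The following is a minimal sketch, not a method from the chapter: the function names, the risk of 1 in 10,000, and the group sizes are all hypothetical, and every person is assumed to have the same independent risk.

```python
import math
from fractions import Fraction

def expected_cases(n_persons, risk):
    """Expected number of cases when each of n_persons has the same risk."""
    return n_persons * risk

def prob_no_cases(n_persons, risk):
    """Probability of observing zero cases, assuming independent outcomes."""
    return (1.0 - float(risk)) ** n_persons

def persons_needed(target_cases, risk):
    """Smallest group size whose expected case count reaches target_cases."""
    return math.ceil(Fraction(target_cases) / Fraction(risk))

# Hypothetical values, chosen only for illustration.
risk = Fraction(1, 10_000)      # assumed risk per exposed person
print(float(expected_cases(500, risk)))   # 0.05
print(prob_no_cases(500, risk))           # chance that 500 persons yield no case
print(persons_needed(20, risk))           # 200000
```

Under these assumptions a detailed study of 500 exposed persons would most likely observe no cases at all, which is why such questions tend to be pushed toward secondary data.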

During development and implementation of public-health policy, analyses of secondary data are important at several stages. Intense study of selected small groups of people can provide useful information about risk that identifies a need for public policy. To determine the extent of potential exposure, the size and characteristics of the population exposed, or the background frequency of the health effect of interest, secondary data analyses are useful. Information from existing data systems is useful in program planning and development when data are needed to validate findings of earlier targeted research studies.

During implementation, data from existing systems can provide additional insights for public-health policy. Existing data systems tend to reflect the programmatic and regulatory structure of government programs, so the identification of useful systems (or of their absence) might help to define the most appropriate needs for assessment and availability of the public-health response. This in turn allows for midcourse changes to reduce costs, improve response, or otherwise improve ongoing programs.

Data systems are a primary mechanism for evaluating the impact of a public-health policy. For instance, a public-health program might target ozone because of its effects on several pulmonary health end points. However, the success of the program might be evaluated solely from the ambient concentrations of ozone in a polluted area. To evaluate the impact on public health, it is important to know the relation between the observed ambient concentrations of ozone and the frequency of various pulmonary health end points. Modification of public-health policies depends on knowledge of such relations, identified largely through analyses of secondary data.

TABLE 5-1 Issues of Major Concern to the Public and Methodologic Responses

Issue of Concern	Exposure Assessment	Applied (Response) Epidemiology^a	Epidemiologic Study^b	Registries and Surveillance	Reference Surveys for Exposure	Reference Surveys for Health Effect	Risk Assessment
Are we exposed?	X				X		
Are we affected now?		X		X		X	
Did exposure cause a health effect?	X	X	X	X	X	X	X
Will we be affected later?	X		X	X	X		X
Did we improve health with a program initiative?			X		X		
^a Applied, or response, epidemiology refers to studies designed as a quick response to concerns expressed by a group of individuals regarding the potential for exposure or health effects. These are the basis of much of the "gray" literature and many of the studies performed by public-health agencies.
^b Epidemiologic studies include classical case-control and cohort studies of targeted populations, in contrast with studies of the general population.

Data-Collection Systems: What They Measure

Evaluation of the relation between an environmental pollutant and human health requires data to characterize exposures to the pollutant, including concentrations in the environment, the probability and characteristics of human exposure, and the distributions of internal doses, as well as trends or differences in the health status of exposed people. Determination of risk-management alternatives requires, in addition, information on the sources and distribution of the pollutant. Data systems may address each of these needs. However, they have not necessarily been established with the goal of integration with other classes of data, and most data-collection systems collect only one kind of data or data on one aspect of the general problem.

One distinguishing characteristic of a data system is how and when the responding units are sampled. For example, persons may be selected at random from a defined population but tested at a fixed time (8:00 am every day) or once at a haphazard time (when the laboratory is not otherwise busy). Some surveys are designed to obtain probability samples that accurately represent a reference group, such as a population or an occupational setting, but others obtain samples by convenience, such as collections of information from participating states or hospitals. A short-term survey may not be representative across time. A survey system might select sampling units that are characteristic or representative of larger reference groups. Characteristic sampling is based on selection from a list of strata; representative sampling is based on the distribution of strata in the population. For instance, in selecting monitoring sites, one might decide that several important types of environments should be evaluated. Monitoring sites can be selected to characterize those types of environments, as in the stratified sampling of air in urban areas and rural areas. Alternatively, one might select monitoring sites on the basis of a stratified probability-sampling scheme to yield data that are representative of the distribution of environments. Monitoring is expensive, and decisions about where to put monitors are generally considered carefully, but the decisions may not be optimal for a specific environmental-epidemiology study. For example, if budgets allow for only a few monitors to measure some chemical, should they be placed to obtain the most-representative geographic coverage? In or near population centers? Where prior information suggests the levels are highest? Other?
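The difference between characteristic and representative site selection can be sketched as two ways of dividing a fixed number of monitors among strata. This is an illustrative sketch only: the stratum names, population counts, and function names are hypothetical, and it covers just the allocation step, not the random choice of actual sites within each stratum.

```python
def characteristic_allocation(strata, n_sites):
    """Spread sites evenly over the listed strata, ignoring stratum size."""
    base, extra = divmod(n_sites, len(strata))
    return {name: base + (1 if i < extra else 0)
            for i, name in enumerate(strata)}

def representative_allocation(strata, n_sites):
    """Allocate sites in proportion to stratum size (largest-remainder rounding)."""
    total = sum(strata.values())
    # Integer arithmetic keeps quotas and remainders exact.
    alloc = {name: n_sites * size // total for name, size in strata.items()}
    remainder = {name: n_sites * size % total for name, size in strata.items()}
    leftover = n_sites - sum(alloc.values())
    for name in sorted(strata, key=remainder.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical population counts by type of environment.
strata = {"urban": 600_000, "suburban": 300_000, "rural": 100_000}
print(characteristic_allocation(strata, 12))   # {'urban': 4, 'suburban': 4, 'rural': 4}
print(representative_allocation(strata, 12))   # {'urban': 7, 'suburban': 4, 'rural': 1}
```

With the same budget of 12 monitors, the characteristic design describes each type of environment equally well, whereas the representative design mirrors where the population actually lives and leaves the rural stratum with a single site.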

A data-collection system can be either a compendium or a systematic survey. That is, it can consist of individual studies with similar but separate research designs and measurements, or it can collect data from many sources in a standardized fashion. Neither system is necessarily identical between study years or cycles. That is, the pollutant or health effect assessed by a systematic survey and how it is assessed may vary from time to time, from place to place, or in other ways.

Data systems with the characteristics mentioned above are useful for evaluating the relation between environment and health. The usefulness of any data system is limited by its characteristics, so it is important to understand the sampling and assessment characteristics of each data source before using it. (See the discussion below on bridging environmental and health issues.)

Source of Pollutant

The development of systems to collect information about discharges of pollutants (apart from occupational exposures) is a primary objective of the Environmental Protection Agency (EPA) (table 5-2). Its role as the principal environmental-risk management agency in the federal government requires data on the relative contributions of sources and on control options. The primary objective of EPA's data systems is to provide information pertinent to regulation, so they are designed to be comprehensive with regard to polluters and pollutants that have been identified as toxic. Many pollution-related data systems have emphasized the characterization of pollutant sources, rather than the distribution and fate of pollutants in the environment or the potential exposures of humans. Representative data (as opposed to comprehensive data) have little utility in assessing compliance with regulation of individual pollution sources, though such data can be useful in assessing needs for and monitoring the success of management programs.

Other data-collection systems characterize the amounts of a pollutant at its source. These include production volumes and emission inventories. These systems, too, are not directly concerned with the fate of pollutants in the environment. Data systems that contain location- and time-specific information can be used in analytic models to estimate the transport and fate of pollutants in the environment. However, few data systems contain both time-integrated information (for instance, yearly, periodic, or daily data on emissions) and geographic information (for instance, production volume at a worksite).

Pollutant Concentrations in the Environment

The locations covered by most pollutant-concentration data systems are chosen to be characteristic rather than representative (table 5-3). Thus, most of the National Air Monitoring Stations or water-system quality sites of the National Stream Quality Accounting Network are in densely populated areas. These data systems contain detailed information on the location of the monitoring site, and samples are collected frequently enough to represent short periods. However, site selection is not based on detailed information about the population, the area, or the distribution of exposures among individuals, and the positioning of a station does not necessarily reflect the most likely route of human exposure. For instance, some air-monitoring stations are on the tops of buildings, and water-quality assessments are performed at the outflow pipes of water-treatment facilities, not at residential taps. Those locations might yield informative data on relative exposures, but may not represent either the distribution of concentrations in the environment or the actual exposures of people.

Pollutant-concentration data systems are probably underused for ecologic studies. These systems contain detailed geographic data, and, although few pollutants may be assessed, the analytic methods tend to be relatively stable over time, and exposure is generally measured at or integrated over short intervals.

Although most data on pollutant concentration are from monitoring systems, data from "response epidemiologic studies" are increasing. Response (or applied) epidemiologic studies are designed to respond quickly to expressed concerns regarding the potential for exposure or adverse health effects. Examples of response epidemiologic programs are the health-assessment studies of the Agency for Toxic Substances and Disease Registry (ATSDR) and the health-hazard evaluations of the National Institute for Occupational Safety and Health (NIOSH). In these studies, environmental concentrations of various pollutants are regularly assessed. However, study sites are often selected because the potential exposure is considered high or because of complaints about symptoms, so sites are not characteristic of ordinary population exposures. These studies do, however, attempt to characterize explicitly the scope or potential for human exposure in these presumably extreme settings, and they may contribute information on the relation between the environmental distribution of pollutants and human exposure or internal dose.

TABLE 5-2 Data-Collection Systems: Source of Pollutant

Data-System Name	Description	Primary Objective	Coverage/Sample + Design	Linking Data
Production Volume Inventories
Synthetic organic chemicals	Annual data on production and sales of synthetic organic chemicals produced in the United States	Monitoring	National totals: comprehensive	None
Site Inventories
National pollutant-discharge elimination system	Permits for worksites that specify effluent concentration limits, monitoring, and reporting requirements	Regulatory	National: comprehensive	Detailed geographic codes, river reach no., pollutant limits
National Priorities List	List of the toxic-waste sites determined to be of immediate concern for remediation	Regulatory	National: based on reports from regions, comprehensive	Detailed geographic codes, environmental concentrations
Emission Inventories
Toxic chemical release inventory	Annual estimates of releases from manufacturing facilities of minimal size and volume of chemicals per year	Informational	National: comprehensive for defined worksites	Detailed geographic codes
Integrated database	Information on spent fuel and radioactive-waste inventories for nuclear reactors, storage facilities, and mine tailings, among others	Informational	National: comprehensive for defined sites	Facility name
Sales Volumes
Agricultural chemical use	Database of information on sales for agricultural purposes of fertilizers and pesticides, among others	Monitoring	National: characteristic farm sample	None

Human Exposure

Data on human exposure (table 5-4) are the least developed of the classes considered here, and generalizations to larger groups of people or particularizations to specific exposure situations are often difficult and uncertain. Much detail is required to make this class of data useful, but few detailed data systems have been developed. Detailed information on human exposures generally requires the use of personal monitors or structured activity questionnaires, but these tools are expensive and time-consuming. Thus, most systems contain information on small populations chosen to be characteristic, but not necessarily representative, of the target population. However, the systems that do exist generally have substantial extent and detail over periods as long as several years.

More data of this class could be gathered by brief structured activity questionnaires in large surveys. Brief questionnaires might provide less-detailed information on human exposure patterns than personal monitors, but for a fixed total budget they can yield data on greater numbers of people. A combination of brief questionnaires for large numbers of people with validation and characterization of a subset using personal monitors might be even more useful.
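One way to picture the combined design is as a calibration: the subset that has both a questionnaire score and a personal-monitor reading is used to translate questionnaire scores into estimated exposures for everyone else. The sketch below assumes a straight-line relation fitted by ordinary least squares, which is an illustrative choice and not something the chapter prescribes; all numbers and names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical validation subset: questionnaire score and personal-monitor
# reading for the same five people (arbitrary units).
subset_scores = [1, 2, 3, 4, 5]
subset_monitor = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(subset_scores, subset_monitor)

# Apply the calibration to respondents who answered only the questionnaire.
survey_scores = [2, 2, 5, 3]
estimated_exposure = [intercept + slope * s for s in survey_scores]
print(round(intercept, 2), round(slope, 2))
print([round(e, 2) for e in estimated_exposure])
```

The estimates are only as good as the assumed relation and the size of the validation subset, so in practice the fit would be checked before being applied to the full survey.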

TABLE 5-3 Data-Collection Systems: Environmental Concentrations

Data-System Name	Description	Primary Objective	Coverage/Sample + Design	Linking Data
Monitoring Systems
Aerometric Information Retrieval System	Ambient concentrations, emissions, and compliance data for airborne criteria pollutants	Monitoring	National: air monitoring stations in urban areas	Detailed geographic codes/point-source identifiers
Microbiology and residue computer information system	Contaminant data from samples of meat and poultry at slaughtering establishments and from import shipments	Monitoring	National: random sampling of meat products	No information on distribution of food
Regulatory Systems
Permit-compliance system	Information for tracking the permit, compliance, and enforcement of permittees under the Clean Water Act	Regulatory, monitoring	National: comprehensive coverage of permittees	Detailed geographic codes, linked to Reach Pollutant Assessment System
Response Epidemiologic Studies
Health assessments (ATSDR)	ATSDR assessments to identify potential health concerns among populations living near National Priorities List sites	Regulatory	National: all National Priorities List sites	Detailed geographic codes, linked to environmental concentration data
Microenvironment Settings
Indoor air study	A pilot project to assess contaminants in indoor air	Research	Selected sites: not sampled to be representative	None

Internal Dose

Like information on human exposure, information on internal dose is rarely collected systematically (table 5-5). Occasional studies of biologic markers of specific agents in small, defined populations are plentiful, but broad and systematic collections of data on biologic markers in the general population are few, and surveys have yielded little information with which to characterize the subjects' exposures. Direct measures of internal dose are not usually included in health-assessment studies (conducted by ATSDR) or health-hazard evaluations (conducted by NIOSH), but these sources could be modified to include internal-dose assessments.

ATSDR conducts public-health assessments to determine where, and for whom, public-health actions should be undertaken (ATSDR, 1992). Each assessment characterizes the nature and extent of hazards and identifies communities where public-health actions are needed. However, the assessment is largely or entirely a compilation and analysis of existing data, which rarely include internal doses of toxicants in the population of concern. The health-assessment format does not require the collection of new data, for at least 2 reasons. First, the objective of the ATSDR health-assessment study is to determine whether there is a potential for human health effects, not to determine the extent or magnitude of actual exposure. Second, many internal-dose assessments are invasive; this decreases participation rates and increases opportunities for bias. However, when a health assessment indicates a potentially significant risk to human health, ATSDR is obliged under the 1986 Superfund Amendments and Reauthorization Act (SARA) to consider a registry as a followup (ATSDR, 1988a), and registrants may be invited to participate in biologic testing for markers of exposure or effect (NRC, 1989).

There is also a need for studies that characterize a population with a well-defined sampling scheme. The National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics, studies about 30,000 persons in the US population, chosen by random sampling (clustered, stratified, with deliberate oversampling of some subgroups). For study of general contaminants, such as lead or petrochemical oxidants, NHANES has been used as a data source. Given the followup capabilities of NHANES, detailed exposure data could be collected in subgroups of the entire sample that are identified as having received internal doses of particular interest. However, when the probability of exposure is small, the actual number of participants who could be studied to characterize the specific exposure would be small, possibly zero, and NHANES might not be sensitive enough. Specially designed surveys could be considered to characterize specific population exposures.

TABLE 5-4 Data-Collection Systems: Human Exposure

Data-System Name	Description	Primary Objective	Coverage/Sample + Design	Linking Data
Time-Activity Patterns and Personal Monitoring
Total-exposure-assessment methodology	Aims to develop methods to measure individual total exposure to toxic and carcinogenic chemicals	Research	Selected sites: characteristic of urban populations	None
Surveys
National Occupational Exposure Survey	Information on the probability of exposure to various chemicals based on job title	Research	National: characteristic of selected industries	Job-title codes
Registries
National Exposure Registry	Identification of individuals with verified exposure to selected chemicals, with followup studies to be performed on individuals in registry	Research	National: selection based on reports, not probability sampling	Job titles, personal histories, residential geographic codes

Health Status

Most health-status information systems are not developed for the primary purpose of studying environmental health (table 5-6). Vital records are collected for legal reasons, hospital-discharge and cost information [e.g., Medicare provider analysis and review (MEDPAR)] is collected for economic or administrative reasons, and the National Health Interview Survey (NHIS) and NHANES are conducted for general US population health-monitoring reasons. Several other registries and surveillance systems are maintained to identify important risk factors, including environmental exposures, but in a general reporting system the amount of information that can be collected for each exposure of interest is limited. The information in such programs as the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, which collects cancer incidence and survival data from approximately 10% of the US population, and the Birth Defects Monitoring Program (BDMP) of the National Center for Environmental Health is gathered by abstractors trained in coding and abstracting from hospital records. The National Exposure Registry being developed by ATSDR will be an exception to the absence of environmentally focused health-status data systems (see "ATSDR Exposure Registries" below).

The range and quality of exposure data from surveys could sometimes be expanded by data obtained as an addition to routine followup but at additional cost. NHANES, NHIS, SEER, and the US national vital-statistics system all have followup capabilities. Some kinds of information collected in a followup survey could be only qualitative, such as whether a person was exposed or not, and others will be subject to problems of poor memory and recall bias. Because of the mobile nature of the US population, collection of blood samples or examination data in a followup survey would require a mobile unit that would be used to assess only a few people in each location or would require transporting subjects to a central location. Tissue samples would be even more difficult to collect. These problems are magnified as the population size, geographic range, and length of followup period increase. Collecting good information is expensive.

Comprehensive collections of data on vital events, especially birth and death, are highly accurate and nearly complete because of legal requirements to document these events. Collecting data to adequate quality standards is harder for health characteristics that change from year to year or even from day to day (such as diseases, symptoms, and use of health services) or that are thought not severe enough to warrant professional care (such as symptoms or non-life-threatening diseases). For these, accurate and comprehensive surveillance of the US population is impossible. Many of the large federal surveys collect data on representative, rather than characteristic, samples of the US population, so even large geographic areas, such as states, may not provide reliable estimates of health status. Some systems, such as the Behavioral Risk Factor Surveillance System (of the National Center for Chronic Disease Prevention and Health Promotion), could be used to collect exposure and health-status information at the state level, but are limited to information that could be reliably solicited by the telephone interview method used.

Untapped resources that could provide systematic information on the health status of populations include routine medical examinations and school test-performance scores. However, when these data resources are not collected under common standards (as, for instance, a routine medical examination might be), the consistency of data may be poor.

TABLE 5-5 Data-Collection Systems: Internal Dose

Data-System Name	Description	Primary Objective	Coverage/Sample + Design	Linking Data
National Health and Nutrition Examination Survey	An examination survey of a probability sample of the US population, with some toxic-chemical concentrations measured in blood samples	Monitoring	National: representative of civilian, noninstitutionalized population	Residential geographic codes, personal histories, current job
National Human Adipose Tissue Survey	Concentrations of various toxic chemicals in adipose-tissue samples collected from autopsied cadavers and surgical patients	Monitoring	National: characteristic of urban populations	Geographic codes

Bridging Environment and Health

As noted, few data systems include information on both exposure and health. Even in those with both—such as NHANES, BDMP, or SEER—the focus is generally on health, and information on exposures is minimal. Health-status data systems may have self-reported information on only a few items related to possible exposures, such as occupation. Of the 3 surveys mentioned here, only NHANES has information on biologic markers based on blood or urine samples.

Unidimensional Studies

Maps of cancer mortality have been used to infer environmental or occupational exposures (Fraumeni, 1987; Pickle et al., 1987) and to identify populations at high risk for specific exposures (Rothenberg et al., 1990; CDC, 1990). Clusters or trends of diseases have been used to identify populations that might have had unknown toxic exposures (Deane et al., 1989; Edmonds and James, 1990; Anto et al., 1989) and that could then be studied more intensively. Conversely, emissions data from the Toxic Release Inventory, data on ambient concentrations of various pollutants (EPA, 1991), and lists of toxic-waste sites (Commission for Racial Justice, United Church of Christ, 1987) have been used to identify populations of concern for high exposures. In those descriptive studies, information on population exposure is usually inferential and based on proximity to a source of pollution.
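Proximity-based exposure inference of the kind used in those descriptive studies can be sketched as a distance rule: a residence is treated as potentially exposed if it lies within some radius of a known source. The code below is a hypothetical illustration; the coordinates, the 3-km radius, and the function names are assumptions, and the great-circle (haversine) distance on a spherical Earth stands in for whatever distance measure a real study would use.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def classify_by_proximity(residences, source, radius_km):
    """Label each residence 'near' or 'far' relative to a pollution source."""
    src_lat, src_lon = source
    return {
        name: "near" if haversine_km(lat, lon, src_lat, src_lon) <= radius_km else "far"
        for name, (lat, lon) in residences.items()
    }

# Hypothetical source location and residences.
source = (40.00, -75.00)
residences = {"A": (40.01, -75.00), "B": (40.50, -75.00)}
print(classify_by_proximity(residences, source, radius_km=3.0))
```

A label produced this way is a statement about distance, not about measured exposure or dose, which is exactly the inferential limitation noted above.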

TABLE 5-6 Data-Collection Systems: Health Status

Data-System Name	Description	Primary Objective	Coverage/Sample + Design	Linking Data
Vital Records
National Vital Statistics Program	Data from birth, death, and marriage records	Monitoring	National: comprehensive	Residential geographic codes, job title
National Surveys
National Health and Nutrition Examination Survey	Survey of a probability sample of the US population that includes interview, examination, and physiologic testing	Monitoring	National: represents noninstitutionalized population	Residential geographic codes, personal history, job title
National Health Interview Survey	Interview survey of a probability sample of the US population that includes rotating special topics, including knowledge of risk factors, such as radon, and occupational chemical exposures	Monitoring	National: represents noninstitutionalized population	Residential geographic codes, job title
Surveillance Systems
Birth Defects Monitoring Program	Information sent by participating hospitals on birth defects diagnosed and recorded in the newborn period	Monitoring	National: covers only participating hospitals	None
Surveillance, Epidemiologic, and End Results Program	Demographic and diagnostic information on patients identified as having some form of cancer	Monitoring	Participating geographic areas	Broad geographic codes; fine detail available in special studies
Response Epidemiologic Studies
Epidemiologic investigations	Centers for Disease Control and ATSDR; these studies in response to public concerns of potential exposures and health effects often include observational data on health status	Health-hazard detection	National: compendium, based on reports	None

Because chance alone will create clusters (Rothman, 1990; Wagener, 1990), the observation of a cluster, even one that is quite striking, often does not signal an increased probability of illness because of some risk factor. Rather, the cluster is a rare, but predictable, event in a population that is not experiencing a change in the probability of an illness. The 1-in-1,000 event will occur, by definition, one time in 1,000, and if many thousands of clusters could be defined (by time interval, geographic location, specific health end point, etc.), then the observation of even multiple clusters may have little general meaning. Because of the great number of ways a population and an outcome can be subdivided, one is nearly certain to find that some disease is more common in some segment than in others, with a low p value.
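
The arithmetic behind the 1-in-1,000 argument can be made concrete. The following is a minimal sketch in Python, not part of the original analysis; it assumes that the candidate clusters are statistically independent, which real subdivisions by time, place, and end point generally are not, and the function names are illustrative only.

```python
def prob_at_least_one_false_cluster(n_comparisons: int, alpha: float) -> float:
    """Chance that at least one of n independent comparisons reaches the
    alpha threshold when no real change in risk exists anywhere."""
    if n_comparisons < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("need n_comparisons >= 0 and 0 <= alpha <= 1")
    return 1.0 - (1.0 - alpha) ** n_comparisons


def expected_false_clusters(n_comparisons: int, alpha: float) -> float:
    """Expected number of comparisons that reach the threshold by chance."""
    return n_comparisons * alpha


if __name__ == "__main__":
    # Assumed numbers of definable clusters, chosen only for illustration.
    for n in (1, 1_000, 5_000):
        p = prob_at_least_one_false_cluster(n, alpha=0.001)
        e = expected_false_clusters(n, alpha=0.001)
        print(f"{n:>5} comparisons: P(at least one) = {p:.3f}, expected = {e:.1f}")
```

Under these assumptions, a single 1-in-1,000 event is unlikely in any one comparison but becomes more likely than not once about a thousand comparisons are possible.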

Linking Data Systems

Linking several data systems can improve information on both exposure and health.

Studies with Information on Environmental Concentrations, Health Status, and Nonenvironmental Risk Factors:

An environmental exposure is rarely the only potentially important determinant of risk, so a useful study must almost always include data on various other risk factors, such as certain behaviors, as well as the exposure and health status of the subjects. Such studies would involve linking data that could be used to calculate potential exposure to an environmental factor with subject-specific information on health status and behavioral risk factors. The following studies illustrate methods that could be used more widely.

Schwartz (1989) used health data from NHANES II and air-pollution measurements from the Storage and Retrieval of Aerometric Data (SAROAD) system, now referred to as the Aerometric Information Retrieval System, to evaluate the relation between lung function and chronic air pollution. Data on individual subjects were paired with air-pollution measurements based on the census tracts of the subjects' residences and on the locations of monitoring stations within 10 miles. Lung function varies with sex, age, and body size, estimated in this case by height and body-mass index. Smoking and respiratory conditions also have major effects on lung function. Information on all these risk factors was used in a multivariate analysis.
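
A distance-based pairing of this kind might be sketched as follows. This is an illustration only, not the procedure of Schwartz (1989): it assumes that each residence is represented by a census-tract centroid, that monitors within 10 miles are averaged with equal weight, and that the coordinates and readings shown are invented.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, used for great-circle distance


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def mean_exposure_within(residence, monitors, max_miles=10.0):
    """Average reading of all monitors within max_miles of the residence.

    residence is a (lat, lon) pair; returns None when no monitor is in range,
    so that the subject can be excluded rather than assigned a guess.
    """
    nearby = [
        m["value"]
        for m in monitors
        if haversine_miles(residence[0], residence[1], m["lat"], m["lon"]) <= max_miles
    ]
    return sum(nearby) / len(nearby) if nearby else None


# Hypothetical monitoring stations (values in arbitrary concentration units).
example_monitors = [
    {"lat": 40.05, "lon": -75.0, "value": 30.0},  # about 3.5 miles away
    {"lat": 40.00, "lon": -75.1, "value": 50.0},  # about 5.3 miles away
    {"lat": 41.00, "lon": -75.0, "value": 90.0},  # about 69 miles away, excluded
]

if __name__ == "__main__":
    print(mean_exposure_within((40.0, -75.0), example_monitors))
```

The resulting exposure value would then enter a multivariate model alongside sex, age, height, body-mass index, smoking, and respiratory conditions.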

Ostro and Rothschild (1989) linked self-reported information on acute respiratory infections assessed in the NHIS to 2-week average air-pollution data based on metropolitan statistical areas from SAROAD. Information on other variables such as age, sex, race, smoking status, and chronic conditions was incorporated.

While outdoor monitoring provides only imprecise measures of personal exposure to air pollution, important relations may be identified if data from the monitoring stations are sufficiently correlated with exposure of the subjects. Relations might then be studied in greater detail in special studies. Broad correlations also provide indications of the impact of changes in the environment, as measured at monitoring stations, on changes in health status.

Studies with Information on Environmental Concentrations and Health Status, but Not Other Risk Factors:

Many studies that have examined correlations between measurements and health assessments have failed to adjust for other risk factors, environmental or otherwise, often because relevant data do not exist.

Linkage with mortality data is common and generally straightforward because of the virtually complete ascertainment of deaths and the use of highly developed, standardized coding systems. Air-pollution data have been linked to mortality data on a local basis in numerous cross-sectional studies (Boyd, 1960; Buck and Wicken, 1967; Glasser and Greenburg, 1971; Lave and Seskin, 1973). Time-series analyses may require control for other variables, such as seasonality and autocorrelation (Ozkaynak et al., 1986; Mazumdar et al., 1982; Schwartz and Marcus, 1990). These approaches are ecologic; that is, a measure of the distribution of pollutant concentrations in an area is correlated with a measure of the distribution of health status in the area such as death rates.

An alternative is to study health measures in cohorts of known exposure status, such as mortality in occupational cohorts (Fraser et al., 1982; Wingren et al., 1991). The subjects can be persons at risk of exposure to toxic materials because they live near toxic-waste sites or for other reasons, e.g., inclusion in the National Exposure Registry of ATSDR.

Other assessments of the health of a population are based on hospital admissions, emergency-room visits, and calls for ambulances. Pope (1991) used regression methods to study the association between hospital admissions for respiratory conditions and measurements of PM₁₀ in local areas, including control for temperatures based on month of admission. Although characteristics of individual subjects were not available, Pope noted strong associations between indicators of respiratory health, particulate pollution, and the operation of a nearby steel mill.

Studies with Information on Health Status and Nonconcentration Measures of Environmental Status:

The studies discussed above used ambient-concentration data from monitoring stations mainly as a surrogate for the probability that personal exposures were sufficient to affect health, but models can use other sources of environmental data. For instance, Frank et al. (1986) used the NHIS to evaluate the relation between chronic cardiovascular illness and exposure to carbon monoxide in the workplace. Information on occupational exposure from the National Occupational Hazards Survey was used to estimate the probability that people in specified jobs were exposed to carbon monoxide. Thus, the exposure data did not include direct measures. Information for individual subjects was linked to exposure probabilities on the basis of current occupation and humidity, based on county of residence. Additional risk factors evaluated in multivariate analyses included age, obesity, sex, demographic variables, and smoking status. Because information on health status is predicated on a subject's reporting that a medical professional had diagnosed a condition, Frank et al. incorporated economic variables—such as availability of health care and ability to afford health care—into the analyses.
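
Linking survey respondents to exposure probabilities by occupation amounts to a lookup in a job-exposure matrix. The sketch below shows the idea under stated assumptions: the occupations and probabilities are invented for illustration and are not taken from the National Occupational Hazards Survey or from Frank et al. (1986), and the field names are hypothetical.

```python
# Hypothetical job-exposure matrix: probability that a worker in each
# occupation is exposed to carbon monoxide. Values are invented.
CO_EXPOSURE_PROBABILITY = {
    "toll collector": 0.80,
    "auto mechanic": 0.60,
    "office clerk": 0.05,
}


def attach_exposure(respondents, job_exposure_matrix, default=None):
    """Return copies of respondent records with an exposure probability added.

    Occupations missing from the matrix receive `default`, which keeps
    unlinked records visible instead of silently treating them as unexposed.
    """
    linked = []
    for person in respondents:
        record = dict(person)  # copy so the input records are not modified
        record["p_co_exposed"] = job_exposure_matrix.get(person["occupation"], default)
        linked.append(record)
    return linked


if __name__ == "__main__":
    survey = [
        {"id": 1, "occupation": "auto mechanic", "smoker": True},
        {"id": 2, "occupation": "florist", "smoker": False},
    ]
    for row in attach_exposure(survey, CO_EXPOSURE_PROBABILITY):
        print(row)
```

Because the linked value is a probability for a job class rather than a measurement on the person, it carries the misclassification problems discussed in the text.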

The dichotomous classification of persons as potentially exposed or not is occasionally informative, as in the case of occupational exposures. However, without information on the extent of exposure, any health effects are likely to be underestimated. For instance, Lynch et al. (1989) demonstrated that misclassification of chlorination exposure could obscure a possible relation between exposure and risk of urinary-bladder cancer.
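
The attenuation produced by exposure misclassification can be illustrated with a short calculation. The sketch below uses the standard relation between true and observed exposure prevalence for a dichotomous exposure measured with a given sensitivity and specificity, and it assumes that the errors are the same for cases and controls (nondifferential). It is not the analysis of Lynch et al. (1989), and the prevalences and error rates are invented.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio from exposure prevalence among cases and among controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


def observed_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Share classified as exposed: true positives plus false positives."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def attenuated_odds_ratio(p_cases, p_controls, sensitivity, specificity):
    """Odds ratio expected after applying the same classification errors
    to cases and controls (nondifferential misclassification)."""
    return odds_ratio(
        observed_prevalence(p_cases, sensitivity, specificity),
        observed_prevalence(p_controls, sensitivity, specificity),
    )


if __name__ == "__main__":
    # Assumed true exposure prevalence: 40% of cases, 25% of controls.
    print("true OR:    ", odds_ratio(0.40, 0.25))
    # Assumed imperfect exposure classification.
    print("observed OR:", attenuated_odds_ratio(0.40, 0.25, sensitivity=0.7, specificity=0.8))
```

With these assumed inputs the observed odds ratio falls between 1 and the true value, which is the sense in which a real relation can be obscured.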

Other ambient measures of environmental status can also be informative. During NHANES II, Mahaffey et al. (1982) determined blood lead concentrations. Further analyses indicated that average exposure levels changed during the course of the study (1976 through 1980). Additional analyses quantified the relation between various sources of lead and changes in blood lead concentrations, especially in children (EPA, 1985; ATSDR, 1988b). Exposures to leaded paint, dietary lead, and leaded gasoline were considered, and only the change in total lead used in gasoline production was correlated with the change in blood lead (Annest et al., 1983). The correlation was particularly strong among the youngest children.

Monitoring of Environmental Health Effects

Monitoring is the continuing and systematic collection, analysis, and interpretation of data. If monitoring is linked to specific programs designed to prevent or control health outcomes, the activity is better termed surveillance. This section focuses on monitoring of health status.

Many of the causes of chronic diseases are unknown, though it is clear that multiple factors affect many specific health end points, including workplace, diet, place of residence, recreational activities, and ancestry. Sorting out the relative importance of causes of disease in humans remains daunting, given the multitude of exposures and uncontrollable factors that can affect health. It is suspected that some noncommunicable diseases (e.g., some cancers and diabetes) are increasing in frequency because of unknown factors in the environment. It is desirable, therefore, to monitor the occurrence of such diseases to provide a kind of early-warning system that would enable causes to be identified more readily than in the past.

Such monitoring systems must be designed in light of the problems of determining the contribution of environmental factors when some, but not all, causes are known. The problems include

Difficulties in exposure identification and estimation.
Followup and latency (time from exposure to the appearance of disease).
Long-term nature of chronic diseases and the repeated use of health-care resources, such as laboratories and hospitals.
Size of the affected population.
Variable and imprecise symptomatology of some conditions.

To address those problems, monitoring systems have to be

Large, i.e., cover a substantial population.
Of long duration.
Capable of producing information that can be combined with similar data systems, so as to increase sample sizes; this requires the collection of data in a standardized fashion.
Capable of being linked to other data sources, e.g., exposure data, which would require personal identifiers and provisions for confidentiality.

A set of sentinel health events or health-status indicators needs to be identified for use in exposure, internal-dose, and health-status data systems. Lists for this purpose have usually focused on disease or syndrome end points (Rothwell et al., 1991; DHHS, 1991). For each health end point to be monitored, it is necessary to establish precise case definitions and gather baseline information on incidence and prevalence. The problems of definition and baseline determination are more complex if symptoms, rather than diagnosable disease, are considered.

Some monitoring systems for special purposes must be set up de novo, as has been the case for many cancer registries, but others can rely at least in part on existing data systems. Options include

Special surveys (not designed for long-term monitoring, although surveys can be repeated periodically).
Disease-reporting systems focused on incidence (as exist for several infectious diseases).
Capture-mark-recapture systems (see below).
Projects that link existing disease and exposure registries.
Special surveillance mechanisms based on health-maintenance organizations or health-insurance systems, such as Medicare.
Special record-linkage systems, such as that pioneered in Oxford, England, or under development in Manitoba, Canada. The Manitoba record-linkage system will evaluate the extent to which census data can be linked to the provincial health-insurance scheme to provide health information. This system, built on pioneering work by Roos et al. (1979), required a special agreement between Statistics Canada and the province of Manitoba. Strict conditions of confidentiality will be observed, and no individually identifiable information will be released (National Task Force on Health Information, 1991).

Several research methods that have been or could be used in special environmental-epidemiology studies could be extended to monitoring systems.

Followup (Cohort) Studies

A common example of a followup study is the study of health effects among individuals living near a specific toxic-waste dump. Such studies are difficult and results are often uncertain because the number of persons involved is seldom large enough to provide statistically powerful results. As discussed in chapter 2, a hazardous-waste site may contain many different toxic chemicals, and it may be difficult to link a specific chemical to a specific disease. Further, of the large numbers of different chemicals, few have been characterized with respect to their ability to induce a specific disease, so such studies would have to be largely planned as "fishing expeditions" with extensive requirements for data collection and individual surveillance and with multiple end points. The multiple comparisons are then likely to produce some statistically significant associations by chance alone, so interpretation would be difficult.

Epidemiologic research is often expensive and time-consuming, especially where longitudinal studies of large populations are involved, so there is reason to consider "piggy-backing" needed research on other kinds of studies. For example, prospective cohort studies that are not directly related to the environment could possibly be inexpensively modified to collect additional data relevant to many of the objectives of environmental epidemiology. If this were to be done in a coordinated way for several such cohorts, a combined analysis might be informative.

Repeat Cross-Sectional Surveys

Cross-sectional surveys to determine disease prevalence (or cumulated incidence) could be repeated (e.g., every 5 years) in populations known to have been exposed to environmental contaminants. Although less valuable in some ways than long-term followup of defined cohorts, because those who move away from the area (possibly because of known exposure or illness) would not be followed, they offer an alternative when resources for a large cohort study are not available.

Death-Certificate Diagnoses Linked to Geographic Information

Death certificates have been used to determine whether variations in an acute, fatal disease within and between geographic areas are consistent with exposures to an environmental agent. Death certificates are especially useful for diseases with a short course, a single cause, and a high case-fatality rate, such as infectious hepatitis. They are less useful for the identification of environmental causes of chronic and multifactorial disease.

Case-Control Studies

A difficulty with case-control studies is that the assessment of exposure involves extrapolation to the past. Replication of the findings in other settings is usually needed to provide the evidence required to infer causality.

Active Surveillance of Emergency Rooms and Hospital Admissions

Changes in the frequency of emergency visits or hospital admissions for specific conditions can provide information about acute environmental insults. On a long-term basis, hospital admission rates measure the prevalence of serious disease in the community, though a major disadvantage is the inability of nearly all existing systems to distinguish between first and repeat admissions for the same condition. This is a strong argument for retaining personal identifiers in the basic data.
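
The value of personal identifiers can be shown with a small sketch. It assumes that each admission record carries a hypothetical person identifier and a condition label and that records arrive in chronological order; no existing hospital data system is implied.

```python
def count_first_and_repeat(admissions):
    """Split admissions into first and repeat admissions per person and condition.

    admissions: iterable of (person_id, condition) pairs in chronological order.
    Returns a (first_admissions, repeat_admissions) pair of counts.
    """
    seen = set()
    first = repeat = 0
    for person_id, condition in admissions:
        key = (person_id, condition)
        if key in seen:
            repeat += 1  # same person readmitted for the same condition
        else:
            seen.add(key)
            first += 1
    return first, repeat


if __name__ == "__main__":
    records = [
        ("A", "asthma"),
        ("B", "asthma"),
        ("A", "asthma"),   # repeat admission for person A
        ("A", "bronchitis"),
    ]
    print(count_first_and_repeat(records))
```

Without the identifier, all four records would be counted as separate episodes, and the number of affected persons would be overstated.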

Outbreak Investigations

After an apparent cluster of cases is identified, investigators may go into the community to identify potential reasons for the cluster. Such investigations are often necessary to allay public anxiety, but typically they have little scientific value. It has been difficult to find direct links of environmental agents to the risk of disease, as many clusters are a result of chance, rather than an identifiable environmental agent (Rothman, 1990).

Poison centers have identified acute episodes of environmental contamination (e.g., Goldman et al., 1990). This is one type of outbreak investigation, with the problems just discussed, though with the advantage that the symptoms can sometimes be related to exposure to a specific substance.

Health Care Financing Administration Data

The Health Care Financing Administration collects data on nearly all chargeable episodes of disease in the US population aged 65 years and over. This resource is largely unexplored as a tool for disease monitoring, in contrast with Canada, where the provincial systems cover all ages and are used for that purpose. Disadvantages in the United States are the restriction to the elderly and the fact that the elderly, especially those of higher socioeconomic status, often move away from the area where they spent most of their lives. The full impact of environmental factors on many chronic diseases may not be expressed until older ages. If diseases occurring in older members of the population can be linked to identified episodes of past pollution, they could provide information needed to prevent future disease in those who are now young, but such links are hard to discern in a mobile population.

For many chronic diseases, the time of first diagnosis may not be important. Rather, a measure of cumulative incidence (approximated by prevalence in nonfatal conditions) could be just as informative, thus increasing the usefulness of data that may mark the presence of chronic disease but not the date of diagnosis.

ATSDR Exposure Registries

The Agency for Toxic Substances and Disease Registry was created by Congress through the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA) to address possible public-health effects of environmental exposures to hazardous substances from waste sites and chemical spills (NRC, 1991b). CERCLA requires ATSDR, in cooperation with the states, to establish national registries of persons who have been exposed to hazardous substances and later develop serious disease or illness (ATSDR, 1988a). While disease registries have not yet been established, the National Exposure Registry is further developed and is the focus of this discussion.

The stated purpose of ATSDR's Exposure Registry is "to aid in assessing long-term health consequences of exposure to Superfund-related hazardous substances" (ATSDR, 1988a, p. 7). To facilitate epidemiologic research, ATSDR intends to design and create its data systems for both hypothesis generation (identifying possible adverse health outcomes) and hypothesis testing (of suspected adverse health outcomes). Other goals are to facilitate state and federal health-surveillance programs and to provide information that can be used to assess the effects of an exposure on a population.

ATSDR's data system will contain subregistries, each created in 4 phases. First, ATSDR narrows down potential sites for inclusion to a workable number using criteria similar to those presented in table 5-7. Second, site files are requested from EPA, the US Geological Survey, and agency personnel associated with a remediation project. At this time, additional secondary criteria are evaluated, including assessment of participation, existing biomonitoring data, number of secondary or potential confounding contaminants, and reported health problems. During the third phase, site visits are conducted with local and state departments of health and environment and other interested officials. Affected neighborhoods are inspected, and special characteristics, including susceptible or transient populations, are evaluated. Finally, on the basis of the above, a document presenting the rationale for selecting the site is prepared. The document is reviewed by ATSDR and, according to the resources available, the site is either approved or disapproved for establishment of an exposure registry. Final site selection may also depend on the size of the population needed for the subregistry (JeAnne Burg, ATSDR, personal communication, 1993).

An individual is said to be exposed when 3 conditions are met:

•	A Contaminated Source: Valid information indicates the presence of the contaminant(s) of interest in air, drinking water, soil, food chain, or surface water.

•	A Route of Transmission: Evidence for that individual of one or more routes of entry (ingestion, inhalation, topical, or other parenteral routes) exists.

•	Indicated Transmission: The contaminant traveled from the source via an appropriate route of entry to the body (ATSDR, 1988a).

TABLE 5-7 ATSDR Criteria for Setting Priorities for Sites

Factor	Level
	Less Concern	Most Concern
Level of primary contamination	Below or close to standard (if known)^a	Exceeds standard^a
Toxicity of primary and secondary contaminant	Not a recognized human carcinogen, teratogen, neurotoxin, immunotoxin, etc., at levels present	Recognized as a human carcinogen, teratogen, neurotoxin, immunotoxin, etc., at levels present
Size of potentially exposed population	Small (<10 persons)	Large (>100 persons)
Current potential exposure	No	Yes
Past potential exposure (length)	Short-term (<1 year)	Long-term (>10 years)
Other considerations: particularly susceptible population and biomonitoring data indicating body burden
^a The standard is the level specified for that pathway. Source: ATSDR, 1988a.

The data are collected by telephone, direct mailing, personal interviewing, or a combination of these. The method selected and the questionnaire or form are tailored to the specific subregistry. There will be a 6-month followup letter to confirm participation and address information and then annual or biennial updates of the core questionnaire via telephone interview (ATSDR, 1992). Participants may withdraw at any time.

The National Exposure Registry will contain multiple subregistries. The registry is to be maintained indefinitely, but the various subregistries may have a limited life (e.g., 5, 10, or 30 years) and a definite termination rationale (ATSDR, 1988a). The first 3 subregistries under development are for trichloroethylene (the most frequently encountered substance at proposed National Priority List sites; NRC, 1991b), benzene, and dioxin.

Approaches Used in Infectious Disease Epidemiology

A major reason for the success of infectious-disease control has been the establishment of broad, effective surveillance systems to monitor patterns of disease. These surveillance systems are used to identify geographic areas having increases in incidence of disease, develop data to generate hypotheses about etiology, and test preventive health-control measures (Langmuir, 1963; Thacker et al., 1983; Thacker and Berkelman, 1988). With due note of the different time course and multifactorial etiology of many noncommunicable diseases, a similar surveillance mechanism could help to elucidate the contribution of environmental factors to disease.

Currently Available Incidence Systems

Many kinds of disease-incidence data systems exist. Communicable-disease surveillance has been established through public-health departments by making such diseases reportable. Public-health surveillance is a passive system in that physicians, hospitals, schools, laboratories, etc., are given the responsibility to report certain designated diseases to public-health units. In jurisdictions where a noncommunicable disease has been made reportable, such as cancer in many countries, mandatory reporting combined with imaginative use of existing data-collection systems has resulted in nearly complete national coverage. Mandatory reporting is not always necessary, however; the SEER program in the United States is voluntary but has virtually complete coverage. Some have even argued that making reporting mandatory might decrease the completeness and quality of reporting.

There are advantages and disadvantages in relying on passive public-health surveillance. The accuracy of case diagnosis may be poorer than for disease registries, because registries often use standardized protocols for diagnosis and the initial reports are prepared by a relatively small number of persons who have been trained to use them (cancer registrars, record-room librarians, etc.). The staff of registries is generally full time and dedicated to its purpose. In contrast, the accuracy and degree of ascertainment of public-health surveillance are low, because such surveillance depends heavily on the cooperation of large numbers of people who have other responsibilities and may rarely see the health outcome of interest. With both, there may be a tendency for bias, in that some socioeconomic groups may use the medical-care system less completely than other groups. However, the cost for the identification of cases is much lower for public-health surveillance than for registries of noncommunicable diseases.

It is important to have broad geographic coverage. With broad coverage, areas of high incidence may be more readily identified, and, by evaluation of overall incidence patterns, it may be easier to determine whether clusters are merely the result of chance. Further, most environmental exposures are mixed, and multiple health outcomes are likely. For example, associations of air pollution with cancer, asthma, skin disease, and acute and nonacute respiratory symptoms are all plausible; therefore, concurrent monitoring of multiple disease outcomes is desirable. Even so, the amount of disease attributable to specific foci of environmental contamination is likely to be low. Thus, the effect of multiple other causes could easily overwhelm those from the environment, the "signal/noise ratio" being too low.

Public-health surveillance systems have not been used to any great extent for the monitoring of noncommunicable diseases that may be related to environmental exposures. Although such systems are generally inexpensive and have broad coverage, they can be incomplete, inaccurate, and misleading. With passive reporting, it has been estimated that only 10 to 50% of cases of serious communicable diseases are reported (Thacker and Berkelman, 1988). For communicable diseases, this is not a problem, as "outbreaks" of disease and changes in disease rates over time may represent a quadrupling in incidence in a very short period. In contrast, a "rapid" rise in a noncommunicable disease may represent less than a 10% increase in incidence over a period of years or decades. Passive reporting systems as currently constituted would not be able to identify such changes reliably; indeed, much-larger fluctuations would frequently occur through variation in reporting or by chance.

The ideal for surveillance of noncommunicable diseases is accurate incidence data across broad areas, long periods, and many diseases. It has been assumed that accurate incidence data will require virtually complete ascertainment of new cases within defined communities. However, society cannot afford registration systems for all diseases that may have environmental causes. Further, the basic underlying assumption can be challenged. Although complete ascertainment of cases is ideal for accurate incidence estimates, if the degree of ascertainment in relevant population segments is known, it can be taken into account in estimating incidence. Thus, when complete registration is not feasible, disease reporting to designated public-health departments may be a means of providing data sufficiently accurate for the early-warning mechanisms that many members of the public are now demanding. An example is salmonella infections, for which reporting appears to capture only about 1% of all cases (in part because many victims do not seek medical attention), yet that is sufficient to identify many food-related outbreaks.
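The adjustment for known ascertainment can be sketched as follows. This is a minimal illustration, assuming that the ascertainment fraction is known for each population segment and that reported cases can simply be divided by that fraction; the segment names and all numbers are hypothetical.

```python
def adjusted_incidence(segments, per=100_000):
    """Estimate incidence after correcting each segment for incomplete ascertainment.

    Each segment is a dict with 'reported' cases, 'ascertainment' (fraction of
    true cases that are reported, between 0 and 1), and 'population'.
    """
    estimated_cases = 0.0
    total_population = 0
    for seg in segments:
        if not 0 < seg["ascertainment"] <= 1:
            raise ValueError("ascertainment must be in (0, 1]")
        estimated_cases += seg["reported"] / seg["ascertainment"]
        total_population += seg["population"]
    return estimated_cases / total_population * per

# Hypothetical segments with different degrees of ascertainment.
segments = [
    {"name": "urban", "reported": 30, "ascertainment": 0.50, "population": 200_000},
    {"name": "rural", "reported": 10, "ascertainment": 0.25, "population": 100_000},
]

crude_rate = sum(s["reported"] for s in segments) / sum(s["population"] for s in segments) * 100_000
adjusted_rate = adjusted_incidence(segments)
print(f"crude reported rate: {crude_rate:.1f} per 100,000")
print(f"ascertainment-adjusted rate: {adjusted_rate:.1f} per 100,000")
```

The correction is only as good as the ascertainment estimates; when the fraction is very small, as in the salmonella example, small errors in it translate into large errors in the estimated incidence.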

Potential Difficulties with Disease Monitoring

Although disease reporting is relatively inexpensive, costs of reporting, analyzing, and interpreting the data may be substantial. There are probably too many end points and too many random departures from the baseline to apply conventional probability-testing methods slavishly. Time, money, and effort must be spent on chasing down each false lead, and this cost must be set against the value of the real leads that will be confirmed. Routine disease monitoring will not overcome the issues related to small populations exposed to many environmental hazards or, conversely, wide and almost uniform exposures of large populations. High relative risks could remain undetectable in the first instance, and large attributable risks in the second. The issues of confounders, biased reporting, and confidentiality are not solved by routine monitoring systems, but they may be brought into heightened visibility.
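The scale of the false-lead problem can be made concrete with simple arithmetic. The sketch below assumes independent tests, no real effects, and a conventional 5% significance level; the counts of end points and areas are hypothetical.

```python
def expected_false_leads(n_end_points, n_areas, alpha=0.05):
    """Return (number of tests, expected false signals, P(at least one false signal)).

    Assumes every end point is tested in every area, tests are independent,
    and no real effect is present.
    """
    n_tests = n_end_points * n_areas
    expected = n_tests * alpha
    p_any = 1 - (1 - alpha) ** n_tests
    return n_tests, expected, p_any

# Hypothetical monitoring program: 20 end points in each of 50 areas.
n_tests, expected, p_any = expected_false_leads(20, 50)
print(f"{n_tests} tests, about {expected:.0f} false leads expected")
print(f"probability of at least one false lead: {p_any:.3f}")
```

Each of those expected false leads would have to be investigated, which is the cost the paragraph above asks to be weighed against the value of confirmed leads.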

Routine monitoring is proposed as one mechanism by which to evaluate the causes of diseases of unknown etiology and to facilitate the detection of trends in diseases of environmental origin. However, this approach will often need to be supplemented and strengthened by other approaches.

Confidentiality and Needs for Personal Identifiers

The study of interactions between environment and health poses questions of confidentiality and the need for personal identifiers. Over time, every person is exposed to a variety of potentially adverse environmental conditions and may experience a variety of adverse health effects. To evaluate complex interactions between environmental exposures and health effects, detailed and extensive information on individual subjects is often needed, and populations may have to be followed for long periods. Linkages among data systems increase the difficulty of protecting the confidentiality of information.

The United States has two federal laws to protect the privacy of the individual from excessive government intrusion. One deals primarily with the collection and transfer of information, the other with its release. The latter, the Freedom of Information Act (FOIA) (5 U.S.C. 552), which took effect in 1967 and was amended in 1974, requires federal agencies to make most kinds of government records available to persons who request them. Health-data systems have not generally been seriously affected by FOIA, as it specifically exempts "personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy" (CDC, 1984, p. 6).

It is the Privacy Act of 1974 (5 U.S.C. 552a) that most affects health-information databases. The Privacy Act strictly limits what information government agencies can demand from the public and provides for legal protection of and safeguards on the use of personally identifiable information maintained in federal records systems. Congress has expressed some concern that the computerized databases in use today have outpaced the ability of individuals to protect their privacy when using the mechanisms set up to deal with the predominantly paper-record systems in use in 1974 (OTA, 1986). Specifically, the creation of record linkages between databases can run afoul of the Privacy Act, which states that information may not be used for any purpose other than the purpose for which it was supplied (CDC, 1984). This can cause problems when researchers attempt creative and innovative linkages between databases that were intended for other purposes and do not have formal releases from the individuals to use their information for the new purpose.

ATSDR's National Exposure Registry, for example, is subject to the Privacy Act. Although the registry is generally prohibited from disclosing personal information without written consent (which is routinely collected from participants through an informed-consent form), the Privacy Act does allow registry data to be released without consent in the following circumstances:

To ATSDR personnel who maintain the registry.
If required by FOIA (personal identifiers removed).
For routine use. A routine use is defined as the use of a record for a purpose that is compatible with the purpose for which it was collected.
To a recipient who has provided advance written assurance that the information released will be used solely for statistical research or as a reporting record. ATSDR requires that anyone seeking registry data for research purposes submit a study protocol for review to an agency review panel that will in turn make recommendations to ATSDR. The final decision rests with ATSDR.
To a person pursuant to a showing of compelling circumstances affecting the health or safety of an individual if, upon disclosure to the requester, notification is transmitted to the last known address of the individual.
To Congress or the Comptroller General.
Pursuant to the order of a court of competent jurisdiction (ATSDR, 1988a, p. 31).

The ATSDR registry has the advantage of having been started well into the computer age and thus of being able to incorporate confidentiality protections into its system design. For older and other databases, however, the following specific issues must be considered:

Can mechanisms be developed by which investigators can augment continuing longitudinal studies with new assessments in a timely fashion, perhaps by having the survey staff administer the tests so that more-detailed information could be provided to the investigator without risking the privacy of the subjects?
Can statistical projects be established, whereby the information from several surveys could be augmented in a specific population? For instance, could a specific area be identified where the ambient concentrations of various pollutants are measured in more detail, the exposures of representative members of its population are evaluated in detail, and members of its population are subjected to internal dose assessments and health-status assessments? The registries of ATSDR are an example of such a mechanism, although they are not usually geographically circumscribed.
Can the data systems of different agencies be linked, with return of the linked data to both agencies at the same level of detail as was provided?
Should all data systems that obtain information on individual subjects collect personal identifiers in a consistent way and maintain them in a confidential data file for use later?

To maintain the confidentiality of data systems, statistical masks should be developed by agencies to protect the confidentiality of the data without distorting the relations among individual data items. Making linked data available on public-use data tapes is generally preferable to the agencies' releasing data and will help to maintain confidentiality. Linked data could be made available in a variety of formats (e.g., all demographic variables present but little geographic detail, or few demographic variables present but detailed geographic information) so that each format could fulfill some analytic purpose without the possibility of investigators linking between formats and thus jeopardizing confidentiality.

Canada has had a National Mortality Data Base since 1950 and a National Cancer Incidence Reporting System since 1969. Many epidemiologic studies have linked different data files, some collected originally for administrative purposes. These include evaluations of occupation and cancer (Howe and Lindsay, 1983), radiation and breast cancer (Miller et al., 1989), and pesticide exposure among farmers (Wigle et al., 1990). The confidentiality issues have been solved largely by returning only anonymous data to investigators for analysis. However, when informed consent for future linkage to vital-statistics data had been obtained for a randomized trial of breast-cancer screening, individually identified information was returned to the investigators after linkage to the National Mortality Data Base (Miller et al., 1992).
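The practice of returning only anonymous data after linkage can be sketched in a few lines. The example below is a simplification, not a description of the Canadian procedures: it assumes exact matching on name and birth date (real linkages are usually probabilistic), and every record and field name is hypothetical.

```python
# Hypothetical records; names, dates, and field names are illustrative only.
cohort = [
    {"name": "A. Example", "birth_date": "1930-02-01", "exposure": "high"},
    {"name": "B. Example", "birth_date": "1941-07-15", "exposure": "low"},
]
mortality = [
    {"name": "A. Example", "birth_date": "1930-02-01", "cause": "lung cancer"},
]

IDENTIFIERS = ("name", "birth_date")

def link_and_anonymize(cohort_rows, mortality_rows):
    """Link on exact identifier match, then return rows stripped of identifiers."""
    deaths = {tuple(m[k] for k in IDENTIFIERS): m["cause"] for m in mortality_rows}
    linked = []
    for row in cohort_rows:
        key = tuple(row[k] for k in IDENTIFIERS)
        out = {k: v for k, v in row.items() if k not in IDENTIFIERS}
        out["cause_of_death"] = deaths.get(key)  # None when no death record matched
        linked.append(out)
    return linked

anonymous_rows = link_and_anonymize(cohort, mortality)
for row in anonymous_rows:
    print(row)
```

In this arrangement the data custodian performs the linkage and the investigator receives only the exposure and outcome fields, which is the sense in which the returned data are anonymous.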

Data Gaps, Resource Constraints, and Research Opportunities

The utility of a data system in addressing an issue depends not only on the scope and quality of data but also on the question being asked. Investigators and policy-makers all too often fail to recognize the multiplicity of questions, research designs, and data shown in Table 5-1. Academic researchers tend to downgrade ecologic analyses and to discount data collected from regulatory-agency data systems. Policy-makers tend to consider the design or funding of data systems as though other data systems do not exist or as though regulation is their only purpose.

Only the federal government can coordinate the evaluation and linkage of many existing data systems and data-collection operations, and it should do so. The data systems should be supported by advisory groups of experts from all concerned agencies, and they should discuss system modifications that might enhance useful linkages. Existing systems should be evaluated with regard to their usefulness in estimating human risks associated with exposures, i.e., environmental-health end points. The federal government should also evaluate the data systems of the national environmental monitoring systems to determine whether some modifications might enhance their usefulness. Most federal environmental legislation does not require the collection of data needed to evaluate the health benefits of various environmental regulations. Hence, the environmental and health data systems have developed largely without consideration of environmental-health issues. Classical, targeted epidemiologic programs have not been buttressed with surveillance data that would indicate the magnitudes of the environmental-health problems, and this hampers regulatory responses.

One way to begin to evaluate and identify modifications needed in existing systems is to develop a set of health-status indicators for inclusion in exposure, internal-dose, and health-status data systems. Attempts to list sentinel health events have usually focused on disease or syndrome end points (Rothwell et al., 1991; DHHS, 1991). However, many of the health end points are infrequent and therefore might not be useful in small populations. In studies that address public-health concerns—i.e., response epidemiologic studies—investigators often encounter symptom complaints and even collect information on symptoms, but symptoms are not regularly assessed by other health-data systems. Therefore, important data gaps are the lack of baseline health data on the frequency of certain rare diseases and conditions and the lack of data on the prevalence of symptoms in the general population.

Pollutant sources and ambient concentrations have been a focus of regulatory efforts. Assessment of the general health status of the population is usually a health-policy effort, largely independent of environmental health, so there is a paucity of data on human exposures and human internal doses. Existing and new data systems should be explored as sources for such data. One new data system is the National Human Exposure Assessment Survey, proposed by EPA. This survey was designed primarily to serve the interests of risk assessment, rather than to collect data to evaluate the effect of exposures on human health. However, relatively small changes in the design would materially increase the utility of this survey for environmental epidemiology. Such changes include the collection of personal identifying information, collection of data on other (nonenvironmental) potential confounders, and retention of data in a form that would permit linkage to outcome data sets, such as the National Death Index and population-based cancer registries. The committee urges EPA to cooperate closely with epidemiologists throughout the design of this survey, its implementation, continuing evaluation of the findings, and evaluation and possible modification of the study design.

An underused mechanism to collect exposure data is the brief activity questionnaire. The NHIS has been used to assess knowledge and protection practices relevant to environmental hazards such as radon and to occupational chemicals, but not to assess the duration and frequency of exposures of the general population. Brief questionnaires would, of course, provide less-detailed information on human exposure patterns than personal-monitoring studies.

Data to monitor the efficacy of various programs in decreasing body burdens of known toxicants (such as lead) are needed and, as new toxicants or data on new exposures (such as to mercury in paints) become available, the distributions of body burdens need to be assessed to assist in the development of new regulations and public-health programs. These surveys will need flexible data-collection protocols because the toxicants to be assessed can easily change. Estimated body burdens of different chemicals should be periodically updated.

An important limitation that hinders greater use of linked data systems is investigators' lack of knowledge about potentially useful data systems and multiple kinds of data. Several innovations in recent years—such as electronic bulletin boards, commercial on-line systems, and distributed networks (Makulowich, 1993)—open up important new communication possibilities. These kinds of activities, as well as traditional inventories, should be encouraged.

Several problems limit the ability of investigators to link data from different systems. Often, the only linking variable available is geographic location. Information on location is often detailed for toxicant assessment but limited to broad areas, such as counties, for health assessments because of confidentiality concerns. Geographic information systems are overcoming the limitations on combining data with different types of geographic identifiers and from different sites. However, those systems are largely cross-sectional. The mobility of the population (and of pollutants) and the variable latent periods of health end points warrant longitudinal analyses. Existing systems and existing analytic procedures remain critical to the prevention or reduction of health problems from toxic exposures, but some simple, feasible, inexpensive changes would enhance the value of the data.
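When geographic location is the only linking variable, the detailed exposure data must first be reduced to the coarser unit used for the health data. The sketch below illustrates that step under simple assumptions: site-level concentrations are averaged within each county and then joined to county-level health rates. The county names, sites, and values are hypothetical, and an unweighted mean is only one of many possible summaries.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical site-level toxicant measurements (detailed geography).
site_measurements = [
    {"county": "County A", "site": "A-1", "concentration": 12.0},
    {"county": "County A", "site": "A-2", "concentration": 18.0},
    {"county": "County B", "site": "B-1", "concentration": 5.0},
]

# Hypothetical county-level health rates per 100,000 (coarse geography).
county_health = {"County A": 42.0, "County B": 30.0, "County C": 28.0}

def link_by_county(measurements, health_rates):
    """Average site concentrations within each county and join to health rates."""
    by_county = defaultdict(list)
    for m in measurements:
        by_county[m["county"]].append(m["concentration"])
    return {
        county: {
            "mean_concentration": mean(values),
            "rate_per_100k": health_rates[county],
        }
        for county, values in by_county.items()
        if county in health_rates
    }

linked = link_by_county(site_measurements, county_health)
for county, row in linked.items():
    print(county, row)
```

The result is a single cross-sectional snapshot per county, so the concerns about population mobility and latent periods raised above apply to it in full.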

The ability of investigators to link data from multiple systems is also limited by the lack of comprehensive information on exposures and on health end points other than death. Morbidity data are generally collected through surveys with various sampling designs. Few data exist on exposures and internal doses that might be considered representative, let alone comprehensive, of even a circumscribed population. Although each environmental-epidemiologic issue could be addressed by a specially designed data system, such an approach would be prohibitively resource intensive, and ecologic analysis will often be the only feasible way to make general inferences about exposure and health. Even in a confined, compromise data system, detailed exposure patterns or the morbidity of each person simply cannot be obtained. Public-health policy decisions will depend on information from more-limited data systems, such as those described in this chapter, buttressed by studies of smaller populations that determine the validity and relevance of the information derived from larger population analyses.

The degree to which data systems represent or characterize a larger universe (i.e., a population or a well-demarcated region) should always be made explicit. Many data systems cannot achieve comprehensive coverage, but there is a need to define sampling schemes better and to determine samples openly, rather than just to collect data from a sample of convenience. For some systems, a subset of environmental sites might be sampled to represent exposures of specific populations.

Another improvement would be the development of procedures to improve the comparability of data from different systems, that is, the use of common data modules or at least common data elements, including definitions of disease or health status. The National Exposure Registry is now collecting health-status data in a manner similar to that of the NHIS, and the prevalence of health end points determined through interviews with the registry population is to be compared with NHIS national population estimates for specific health-status indicators (ATSDR, 1992).
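A comparison of registry prevalence with national estimates might be organized as an observed-to-expected ratio. The sketch below is one possible form, assuming that age-specific national prevalence estimates are applied to the registry's age distribution; the chapter does not specify the method, and all numbers are hypothetical.

```python
# Hypothetical registry population by age group.
registry_counts = {"18-44": 400, "45-64": 300, "65+": 100}

# Hypothetical national prevalence estimates for one health-status indicator.
national_prevalence = {"18-44": 0.05, "45-64": 0.10, "65+": 0.20}

# Hypothetical number of registry members reporting the condition.
observed_cases = 84

def expected_cases(counts, prevalence):
    """Cases expected if the registry population had national age-specific prevalence."""
    return sum(counts[group] * prevalence[group] for group in counts)

expected = expected_cases(registry_counts, national_prevalence)
prevalence_ratio = observed_cases / expected
print(f"expected cases: {expected:.1f}")
print(f"observed/expected ratio: {prevalence_ratio:.2f}")
```

Such a ratio is meaningful only because both systems use comparable questions and definitions, which is the point of adopting common data elements.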

There is a need for many different types of cross-sectional, longitudinal, and followback studies to address environmental-health issues. Although some government agencies, such as the National Institutes of Health, regularly conduct varied studies to address etiologic issues, studies that address other needs in environmental health should be encouraged and conducted. Agencies involved with health promotion, such as ATSDR and the Centers for Disease Control and Prevention (specifically, the National Center for Environmental Health, the National Center for Injury Prevention and Control, and the National Institute for Occupational Safety and Health), need to conduct studies to address issues of health promotion and disease prevention.

The federal government should establish a mechanism by which to track the health impairments of populations for which data on exposures and other baseline measurements are available. The National Exposure Registry and the NHANES I Epidemiologic Followup Study are examples of such mechanisms. In addition, the National Death Index is a useful resource for the public-health community. Those mechanisms should be expanded, and this will require evaluation of confidentiality and other ethical issues, as well as careful review of the uses of data.

Improvements also could be made in the data systems that track inspections and compliance with regulations to enhance their utility for environmental-health assessment. For instance, the sampling period or geographic coverage around each site in the compliance-data systems could be extended. These systems typically collect environmental-concentration data until compliance is achieved. Collecting the data over an extended period would allow investigators to characterize the longer-term exposure patterns of sites known to contain pollutants. The choice of study sites where the nearby population distribution can be characterized would allow investigators to examine potential exposures with more assurance.

References

Annest, J.L., J.L. Pirkle, D. Makuc, J.W. Neese, D.D. Bayse, and M.G. Kovar. 1983. Chronological trend in blood lead levels between 1976 and 1980. N. Engl. J. Med. 308:1373-1377.

Anto, J.M., J. Sunyer, R. Rodriguez-Roisin, M. Suarez-Cervera, and L. Vazquez. 1989. Community outbreaks of asthma associated with inhalation of soybean dust. N. Engl. J. Med. 320:1097-1102.

ATSDR (Agency for Toxic Substances and Disease Registry). 1988a. Policies and Procedures for Establishing a National Registry of Persons Exposed to Hazardous Substances (National Exposure Registry). Atlanta, GA: Agency for Toxic Substances and Disease Registry, Department of Health and Human Services, Public Health Service.

ATSDR (Agency for Toxic Substances and Disease Registry). 1988b. The Nature and Extent of Lead Poisoning in Children in the United States: A Report to Congress. Atlanta, GA: Agency for Toxic Substances and Disease Registry, Public Health Service, Department of Health and Human Services.

ATSDR (Agency for Toxic Substances and Disease Registry). 1992. The National Exposure Registry. Draft Document (5/19/92). [Atlanta, GA: Agency for Toxic Substances and Disease Registry, Department of Health and Human Services, Public Health Service.]

Boyd, J.T. 1960. Climate, air pollution, and mortality. Br. J. Preventive Soc. Med. 14:123.

Buck, S.F., and A.J. Wicken. 1967. Models for use in investigating the risk of mortality from lung cancer and bronchitis. Appl. Stat. 16:185.

CDC (Centers for Disease Control). 1984. CDC Staff Manual on Confidentiality. Atlanta, GA: Department of Health and Human Services, Public Health Service.

CDC (Centers for Disease Control). 1990. Guidelines for the determination of clusters. MMWR (no. RR-11).

Commission for Racial Justice, United Church of Christ. 1987. Toxic Wastes and Race in the United States. [New York]: Public Data Access, Inc.

Deane, M., S.H. Swan, J.A. Harris, D.M. Epstein, and R. Neutra. 1989. Adverse pregnancy outcomes in relation to water contamination, Santa Clara County, California, 1980-1981. Am. J. Epidemiol. 129:894-904.

DHHS (Department of Health and Human Services). 1991. Healthy People 2000: National Health Promotion and Disease Prevention Objectives. DHHS Pub. (PHS) 91-50212. Washington, DC: US Government Printing Office.

Edmonds, L.D., and L.M. James. 1990. Temporal trends in the prevalence of congenital malformations at birth based on the birth defects monitoring program, United States, 1979-1987. MMWR CDC Surveill. Summ. 39(4):19-23.

EPA (U.S. Environmental Protection Agency). 1985. Costs and Benefits of Reducing Lead in Gasoline. Final Regulatory Impact Analysis. EPA-230-05-85-006. Washington, DC: Office of Policy, Planning and Evaluation, US Environmental Protection Agency.

EPA (U.S. Environmental Protection Agency). 1991. National Air Quality and Emissions Trends Report, 1989. Research Triangle Park, NC: Office of Air Quality Planning and Standards.

EPA, NCHS, and ATSDR (U.S. Environmental Protection Agency, National Center for Health Statistics, and Agency for Toxic Substances and Disease Registry). 1992. Inventory of Exposure-Related Data Systems Sponsored by Federal Agencies. EPA/600/R-92/078. Prepared by Eastern Research Group, Inc., Arlington, VA, for US Environmental Protection Agency, National Center for Health Statistics, and Agency for Toxic Substances and Disease Registry.

Frank, R.G., M.S. Kamlet, and S. Klepper. 1986. The impact of occupational exposure to toxic material on prevalence of chronic illness. Pp. 59-63 in Proceedings of the 1985 Public Health Conference on Records and Statistics. DHHS Pub. (PHS) 86-1214. Hyattsville, MD: US Government Printing Office.

Fraser, P., C. Chilvers, and P. Goldblatt. 1982. Census-based mortality study of fertilizer manufacturers. Br. J. Ind. Med. 39:323-329.

Fraumeni, J.F., Jr. 1987. Keynote lecture: etiologic insights from cancer mapping. Int. Symp. Princess Takamatsu Cancer Res. Fund 18:13-25.

Frisch, J.D., G.M. Shaw, and J.A. Harris. 1990. Epidemiologic research using existing databases of environmental measures. Arch. Environ. Health 45:303-307.

Glasser, M., and L. Greenburg. 1971. Air pollution, mortality and weather. Arch. Environ. Health 22:334-343.

Goldman, L.R., D.F. Smith, R.R. Neutra, L.D. Saunders, E.M. Pond, J. Stratton, K. Waller, R.J. Jackson, and K.W. Kizer. 1990. Pesticide food poisoning from contaminated watermelons in California, 1985. Arch. Environ. Health 45:229-236.

Health Officers Association of California. 1986. Directory of Automated Information Systems in Local Health Departments. Sacramento: Health Officers Association of California.

Howe, G.R., and J.P. Lindsay. 1983. A follow-up study of a ten-percent sample of the Canadian labor force. 1. Cancer mortality in males, 1965-73. J. Natl. Cancer Inst. 70:37-44.

Langmuir, A.D. 1963. The surveillance of communicable diseases of national importance. N. Engl. J. Med. 268:182-192.

Lave, L.B., and E.P. Seskin. 1973. Analysis of the association between U.S. mortality and air pollution. J. Am. Stat. Assoc. 68:284-290.

Lynch, C.F., R.D. Woolson, T. O'Gorman, and K.P. Cantor. 1989. Chlorinated drinking water and bladder cancer: effect of misclassification on risk estimates. Arch. Environ. Health 44:252-259.

Mahaffey, K.R., J.L. Annest, J. Roberts, and R.S. Murphy. 1982. National estimates of blood lead levels: United States, 1976-1980: association with selected demographic and socioeconomic factors. N. Engl. J. Med. 307:573-579.

Makulowich, J.S. 1993. The use of electronic communications in environmental health research. Environ. Health Perspect. 101:34-35.

Mazumdar, S., S. Schimmel, and I.T. Higgins. 1982. Relation of daily mortality to air pollution: An analysis of 14 London winters, 1958/59-1971/72. Arch. Environ. Health 37:213-220.

Miller, A.B., G.R. Howe, G.J. Sherman, J.P. Lindsay, M.J. Yaffe, P.J. Dinner, H.A. Risch, and D.L. Preston. 1989. Mortality from breast cancer after irradiation during fluoroscopic examinations in patients being treated for tuberculosis. N. Engl. J. Med. 321:1285-1289.

Miller, A.B., C.J. Baines, T. To, and C. Wall. 1992. Canadian National Breast Screening Study. 1. Breast cancer detection and death rates among women 40 to 49 years. Can. Med. Assoc. J. 147:1459-1488.

National Governors' Association. 1989. The Potential for Linking Environmental and Health Data. Washington, DC: National Governors' Association.

National Task Force on Health Information. 1991. Implications of Privacy and Confidentiality Concerns on the Use of Health Information for Research and Statistics. Report of the Project Team to the National Task Force on Health Information. Ottawa: National Centre for Health Information, Statistics Canada. 43 pp.

NRC (National Research Council). 1989. Biologic Markers in Reproductive Toxicology. Washington, DC: National Academy Press.

NRC (National Research Council). 1991. Environmental Epidemiology. Vol. 1. Public Health and Hazardous Wastes. Washington, DC: National Academy Press.

Ostro, B.D., and S. Rothschild. 1989. Air pollution and acute respiratory morbidity: an observational study of multiple pollutants. Environ. Res. 50:238-247.

OTA (US Congress Office of Technology Assessment). 1986. Federal Government Information Technology: Electronic Record Systems and Individual Privacy. OTA-CIT-296. Washington, DC: US Government Printing Office.

Ozkaynak, H., J. Spengler, A. Garsd, et al. 1986. Assessment of population health risks resulting from exposure to airborne particles. In S.D. Lee, ed. Aerosols: Research, Risk Assessment, and Control Strategies. Chelsea, MI: Lewis Publishers.

Pickle, L.W., T.J. Mason, N. Howard, R. Hoover, and J.F. Fraumeni, Jr. 1987. Atlas of U.S. Cancer Mortality Among Whites: 1950-1980. DHHS Pub. (NIH) 87-2900. Washington, DC: US Government Printing Office.

Pope, C.A. 1991. Respiratory hospital admissions associated with PM₁₀ pollution in Utah, Salt Lake, and Cache Valleys. Arch. Environ. Health 46:90-97.

Roos, L.L., J.B. Nicol, C.F. Johnson, and N.P. Roos. 1979. Using administrative data banks for research and evaluation: a case study. Eval. Quart. 3:236-255.

Rothenberg, R.B., K.K. Steinberg, and S.B. Thacker. 1990. The public health importance of clusters: a note from the Centers for Disease Control. Am. J. Epidemiol. 132(Suppl. 1):S3-S5.

Rothman, K.J. 1990. A sobering start for the Cluster Busters' Conference. Keynote Presentation. Am. J. Epidemiol. 132(Suppl.1):S6-S13.

Rothwell, C.J., C.B. Hamilton, and P.E. Leaverton. 1991. Identification of sentinel health events as indicators of environmental contamination. Environ. Health Perspect. 94:261-263.

Schwartz, J. 1989. Lung function and chronic exposure to air pollution: a cross-sectional analysis of NHANES II. Environ. Res. 50:309-321.

Schwartz, J., and A. Marcus. 1990. Mortality and air pollution in London: a time series analysis. Am. J. Epidemiol. 131:185-194.

Sexton, K., S.G. Selevan, D.K. Wagener, and J.A. Lybarger. 1992. Estimating human exposure to environmental pollutants: availability and utility of existing databases. Arch. Environ. Health 47:398-407.

Sexton, K., D.K. Wagener, S.G. Selevan, T.O. Miller, and J.A. Lybarger. 1994. An inventory of human exposure-related databases. J. Exposure Anal. Env. Epidemiol. 4:95-109.

Thacker, S.B., and R.L. Berkelman. 1988. Public health surveillance in the United States. Epidemiol. Rev. 10:164-190.

Thacker, S.B., K. Choi, and P.S. Brachman. 1983. The surveillance of infectious diseases. J. Am. Med. Assoc. 249:1181-1185.

Wagener, D.K. 1990. Using biomarkers to assess exposure. In J.S. Andrews, Jr., B.O. Askew, J.A. Bucsela, D.A. Hoffman, B.L. Johnson, and C. Xintaras, eds. Proceedings of the Fourth National Environmental Health Conference: Environmental Issues-Today's Challenge for the Future. Held June 20-23, 1989 in San Antonio, TX. Atlanta, GA: Department of Health and Human Services, Public Health Service, Centers for Disease Control.

Wigle, D.T., R.M. Semenciw, K. Wilkins, D. Riedel, L. Ritter, H.I. Morrison, and Y. Mao. 1990. Mortality study of Canadian male farm operators: non-Hodgkin's lymphoma mortality and agricultural practices in Saskatchewan. J. Natl. Cancer Inst. 82:575-582.

Wingren, G., B. Persson, K. Thoren, and O. Axelson. 1991. Mortality pattern among pulp and paper mill workers in Sweden: a case-referent study. Am. J. Ind. Med. 20:769-774.