5
Data Systems and Opportunities for Advances

This chapter describes some existing data systems that permit the characterization of personal exposure and health status. Given the scarcity of resources for studies in environmental epidemiology, researchers need to make the best use of existing data. It is beyond the scope of this chapter to cover all the pertinent data systems or to describe systems in detail. Rather, the focus is on classes of data-collection systems, some of the major systems in each class and their important features, and their use. The emphasis will be on data systems that are publicly available (often from the federal government). For a more-comprehensive list of federal data systems related to environmental exposure, see EPA et al. (1992); for other discussions of state and local data systems see, for instance, Health Officers Association of California (1986), National Governors' Association (1989), Frisch et al. (1990), and Sexton et al. (1992, 1994). There is a need for greater dissemination of the knowledge of the existence and availability of federal, state, and local systems. Many of these are limited in size, coverage, end points, completeness, or accuracy, but where they meet the investigator's needs, they can save much time and expense. A geographic information system can be very useful in investigation by providing an organizing framework for data on exposure and outcomes.

Introduction

The interest of the American public in environmental pollution seems to be driven primarily by concerns about health. People ask, "Have we
been exposed?" "Have we been affected?" "Will we be affected later?" They might well ask, also, "Do our management programs have any effect on the health of the public?" Table 5-1 outlines some epidemiologic research strategies that address these concerns. It shows that many types of epidemiologic studies and data can be used to determine the relation between the environment and human health.

Although experimental studies of animals and laboratory studies of humans do provide some answers to these questions, epidemiologic research is essential to their resolution. Often, however, epidemiologic studies are neither available nor possible, and policy must be based on toxicologic evidence and animal studies. Considerations of cost, urgency, and limited special expertise often require that officials rely on analyses of existing data that were gathered for other purposes.

Epidemiologic studies of the classical kind involve the measurement of both the health status and the environmental exposure (or internal dose) of the persons being studied. However, such measurements cannot always be obtained. For instance, if historical exposures were not measured, the investigator may have to estimate them from other, less reliable, information. On the other hand, the exposure might be so extensive that no suitable control population remains. Individualized measures of exposure and health can also be infeasible or too expensive when a health effect occurs so infrequently that adequate study would require that a large number of exposed persons be evaluated in detail; such problems require other epidemiologic methods or the use of secondary data.

During development and implementation of public-health policy, analyses of secondary data are important at several stages. Intense study of selected small groups of people can provide useful information about risk that identifies a need for public policy.
To determine the extent of potential exposure, the size and characteristics of the population exposed, or the background frequency of the health effect of interest, secondary data analyses are useful. Information from existing data systems is useful in program planning and development when data are needed to validate findings of earlier targeted research studies. During implementation, data from existing systems can provide additional insights for public-health policy. Existing data systems tend to reflect the programmatic and regulatory structure of government programs, so the identification of useful systems (or of their absence) might help to define the most appropriate needs for assessment and availability of the public-health response. This in turn allows for midcourse changes to reduce costs, improve response, or otherwise improve on-going programs.

Data systems are a primary mechanism for evaluating the impact of a public-health policy. For instance, a public-health program might target ozone because of its effects on several pulmonary health end points. However, the success of the program might be evaluated solely from the ambient concentrations of ozone in a polluted area. To evaluate the impact on public health, it is important to know the relation between the observed ambient concentrations of ozone and the frequency of various pulmonary health end points. Modification of public-health policies depends on knowledge of such relations, identified largely through analyses of secondary data.

TABLE 5-1 Issues of Major Concern to the Public and Methodologic Responses

Issue of Concern | Exposure Assessment | Applied (Response) Epidemiology (a) | Epidemiologic Study (b) | Registries and Surveillance | Reference Surveys for Exposure | Reference Surveys for Health Effect | Risk Assessment
Are we exposed? | X | - | - | - | X | - | -
Are we affected now? | - | X | - | X | - | X | -
Did exposure cause a health effect? | X | X | X | X | X | X | X
Will we be affected later? | X | - | X | X | X | - | X
Did we improve health with a program initiative? | - | - | X | - | X | - | -

(a) Applied, or response, epidemiology refers to studies designed as a quick response to concerns expressed by a group of individuals regarding the potential for exposure or health effects. These are the basis of much of the "gray" literature and many of the studies performed by public-health agencies.
(b) Epidemiologic studies include classical case-control and cohort studies of targeted populations, in contrast with studies of the general population.

Data-Collection Systems: What They Measure

Evaluation of the relation between an environmental pollutant and human health requires data to characterize exposures to the pollutant, including concentrations in the environment, the probability and characteristics of human exposure, and the distributions of internal doses, as well as trends or differences in the health status of exposed people. Determination of risk-management alternatives requires, in addition, information on the sources and distribution of the pollutant. Data systems may address each of these needs. However, they have not necessarily been established with the goal of integration with other classes of data, and most data-collection systems collect only one kind of data or data on one aspect of the general problem.

One distinguishing characteristic of a data system is how and when the responding units are sampled. For example, persons may be selected at random from a defined population but tested at a fixed time (8:00 am every day) or once at a haphazard time (when the laboratory is not otherwise busy). Some surveys are designed to obtain probability samples that accurately represent a reference group, such as a population or an occupational setting, but others obtain samples by convenience, such as collections of information from participating states or hospitals. A short-term survey may not be representative across time.

A survey system might select sampling units that are characteristic or representative of larger reference groups. Characteristic sampling is based on selection from a list of strata; representative sampling is based on the distribution of strata in the population. For instance, in selecting monitoring sites, one might decide that several important types of environments should be evaluated. Monitoring sites can be selected to characterize those types of environments, as in the stratified sampling of air in urban areas and rural areas. Alternatively, one might select monitoring sites on the basis of a stratified probability-sampling scheme to yield data that are representative of the distribution of environments. Monitoring is expensive, and decisions about where to put monitors are generally considered carefully, but the decisions may not be optimal for a specific environmental-epidemiology study.
For example, if budgets allow for only a few monitors to measure some chemical, should they be placed to obtain the most-representative
geographic coverage? In or near population centers? Where prior information suggests the levels are highest? Other?

A data-collection system can be either a compendium or a systematic survey. That is, it can consist of individual studies with similar but separate research designs and measurements, or it can collect data from many sources in a standardized fashion. Neither system is necessarily identical between study years or cycles. That is, the pollutant or health effect assessed by a systematic survey and how it is assessed may vary from time to time, from place to place, or in other ways.

Data systems with the characteristics mentioned above are useful for evaluating the relation between environment and health. The usefulness of any data system is limited by its characteristics, so it is important to understand the sampling and assessment characteristics of each data source before using it. (See the discussion below on bridging environmental and health issues.)

Source of Pollutant

The development of systems to collect information about discharges of pollutants (apart from occupational exposures) is a primary objective of the Environmental Protection Agency (EPA) (table 5-2). Its role as the principal environmental-risk management agency in the federal government requires data on the relative contributions of sources and on control options. The primary objective of EPA's data systems is to provide information pertinent to regulation, so they are designed to be comprehensive with regard to polluters and pollutants that have been identified as toxic. Many pollution-related data systems have emphasized the characterization of pollutant sources, rather than the distribution and fate of pollutants in the environment or the potential exposures of humans.
Representative data (as opposed to comprehensive data) have little utility in assessing compliance with regulation of individual pollution sources, though such data can be useful in assessing needs for and monitoring the success of management programs. Other data-collection systems characterize the amounts of a pollutant at its source. These include production volumes and emission inventories. These systems, too, are not directly concerned with the fate of pollutants in the environment. Data systems that contain location- and time-specific information can be used in analytic models to estimate the transport and fate of pollutants in the environment. However, few data systems contain both time-integrated information (for instance, yearly, periodic, or daily data on emissions) and geographic information (for instance, production volume at a worksite).
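
The distinction drawn earlier between characteristic and representative sampling can be illustrated with a short sketch. It is an illustration under stated assumptions, not a description of how any system in this chapter selects its sites: the function names, the site labels, and the urban/rural strata are hypothetical; "characteristic" is read here as taking a fixed number of sites from each listed stratum; and "representative" is read as simple proportional allocation.

```python
import random


def characteristic_sample(sites_by_stratum, per_stratum, seed=0):
    """Draw the same number of sites from every stratum on the list.

    Every listed environment type is covered, but the sample does not
    mirror how common each type is in the full sampling frame.
    """
    rng = random.Random(seed)
    return {
        stratum: rng.sample(sites, min(per_stratum, len(sites)))
        for stratum, sites in sites_by_stratum.items()
    }


def representative_sample(sites_by_stratum, total, seed=0):
    """Draw sites in proportion to each stratum's share of the frame.

    Uses simple rounding, so the realized total can differ slightly
    from `total` when the shares do not divide evenly.
    """
    rng = random.Random(seed)
    frame_size = sum(len(sites) for sites in sites_by_stratum.values())
    sample = {}
    for stratum, sites in sites_by_stratum.items():
        n = round(total * len(sites) / frame_size)
        sample[stratum] = rng.sample(sites, min(n, len(sites)))
    return sample


# Hypothetical frame: 80 candidate urban sites and 20 candidate rural sites.
frame = {
    "urban": [f"urban-{i}" for i in range(80)],
    "rural": [f"rural-{i}" for i in range(20)],
}

by_type = characteristic_sample(frame, per_stratum=5)
by_share = representative_sample(frame, total=10)

for name, sample in (("characteristic", by_type), ("representative", by_share)):
    print(name, {stratum: len(sites) for stratum, sites in sample.items()})
```

With this hypothetical frame and ten monitors, the characteristic draw takes 5 urban and 5 rural sites, whereas the representative draw takes 8 urban and 2 rural sites. The first describes each type of environment equally well; the second mirrors the mix of environments in the frame.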

Pollutant Concentrations in the Environment

The locations covered by most pollutant-concentration data systems are chosen to be characteristic rather than representative (table 5-3). Thus, most of the National Air Monitoring Stations or water-system quality sites of the National Stream Quality Accounting Network are in densely populated areas. These data systems contain detailed information on the location of the monitoring site, and samples are collected frequently enough to represent short periods. However, site selection is not based on detailed information about the population, the area, or the distribution of exposures among individuals, and the positioning of a station does not necessarily reflect the most likely route of human exposure. For instance, some air-monitoring stations are on the tops of buildings, and water-quality assessments are performed at the outflow pipes of water-treatment facilities, not at residential taps. Those locations might yield informative data on relative exposures, but may not represent either the distribution of concentrations in the environment or the actual exposures of people.

Pollutant-concentration data systems are probably underused for ecologic studies. These systems contain detailed geographic data, and, although few pollutants may be assessed, the analytic methods tend to be relatively stable over time, and exposure is generally measured at or integrated over short intervals.

Although most data on pollutant concentration are from monitoring systems, data from "response epidemiologic studies" are increasing. Response (or applied) epidemiologic studies are designed to respond quickly to expressed concerns regarding the potential for exposure or adverse health effects. Examples of response epidemiologic programs are the health-assessment studies of the Agency for Toxic Substances and Disease Registry (ATSDR) and the health-hazard evaluations of the National Institute for Occupational Safety and Health (NIOSH).
In these studies, environmental concentrations of various pollutants are regularly assessed. However, study sites are often selected because the potential exposure is considered high or because of complaints about symptoms, so sites are not characteristic of ordinary population exposures. These studies do, however, attempt to characterize explicitly the scope or potential for human exposure in these presumably extreme settings, and they may contribute information on the relation between the environmental distribution of pollutants and human exposure or internal dose.

TABLE 5-2 Data-Collection Systems: Source of Pollutant

Class | Data-System Name | Description | Primary Objective | Coverage/Sample + Design | Linking Data
Production Volume Inventories | Synthetic organic chemicals | Annual data on production and sales of synthetic organic chemicals produced in the United States | Monitoring | National totals: comprehensive | None
Site Inventories | National pollutant-discharge elimination system | Permits for worksites that specify effluent concentration limits, monitoring, and reporting requirements | Regulatory | National: comprehensive | Detailed geographic codes, river reach no., pollutant limits
Site Inventories | National Priorities List | List of the toxic-waste sites determined to be of immediate concern for remediation | Regulatory | National: based on reports from regions, comprehensive | Detailed geographic codes, environmental concentrations
Emission Inventories | Toxic chemical release inventory | Annual estimates of releases from manufacturing facilities of minimal size and volume of chemicals per year | Informational | National: comprehensive for defined worksites | Detailed geographic codes
Emission Inventories | Integrated database | Information on spent fuel and radioactive-waste inventories for nuclear reactors, storage facilities, and mine tailings, among others | Informational | National: comprehensive for defined sites | Facility name
Sales Volumes | Agricultural chemical use | Database of information on sales for agricultural purposes of fertilizers and pesticides, among others | Monitoring | National: characteristic farm sample | None

Human Exposure

Data on human exposure (table 5-4) are the least developed of the classes considered here, and generalizations to larger groups of people or particularizations to specific exposure situations are often difficult and uncertain. Much detail is required to make this class of data useful, but few detailed data systems have been developed. Detailed information on human exposures generally requires the use of personal monitors or structured activity questionnaires, but these tools are expensive and time-consuming. Thus, most systems contain information on small populations chosen to be characteristic, but not necessarily representative, of the target population. However, systems that exist generally have substantial extent and detail over periods as long as several years. More data of this class could be gathered by brief structured activity questionnaires in large surveys. Brief questionnaires might provide less-detailed information on human exposure patterns than personal monitors, but for a fixed total budget they can yield data on greater numbers of people. A combination of brief questionnaires for large numbers of people with validation and characterization of a subset using personal monitors might even be more useful.

Internal Dose

Like information on human exposure, information on internal dose is rarely collected systematically (table 5-5). Occasional studies of biologic markers of specific agents in small, defined populations are plentiful, but
broad and systematic collections of data on biologic markers in the general population are few, and surveys have yielded little information with which to characterize the subjects' exposures. Direct measures of internal dose are not usually included in health-assessment studies (conducted by ATSDR) or health-hazard evaluations (conducted by NIOSH), but these sources could be modified to include internal-dose assessments.

TABLE 5-3 Data-Collection Systems: Environmental Concentrations

Class | Data-System Name | Description | Primary Objective | Coverage/Sample + Design | Linking Data
Monitoring Systems | Aerometric Information Retrieval System | Ambient concentrations, emissions, and compliance data for airborne criteria pollutants | Monitoring | National: air monitoring stations in urban areas | Detailed geographic codes/point-source identifiers
Monitoring Systems | Microbiology and residue computer information system | Contaminant data from samples of meat and poultry at slaughtering establishments and from import shipments | Monitoring | National: random sampling of meat products | No information on distribution of food
Regulatory Systems | Permit-compliance system | Information for tracking the permit, compliance, and enforcement of permittees under the Clean Water Act | Regulatory, monitoring | National: comprehensive coverage of permittees | Detailed geographic codes, linked to Reach Pollutant Assessment System
Response Epidemiologic Studies | Health assessments (ATSDR) | ATSDR assessments to identify potential health concerns among populations living near National Priority List sites | Regulatory | National: all National Priority List sites | Detailed geographic codes, linked to environmental concentration data
Microenvironment Settings | Indoor air study | A pilot project to assess contaminants in indoor air | Research | Selected sites: not sampled to be representative | None

ATSDR conducts public-health assessments to determine where, and for whom, public-health actions should be undertaken (ATSDR, 1992). Each assessment characterizes the nature and extent of hazards and identifies communities where public-health actions are needed. However, the assessment is largely or entirely a compilation and analysis of existing data, which rarely include internal doses of toxicants in the population of concern. The health-assessment format does not require the collection of new data, for at least 2 reasons. First, the objective of the ATSDR health-assessment study is to determine whether there is a potential for human health effects, not to determine the extent or magnitude of actual exposure. Second, many internal-dose assessments are invasive; this decreases participation rates and increases opportunities for bias. However, when a health assessment indicates a potentially significant risk to human health, ATSDR is obliged under the 1986 Superfund Amendments and Reauthorization Act (SARA) to consider a registry as a followup (ATSDR, 1988a), and registrants may be invited to participate in biologic testing for markers of exposure or effect (NRC, 1989).

There is also a need for studies that characterize a population with a well-defined sampling scheme. The National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for
Health Statistics, studies about 30,000 persons in the US population, chosen by random sampling (clustered, stratified, with deliberate over-sampling of some subgroups). For study of general contaminants, such as lead or petrochemical oxidants, NHANES has been used as a data source. Given the followup capabilities of NHANES, detailed exposure data could be collected in subgroups of the entire sample that are identified as having received internal doses of particular interest. However, when the probability of exposure is small, the actual number of participants who could be studied to characterize the specific exposure would be small, possibly zero, and NHANES might not be sensitive enough. Specially designed surveys could be considered to characterize specific population exposures.

TABLE 5-4 Data-Collection Systems: Human Exposure

Class | Data-System Name | Description
Time-Activity Patterns and Personal Monitoring | Total-exposure-assessment methodology | Goals to develop methods to measure individual total exposure to toxic and carcinogenic chemicals
Surveys | National Occupational Exposure Survey | Information on the probability of exposure to various chemicals based on job title
Registries | National Exposure Registry | Identification of individuals with verified exposure to selected chemical, with followup studies to be performed on individuals in registry

Health Status

Most health-status information systems are not developed for the primary purpose of studying environmental health (table 5-6). Vital records are collected for legal reasons, hospital-discharge and cost information [e.g., Medicare provider analysis and review (MEDPAR)] is collected for economic or administrative reasons, and the National Health Interview

for disease registries, because registries often use standardized protocols for diagnosis and the initial reports are prepared by a relatively small number of persons who have been trained to use them (cancer registries, record-room librarians, etc.). The staff of registries is generally full time and dedicated to its purpose. In contrast, the accuracy and degree of ascertainment of public-health surveillance is low, because it depends heavily on the cooperation of large numbers of people who have other responsibilities and may rarely see the health outcome of interest. With both, there may be a tendency for bias, in that some socioeconomic groups may use the medical-care system less completely than other groups. However, the cost for the identification of cases is much lower for public-health surveillance than for registries of noncommunicable diseases.

It is important to have a broad geographic coverage. With a broad coverage, areas of high incidence may be more readily identified, and, by evaluation of overall incidence patterns, it may be easier to determine whether clusters are merely the result of chance. Further, most environmental exposures are mixed, and multiple health outcomes are likely. For example, associations of air pollution with cancer, asthma, skin disease, and acute and nonacute respiratory symptoms are all plausible; therefore, concurrent monitoring of multiple disease outcomes is desirable. Even so, the amount of disease attributable to specific foci of environmental contamination is likely to be low. Thus, the effect of multiple other causes could easily overwhelm those from the environment, the "signal/noise ratio" being too low.

Public-health surveillance systems have not been used to any great extent for the monitoring of noncommunicable diseases that may be related to environmental exposures. Although such systems are generally inexpensive and with broad coverage, they can be incomplete, inaccurate, and misleading.
With passive reporting, it has been estimated that only 10 to 50% of cases of serious communicable diseases are reported (Thacker and Berkelman, 1988). For communicable diseases, this is not a problem, as "outbreaks" of disease and changes in disease rates over time may represent a quadrupling in incidence in a very short period. In contrast, a "rapid" rise in a noncommunicable disease may represent less than a 10% increase in incidence over a period of years or decades. Passive reporting systems as currently constituted would not be able to identify such changes reliably; indeed, much-larger fluctuations would frequently occur through variation in reporting or by chance.

The ideal for surveillance of noncommunicable diseases is accurate incidence data across broad areas, long periods, and many diseases. It has been assumed that accurate incidence data will require virtually complete ascertainment of new cases within defined communities. However, society cannot afford registration systems for all diseases that may have environmental causes. Further, the basic underlying assumption can be challenged. Although complete ascertainment of cases is ideal for accurate incidences, if the degree of ascertainment in relevant population segments is known, then this can be taken into consideration in estimating incidence. Thus, when complete registration is not feasible, disease reporting to designated public-health departments may be a means of providing data sufficiently accurate for the early-warning mechanisms that many members of the public are now demanding. An example is salmonella infections, for which reporting appears to be only about 1% of all cases (in part because many victims do not seek medical attention), yet that is sufficient to identify many food-related outbreaks.

Potential Difficulties with Disease Monitoring

Although disease reporting is relatively inexpensive, costs of reporting, analyzing, and interpreting the data may be substantial. There are probably too many end points and too many random departures from the baseline to slavishly use conventional probability testing methods. Time, money, and effort must be spent on chasing down each false lead, and this cost must be set against the value of the real leads that will be confirmed.

Routine disease monitoring will not overcome the issues related to small populations exposed to many environmental hazards or, conversely, wide and almost uniform exposures of large populations. High relative risks could remain undetectable in the first instance, and large attributable risks in the second. The issues of confounders, biased reporting, and confidentiality are not solved by routine monitoring systems, but they may be brought into heightened visibility. Routine monitoring is proposed as one mechanism by which to evaluate the causes of diseases of unknown etiology and to facilitate the detection of trends in diseases of environmental origin.
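
The concern about too many end points and too many random departures from the baseline can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes that every comparison is an independent test at a conventional significance level and that no true excess exists anywhere, and the function name and the numbers of end points and areas are hypothetical.

```python
def expected_false_leads(n_end_points, n_areas, alpha=0.05):
    """Return (number of tests, expected false signals, P(at least one)).

    Assumes one independent test per end point per area and that the
    null hypothesis of no excess is true for every test.
    """
    n_tests = n_end_points * n_areas
    expected = n_tests * alpha
    p_at_least_one = 1.0 - (1.0 - alpha) ** n_tests
    return n_tests, expected, p_at_least_one


# Hypothetical monitoring program: 20 disease end points in 50 areas.
n_tests, expected, p_any = expected_false_leads(20, 50)
print(f"{n_tests} tests, about {expected:.0f} false signals expected, "
      f"P(at least one) = {p_any:.4f}")
```

Under these assumptions, 1,000 comparisons at the 0.05 level would be expected to produce about 50 spurious signals even when nothing is wrong, and each would have to be chased down. Counts of this size are part of what routine monitoring has to absorb.
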
However, this approach will often need to be supplemented and strengthened by other approaches.

Confidentiality and Needs for Personal Identifiers

The study of interactions between environment and health poses questions of confidentiality and the need for personal identifiers. Over time, every person is exposed to a variety of potentially adverse environmental conditions and may experience a variety of adverse health effects. To evaluate complex interactions between environmental exposures and health effects, detailed and extensive information on individual subjects is often needed, and populations may have to be followed for long periods. Linkages among data systems increase the difficulty of protecting the confidentiality of information.

The United States has 2 federal laws to protect the privacy of the individual from excessive government intrusion. One deals primarily with the collection and transfer of information, the other with its release. The latter, the Freedom of Information Act (FOIA) (5 U.S.C. 552), enacted in 1967 and amended in 1974, requires federal agencies to make most kinds of government records available to persons who request them. Health-data systems have not generally been seriously affected by FOIA, as it specifically exempts "personal and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy" (CDC, 1984, p. 6).

It is the Privacy Act of 1974 (5 U.S.C. 552a) that most affects health-information databases. The Privacy Act strictly limits what information government agencies can demand from the public and provides for legal protection of and safeguards on the use of personally identifiable information maintained in federal records systems. Congress has expressed some concern that the computerized databases in use today have outpaced the ability of individuals to protect their privacy when using the mechanisms set up to deal with the predominantly paper-record systems in use in 1974 (OTA, 1986). Specifically, the creation of record linkages between databases can run afoul of the Privacy Act, which states that information may not be used for any purpose other than the purpose for which it was supplied (CDC, 1984). This can cause problems when researchers attempt creative and innovative linkages between databases that were intended for other purposes and do not have formal releases from the individuals to use their information for this purpose. ATSDR's National Exposure Registry, for example, is subject to the Privacy Act.
Although the registry is generally prohibited from disclosing personal information without written consent (which is routinely collected from participants through an informed-consent form), the Privacy Act does allow registry data to be released without consent in the following circumstances:

To ATSDR personnel who maintain the registry.
If required by FOIA (personal identifiers removed).
For routine use. A routine use is defined as the use of a record for a purpose that is compatible with the purpose for which it was collected.
To a recipient who has provided advance written assurance that the information released will be used solely for statistical research or as a reporting record. ATSDR requires that anyone seeking registry data for research purposes submit a study protocol for review to an agency review panel that will in turn make recommendations to ATSDR. The final decision rests with ATSDR.
To a person pursuant to a showing of compelling circumstances affecting the health or safety of an individual if upon disclosure to the requester notification is transmitted to the last known address of the individual.
To Congress or the comptroller general.
Pursuant to the order of a court of competent jurisdiction (ATSDR, 1988a, p. 31).

The ATSDR registry has the advantage of having been started well into the computer age and thus of being able to incorporate confidentiality protections into its system design. For older and other databases, however, the following specific issues must be considered:

Can mechanisms be developed by which investigators can augment continuing longitudinal studies with new assessments in a timely fashion, perhaps by having the survey staff administer the tests so that more-detailed information could be provided to the investigator without risking the privacy of the subjects?
Can statistical projects be established, whereby the information from several surveys could be augmented in a specific population? For instance, could a specific area be identified where the ambient concentrations of various pollutants are measured in more detail, the exposures of representative members of its population are evaluated in detail, and members of its population are subjected to internal dose assessments and health-status assessments? The registries of ATSDR are an example of such a mechanism, although they are not usually geographically circumscribed.
Can the data systems of different agencies be linked, with return of the linked data to both agencies at the same level of detail as was provided?
Should all data systems that obtain information on individual subjects collect personal identifiers in a consistent way and maintain them in a confidential data file for use later?

To maintain the confidentiality of data systems, statistical masks should be developed by agencies to protect the confidentiality of the data without distorting the relations among individual data items. Making linked data available on public-use data tapes is generally preferable to the agencies' releasing data and will help to maintain confidentiality. Linked data could be made available in a variety of formats (e.g., all demographic variables present but little geographic detail, or few demographic variables present but detailed geographic information) so that each format could fulfill some analytic purpose without the possibility of investigators linking between formats and thus jeopardizing confidentiality.

Canada has had a National Mortality Data Base since 1950 and a National Cancer Incidence Reporting System since 1969. Many epidemiologic studies have linked different data files, some collected originally for administrative purposes. These include evaluations of occupation and cancer (Howe and Lindsay, 1983), radiation and breast cancer (Miller et al., 1989), and pesticides in farmers (Wigle et al., 1990). The confidentiality issues have been solved largely by returning only anonymous data to investigators for analysis. However, when informed consent for linkage to vital-statistics data in the future had been obtained for a randomized trial of breast-cancer screening, individually identified information was returned to the investigators after linkage to the National Mortality Data Base (Miller et al., 1992).

Data Gaps, Resource Constraints, and Research Opportunities

The utility of a data system in addressing an issue depends not only on the scope and quality of data but on the question being asked. Investigators and policy-makers all too often fail to recognize the multiplicity of questions, research designs, and data shown in Table 5-1. Academic researchers tend to downgrade ecologic analyses and to discount data collected from regulatory-agency data systems. Policy-makers tend to consider the design or funding of data systems as though other data systems do not exist or as though regulation is their only purpose.

Only the federal government can coordinate the evaluation and linkage of many existing data systems and data-collection operations, and it should do so. The data systems should be supported by advisory groups of experts from all concerned agencies, and these groups should discuss system modifications that might enhance useful linkages.
Existing systems should be evaluated with regard to their usefulness in estimating human risks associated with exposures, i.e., environmental-health end points. The federal government should also evaluate the data systems of the national environmental monitoring systems to determine whether some modifications might enhance their usefulness.

Most federal environmental legislation does not require the collection of data needed to evaluate the health benefits of various environmental regulations. Hence, the environmental and health data systems have developed largely without consideration of environmental-health issues. Classical, targeted epidemiologic programs have not been buttressed with surveillance data that would indicate the magnitudes of the environmental-health problems, and this hampers regulatory responses.

One way to begin to evaluate and identify modifications needed in existing systems is to develop a set of health-status indicators for inclusion in exposure, internal-dose, and health-status data systems. Attempts to list sentinel health events have usually focused on disease or syndrome end points (Rothwell et al., 1991; DHHS, 1991). However, many of the health end points are infrequent and therefore might not be useful in small populations. In studies that address public-health concerns (i.e., response epidemiologic studies), investigators often encounter symptom complaints and even collect information on symptoms, but symptoms are not regularly assessed by other health-data systems. Therefore, important data gaps are the lack of baseline health data on the frequency of certain rare diseases and conditions and the lack of data on the prevalence of symptoms in the general population.

Pollutant sources and ambient concentrations have been a focus of regulatory efforts. Assessment of the general health status of the population is usually a health-policy effort, largely independent of environmental health, so there is a paucity of data on human exposures and human internal doses. Existing and new data systems should be explored as sources for such data. One new data system is the National Human Exposure Assessment Survey, proposed by EPA. This survey was designed primarily to serve the interests of risk assessment, rather than to collect data to evaluate the effect of exposures on human health. However, relatively small changes in the design would materially increase the utility of this survey for environmental epidemiology.
Such changes include the collection of personal identifying information, collection of data on other (nonenvironmental) potential confounders, and retention of data in a form that would permit linkage to outcome data sets, such as the National Death Index and population-based cancer registries. The committee urges EPA to cooperate closely with epidemiologists throughout the design of this survey, its implementation, continuing evaluation of the findings, and evaluation and possible modification of the study design.

An underused mechanism to collect exposure data is the brief activity questionnaire. The NHIS has been used to assess knowledge and protection practices relevant to environmental hazards such as radon and to occupational chemicals, but not to assess the duration and frequency of exposures of the general population. Brief questionnaires would, of course, provide less-detailed information on human exposure patterns than personal-monitoring studies.

Data to monitor the efficacy of various programs in decreasing body burdens of known toxicants (such as lead) are needed and, as new toxicants or data on new exposures (such as to mercury in paints) become available, the distributions of body burdens need to be assessed to assist in the development of new regulations and public-health programs. These surveys will need flexible data-collection protocols because the toxicants to be assessed can easily change. Estimated body burdens of different chemicals should be periodically updated.

An important limitation that hinders greater use of linked data systems is investigators' lack of knowledge about potentially useful data systems and multiple kinds of data. Several innovations in recent years, such as electronic bulletin boards, commercial on-line systems, and distributed networks (Makulowich, 1993), open up important new communication possibilities. These kinds of activities, as well as traditional inventories, should be encouraged.

Several problems limit the ability of investigators to link data from different systems. Often, the only linking variable available is geographic location. Information on location is often detailed for toxicant assessment but limited to broad areas, such as counties, for health assessments because of confidentiality concerns. Geographic information systems are overcoming the limitations on combining data with different types of geographic identifiers and from different sites. However, those systems are largely cross-sectional. The mobility of the population (and of pollutants) and the variable latent periods of health end points warrant longitudinal analyses. Existing systems and existing analytic procedures remain critical to the prevention or reduction of health problems from toxic exposures, but some simple, feasible, inexpensive changes would enhance the value of the data.

The ability of investigators to link data from multiple systems is also limited by the lack of comprehensive information on exposures and on health end points other than death. Morbidity data are generally collected through surveys with various sampling designs.
Few data exist on exposures and internal doses that might be considered representative, let alone comprehensive, of even a circumscribed population. Although each environmental-epidemiologic issue could be addressed by a specially designed data system, such an approach would be prohibitively resource intensive, and ecologic analysis will often be the only feasible way to make general inferences about exposure and health. Even in a confined, compromise data system, detailed exposure patterns or the morbidity of each person simply cannot be obtained. Public-health policy decisions will depend on information from more-limited data systems, such as those described in this chapter, buttressed by studies of smaller populations that determine the validity and relevance of the information derived from larger population analyses.

The degree to which data systems represent or characterize a larger universe (i.e., a population or a well-demarcated region) should always be made explicit. Many data systems cannot achieve comprehensive coverage, but there is a need to define sampling schemes better and to determine samples openly, rather than just to collect data from a sample of convenience. For some systems, a subset of environmental sites might be sampled to represent exposures of specific populations.

Another improvement would be the development of procedures to improve the comparability of data from different systems, that is, the use of common data modules or at least common data elements, including definitions of disease or health status. The National Exposure Registry is now collecting health-status data in a manner similar to that of the NHIS, and the prevalence of health end points determined through interviews with the registry population is to be compared with NHIS national population estimates for specific health-status indicators (ATSDR, 1992).

There is a need for many different types of cross-sectional, longitudinal, and followback studies to address environmental-health issues. Although some government agencies, such as the National Institutes of Health, regularly conduct varied studies to address etiologic issues, studies that address other needs in environmental health should be encouraged and conducted. Agencies involved with health promotion, such as ATSDR and the Centers for Disease Control and Prevention (specifically, the National Center for Environmental Health, the National Center for Injury Prevention and Control, and the National Institute for Occupational Safety and Health), need to conduct studies to address issues of health promotion and disease prevention.

The federal government should establish a mechanism by which to track the health impairments of populations for which data on exposures and other baseline measurements are available. The National Exposure Registry and the NHANES I Epidemiologic Followup Study are examples of such mechanisms.
In addition, the National Death Index is a useful resource for the public-health community. Those mechanisms should be expanded, and this will require evaluation of confidentiality and other ethical issues, as well as careful review of the uses of data. Improvements also could be made in the data systems that track inspections and compliance with regulations to enhance their utility for environmental-health assessment. For instance, the sampling period or geographic coverage around each site in the compliance-data systems could be extended. These systems typically collect environmental-concentration data until compliance is achieved. Collecting the data over an extended period would allow investigators to characterize the longer-term exposure patterns of sites known to contain pollutants. The choice of study sites where the nearby population distribution can be characterized would allow investigators to examine potential exposures with more assurance.

References

Annest, J.L., J.L. Pirkle, D. Makuc, J.W. Neese, D.D. Bayse, and M.G. Kovar. 1983. Chronological trend in blood lead levels between 1976 and 1980. N. Engl. J. Med. 308:1373-1377.
Anto, J.M., J. Sunyer, R. Rodriguez-Roisin, M. Suarez-Cervera, and L. Vazquez. 1989. Community outbreaks of asthma associated with inhalation of soybean dust. N. Engl. J. Med. 320:1097-1102.
ATSDR (Agency for Toxic Substances and Disease Registry). 1988a. Policies and Procedures for Establishing a National Registry of Persons Exposed to Hazardous Substances (National Exposure Registry). Atlanta, GA: Agency for Toxic Substances and Disease Registry, Department of Health and Human Services, Public Health Service.
ATSDR (Agency for Toxic Substances and Disease Registry). 1988b. The Nature and Extent of Lead Poisoning in Children in the United States: A Report to Congress. Atlanta, GA: Agency for Toxic Substances and Disease Registry, Public Health Service, Department of Health and Human Services.
ATSDR (Agency for Toxic Substances and Disease Registry). 1992. The National Exposure Registry. Draft Document (5/19/92). [Atlanta, GA: Agency for Toxic Substances and Disease Registry, Department of Health and Human Services, Public Health Service.]
Boyd, J.T. 1960. Climate, air pollution, and mortality. Br. J. Preventive Soc. Med. 14:123.
Buck, S.F., and A.J. Wicken. 1967. Models for use in investigating the risk of mortality from lung cancer and bronchitis. Appl. Stat. 16:185.
CDC (Centers for Disease Control). 1984. CDC Staff Manual on Confidentiality. Atlanta, GA: Department of Health and Human Services, Public Health Service.
CDC (Centers for Disease Control). 1990. Guidelines for the determination of clusters. MMWR (no. RR-11).
Commission for Racial Justice, United Church of Christ. 1987. Toxic Wastes and Race in the United States. [New York]: Public Data Access, Inc.
Deane, M., S.H. Swan, J.A. Harris, D.M. Epstein, and R. Neutra. 1989. Adverse pregnancy outcomes in relation to water contamination, Santa Clara County, California, 1980-1981. Am. J. Epidemiol. 129:894-904.
DHHS (Department of Health and Human Services). 1991. Healthy People 2000: National Health Promotion and Disease Prevention Objectives. DHHS Pub. (PHS) 91-50212. Washington, DC: US Government Printing Office.
Edmonds, L.D., and L.M. James. 1990. Temporal trends in the prevalence of congenital malformations at birth based on the birth defects monitoring program, United States, 1979-1987. MMWR CDC Surveill. Summ. 39(4):19-23.
EPA (US Environmental Protection Agency). 1985. Costs and Benefits of Reducing Lead in Gasoline. Final Regulatory Impact Analysis. EPA-230-05-85-006. Washington, DC: Office of Policy, Planning and Evaluation, US Environmental Protection Agency.
EPA (US Environmental Protection Agency). 1991. National Air Quality and Emissions Trends Report, 1989. Research Triangle Park, NC: Office of Air Quality Planning and Standards.
EPA, NCHS, and ATSDR (US Environmental Protection Agency, National Center for Health Statistics, and Agency for Toxic Substances and Disease Registry). 1992. Inventory of Exposure-Related Data Systems Sponsored by Federal Agencies. EPA/600/R-92/078. Prepared by Eastern Research Group, Inc., Arlington, VA, for US Environmental Protection Agency, National Center for Health Statistics, and Agency for Toxic Substances and Disease Registry.
Frank, R.G., M.S. Kamlet, and S. Klepper. 1986. The impact of occupational exposure to toxic material on prevalence of chronic illness. Pp. 59-63 in Proceedings of the 1985 Public Health Conference on Records and Statistics. DHHS Pub. (PHS) 86-1214. Hyattsville, MD: US Government Printing Office.
Fraser, P., C. Chilvers, and P. Goldblatt. 1982. Census-based mortality study of fertilizer manufacturers. Br. J. Ind. Med. 39:323-329.
Fraumeni, J.F., Jr. 1987. Keynote lecture: etiologic insights from cancer mapping. Int. Symp. Princess Takamatsu Cancer Res. Fund 18:13-25.
Frisch, J.D., G.M. Shaw, and J.A. Harris. 1990. Epidemiologic research using existing databases of environmental measures. Arch. Environ. Health 45:303-307.
Glasser, M., and L. Greenburg. 1971. Air pollution, mortality and weather. Arch. Environ. Health 22:334-343.
Goldman, L.R., D.F. Smith, R.R. Neutra, L.D. Saunders, E.M. Pond, J. Stratton, K. Waller, R.J. Jackson, and K.W. Kizer. 1990. Pesticide food poisoning from contaminated watermelons in California, 1985. Arch. Environ. Health 45:229-236.
Health Officers Association of California. 1986. Directory of Automated Information Systems in Local Health Departments. Sacramento: Health Officers Association of California.
Howe, G.R., and J.P. Lindsay. 1983. A follow-up study of a ten-percent sample of the Canadian labor force. 1. Cancer mortality in males, 1965-73. J. Natl. Cancer Inst. 70:37-44.
Langmuir, A.D. 1963. The surveillance of communicable diseases of national importance. N. Engl. J. Med. 268:182-192.
Lave, L.B., and E.P. Seskin. 1973. Analysis of the association between U.S. mortality and air pollution. J. Am. Stat. Assoc. 68:284-290.
Lynch, C.F., R.D. Woolson, T. O'Gorman, and K.P. Cantor. 1989. Chlorinated drinking water and bladder cancer: effect of misclassification on risk estimates. Arch. Environ. Health 44:252-259.
Mahaffey, K.R., J.L. Annest, J. Roberts, and R.S. Murphy. 1982. National estimates of blood lead levels: United States, 1976-1980: association with selected demographic and socioeconomic factors. N. Engl. J. Med. 307:573-579.
Makulowich, J.S. 1993. The use of electronic communications in environmental health research. Environ. Health Perspect. 101:34-35.
Mazumdar, S., S. Schimmel, and I.T. Higgins. 1982. Relation of daily mortality to air pollution: An analysis of 14 London winters, 1958/59-1971/72. Arch. Environ. Health 37:213-220.
Miller, A.B., G.R. Howe, G.J. Sherman, J.P. Lindsay, M.J. Yaffe, P.J. Dinner, H.A. Risch, and D.L. Preston. 1989. Mortality from breast cancer after irradiation during fluoroscopic examinations in patients being treated for tuberculosis. N. Engl. J. Med. 321:1285-1289.
Miller, A.B., C.J. Baines, T. To, and C. Wall. 1992. Canadian National Breast Screening Study. 1. Breast cancer detection and death rates among women 40 to 49 years. Can. Med. Assoc. J. 147:1459-1488.
National Governors' Association. 1989. The Potential for Linking Environmental and Health Data. Washington, DC: National Governors' Association.
National Task Force on Health Information. 1991. Implications of Privacy and Confidentiality Concerns on the Use of Health Information for Research and Statistics. Report of the Project Team to the National Task Force on Health Information. Ottawa: National Centre for Health Information, Statistics Canada. 43 pp.
NRC (National Research Council). 1989. Biologic Markers in Reproductive Toxicology. Washington, DC: National Academy Press.
NRC (National Research Council). 1991. Environmental Epidemiology. Vol. 1. Public Health and Hazardous Wastes. Washington, DC: National Academy Press.
Ostro, B.D., and S. Rothschild. 1989. Air pollution and acute respiratory morbidity: an observational study of multiple pollutants. Environ. Res. 50:238-247.
OTA (US Congress Office of Technology Assessment). 1986. Federal Government Information Technology: Electronic Record Systems and Individual Privacy. OTA-CIT-296. Washington, DC: US Government Printing Office.
Ozkaynak, H., J. Spengler, A. Garzd, et al. 1986. Assessment of population health risks resulting from exposure to airborne particles. In S.D. Lee, ed. Aerosols: Research, Risk Assessment, and Control Strategies. Chelsea, MI: Lewis Publishers.
Pickle, L.W., T.J. Mason, N. Howard, R. Hoover, and J.F. Fraumeni Jr. 1987. Atlas of U.S. Cancer Mortality Among Whites: 1950-1980. DHHS Pub. (NIH) 87-2900. Washington, DC: US Government Printing Office.
Pope, C.A. 1991. Respiratory hospital admissions associated with PM10 pollution in Utah, Salt Lake, and Cache Valleys. Arch. Environ. Health 46:90-97.
Roos, L.L., J.B. Nicol, C.F. Johnson, and N.P. Roos. 1979. Using administrative data banks for research and evaluation: a case study. Eval. Quart. 3:236-255.
Rothenberg, R.B., K.K. Steinberg, and S.B. Thacker. 1990. The public health importance of clusters: a note from the Centers for Disease Control. Am. J. Epidemiol. 132(Suppl. 1):S3-S5.
Rothman, K.J. 1990. A sobering start for the Cluster Busters' Conference. Keynote Presentation. Am. J. Epidemiol. 132(Suppl. 1):S6-S13.
Rothwell, C.J., C.B. Hamilton, and P.E. Leaverton. 1991. Identification of sentinel health events as indicators of environmental contamination. Environ. Health Perspect. 94:261-263.
Schwartz, J. 1989. Lung function and chronic exposure to air pollution: a cross-sectional analysis of NHANES II. Environ. Res. 50:309-321.
Schwartz, J., and A. Marcus. 1990. Mortality and air pollution in London: a time series analysis. Am. J. Epidemiol. 131:185-194.
Sexton, K., S.G. Selevan, D.K. Wagener, and J.A. Lybarger. 1992. Estimating human exposure to environmental pollutants: availability and utility of existing databases. Arch. Environ. Health 47:398-407.
Sexton, K., D.K. Wagener, S.G. Selevan, T.O. Miller, and J.A. Lybarger. 1994. An inventory of human exposure-related databases. J. Exposure Anal. Env. Epidemiol. 4:95-109.
Thacker, S.B., and R.L. Berkelman. 1988. Public health surveillance in the United States. Epidemiol. Rev. 10:164-190.
Thacker, S.B., K. Choi, and P.S. Brachman. 1983. The surveillance of infectious diseases. J. Am. Med. Assoc. 249:1181-1185.
Wagener, D.K. 1990. Using biomarkers to assess exposure. In J.S. Andrews, Jr., B.O. Askew, J.A. Bucsela, D.A. Hoffman, B.L. Johnson, and C. Xintaras, eds. Proceedings of the Fourth National Environmental Health Conference: Environmental Issues-Today's Challenge for the Future. Held June 20-23, 1989 in San Antonio, TX. Atlanta, GA: Department of Health and Human Services, Public Health Service, Centers for Disease Control.
Wigle, D.T., R.M. Semenciw, K. Wilkins, D. Riedel, L. Ritter, H.I. Morrison, and Y. Mao. 1990. Mortality study of Canadian male farm operators: non-Hodgkin's lymphoma mortality and agricultural practices in Saskatchewan. J. Natl. Cancer Inst. 82:575-582.
Wingren, G., B. Persson, K. Thoren, and O. Axelson. 1991. Mortality pattern among pulp and paper mill workers in Sweden: a case-referent study. Am. J. Ind. Med. 20:769-774.