The Scientific Process and the Universe of Data
In order to understand the complexities underlying the issue of data access, it is necessary to first examine some of the foundations and processes of science. It is important to note that few people outside the business or practice of science are likely to be familiar with these foundations or processes. Accordingly, the first panel discussion of the workshop sought to clarify some fundamental scientific issues: What is the universe of scientific data? What is scientific publication? How are scientific claims validated? The panel, moderated by David Korn, included Steven Goodman and Douglas W. Dockery.
At the outset, Dr. Goodman noted that a fundamental feature of scientific claims is that they are not “binary”; that is, they are rarely proven or disproven for all time. Instead, all scientific claims are uncertain to varying degrees. He likened these claims to “shades of gray, with truth and falsity just beyond the bounds of what we can absolutely know.” Furthermore, no statistical formula exists that can put a number on how certain one can be on the basis of any scientific evidence. Therefore, although specific scientific research may support the hypothesis that a particular claim is true or false, it is important to realize that the researcher’s original hypothesis is subject to revision over time.
The scientific process itself, he said, is made up of several phases, including development, execution, inference, communication, and response/revision. Fundamental to this process is an indirect and circuitous feedback loop between the final response and revision of results and the development of new questions and new studies. Scientists assemble and weigh evidence in its totality. This weighing requires an understanding of how to assess the strength of experimental design and execution, the strength of statistical methods and results, and the importance of multiple sources of related evidence.
What are data? One way to imagine the scientific method, said Dr. Goodman, is to visualize a single scientist toiling with a few students at the laboratory bench. Once a pertinent question has been studied, the activities, observations, calculations, and conclusions of such a scientist would be assembled, distilled, and published. Other researchers interested in the topic then would use the publication and perhaps ask for some of the scientist’s original materials to further their own studies of the topic. Bench scientists understand that if they do not report accurately and honestly their methods, results, and conclusions, their reputation within the scientific community could be jeopardized. This reality has always been a powerful force for integrity.
That information, as data, may then move through many levels during preparation of a study report: raw data, abstracted data, coded data, computerized data, cleaned or edited data, analyzable data, and, finally, analyzed data. It is important to realize that as data flow from one level to the next, researchers often have to evaluate or “clean” particular items of data. For example, data often must be purged of “outlier” values that are interpreted as unlikely to be accurate or likely to distort the results. Only when the data are cleaned or edited are they brought into the analysis module of a statistical program and organized. The data in this module are thus typically a very small subset of the total data, organized to focus on one particular question. In complicated data sets, many questions are typically asked of the data over some years by multiple investigators. It is these analyzed data that appear in published form, highly compressed and processed, and often presented in graphs or tables. Although many choices are made in deciding which data enter the final analysis, these decisions are made by trained scientists who have spent years understanding the limits of their methods and deciphering the particular research question.
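The movement from raw data to a cleaned, analyzable subset can be illustrated with a minimal sketch. The screening rule used here (a score based on the median absolute deviation), the threshold of 3.5, and the readings themselves are assumptions chosen for illustration; the workshop did not describe which cleaning rules any particular investigators applied.

```python
from statistics import median

def clean_outliers(values, threshold=3.5):
    """Split values into (kept, flagged) using a median-absolute-deviation rule.

    The rule and threshold are illustrative assumptions, not a method
    attributed to any study discussed at the workshop.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to judge against, so nothing is flagged.
        return list(values), []
    kept, flagged = [], []
    for v in values:
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.6745 * abs(v - med) / mad
        (flagged if score > threshold else kept).append(v)
    return kept, flagged

# Hypothetical raw readings; 30.0 looks like a keying error.
raw_readings = [3.1, 2.9, 3.0, 3.2, 30.0, 3.1, 2.8]
kept, flagged = clean_outliers(raw_readings)

# Only the kept values would move on to the "analyzable data" level.
analysis_subset = kept
```

Flagged values are returned rather than silently dropped, so that the judgment about each one remains with the investigator and the decision can be documented.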
What is peer review? One way to answer this question is by discerning what peer review is not. Peer review does not detect fraud, validate factual findings, dictate publication decisions, or substitute for the judgments of the scientific community as a whole. What it does do is provide a mechanism of independent outside advice to a journal editor about the importance of a paper’s findings, its strengths and weaknesses, and any modifications necessary to make the author’s claims match the strength of the reported evidence. Since it is those claims that are often what the media and the lay public pay the most attention to, assessing their strengths and weaknesses is one of the main purposes of peer review.
What is publication? The Shelby Amendment calls for the release of data whose results have been published. Publication, said Dr. Goodman, is a highly compressed summary of the main study findings. The primary purpose of a scientific publication is to communicate to other scientists. It should not be considered the establishment of scientific “truth” or the final resolution of a question, but rather an addition to or clarification of an ongoing conversation about the state of knowledge. Publication should be considered part of a continuing discussion by the scientific community of the evidence and claims bearing on a particular topic.5
How reliable is a scientific finding? Since one cannot expect a simple true-or-false answer from most scientific studies, noted the speaker, a more useful question is: Was the study reliable enough to support an action as important as a policy decision or regulatory action? There are several ways to evaluate the soundness of a scientific study. First, one examines the strength of the design, the methods, and the statistical results. Next, one asks whether there is consistency within the data (pertaining to mechanisms of effect or related outcomes) and with other studies and scientific theories. Then, the robustness of the findings is evaluated through the use of different analytical approaches. Ultimately, the reliability of findings rests on trust, that is, on the belief that the investigators did what they said they did. This trust forms the bedrock of the scientific conversation, and its violation can damage or end a scientific career.
The ability to replicate a study is typically the gold standard by which the reliability of scientific claims is judged. In some types of experimental studies, it is possible to manipulate or exactly replicate the original study. For large epidemiological studies, however, repeating a study is seldom either possible or desirable. Preferable are studies examining the same hypothesis in other places, in other populations, and/or in other ways. In general, then, “replication” might involve any of the following:
Additional analyses done on the data set by the original or collaborating investigators;
New results generated from older data sets;
New studies addressing the same hypothesis;
Independent analysis of the same data set by different people;
Monitoring of the results of actions taken on the basis of the findings.
An additional layer of replication is meta-analysis, which is a systematic strategy for comprehensively describing and summarizing a body of research evidence from two or more studies. The goal is to produce a quantitative synthesis of the evidence presented in multiple studies that relate to a research question. In a typical meta-analysis, all the data used have been published in the public domain and are easy to inspect and analyze.
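The idea of a quantitative synthesis can be made concrete with a minimal sketch of one common pooling approach, the fixed-effect inverse-variance method, in which each study's estimate is weighted by the inverse of its variance. The choice of this method, the function name, and the three effect estimates and standard errors are assumptions made for illustration only; they are not drawn from any study discussed at the workshop.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Pool study estimates with inverse-variance weights (fixed-effect model).

    Returns the pooled estimate and its standard error.
    """
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect estimates and standard errors from three studies.
estimates = [0.10, 0.20, 0.15]
std_errors = [0.05, 0.10, 0.05]

pooled, pooled_se = fixed_effect_pool(estimates, std_errors)

# Approximate 95 percent confidence interval for the pooled estimate.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
```

A fixed-effect model assumes that all studies estimate the same underlying effect; when studies differ in population or design, a random-effects model is often preferred.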
Although scientific studies and their replication generate results that fall into a range of shades of gray, there exist long-established and proven mechanisms within the scientific community, such as peer review and publication, for judging scientific merit. Because science itself is uncertain by nature, scientific claims will always be subject to questioning, challenge, and refinement as additional questions are asked and newer data are generated.
THE SIX CITIES STUDY
History of the Study. Douglas Dockery, an epidemiologist associated with the Harvard Six Cities Study, provided an overview of the research project. In 1973, the OMB requested that the National Institute of Environmental Health Sciences (NIEHS), one of the institutes of the National Institutes of Health (NIH), review the health effects of sulfur oxides and propose a longitudinal study of the health effects of fossil fuel air pollution. Two researchers at the Harvard University School of Public Health, Ben Ferris and Frank Speizer, submitted a proposal in 1974 for a study of children and adults in six cities in the midwestern and eastern United States to evaluate the effects of the anticipated degradation in air quality. The study was approved and received funding for 20 years; the Electric Power Research Institute provided additional funding after the study began. Also, the EPA provided technical assistance, information from air monitors, equipment, and limited input into the conduct of the study.
The project studied areas around the major coal-fired power plants in the Midwest and looked at the health effects of the air pollution as it was transported northeastward by prevailing winds. To achieve a range of exposures, the researchers chose the cities of Watertown, Mass.; Steubenville, Ohio; Kingston, Tenn.; St. Louis, Mo.; Topeka, Kan.; and Portage, Ind. These communities ranged from rural areas upwind of the power plants to downwind communities with more polluted air. The core studies measured air pollution in those communities and the health of a sample of the population. The Six Cities Study was not a single study but a constellation of studies tied together in their focus on these six communities. One study, for example, was a random sample of children enrolled as first graders and followed through high school graduation. Ultimately, different groups of investigators produced at least 15 identifiable data sets; new findings then fed back into new studies. During the course of the 20-year period, more than 100 publications emerged as a result of these studies.6
One of the studies monitored the effects of air pollution on mortality rates, using a random selection of adults aged 25 to 74 in each community, selected from census lists and city directories. Study participants took a pulmonary function test every 3 years and filled out a questionnaire, answering the same set of questions each time. The study kept track of which participants died in the interval.
Two of the goals of this study were to determine how long study participants lived and to identify predictors of survival. The study used the Social Security System and the National Death Index to determine which participants had died. The Index allowed the researchers to search through all the death certificates in the U.S. for matches with people on the Six Cities list. Although the matches were not perfect, and each name had to be validated by Social Security number, age, gender, and other factors, the Index saved the investigators from having to visit all the participants each year.
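The validation step described above, in which a name match is accepted only when other identifiers also agree, can be sketched as follows. This is an illustration under stated assumptions: the record fields, the one-year tolerance on birth year, the function names, and the sample records are all hypothetical, and the sketch does not represent the actual matching criteria of the National Death Index or of the Six Cities investigators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, used for both participants and death records.
    name: str
    ssn: str
    birth_year: int
    sex: str

def is_validated_match(participant, death_record, max_year_gap=1):
    """Accept a name match only when the other identifiers also agree."""
    return (
        participant.ssn == death_record.ssn
        and participant.sex == death_record.sex
        and abs(participant.birth_year - death_record.birth_year) <= max_year_gap
    )

def validated_deaths(participants, death_records):
    """Return (participant, death_record) pairs that pass validation."""
    # Index death records by lower-cased name to find candidate matches.
    by_name = {}
    for rec in death_records:
        by_name.setdefault(rec.name.lower(), []).append(rec)
    matches = []
    for p in participants:
        for rec in by_name.get(p.name.lower(), []):
            if is_validated_match(p, rec):
                matches.append((p, rec))
    return matches

# Entirely fictitious records for illustration.
participants = [
    Person("Pat Doe", "000-00-0001", 1930, "F"),
    Person("Lee Roe", "000-00-0002", 1925, "M"),
]
death_records = [
    Person("Pat Doe", "000-00-0001", 1930, "F"),  # agrees on all fields
    Person("Lee Roe", "000-00-0009", 1950, "M"),  # same name, different person
]
matches = validated_deaths(participants, death_records)
```

The second candidate shares a name with a participant but fails validation, which is the kind of imperfect match the investigators had to screen out.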
To validate that the people who died were in fact the same individuals as those in the studies, and to determine the cause of death, the investigators had to obtain the original death certificates and compare the information on the certificates with the original records. The study took into account several known predictors of survival: age, sex, personal habits (such as smoking), socioeconomic status, and occupation (including exposure to chemicals).
After adjusting for all known variables, the study found that people who lived in areas with higher air pollution, as determined by fine-particle concentration, had a shorter life expectancy than people living in the cleaner cities. According to Dockery, the investigators were surprised by the magnitude of this effect because the concentrations of air pollutants seemed low.7
Impact of the study. Since the EPA first set standards for particulate matter in 1970, several hundred epidemiological studies of the effects of air pollution have been published. During the year when the Six Cities Study was released, it was one of several (12-15) published studies that examined the effects of particulate air pollution. While none of these studies was by itself able to capture the whole “truth” about air pollution, their cumulative power persuaded the American Lung Association (ALA) to sue the EPA to tighten the 1970 particulate standards. In 1997 a federal court found in favor of the ALA and mandated that the EPA set a new standard.
Rule making as a result of the study. The EPA evaluated the results of the mortality study, approved the basic performance of the methodology and the standards of publication, and used the study’s findings in setting its ambient air quality standards. Due to the strong interest in the research, the agency encouraged the researchers to allow interested scientists and agencies the opportunity to understand fully the basis of their work. The investigators understood this to mean discussions with other scientists regarding the approach, methodology, and results of the study. The EPA did not call for a release of all the underlying data. Indeed, as mentioned above, the data were encumbered by several types of confidentiality constraints. The investigators had assured the participants in the studies that their identity and their relationship to any information obtained would be kept confidential. Specific agreements were signed by each participant, the study director, and a witness. Assurances of confidentiality are typical in studies of this kind.8
In an experiment to discover whether confidentiality could be preserved while opening the data for public review, the study investigators attempted to disguise the identity of the study participants. They deleted as many features as possible from the questionnaires, such as the name, the state file number, the mother’s maiden name, and the name of the person providing the information. However, they needed to retain a minimum set of features if other scientists were to be able to replicate the basic findings of the study. They needed the place of death, because they were investigating the possibility of a link to air pollution exposure; they needed the date of death because the study concerned survival; and they needed both the age at death and gender in order to adjust for both factors. They found that even this minimum set of features could allow for identification of research participants. For example, Dr. Dockery performed a simple test of confidentiality by using the minimal information that a male of a certain age had died on a certain day in one of his six cities. On the Internet he was easily able to find newspaper obituaries of those who had died in that city on that day. From that relatively short list of descriptions he quickly identified one of his subjects simply by the facts of age and gender, added to the place and date of death.
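The difficulty described above can be illustrated by counting how many records in a data set are unique on the retained features alone. The sketch below is a hypothetical check, not the procedure the investigators used; the field names, function name, and records are invented for illustration.

```python
from collections import Counter

# The minimum set of features the investigators said they needed to retain.
QUASI_IDENTIFIERS = ("place", "date_of_death", "age", "sex")

def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Return the share of records whose combination of key values is unique.

    A record that is unique on these features could, in principle, be
    matched to a single outside source such as a newspaper obituary.
    """
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Fictitious records with names and other direct identifiers already removed.
records = [
    {"place": "City A", "date_of_death": "1990-01-05", "age": 71, "sex": "M"},
    {"place": "City A", "date_of_death": "1990-01-05", "age": 71, "sex": "M"},
    {"place": "City A", "date_of_death": "1990-01-05", "age": 64, "sex": "F"},
    {"place": "City B", "date_of_death": "1990-02-11", "age": 71, "sex": "M"},
]
share_unique = unique_fraction(records)
```

In this invented example, half of the records remain unique on place, date, age, and sex even after names are removed, which mirrors the concern that a minimum feature set may still permit identification.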
In addition to the confidentiality agreements with the individual study participants, the investigators had been required to sign certain statements in order to obtain death certificates from the states. They had agreed that they would (1) limit access to the records to only the members of the research staff, (2) destroy records upon the completion of the study, and (3) not release the records to other agencies, publish data so individuals could be identified, or contact family members of decedents. Furthermore, the investigators had been required to sign non-disclosure agreements to obtain information from the National Death Index. Hence, although the investigators were willing to share data sets, which is a common scientific practice, they believed that to open their thousands of boxes of original records would be unethical.9 Nonetheless, the study was criticized on the general grounds that the public had paid for the study (through NIH funding to Harvard) and, therefore, the public had a right to all the data.
The Health Effects Institute Analysis. In an effort to accommodate public demand for access to the data so the study findings could be contested or confirmed, the investigators ultimately agreed to make an arrangement with an independent entity, the Health Effects Institute (HEI), to re-analyze the data. Funded jointly by the EPA and the automobile manufacturers, the HEI describes itself as an “unbiased source of information on the health effects of motor vehicle emissions.”10 The HEI was given the task of reanalyzing and validating the original data.
The HEI organized an open, international competition to assemble a “reanalysis team” and established a separate review committee to examine the results of the reanalysis. The review involved a sensitivity analysis using methods not available during the original analysis and an examination of new data. The HEI found the original data to be of high quality, and essentially confirmed the validity of the original findings and conclusions.
The following description of the HEI, located in Boston, Mass., is offered on its web site: “The Health Effects Institute (HEI) is an independent, nonprofit corporation chartered in 1980 to provide high-quality, impartial, and relevant science on the health effects of pollutants from motor vehicles and from other sources in the environment. Supported jointly by the U.S. Environmental Protection Agency (EPA) and industry, HEI has funded over 170 studies and published over 100 Research Reports and several Special Reports producing important research findings on the health effects of a variety of pollutants, including carbon monoxide, methanol and aldehydes, nitrogen oxides, diesel exhaust, ozone, and most recently, particulate air pollution.” See http://www.healtheffects.org/pubsspecial.htm.