Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
On Being a S c i e n t i s t The Treatment of Data In order to conduct research responsibly, graduate students need to understand how to treat data correctly. In 2002, the editors of the Journal of Cell Biology began to test the images in all accepted manu- scripts to see if they had been altered in ways that violated the jour- nalâs guidelines. About a quarter of the papers had images that showed evidence of inappropriate manipulation. The editors requested the original data for these papers, compared the original data with the submitted images, and required that figures be remade to accord with the guidelines. In about 1 percent of the papers, the editors found evidence for what they termed âfraudulent manipulationâ that affected conclusions drawn in the paper, resulting in the papersâ rejection. Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mis- lead their colleagues and potentially impede progress in their field or research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated. This is particularly important in an age in which the Internet al- lows for an almost uncontrollably fast and extensive spread of infor- mation to an increasingly broad audience. Misleading or inaccurate data can thus have far-reaching and unpredictable consequences of a magnitude not known before the Internet and other modern com- munication technologies. Misleading data can arise from poor experimental design or care- less measurements as well as from improper manipulation. Over time,
T h e T r e a t m e n t o f D a t a researchers have developed and have continually improved methods and tools designed to maintain the integrity of research. Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended. Because of the critical importance of methods, scientific papers must include a description of the procedures used to produce the data, sufficient to permit reviewers and readers of a scientific paper to evaluate not only the validity of the data but also the reliability of the methods used to derive those data. If this information is not available, other researchers may be less likely to accept the data and the conclusions drawn from them. They also may be unable to reproduce accurately the conditions under which the data were derived. The best methods will count for little if data are recorded incor- rectly or haphazardly. The requirements for data collection differ among disciplines and research groups, but researchers have a fun- damental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work. Depending on the field, this obligation may require entering data into bound notebooks with sequentially numbered pages using permanent ink, using a computer application with secure data entry fields, identifying when and where work was done, and retaining data for specified lengths of time. In much industrial research and in some academic research, data note- books need to be signed and dated by a witness on a daily basis. Unfortunately, beginning researchers often receive little or no formal training in recording, analyzing, storing, or sharing data. Regularly scheduled meetings to discuss data issues and policies maintained by research groups and institutions can establish clear expectations and responsibilities.
10 On Being a S c i e n t i s t The Selection of Data Deborah, a third-year graduate student, and Kamala, a postdoc- toral fellow, have made a series of measurements on a new experimental semiconductor material using an expensive neutron test at a national laboratory. When they return to their own laboratory and examine the data, a newly proposed mathematical explanation of the semiconductorâs behavior predicts results indicated by a curve. During the measurements at the national laboratory, Deborah and Kamala observed electrical power fluctuations that they could not control or predict were affecting their detector. They suspect the fluctuations af- fected some of their measurements, but they donât know which ones. When Deborah and Kamala begin to write up their results to present at a lab meeting, which they know will be the first step in preparing a publication, Kamala suggests dropping two anomalous data points near the horizontal axis from the graph they are preparing. She says that due to their deviation from the theoretical curve, the low data points were obviously caused by the power fluctuations. Furthermore, the deviations were outside the expected error bars calculated for the remaining data points. Deborah is concerned that dropping the two points could be seen as manipulating the data. She and Kamala could not be sure that any of their data points, if any, were affected by the power fluctuations. They also did not know if the theoretical prediction was valid. She wants to do a separate analysis that includes the points and discuss the issue in the lab meeting. But Kamala says that if they include the data points in their talk, others will think the issue important enough to discuss in a draft paper, which will make it harder to get the paper published. Instead, she and Deborah should use their professional judgment to drop the points now. 1. What factors should Kamala and Deborah take into account in deciding how to present the data from their experiment? 2. Should the new explanation predicting the results affect their deliberations? 3. Should a draft paper be prepared at this point? 4. If Deborah and Kamala canât agree on how the data should be presented, should one of them consider not being an author of the paper?
T h e T r e a t m e n t o f D a t a 11 Most researchers are not required to share data with others as soon as the data are generated, although a few disciplines have ad- opted this standard to speed the pace of research. A period of confi- dentiality allows researchers to check the accuracy of their data and draw conclusions. However, when a scientific paper or book is published, other re- searchers must have access to the data and research materials needed to support the conclusions stated in the publication if they are to verify and build on that research. Many research institutions, funding agencies, and scientific journals have policies that require the sharing of data and unique research materials. Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to replicate published experiments, fail to maintain the standards of science. In some cases, research data or materials may be too voluminous, unwieldy, or costly to share quickly and without expense. Neverthe- less, researchers have a responsibility to devise ways to share their data and materials in the best ways possible. For example, centralized facilities or collaborative efforts can provide a cost-effective way of providing research materials or information from large databases. Examples include repositories established to maintain and distribute astronomical images, protein sequences, archaeological data, cell lines, reagents, and transgenic animals. New issues in the treatment and sharing of data continue to arise as scientific disciplines evolve and new technologies appear. Some forms of data undergo extensive analysis before being recorded; con- sequently, sharing those data can require sharing the software and sometimes the hardware used to analyze them. Because digital tech- nologies are rapidly changing, some data stored electronically may be inaccessible in a few years unless provisions are made to transport the data from one platform to another. New forms of publication are challenging traditional practices associated with publication and the evaluation of scholarly work.