On Being a Scientist
THE TREATMENT OF DATA
In order to conduct research responsibly, graduate students need to understand how to treat data correctly. In 2002, the editors of the Journal of Cell Biology began to test the images in all accepted manuscripts to see if they had been altered in ways that violated the journal’s guidelines. About a quarter of the papers had images that showed evidence of inappropriate manipulation. The editors requested the original data for these papers, compared the original data with the submitted images, and required that figures be remade to accord with the guidelines. In about 1 percent of the papers, the editors found evidence for what they termed “fraudulent manipulation” that affected conclusions drawn in the paper, resulting in the papers’ rejection.
Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mislead their colleagues and potentially impede progress in their field of research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated.
This is particularly important in an age in which the Internet allows for an almost uncontrollably fast and extensive spread of information to an increasingly broad audience. Misleading or inaccurate data can thus have far-reaching and unpredictable consequences of a magnitude not known before the Internet and other modern communication technologies.
Misleading data can arise from poor experimental design or careless measurements as well as from improper manipulation. Over time, researchers have developed and have continually improved methods and tools designed to maintain the integrity of research. Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended.
Because of the critical importance of methods, scientific papers must include a description of the procedures used to produce the data, sufficient to permit reviewers and readers of a scientific paper to evaluate not only the validity of the data but also the reliability of the methods used to derive those data. If this information is not available, other researchers may be less likely to accept the data and the conclusions drawn from them. They also may be unable to reproduce accurately the conditions under which the data were derived.
The best methods will count for little if data are recorded incorrectly or haphazardly. The requirements for data collection differ among disciplines and research groups, but researchers have a fundamental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work. Depending on the field, this obligation may require entering data into bound notebooks with sequentially numbered pages using permanent ink, using a computer application with secure data entry fields, identifying when and where work was done, and retaining data for specified lengths of time. In much industrial research and in some academic research, data notebooks need to be signed and dated by a witness on a daily basis.
Unfortunately, beginning researchers often receive little or no formal training in recording, analyzing, storing, or sharing data. Regularly scheduled meetings to discuss data issues and policies maintained by research groups and institutions can establish clear expectations and responsibilities.
The Selection of Data
Deborah, a third-year graduate student, and Kamala, a postdoctoral fellow, have made a series of measurements on a new experimental semiconductor material using an expensive neutron test at a national laboratory. When they return to their own laboratory and examine the data, they compare them with a newly proposed mathematical explanation of the semiconductor’s behavior, which predicts results that fall along a theoretical curve.
During the measurements at the national laboratory, Deborah and Kamala observed that electrical power fluctuations, which they could neither control nor predict, were affecting their detector. They suspect the fluctuations affected some of their measurements, but they don’t know which ones.
When Deborah and Kamala begin to write up their results to present at a lab meeting, which they know will be the first step in preparing a publication, Kamala suggests dropping two anomalous data points near the horizontal axis from the graph they are preparing. She says that, because they deviate from the theoretical curve, the low data points were obviously caused by the power fluctuations. Furthermore, the deviations were outside the expected error bars calculated for the remaining data points.
Deborah is concerned that dropping the two points could be seen as manipulating the data. She and Kamala could not be sure which of their data points, if any, were affected by the power fluctuations. They also did not know if the theoretical prediction was valid. Deborah wants to do a separate analysis that includes the points and discuss the issue in the lab meeting. But Kamala says that if they include the data points in their talk, others will think the issue important enough to discuss in a draft paper, which will make it harder to get the paper published. Instead, she argues, she and Deborah should use their professional judgment to drop the points now.
1. What factors should Kamala and Deborah take into account in deciding how to present the data from their experiment?
2. Should the new explanation predicting the results affect their deliberations?
3. Should a draft paper be prepared at this point?
4. If Deborah and Kamala can’t agree on how the data should be presented, should one of them consider not being an author of the paper?
Most researchers are not required to share data with others as soon as the data are generated, although a few disciplines have adopted this standard to speed the pace of research. A period of confidentiality allows researchers to check the accuracy of their data and draw conclusions.
However, when a scientific paper or book is published, other researchers must have access to the data and research materials needed to support the conclusions stated in the publication if they are to verify and build on that research. Many research institutions, funding agencies, and scientific journals have policies that require the sharing of data and unique research materials. Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to replicate published experiments, fail to maintain the standards of science.
In some cases, research data or materials may be too voluminous, unwieldy, or costly to share quickly and without expense. Nevertheless, researchers have a responsibility to devise the best possible ways to share their data and materials. For example, centralized facilities or collaborative efforts can offer a cost-effective way to distribute research materials or information from large databases. Examples include repositories established to maintain and distribute astronomical images, protein sequences, archaeological data, cell lines, reagents, and transgenic animals.
New issues in the treatment and sharing of data continue to arise as scientific disciplines evolve and new technologies appear. Some forms of data undergo extensive analysis before being recorded; consequently, sharing those data can require sharing the software and sometimes the hardware used to analyze them. Because digital technologies are rapidly changing, some data stored electronically may be inaccessible in a few years unless provisions are made to transport the data from one platform to another. New forms of publication are challenging traditional practices associated with publication and the evaluation of scholarly work.