to misrepresentation. Digital technologies can introduce technical sources of error into data analysis, communication, or storage systems. At the frontiers of human knowledge, the data that bear on a problem can be very difficult to separate from irrelevant information.[10] Research methods may not be firmly established, and even the questions being asked may not be fully defined.

Furthermore, researchers may have incentives to structure research or gather data in ways that favor a particular outcome, as in the case of drug studies funded by companies that stand to profit from particular results.[11] In addition, researchers can have philosophical, political, or religious convictions that can influence their work, including the ways they collect and interpret data.[12] Because of the many ways in which data can depart from empirical realities, everyone involved in the collection, analysis, dissemination, and preservation of data has a responsibility to safeguard the integrity of data.


The example from the Journal of Cell Biology illustrates the different roles that individuals and groups can play in ensuring the integrity of data. For the purposes of this report, we have divided these individuals and groups into three categories (data producers, data providers, and data users), though it should be kept in mind that many individuals and organizations fall into more than one of these categories.

Data producers are the scientists, engineers, students, and others who generate data, whether through observations, experiments, simulations, or the gathering of information from other sources. Often the creation of data is an explicit objective of research, but data can be generated in many ways. For example, administrative records, archaeological artifacts, cell phone logs, or many other forms of information can be adapted to serve as inputs to research. Data also are produced by government agencies in the course of performing tasks for other purposes (such as remote sensing for weather forecasts or conducting the decadal censuses), and these data can be used extensively for research. This report focuses on data produced through activities that are related primarily to research, but the general principles laid out in this report apply to all data used in research.


10. E. Brian Davies. 2003. Science in the Looking Glass: What Do Scientists Really Know? New York: Oxford University Press.

11. Sheldon Krimsky. 2006. “Publication bias, data ownership, and the funding effect in science: Threats to the integrity of biomedical research.” Pp. 61–85 in Rescuing Science from Politics: Regulation and the Distortion of Scientific Research, eds. Wendy Wagner and Rena Steinzor. New York: Cambridge University Press.

12. National Academy of Sciences, National Academy of Engineering, and Institute of Medicine. 2009. On Being a Scientist: Responsible Conduct in Research, 3rd ed. Washington, DC: The National Academies Press.
