Data providers are the individuals and organizations responsible, whether formally or informally, for making data accessible to others. Sometimes the data provider is simply the producer of the data, because data producers are generally expected to make data available so that research conclusions can be verified and research can continue to progress. In other cases, data may be deposited in a repository, center, or archive that has the responsibility of disseminating the data. Journals also can be data providers, either through the articles they publish or through the provision of supplementary material that supports a published article.
Data users are the individuals and groups who access data in order to use those data in their own work, whether in research or in other endeavors. At one extreme, the users of data may belong entirely to the community of originating researchers (as in the case of elementary particle physics, which is described in this chapter). At the other extreme, a given body of data may be of wide interest to people outside a research field (as in the case of climate records, which are discussed in Chapter 3). Data producers are generally data users, but the collective body of data users extends beyond the research community to policy makers, educators, the media, the courts, and others. Data users can work in fields quite different from those of data producers, which means that they need access to well-annotated data in order to use the data accurately and appropriately.
As described below, each of these three groups has particular responsibilities in ensuring the integrity of research data.
In Chapter 1, we noted that measures of data integrity have both individual and collective dimensions. At an individual level, ensuring integrity means ensuring that the data are complete, verified, and undistorted. This is essential for science and engineering to progress, but it is not sufficient because progress in understanding the world requires that knowledge be shared. Submitting research data, and the results derived from those data, to the scrutiny of others provides a collective means of establishing and confirming data integrity. When others can examine the steps used to generate data and the conclusions drawn from those data, they can judge the validity of the data and results and accept (perhaps with reservations) or reject proffered contributions to science. Of course, the collective scrutiny of research results cannot guarantee that those results will be free of error or bias. For instance, important phenomena such as plate tectonics, chaotic motion in mechanical systems, and the functions of “junk” DNA were overlooked for decades because of theoretical perspectives that shaped the collection of data in those fields. Nevertheless, by bringing multiple perspectives to bear on a common body of