both of them. This possibility, compounded perhaps by the fact that the two scientists were in different fields and therefore were not in direct competition, led to a comfortable and extensive collaboration between the two.
Our informant geneticist has been the recipient of requests for findings that can be pooled with other data to increase the significance of the results and to test their reliability and accuracy. Such requests are fairly common in certain areas of research such as epidemiology. I asked him what happens. His response was a short version of the whole long story: “Some people share and some don't.” Some of this willingness or aversion to sharing is chargeable to personal idiosyncrasy. After all, even back in nursery school, some people shared the Legos and some people did not. But the more interesting issues involve the social-structural and economic considerations that shape the choices people make. Without a full understanding of such considerations, it is difficult to alter these choices by fiat. For example, the geneticist indicated that at times findings are released because of mandates by either journals or funders. He made the point, however, that even where disclosure was mandatory, findings were often released without important details whose lack rendered the data significantly less helpful for downstream users. Sometimes, data were incomplete. One of the crystallographers reported an occasion in which the coordinates of a structure were released for publication purposes omitting a water molecule, without which the coordinates were not terribly helpful. Another told of the release of only the central chain. In some instances, results may be coded in such a way that the critical information cannot be accessed.
The informant geneticist reported his belief that some scientists intentionally modify their data so as to make them less useful to subsequent users, but at other times the data are simply not in usable form because of the format in which they were originally collected. If the data producers must do any kind of real work in modifying these findings to make them more usable to the downstream user, they may well expect to be rewarded. For example, when this scientist requested that a data set to which he had been given access be updated, he was asked to include the names of the original data producers on subsequent scientific papers.
Sometimes, scientists are not averse in principle to releasing their data but believe that because of the nature of the data set, they must delay release. One crystallographer reported that colleagues had said, “you can't have it, it's a mess, so please, now, it's not good enough, so they didn't give it to [him] for six, eight months, but they gave [him] enough of the overall orientation, [so that he] could do work with it even at that . . . point.” He further stated that, “Never has anyone said, no, you can't have it, but if it isn't finished and if it's not there, you can't blame them.” Although this delay was presumably temporary, another crystallographer said that he often refrains from sharing the source code from his self-developed software because it takes too long to explain how he deals with each of the many glitches in it. In these cases again, the characteristics of the elements of the data stream contribute to shaping the timing and circumstances of access.
My point in laying out these details is that much of the significant sharing of data occurs not through publication but in the less formal contexts that I have described. As one of my informants put it, “Being in touch replaces abstracts and publications. The most interesting stuff I hear is either presented at meetings or heard on an e-mail. Even the fastest publication is slow compared to that and if you have to wait until you see stuff in print, you're out of the loop.” One of the crystallographers referred to presenting an abstract as a “little trick” in the interests of “one-upmanship,” but it also reflects the way structural incentives in science can result in sharing. The way to influence the smartest scientists, one of the crystallographers said, is through “talks at national meetings that they happen to be at, discussions, interactions with high-profile people who they happen to run into at a meeting.” Too much of a focus on data sharing through formal publication and the incentives and disincentives that exist to publish at time A as opposed to time B will miss much of this critical sharing of data.8
The publication process is, indeed, one important mechanism for the sharing of data and for entering scientific information into the public domain. What is equally if not more crucial for the progress of science, however, is the effect of social and economic pressures on the informal sharing of data by scientists and on the flow of data through the different scientific fields. As we consider the solutions to the problems of access and of the privatization of scientific data, we need to keep a close eye on these informal mechanisms of data sharing and on the ways in which they are shaped by the climate of science and society.
8 Ironically, the publication process itself may lead to a certain amount of unintended informal data sharing, as when a colleague reviewing an article for a major journal called one crystallographer to report on the progress of another group that was working on the same structure.