Potential Effects of a Diminishing Public Domain in Biomedical Research Data
In Session 2, my colleague Sherry Brandt-Rauf presented some of the findings of our studies of data access practices among biomedical researchers.1 I envy her ability to talk about data that we actually collected, as opposed to my charge, which is to talk speculatively about what might happen if the public domain were to diminish in the biomedical area. I can only present guesses and conjectures about what might in fact occur.
The central question that I want to examine is what might happen to research systems if the public domain diminishes. I will confine my attention to “small science” biomedical research. This excludes areas such as large clinical trials, which arguably are a form of big science, given their many collaborators and complex organizational structures (e.g., GUSTO III Investigators, 1997). I will focus on areas such as molecular biology, crystallography, structural biology, and cell biology. In these fields, academic research is usually conducted on a benchtop scale by small research groups, and there are many independent laboratories working and often directly competing. In these particular scientific cultures, it is understood that the scientist who runs the laboratory (the “lab head,” as he or she is called) is the only person who can speak for the lab. Other people, such as postdocs, can speak only with special authorization from that person (Knorr Cetina, 1999). So this kind of science features a culture of autonomous, independent, highly entrepreneurial operators who work to build a research enterprise, produce findings, get grants, keep a lab going, and so forth. I want to speculate about what kinds of changes you might expect in this cultural setting, based on what we know from ethnographic studies about the social practices that regulate access to data in these areas of science.
Although Sherry Brandt-Rauf described some of our work on how scientists control access to data and resources, there are several points I want to underline about data access practices. First, it is very important to recognize that these practices are specific to particular research communities. Many scientists tend to talk about all science as if it were uniform in the ways it handles data access, without recognizing the diversity of scientific cultures. Thus, it is common to observe that “all scientists want to publish,” as if this were a universal truth about scientists. Indeed, there is no doubt that this statement is true at a very general level in all areas of academic science, as well as in some industrial contexts (Hicks, 1995). But the details of how publication is managed (what constitutes “enough” for a paper, which data are “ready” to be published when, who decides, and how strategic concerns about competition are addressed) vary tremendously across different scientific fields. Scientists do not simply publish everything that they produce; they engage in strategic maneuvers about who is going to get access to what data and materials under what terms and conditions. Publication is only one move (albeit an extremely important one) in an extremely complex process. The practices used to regulate access to data are quite different in molecular biology, as opposed to high-energy physics, as opposed to a large clinical trial. There are different expectations and rules about control over the flow of data in those settings. As a result, any analysis of how changes in the public domain might affect science must focus on particular research communities, not science as a whole. When I refer to scientists, I am referring to researchers working in molecular biology and other benchtop biomedical fields that exhibit similar cultures.
1 See Chapter 9 of these Proceedings, “The Role, Value, and Limits of S&T Data and Information in the Public Domain on Biomedical Research,” by Sherry Brandt-Rauf.
DATA ACCESS PRACTICES
Brandt-Rauf and I set out to create a theoretical framework and analytic method for comparing data access practices across diverse scientific fields. We concluded that such a framework must treat the category “data” as problematic. That is, one cannot focus only on what the scientists in a particular area regard as “data,” as if their notion of data were unambiguous and universal to all fields; one must instead consider the full range of forms of data and heterogeneous resources that researchers produce and use (Hilgartner and Brandt-Rauf, 1994).
In molecular biology, these data and resources include all sorts of written inscriptions (such as sequence data) and biomaterials. They also include instruments, software, techniques, and a variety of “intermediate results.” In the laboratory, these entities are woven together into complicated assemblages. An isolated, single biological material sitting alone in a test tube is a useless thing; to be scientifically meaningful, it must be linked using labels and other inscriptions to the source of the sample and its particular characteristics. Moreover, to use the material, one needs a laboratory equipped with an appropriate configuration of people, techniques, instruments, and so forth. As scientific work proceeds, materials and inscriptions are processed and reprocessed, so these assemblages continuously evolve, producing new data and materials (Latour and Woolgar, 1979). Many of the items found in a laboratory can be found in any laboratory, but some of the items—especially those toward the “leading edges” of these evolving assemblages—are available only in a few places, or perhaps only in one place. These scarce and unique items can convey a significant competitive edge. For example, the laboratory that first develops a useful new technique, the researcher who collects a particularly interesting set of DNA samples, and the creator of a powerful new algorithm all end up controlling strategically important resources. They can enter into negotiations about collaborations and other exchanges from a strong position, owing to the value and scarcity of the resource.
In small-scale biomedical research, with its many independent operators, a dynamic, invisible economy operates below the radar of the published literature. There is a huge range of transactions going on all the time. Scientists have to decide whether to publish a result immediately or delay publication until an even better result is achieved. In many areas, such as gene hunting, several research groups may be racing to reach the same goal, and an early publication from one group may help competing groups to catch up (Hilgartner, 1997). Given such strategic considerations, the choice is not simply between publishing right away and delaying publication; scientists may also provide information on a limited basis to specially targeted audiences. Often, they work to negotiate agreements with the heads of other academic laboratories, or perhaps with commercial organizations. Many of these exchanges entail at least temporary restrictions on publication. As scientists work to build collaborations, they seek to avoid arrangements that will cause them to become merely the provider of a “service,” as molecular biologists put it, to another lab without benefiting themselves (Knorr Cetina, 1999). Sometimes researchers provide these services expecting a quid pro quo later. Sometimes they provide them out of the goodness of their hearts. Sometimes they provide them because funding agencies or other policy makers encourage them to do so (Hilgartner, 1998). But complex negotiations, replete with strategic gamesmanship and uncertainty, are routine in small-scale biomedical research.
Having briefly characterized the strategic role of data and the wide variety of transactions that surround data and resources, it is finally time to turn to my main question: What would happen to this area of science if the public domain were to diminish? To address this question, it is first necessary to consider how the public domain fits into this research area. The public domain is a complex concept, and it is important to recognize that scientists may not think about this concept in precisely the same way that lawyers and legal scholars do. Legal categories permeate scientists’ consciousness, but not in the systematic, formalized ways that one might find in a law review article. For this reason, when scientists talk about “the public domain,” the concept that they have in mind may not neatly map onto a formal, legal definition. When a scientist asks whether a resource is in the public domain (or, more colloquially, “is that public?”), what they mean is something like “Can I get it? Can I use it? What do I have to do to get it and what encumbrances will restrict my use of it?” In other words, the central issues are usually availability and the terms of access.
Legal ownership is only one of many things that constitute availability and shape the terms of access. So scientists deal less with the public domain—if we construe that as a legal category produced in court decisions, statutes, briefs, and formal legal negotiations—than with resources that are more, and sometimes less, “available.” Shifting from a legal concept of the public domain to this more pragmatic concept centered on access directs our attention not only to formal ownership, but also to the practical difficulties of obtaining data and resources. When molecular biologists refer to some data and resources as “public” they typically mean that they are readily available to any scientist. I will refer to data that are public in this sense as “public resources.” Important public resources are found in many domains: from scientific literatures, to Internet-accessible databases, to biomaterials repositories, to stock centers that house strains of organisms (Fujimura, 1996; Kohler, 1994).
Scientists also may regard instruments as public resources if they are available at reasonable prices on open markets. In contrast, some instruments are not public resources. For example, access to beam time on a synchrotron may be allocated by peer review (White-DePace et al., 1992). Similarly, an instrument that is not yet commercially available might be offered to selected scientists for beta testing through a special arrangement that provides early access: “We have this new instrument,” says the firm, “and you want to try it out. Well, you get to use it first, but let us know how you like it, and if you like it, tell your friends and cite our product in your published work.” The point is that so long as the instrument is for sale (at a reasonable price), scientists often describe it as “public,” even though the instrument in fact is probably someone’s intellectual property.
As the above discussion suggests, scientists do not deal with an abstract public domain; they interact with diverse public domains, including open literatures, open databases, open materials repositories, and open markets. The plural term—public domains—is important here, both to emphasize their diversity and to underline how these public domains are not coterminous with abstract definitions of the public domain. These public domains are what we should consider when thinking about the effects on scientific research of increasing privatization and restriction of domains that were once public.
EFFECTS OF DIMINISHING PUBLIC DOMAINS
Before launching into a speculative discussion of the possible effects of diminishing public domains, one must ask a crucial question: Can public domains really diminish? One might be forgiven for suspecting that they cannot. After all, the scientific literature continues to expand rapidly, and biomedical science is experiencing an unprecedented deluge of biomolecular data (Lenoir, 1999). For example, the volume of DNA sequence information available in public databases has been increasing exponentially, and it probably will continue to grow rapidly for a while. (Of course, at the same time, we know that the amount of sequence data available in private databases is also growing, although it is harder to estimate how rapidly, because such information is private.)
Even given an expanding literature and an explosion of data, public domains clearly can diminish in at least six ways. First, absolute reductions in public domains occur when particular items are removed from them for various reasons. Second, items that were previously conceived of as things to be shared openly among scientists can be redefined as proprietary (Mackenzie et al., 1990). Third, there can be delays in the release of information into public domains. We know that such delays occur with some regularity in such high-impact fields as genetics, and this constitutes at least a short-term limitation of the scope of such public domains as the scientific literature (Hilgartner, 1997; Campbell et al., 2002). Fourth, items in public domains can have new encumbrances attached to them. These encumbrances might include publication restrictions, reach-through licenses and similar mechanisms, or just the transaction costs of negotiating access. Fifth, some items available for purchase can be acquired only at high prices. Finally, the relative size of public domains versus private domains can diminish, even when both public and private domains are growing. Such reductions in the relative size of public domains arguably constitute a form of diminishment. In short, there are a variety of ways that public domains can remain “public” in the sense that I have described, but at the same time diminish.
There are three different orders of effects that you might expect if public domains diminish. The first order includes direct effects on the transactions that drive this fast-moving world of exchanges among biomedical scientists. Second-order effects involve changes in research communities and cultures and how they manage data access, such as a shift toward more restrictive practices. Third-order effects include the ways that diminishing public domains might alter the position of science in the wider polity.
Let us turn initially to direct effects on transactions. As a starting point for considering these effects, imagine a small academic laboratory that engages in several kinds of transactions. It obtains some inputs for its research from public domains; it releases some of its outputs to public domains, such as the literature; and it gets some inputs from (and deploys some outputs in) restricted-access transactions. “I give you this, you give me that, maybe this deal is more to my advantage than to yours, but there will be another exchange later and that one will work out the other way.” Importantly, whenever this laboratory acquires an entity from a public domain, it immediately begins to process it, manipulate it, and combine it with other data and materials. Through this processing the laboratory reprivatizes the entity—or, more precisely, produces new entities that end up under its exclusive control. Put otherwise, laboratories not only release material into public domains, but they also continually incorporate entities from public domains into their own private domains. Viewed in this light, laboratories emerge not only as mechanisms for creating new knowledge, but also as devices for redrawing the contours of the public-private boundary.
What effects might diminishing public domains have on such a laboratory’s transactions? If public domains diminish, then there will be less material in them, at least in relative terms. Thus, we might expect that perhaps our imaginary laboratory would acquire fewer inputs from public domains. If fewer inputs are obtained from public domains, then the laboratory must get them from some other source (or do without). Most likely, it will acquire a higher percentage of its inputs from restricted-access transactions, and this will lead the laboratory into negotiations with people who hold resources privately.
Of course, these restricted-access transactions will most likely entail quid pro quos, such as confidentiality agreements or rights to prepublication review. As a result, one might expect the laboratory to release fewer of its outputs into the public domain, or at least to do so later. More generally, if many laboratories became increasingly entangled in proprietary agreements, this might drag down the quality of public domains on a wider scale, for example, by limiting their content or causing delays in the introduction of new information. Indeed, one can imagine a synergistic process that would increasingly lead laboratories to rely on restricted-access transactions, producing a progressive impoverishment of the public domain that would, in turn, encourage further reliance on restricted-access transactions. If things were to go very badly, such effects could reduce the vitality and creativity of biomedical science for some of the same reasons that the lack of a strong public domain restricts the creative use of European weather data.2
I want to move on to possible second-order effects. What kinds of effects might diminishing public domains have on research communities and research cultures? Research communities play a key role in constituting public domains in science. If you think of public domains not as an abstract legal category, but instead as material entities produced actively through social action, then research communities are central players in building them. It takes a tremendous amount of work to make science and scientific information “public” (Callon, 1994). Research communities accomplish this in part by building institutional arrangements that lead individual laboratory scientists to put things into public domains. The published literature itself is the solidified sediment of a huge set of institutional arrangements that give academic scientists incentives to publish information. There were not always scientific journals; their history goes back to the Enlightenment, and over time, they have grown into a central scientific institution. Today, a complex set of institutional arrangements (from the tenure system to the research funding system to a socialization process that makes publication important to a scientist’s identity) encourages researchers to publish.
2 See Chapter 18 of these Proceedings, “Potential Effects of a Diminishing Public Domain in Environmental Information,” by Peter Weiss.
Building such institutions entails creating informal expectations and formal rules, and these expectations and rules are historical achievements, not timeless, stable features of science. Social and technological change creates openings for institutional innovations that can influence the contours of public domains. The emergence of DNA sequence databases, a new kind of public domain in science, provides a good example. DNA sequencing began in a small way in the 1970s, and a visionary group of scientists conceived of the Los Alamos Sequence Library (LASL) at the end of the decade. LASL gathered previously published sequences, which at that point were published in print in scientific journals, and prepared them in machine-readable form to permit mathematical analysis. In this way, these scientists created a new kind of public domain—the sequence database—for biology (Cinkosky et al., 1991; Hilgartner, 1995).
LASL later evolved into GenBank. Early in the history of GenBank, sequence data began accumulating so fast that journals became reluctant to publish them. GenBank decided to ask scientists to submit sequences directly to the database, but the incentive structures did not encourage them to do so. Sending in sequence data took time and required effort, and there was little payoff in terms of scientific credit for submitting sequences. Only later did GenBank, with help from the relevant scientific journals, negotiate a new deal to compel scientists to submit sequence data to the public databases (Hilgartner, 1995, p. 253). A policy that made journal publication contingent on database submission was first implemented by Nucleic Acids Research in 1988, and many other journals followed suit (Nucleic Acids Research, 1987; McCain, 1995). This example illustrates how new institutional arrangements, combined with technological developments such as DNA sequencing and the Internet, can be deployed to constitute important new public domains in science.
However, the ability to seize such opportunities depends on the existence of a scientific culture conducive to creating collective resources. Excessive concern with the protection of intellectual property can erect barriers to establishing new public domains. To illustrate this point, consider a counterfactual example. Imagine that you were trying to set up the first sequence database today. One proposed plan (which follows closely the model of LASL) might be to copy all the DNA sequences from the published literature, draw them together in machine-readable form, and provide access to the entire collection on the Internet. But in this post-Bayh-Dole era, with more than two decades of increasing commercialization of biology, would such a proposal be taken seriously? Perhaps not. And if it were, there is little doubt that one would need to convene a small army of university technology transfer officials, lawyers, and technology licensing specialists to negotiate about ownership of the database.
Of course, even given the increasing importance of proprietary regimes in biomedical science, commercial entities may at times decide to create public domains with unrestricted access. In Session 2, Robert Cook-Deegan mentioned the example of dbEST—the expressed sequence tag (EST) database funded by Merck.3 Michael Morgan will discuss the Single Nucleotide Polymorphism Consortium,4 which is a good example of a situation in which large pharmaceutical companies funded the development of a public-domain resource (in part to prevent other companies from creating monopolies over that resource). Clearly, public domains can still be constituted in a commercialized culture, but the question then becomes, how often will this happen? Can the scientific community safely assume that large corporations will create public domains in the future whenever they are needed? I think not.
I want to close by briefly mentioning possible third-order effects. Given the centrality of scientific knowledge and science advice to many critical public issues, it is worth considering how changes in public domains might affect the position of science in the wider polity. Arguably, the rapid commercialization and privatization of science has the potential to undermine the Enlightenment notion of science as a special form of knowledge, open to public scrutiny and collective verification (Shapin and Schaffer, 1985). If fundamental data pertaining to important public issues get caught up in proprietary arrangements that make it difficult for people to access them, reanalyze them, criticize them, or incorporate them into critiques of things going on in the world, then the notion that science is public knowledge would be seriously threatened.
Perhaps such effects are the hardest to predict and the hardest to be certain about. But it is clearly worth asking how far science can move in the direction of privatization before people stop perceiving it as a credible and disinterested source of public knowledge, and instead begin to think of science as just another private interest, one that cannot be scrutinized and cannot be counted on to speak the truth.
REFERENCES
Callon, Michel. 1994. “Is science a public good?” Science, Technology, and Human Values, Vol. 19(4), pp. 395-424.
Campbell, Eric G., Brian R. Clarridge, Manjusha Gokhale, Lauren Birenbaum, Stephen Hilgartner, Neil A. Holtzman, and David Blumenthal. 2002. “Data withholding in academic genetics: evidence from a national survey,” Journal of the American Medical Association, Vol. 287(4), January 23/30, pp. 473-480.
Cinkosky, M. J., J. W. Fickett, P. Gilna, and C. Burks. 1991. “Electronic data publishing and GenBank,” Science, Vol. 252, pp. 1273-1277.
Fujimura, Joan H. 1996. Crafting Science. Harvard University Press, Cambridge, MA.
GUSTO III Investigators. 1997. “A comparison of reteplase with alteplase for acute myocardial infarction,” New England Journal of Medicine, Vol. 337(18), pp. 1118-1123.
Hicks, Diana. 1995. “Published papers, tacit competencies and corporate management of the public/private character of knowledge,” Industrial and Corporate Change, Vol. 4(2), pp. 401-424.
Hilgartner, Stephen. 1995. “Biomolecular databases: new communication regimes for biology?” Science Communication, Vol. 17(2), pp. 240-63.
Hilgartner, Stephen. 1997. “Access to data and intellectual property: scientific exchange in genome research,” in National Research Council’s Intellectual Property and Research Tools in Molecular Biology: Report of a Workshop. National Academy Press, Washington, D.C.
Hilgartner, Stephen. 1998. “Data access policy in genome research,” pp. 202-18 in Arnold Thackray, ed., Private Science: Biotechnology and the Rise of the Molecular Sciences, University of Pennsylvania Press, Philadelphia.
Hilgartner, Stephen and Sherry Brandt-Rauf. 1994. “Data access, ownership, and control: toward empirical studies of access practices,” Knowledge: Creation, Diffusion, Utilization, Vol. 15(4), pp. 355-72.
Knorr Cetina, Karin. 1999. Epistemic Cultures, Harvard University Press, Cambridge, MA.
Kohler, Robert E. 1994. Lords of the Fly: Drosophila Genetics and the Experimental Life. University of Chicago Press, Chicago, IL.
Latour, Bruno and Steve Woolgar. 1979. Laboratory Life. Sage Publications, Beverly Hills, CA.
Lenoir, Timothy. 1999. “Shaping biomedicine as an information science.” In Proceedings of the 1998 Conference on the History and Heritage of Science Information Systems, Mary Ellen Bowden, Trudi Bellardo Hahn, and Robert V. Williams, eds., ASIS Monograph Series, Information Today, Inc., Medford, NJ, pp. 27-45.
Mackenzie, Michael, Peter Keating, and Alberto Cambrosio. 1990. “Patents and free scientific information in biotechnology: making monoclonal antibodies proprietary,” Science, Technology, and Human Values, Vol. 15(1), pp. 65-83.
McCain, K.W. 1995. “Mandating sharing: journal policies in the natural sciences,” Science Communication, Vol. 16, pp. 403-436.
Nucleic Acids Research. 1987. “Deposition of nucleotide sequence data in the data banks,” Nucleic Acids Research, Vol. 15(18), front matter.
Shapin, Steven and Simon Schaffer. 1985. Leviathan and the Air-Pump, Princeton University Press, Princeton, NJ.
White-DePace, Susan, Nicholas F. Gmur, Jean Jordan-Sweet, Lydia Lever, Steven Kemp, Barry Karlin, Andrew Ackerman, and Jack Presses, eds. 1992. National Synchrotron Light Source: Experimenter’s Handbook, National Technical Information Service, Springfield, VA.