Open Access and the Public Domain in Digital Data and Information for Science: Proceedings of an International Symposium
21 International Transfer of Information in the Physical Sciences
R. Stephen Berry
University of Chicago, United States
INTRODUCTION
Open access and the ready exchange of data and other information are at the heart of the normal processes of science. This is perhaps even truer in the physical sciences than in other disciplines because they may offer fewer direct pathways to profitable applications. Obvious symptoms revealing this tradition are the longstanding practice of circulating preprints and other early-stage information among colleagues and the practice of presenting results in symposia and other conferences well before the work has been submitted for publication. Electronic media have made such access steadily easier and cheaper, and consequently have made it possible for science to progress faster and at a higher level of scholarship.1 We are in a period of adaptation and learning; this discussion starts with the perspective that we are trying to find effective ways to maintain procedures and values we know and trust while making optimal use of the powerful tools of electronic media and open access.
The following discussion will describe the kinds of information that physical scientists share, how they go about sharing that information, how the modes of sharing are changing, what the larger context is in which this sharing takes place, and then, as the central point of this discussion, what challenges and problems face the scientific community and the infrastructure that supports it. Finally, I will introduce a proposal for one approach that may be a useful way of adapting to the evolving world of scientific communication.
WHAT PHYSICAL SCIENTISTS COMMUNICATE AND HOW THEY DO IT
The most obvious material that all scientists communicate is the substance contained in their formal publications. These papers in traditional journals go through a screening in the form of anonymous peer review that has set the standard threshold for acceptable distribution. There is a sort of tacit certification that goes with publication in these journals. However, it must be understood that the threshold for acceptance is a relatively low (although sometimes capricious) one. The normal journal review process is certainly not capable of uncovering deliberate fraud, and very rarely can reveal subtle errors or inconsistencies that would require extensive work to discover.
1 The author is not aware of any documentation to show the extent to which scientific progress has accelerated or whether current publications refer more extensively to relevant prior work than before electronic communication became widespread. The term “have made it possible” is used quite deliberately.
Acceptance for publication in a journal implies that the work appears to be consistent and plausible enough to merit the attention of the relevant community of scientists and to serve as a topic for further discussion and investigation; it is neither clearly wrong nor outlandishly foolish, and may well be right. Reviewers, realistically, can devote only very limited effort to each manuscript they receive, so they screen the papers using a relatively low standard for acceptance: no obvious errors and relevance and interest for the audience of the journal.
Scientists also communicate data, sometimes just as single facts or numbers, sometimes as extensive sets of tabulated results from experiment or theory. At one extreme are the new values of some freshly measured quantity; at the other are the carefully constructed tables of critically evaluated data such as those published in the Journal of Physical and Chemical Reference Data. An illustration of this latter category is the data sets maintained by the National Institute of Standards and Technology, which are most easily obtained on its Web page.2 There one can find the best current values of many quantities, for example those of the fundamental standards such as the speed of light, the masses of elementary particles, and Avogadro’s number. But there are also vast sets of data available in other open databases. One that overlaps the physical and biological sciences is the Protein Data Bank,3 a rich and fast-growing source of information about the sequences and structures that have been established and to some extent the level of uncertainty of the data. We should not lose sight, however, of the importance of one scientist relating to another the frequency of some critical, newly measured spectral line, or the strength of the bond between a protein molecule and an inhibitor molecule that sticks to the protein.
In this context it is useful to distinguish simple data repositories from data in critically evaluated databases. The latter in the United States are considered worthy of copyright; hence they have value the law considers worth protecting. Unevaluated data deposited as they are generated may be useful but dangerous to use indiscriminately. For example, most scientists familiar with the sequences in the human genome database believe that there are many errors in those data and that considerable caution should be exercised by anyone trying to use those data. One might go so far as to say that unevaluated data deposited in an indiscriminate repository should be given no legal protection at all, until they have been scrutinized critically.
Traditional journal publication and the almost-as-traditional preprint circulation have been the most important archival modes of communication. Apart from these, presentations at conferences provide another important mode of communicating scientific information. These presentations range from talks at very large meetings of professional societies to smaller, more focused meetings, such as Gordon Conferences, to the very small working groups that have met, for example, at the Aspen Center for Physics and the Telluride Summer Research Center. While these have been very important vehicles of communication, they rarely have had archival functions. In fact, Gordon Conference rules prohibit any record of the discussions or even the formal presentations; even quoting a presentation requires the permission of the person being quoted. Conference proceedings have in many instances been published, but unless the vehicle for publication is a standard journal and the issues containing the conference proceedings are circulated as normal issues, the proceedings become almost invisible to scientists later. There has certainly been disillusionment among researchers about publishing work in conference proceedings.
On the other hand, the information exchanged among participants, especially in the smaller meetings, plays a seminal role in moving science ahead.
The most obvious changes of the past 20 years have been the increase in electronic modes of archiving, communicating, and accessing scientific information. There has probably been an increase in the frequency of small, specialized conferences and workshops as well. Electronic communication has its best-known manifestations in electronic versions of journals and in less conventional electronic forms of communicating, or publishing, completed work. Martin Blume distinguishes “Publishing” from “publishing”; the former implies appearance in a conventional journal, with reviewing and editing included in the process of “Publishing,” while “publishing” includes both that mode and also posting on an electronic server such as the arXiv4 that imposes no reviewing or editing and simply accepts what a scientist submits.
2 See http://www.nist.gov.
3 See http://www.rcsb.org/pdb/.
4 See http://arXiv.org, formerly xxx.lanl.gov.
There is another very important but more subtle change that the electronic communication of scientific information has engendered, particularly regarding international activities. The fast, relatively reliable, and marvelously convenient, boundary-free medium of e-mail has almost erased national and other geographic boundaries between interacting scientists. The extent of international collaboration has expanded enormously since the late 1980s because such communication is available. It is equally easy for a scientist in Chicago to collaborate with a colleague in Moscow or in Madison. The exchange of information, whether it be informal ideas, elaborate animations, funding proposals, manuscripts in preparation or completed, even drafts of full books, becomes a natural day-to-day process. We have already reached a point at which we take such communication for granted. If an institution’s Internet server crashes for a day, its research program comes to a panicked halt. A researcher may communicate with colleagues, many of them collaborators, in five or six countries in a single typical day.
An especially relevant aspect of this internationalization is that many of the scientists participating in these exchanges are in developing countries, where the researchers now typically have access to computers, and hence to electronic archives but not to extensive traditional libraries. The importance of this point will become explicit when we examine how journals are distributing their publications.
THE LARGER CONTEXT
The scientific research we have been describing is largely conducted in academic settings, federal laboratories, and research institutions devoted to basic science. This research is supported largely by funding from federal sources and not-for-profit foundations.
In the biological sciences the proportion from private sources, particularly pharmaceutical firms and others in related areas, is somewhat higher than in the physical sciences. The justification for this support is the collection of public goods produced by the research, goods that would not come into existence if it were not for federal and foundation funding. This aspect of basic research has been emphasized previously, but it should be stated again and again to reach wider audiences.5 A public good is a good whose value does not diminish with use. Public goods produced from science typically increase in value with use; this is one of two crucial characteristics of traditional science that must be a dominant factor in how we select any policy for science.
Science’s product as a public good implies that any institution that funds research for the purpose of producing public goods carries the responsibility for seeing that the results of the research are disseminated. Dissemination is necessary for the public goods to emerge from research. Because the public goods from research amplify with increasing use, the benefits of dissemination typically produce a marginal return that may increase rapidly beyond the initial investment in that dissemination.
The mechanisms of dissemination discussed in the previous section are undergoing dramatic changes. Fifty years ago, the symbiotic relation between journal publishers and scientific researchers provided a healthy way for science to disseminate and archive its products. The publishers, adding value to the material by editing, distributing, indexing, and archiving it, were able to profit monetarily while the scientists profited through access to the information.
The number of scientific journals published in the United States increased steadily and exponentially from about 1840 until the late 1990s, when the growth rate slowed a bit.6 During the rapid expansion of basic scientific research during the 1960s and 1970s, many new, specialized journals appeared. Consequently, university libraries, federal laboratories, and industrial research centers subscribed to many more journals than they had prior to 1960. When budgets could no longer keep pace with rising numbers of journals and rising subscription prices, libraries became more selective and began to drop subscriptions and form coalitions to share the more specialized and less frequently used journals.
5 See National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, Washington, D.C.; K. Dam. 1999. The Changing Character, Use and Protection of Intellectual Property, Stiftung, Deutsch-Amerikanisches Akademisches Konzil (DAAK) Symposium Band 11, Bonn, Germany, pp. 17-36; and R. S. Berry. 2000. “Full and Open Access” to Scientific Information: An Academic’s View, Learned Publishing, Baltimore, pp. 37-42.
6 Carol Tenopir and Donald W. King. 2000. Towards Electronic Journals, Special Libraries Publications, Washington, D.C.
The situation changed again with the evolution of electronic distribution. Electronic storage of data, especially large-volume databases, was being used by such agencies as the National Library of Medicine and the National Aeronautics and Space Administration in the early 1980s. Electronic access to e-mail became widespread in the 1980s. Journals first began to use electronic methods in doing their composition; the next step was introducing modes of accessing individual papers.7 Journals then moved toward greater and greater use of electronic tools, starting with search procedures based on keywords; next, posting images of abstracts and then full manuscripts; putting up full texts in searchable form; and then making available supplementary material that did not appear in the paper version of articles. The next stage, something envisioned in the mid-twentieth century but only realized in the late 1990s, was the posting of full archives of all issues of journals.
Naturally, this question arose: what credentials would give a user access to the information put into such computer-accessible files? This issue is one I will explore further. At this point let us only point out that publishers have sought to give access to as wide an audience as each of them considers safe within the limits of stability of their publications. At issue are what those limits are, what those limits should be, and how those limits are affected by the procedures under which a publication operates; for example, some journal publishers have made their publications available without charge in developing countries.
CHALLENGES AND PROBLEMS
With the development of electronic distribution and the move toward open access, journal publishers, both private firms and professional societies, faced a major challenge to their financial stability. Professional societies adopted policies ranging from the enthusiastic acceptance of open access (e.g., the American Physical Society) to rather strongly protective positions (e.g., the American Chemical Society).
This should be considered a healthy way the larger society can conduct experiments, and let the experiments play out. To go one step further with this logic: at this time in the evolution of new practices to adapt to new technology and its consequences, it is particularly important that any legislation and regulation that we adopt be permissive rather than restrictive. Restrictive laws and regulations at such a stage are extremely counterproductive, inhibiting competition among alternative courses of action. Any action such as the European Commission’s Database Directive that makes some courses difficult can only be interpreted as retrograde protection of narrow interests at the expense of the benefit of the larger society. If the more protective course were to win out over more open modes, so be it, but to prevent the possibility of one mode competing against others is simply dangerously bad economics. One possible scenario that might become reality splits the community of scientists and their publishers into one group that supports and uses open distribution of information and another group that follows a policy of strong protection; in such a case it would be very likely that the former group would simply exclude the latter from its regular attention, thereby neglecting the works produced by the more protectionist faction.
There are two challenges most apparent for any new mode of operation within the electronic environment. The first is deciding how material intended for publication should pass certification, that is, what do we do about reviewing? The second is determining how we support the components of the scientific enterprise that have been paid for previously through the various forms of journal support. This funding includes the costs of publication, distribution, and archiving and in some cases the primary support of professional societies.
Peer review has been the subject of a recent discussion by Paul Ginsparg.8 Here the range of solutions is likely to remain broad because of the range of practices among the sciences and the differences in levels of concern regarding the impact of published material on the readers. The practice of circulating preprints, a long-standing mode in high-energy physics, has given that community a sense of confidence in the effectiveness of postpublication review. Open commentary serving as postpublication review is a widely practiced way for the communities using the online arXiv to provide such “reviews” of articles published there. Ginsparg enjoys telling one example of an article posted on the arXiv at the same time it appeared in Physical Review Letters as a refereed paper. Within a day several contributors to the arXiv showed in comments published there and appearing with the original article that there were fundamental flaws in the published work. In other words, the online commentary provided a much more stringent, highly relevant review than did the normal anonymous refereeing through the journal. On the other hand, in the biological sciences, researchers worry that nonscientists would read unreviewed articles posted on an open source and try to use health care approaches that they find there. Many biomedical scientists feel that, without any evaluation of soundness, this would be dangerous.
7 An extensive history of electronic methods in journals, and the prescient earlier publications on the subject, are provided in Tenopir and King, 2000.
8 See http://arXiv.org/blurb/pg02pr.html.
Ginsparg proposes a procedure of systematic review for articles deemed worthy of particular scrutiny sometime after they have been put up on an electronic archive. His procedure would defer formal reviewing to a later stage, so that the normal audience would see publications before they went through any formal review. One motivation for such a procedure would be to reduce the costs of publication; reviewing all submitted papers as it is now done is one of the more costly stages of publication. The two-stage process would have the advantage of giving the professional audience rapid access, through the arXiv or a parallel counterpart, and would still provide certification for the relatively small fraction of papers that then go through formal review.
The other major problem facing scientists and publishers is the issue of payment to present results of research to its audience. The primary source of financial support of printed journals is subscriptions bought by libraries. The prices for individual subscriptions of many journals are essentially the marginal costs of printing and mailing the journals. Commercial journals are priced to generate profits at a level chosen by the publisher.
Journals published by professional societies (rather than by commercial publishers acting as agents or contractors for the societies) are priced considerably lower than those published by commercial houses. Even so, many professional societies use income from journal subscriptions as important sources of revenue for the activities of the societies. Such organizations are quite naturally very protective of the income they receive from publications.
The world of science publishing is divided on how to treat the posting of articles on electronic servers, particularly on servers available without charge to readers. Commercial publishers that post the contents of their journals online typically exact a charge from subscribers, by way of institutional subscriptions that allow anyone from a particular institution to download articles or individual subscriptions that allow a particular user to have access. Some professional societies also follow that policy. Other journal publishers, such as the American Physical Society and the U.S. National Academy of Sciences, make the contents of their papers available without charge after some delay. The electronic arXiv makes papers available immediately.
Journal policies are sharply divided regarding whether posting content on an open-source site, such as the arXiv, or presenting the work at a conference should influence the acceptability of a paper for publication. Some journals, such as those of the American Physical Society, actually cooperate with open-source providers. Others allow open-source distribution only after a specified interval following publication. Still others consider presentation of work at a conference prior to submission of a manuscript as grounds for disqualification of the manuscript for publication. The publishers that are reluctant to post published works on a no-fee basis are fearful that they will lose many subscriptions by such posting.
Whether this is a correct assumption will presumably be demonstrated by the other publishers who do allow no-charge access. This is an important experiment, whose consequences should be examined and evaluated. It will not be sufficient to look only at the net revenue changes of the firms that post their articles. It will also be important to observe how the journals in one category compete with their parallels in another. Wherever there are competing journals in the same field offering authors the option of choosing one kind of publisher or the other, there is an opportunity to learn whether the posting policy of the publisher influences the choice of the author about which vehicle to use for publication. It will also be important to evaluate the impact of the two categories of journals; will no-fee electronic posting have a significant effect on the extent to which readers use a journal? We can easily predict with confidence that journals that distribute their publications without charge to developing countries will have significant impact in those places, but what about such distributions in developed countries?
One facet of the financing issue is the question of which procedure should be used to pay the publisher. Subscriptions, one form of user fees, are the most widely used method now. During the years when electronic publishing was still a far-off goal, the readers’ pay-per-article model was discussed frequently. An altogether different approach is the author-based payment. Page charges were a form of this mode but were never constructed to cover more than a relatively small fraction of the total cost. Page charges were so unacceptable to the author community that they have become voluntary for many journals. If an author has to decide between paying a voluntary page charge and buying a new computer, the decision is easy and obvious. The funds in a grant that are at the investigator’s disposal are simply too valuable to be used to publish a paper, if that paper can be published without charge. Yet, as previously mentioned, the funder of that grant has a responsibility to see that the results of the research supported by the grant are distributed and used. This paradox faces the community of scientists, publishers, and funders of research right now.
A MODEST PROPOSAL
This discussion concludes with a proposal to resolve the problem of paying for publication of scientific papers. The cost of publishing, high as it seems in the context of open access to scientific data and information, is a small fraction of the total cost of the research the publications describe. If a traditional journal can maintain high quality with low subscription costs and without its publisher charging authors so that the journal remains healthy and competitive, there is no need for any change. However, the journal publishers, particularly the professional societies whose primary motivation is to serve their scientific memberships, see financing as a goal secondary to keeping the journals and the societies healthy. Their not-for-profit roles determine priorities that are quite different from those of commercial publishers. Consequently, a professional society is likely to regard electronic posting differently than a commercial publisher, and is likely to have a different kind of motivation toward actions that might threaten traditional revenue sources.
The method we propose for paying for the publication of scientific papers is a fallback that will ensure that the funder’s goals are achieved from the perspective of the scientific community, which is the heaviest user of the published results.
We propose an author-based fee system, but one in which the author is not expected to cover the publication costs using a research grant. Instead, we propose that the journal publishing an article bill the funding agency or foundation directly for the costs of editing, reviewing (so long as the paper goes through review), indexing, distributing, and archiving that are beyond what subscription revenue can cover. Direct billing would ensure that journals would survive however deeply they commit themselves to electronic accessibility and however much that accessibility cuts into subscriptions. This procedure was used, in fact, for some years by one U.S. foundation for publications generated in the research projects it supported.
This method might appear superficially as a direct subsidy, but it is a rechanneling of funds to reduce transaction costs and achieve the goals of the researchers and their financial supporters. At present many of the funds that cover indirect costs of a grant are used to support institutional libraries. If subscription costs could be held down by shifting publication costs significantly to an author-based charging system, then some overhead funds could be included in a direct publication charge and would apply specifically to the user community of those publications.
It would be reasonable to have some upper limit on what could be charged. The amount could be based on figures developed by professional societies working with funding agencies. If a publisher were to charge more than that limit, the author would be expected to pay that difference with discretionary funds, presumably from a grant. This might happen, for example, with very-high-prestige journals of commercial publishers.
It would be quite possible for some journals to opt not to participate in such a scheme. In fact, competition with other alternatives would be a desirable way for the method to be tested.
It would require collaboration of the funding agencies, but it might be possible to adjust the distribution of funding so that the total budget for support of research could be nearly constant while the distribution of scientific results could become much more effective.
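The arithmetic of the billing rule proposed above can be made concrete with a short sketch. It is an illustration only, not a procedure specified in this chapter: the function name, the field names, and every number used with it are hypothetical, and it assumes that a per-article cost and a per-article share of subscription revenue can be estimated, which the proposal itself does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicationBill:
    """Hypothetical record of who pays what for one article."""
    funder_pays: float  # billed directly to the funding agency or foundation
    author_pays: float  # excess above the agreed limit, from discretionary funds


def split_publication_cost(per_article_cost: float,
                           subscription_share: float,
                           funder_cap: float) -> PublicationBill:
    """Split the cost that subscriptions do not cover between funder and author.

    Assumption: all three amounts are per-article figures in the same currency.
    """
    if min(per_article_cost, subscription_share, funder_cap) < 0:
        raise ValueError("amounts must be non-negative")
    # Only costs beyond what subscription revenue covers are billed at all.
    shortfall = max(per_article_cost - subscription_share, 0.0)
    # The funder is billed directly, but never above the agreed upper limit.
    funder_pays = min(shortfall, funder_cap)
    # Anything above the limit falls to the author's discretionary funds.
    return PublicationBill(funder_pays, shortfall - funder_pays)
```

With illustrative figures of 1500 for the per-article cost, 400 for the subscription share, and an upper limit of 1000, the funder would be billed 1000 and the author 100. A journal whose subscriptions already cover its costs would bill nothing under this sketch.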