6.
WHAT IS PUBLISHING IN THE FUTURE?

INTRODUCTORY COMMENTS

Daniel Atkins, University of Michigan

The digital revolution is disaggregating the traditional processes of many knowledge-intensive activities—in particular, the publishing or scholarly communication processes—and it is offering alternatives in both how the various stages of these processes are conducted and who carries them out. The various functions—whether metadata creation, credentialing review, or long-term stewardship—can be separated, and players different from those who have traditionally carried out these tasks can, in theory, perform them.

The digital revolution is also changing the process by which knowledge is created, by which discovery takes place. This is most evident in the scientific, technical, and medical (STM) arena. That is the central theme of a recent National Science Foundation (NSF) study, Revolutionizing Science and Engineering Through Cyberinfrastructure, whose blue-ribbon panel Professor Atkins chaired.14 The report documented an emerging vision, and a heightened aspiration, among many science communities to use information and computing technologies (ICT) to build comprehensive information environments, such as collaboratories or grid communities, based on what is now being called cyberinfrastructure. These environments are functionally complete, in the sense that all of the people, data, information, and instruments that one needs for a particular activity in a particular scientific community of practice are available through the network and online.

A growing number of research communities are now creating ICT-based environments or cyberinfrastructure-based environments that link together these elements with relaxed constraints on distance, time, and boundaries. There is a general trend toward more interdisciplinary work and broader collaborations in many fields.

Publications now exist in many intermediate forms. We are moving toward more of a continuous-flow model, rather than a discrete-batch model. Raw data, processed data, replays of experiments, and deliberations that are mediated through a collaboratory can be captured, replayed, and re-experienced. Working reports, preprint manuscripts, credentialed or branded documents, or even post-peer-review annotated documents can now become available at varying times to different people with diverse terms and conditions.
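The continuous-flow idea can be pictured as a small version history with access terms attached to each stage. The stage names, audiences, and day offsets below are illustrative inventions for the sketch, not anything specified in the talk:

```python
from dataclasses import dataclass

# Hypothetical stages of a continuously published work, ordered roughly
# from rawest to most credentialed.
STAGES = ["raw_data", "working_report", "preprint", "credentialed", "annotated"]

@dataclass
class Version:
    stage: str          # one of STAGES
    audience: str       # "collaborators", "subscribers", or "public"
    released_day: int   # day offset at which this version became available

def visible(versions, viewer, day):
    """Return the stages a given viewer can see at a given time.

    Access widens over time: collaborators see everything; the public
    sees only what has been explicitly released to "public"."""
    out = []
    for v in versions:
        if v.released_day > day:
            continue
        if viewer == "collaborators" or v.audience == viewer or v.audience == "public":
            out.append(v.stage)
    return out

history = [
    Version("working_report", "collaborators", 0),
    Version("preprint", "public", 30),
    Version("credentialed", "subscribers", 180),
]

print(visible(history, "public", 45))          # only the preprint so far
print(visible(history, "collaborators", 200))  # collaborators see all stages
```

The point of the sketch is only that "publication" becomes a query over versions, audiences, and times rather than a single event.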

Publications need not necessarily be credentialed before they appear on the Net. As George Furnas, at the University of Michigan, says, their use on the Net can itself be credentialing. In theory, every encounter with a document is an opportunity to rank it in some way and to build a cumulative sense of its impact or importance. There could be alternative credentialing entities or methods, and you could pick your favorite.
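One way to picture per-encounter credentialing is a stream of ratings summarized by interchangeable scoring functions, one per "credentialing entity." The functions and numbers below are a hypothetical sketch, not a system described by Furnas:

```python
# Each encounter with a document can contribute a rating; different
# "credentialing" functions summarize the same stream of encounters.
# All names and values here are illustrative.

def mean_rating(ratings):
    return sum(ratings) / len(ratings) if ratings else 0.0

def encounter_count(ratings):
    return len(ratings)

def bayesian_rating(ratings, prior=3.0, prior_weight=5):
    # Shrink toward a prior so a single 5-star encounter doesn't dominate.
    return (prior * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

encounters = [5, 4, 5, 3, 5]   # ratings left by successive readers
print(round(mean_rating(encounters), 2))      # 4.4
print(encounter_count(encounters))            # 5
print(round(bayesian_rating(encounters), 2))  # (15 + 22) / 10 = 3.7
```

"Picking your favorite" credentialing method then amounts to choosing which summary function to trust.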

The raw ingredients—the data, the computational models, the outputs of instruments, the records of deliberation—could be online and accessible by others, and could conceivably be used to validate or reproduce results at a deeper level than traditionally has been possible. The primary source data can be made available with a minimum set of metadata and terms and conditions. Third parties—particularly in an open-access, open-archives context—can then add value by harvesting, enriching, federating, linking, and mining selected content from such collections.
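The harvest-enrich-federate idea can be sketched with toy records merged by identifier; the archive contents and field names are invented for illustration:

```python
# A toy sketch of third-party value-adding: harvest minimal metadata
# records from two hypothetical open archives, federate them by
# identifier, enrich them with subject tags, and "mine" the result.

archive_a = [
    {"id": "doc:1", "title": "Grid communities", "terms": "open"},
    {"id": "doc:2", "title": "Collaboratory replay", "terms": "open"},
]
archive_b = [
    {"id": "doc:2", "authors": ["Lee"]},   # richer record for the same item
    {"id": "doc:3", "title": "Sensor data", "terms": "restricted"},
]

def federate(*archives):
    """Merge records from several archives, keyed by identifier."""
    merged = {}
    for archive in archives:
        for rec in archive:
            merged.setdefault(rec["id"], {}).update(rec)
    return merged

def enrich(records, tagger):
    """Add a derived 'subjects' field to every record."""
    for rec in records.values():
        rec["subjects"] = tagger(rec.get("title", ""))
    return records

catalog = enrich(federate(archive_a, archive_b),
                 tagger=lambda title: [w.lower() for w in title.split()])

# A simple "mining" pass respecting the terms and conditions.
open_items = [r["id"] for r in catalog.values() if r.get("terms") == "open"]
print(sorted(open_items))   # doc:3 is restricted, so it is excluded
```

The value added by the third party lives entirely in `federate`, `enrich`, and the mining pass; the source archives only need to expose minimal metadata and terms.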

14  

National Science Foundation. 2003. Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. National Science Foundation, Arlington, VA.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



The goal of this session is to illuminate and inform the discussion about some of these emerging technologies, the related social processes, some specific pilot projects, and the challenges and opportunities that may provide the basis for these kinds of future "publishing processes." The phrase is put in quotes because we may someday not think of it explicitly as a publishing process at all, but as something more holistically integrated into the knowledge creation process.

IMPLICATIONS OF EMERGING RECOMMENDER AND REPUTATION SYSTEMS

Paul Resnick, University of Michigan

When some people think about changing the current publication process to a more open system, they express concerns that scholarly communication will descend into chaos and that, without the current peer-review process, no one will know which documents are worth reading. This idea should be turned on its head. Instead of going without evaluation, there is the potential for much more evaluation than the peer-review process currently provides. We can look at what is happening outside the scientific publication realm on the Internet for clues about where this could go.

The Democratization of Review and Feedback

In today's publication system, there are reputations for publication venues. Certain journals have a better reputation than others, and certain academic presses have strong reputations. A few designated reviewers for each article serve as gatekeepers for the publication. An article either gets into a prestigious publication or it does not; it is a binary decision. Afterward, we have citations as a behavioral metric of how influential the document was. We can examine some trends on the Internet to see how they apply to the scientific publication and communication process.
There can be a great deal of public feedback, both before and after whatever is marked as the official publication time, and we can have many behavioral indicators, not just citation counts.

Let us consider some examples of publicly visible feedback. Many Web sites now evaluate different types of products or services and post reviews by individual customers. In the publishing world, many people are familiar with the reviews at Amazon.com, both text reviews and numeric ratings that any reader can contribute, and many of us find them quite helpful in purchasing books. We do not yet have this for individual articles in scientific publishing, but we do have it for books, even some scientific ones. Even closer to the scientific publishing world is a site called Merlot, which collects teaching resources and provides a peer-review process for them. There is a peer review before a resource is included in the collection, but even after it is included, members can add comments. Typically, such comments come from teachers who have tried using the resource and report what happened in their classrooms. The member comments do not always agree exactly with the peer-review comments. These examples provide a sense of the subjective feedback people can give on the Internet, beyond the traditional peer-review approach.

Behavioral Indicators

With behavioral indicators, you do not ask people what they think about something; you watch what they do with it. The citation count is such a measure. For example, Amazon.com, in addition to its customer reviews, has a sales rank for each book. Another example is Netscan, a project by Marc Smith at Microsoft Research that collects behavioral metrics on Usenet newsgroups. Google uses the behavioral metric of links in its PageRank algorithm, and many people check how they rank on Google for various search strings.
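The link-weighted ranking idea can be sketched as power iteration over a toy link graph. This is the textbook PageRank recurrence, not Google's production system, and the graph is invented:

```python
# A minimal power-iteration sketch of link-weighted ranking: a page's
# score depends on how many pages link to it, weighted by the scores
# of those linking pages.

def pagerank(links, damping=0.85, iters=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        # Every page gets a base share, plus a damped share of the rank
        # of each page that links to it.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outgoing in links.items():
            for q in outgoing:
                new[q] += damping * rank[p] / len(outgoing)
        rank = new
    return rank

# "c" is linked to by both "a" and "b", so it outranks them.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))   # -> "c"
```

Notice that "a" in turn outranks "b": being linked to by the highly ranked "c" is worth more than not being linked to at all, which is exactly the weighting described below.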
Google is not just doing a text match, however; it also takes into account how many links point to a page from other Web pages, weighted by the rank of those other pages. It is a ranking system built on the behavioral metric of who links to whom. A final example is the Social Science Research Network, described by Michael Jensen, which uses download counts as a behavioral metric. The main list is the all-time "top 10" downloads, but there are many other top-10 categories as well, so that more of the authors using the system have a chance to be a winner.

Issues in Evolving Credentialing Processes

The preceding examples provide some potential models for developing credentialing processes for scientific publication in the future. There are also some problems that require further thought; several are already active areas of research for people working on recommender and reputation systems.

An obvious potential problem is gaming the system. You can make a raft of Web pages that all point to yours so that Google will rank yours higher; in fact, there is a whole cottage industry of consultants who will help you climb the Google rankings. It is a little harder to game the Amazon sales rank, because it requires you to actually buy some books. No matter what the system, however, people try to figure out the scoring metric and game it, so those designing these metrics need to consider gaming from the outset. The ideal metric would be strategy-proof, meaning that the optimal behavior for participants would be simply to do their normal thing rather than try to game the system, but it is not always easy to design metrics that way.

Another problem is eliciting early evaluations.
In systems with widespread sharing of evaluations, there is a real advantage to going second: letting somebody else figure out whether an article is worth reading. Of course, if we all try to go second, no one goes first. Yet another problem can be herding, where later evaluators do not really reveal what they thought of the document but are overly influenced by what the previous evaluators thought—they just go along with the herd. Some interesting approaches could help with the herding problem. For example, you might reward evaluators for saying something that goes against the previous wisdom but with which subsequent evaluators agree. The person who finds the diamond in the rough would get special rewards; the person who gives random or tendentious reviews would get noticed and receive a bad rating. Such a process would also require revisiting some of the decisions we have made about anonymity and accountability in review processes: single-blind, double-blind, or not blind at all. Different processes may be preferred for different purposes.

Experiments to Try in STM Publishing

Some potential experiments are more radical than others. Journal Web sites might publish reviewer comments; reviewers might take more care if they knew their comments would be published, even without their names attached. The reviews for rejected articles could be published as well; there could be fewer really bad submissions if authors knew that the reviews of their articles might hurt their reputations. After publication, the Web sites that publish full-text articles or abstracts could let people post publicly visible comments.

Some other experiments might try to gather more metrics. Projects such as CiteSeer in the computer science area measure citations in real time.
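The metric-gathering idea behind a system like CiteSeer can be sketched as a simple inbound-reference tally over a toy corpus; the paper identifiers below are made up:

```python
# Each paper lists the identifiers it cites; counting inbound references
# across the corpus gives a real-time citation tally.

from collections import Counter

corpus = {
    "paper:A": {"cites": ["paper:B", "paper:C"]},
    "paper:B": {"cites": ["paper:C"]},
    "paper:C": {"cites": []},
    "paper:D": {"cites": ["paper:C", "paper:B"]},
}

def citation_counts(corpus):
    counts = Counter()
    for paper, meta in corpus.items():
        for cited in meta["cites"]:
            counts[cited] += 1
    return counts

counts = citation_counts(corpus)
print(counts.most_common(2))   # [('paper:C', 3), ('paper:B', 2)]
```

The same tally updates whenever a new paper enters the corpus, which is what makes the metric "real time" rather than a periodic index.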
One might also use link and download data to find out how many people are actually reading an article online, how many times it is being assigned in courses, and other behavioral metrics.

Experiments in evaluating the evaluators are needed as well. The best place for this might be conference proceedings, where each article has a number of evaluators, so one could examine the nature of the reviews in a more explicit attempt at evaluating evaluators. More attention, greater credit, and rewards need to be given to reviewers for evaluating early, often, and well. Publishers already complain about the difficulty of finding good reviewers; one reason is that reviewers' services are not valued sufficiently in the system. We need metrics that go beyond merely noting that someone was a reviewer or an associate editor at some journal, but that actually evaluate and give proper credit to such functions. In the promotion and tenure reviews for academics, the categories typically include teaching, research, and service, but this evaluation and commentary activity really contributes to the growth of knowledge. If metrics could be developed for how much reviewers contribute, and in what way, we might count that as a research contribution rather than purely a service contribution.

PREPRINT SERVERS AND EXTENSIONS TO OTHER FIELDS

Richard Luce, Los Alamos National Laboratory

The preprint service is a well-known, well-understood concept in the physics community, but not in many other communities. It is useful, at the outset, to distinguish between preprints and e-prints. Preprints carry a "buyer beware" connotation in the physics community. They provide a means to obtain informal, non-peer-reviewed feedback, which is weighted very differently in the community than a formal refereed report; they are a way to get an early version of an article out to colleagues for feedback that can help decide whether to publish it later. e-Prints, on the other hand, tend to be more polished papers deposited by authors in a public online archive in order to speed up the communication process and give authors more control over the public availability of their work.
The e-Print arXiv in Physics

Back in 1991, Paul Ginsparg created the e-Print arXiv15 at the Los Alamos National Laboratory. That database archive, which today is at Cornell, has about 30 fields and subfields and some 244,000 papers. It has succeeded in large part because Dr. Ginsparg is a physicist: as a high-energy physicist, he understood well how that community works and what its needs are. His notion was to streamline the communication process for preprints. Over the past decade, this approach has spread to other fields, including mathematics, materials science, nonlinear sciences, and computation. Its adoption by other disciplines demonstrates that this is not a phenomenon confined to high-energy physics or the physics community, but one that can work in other research areas. It has clearly increased communication in the areas of physics it covers, and it is the dominant method for authors to register publicly when an idea first comes out, even though the article may not be formally published until six months later. According to Dr. Ginsparg, the cost is very low, and this is partially responsible for the wide acceptance of the system. Submissions have increased continually; the driver in the community is clearly speed, to make communication faster.

The e-Print arXiv has set an example for both society and commercial publishers to consider. It is significant that in 1995-1996 the American Physical Society (APS) began to accept preprint postings, and later began to link back to them. This was the beginning of a more formal recognition that there was a role for this approach—a bottom layer or first tier of scientific information—and that we could have a two-tier structure and start to link those tiers together.

One might raise the question of the quality of the information in the e-Print arXiv, since these articles are not peer reviewed. An analysis of submissions over time, however, reveals a field-specific track record of what actually gets published: in high-energy physics theory, about 73 percent of the papers in the archive turn out to be published, whereas in condensed matter physics the figure is only around one-third. The rate is fairly field specific, but it indicates that the archive does not just contain articles with no future role in the formal publication system.

What lessons can we learn from this? Greater timeliness in the disclosure of research results is certainly one. Another is that a few passionate people can make a difference. For a decade, while this system was at Los Alamos, it was typically run by three to five people, sometimes spending far too many hours, but inspired by the view that this was really going to change the world. That small number of people, and the relatively small amount of money needed to operate the system (about half a million dollars per year), became a dominant factor for the community itself. Most important is the lesson that it addressed the common interests of a community in a sociologically compatible way, which is why the system worked for that particular community.

15
For additional information on the e-Print arXiv, see http://www.arXiv.org/.

Other Preprint Server Projects

Scholarly communication is a very complex ecosystem. Clearly, not all fields are the same; sociology, behavior, and traditions differ from field to field. Consequently, this solution is not the only or universally correct one, nor could it be expected to fit all other fields. One needs to understand a community's behavior and traditions and then look for models that meet those needs and requirements. Nonetheless, there have been spinoffs into other fields organized by both universities and government agencies. CogPrints, at the University of Southampton in the United Kingdom,16 is a preprint system that is quite well known in cognitive science.
NCSTRL, the Networked Computer Science Technical Reference Library, was an early effort to harvest computer science papers together and to begin building a federated collection of that literature.17 NDLTD, the Networked Digital Library of Theses and Dissertations, developed by Ed Fox at Virginia Tech, provides electronic theses and dissertations. Within the federal government, the NASA Technical Reports Server (NTRS)18 was a pioneer in bringing together and making available a collection of federal reports, both metadata and full text. PubMed Central is certainly well known in the life sciences community. Living Reviews (www.livingreviews.org/), a somewhat different model created at one of the Max Planck institutes in Germany for gravitational physics, is a journal of review articles that authors update over time. Rather than going back to a decade-old static review and wondering what has happened to the field since it was published, readers of Living Reviews know that an article is current, because the authors have committed to keeping the material in that online publication up to date.

There are perhaps a dozen well-known e-print systems with servers active today, and around 100 other servers that claim to have some kind of e-print system. They use a standardized protocol and allow people to come in and at least harvest some of the material collected there. One problem has been the absence of an enabling infrastructure. The Open Archives Initiative was not meant to be the end-all, be-all fix to the system; it sought a solution that would let a discipline-specific e-print archive talk to other systems, so that users would have the opportunity to look at a pooled body of material. The problem is how to ensure access across this variety of different systems; the protocol specifies the method by which material on the different systems can be harvested.
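The harvesting side of that protocol can be sketched in a few lines. The endpoint URL and sample record below are hypothetical, while the verb, parameter names, and XML namespaces follow the OAI-PMH 2.0 specification:

```python
# Build an OAI-PMH ListRecords request URL, then extract titles from a
# (canned) response rather than fetching over the network.

from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

url = list_records_url("http://eprints.example.org/oai", set_spec="physics")
print(url)

# A minimal canned response in the shape a repository would return.
response = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>A sample e-print</dc:title>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""

ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}
root = ET.fromstring(response)
titles = [t.text for t in root.findall(".//dc:title", ns)]
print(titles)   # ['A sample e-print']
```

Because every compliant archive answers the same verbs with the same record shape, a harvester written once against the protocol can pool material from any of them.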
We are just reaching the point where we are starting to see what kinds of interesting services can be put on top of that protocol. Development has perhaps been slower than expected, but it is now beginning to take off with a variety of different systems. One example is CiteSeer, mentioned by Paul Resnick, which begins to hint at the kinds of things people might do in an open environment.

16
For additional information on the CogPrints cognitive sciences e-print archive, see http://cogprints.ecs.soton.ac.uk/.

17
For additional information on the NCSTRL archive, see http://www.ncstrl.org/.

18
For additional information on the NASA Technical Reports Server, see http://ntrs.nasa.gov/.

Implications for the Future

What do these developments mean beyond the physics community, and what new efforts can we foresee? It is important to note that there has been very powerful opposition from traditional institutions that have used journal publishing as a cash cow, and from secondary publishers who see their secondary databases essentially as a birthright. For example, a great deal of political pressure led to the demise of the Department of Energy's PubScience Web portal, because it was perceived as threatening to those interests.

A major question that needs to be addressed is how peer review will work in the preprint and open-access context. How should we evaluate, recognize, and reward? A related question is: What is the influence of a work, and how do we detect it? There are a variety of methods to consider, essentially a composite approach. Today we use citations as the sole indicator of influence, that is, an author-derived statement about what is important, about what has influenced the research. We might look at a complementary path: reader behavior as a way of determining influence. Digital libraries and service providers can offer analytical tools that generate new metrics based on user behavior, complementing or perhaps even surpassing citation ranking and impact factors.

What is the problem with impact factors? To some extent, the impact factor is the lazy person's way of judging what is important in journal ranking. It is convenient for publishers to state that their journal ranks well on impact factor, and it is relatively easy for librarians to justify buying one title over another. The problem is that the citation is only an indicator of influence.
There are many reasons people might cite a paper: to show that they have read the literature in a field, to disagree with somebody and prove them wrong, to help a colleague gain visibility and enhance a reputation, or to give credit to someone's ideas. Although impact factors are widely used to rank and evaluate journals, they are frequently used inappropriately. There is also the whole field of bibliometrics, which enables one to track science and map the interrelationships among authors, citations, journals, and subjects. It is still an emerging field, but one that may become more extensively used.

A better approach might be to supplement the current system with a multidimensional model that balances bias. What would such a model look like? An ideal system might have the following elements: citations and co-citations, to determine a proximity indicator; semantics, that is, the content of articles and its meaning, to see how they are related; and user behavior, for behavioral metrics. At Los Alamos, about 95 percent of information resources are electronic-only, so it is possible to detect community-specific research trends and see where those trends differ from the ISI impact factors. This is quite complex, however, and raises privacy concerns.

Finally, there is the problem of long-term curation, which has several aspects. The first is the published literature, which is what most people think about in the context of preservation. But there is also the question of the relationships in the rich linking environment, which one might want to collect, preserve over time, and make available in the future.

INSTITUTIONAL REPOSITORIES

Hal Abelson, Massachusetts Institute of Technology

This panel is supposed to be about this wonderful cyberinfrastructure future.
One is reminded of William Gibson, the outstanding cyberpunk writer who 20 years ago gave us the word "cyberspace." He has said that he does not like to write about the future, because for most people the present is already quite terrifying enough. It is in that spirit that this presentation looks at the present.

The Changing Role of Universities in Making Their Research Output Openly Accessible

The main action to watch is not the new technology, but the possibility of new players finding institutional reasons, tied to their other primary missions, to participate in a new game of "disintermediate thy neighbor." In particular, do universities have institutional roles to play here beyond what they have done so far, which is to be the place where the authors are? Do universities have a reason, in their institutional missions, to start participating in making their research output openly accessible? According to MIT's mission statement (and probably that of many other universities), MIT is committed not only to generating knowledge, but also to disseminating and preserving it. So how does a place like MIT implement this mission?

One initiative, OpenCourseWare,19 is already getting quite famous. According to the MIT President's Report in 2001, the university has made an institutional commitment "to provide free access to the primary materials for virtually all our courses. We are going to make our educational material available to students, faculty, and other learners, anywhere in the world, at any time, for free." MIT did this not because it was overcome by some fit of altruism, but because it decided that, given the way the world is going, it would be better for MIT and indeed all universities, in terms of fulfilling their primary mission to educate students, to put their primary educational material on the Web. The OpenCourseWare Web site is a prototype with 50 courses up; a group at MIT is working madly to get the first 500 courses online by September 2003, on a timeline to have all MIT courses up by 2007. It is an institutional publication process to which the university has committed as a permanent activity.

The DSpace initiative,20 the sister project of OpenCourseWare, is a prepublication archive for MIT's research. The difference between DSpace and most other prepublication activities is that there is an institutional commitment by the MIT Libraries, justified by MIT's mission, to maintain it. OpenCourseWare would make sense even if only MIT did it, but DSpace cannot possibly be like that.
DSpace, in addition to being a preprint archive for MIT, is meant to be a federation that collects the intellectual output of the world's leading researchers. MIT now has six institutional partners working with it. What is important about DSpace is that a group of universities is working out the management and sustainability processes, in terms of their institutional commitments, for how to set up such a federated archive.

Responses to the Proprietization of Knowledge

Both OpenCourseWare and DSpace are ways in which MIT and other universities are asking what their institutional role should be in disseminating and preserving their research output. Why are these questions coming up now? Why might universities want to play institutional roles in the publication process, other than serving as the places where the authors happen to be? The answer is that the increasing tendency to proprietize knowledge, to view the output of research as intellectual property, is hostile to traditional academic values.

What are some of the challenges that universities see? They include high and increasing costs; the imposition of arbitrary and inconsistent rules that restrict access and use; impediments to new tools for scholarly research; and the risk of monopoly ownership and control of the scientific literature. The basic deal, as universities see it, is that the authors, the scientists, give their property away to the journals. The journals then own this property and all rights to it forever. The lifetime of the author plus 70 years is forever in science: if that regime had been in place 100 years ago, we would today be looking forward to the opportunity, in 2007, to get open access to Rutherford's writings on his discovery of the atomic nucleus. The publishers then take their property and magnanimously grant back to the authors some limited rights, determined arbitrarily and entirely at the discretion of the publisher.
The universities, which might think they had something to do with this, generally get no specific rights at all, and the public is not even part of the discussion. Jane Ginsburg has already shown some examples of the restrictions imposed by publishers on authors, but it is useful to mention some here. For example, the rights generously granted by Reed Elsevier to its authors include the right to photocopy or make electronic copies of the article for their own personal use; the right to post the final unpublished version of the article on a secure network (not accessible to the public) within the author’s institution; and the right to present the paper at a meeting or conference and to hand copies of the paper to the delegates attending the meeting. The rights generously granted to authors by the journals of the American Chemical Society are that authors may distribute or transmit their own paper to not more than 50 colleagues, and may post the title, abstract (no other text), tables, and figures from their own papers on their own Web sites. The rights granted by the New England Journal of Medicine to its authors are even more limited: “The (Massachusetts Medical) Society and its licensees have the right to use, reproduce, transmit, derive works from, publish, and distribute the contribution, in the Journal or otherwise, in any form or medium. Authors will not use or authorize the use of the contribution without the Society’s written consent, except as may be allowed by U.S. fair-use law.”

It is instructive to list some of the elements that are valuable for promoting the progress of science. They include quality publications and a publication process with integrity, certainly, but also: open, extensible indexes of publications; automatic extraction of relevant selections from publications; automatic compilation of publication fragments; static and dynamic links among publications, publication fragments, and primary data; data mining across multiple publications; automatic linking of publications to visualization tools; integration into the Semantic Web; and hundreds of things no one has thought of yet.

Will the information technology to support scholarly research be stillborn because everything is hidden behind legal and electronic fences? Probably not, because it is too valuable and people will invest in it anyway. The more serious possibility is that these tools will spread in a way that stimulates network effects that further concentrate and monopolize ownership of the scientific literature.

19 For additional information on MIT’s OpenCourseWare project, see http://ocw.mit.edu/index.html.
20 For additional information on the DSpace Federation, see http://www.dspace.org/.
If a search engine searches only the publications of one publisher, that becomes valuable enough for the publisher to come in and do what the librarians call the big deal. As Derk Haank, CEO of Elsevier Science, noted recently: “We aim to give scientists desktop access to all the information they need, for a reasonable price, and to ensure that the value of the content and the context in which it is presented are reflected in the information provision. The information is made available to researchers under licenses accorded to their institutes, and they have all the access they wish.” Does this mean that science will be restricted by monopoly ownership, or will we instead enjoy a system that participates through open standards?

One impediment to openness is copyright. It turns out to be surprisingly difficult, in the wonderful legal phrase, to abandon your work to the public domain. It is even more difficult to specify some rights reserved, rather than all rights reserved, which is the default rule under copyright. The Creative Commons was founded recently to encourage people to allow controlled sharing of their work on the Internet.21

To sum up, the world is disaggregating, and there is a big game of disintermediation going on. The place to look for the action is not new technology but the new institutional players coming into the game. New technologies may enable new roles for the universities, and that will lead to the promotion of the progress of science.

DISCUSSION OF ISSUES

The Role of Peer Review and Other Review Mechanisms

Steven Berry began by pointing out that there is one aspect of the refereeing process that Paul Resnick either dismissed or overlooked. Reviews by and large have a lot of influence on what actually gets published; it is not a simple rejection process. Of the papers that are reviewed and published, a very high percentage are revised as a result of the reviews.
Furthermore, we have to recognize that the review process provides only a very low threshold: it simply identifies material that is of sufficient quality for scientific discourse. It is not necessarily a judgment of whether the work is right or wrong. This function could be done in other ways, of course, but we have to recognize that in some fields, reviewing even at that low threshold is looked upon as a very important protection. The arguments between physical and biological scientists about this are instructive. Physical scientists, as Rick Luce pointed out, are very ready to accept the archive model without prior review and to use online reader review. The biological scientists worry that without that low-threshold review, information could be put online that might be dangerous to lay users. They feel that there is a large audience of nonprofessional readers of biological, and especially biomedical, articles that simply does not have the judgment that the professionals have, and that this is simply dangerous.

Paul Resnick responded by emphasizing that he views these alternative review mechanisms as being in addition to peer review, not instead of it. We also could get more out of the peer-review process than we are getting now if the reviews were made public. For example, you do not want to let bad information get out there for the general public to see without anybody from the scientific community saying it is rubbish. Fine, put it out there along with the reviews of the scientists who said it is rubbish. Why is it better to hide it than to disclose it with the commentary from the scientific experts?

Dan Atkins read a question from a member of the Webcast audience, who asked: Might not the thread of the original version of a paper, along with reviewers' comments, authors' revisions or responses to the comments, and the journal editor's additional comments and interpretations, be used as educational and enrichment material for university students? This could be done either anonymously or by attribution. Paul Resnick said he thought it was a great idea. One of the problems for Ph.D. students is that, more often than not, they do not get to see a paper through its whole process until they do it themselves and learn it the hard way. An adviser can show them the reviews submitted on the adviser’s papers, but this is not done routinely. Making it a more public process would be helpful for education. Rick Luce added that there needs to be a proper balance in that discussion of whether we should let the system filter out the bad material or let the user filter it out.

21 See http://www.creativecommons.org.
The Role of the Public Press

Ted Campion noted that one big player in scientific publishing that has barely been mentioned is the public press. Scientific publication, particularly in biomedicine, is now largely being judged, at least by authors and even academe more generally, by how much press coverage it gets. It is not just studies of estrogen; zebrafish and hedgehog mutations are getting into the press. The scientific press is judged now not only by the boring citation indices, but by whether the network news covers it. This, of course, is all being driven by the public's increasingly sophisticated understanding of and interest in science, and in the biomedical sciences in particular. What effect is this having on scientific and biomedical publication?

Hal Abelson responded that he had a discussion about MIT's OpenCourseWare with his daughter, who is in medical school, and she told him she thought it would be a terrible thing for medical schools to put their course curricula on the Web, simply because you have to be a professional medical student to evaluate and use that information, so it would be dangerous to have it out there. It is hard to know what to do about sensationalism in the public media. At the same time, you could do a lot worse than to have Peter Jennings talk about an article in the New England Journal of Medicine. It has been a tradition in the United States that the cure for speech is more speech. Maybe if there were other channels for people to respond, things would be better, but restrictions on the press are not the answer. Maybe part of the answer is that it is up to the press to worry about novelty and up to the journals to worry about authenticity.

Curation and Preservation of Digital Materials

Mark Doyle, of the APS, said that like MIT, part of his society’s mission is explicitly the advancement and diffusion of the knowledge of physics.
In addition to peer review, the society considers the responsibility of archiving very important. It has already gone through the transition to fully electronic publication. The core of its archive is a very richly marked-up electronic file from which the print and online materials are all derived. What still appears to be absent in DSpace or in the e-Print arXiv is any effort to build the infrastructure for curating that kind of material; it is one of the most important things to do.

One other related issue is the low cost of arXiv.org. The key problem is that there is a two to three order of magnitude difference between what it costs the society or other publishers to publish an article, about $1,500, and what it costs to publish in the arXiv. That is really where all the tension in the economic models comes from: when there is such a large difference, it puts pressure on people to change the way things are done.

Rick Luce responded that curation is an interesting question. Most libraries are not going to be able to do this. There are probably one to two dozen libraries globally that see curation as an important role and have a significant capability in which to make an investment, because they are thinking about centuries of preservation, not years or decades. Many publishers are quite aware of the problem, but very few make the necessary investments. The vast majority of publishers are simply too small to have the wherewithal, both technological and financial, to pull it off. It is going to take some sort of hybrid relationship between some publishers and some libraries who see that as their role.

Hal Abelson added that when MIT designed DSpace, it was absolutely essential and deliberate that it be housed in the MIT libraries. The reason was that whether or not the MIT libraries will preserve something for 200 years, they are certain to preserve it for 50 years. The developers of DSpace wanted to work with an organization that understood what archiving and curation mean, and that is what libraries do. To build on what he said earlier, the critical thing is not the technology for archiving, because that technology is well under control; rather, it is to find an institution that will commit, as part of its core mission, to keep and preserve the material. For example, if it is part of the core mission of the American Chemical Society to be the repository of all chemical literature and to have every other organization in the world be its franchisee, that is an important thing to say.
It also is very important not to get trapped into the idea of building the monolithic, end-to-end solution. DSpace will never be that. It might be a place where people who are building peer-review systems, curation systems, and all kinds of other systems can link and build on it. The trap, which Gordon Tibbitts alluded to yesterday, is that you do not want to be in a position where you build the whole system. Instead, you want elements that communicate through open standards, so that many people can come in at different places in the value chain and add value in different ways.

Evaluating STM Literature Outside Academia

Donald King noted that in his studies of the amount and types of use of STM articles, approximately 25 percent of those articles are read by academicians, and the rest are read outside the academic community. When you begin the feedback systems that Paul Resnick was describing, you need to think in terms of the enormous value derived from the literature outside academia. There are two purposes for doing this. One is that it is a better metric for assessing journals and authors. The other is that it will begin to give authors a means to recognize that their larger audience is outside their immediate peers. Professor King added that he has done many focus group and in-depth interviews of faculty, and they seem to think that they are writing only for the people they know, their immediate community. There would be some value in the assessment system, therefore, if there were some acknowledgment or recognition that there are other uses of that information outside academia. Paul Resnick agreed that providing such external feedback to authors on how their works are being used would be useful.

Adapting Tenure Requirements to Open Source

Lennie Rhine, University of Florida, asked how universities can adapt the tenure process to the open-source environment.
Most academics are tied to the peer-reviewed journal system as a mechanism for being ranked hierarchically and evaluated in the tenure process. How do you incorporate this more ephemeral open-source information into that process?

Paul Resnick responded that in the current tenure process, the department heads look at the journal rankings and count the number of journals in which you are published and cited, and different departments weight this differently; they may or may not construct a numeric score. One could actually develop a more open system for computing metrics like that. Consider how U.S. News & World Report does its rankings of schools; it has a particular way of weighting everything. Now, imagine a more open version where we collect all the data, know what has been cited and read, have all the reviews, and have both the behavioral and the subjective feedback. Then the teaching institution can create its own metrics based on its own tenure criteria, the research institution can create a different metric, and you can have many different ways of using those data.

Hal Abelson added that one of the marvelous things the NSF did several years ago was to limit the number of papers that one can cite in preparing a grant proposal. It is not really a question of quantity or vast numbers of papers published in third-rate journals. MIT has been trying to get its teaching staff who are coming up for tenure to identify three things on which they should be evaluated. The goal is to try to get the enterprise focused on what Alan Kay used to call “the metric of Sistine Chapel ceilings per century, rather than papers per week.”

Problems with Metrics

Martin Blume was happy to hear Rick Luce use the word indicators, rather than metrics, because there is no one number that can be used as a measure of quality. There are things wrong with all of them, including the opportunities for gaming the system. There is a Dilbert cartoon in which the human resources manager says that metrics are very important, and that a very good one is the rate of employee turnover; another manager replies that they do not have any turnover, because they only hire people who could not possibly get work anywhere else. Many metrics suffer from this, and they can be manipulated. You really have to look into them and use them as indicators, and it takes a fair number of them to get a fair measure of quality.

Open Versus Confidential Peer Review

Dr. Blume also commented on peer review and the concerns about public comment.
Although he thinks public commentary is a good thing, nevertheless there is a sort of Gresham's law of refereeing, in that the bad referees tend to drive out the good ones. All of us who take part in listservs of one sort or another know of the loudmouth who will not be contradicted or denied, until eventually the rest of us give up and say we are not going to take part anymore. You need to expect something like this, and you have to have a degree of moderation; that is where an editor’s role in the peer-review process comes in. Also, the knowledge that a paper is going to be peer reviewed does have an effect on authors: it means that they try to improve it at the outset so that it will pass this barrier.

Dr. Blume then presented some statistics on peer review from the APS’s journals. The society looked at the first 100 articles submitted to one of its journals in a given year and tracked them for a year to see what happened to them in the end. Of the first 100 submitted, 61 were accepted and the remaining 39 were rejected or recommended for publication elsewhere. Of the 61 that were accepted, 14 were approved without any revision after one report, 22 were approved on resubmission after some modifications, 14 after a second review, and 4 after a third resubmission, all of these leading to improvements. Some wags would say that the improvements for some of them consist largely of adding references to the referees' papers, but even that is an improvement if there is not enough citation of other work. Of the rejects, 19 were rejected after one report, 13 after two, 4 after three, 2 after four, and 1 after six reports. It is much more costly and difficult to reject a paper than to accept it. This provides an indication of some of the value added in the course of traditional peer review, which has to coexist with the other types of assessment.
The APS also tries to avoid using referees of the type that would lead to Gresham's law, such as one sees on the listservs; it is aware of them and tries to select accordingly. Paul Resnick said that one way to evaluate the reviewers is to have an editor who chooses them or moderates. You also could come up with a system in which you calibrate reviewers against each other. Dr. Blume noted that his society actually does this: it keeps a private file based on the reports it receives. Unfortunately, this leads to an overburdening of the good reviewers, so they are punished for the good work they do. Dr. Resnick pointed out that the kind of system Dr. Blume is using privately and internally could be adapted to a more public version. If you go to a more public system, you do not necessarily have to give all the lousy reviewers an equal voice. Dr. Blume added that his society would certainly want to pick people and would probably continue to do it anonymously. A reviewer is always free to reveal his or her identity, but the APS will not.

Relationship Between Open Archives and Traditional Publishers

Fred Friend asked Hal Abelson how he sees the future long-term relationship between open repositories and traditional publishers. One can see that repositories such as DSpace have a very valuable role in shaking up the system and in helping establish, or return to, better priorities in scholarly publication, but what is the future role for traditional publishers in that situation? Hopefully, publishers will respond in a positive way to these changes and may come out in the end with a better role than they have at the moment. Or could institutional repositories take the place of traditional publishers completely?

Professor Abelson noted that it was Yogi Berra who said, “It is really hard to make predictions, especially about the future.” The main point is that as you have new players, you have different kinds of roles. There is no inherent hostility between institutional archives and traditional publishers. MIT, for example, has a very respectable journal operation in the MIT Press, and it is looking for ways to find joint projects with the DSpace archives. One can imagine a university holding everything from the preprint through the edited version, and the journals coming in with some kind of authentication and review cycle. There may be many opportunities. The trick is to free up the system and allow other players to provide pieces of the process that the journals, for various reasons, have not been providing. The danger is for some individual player to come in and try to lock everything up in one complete system.
The problem with the World Wide Web has always been that everybody wants to be the spider at the middle of it, and that is the outcome we have to resist.

Dan Atkins recalled the point made by Rick Luce that preprint servers are the repositories at the lower layer, and they provide a platform or infrastructure on which a whole host of yet-to-be-fully-imagined value-added entities could be built, some of them for-profit. The idea is to create a more open environment for the primary, or upstream, parts of the value chain, and then to encourage an economy of activity on top of it. One of the themes that came out loud and clear in the NSF’s recent cyberinfrastructure study was the huge latent demand for serious curation of scientific data, for mechanisms for credentialing and validating data, and for encouraging interoperability between data in different fields as people create more comprehensive computational models. Of course, there is a lot of synergy between that and some of the long-term preservation and access issues. In fact, the volume of bits that just the high-energy physics community generates in a year probably exceeds that of most of the scientific literature worth keeping.

Fred Friend added that, traditionally, the formal publications have been viewed as being the record. Yet we seem to be saying that long-term archiving is not for publishers, which perhaps rules out the recordkeeping function for traditional publishers. What are they then left with? Rick Luce noted that an issue of deep concern to the governmental sector, and to public institutional repositories, is being able to have access to material created with public monies and to make these materials publicly available. One can easily imagine a system, again, that is open at the bottom level, where there are some nuggets that publishers might look at and start to mine for opportunities to add value on a more formal basis.
These activities do not have to be competitive; in fact, they can coexist in a way that is very complementary.

Universal Search Engines for Licensed Proprietary Content

Mark Krellenstein, from Reed Elsevier, referred to Hal Abelson’s comments about Derk Haank's statement about Reed Elsevier producing a universal search engine for its licensed and proprietary content. That initiative came in response to the publisher’s users and the libraries. Reed Elsevier has found that people want as close to a universal search engine for proprietary materials as they have with Google for the open materials on the Internet. The idea of multiple players in the value chain, which appeals to expert researchers, frequently is less appealing to undergraduates in particular, who really want a single solution for all their search needs. Reed Elsevier does not expect to be that single solution, but it is trying to do as much as it can. It charges for the service because of the model it has for charging for content. Google supports itself via advertising; that could perhaps be an option for a company like Reed Elsevier, but probably not one that would generate the necessary revenue in the scientific context, given the other models discussed at this symposium.

Another point is that Reed Elsevier is also open to other players doing the same kind of thing. There is a metasearch initiative going on right now in the National Information Standards Organization (NISO), which is trying to respond to libraries' request to have a small number of services for proprietary content, rather than a long list of 60 providers. Reed Elsevier is working with NISO and other large publishers to develop open standards, so that any metasearch company could come in and search these proprietary services.

Finally, Reed Elsevier offers a service called Scirus (www.scirus.com), which does provide access to some of the hidden content that Google does not reach. It indexes scientific papers from the Web, some 150 million Web documents, along with all the Reed Elsevier proprietary content, plus whatever it can license from other publishers. Most of the proprietary content is available for a fee: the abstracts are free, and if you click through to the full text you go right through if your site is licensed; if not, there is a pay-per-view model, at least for the Reed Elsevier material. What Google has in fact done is to create a successful application for open content; something equivalent, however, is not considered desirable, to some extent, on the licensed-content side.
Improving Assessments of Publications

Donald King noted that there is another dimension that needs to be considered in assessing published materials: the time elapsed since publication of the articles, or since their availability in the preprint archives. The median age of a citation is approximately six to eight years, depending on the field of science. Most of the reading, about 60 percent, takes place within the first year of publication, but almost all of that is for the purpose of keeping up with the literature and knowing what peers are doing. As material gets older, its usefulness and value decline as well, but about 10 percent of the articles that are read are over 15 years old. That is particularly useful information in industry, where people assigned to a new area of work are required to go back into the literature. Such feedback on uses can help make the publications better.

The Issue of Cultural Diversity in Publishing

Michael Smolens, with a company called 3BillionBooks, said that up to this point in the symposium he had not heard the term cultural diversity. There are many different cultures and language groups in the world that have a lot to say about the issues being discussed at this meeting. An organization called the International Network of Cultural Diversity was founded in 1998. It was started by someone in Sweden who brought together 30,000 artists, writers, and musicians because they could not deal with the European Union in their own language of Swedish, which posed a problem for a broad range of interactions on many issues. Their goal is to have the concerns of very small cultures and language groups heard at international meetings and consortia, so that when the World Trade Organization is dealing with trade issues, someone there is at least thinking about the fact that language groupings are disappearing very rapidly and that cultural diversity should be maintained. The cultural diversity issue around the world is a very sensitive one that everyone needs to keep in mind when discussing publishing on the Internet.