10. Designing the Digital Commons in Microbiology – Moving from Restrictive Dissemination of Publicly Funded Knowledge to Open Knowledge Environments: A Case Study in Microbiology
Paul Uhlir30
National Research Council
I think that everyone would agree that the rate of change in new technological systems often outpaces human capacity to adapt to the technological advances and, even more so, the ability to exploit those advances for maximum social and economic benefits. This is particularly true for transformational technologies that displace their antecedent ones and their associated organizational paradigms.
In such cases, not only is it necessary to adopt new management approaches in response to technological progress, but it is necessary to overcome the substantial resistance to change by entrenched interests whose business model is based on the superseded technology. Such a transformation has been taking place over the past couple of decades as a result of the technological revolution brought about by the combination of digital information technologies and global communication networks.
Table 10–1 presents a comparison of the characteristics of publishing under the print paradigm with those of disseminating information via global digital networks.
|Comparison of some key characteristics of the print dissemination and digitally networked paradigms:|
|PRINT||GLOBAL DIGITAL NETWORKS|
|(pre) Industrial Age||post-industrial Information Age|
|fixed, static||transformative, interactive|
|rigid||flexible, extensible|
|physical||virtual|
|local||global|
|linear||non-linear, asynchronous|
|limited content and types||unlimited contents and multimedia|
|distribution difficult, slow||easy and immediate dissemination|
|copying cumbersome, not perfect||copying simple and identical|
|significant marginal distribution cost||zero marginal distribution cost|
|single user (or small group)||multiple, concurrent users/producers|
|centralized production||distributed and integrated production|
|slow knowledge diffusion||accelerated knowledge diffusion|
TABLE 10–1 Print versus digital network paradigms.31
30 Presentation slides available at: http://sites.nationalacademies.org/xpedio/idcplg?IdcService=GET_FILE&dDocName=PGA_053717&RevisionSelectionMethod=Latest.
This presentation is based in large part on the draft monograph, Reichman, J.H., T. Dedeurwaerdere, and P. F. Uhlir. Designing the microbial research commons: Global intellectual property strategies for accessing and using essential public knowledge assets (Cambridge Univ. Press, forthcoming 2013).
31 Uhlir, Paul F. (2006) The emerging role of open repositories for scientific literature as a fundamental component of the public research infrastructure. In: Open Access: Open Problems. Polimetrica Publisher, pp. 59-103.
Although these comparisons may be familiar, it bears emphasizing that the magnitude of the changes made possible by the shift from print to digital technologies and networks cannot be overstated, either quantitatively or qualitatively. The explosion in the production of digital bits is now well known as a function of Moore’s law. Digital networks also have well-known quantitative advantages over the previous print paradigm in time, geographical extent, and cost; that is, digital networks can provide instantaneous,
concurrent, and global availability at near-zero marginal cost of access by each additional user. These quantitative improvements make possible, even if it has not yet been realized, the universal availability of information.32
The qualitative advantages of digital technologies and networks in accelerating the dissemination of information and the diffusion of knowledge are just as important as the quantitative ones. Because networks provide the opportunity for non-linear, interactive, and asynchronous communication with multimedia capabilities, the potential to improve the dissemination and diffusion processes has been greatly magnified. The digital nature of the information imbues it with flexible transformative properties, making it subject to easy manipulation and straightforward integration with other types of information, which in turn allows the creation of new knowledge that was either not possible or much more difficult in the print context. Moreover, the network makes possible entirely new forms of collaborative knowledge production on a broadly distributed and interactive basis, transforming or dismantling the hierarchical and centralized organizational models through which information was produced and knowledge diffused in previous eras.
Perhaps most important, digital networks make possible entirely automated approaches to the extraction, processing, integration, and organization of vast amounts of information, which can in turn be transformed into unlimited new discoveries and products, eclipsing the capabilities of purely human information production, dissemination, and use.33
As both the principal inventors and pervasive users of the Internet, scientists have a great deal at stake in fully exploiting the potential of this new medium for accelerating scientific progress and its benefits to society.
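The cost point made at the start of this passage (near-zero marginal cost of access by each additional user) can be put as a rough formula. This is only an illustrative sketch and is not taken from the presentation: the symbols F (the fixed cost of producing and hosting a work), c (the marginal cost of serving one more user), and n (the number of users) are assumptions introduced here.

```latex
% Illustrative only: F, c, and n are assumed symbols, not from the source.
\[
  \text{average cost per user} \;=\; \frac{F + c\,n}{n} \;=\; \frac{F}{n} + c
\]
\[
  c \approx 0
  \quad\Longrightarrow\quad
  \frac{F}{n} + c \;\approx\; \frac{F}{n} \;\longrightarrow\; 0
  \quad \text{as } n \to \infty
\]
```

Read this way, a significant per-copy cost c in print (printing, shipping) sets a floor under the cost of reaching each reader, whereas on a digital network that floor is close to zero and the average cost keeps falling as the audience grows.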
Table 10–2 offers a summary of some of the advantages to science of open access to, and unrestricted reuse of, publicly generated or funded data and information on digital networks.
|Advantages to science of open access to and unrestricted reuse of publicly generated or funded data and information on digital networks:|
|Promotes interdisciplinary, inter-institutional, and international research|
|Enables automated knowledge discovery|
|Avoids inefficiencies, including duplication of research|
|Promotes new research and new types of research|
|Reinforces open scientific inquiry and encourages diversity of analysis and opinion|
|Allows for the verification of previous results|
|Makes possible the testing of new or alternative hypotheses and methods of analysis|
|Supports studies on data collection methods and measurement|
|Facilitates the education of new researchers|
|Promotes citizen scientists and serendipitous results, enabling the exploration of topics not envisioned by the initial investigators and the primary research community|
|Permits the creation of new datasets when data from multiple sources are combined|
|Promotes capacity building in developing countries and global research|
|Supports economic growth and social welfare|
|Generally provides greater returns from public investments in research|
TABLE 10–2 Advantages of open access to and unrestricted use of digital information.
32 Ibid.
33 Ibid.
If one were to start over and construct a new institutional regime for scholarly communication on digital networks, what should the guiding principles be? I would suggest the following:
1. Maximize public-good aspects of publicly funded research data and information;
2. Avoid monopolies and artificial markets (service, not captured product);
3. Take advantage of zero marginal cost for global dissemination;
4. Support freedom of inquiry and collaborative research;
5. Optimize content for automated knowledge discovery tools; and
6. Maintain the traditional characteristics that are essential to the research community and the progress of science (quality control, reputational benefits, research impact, speed of publication, ease of access, and long-term preservation and sustainability).
The bottom line is that open access online and the unrestricted reuse of research data and information produced from public funding are, in most cases, far superior to proprietary and restricted dissemination, as they maximize value for the content producers and the user community rather than for the intermediaries who perform the dissemination services. The question is: How to get there?
As part of our study, we analyzed the access and reuse policies and licenses of both the microbial journal literature and of some databases used in microbial research. The traditional practice for researchers publishing scientific articles is for the authors to assign their copyrights to the publishers, who are either commercial entities or learned societies and other not-for-profit scientific organizations. As a result, it is the publishers rather than the authors who initially determine the conditions for access to these articles and for reuse of the information and data they contain. Today, access to the contents of microbial journals is usually regulated by two sets of contracts. First, the publisher’s contract with the author will determine what the publisher owns and, to some extent, what it can do with the material. In the pre-digital age, this contract was usually the only one at issue, because readers’ and users’ rights were determined by statutory intellectual property laws (i.e., copyright laws) and, since 1996 in the European Union, by database protection laws.
In our empirical research of the journal literature, we assessed the copyright and access policies of publishers responsible for journals containing primary research articles and reviews in the field of microbiology. We also selected science journals from other areas, such as immunology, that regularly publish articles in the field of microbiology. Most of the open access journals were obtained from the Directory of Open Access Journals (DOAJ) and from individual publisher websites, such as that of Horizon Press. The hybrid and subscription journals were selected primarily from the publisher websites and a few other Web resources. Sixty-four percent of the selected journals include articles about microbiology only, while the remaining journals publish articles from other areas as well.
We analyzed a total of 303 journals dedicated in whole or in part to microbial research results. Some of the highlights of our findings include:
• About 30 percent were full open access (OA), including hybrid (both purchased immediate OA and subscription); 20 percent were openly available but read-only; and 50 percent were subscription based.
• 80 percent of subscription journals allow author self-archiving on personal websites, but almost 90 percent do not allow archiving on the author’s institutional websites and most are silent on external repository deposits (e.g., on PubMedCentral).
• 98 percent of subscription journals require transfer of copyright, although we do not know the number that would approve an author’s request to retain copyright and grant only a nonexclusive license to publish.
• About 75 percent of all journals surveyed are published by for-profit publishers.
• 96 percent of subscription journals give no direct discount to developing country subscribers (but some may participate in group discounts to libraries through the INASP or HINARI programs).
We also briefly analyzed the scientific databases used in microbial research. This survey was less comprehensive and less rigorous than the one we did for the journal literature, in part because the information about these databases is less standardized and more diffuse. We found that:
• Many molecular biology databases (genomic, proteomic) and taxonomic databases are openly available and free to use.
• Molecular biology data in many specialized research areas (e.g., energy and environment) are not deposited and not available.
• There are many legal, policy, economic, and cultural pressures for the researcher to keep data secret, either because of the data’s commercial potential or strategic advantage or because of the burden of making the data useful to others.
The intention of latter-day intellectual property laws is to secure rents from specified end uses of relevant knowledge goods, such as music, films, and software. The beneficiary industries do not contemplate uses, reuses, or redistribution of their products beyond those income-producing activities regulated by these laws, although the state may require them to tolerate some uncompensated uses in the larger public interest. Courts have traditionally narrowly interpreted the limitations and exceptions in favor of strengthening the right holders’ exclusive rights and the incentive effects they are supposed to provide. This approach conflicts directly with the needs of science, however. This is particularly true for public science in the digital domain, whose norms favor maximum use, reuse, and redistribution by third parties of the knowledge that publicly funded researchers generate.
In the pre-digital epoch, legislation (and copyright legislation in particular) did contain some measures that attenuated this conflict in the interest of science, but the digital revolution that has created such promising opportunities for scientific research has also generated intense fears that publishers of literary and artistic works generally would become vulnerable to massive infringements online and to other threats of market failure. In response, publishers have pushed legislatures to recast and restructure copyright law in the online environment so as to preserve business models built around the print media.
Thus copyright laws in Organisation for Economic Co-operation and Development (OECD) countries and database protection laws in the European Union are on a collision course with some of the most promising scientific movements in history. These impediments to the global exchange of basic scientific information are then magnified by the ability of intellectual property rights holders to override relevant exceptions and limitations by a combination of technological protection measures and even more restrictive contractual conditions. In this legal environment, the continued ability of scientists to access, use, and reuse essential upstream knowledge assets depends increasingly on their willingness to disregard, consciously or unconsciously, the legal and contractual constraints on their everyday research. However, the implicit assumption that proprietary intermediaries will not detect violations of statutory or contractual restrictions on their continued treatment of these assets as public goods or, if they detect those violations, will not enforce their rights is neither tenable nor desirable. Sooner or later, there could be a clamorous case involving academics after which risk-averse universities and university technology transfer offices would shut down the secret or arguably unintentional infringing activities now going on at many universities and scientific laboratories.
The existing system thus offers only three unsatisfactory pathways for making available the basic building blocks of digitally integrated microbial research. The first is to continue to muddle through by ignoring a hostile legal environment, with all the attendant risks of civil disobedience generally. The second is to embrace the tendencies to privatize public goods by adopting the commercial and restrictive practices that are thought necessary to generate both research funds and revenues from downstream commercial applications. This commercializing trend will increase the costs of publicly funded research, which depends on access to general purpose research tools. It will also severely restrict, if not make entirely impossible, the exploitation of automated knowledge-generating opportunities through a proliferation of legally contrived thickets of rights and restrictive licensing conditions. The third pathway attempts to build an alternative open-access infrastructure, which could generate important payoffs in terms of enabling cumulative public research. However, a lack of coordination with respect to intellectual property provisions intended to maximize these different expected payoffs hampers the further development of any such alternative infrastructure.
For this reason we examined a number of top-down and bottom-up responsive measures for making the current legal environment more science friendly. It is worth considering what sorts of legislative changes at both the domestic and international levels would be needed to improve the prospects for digitally integrated research. We have suggested several legislative changes to help balance the intellectual property (IP) regime between rights holders and public-interest, publicly funded research and education users. Legislatures could provide more robust limitations and exceptions to traditional copyright law for not-for-profit, publicly funded research, for example. Laws could allow for greater access to and use of public research in digital copyright. And research funders could (with enabling legislation) mandate such things as author deposits and copyright retention with the authors. In our opinion, however, most of the needed legislative reforms have little or no chance of being enacted under the existing political–economic situation, or until new forces emerge, perhaps in the developing world, to rebalance the system. The balance of
such political forces remains decidedly contrary to such efforts, and the drift of bilateral relations, at least, is towards even higher levels of protection.
Looking beyond these unlikely legislative solutions, there are numerous encouraging bottom-up initiatives that are already underway, where some progress has been made in achieving higher levels of access to relevant scientific literature and data. The open-source software movement is one example; the establishment of open repositories for publications in a specific area is another. The challenges in deriving maximum scientific value from still under-exploited technological opportunities lie largely in changing the social systems (the institutional, legal, economic, and sociological aspects) rather than in the technological advances, which will continue even without advances in the social systems. To make progress on these human behavioral aspects, all of the stakeholders involved worldwide in public research and in the process of communicating research results should take part in the unfolding debate, at some level, because they have a vested interest in its outcome.
Up to this point, most of the advances that have been made toward opening up the information created by publicly funded research have come from the bottom up, from the work of many dedicated and visionary individuals and institutions. These actors have been the pathmakers in developing a broad range of initially disparate, but related institutional and policy initiatives in diverse information types, disciplines, and countries. As these projects proliferate and become better established, they are coming together to form a nascent, interoperable global information commons for public science. Those who fund and regulate public science from the top down are beginning to take notice. They are starting to build upon the tactical successes of the pathmakers and integrating them into broader national and international strategies for the investment and management of public science. A gradual restructuring of the scientific information sector and of the processes of scientific communication is thus now well underway, with the aim of taking more complete advantage of the transformational capabilities of digitally networked technologies.
In light of the clear benefits to the research enterprise and to society from the open availability of publicly funded scientific information in the digitally networked environment, it is not surprising that a variety of new models have already been developed within the research community. As I noted in Past, Present and Future of Research in the Information Society,34 the common element of all these different types of initiatives is that the information is made openly and freely available digitally and online. In many cases, the material is made available under suitably reduced proprietary terms and conditions through permissive licenses (e.g., the GNU license for open source software, or Creative Commons licenses for open access journals or for some works in open repositories), or else the material is put into the public domain. In other cases, such as the delayed open availability that some publishers use for their journal articles, the works remain protected under full copyright, but eventually they become freely and openly accessible on a read-only basis.
34 Olson GM, David PA, Eksteen J, Sonnenwald DH, Uhlir PF, Tseng S-F, Huang H-I. International Collaborations Through the Internet. In: Shrum W, Benson KR, Bijker WE, Brunnstein K, editors. Past, Present and Future of Research in the Information Society. Boston, MA: Springer US; 2007. p. 97-114. [cited 2011 Aug 15]. Available from: http://www.springerlink.com/index/10.1007/978-0-387-47650-6_7.
Just as the desirability of providing open availability to publicly funded scientific information online was substantiated in our survey of the microbiology literature and
databases, the many different models that have already been established attest to the feasibility of doing so. These various examples now provide valid proofs of concept for all information types, for most disciplines (including microbiology), in many countries, and for all types of institutions, including government agencies, universities, not-for-profit organizations, and even for-profit firms.
Taken together, these activities can be seen as part of an emerging broader movement in support of both formal and informal peer production and dissemination of publicly funded scientific (and other) information in a globally distributed, volunteer, and open-networked environment. These activities are based on principles that reflect the cooperative ethos that traditionally has imbued much of academia and many government research agencies. Their norms and governance mechanisms may be characterized as those of the “public scientific information commons” rather than of a market system based upon proprietary data and information. Such information commons activities respond, either explicitly or tacitly, to the needs of science and scientists.
Although much industrial microbiology is conducted in private laboratories, the bulk of research in this area takes place at universities. This research has become increasingly computational and data driven. Universities already host many culture collections, and they also hold a vast amount of microbial materials in research collections outside the formally constituted culture collections. University research on all these materials has increasingly become a networked digital process linking distributed thematic communities. As the digital component increases in importance, the research becomes more interdisciplinary and dependent on inputs from bioinformatics, computational science, genomics and proteomics, environmental science, agriculture, and health. These interdisciplinary activities, although emanating from a core thematic group based at one or more university centers, operate across university boundaries, and even national boundaries, in order to pursue the thematic interest on an increasingly global basis.
In successful cases, the research outputs of these knowledge hubs are usually the fruits of resources that the networked participants have voluntarily pooled from the outset. These outputs are made available for use and reuse to an ever-expanding open community of interested scientists on terms determined by the thematic community. The productivity of these thematic communities is then further enhanced by a growing array of digital and computational tools and techniques, which are put to a common purpose.
When these joint research activities reach the point of yielding published research results, however, they are typically outsourced to a professional society or a publisher, and this step then normally triggers all the legal constraints and restrictions we have described. This customary institutional arrangement in turn limits access to and use of the knowledge assets that the digitally sophisticated scientific community has at its disposal, even when it is the source of those very same assets.
The logical response is to cut the Gordian knot by retaining ownership and control of all knowledge assets produced by the relevant research community with public funding within the public science framework itself, rather than assigning them to external publishing intermediaries. Although such assignment was customary in the past, when the print medium dictated high front-end costs, it is not necessary in a digital world. Once possessed of ownership and control, the scientists and their universities will be in a position to do two things: (1) to avoid all the technical and legal restrictions described above, and (2) to organize the use and reuse of these knowledge assets by means of new
institutional frameworks that are specifically designed to promote collaborative research within fully integrated digital networks. Such an institutional framework would, for example, give universities the power to determine the conditions under which research results were disseminated and reused, in a manner consistent with the needs of microbial research and education. In this approach, if external intermediaries were used, these intermediaries would operate as service providers on science-friendly terms and with open access prerequisites, as prescribed by the universities. The quid quo pro would be the provision of efficient services that the universities, for various reasons, did not wish to undertake. Another option would be for the university to integrate the publishing function into the work of the emerging knowledge hubs themselves. In such a case, the funder’s support would enable interdisciplinary collaboration in the production and rapid dissemination of research results that were themselves publicly funded, thereby magnifying the social benefits of the public investment. At the same time, the knowledge hubs could evolve into a more solid institutionalized platform, with a view to integrating and systematizing all the knowledge resources needed by the community and all the digital services that made access to use and reuse of these resources as easy and efficient as possible, while also stimulating related educational activities and downstream commercial applications. In this scenario, public funds would remain within the circle of knowledge creators and would nourish all the relevant services, with very low transaction costs and without dissipation to unnecessary external information brokers. 
Furthermore, taking microbial journals back to universities and certain other public research institutes would also make it possible to exploit the interdisciplinary resources and inputs of different departments, including, for example, computer science and engineering departments, medical schools, public policy institutes, environmental institutes, and library information services and resources. Moreover, these advantages might be compounded if a consortium of universities pooled their resources to manage and produce a given journal or a set of journals organized on thematic lines. Scientific control over contents through the universities should ensure that high-quality standards were maintained and that the journals would be open access from the start and optimized for network exploitation. Indeed, once the opportunities of digital networks are taken into account, placing microbial journals in the universities would appear to offer many more advantages than keeping them at commercial publishers or even at professional societies. For example, the societies cannot provide the educational and research opportunities that already exist at the universities, and so they would remain essentially extrinsic, semiautonomous bodies that depend on services provided by individual scientists. Nor can the professional societies make available the kind of interdisciplinary resources available at the universities without transforming themselves into quasi-universities themselves, which, even if otherwise feasible, would be a wasteful and duplicative use of the relevant funds. An even more powerful argument for preferring the universities to either professional societies or commercial publishers is that microbial science journals should no longer be seen as ends in themselves. Rather, by repositioning them within the universities, the journals could become cogs in—and stepping stones to—the realization of digital knowledge hubs in which journals are but one component. 
From this perspective, all the microbial journals thus repositioned should become open access by definition, and all their contents should become available for harvesting by others, for thematic re-integration in other collections, and for various forms of digital
manipulation. More broadly, the publishing function that supports the journals would logically be expanded to support specialized knowledge environments built around the relevant user and research communities and themes. By thus deconstructing the print publishing model and moving the journals or the articles in them into an academic environment, one begins to reconstruct a digitally networked scientific communications model, in which the content providers are the communicators, the intermediaries, the users, and the governors of a dynamically constituted knowledge environment. We call this digitally networked scientific communications model an “open knowledge environment” (OKE). Over time, these knowledge environments, although hosted by different universities, could be linked together in an integrated knowledge ecology that would enhance the reputational benefits of the participating universities and yield scientific payoffs greater than any single source could produce. Integrating openly available scientific information resources with open-source collaborative tools online would enable the formation of OKEs for the creation of new knowledge, the enhancement of educational opportunities, and the stimulation of downstream applications. Such an approach would harness the social and technical power of the network which, if properly managed, could greatly increase the value of the knowledge in ways not currently possible with the traditional information production and dissemination processes, and it generally could do so at a much lower cost than the traditional approach. At the core of an OKE are interactive portals focused on knowledge production and on collaborative research and educational opportunities in specific thematic areas. Ideally, OKEs would be developed around one or more thematically linked, open-access journals and would be augmented by openly available reports, grey literature, and data. 
Various interactive functions (wikis, discussion forums, blogs, post-publication reviews, and perhaps distributed grid computing) would be added to stimulate discussion and contributions related to specific issues. The OKEs we envision could readily be hosted at single universities, or their components could be distributed among a consortium of universities having a strong interest in the relevant subject matter. They could also be based at other not-for-profit research centers or at government agencies, although this would compromise the educational function that we also seek to promote. In every case they would be multidisciplinary in character, not only bringing in experts with the appropriate subject-matter expertise, but also involving computer engineers, information scientists, librarians, and other potential contributors to help establish and manage the OKEs and to learn from operating them. Such a knowledge-production project not only would involve senior faculty and experts in its development and application, but also would serve as a mechanism for teaching students in the related departments at the university and as a vehicle for involving the students in the management of the OKE itself. At the same time, the thematic OKEs could integrate information beyond the conventional disciplinary boundaries, making them tools that are especially well suited to interdisciplinary environments. The OKE concept proposed here would thus build upon a number of recent, but already tested, advances in the online peer production of knowledge and participative Web 2.0 techniques. Such capabilities are virtually impossible under the proprietary journal model. Indeed, within our proposed open knowledge environments, the narrowly stove-piped, print-paradigm journal model would be transformed into a truly interactive networked initiative. Nonetheless, we stress that these OKEs should maintain the highest-quality
standards of scholarly endeavor, and they should strive to promote the reputational benefits of the participants and of their universities. Most of these thematic knowledge hubs would also provide essential digital infrastructure functions in support of the microbial research community. Such service functions could include high-performance search engines that would enhance the possibilities for finding relevant information in publications and would allow for cross-linking and text mining based on standardized metadata. While these collaborative functions of the OKE may seem futuristic, they are already being implemented in some microbial science communities as well as in other disciplines. What makes the concept seem futuristic is the existing condition of publishing. The legal terms and conditions in many of the publishers’ contracts, buttressed by the larger statutory environment, aim tacitly to protect the print model against the challenges (perceived as risks rather than opportunities) of the digital networked environment. It is this limited vision and obstructive legal culture, in addition to certain other challenging problems, such as obtaining sustainable funding, that make it difficult to realize OKEs broadly. Nevertheless, there are some examples of the OKE concept already operating. The move towards an integrated microbial research commons requires linking the materials, digital data, literature, and other information resources available from a globally distributed open-access infrastructure and providing interactive platforms for scientists to build on those resources and contribute to them. Effective links between the different open-access components of the material and digital commons are needed to improve the efficacy of cumulative research and to increase the speed of the entire research cycle. Moreover, in specific cases, the combined use of in vitro and in silico biology offers new opportunities for research, as we noted above.
For instance, the task of searching for sequence similarities between the results of high-throughput screening and similar sequences with known properties available from public databases has become a key tool of metagenomics research. Without the aid of computers, the full genome sequences, which are sometimes several hundred pages in length when printed, are not interpretable. Hence, in genomics, advances in computing and in molecular analysis go hand in hand. Under the larger framework we envision, with a federated network of interactive portals to all the materials, databases, and literature made openly available, it would become possible to establish a registration system administered by a governing body or a trusted intermediary (or an international database collaboration agreement). The World Federation of Culture Collections (WFCC) already hosts different open-access components of the research infrastructure, such as the World Data Center for Microorganisms and the StrainInfo.net bioportal for data and access to the materials held in the culture collections. Moreover, many individual scientists who are active within the WFCC also play key roles in sister organizations, such as the International Union of Microbiological Societies, that also promote open access, especially for research results in the scientific literature. Hence, the WFCC could play a key role in catalyzing the establishment of a governing body for the fully integrated system, which could grow out of the StrainInfo experiment and be established under its own umbrella or within a new organizational and collaborative framework. In addition to its publishing aspects, this restructuring should considerably augment the scientific payoffs by accelerating the diffusion and reuse of research results, by integrating disparate knowledge components into a dynamically evolving whole, by
facilitating automated knowledge discovery, and by making published research results openly available to nontraditional users or reusers in other disciplines and in developing countries. This restructuring would prove particularly beneficial for microbial science as a whole, which seems poised to enter a “big science” framework but remains hindered by a disaggregated “small science” heritage and corresponding mentality. By embracing the open knowledge environment vision, microbial science could break out of the organizational limitations inherited from the past and move to the forefront of life science research. The likely result would be a more powerful collaborative approach that would expand the existing knowledge base while fostering greater technical and intellectual capacity to exploit it. Moreover, this restructuring could produce the critical mass needed to self-organize in a way that limits the undue influence of commoditizing pressures on public and upstream research, while creating mechanisms for greater cooperation in pre-competitive and noncommercial research activities; such cooperation has to date been lagging in microbial science. We have in mind the example of molecular biology in the late 1980s, which self-organized and developed a big science infrastructure and became a leader in the life sciences open access movement. More broadly, the OKE model could have far-reaching implications for the work of universities and research policy institutions, both for targeted problem solving and for the dissemination and impact of high-level reports and research results. This approach could eventually become an integral part of many research plans and budgets. In addition, it is easy to envision many other organizations, at both the national and international levels, applying such methods to developing their knowledge inputs and outputs.
Finally, these insights also suggest why open knowledge environments provide a promising solution to the hard problem of hoarded data. Viewed in isolation, a data pool is only as good as its single components. But an OKE puts all the strength of the microbial research community behind the pool, in the sense that the data pool is itself just one component of a larger whole that combines the data with the literature, materials, and technical services in one community-managed resource. In the context of OKEs, the exchange process is established on a solid and reliable foundation, one that makes full use of automated knowledge tools that are geared to community-determined goals. While these goals evolve and shift over time, in keeping with the relevant sub-communities’ own research needs, an ever-expanding infrastructure supports and magnifies all of the reciprocity gains from “formalizing the informal process” of the exchange of data, information, and materials.
Question and Answer Session

PARTICIPANT: Having myself, on occasion, offered wonderful visions of the future, I applaud another great vision. As a longtime university professor, however, I say to myself that this is one more social problem, one more societal need that is being put in a truck and driven over to the nearest university with the instructions, “You solve it.” Now, the simple question I ask is this: Universities have already been encouraged to spin off the results of research projects into commercial ventures because that was regarded as a social good. Why should they not also spin out initiatives that come out of the research communities, such as StrainInfo, into a not-for-profit corporate organization, in which university professors could be allowed to participate as they do when they are working across the street in their commercial lab? Why cannot universities and foundations raise funds for this? Because this is one more task that deflects from others, unless the reason includes a way to generate more funds for the university. If we could have more funds to support these activities within the framework of research, we would have many more documented, usable, and early-released databases than we now have, and part of the problem is that the funding agencies do not want open-ended commitments to support the infrastructure. So, the question is this: It is a great idea, and it can work on a small scale, but you are using the marginal resources of the university to do something that really should be done properly and recognized as an important infrastructure. You are talking about changing the model of publication, and that should be done on an experimental basis, with foundation funding, to see if it works.

MR. UHLIR: I agree with all these comments. I left out quite a bit that we have in our draft monograph that addresses some of these issues.
First, there is a model already existing in universities: the law reviews, which are run by students and which are effectively open access and published at very low cost within the university, generally without any extra funds. So, there is a proof of concept already in a different context. Now, we recognize also that the analogy is imperfect, so the model that we have been developing for science is somewhat different from the one for the law journals, but it is related. In particular, there are three examples that I did not have time to get into, but which will be discussed by different people. Peter Dawyndt will be talking about StrainInfo, and the CAMERA Project has already been noted by Mark Ellisman and will be discussed tomorrow as well by Paul Gilna. The Genomic Standards Consortium (GSC) will also be discussed to some extent, I believe. So, there is some experimentation going on, and it is coming from the bottom up. The GSC has an open-access journal that it just launched as part of what I would call its open knowledge environment, or open interactive portal. There are some proofs of concept in the science field, as well. The infrastructure aspect is really fairly low cost. Our model depends on a lot of existing expertise and labor within the universities (within, say, the libraries, the computer departments, and the information schools), which would all be brought into creating such environments. The students would be involved in the management. It would be part teaching tool, part knowledge production and dissemination. It would also generate interest by funders to provide grants and attract collaborations because it would be a new kind of thematic hub relating to a certain area of research. And, so, it would become, I think, a much more
vigorous and attractive knowledge production and educational tool with fairly low costs for implementation. But it remains to be tested, and I agree that it needs pilot projects that would be funded by, let us say, NSF or foundations. Certainly we do not expect all the journals to be superseded by this kind of process, and it would all be done in an incremental way. It would be a way to get away from the stovepipe print-paradigm journal system, with journals that have a bunch of unrelated articles in each issue that are not optimized for automated knowledge discovery.