10. Designing the Digital Commons in Microbiology—Moving from Restrictive
Dissemination of Publicly Funded Knowledge to Open Knowledge
Environments: A Case Study in Microbiology
–Paul Uhlir [30]
National Research Council
I think that everyone would agree that the rate of change in new technological
systems often outpaces human capacity to adapt to the technological advances—and,
even more so, the ability to exploit those advances for maximum social and economic
benefits. This is particularly true for transformational technologies that displace their
antecedent ones and their associated organizational paradigms.
In such cases, not only is it necessary to adopt new management approaches in
response to technological progress, but it is necessary to overcome the substantial
resistance to change by entrenched interests whose business model is based on the
superseded technology. Such a transformation has been taking place over the past couple
of decades as a result of the technological revolution brought about by the combination of
digital information technologies and global communication networks.
Table 10–1 presents a comparison of the characteristics of publishing under the
print paradigm with those of disseminating information via global digital networks.
Comparison of some key characteristics of the print dissemination and digitally networked paradigms:
PRINT                                     GLOBAL DIGITAL NETWORKS
(pre) Industrial Age                      post-industrial Information Age
fixed, static                             transformative, interactive
rigid                                     flexible, extensible
physical                                  virtual
local                                     global
linear                                    non-linear, asynchronous
limited content and types                 unlimited contents and multimedia
distribution difficult, slow              easy and immediate dissemination
copying cumbersome, not perfect           copying simple and identical
significant marginal distribution cost    zero marginal distribution cost
single user (or small group)              multiple, concurrent users/producers
centralized production                    distributed and integrated production
slow knowledge diffusion                  accelerated knowledge diffusion
TABLE 10–1 Print versus digital network paradigms. [31]
Although these comparisons may be familiar, it bears emphasizing that the
magnitude of the changes made possible by the shift from print to digital technologies
and networks cannot be overstated, either quantitatively or qualitatively. The explosion in
the production of digital bits is now well known as a function of Moore’s law. Digital
networks also have well-known quantitative advantages over the previous print paradigm
in time, geographical extent, and cost; that is, digital networks can provide instantaneous,
concurrent, and global availability at near-zero marginal cost of access by each additional
user. These quantitative improvements make possible, even if it has not yet been realized,
the universal availability of information. [32]
[30] Presentation slides available at:
http://sites.nationalacademies.org/xpedio/idcplg?IdcService=GET_FILE&dDocName=PGA_053717&RevisionSelectionMethod=Latest.
This presentation is based in large part on the draft monograph, Reichman, J.H., T. Dedeurwaerdere, and P.
F. Uhlir. Designing the microbial research commons: Global intellectual property strategies for accessing
and using essential public knowledge assets (Cambridge Univ. Press, forthcoming 2013).
[31] Uhlir, Paul F. (2006) The emerging role of open repositories for scientific literature as a fundamental
component of the public research infrastructure. In: Open Access: Open Problems. Polimetrica Publisher,
pp. 59-103.
The qualitative advantages of digital technologies and networks in accelerating
the dissemination of information and the diffusion of knowledge are just as important as
the quantitative ones. Because networks provide the opportunity for non-linear,
interactive, and asynchronous communication with multimedia capabilities, the potential
to improve the dissemination and diffusion processes has been greatly magnified. The
digital nature of the information imbues it with flexible transformative properties, making
it subject to easy manipulation and straightforward integration with other types of
information, which in turn allows the creation of new knowledge that was either not
possible or much more difficult in the print context.
Moreover, the network makes possible entirely new forms of collaborative
knowledge production on a broadly distributed and interactive basis, transforming or
dismantling the hierarchical and centralized organizational models through which
information was produced and knowledge diffused in previous eras. Perhaps most
important, digital networks make possible entirely automated approaches to the
extraction, processing, integration, and organization of vast amounts of information,
which can in turn be transformed into unlimited new discoveries and products, eclipsing
the capabilities of purely human information production, dissemination, and use. [33] As
both the principal inventors and pervasive users of the Internet, scientists have a great
deal at stake in fully exploiting the potential of this new medium for accelerating
scientific progress and its benefits to society.
Table 10–2 offers a summary of some of the advantages to science of open access
to—and unrestricted reuse of—publicly generated or funded data and information on
digital networks.
Advantages to science of open access to and unrestricted reuse of publicly generated or
funded data and information on digital networks:
• Promotes interdisciplinary, inter-institutional, and international research
• Enables automated knowledge discovery
• Avoids inefficiencies, including duplication of research
• Promotes new research and new types of research
• Reinforces open scientific inquiry and encourages diversity of analysis and opinion
• Allows for the verification of previous results
• Makes possible the testing of new or alternative hypotheses and methods of analysis
• Supports studies on data collection methods and measurement
• Facilitates the education of new researchers
• Promotes citizen scientists and serendipitous results, enabling the exploration of topics not envisioned
by the initial investigators and the primary research community
• Permits the creation of new datasets when data from multiple sources are combined
• Promotes capacity building in developing countries and global research
• Supports economic growth and social welfare
• Generally provides greater returns from public investments in research
TABLE 10–2 Advantages of open access to and unrestricted use of digital information.
[32] Ibid.
[33] Ibid.
If one were to start over and construct a new institutional regime for scholarly
communication on digital networks, what should the guiding principles be? I would
suggest the following:
1. Maximize public-good aspects of publicly funded research data and information;
2. Avoid monopolies and artificial markets (service, not captured product);
3. Take advantage of zero marginal cost for global dissemination;
4. Support freedom of inquiry and collaborative research;
5. Optimize content for automated knowledge discovery tools; and
6. Maintain the traditional characteristics that are essential to the research
community and the progress of science (quality control, reputational benefits,
research impact, speed of publication, ease of access, and long-term
preservation and sustainability).
The bottom line is that open access online and the unrestricted reuse of research
data and information produced from public funding is, in most cases, far superior to
proprietary and restricted dissemination, as it maximizes value for the content producers
and the user community rather than for the intermediaries who perform the dissemination
services. The question is: How to get there?
As part of our study, we analyzed the access and reuse policies and licenses of
both the microbial journal literature and of some databases used in microbial research.
The traditional practice for researchers publishing scientific articles is for the authors to
assign their copyrights to the publishers, who are either commercial entities or learned
societies and other not-for-profit scientific organizations. As a result, it is the publishers
rather than the authors who initially determine the conditions for access to these articles
and for reuse of the information and data they contain.
Today, access to the contents of microbial journals is usually regulated by two
sets of contracts. First, the publisher’s contract with the author will determine what the
publisher owns and—to some extent—what it can do with the material. In the pre-digital
age, this contract was usually the only one at issue, because readers’ and users’ rights
were determined by statutory intellectual property laws (i.e., copyright laws) and, since
1996 in the European Union, by database protection laws.
In our empirical research of the journal literature, we assessed the copyright and
access policies of publishers responsible for journals containing primary research articles
and reviews in the field of microbiology. We also selected science journals from other
areas, such as immunology, that regularly publish articles in the field of microbiology.
Most of the open access journals were obtained from the Directory of Open
Access Journals (DOAJ) and from individual publisher websites, such as that of Horizon
Press. The hybrid and subscription journals were selected primarily from the publisher
websites and a few other Web resources. Sixty-four percent of the selected journals
include articles about microbiology only, while the remaining journals publish articles
from other areas as well. We analyzed a total of 303 journals dedicated in whole or in
part to microbial research results. Some of the highlights of our findings include:
• About 30 percent were full open access (OA), including hybrid (both
purchased immediate OA and subscription); 20 percent were openly available
but read-only; and 50 percent were subscription based.
• 80 percent of subscription journals allow author self-archiving on personal
websites, but almost 90 percent do not allow archiving on the author’s
institutional websites and most are silent on external repository deposits (e.g.,
on PubMedCentral).
• 98 percent of subscription journals require transfer of copyright, although we
do not know the number that would approve an author’s request to retain
copyright and grant only a nonexclusive license to publish.
• About 75 percent of all journals surveyed are published by for-profit
publishers.
• 96 percent of subscription journals give no direct discount to developing
country subscribers (but some may participate in group discounts to libraries
through the INASP or HINARI programs).
We also briefly analyzed the scientific databases used in microbial research. This
survey was less comprehensive or rigorous than the one we did for the journal literature,
in part because the information about these databases is less standardized and more
diffuse. We found that:
• Many molecular biology databases (genomic, proteomic) and taxonomic
databases are openly available and free to use.
• Molecular biology data in many specialized research areas (e.g., energy and
environment) are neither deposited nor available.
• There are many legal, policy, economic, and cultural pressures for the
researcher to keep data secret, either because of the data’s commercial
potential or strategic advantage or because of the burden of making the data
useful to others.
The intention of latter-day intellectual property laws is to secure rents from
specified end uses of relevant knowledge goods, such as music, films, and software. The
beneficiary industries do not contemplate uses, reuses, or redistribution of their products
beyond those income-producing activities regulated by these laws, although the state may
require them to tolerate some uncompensated uses in the larger public interest. Courts
have traditionally narrowly interpreted the limitations and exceptions in favor of
strengthening the right holders’ exclusive rights and the incentive effects they are
supposed to provide.
This approach conflicts directly with the needs of science, however. This is
particularly true for public science in the digital domain, whose norms favor maximum
use, reuse, and redistribution by third parties of the knowledge that publicly funded
researchers generate. In the pre-digital epoch, legislation—and copyright legislation in
particular—did contain some measures that attenuated this conflict in the interest of
science, but the digital revolution that has created such promising opportunities for
scientific research has also generated intense fears that publishers of literary and artistic
works generally would become vulnerable to massive infringements online and to other
threats of market failure. In response, publishers have pushed legislatures to recast and
restructure copyright law in the online environment so as to preserve business models
built around the print media.
Thus copyright laws in Organisation for Economic Co-operation and
Development (OECD) countries and database protection laws in the European Union are
on a collision course with some of the most promising scientific movements in history.
These impediments to the global exchange of basic scientific information are then
magnified by the ability of intellectual property rights holders to override relevant
exceptions and limitations by a combination of technological protection measures and
even more restrictive contractual conditions.
In this legal environment, the continued ability of scientists to access, use, and
reuse essential upstream knowledge assets depends increasingly on their willingness to
disregard—consciously or unconsciously—the legal and contractual constraints on their
everyday research. However, the implicit assumption that proprietary intermediaries will
not detect violations of statutory or contractual restrictions on scientists’ continued treatment
of these assets as public goods or, if they detect those violations, will not enforce their
rights is neither tenable nor desirable. Sooner or later, there could be a clamorous case
involving academics after which risk-averse universities and university technology
transfer offices would shut down the secret or arguably unintentional infringing activities
now going on at many universities and scientific laboratories.
The existing system thus offers only three unsatisfactory pathways for making
available the basic building blocks of digitally integrated microbial research. The first is
to continue to muddle through by ignoring a hostile legal environment, with all the
attendant risks of civil disobedience generally. The second is to embrace the tendencies to
privatize public goods by adopting the commercial and restrictive practices that are
thought necessary to generate both research funds and revenues from downstream
commercial applications. This commercializing trend will increase the costs of publicly
funded research, which depends on access to general purpose research tools. It will also
severely restrict, if not make entirely impossible, the exploitation of automated
knowledge-generating opportunities through a proliferation of legally contrived thickets
of rights and restrictive licensing conditions.
The third pathway attempts to build an alternative open-access infrastructure,
which could generate important payoffs in terms of enabling cumulative public research.
However, a lack of coordination with respect to intellectual property provisions intended
to maximize these different expected payoffs hampers the further development of any
such alternative infrastructure. For this reason we examined a number of top-down and
bottom-up responsive measures for making the current legal environment more science
friendly.
It is worth considering what sorts of legislative changes at both the domestic and
international levels would be needed to improve the prospects for digitally integrated
research. We have suggested several legislative changes to help balance the intellectual
property (IP) regime between rights holders and public-interest, publicly funded research
and education users. Legislatures could provide more robust limitations and exceptions to
traditional copyright law for not-for-profit, publicly funded research, for example. Digital
copyright laws could allow for greater access to and use of public research. And
research funders could (with enabling legislation) mandate such things as author deposits
and copyright retention with the authors.
In our opinion, however, most of the needed legislative reforms have little or no
chance of being enacted under the existing political–economic situation, or until new
forces emerge, perhaps in the developing world, to rebalance the system. The balance of
such political forces remains decidedly contrary to such efforts, and the drift of bilateral
relations, at least, is towards even higher levels of protection.
Looking beyond these unlikely legislative solutions, there are numerous
encouraging bottom-up initiatives that are already underway, where some progress has
been made in achieving higher levels of access to relevant scientific literature and data.
The open-source software movement is one example; the establishment of open
repositories for publications in a specific area is another.
The challenges in deriving maximum scientific value from still under-exploited
technological opportunities lie largely in changing the social systems—the institutional,
legal, economic, and sociological aspects—rather than in the technological advances,
which will continue even without advances in the social systems. To make progress on
these human behavioral aspects, all of the stakeholders involved worldwide in public
research and in the process of communicating research results should take part in the
unfolding debate, at some level, because they have a vested interest in its outcome.
Up to this point, most of the advances that have been made toward opening up the
information created by publicly funded research have come from the bottom up, from the
work of many dedicated and visionary individuals and institutions. These actors have
been the pathmakers in developing a broad range of initially disparate, but related
institutional and policy initiatives in diverse information types, disciplines, and countries.
As these projects proliferate and become better established, they are coming together to
form a nascent, interoperable global information commons for public science.
Those who fund and regulate public science from the top down are beginning to
take notice. They are starting to build upon the tactical successes of the pathmakers and
integrating them into broader national and international strategies for the investment and
management of public science. A gradual restructuring of the scientific information
sector and of the processes of scientific communication is thus now well underway, with
the aim of taking more complete advantage of the transformational capabilities of
digitally networked technologies.
In light of the clear benefits to the research enterprise and to society from the
open availability of publicly funded scientific information in the digitally networked
environment, it is not surprising that a variety of new models have already been
developed within the research community. As I noted in Past, Present and Future of
Research in the Information Society, [34] the common element of all these different types of
initiatives is that the information is made openly and freely available digitally and online.
In many cases, the material is made available under suitably reduced proprietary terms
and conditions through permissive licenses (e.g., the GNU license for open source
software, or Creative Commons licenses for open access journals or for some works in
open repositories), or else the material is put into the public domain. In other cases, such
as the delayed open availability that some publishers use for their journal articles, the
works remain protected under full copyright, but eventually they become freely and
openly accessible on a read-only basis.
Just as the desirability of providing open availability to publicly funded scientific
information online was substantiated in our survey of the microbiology literature and
databases, the many different models that have already been established attest to the
feasibility of doing so. These various examples now provide valid proofs of concept for
all information types, for most disciplines (including microbiology), in many
countries, and for all types of institutions, including government agencies,
universities, not-for-profit organizations, and even for-profit firms.
[34] Olson GM, David PA, Eksteen J, Sonnenwald DH, Uhlir PF, Tseng S-F, Huang H-I. International
Collaborations Through the Internet. In: Shrum W, Benson KR, Bijker WE, Brunnstein K, editors.
Past, Present and Future of Research in the Information Society. Boston, MA: Springer US; 2007. p. 97-114.
[cited 2011 Aug 15] Available from: http://www.springerlink.com/index/10.1007/978-0-387-47650-6_7.
Taken together, these activities can be seen as part of an emerging broader
movement in support of both formal and informal peer production and dissemination of
publicly funded scientific (and other) information in a globally distributed, volunteer, and
open-networked environment. These activities are based on principles that reflect the
cooperative ethos that traditionally has imbued much of academia and of government
research agencies. Their norms and governance mechanisms may be characterized as
those of the “public scientific information commons” rather than of a market system
based upon proprietary data and information. Such information commons
activities respond, either explicitly or tacitly, to the needs of science and scientists.
Although much industrial microbiology is conducted in private laboratories, the
bulk of research in this area takes place at universities. This research has become
increasingly computational and data driven. Universities already host many culture
collections, and they also hold a vast amount of microbial materials in research
collections outside the formally constituted culture collections. University research on all
these materials has increasingly become a networked digital process linking distributed
thematic communities.
As the digital component increases in importance, the research becomes more
interdisciplinary and dependent on inputs from bioinformatics, computational science,
genomics and proteomics, environmental science, agriculture, and health. These
interdisciplinary activities, although emanating from a core thematic group based at one
or more university centers, operate across university boundaries (and even national
boundaries) in order to pursue the thematic interest on an increasingly global basis. In
successful cases, the research outputs of these knowledge hubs are usually the fruits of
resources that the networked participants have voluntarily pooled from the outset. These
outputs are made available for use and reuse to an ever-expanding open community of
interested scientists on terms determined by the thematic community. The productivity of
these thematic communities is then further enhanced by a growing array of digital and
computational tools and techniques, which are put to a common purpose.
When these joint research activities reach the point of yielding published research
results, however, they are typically outsourced to a professional society or a publisher,
and this step then normally triggers all the legal constraints and restrictions we have
described. This customary institutional arrangement in turn limits access to and use of the
knowledge assets that the digitally sophisticated scientific community has at its disposal,
even when it is the source of those very same assets.
The logical response is to cut the Gordian knot by retaining ownership and control
of all knowledge assets produced by the relevant research community with public funding
within the public science framework itself, rather than assigning them to external
publishing intermediaries. Although such assignment was customary in the past, when the print
medium dictated high front-end costs, it is not necessary in a digital world. Once
possessed of ownership and control, the scientists and their universities will be in a
position to do two things: (1) to avoid all the technical and legal restrictions described
above, and (2) to organize the use and reuse of these knowledge assets by means of new
institutional frameworks that are specifically designed to promote collaborative research
within fully integrated digital networks.
Such an institutional framework would, for example, give universities the power
to determine the conditions under which research results were disseminated and reused,
in a manner consistent with the needs of microbial research and education. In this
approach, if external intermediaries were used, these intermediaries would operate as
service providers on science-friendly terms and with open access prerequisites, as
prescribed by the universities. The quid pro quo would be the provision of efficient
services that the universities, for various reasons, did not wish to undertake.
Another option would be for the university to integrate the publishing function
into the work of the emerging knowledge hubs themselves. In such a case, the funder’s
support would enable interdisciplinary collaboration in the production and rapid
dissemination of research results that were themselves publicly funded, thereby
magnifying the social benefits of the public investment. At the same time, the knowledge
hubs could evolve into a more solid institutionalized platform, with a view to integrating
and systematizing all the knowledge resources needed by the community and all the
digital services that made access to, use of, and reuse of these resources as easy and efficient
as possible, while also stimulating related educational activities and downstream
commercial applications. In this scenario, public funds would remain within the circle of
knowledge creators and would nourish all the relevant services, with very low transaction
costs and without dissipation to unnecessary external information brokers.
Furthermore, taking microbial journals back to universities and certain other
public research institutes would also make it possible to exploit the interdisciplinary
resources and inputs of different departments, including, for example, computer science
and engineering departments, medical schools, public policy institutes, environmental
institutes, and library information services and resources. Moreover, these advantages
might be compounded if a consortium of universities pooled their resources to manage
and produce a given journal or a set of journals organized on thematic lines. Scientific
control over contents through the universities should ensure that high-quality standards
were maintained and that the journals would be open access from the start and optimized
for network exploitation.
Indeed, once the opportunities of digital networks are taken into account, placing
microbial journals in the universities would appear to offer many more advantages than
keeping them at commercial publishers or even at professional societies. For example, the
societies cannot provide the educational and research opportunities that already exist at
the universities, and so they would remain essentially extrinsic, semiautonomous bodies
that depend on services provided by individual scientists. Nor can the professional
societies make available the kind of interdisciplinary resources available at the
universities without transforming themselves into quasi-universities, which,
even if otherwise feasible, would be a wasteful and duplicative use of the relevant funds.
An even more powerful argument for preferring the universities to either
professional societies or commercial publishers is that microbial science journals should
no longer be seen as ends in themselves. Rather, by repositioning them within the
universities, the journals could become cogs in—and stepping stones to—the realization
of digital knowledge hubs in which journals are but one component.
From this perspective, all the microbial journals thus repositioned should become
open access by definition, and all their contents should become available for harvesting
by others, for thematic re-integration in other collections, and for various forms of digital
manipulation. More broadly, the publishing function that supports the journals would
logically be expanded to support specialized knowledge environments built around the
relevant user and research communities and themes. By thus deconstructing the print
publishing model and moving the journals or the articles in them into an academic
environment, one begins to reconstruct a digitally networked scientific communications
model, in which the content providers are the communicators, the intermediaries, the
users, and the governors of a dynamically constituted knowledge environment.
We call this digitally networked scientific communications model an “open
knowledge environment” (OKE). Over time, these knowledge environments, although
hosted by different universities, could be linked together in an integrated knowledge
ecology that would enhance the reputational benefits of the participating universities and
yield scientific payoffs greater than any single source could produce.
Integrating openly available scientific information resources with open-source
collaborative tools online would enable the formation of OKEs for the creation of new
knowledge, the enhancement of educational opportunities, and the stimulation of
downstream applications. Such an approach would harness the social and technical power
of the network which, if properly managed, could greatly increase the value of the
knowledge in ways not currently possible with the traditional information production and
dissemination processes, and it generally could do so at a much lower cost than the
traditional approach.
At the core of an OKE are interactive portals focused on knowledge production
and on collaborative research and educational opportunities in specific thematic areas.
Ideally, OKEs would be developed around one or more thematically linked, open-access
journals and would be augmented by openly available reports, grey literature, and data.
Various interactive functions (wikis, discussion forums, blogs, post-publication reviews,
and perhaps distributed grid computing) would be added to stimulate discussion and
contributions related to specific issues.
The OKEs we envision could readily be hosted at single universities, or their
components could be distributed among a consortium of universities having a strong
interest in the relevant subject matter. They could also be based at other not-for-profit
research centers or at government agencies, although this would compromise the
educational function that we also seek to promote. In every case they would be
multidisciplinary in character, not only bringing in experts with the appropriate subject-
matter expertise, but also involving computer engineers, information scientists, librarians,
and other potential contributors to help establish and manage the OKEs and to learn from
operating them. Such a knowledge-production project not only would involve senior
faculty and experts in its development and application, but also would serve as a
mechanism for teaching students in the related departments at the university and as a
vehicle for involving the students in the management of the OKE itself.
At the same time, the thematic OKEs could integrate information beyond the
conventional disciplinary boundaries, making them tools that are especially well suited to
interdisciplinary environments. The OKE concept proposed here would thus build upon a
number of recent, but already tested, advances in the online peer production of
knowledge and participative Web 2.0 techniques.
Such capabilities are virtually impossible under the proprietary journal model.
Indeed, within our proposed open knowledge environments, the narrowly stove-piped,
print-paradigm journal model would be transformed into a truly interactive networked
initiative. Nonetheless, we stress that these OKEs should maintain the highest-quality
standards of scholarly endeavor, and they should strive to promote the reputational
benefits of the participants and of their universities.
Most of these thematic knowledge hubs would also provide essential digital
infrastructure functions in support of the microbial research community. Such service
functions could include high performance search engines that would enhance the
possibilities for finding relevant information in publications and would allow for cross-
linking and text mining based on standardized metadata.
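As a rough illustration of what cross-linking on standardized metadata could look like, the following minimal sketch builds an inverted index over two shared fields and uses it to find related records. The record identifiers, field names, and values are all invented for the example and are not taken from any existing portal; a real service would rely on community-agreed metadata standards and a proper search engine.

```python
from collections import defaultdict

# Hypothetical records; identifiers, field names, and values are invented.
RECORDS = [
    {"id": "article:1", "organism": "Bacillus subtilis", "strain": "STRAIN-001"},
    {"id": "dataset:7", "organism": "Bacillus subtilis", "strain": "STRAIN-001"},
    {"id": "culture:3", "organism": "Bacillus subtilis", "strain": "STRAIN-002"},
    {"id": "article:2", "organism": "Escherichia coli", "strain": "STRAIN-009"},
]

# Metadata fields assumed to be standardized across all resource types.
LINK_FIELDS = ("organism", "strain")


def normalize(value):
    """Normalize a metadata value so trivial spelling variants still match."""
    return value.strip().lower()


def build_index(records, fields=LINK_FIELDS):
    """Map each (field, normalized value) pair to the ids of records carrying it."""
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value:
                index[(field, normalize(value))].add(rec["id"])
    return index


def linked_records(record_id, records, index, fields=LINK_FIELDS):
    """Return {other_id: [shared fields]} for records sharing metadata values."""
    rec = next(r for r in records if r["id"] == record_id)
    links = defaultdict(set)
    for field in fields:
        value = rec.get(field)
        if not value:
            continue
        for other in index.get((field, normalize(value)), ()):
            if other != record_id:
                links[other].add(field)
    return {other: sorted(shared) for other, shared in links.items()}


if __name__ == "__main__":
    idx = build_index(RECORDS)
    for other, shared in sorted(linked_records("article:1", RECORDS, idx).items()):
        print(other, "shares", ", ".join(shared))
```

The point of the sketch is only that links between articles, data sets, and cultures follow automatically once the same metadata fields are filled in consistently across resources.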
While these collaborative functions of the OKE may seem futuristic, they are
already being implemented in some microbial science communities as well as in other
disciplines. What makes the concept seem futuristic is the existing condition of
publishing. The legal terms and conditions in many of the publishers’ contracts,
buttressed by the larger statutory environment, aim tacitly to protect the print model
against the challenges—perceived as risks rather than opportunities—of the digital
networked environment. It is this limited vision and obstructive legal culture, in addition
to certain other challenging problems, such as obtaining sustainable funding, that make
it difficult to realize OKEs broadly. Nevertheless, there are some examples of the OKE
concept already operating.
The move towards an integrated microbial research commons requires linking the
materials, digital data, literature and other information resources available from a
globally distributed open-access infrastructure and providing interactive platforms for
scientists to build on those resources and contribute to them. Effective links between the
different open-access components of the material and digital commons are needed to
improve the efficacy of cumulative research and to increase the speed of the entire
research cycle. Moreover, in specific cases, the combined use of in vitro and in silico
biology offers new opportunities for research, as we noted above. For instance, the task
of searching for sequence similarities between the results of high-throughput screening
and similar sequences with known properties available from public databases has become
a key tool of metagenomics research. Without the aid of computers, the full genome
sequences, which are sometimes several hundred pages in length when printed, are not
interpretable. Hence, in genomics, advances in computing and in molecular analysis go
hand in hand.
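To make the idea of a sequence similarity search concrete, here is a toy sketch that ranks reference sequences by the fraction of short subsequences (k-mers) they share with a query. It is an assumption-laden simplification, not the method used by production tools such as BLAST, and the sequences and names below are invented for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items / all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(query, references, k=4):
    """Rank reference sequences by k-mer similarity to the query, best first."""
    q = kmers(query, k)
    scores = [(name, jaccard(q, kmers(seq, k))) for name, seq in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in for a public database; names and sequences are invented.
REFERENCES = {
    "ref_enzyme_A": "ATGGCTAGCTAGGATCCGATCGATCGTAGCTAGC",
    "ref_enzyme_B": "ATGTTTCCCGGGAAATTTCCCGGGAAATTTCCCG",
}

if __name__ == "__main__":
    query = "GGCTAGCTAGGATCCGATCGATCG"  # invented fragment from a screening run
    for name, score in rank_matches(query, REFERENCES):
        print(f"{name}: {score:.2f}")
```

Even this crude comparison is impractical by hand for sequences of realistic length, which is why the analysis depends on computing.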
Under the larger framework we envision—with a federated network of interactive
portals to all the materials, databases, and literature made openly available—it would
become possible to establish a registration system administered by a governing body or a
trusted intermediary (or an international database collaboration agreement). The World
Federation of Culture Collections (WFCC) already hosts different open-access
components of the research infrastructure, such as the World Data Center for
Microorganisms and the StrainInfo.net bioportal for data and access to the materials held
in the culture collections. Moreover, many individual scientists who are active within the
WFCC also play key roles in sister organizations, such as the International Union of
Microbiological Societies, that also promote open access, especially for research results in the
scientific literature. Hence, the WFCC could play a key role in catalyzing the
establishment of a governing body for the fully integrated system, which could grow out
of the StrainInfo experiment and be established under its own umbrella or within a new
organizational and collaborative framework.
In addition to its publishing aspects, this restructuring should considerably
augment the scientific payoffs by accelerating the diffusion and reuse of research results,
by integrating disparate knowledge components into a dynamically evolving whole, by
facilitating automated knowledge discovery, and by making published research results
openly available to nontraditional users or reusers in other disciplines and in developing
countries.
This restructuring would prove particularly beneficial for microbial science as a
whole, which seems poised to enter a “big science” framework but remains hindered by a
disaggregated “small science” heritage and corresponding mentality. By embracing the
open knowledge environment vision, microbial science could break out of the
organizational limitations inherited from the past and move to the forefront of life science
research. The likely result would be a more powerful collaborative approach that would
expand the existing knowledge base while fostering greater technical and intellectual
capacity to exploit it.
Moreover, this restructuring could produce the critical mass needed to self-
organize in a way that limits the undue influence of commoditizing pressures on public
and upstream research, while creating mechanisms for greater cooperation in pre-
competitive and noncommercial research activities; such cooperation has to date been
lagging in microbial science. We have in mind the example of molecular biology in the
late 1980s, which self-organized and developed a big science infrastructure and became a
leader in the life sciences open access movement.
More broadly, the OKE model could have far-reaching implications for the work
of universities and research policy institutions, both for targeted problem solving and for
the dissemination and impact of high-level reports and research results. This approach
could eventually become an integral part of many research plans and budgets. In addition,
it is easy to envision many other organizations, at both the national and international
levels, applying such methods to developing their knowledge inputs and outputs.
Finally, these insights also suggest why open knowledge environments provide a
promising solution to the hard problem of hoarded data. Viewed in isolation, a data pool
is only as good as its single components. But an OKE puts all the strength of the
microbial research community behind the pool, in the sense that the data pool is itself just
one component of a larger whole that combines the data with the literature, materials, and
technical services in one community-managed resource. In the context of OKEs, the
exchange process is established on a solid and reliable foundation, one that makes full
use of automated knowledge tools that are geared to community-determined goals. While
these goals evolve and shift over time, in keeping with the relevant sub-communities’
own research needs, an ever-expanding infrastructure supports and magnifies all of the
reciprocity gains from “formalizing the informal process” of the exchange of data,
information, and materials.
Question and Answer Session
PARTICIPANT: Having myself, on occasion, offered wonderful visions of the future, I
applaud another great vision. As a longtime university professor, however, I say to
myself that this is one more social problem, one more societal need that is being put in a
truck and driven over to the nearest university with the instructions, “You solve it.”
Now, the simple question I ask is this: Universities have already been encouraged to
spin off the results of research projects into commercial ventures because that was
regarded as a social good. Why should they not also spin out initiatives that come out of
the research communities, such as StrainInfo, into a not-for-profit corporate organization,
in which university professors could be allowed to participate as they do when they are
working across the street in their commercial lab?
Why cannot universities and foundations raise funds for this? Because this is one
more task that deflects from others, unless the reason includes a way to generate more
funds for the university. If we could have more funds to support these activities within the
framework of research, we would have many more documented, usable, and early-released
databases than we now have, and part of the problem is that the funding agencies do not want
open-ended commitments to support the infrastructure.
So, the question is: It is a great idea, it can work on a small scale, but you are
using the marginal resources of the university to do something that really should be done
properly and recognized as an important infrastructure. You are talking about changing
the model of publication, and that should be done on an experimental basis to see if it
works with foundation funding.
MR. UHLIR: I agree with all these comments. I left out quite a bit that we have in our
draft monograph that addresses some of these issues. First, there is a model already
existing in universities—the law reviews, which are run by students and which are
effectively open access and published at very low cost within the university, generally
without any extra funds.
So, there is a proof of concept already in a different context. Now, we recognize
also that there are imperfections of analogy there, so the model that we have been
developing for science is somewhat different than for the law journals, but it is related. In
particular, there are three examples that I did not have time to get into, but which will be
discussed by different people. Peter Dawyndt will be talking about StrainInfo, and the
CAMERA Project has already been noted by Mark Ellisman and will be discussed
tomorrow as well by Paul Gilna. The Genomic Standards Consortium (GSC) will also be
discussed to some extent, I believe.
So, there is some experimentation going on, and it is coming from the bottom up.
The GSC has an open-access journal that it just launched as part of what I would call its
open knowledge environment, or open interactive portal.
There are some proofs of concept in the science field, as well. The infrastructure
aspect is really fairly low cost. Our model depends on a lot of existing expertise and labor
within the universities—within, say, the libraries, the computer departments, and the
information schools—which would all be brought into creating such environments. The
students would be involved in the management. It would be part teaching tool, part
knowledge production and dissemination. It would also generate interest by funders to
provide grants and attract collaborations because it would be a new kind of thematic hub
relating to a certain area of research. And, so, it would become, I think, a much more
vigorous and attractive knowledge production and educational tool with fairly low costs
for implementation.
But it remains to be tested, and I agree that it needs pilot projects that would be
funded by, let us say, NSF or foundations. Certainly we do not expect all the journals to be
superseded by this kind of process, and it would all be done in an incremental way. It
would be a way to get away from the stovepipe print-paradigm journal system, with
journals that have a bunch of unrelated articles in each issue that are not optimized for
automated knowledge discovery.