5
Summaries of Presentations on Thematic Issues

The workshop highlighted several examples of existing scientific data and information activities in China, the United States, and at the international level. These activities can benefit from the policies and management practices outlined in the previous chapter. They also provide practical experience at the working level and serve as models for similar activities in other institutional, disciplinary, and national contexts. They thus represent a bottom-up approach to the evolution of national and international policy and practice with regard to public scientific data and information resources. The examples summarized below are organized according to the three thematic areas that were the focus of the workshop: (1) life sciences and public health data; (2) earth sciences, environmental, and natural resources data; and (3) scientific information, journals, and digital libraries.

EXAMPLES OF LIFE SCIENCES AND PUBLIC HEALTH DATA ACTIVITIES

The Chinese Management and Sharing System of Scientific Data for Medicine1

The initiation of the Management and Sharing System of Scientific Data for Medicine (the Medical Data Sharing System) was a key project in

1

Based on a presentation by Depei Liu, Chinese Academy of Medicine and Chinese Academy of Engineering, available at http://www7.nationalacademies.org/usnc-codata/Liu_Depei_Presentation.ppt.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

Strategies for Preservation of and Open Access to Scientific Data in China: Summary of a Workshop

2003 of the China Scientific Data Sharing Program (SDSP) under the National Basic Platform for Science and Technology in the Ministry of Science and Technology. Oversight of the Medical Data Sharing System has been undertaken by the Chinese Academy of Medical Sciences, the Chinese Center for Disease Prevention and Control, the Chinese People’s Liberation Army General Hospital and Graduate Medical School, and the Chinese Academy of Traditional Chinese Medicine.

The Medical Data Sharing System is integrating various scientific data resources in medicine. The system covers most fields of medicine, including basic and clinical medicine, public health, traditional Chinese medicine, other special areas of medicine, and pharmacology. It also has a database specifically for SARS and respiratory diseases.

The Medical Data Sharing System serves several important purposes. It is used to:

Serve medical research and teaching, and improve innovation in medical sciences and technology;
Improve the overall level of prevention, diagnosis, and treatment of diseases;
Enhance people’s personal health consciousness, thus promoting good health in society;
Strengthen China’s ability to plan for and respond to sudden public health incidents; and
Provide an improved basis for government policies, as well as promote the development of the national economy and the medical system.

The organizational and management model for this system consists of a leading group, an expert group, and a working group. The leading group consists of the leaders of participating institutions and the professionals in charge of scientific administration; it is responsible for decision making, organization, and coordination of the system. The expert group is composed of experts from the China-SDSP and professionals in project-supported institutions. It is responsible for steering and authorizing the general design, selecting participating institutions, and inspecting the project’s progress. The working group consists of members selected from the working group of the China-SDSP and employees of the participating institutions. It is responsible for drafting the feasibility report, general project design and implementation, and development of the main database.

The Medical Data Sharing System benefits all sectors of society, including the government; medical, scientific, and educational institutions; private enterprises; and the general public. The system promotes a cooperative environment and will be operated on a free, not-for-profit basis.

International Medical Scientific Data Sharing2

Many kinds of medical scientific data can be shared internationally. They include primary data, data products, and related information obtained from clinical, teaching, and research activities in medicine. Medical data are necessary resources for the development of the medical sciences and share characteristics common to other kinds of scientific data, including reusability and potential long-term value.

The Human Genome Project has provided a successful example of access to and sharing of biomedical scientific data. This project applied modern informatics techniques to human genomic research and organized top scientists and facilities to cooperate on the same project using the same standards. In recent years, similar data sharing projects have been initiated internationally in other areas of science, such as neurology, human anatomy, and proteomics, all of which are described in the sections that follow. These projects have moved medical researchers substantially toward greater cooperation, but the research, teaching, and biomedical industry communities still have many unmet needs for access to and sharing of medical scientific data.

Several problems can be identified. The lack of data access and sharing is especially acute in developing and least-developed countries, which results in redundant research and inefficiencies. Medical data resources are fragmented, and many are of uneven quality and lack uniform international standards and data exchange protocols. These problems have also received inadequate investment and attention.

The following suggestions aim to improve international access to and sharing of medical scientific data:

It would be helpful to set up an international coordination committee for access to and sharing of medical scientific data, responsible for developing plans for such activities and proposing appropriate guidelines.

2

Based on a presentation by Ling Yin, People’s Liberation Army General Hospital and Graduate Medical School, China.

In order to strengthen international cooperation in establishing medical scientific databanks, it is necessary to establish relevant standards and uniform criteria for data contributions by scientists and facilities distributed around the globe.

New databank groups for medical scientific data, similar to GenBank and the Protein Data Bank, need to be established to make the data sharable, extendable, and authoritative.

To truly implement data access and sharing among the different medical specialties, China needs to establish an integrated international service system that breaks down the barriers between basic, clinical, and preventive medicine, public health, pharmacology, and other areas. This system would offer one-stop service via index and catalogue inquiries for data services, and a networked environment for therapeutics, prevention, control, and research on specific diseases.

China also needs to speed up the recruitment of expert personnel to implement access to and sharing of medical scientific data, and to offer effective training programs.

Finally, funding needs to be secured from multiple sources to guarantee long-term financial support and to facilitate global access to and sharing of medical scientific data and information.

China’s Contributions to the Organisation for Economic Co-operation and Development’s Neuroinformatics Data Sharing Initiative3

The human brain is the most complex system known, and achieving a better understanding of it is a key scientific challenge for the 21st century. Having developed sophisticated methods to investigate the brain in the finest possible detail, neuroscientists now face the challenge of managing enormous amounts of raw data and the many useful inferences drawn from them. The neuroinformatics field therefore would benefit greatly from increased data sharing. Neuroinformatics research has already received billions of U.S. dollars and euros to establish individual databases and platforms and to lay the groundwork for future data sharing.

3

Based on a presentation by Yiyuan Tang, Institute of Neuroinformatics, Dalian University of Technology; Ling Yin, Neuroinformatics Center, PLA General Hospital and Graduate Medical School; and Xiaowei Tang, Neuroinformatics Center, Zhejiang University, China, available at http://www7.nationalacademies.org/usnc-codata/Yin_Ling_Presentation.ppt.

In 2000, the Organisation for Economic Co-operation and Development’s (OECD) Global Science Forum approved the international Working Group on Neuroinformatics (WG-NI), which included 21 members and observers from OECD countries. Its goal was to establish a global knowledge management system and Internet research environment in which the data and results from research on the human nervous system would be available. Such an initiative would help improve public health, scientific research, medical education, and the pharmaceutical industry. The three principal aims of the WG-NI were to promote: (1) the establishment of databases and the integration of data resources; (2) data sharing policies for OECD members and observers and the development of working guidelines and rules for neuroinformatics data; and (3) the establishment of international neuroinformatics research networks.

The Chinese government also began to pay more attention to the Human Brain Project (HBP) and to neuroinformatics research. The Xiang Shan Science Conference on the HBP and neuroinformatics, held in September 2001, was a major event that included senior Chinese government representatives from the National Natural Science Foundation of China and the Ministry of Science and Technology. The conference participants agreed that China should start work on a Chinese neuroinformatics project as soon as possible. Since 2001, the Chinese government has awarded several grants in the neuroinformatics field and supported the development of a neuroinformatics platform and digital network according to the OECD’s WG-NI standards.

In 2003, the WG-NI requested the establishment of an international coordinating mechanism, the International Neuroinformatics Coordinating Facility (INCF). The proposed role of the INCF was to optimize the accumulation, integration, standardization, exploitation, and sharing of very large amounts of highly diverse primary data and of large, structured neuroscience databases being generated worldwide by researchers who study the brain. The first INCF meeting was held in April 2004 at the OECD in Paris to establish the INCF secretariat, initiate substantive activities, and develop a proposed funding scheme (the Program in International Neuroinformatics). This new international program is intended to promote collaboration among researchers whose work will be funded by existing (or possibly new) national programs, eliminate national and disciplinary barriers, and provide a more efficient approach to global collaborative research and data sharing.

To accelerate neuroinformatics development in China, it has been suggested that the neuroinformatics project become an important part of the China Science Data Sharing Project, especially for promoting international collaboration. Consistent with the INCF’s working plan, the Chinese neuroinformatics community will develop tools for manipulating and managing the data, as well as standards and mechanisms for sharing these data with researchers worldwide. Chinese researchers also will design and develop special-purpose analytical tools and algorithms, and create computational models of brain structure and function that can be validated using diverse data. These actions will advance the understanding of the human brain and may be expected to lead to breakthroughs in the prevention and cure of nervous system disorders and to improvements in the quality of life for humanity.

Long-Term Studies of Human Anatomy Using the Digital Human and Scientific Data Sharing4

Human anatomy is a cornerstone of modern medicine. In 1543, Vesalius published the seminal anatomy book, On the Structure of the Human Body, which was one of the starting points of modern medicine. Digital anatomy and the digital human represent a new revolution in medicine.

The U.S. National Library of Medicine (NLM) began to discuss a long-range plan for the digital human in 1985. In 1989, the NLM Board of Regents submitted a long-range plan for the next 10 to 30 years of electronic imaging in biomedical research. The first stage of the plan was the Visible Human Project (VHP). The plan encouraged scientists to conduct research in various fields, including anatomical structure informatics, graphics technologies in biomedical imaging, basic medical research (e.g., developmental biology, neuroscience, cell and histological science, and molecular structure), clinical applications (e.g., image-guided surgery, actinotherapy, anesthesia, radiology, organ system imaging, orthopedics), and the development of related medical equipment using digital technologies.

In 1991, the University of Colorado signed a VHP contract with the NLM. Its researchers completed two cryomacrotoming data sets, one male and one female, in 1994 and 1995, respectively. Since that time, the digital anatomy community has developed important applications for medical teaching and

4

Based on a presentation by Donglie Qin, BME College, Capital University of Medical Sciences, China.

clinical practice, including breakthrough advances in medical imaging, such as more realistic rendering of three-dimensional images. In 2001, the Federation of American Scientists presented the digital human initiative, which is based on digital simulations of the human body at three levels: the microlevel (molecular, gene, cell, and nanoscale), the mesolevel (tissue and organ), and the macrolevel (whole body). Since 2001, the emerging Grid, Internet2, and other advanced information technologies and imaging devices have been used in the VHP and similar digital human activities.

In addition to the VHP, there are now many other digital human projects outside the United States, including in the European Union, Japan, Korea, Singapore, and China. In China, scientists submitted a draft proposal for a digital human initiative to the government in 2001. Two experimental whole-body data sets and several high-resolution organ data sets (heart, kidney, and liver) have been completed. Next-generation digital data sets and applications in medical teaching and clinical practice are expected within the next 5 to 10 years. This brief history of digital human research underscores the need for implementing scientific data sharing in support of both research and applications in these digital human initiatives.

The Protein Data Bank: A Key Biological Resource5

As noted in the previous chapter, the Protein Data Bank (PDB) is the single international repository of three-dimensional structural data for biological macromolecules and currently contains over 25,000 entries. The concept of the PDB took shape during the late 1960s and early 1970s through community discussions about the need for such a resource. Protein crystallography was still in its infancy, but it was apparent to the producers of these structures, as well as to the potential users, that every structure contained valuable information that needed to be archived and maintained for posterity. In June 1971, key representatives of the two communities attended the Cold Spring Harbor Symposium on Quantitative Biology and agreed that the time was right to create the PDB. The PDB was established in October of that year at Brookhaven National Laboratory as an archive for biological macromolecular crystal structures.

5

Based on a presentation by Zukang Feng, Protein Data Bank, United States, available at http://www7.nationalacademies.org/usnc-codata/ZukangFengPresentation7.ppt.

In the 1980s, the number of deposited structures began to increase dramatically. This was primarily due to improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance methods, and changes in the research community’s views about data sharing. By the early 1990s, the majority of journals required a PDB accession code, and government funding agencies had adopted the guidelines published by the International Union of Crystallography requiring the deposition of data for all protein structures. Over the years, the archive’s growth has been accompanied by increases in both data content and the structural complexity of individual entries.

In October 1998, management of the PDB became the responsibility of the Research Collaboratory for Structural Bioinformatics (RCSB), a consortium of Rutgers University, the San Diego Supercomputer Center at the University of California, San Diego, and the National Institute of Standards and Technology. The vision of the RCSB is to create a resource based on the most modern technology that facilitates the use and analysis of structural data and thus creates an enabling resource for biological research. Its mission is to provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.

The challenges that the PDB is addressing include the continuing increase in the number and complexity of structures, the need to develop new methods for structure determination, user demands for responses to complex queries and for better annotation, the integration of PDB data with other genomic and proteomic information, and the need to serve a growing and more diverse community of users. The PDB’s strategy for meeting these challenges involves adopting new technologies, creating extensible and portable data systems, making the archive as uniform as possible, improving communication with users, and helping to create and enforce community policies and standards.

The Safeguarding and Sharing of Traditional Chinese Medicine Database Resources6

A major effort to develop traditional Chinese medicine database resources began in the 1980s. Since that time, nearly one hundred such databases of various sizes have been constructed by numerous universities, colleges, and

6

Based on a presentation by Baoyan Liu and Meng Cui, China Academy of Traditional Chinese Medicine.

institutes. The digitization of traditional Chinese medicine information has been completed on a preliminary basis, including modern and ancient literature databases, factual and structural databases, and data warehouses. The main organizations involved in the modern literature and factual databases are the Traditional Chinese Medical Literature Center and its branch center, which are affiliated with the State Administration of Traditional Chinese Medicine. The national structural databases are currently under development. In 2001, the scientific experiments information database of traditional Chinese medicine was initiated using data warehouse technology and a virtual research center platform, which is now operational. The ancient literature database contains various e-books.

The development of traditional Chinese medicine databases has already achieved significant results and established a basis for the digitization of traditional Chinese medical information at the national level. Because this scientific and technical literature, and the underlying data and scientific experiments, are the products of fundamental research conducted for the public welfare, their continuation requires public support and government funding. With government support, the main database could be completed quickly and provide a sharing mechanism as a public good.

Open Access to Scientific Data on Biological Diversity: An Urgent Need for China7

Biological diversity (also called biodiversity) is generally divided into three categories: genetic-level diversity, species-level diversity, and ecosystem-level diversity. The focus here is on data in the latter two categories in China.

China’s species-level biodiversity is immense. For example, the country is home to about 30,000 plant species and nearly 500 mammal species. In fact, it has been estimated that the 17 so-called “megadiverse” countries, which include China, contain 70 percent of the world’s species of plants and animals within their borders. China has databases for some of its species-level biodiversity. For example, the State Environmental Protection

7

Based on a presentation by James Edwards, Global Biodiversity Information Facility, Denmark, available at http://www7.nationalacademies.org/usnc-codata/JamesEdwardsPresentation.ppt.

Agency of China lists 67 databases of plants, animals, and microorganisms.8 However, apart from those for birds and plants, these databases are mostly small, and most lack georeferenced data or do not follow international standards. Moreover, unlike some other megadiverse countries such as Costa Rica and Mexico, China has not initiated systematic efforts to develop computerized databases about its biota, or to access the great wealth of biodiversity information about China that is contained in the world’s natural history collections. As a result, China cannot currently use this considerable body of knowledge for informed decision making.

The Global Biodiversity Information Facility (GBIF) is an international consortium aimed at making the world’s primary biodiversity data freely and openly available over the Internet to benefit society, science, and a sustainable future. GBIF began in 2001; as of June 2004, its members included 41 countries and 24 international organizations, each of which agrees to set up a computer node to share primary biodiversity data. Control of the data, including the decision on what information to make available, resides with the data providers in each country or organization. GBIF’s role is to help the data providers set up their databases and to provide a portal that allows users to search all the databases at once.9 GBIF is thus a network of participant nodes and other partners that agree to use common standards for data and metadata, encourage the generation and contribution of additional data and information for the network, and ensure that data providers retain control of their own data.

As of June 2004, the GBIF data portal (launched only a few months earlier) was serving nearly 24 million records containing information about specimens in natural history collections, as well as observational data. These records were being served by 63 data providers from around the world. Even though China was not yet a member of GBIF, the portal already contained approximately 45,000 records of plants and animals, representing more than 9,000 species collected in China. The data served by GBIF can be a valuable resource for addressing many scientific and societal problems, including tracking invasive species, predicting the spread of emerging infectious diseases, designing protected areas optimally, and deciding where to undertake field trials of genetically modified crops. GBIF is compiling other innovative examples of how megadiverse countries have used these data.

8

See http://www.zhb.gov.cn/english/.

9

See http://www.gbif.net.
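The portal model described above, in which each provider decides what to expose and the portal sends the same query to every provider, can be pictured as a small federated search. The sketch below is a hypothetical illustration only, not GBIF’s actual software or interface: the names `Record`, `ProviderNode`, `portal_search`, and `summarize`, the record fields, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Record:
    """One primary biodiversity record (hypothetical fields)."""
    species: str
    country: str  # country where the specimen was collected or observed
    basis: str    # "specimen" (natural history collection) or "observation"


@dataclass
class ProviderNode:
    """A data provider's node; the provider alone decides what is shared."""
    name: str
    records: list[Record] = field(default_factory=list)
    # Sharing rule chosen by the provider; by default every record is shared.
    shares: Callable[[Record], bool] = lambda record: True

    def search(self, country: str) -> list[Record]:
        # Return only matching records that the provider has chosen to expose.
        return [r for r in self.records if self.shares(r) and r.country == country]


def portal_search(nodes: list[ProviderNode], country: str) -> list[Record]:
    """Send the same query to every node and merge the answers into one list."""
    results: list[Record] = []
    for node in nodes:
        results.extend(node.search(country))
    return results


def summarize(records: list[Record]) -> tuple[int, int]:
    """Return (number of records, number of distinct species)."""
    return len(records), len({r.species for r in records})


# Hypothetical sample data: a museum collection and a field survey.
museum = ProviderNode("museum", [
    Record("Ailuropoda melanoleuca", "CN", "specimen"),
    Record("Ginkgo biloba", "CN", "specimen"),
    Record("Quercus robur", "DK", "specimen"),
])
survey = ProviderNode(
    "survey",
    [
        Record("Ginkgo biloba", "CN", "observation"),
        Record("Rhinopithecus roxellana", "CN", "observation"),
    ],
    # This provider withholds records of a sensitive species.
    shares=lambda record: record.species != "Rhinopithecus roxellana",
)

hits = portal_search([museum, survey], "CN")
print(summarize(hits))  # (3, 2)
```

Because each node applies its own sharing rule before answering, the portal never sees withheld records, which mirrors the principle that control of the data stays with the providers. A real network would also depend on the common data and metadata standards mentioned in the text; this sketch reduces them to a single shared `Record` type.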

China has been much more successful at developing and archiving ecosystem-level biodiversity data than species-level data. The Chinese Ecosystem Research Network (CERN)10 is a consortium of 33 field research stations and one synthesis center. CERN was established in 1988 and currently provides access to a wide range of ecological and environmental data, including more than 3,000 historical datasets.11 CERN has also developed a comprehensive data sharing policy and joined the International Long-Term Ecological Research Network. China should be encouraged to give similar attention to developing and archiving its species-level biodiversity information.

The NIH Roadmap for Medical Research12

The U.S. National Institutes of Health (NIH) launched the Roadmap for Medical Research initiative in 2003. It focuses on important public health challenges such as acute and chronic medical conditions, an aging population, health disparities in society, emerging diseases, and biodefense concerns. The roadmap provides a framework of priorities and a vision for a more efficient, innovative, and productive research system. It also establishes a set of initiatives that are central to improving the quality of healthy life for people in the United States and around the world.

One research priority of the NIH roadmap is the reengineering of the clinical research enterprise. This reengineering effort has multiple components, including integrating a clinical research network and facilitating data mining to advance data sharing goals. Examples of health databases and technologies that are part of this NIH focus are the lung image database, a network for translational research for optical imaging, a biomedical informatics research network, bioinformatics roadmap centers, and the Insight Segmentation and Registration Toolkit in support of the VHP. The National Institute of Biomedical Imaging and Bioengineering, a component of the NIH, is addressing specific issues in the clinical research network initiative. There are a number of barriers to creating a successful

10

See “The Data Sharing Policy of the Chinese Ecosystem Research Network” in Chapter 4 of this report.

11

See http://www.cern.ac.cn:8080/index.jsp.

12

Based on a presentation by Belinda Seto, National Institute of Biomedical Imaging and Bioengineering, U.S. National Institutes of Health, available at http://www7.nationalacademies.org/usnc-codata/BelindaSetoPresentation7.ppt.

THEMATIC ISSUES IN SCIENTIFIC INFORMATION, JOURNALS, AND DIGITAL LIBRARIES

Policies and Mechanisms for Literature Resource Sharing: The Practice of the Chinese National Scientific and Technical Library23

Information resource sharing benefits society and the economy by spreading knowledge at low cost. Intellectual property rights (IPRs) stimulate and protect creative innovation, but they restrict the public diffusion and application of knowledge. They thus have both positive and negative effects. Strengthening resource sharing requirements may limit the interests and benefits of IPR owners, whereas strengthening owners’ IPRs may limit the benefits that society could gain from resource sharing. A proper balance between the two needs to be found to maximize social benefits, and the policy and legal system faces an increasing challenge in reconciling these opposing effects.

Information resource sharing is a powerful instrument for reducing the digital divide, but the high price of the literature makes the threshold for doing it quite high. At the same time, the strengthening of IPRs is the current trend in international law and trade, a trend related to the fact that the main exporters of knowledge and information products are developed countries. For developing economies, the level of IPR protection should match the level of development of the economic and legal system; setting protection standards too high can be harmful.

There are choices and tradeoffs in enacting these policies. If rigorous application of copyright law required payment for every drop of knowledge, a shortage of funds would severely limit the capacity of public libraries. The urgent problem is how to observe international IPR regulations while at the same time building scientific information resource capacity and meeting the demands of social progress.

The Practice of the National Scientific and Technical Library of China

The National Scientific and Technical Library (NSTL) of China is a virtual scientific literature service center initiated in June 2000 by the MoST

23

Based on a keynote presentation by Qiheng Hu, Vice President, Chinese Association for Science and Technology, available at http://www7.nationalacademies.org/usnc-codata/Hu_Qiheng_Presentation.ppt.

OCR for page 62
Strategies for Preservation of and Open Access to Scientific Data in China: Summary of a Workshop and the Ministry of Finance in cooperation with five other ministries, and approved by the State Council. The NSTL is composed of seven member libraries: the Center for Literature and Information of the Chinese Academy of Sciences, the China S&T Information Institute, the Machinery Industry Information Institute, the Metallurgy Industry Information Standard Institute, the Chemical Industry Information Center, the Literature Center of the Chinese Academy for Agricultural Science, and the Information Center of the Chinese Academy for Medical Science. The NSTL office is in charge of coordinating and managing the services. The principles for operation of the NSTL are: “unified purchasing, normalized processing, combined networking, and resource sharing.” The main goals are to: Build and share the scientific literature and information resources among all members through a convenient network so that this virtual library can provide better service to the research community in the areas of basic science, engineering, agriculture, and medicine; Develop a high-level scientific literature collection and service center; Demonstrate effective applications of information technologies in scientific literature and information services; Become a pivotal force in cooperation with the broader Chinese library system and the leader in the scientific library system of China; Play a major role in exchanges with the international library community; and Establish the information resource base for research, training, and the popularization of scientific education. The responsibilities of NSTL are diverse, consistent with its main goals. 
These responsibilities include planning and funding comparatively complete acquisitions of literature resources; coordinating the literature collection to avoid unnecessary redundancy; formulating standards, norms, and formats for the unified database; serving users throughout the country through the NSTL network service platform; developing and applying in-depth resources; and pursuing domestic and international exchanges and collaborations. The NSTL is directed by a 19-member Council, which is its decision-making and leading body and also represents NSTL members and users. The Council appoints the NSTL Director. The Director General is in charge of operations and hires the office staff. The Council and the Director are advised by two advisory groups, the Expert Advisory Committee for Literature Resource Construction and the Expert Advisory Committee on Computer-Network Construction.

The NSTL has made significant progress in several areas, three of which are highlighted here. First, it has increased the total number and variety of scientific literature resources in China. Investments in scientific literature have been used more effectively since NSTL planning ended the counterproductive duplication and competition among member libraries in purchasing international scientific periodicals. Before 2000, the NSTL member libraries held only about 3,000 English-language scientific journals and periodicals; the NSTL increased the total to 10,653 in 2000. By 2003, NSTL holdings of English-language journals and periodicals numbered approximately 13,500, plus an additional 5,000 conference proceedings and handbooks in English. The English-language holdings also include 15 network-edition periodicals and 65 network-edition subscriptions in physics and chemistry with access controls. The NSTL also holds a rich variety of Chinese-language literature, including more than 4,000 journals and periodicals, over 22,000 conference proceedings, and 470,000 master's and PhD theses.

Second, the NSTL has built a 1 gigabit/second broadband network linking its seven member libraries, turning the separate libraries into a single library with digitized resources in a networked environment. The NSTL service network is also linked with the National Library, the China Education and Research Network, and the China Scientific and Technical Network via 100 megabit/second connections. Third, the NSTL has rapidly expanded its online digitized resources and related services.
In 2000, before the launch of the NSTL network, there were only 1.7 million items of online data; by December 2003, this had grown to 27 million items in 39 databases. The NSTL provides many services, including literature searches, full-text provision, periodicals cataloging, common directory queries, full-text database access for network-edition literature, literature directory database searching, expert consulting and information services, an online resources search engine, a Chinese-language preprint system, and a portal for English-language preprints. Services provided to users via the Internet include free literature searching and online payment for services (about 44 percent of payable services are paid online). Requests for full-text provision are processed within two working days. All users can access and download, free of charge, the 15 English-language network-edition periodicals for which the NSTL buys access. The other 65 periodicals, in physics and chemistry, are available to 80 universities and research institutes under the terms of the subscription license, with access controls. Over 99 percent of the English-language periodicals are accessed each year.

Despite these accomplishments, the NSTL, like other public libraries, continues to face a serious problem: the savings that information technology brings to knowledge sharing are largely offset by the rising costs of imported information resources. In the NSTL's experience, network editions are typically very expensive, and their cost is proportional to the scale of information sharing. For such high-cost imported information, there may be no choice but to return to printed editions. Does this sound like progress in the age of digital networks?

Perspectives on the Future of the Library and on the Economics of Open Access24

The big question facing the circulation of scientific information is: What will become of the emerging open-access movement? A growing number of arguments support increasing access to research through a variety of open-access models. The case for increasing access has components that can be roughly categorized as epistemological, historical, developmental, political, and public, as well as economic and legal (which are discussed elsewhere), and it is directed primarily toward faculty members, students, librarians, policy makers, and the public. The epistemological argument for open access, for example, concerns how dependent a knowledge claim is on being fully open to review and critique. Anything that unduly restricts the circulation of knowledge, especially among “legitimate” participants in its construction, weakens that body of knowledge's claims to validity and reliability.
If the current subscription publishing model can be shown to contribute to suboptimal levels of access, then that model is not what we might call epistemologically conducive to the development of knowledge.

The concern with exploring new publishing models leads to the historical argument, which draws on precedents from an earlier era of publishing innovation, with Isaac Newton as a leading instance. Newton is well known for being a highly secretive scientist and a reluctant author. Nevertheless, after he had followed for only a few years the emergence of the scientific periodical, which began with the launch of Philosophical Transactions in 1665, he understood that this publicly circulated, relatively inexpensive 16-page journal represented something important to science. He allowed one of his letters, on optics, to be published in that journal. It was a move he came to regret and did not repeat, but this early experiment in open access went a long way toward shaping what became the norms of the scientific article.

The developmental (or developing countries) argument for increasing access is closely tied to the parallel growth in China of both the economy and the number of scientific papers published, with an increase by a factor of 10 since the 1980s. Developing countries suffer from a knowledge gap even as their university populations grow. Western universities are contributing to that gap as their work becomes increasingly expensive to access (with some generous exceptions negotiated with publishers by the International Network for the Availability of Scientific Publications and other organizations).

The political and public arguments for open access to research concern people's basic right to know, especially with regard to publicly funded research and scholarship. The value of exercising that right is affirmed by the health revolution brought about by public access to medical information. Public access to research also supports the greater accountability demanded of professionals (e.g., physicians, educators, lawyers) and acts as a safeguard against interest groups that increasingly present research to the public selectively.

24

Based on a presentation by John Willinsky, University of British Columbia, Canada.
An Open-Access Future25

The open-access movement has gained momentum over the past several years, with increased visibility and recognition from the various stakeholder communities, including the research and publishing communities. Since the Budapest Open Access Initiative26 began collecting signatures in February 2002, more than 3,500 individuals and organizations have signed on in support of free access to information. The Directory of Open Access Journals27 at Lund University, which contained over 1,100 journals as of June 2004, has announced the launch of its second phase, allowing sophisticated searching of the full text of the Directory's articles. The U.K.-based open-access publisher BioMed Central now publishes over 100 open-access journals. Since its launch in October 2003, PLoS Biology28, the first peer-reviewed, open-access journal of the Public Library of Science (PLoS), has demonstrated remarkable strength as a competitive new journal: submissions are increasing; readership, measured as site visits and downloads of individual articles, is growing; and its reputation as a high-quality, peer-reviewed journal is improving among scientists, publishers, librarians, and other stakeholder groups.

25

Based on a presentation by Helen Doyle, Public Library of Science, United States.

26

See http://www.soros.org/openaccess/.

27

See http://www.doaj.org/.

In addition to transforming the economics of scientific publishing, open-access publishing represents a modernization of traditional copyright arrangements, which are based on an outdated print-based economic model. Under the open-access definitions used in both the Bethesda Principles29 and the Berlin Declaration,30 an open-access article can be reused and redistributed freely, without permission from the publisher, for any responsible purpose. Authors retain their copyright. PLoS journals use the Creative Commons31 “attribution license,” which preserves the author's right to be acknowledged for the original work. Many journals labeled open access are in fact only free access: their articles are free to read, but the restrictions on reuse and redistribution are the same as for many subscription-based journals. It is worth noting that researchers themselves virtually never benefit financially from publication of their peer-reviewed articles.
Several recent policies that may appear to liberalize subscription terms are in fact small concessions to the growing demand for greater access from the scientific community that produces the articles, concessions made at little economic risk to the publishers. The sharing of data, reagents, and ideas is fundamental to the scientific process itself. Open-access publishing, including both the unfettered distribution and searching afforded by free online access and the unlimited creative reuse permitted by less restrictive copyright licenses, will facilitate the advance of science and medicine.

28

See http://biology.plosjournals.org/perlserv/?request=index-html&issn=1545-7885/.

29

See http://www.biomedcentral.com/openaccess/bethesda/.

30

See http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html.

31

See http://creativecommons.org/.

Other Opportunities in the Changing Information Environment32

It is widely recognized that research, and access to research results, play a vital role in the development of all countries, and that transitional and developing countries often cannot obtain and make use of research that would benefit them. In response to this “information divide,” the International Network for the Availability of Scientific Publications (INASP) was established in 1992 as a program of ICSU to support networking and partnerships aimed at bridging the growing information divide between the developed and developing world. INASP now operates a range of programs to support access to information for researchers, health workers, and rural development experts.

Access to research information is facilitated through its Programme for the Enhancement of Research Information (PERI). This program not only provides access to international research, but also supports national publications to increase their visibility and long-term sustainability. Research information is disseminated in many forms (e.g., datasets, publications), but the scholarly journal remains one of the prime vehicles for accrediting and disseminating research. Both national and international information are important to research. PERI provides support for (1) access to global information, (2) increased visibility for local publications, (3) training in the use and management of online information, (4) publishers and editors, and (5) research and networking. INASP negotiates with content owners and publishers for access to as many of the required resources as possible.
The cost of each resource is related to the GDP of the country; although many of the resources are available at no cost through PERI, others are obtainable at discounts of up to 98 percent on normal subscription rates. Nearly all of the resources are available under countrywide licenses, meaning that anyone in an educational, research, or nonprofit environment is eligible to access them. The Journals OnLine (JOL) project supports a methodology that enables national publications to have an online presence, increase their visibility, and promote communication with readers and authors. It has been particularly successful in Africa through the African Journals Online (AJOL) service, which now includes 184 journals and also supports full-text online publishing. This service has been operational since 1998 and was recently relaunched on a new platform. The new software enables individual publishers to load, edit, and correct their own content, further supporting the development and use of online communication. The JOL methodology and software are available for other countries and regions to adopt for their own publications. Of course, many challenges to information access remain, and INASP continually updates its activities in response to requests from partners and to strengthen its capacity to develop sustainable methodologies.

32

Based on a presentation by Pippa Smart, International Network for the Availability of Scientific Publications, United Kingdom, available at http://www7.nationalacademies.org/usnc-codata/PippaSmartPresentation.ppt.

Scientific Information and Digital Libraries: Can Developing Countries Become Key Players in the Information Society?33

In most developing countries, politics rather than markets drive knowledge diffusion. Whether a country adopts new technology is largely determined by political will, patterns of power and influence, and resource-allocation policies. Politics is thus a decisive variable in the transition of developing countries to the information society. In addition to the political hurdles, many other factors work against the preservation of and open access to scientific information in developing countries. These factors may be characterized as institutional, economic, and social, although many of them intersect and are difficult to separate.

Generally speaking, the institutional culture creates barriers to the creation and diffusion of knowledge. For example, government officials tend to believe that access to knowledge follows automatically once connectivity to digital networks is ensured. There is no institutional framework for data preservation, as the demise of the South African Data Archive illustrates.
Moreover, no legislative mandate obliges publicly funded researchers to make their research findings publicly accessible. The lack of comprehensive strategies and policies for data management is a major barrier to open access in developing countries.

33

Based on a presentation by Lulama Makhubela, National Development Agency, South Africa, available at http://www7.nationalacademies.org/usnc-codata/LulamaPresentation.ppt.

From an economic standpoint, a large percentage of the population in developing countries currently falls below the poverty line. Income distribution remains highly unequal, and poverty and inequality continue to exhibit strong geographic and racial biases. Public funding for the development of and connectivity to information and communication technologies (ICTs), from both national and global sources, continues to be inadequate. The resulting lack of access to ICTs further restricts the use of even the information that is openly accessible online.

The social barriers are no less daunting. Human resources in the scientific and technical domains are not adequately developed. Moreover, there is a “brain drain” of highly trained scientists and professionals from developing to more developed countries. The organizational complexity of scientific communities and the esoteric language of scholarly journals create further roadblocks to the broad transfer of knowledge. Within the broader society, the reading environment is poorly developed: only a small percentage of the population uses library facilities, and most rural areas do not have libraries. Educational gaps mean that access to information does not necessarily result in access to knowledge. The mere availability of information technologies has proved insufficient to guarantee proper diffusion of scientific knowledge across society, and mastery of technological change in the economy and society remains elusive.

Some actions that may help ameliorate these barriers include the following. There must be a greater effort to develop human resources in research, especially in national tertiary education programs, and to connect research more effectively with productive sectors and the specific needs of society.
The curricula for research professionals should include the principles and techniques of digital data and information management. It is vital to adopt rigorous and more collaborative approaches to address the lack of leadership and professional management that has led to poor implementation of data preservation in many areas. This needs to be coupled with a greater effort to help working scientists understand the value of preserving data and information for future access. A strategic approach is also needed to leverage resources and maximize effectiveness through increased collaboration among developing countries, and to document and disseminate best practices. This goal can be promoted by forming data archiving groups at different geographic levels to overcome the isolation of individual data archivists and to promote beneficial exchanges. Such groups will help harness collective intellectual capacity and expertise on issues of data preservation and open access. Cooperation should be encouraged to improve quality and increase impact, rather than merely for its own sake. At the broader social level, it is important to promote the use of scientific knowledge and information technologies among all strata of the population, and to reach out to the literate poor. Investments in technological innovation deserve high priority because in some cases they can overcome the constraints of low incomes and weak institutions.

In conclusion, the discourse on scientific data and information resources and on digital libraries needs to be understood in the political, economic, and social context of developing countries. The future global information society may be one of widespread and beneficial international collaboration, or one of highly stratified access to knowledge. For the first scenario to prevail, the external barriers to knowledge access need to be reduced, and the perverse internal dynamics that prevent many developing countries from joining the global scientific community as active participants must be changed. The search for effective strategies for the preservation of and open access to scientific information in developing countries will remain utopian unless those countries themselves become actively integrated into the broader information society.
