National Academies Press: OpenBook

Strategies for Preservation of and Open Access to Scientific Data in China: Summary of a Workshop (2006)

Chapter: 5 Summaries of Presentations on Thematic Issues

Suggested Citation:"5 Summaries of Presentations on Thematic Issues." National Research Council. 2006. Strategies for Preservation of and Open Access to Scientific Data in China: Summary of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11710.

5
Summaries of Presentations on Thematic Issues

Several examples of existing scientific data and information activities in China, the United States, and internationally were highlighted at the workshop. These activities can benefit from the policies and management practices outlined in the previous chapter. They also provide practical experience at the working level and models for similar activities in other institutional, disciplinary, and national contexts. They thus represent a bottom-up approach to the evolution of national and international policy and practice with regard to public scientific data and information resources. The examples summarized below are organized according to the three thematic areas that were the focus of the workshop: (1) life sciences and public health data; (2) earth sciences, environmental, and natural resources data; and (3) scientific information, journals, and digital libraries.

EXAMPLES OF LIFE SCIENCES AND PUBLIC HEALTH DATA ACTIVITIES

The Chinese Management and Sharing System of Scientific Data for Medicine1

The initiation of the Management and Sharing System of Scientific Data for Medicine (the Medical Data Sharing System) was a key project in 2003 of the China Scientific Data Sharing Program (SDSP) under the National Basic Platform for Science and Technology in the Ministry of Science and Technology. Oversight of this Medical Data Sharing System has been undertaken by the Chinese Academy of Medical Sciences, the Chinese Center for Disease Prevention and Control, the Chinese People’s Liberation Army General Hospital and Graduate Medical School, and the Chinese Academy of Traditional Chinese Medicine.

1

Based on a presentation by Depei Liu, Chinese Academy of Medicine and Chinese Academy of Engineering, available at http://www7.nationalacademies.org/usnc-codata/Liu_Depei_Presentation.ppt.

Various scientific data resources in medicine are being integrated together by the Medical Data Sharing System. The system covers most fields of medicine, including basic and clinical medicine, public health, traditional Chinese medicine, other special areas of medicine, and pharmacology. The system also has a database specifically for SARS and respiratory diseases.

The Medical Data Sharing System has important roles in many respects. It is used to:

  1. Serve medical research and teaching, and improve innovation in medical sciences and technology;

  2. Improve the overall level of prevention, diagnosis, and treatment of diseases;

  3. Enhance the personal health consciousness of people, thus promoting good health in society;

  4. Strengthen China’s ability to plan for and respond to sudden incidents in public health; and

  5. Provide an improved basis for government policies, as well as promote the development of the national economy and the medical system.

The organizational and management model for this system consists of a leading group, an expert group, and a working group. The leading group consists of the leaders of participating institutions and the professionals in charge of scientific administration who are responsible for the decision making, organization, and coordination of the system. The expert group is composed of experts from the China-SDSP and professionals in project-supported institutions. The expert group is responsible for steering and authorization of the general design, selection of participating institutions, and inspection of the project’s progress. The working group consists of members selected from the working group of the China-SDSP and employees of the participating institutions. It is responsible for drafting the feasibility report, general project design and implementation, and development of the main database.

The Medical Data Sharing System benefits all sectors of society, including the government; medical, scientific, and educational institutions; private enterprises; and the general public. The system promotes a cooperative environment and will be operated on a free and not-for-profit basis.

International Medical Scientific Data Sharing2

There are many kinds of medical scientific data that can be shared internationally. They include primary data, data products, and related information obtained from the activities of medical clinics, teaching, and research. Medical data are necessary resources for the development of medical sciences and have characteristics common to other kinds of scientific data, including reusability and potential long-term value.

The Human Genome Project has provided a successful example of access to and sharing of biomedical scientific data. This project used modern informatics techniques in human genomic research and organized top scientists and facilities to cooperate on the same project based on the same standards. In recent years, similar data sharing projects have been initiated in other areas of science internationally, such as neurology, human anatomy, and proteomics, all of which are described in the sections that follow. These projects have produced large changes in the behavior of medical researchers toward greater cooperation, but there are many unmet needs for medical scientific data access and sharing by the research, teaching, and biomedical industry communities.

Several problems can be identified. The lack of data access and sharing is especially acute in developing and least-developed countries. This results in redundant research and inefficiencies. Medical data resources are separated and many lack quality and uniform international standards and data exchange protocols. There also is a lack of adequate investment and attention to these problems.

The following suggestions are focused on improving the status of international medical scientific data access and sharing:

  1. It would be helpful to set up an international coordination committee for access to and sharing of medical scientific data, responsible for developing plans for such activities and proposing appropriate guidelines.

  2. In order to strengthen international cooperation in the establishment of medical scientific databanks, it is necessary to establish relevant standards and uniform criteria for data contributions by the globally distributed scientists and facilities. New databank groups for medical scientific data similar to GenBank and the Protein Data Bank need to be established to make the data sharable, extendable, and authoritative.

  3. To truly implement data access and sharing among the different medical specialties, China needs to establish an integrated international service system to break the barriers among basic, clinical, and preventive medicine, public health, pharmacology, and other areas. This will offer one-stop service via index and catalogue inquiries for data services, and a networked environment for therapeutics, prevention, control, and research on specific diseases.

  4. China also needs to speed up the recruiting of expert personnel to implement access to and sharing of medical scientific data, and to offer effective training programs.

  5. Finally, funding needs to be secured from multiple sources to guarantee long-term financial support and to facilitate global access to and sharing of medical scientific data and information.

2

Based on a presentation by Ling Yin, People’s Liberation Army General Hospital and Graduate Medical School, China.

China’s Contributions to the Organisation for Economic Co-operation and Development’s Neuroinformatics Data Sharing Initiative3

The human brain is the most complex system known. Achieving a better understanding of it is a key scientific challenge for the 21st century. Having developed sophisticated methods to investigate the brain in the finest possible detail, neuroscientists now face the challenge of managing the enormous amounts of raw data and the many useful inferences drawn from them. The neuroinformatics field therefore would benefit greatly from increased data sharing. Neuroinformatics research already has received billions of U.S. dollars and Euros to establish individual databases and platforms and to lay the groundwork for data sharing in the future.

3

Based on a presentation by Yiyuan Tang, Institute of Neuroinformatics, Dalian University of Technology; Ling Yin, Neuroinformatics Center, PLA General Hospital and Graduate Medical School; and Xiaowei Tang, Neuroinformatics Center, Zhejiang University, China, available at http://www7.nationalacademies.org/usnc-codata/Yin_Ling_Presentation.ppt.

In 2000, the Organisation for Economic Co-operation and Development’s (OECD) Global Science Forum approved the international Working Group on Neuroinformatics (WG-NI), which included 21 members and observers from OECD countries. Their goal was to establish a global knowledge management system and Internet research environment in which the data and results from research on the human neurological system would be available. Such an information initiative would be helpful for improving public health, scientific research, medical education, and the pharmaceutical industries. The three principal aims of the WG-NI group were to promote: (1) the establishment of databases and the integration of data resources; (2) data sharing policies for OECD members and observers and the development of working guidelines and rules for neuroinformatics data; and (3) the establishment of international neuroinformatics research networks.

The Chinese government also began to pay more attention to the Human Brain Project (HBP) and to neuroinformatics research. The Xiang Shan Science Conference for HBP and neuroinformatics, held in September 2001, was a major event that included senior Chinese government representatives from the National Natural Science Foundation of China and the Ministry of Science and Technology. The conference participants agreed that China should start work on the Chinese neuroinformatics project as soon as possible. Since 2001, the Chinese government has awarded several grants in the neuroinformatics field and supported the development of a neuroinformatics platform and digital network according to the OECD’s WG-NI standards.

In 2003, the WG-NI requested the establishment of an international coordinating mechanism, the International Neuroinformatics Coordinating Facility (INCF). The proposed role of INCF was to optimize the accumulation, integration, standardization, exploitation, and sharing of very large amounts of highly diverse primary data and of large, structured neuroscience databases that are being generated worldwide by researchers who study the brain. The first INCF meeting was held in April 2004 at OECD in Paris to establish the INCF secretariat, initiate substantive activities, and develop a proposed funding scheme (the Program in International Neuroinformatics). This new international program is intended to promote international collaboration among researchers whose work will be funded by existing (or possibly new) national programs, eliminate national and disciplinary barriers, and provide a more efficient approach to global collaborative research and data sharing.

To accelerate Chinese neuroinformatics development, it has been suggested that the neuroinformatics project become an important part of the China Science Data Sharing Project, especially for promoting international collaboration. Consistent with INCF’s working plan, the Chinese neuroinformatics community will develop tools for manipulating and managing the data, as well as standards and mechanisms for sharing these data among global researchers. The Chinese community also will design and develop special-purpose analytical tools and algorithms, and create computational models of brain structure and function that can be validated using diverse data. These actions will advance the understanding of the human brain and may be expected to lead to breakthroughs in the prevention and cure of nervous system disorders and to improvements in the quality of life for humanity.

Long-Term Studies of Human Anatomy Using the Digital Human and Scientific Data Sharing4

Human anatomy is a cornerstone of modern medicine. In 1543, Vesalius published the seminal anatomy book, On the Structure of the Human Body, which was one of the starting points of modern medicine. Digital anatomy and the digital human represent a new revolution in medicine. The U.S. National Library of Medicine (NLM) began to discuss a long-range plan for the digital human in 1985. In 1989, the NLM Board of Regents submitted a long-range plan for the next 10 to 30 years of electronic imaging in biomedical research. The first stage of the plan was the Visible Human Project (VHP). The plan encouraged scientists to conduct research in various fields, including anatomical structure informatics, graphics technologies in biomedical imaging, basic medical research (e.g., developmental biology, neuroscience, cell and histological science, and molecular structure), clinical applications (e.g., image-guided surgery, actinotherapy, anaesthesia, radiology, organ system imaging, orthopedics), and the development of related medical equipment using digital technologies.

In 1991, the University of Colorado signed a VHP contract with NLM. They completed two cryomacrotoming data sets (one male and one female) in 1994 and 1995, respectively. Since that time, the digital anatomy community has developed important applications for medical teaching and clinical practice, including breakthrough advances in medical imaging, such as more realistic rendering of three-dimensional images.

4

Based on a presentation by Donglie Qin, BME College, Capital University of Medical Sciences, China.

In 2001, the Federation of American Scientists presented the digital human initiative, which is based on human body digital simulations at three levels: the microlevel (molecular, gene, cell, and nanoscale), the mesolevel (tissue and organ), and the macrolevel (whole body). Since 2001, the emerging Grid, Internet 2, and other advanced information technology capabilities and imaging devices have been used in the VHP and similar digital human activities.

In addition to the VHP, there are now many other digital human projects outside the United States, including in the European Union, Japan, Korea, Singapore, and China. In China, the proposed draft for a digital human initiative was submitted to the government by scientists in 2001. Two experimental whole-body data sets and several high-resolution organ data sets (heart, kidney, and liver) have been completed. The next generation of digital data sets and applications in medical teaching and clinical practice is expected to emerge within the next 5 to 10 years. This brief history of digital human research underscores the need for implementing scientific data sharing in support of both research and applications in these digital human initiatives.

The Protein Data Bank: A Key Biological Resource5

As noted in the previous chapter, the Protein Data Bank (PDB) is the single international repository of three-dimensional data for biological macromolecules and currently contains over 25,000 entries. The concept of the PDB began to be formed during the late 1960s and early 1970s with community discussions about the need for such a resource. Protein crystallography was still in its infancy, but it was apparent to the producers of these structures as well as to the potential users that every structure contained valuable information that needed to be archived and maintained for posterity. In June 1971, key representatives of the two communities attended the Cold Spring Harbor Symposium on Quantitative Biology and agreed that the time was right to create the PDB. The PDB was established in October of that year at the Brookhaven National Laboratory as an archive for biological macromolecular crystal structures.

5

Based on a presentation by Zukang Feng, Protein Data Bank, United States, available at http://www7.nationalacademies.org/usnc-codata/ZukangFengPresentation7.ppt.

In the 1980s, the number of deposited structures began to increase dramatically. This was primarily due to the improved technology for all aspects of the crystallographic process, the addition of structures determined by nuclear magnetic resonance methods, and the changes in the research community’s views about data sharing. By the early 1990s, the majority of journals required a PDB accession code and government funding agencies adopted the guidelines published by the International Union of Crystallography requiring the deposition of data for all protein structures. The archive’s growth has been accompanied by increases in both data content and the structural complexity of individual entries over the years.

In October 1998, the management of the PDB became the responsibility of the Research Collaboratory for Structural Bioinformatics (RCSB) at Rutgers University, together with the San Diego Supercomputer Center at the University of California, San Diego, and the National Institute of Standards and Technology. The vision of the RCSB is to create a resource based on the most modern technology that facilitates the use and analysis of structural data and thus creates an enabling resource for biological research. Its mission is to provide the most accurate, well-annotated data in the most timely and efficient way possible to facilitate new discoveries and advances in science.

The challenges that the PDB is addressing include the continuing increase in the number and complexity of structures, the need to develop new methods for structure determination, satisfying user demands for response to complex queries and better annotation, integrating PDB data with other genomic and proteomic information, and serving a growing and more diverse community of users. The PDB’s strategy for meeting these challenges involves the adoption of new technologies, creating extensible and portable data systems, making the archive as uniform as possible, improving communication with the users, and helping to create and enforce community policies and standards.

The Safeguarding and Sharing of Traditional Chinese Medicine Database Resources6

A major effort to develop traditional Chinese medicine database resources began in the 1980s. Since that time, nearly one hundred such databases of various sizes have been constructed by numerous universities, colleges, and institutes. The digitization of traditional Chinese medicine information has been completed on a preliminary basis, including modern and ancient literature databases, factual and structural databases, and data warehouses.

6

Based on a presentation by Baoyan Liu and Meng Cui, China Academy of Traditional Chinese Medicine.

The main organizations involved in the modern literature and factual databases are the Traditional Chinese Medical literature center and branch center, which are affiliated with the State Administration of Traditional Chinese Medicine. The national structural databases are being developed now. In 2001, the scientific experiments information database of traditional Chinese medicine was initiated using data warehouse technology and a virtual research center platform, which is now operational. The ancient literature database contains various e-books.

The development of traditional Chinese medicine databases already has made significant achievements and established a basis for the digitization of traditional Chinese medical information at the national level. Because this scientific and technical literature, and the underlying data and scientific experiments, are the result of fundamental research work for the public welfare, the initiative needs public support and government funding to guarantee its continuation. If this initiative obtains the support of the government, the main database can be completed quickly and provide a sharing mechanism as a public good.

Open Access to Scientific Data on Biological Diversity: An Urgent Need for China7

Biological diversity (also called biodiversity) is generally divided into three categories: genetic-level diversity, species-level diversity, and ecosystem-level diversity. The focus here is on data in the latter two categories in China.

China’s species-level biodiversity is immense. For example, it contains about 30,000 different plant species, and nearly 500 mammal species. In fact, it has been estimated that the 17 so-called “megadiverse” countries, which include China, contain 70 percent of the world’s species of plants and animals within their borders. China has databases for some of its species-level biodiversity. For example, the State Environmental Protection Agency of China lists 67 databases of plants, animals, and microorganisms.8 However, these are mostly small databases, except for birds and plants, and most do not have georeferenced data or follow international standards. Moreover, unlike some of the other megadiverse countries (for example, Costa Rica and Mexico), China has not initiated systematic efforts to develop computerized databases about its biota, or to access the great wealth of biodiversity information about China that is contained in the world’s natural history collections. As a result, China cannot currently use this considerable body of knowledge for informed decision making.

7

Based on a presentation by James Edwards, Global Biodiversity Information Facility, Denmark, available at http://www7.nationalacademies.org/usnc-codata/JamesEdwardsPresentation.ppt.

The Global Biodiversity Information Facility (GBIF) is an international consortium aimed at making the world’s primary biodiversity data freely and openly available over the Internet to benefit society, science, and a sustainable future. Begun in 2001, GBIF’s members as of June 2004 include 41 countries and 24 international organizations, each of which agrees to set up a computer node to share primary biodiversity data. Control of the data, including the decision on what information to make available, resides with the data providers in each country or organization. GBIF’s role is to aid the data providers in setting up their databases and to provide a portal that allows users to search all the databases at once.9 GBIF is thus a network of participant nodes and other partners that agree to use common standards for data and metadata, encourage the generation and contribution of additional data and information for the network, and assure that data providers retain control of their own data.
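The division of labor described above, in which each provider decides what to expose and the portal only aggregates, can be illustrated with a minimal Python sketch. It is not GBIF's actual software or protocol; the class names, the `withheld` set, and the sample records are hypothetical and chosen only to show the idea of searching many provider nodes through one entry point.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Record:
    """One primary biodiversity record (hypothetical, simplified schema)."""
    provider: str
    scientific_name: str
    country: str


@dataclass
class ProviderNode:
    """A participant node; the provider keeps control of what it shares."""
    name: str
    records: List[Record]
    # Names the provider has chosen NOT to make available (assumption).
    withheld: Set[str] = field(default_factory=set)

    def search(self, scientific_name: str) -> List[Record]:
        # The node itself filters out anything it has decided to withhold.
        if scientific_name in self.withheld:
            return []
        return [r for r in self.records if r.scientific_name == scientific_name]


def portal_search(nodes: List[ProviderNode], scientific_name: str) -> List[Record]:
    """Query every node and merge the answers; the portal stores no data."""
    hits: List[Record] = []
    for node in nodes:
        hits.extend(node.search(scientific_name))
    return hits


# Illustrative data only; these are not real GBIF records.
node_a = ProviderNode(
    name="node_a",
    records=[Record("node_a", "Ailuropoda melanoleuca", "China")],
)
node_b = ProviderNode(
    name="node_b",
    records=[
        Record("node_b", "Ailuropoda melanoleuca", "China"),
        Record("node_b", "Ginkgo biloba", "China"),
    ],
    withheld={"Ailuropoda melanoleuca"},
)

panda_hits = portal_search([node_a, node_b], "Ailuropoda melanoleuca")
ginkgo_hits = portal_search([node_a, node_b], "Ginkgo biloba")
```

In this sketch the portal never overrides a provider's choice: `node_b` holds a matching record but withholds it, so only `node_a` contributes to `panda_hits`.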

As of June 2004, the GBIF data portal (initiated only a few months earlier) is serving nearly 24 million records containing information about specimens in natural history collections, as well as observational data. These records are being served by 63 data providers from around the world. Even though China is not yet a member of GBIF, the portal already contains approximately 45,000 records of plants and animals, representing more than 9,000 species that were collected in China.

The data being served by GBIF can be a valuable resource for many scientific and societal problems, including tracking invasive species, predicting the spread of emerging infectious diseases, optimal design of protected areas, and making decisions about where to undertake field trials of genetically modified crops. Other innovative examples of how megadiverse countries have used these data are being compiled by GBIF.

China has been much more successful at developing and archiving ecosystem-level biodiversity data than species-level data. The Chinese Ecosystem Research Network (CERN)10 is a consortium of 33 field research stations and one synthesis center. CERN was established in 1988, and currently provides access to a wide range of ecological and environmental data, including more than 3,000 historical datasets.11 CERN has also developed a comprehensive data sharing policy and joined the International Long-Term Ecological Research Network. China should be encouraged to give similar attention to developing and archiving its species-level biodiversity information.

The NIH Roadmap for Medical Research12

The U.S. National Institutes of Health (NIH) launched the Roadmap for Medical Research initiative in 2003. It is focused on important public health challenges such as acute and chronic medical conditions, an aging population, health disparities in society, emerging diseases, and biodefense concerns. The roadmap provides a framework of priorities and a vision for a more efficient, innovative, and productive research system. It also establishes a set of initiatives that are central to improving the quality of healthy life for people in the United States and around the world.

One research priority of the NIH roadmap is the reengineering of the clinical research enterprise. This reengineering effort has multiple components, including integration of a clinical research network and facilitating data mining to advance data sharing goals. Examples of health databases and technologies that are part of this NIH focus are the lung image database, a network for translational research for optical imaging, a biomedical informatics research network, bioinformatics roadmap centers, and an insight segmentation and registration toolkit in support of the VHP.

The National Institute of Biomedical Imaging and Bioengineering, a component of the NIH, is addressing specific issues in the clinical research network initiative. There are a number of barriers to creating a successful network, which can include fundamental differences in informatics infrastructure and communication tools used at various research sites. To the extent that interoperability can be implemented and data and tools shared, studies can be initiated more quickly. From the perspective of the Institute, there is a need to create imaging databases and repositories where researchers can access such data. However, access to databases and data mining also requires user-friendly informatics tools. Approaches that combine images, genomic, gene expression, and patient medical records data will ultimately deliver patient-specific information at a time and place where clinical decisions are made regarding risk, diagnosis, treatment, and follow-up.

10

See “The Data Sharing Policy of the Chinese Ecosystem Research Network” in Chapter 4 of this report.

11

See http://www.cern.ac.cn:8080/index.jsp.

12

Based on a presentation by Belinda Seto, National Institute of Biomedical Imaging and Bioengineering, U.S. National Institutes of Health, available at http://www7.nationalacademies.org/usnc-codata/BelindaSetoPresentation7.ppt.

The overall implementation strategy involves the development and standardized validation of application-specific software for integration and knowledge extraction of heterogeneous, clinically relevant data. Specific functions that need to be addressed include:

  • Quantitative data integration, knowledge extraction, and clinical interpretation;

  • Linking imaging and other databases with software tools;

  • Managing software in the scientific and clinical workflow;

  • Partnerships between industry and academia for software development and dissemination;

  • Database development specifically for software validation and regulatory approval; and

  • Standards related to interoperability of imaging and other databases, including results of quantitative analysis of metadata.

EXAMPLES OF EARTH SCIENCES, ENVIRONMENTAL, AND NATURAL RESOURCES DATA ACTIVITIES

Progress in Meteorological Data Sharing in China13

Meteorological data are a vast resource that applies to many fields. Such data are indispensable for economic and social development, scientific and technological innovation, and general human welfare. The collection of meteorological data by the China Meteorological Administration (CMA) is supported operationally by polar-orbiting and geostationary satellites; a ground-based observation network consisting of various types of sensors; the synthesized operational transmitting system, which covers the entire world; a supercomputer; and many other facilities and equipment. The total amount of archived data is over 100 terabytes.

13

Based on a presentation by Dahe Qin, China Meteorological Administration, available at http://www7.nationalacademies.org/usnc-codata/Qin_Dahe_Presentation.ppt.

As a member of the World Meteorological Organization, the CMA has cooperated extensively with the international community in the exchange of meteorological data products. In December 2001, with the support of the Ministry of Science and Technology (MoST), the CMA initiated the meteorological data sharing services program as part of the China-SDSP.

Developments in meteorological data sharing over the past three years include the integration of data resources, the compilation of technical standards, and the construction of service systems. The project has provided online and in-house services for scientific research, education, national construction projects, and the public. Its data services are based on policy, technical standards, an operational data management system, and the classification of data and users. The work and experience of this project can serve as a reference in developing the China-SDSP and its other projects.

The next step is for the meteorological data sharing project to take a leading role in the China-SDSP. To accomplish this, it must:

  • Increase the capability of its service functions, and provide more and better services for national economic development and social welfare;

  • Build upon the advances in science and technology, improve data security, and provide high-quality data products for sharing;

  • Optimize the allocation of resources in the climate system and establish a “climate system” data sharing platform; and

  • Develop extensive cooperation with domestic sectors and with international meteorological Web sites, institutes, and organizations.

The project is expected to increase the sharing of meteorological data and promote the proper configuration and effective utilization of national information resources. The project also plans to accelerate the implementation of the Chinese Climate Observation System, advance climate system data sharing as part of the Global Climate Observing System, and help enact the new national data sharing law and policy.

The World Data Center for Renewable Resources and Environment14

The World Data Center (WDC) system was established by ICSU in 1957. It consists of 52 discipline centers distributed across the United States, Russia, Europe, Japan, India, Australia, and China. China joined the WDC system in 1988 and now has nine discipline centers (Astronomy, Geology, Geophysics, Glaciology and Geocryology, Meteorology, Oceanography, Renewable Resources and Environment, Seismology, and Space Sciences).

The WDC for Renewable Resources and Environment (WDC-RRE) is maintained by the Global Change Information and Research Center at the Institute of Geographic Sciences and Natural Resources Research in the Chinese Academy of Sciences. The mission of the WDC-RRE is to cooperate actively with ICSU to promote exchange and sharing of data in the fields of natural resources and the environment. The WDC-RRE attaches great importance not only to data collection, but also to data exchange and services to users. It seeks to play an important role in supporting scientific research, public decision making, scientific popularization, personnel training, and international cooperation.

The WDC-RRE is funded by the Chinese Academy of Sciences and MoST. The WDC-RRE carries out its work under the direction of the head of the Global Change Information and Research Center, and is engaged in the following activities:

  • Researching the present situation regarding renewable resources and environmental data inside and outside China;

  • Investigating user requirements for such data;

  • Working out a metadata standard for WDCs in China;

  • Establishing the Web site of the WDC for RRE;15 and

  • Producing data and providing data services.

14

Based on a presentation by Shunbao Liao, Geosciences and Natural Resources Institute, Chinese Academy of Sciences, available at http://www7.nationalacademies.org/usnc-codata/Liao_Shunbao_Presentation.ppt.

15

See http://eng.wdc.cn:8080/Metadata/index.jsp.

Information System for Earth Science Data of China16

The Information System for Earth Science Data is a project initiated in 2002 by the Scientific Information Center for Resources and Environment (SICRE) of the Chinese Academy of Sciences. Since that time, SICRE has established regulations and policies for the collection, processing, management, and sharing of earth science data. The draft of the Chinese metadata requirements for this system is compatible with international approaches and is based on a synthesis of the main metadata criteria for earth science data. The system has begun to provide data and information services online.

The China-SDSP will greatly benefit researchers and scientific activities in China. Presently, more detailed policies are being established and a common understanding of data sharing is developing. One challenge is that the standardization and digitization of earth science data are complicated, yet the demand for such data is great.

This tension between the supply of and demand for these data will be mitigated once the Clearinghouse of Earth Science Data is established. The Clearinghouse will reproduce and organize the primary data at lower cost and in less time. With such a “top-down” process, scientific data sharing in China can be fully realized. So far, an effective data management and information catalogue has been completed and the data service function is available for researchers and decision makers. It is with these goals in mind that the Clearinghouse of Earth Science Data in China has been launched.

Present Status and Future Development Strategy of China’s Sustainable Development Information Network17

China’s Sustainable Development Information Network (CSDIN) was created by the Administrative Center for China’s Agenda 21 and nine academic institutions and scientific organizations in 1997 with the support of the Chinese MoST. The main goal of CSDIN is to provide data and information for research, management, and decision-making related to sustainable development strategies that are being implemented in China. A related goal is to improve the public awareness of sustainable development issues through information dissemination, and to achieve the objectives for long-term development that were established in China’s Agenda 21 program.

16

Based on a presentation by Jiansheng Qu, Scientific Information Center for Resources and Environment, Chinese Academy of Sciences.

17

Based on a presentation by Xiaofeng Fu, Administrative Centre for China’s Agenda 21, Ministry of Science and Technology, China and Xintong Li, State Key Laboratory of Resources and Environment Information System, Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences.

The data and information in the CSDIN system are focused on natural resources, biodiversity, forestry, agriculture, macroeconomics, environmental protection, environmental technology transfer, and natural disasters, which together may be called sustainable development information. CSDIN’s multidisciplinary data have been integrated and analyzed to support the sustainable development strategies being implemented in China, such as national or regional methodologies for, and experience with, evaluating sustainable development capacity. As of June 2004, there were 50 gigabytes of data and 17 databases.

The development and outcomes of the data sharing policy, together with the standards for promoting information sharing in China, are key aspects of CSDIN. The policy regulates CSDIN’s data access and Internet dissemination activities according to data pricing, user status (e.g., government sector, the general public, and private enterprises), and data use. For the data standards, certain metadata and data dictionary standards, a geo-grid standard, and a data classification and encoding standard have been proposed. These standards will be used throughout the country. They provide important application experience for the Chinese national e-government and scientific data sharing program. Other major elements for the development of CSDIN include the technology for the legacy system’s reconstruction, the architecture of the data warehouse, the development of a geospatial database, and technologies for geo-information services.

The future development of CSDIN will focus on geo-information service standards and technical specifications; a more user-friendly interface; interoperable spatial, thematic, and temporal geo-information services; and a sustainable development decision support system.

Progress Toward a National Spatial Data Infrastructure in China18

As society is becoming more and more information dependent, multiscale digital spatial data are urgently needed by a variety of users for supporting their planning, monitoring, management, and decision making in a broad range of applications. In this context, China’s National Spatial Data Infrastructure (NSDI) is providing the digital geospatial framework. It consists of vertically and horizontally integrated geospatial databases and communication networks, as well as necessary institutional arrangements for effective flow and exchange of geospatial information.

18

Based on a presentation by Jun Chen, National Geomatics Center, China.

There are four main components of China’s NSDI—data sets, the clearinghouse, policy and standards, and institutional arrangements. There are multiscale fundamental geospatial databases at the national, provincial, and municipal levels. For example, 1:1,000,000 and 1:250,000 scale databases have already been completed, and a 1:50,000 scale database is expected to be developed. Finer-scale databases are being produced at the provincial and local levels.

Several key factors influence the sharing of geospatial data. These include the data resources that are available, the policies that apply to these activities, and technical and institutional coordination. User needs also must be considered. Administrative licensing regulations in 1999 created three pricing categories for data users: government agencies can obtain data free of charge, not-for-profit institutions pay 10 percent of the full commercial cost, and corporations pay full commercial prices. E-government initiatives are now driving a variety of applications that require greater data integration.
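
The three pricing categories can be illustrated with a short sketch. It is only an illustration of the arithmetic described above; the names `UserCategory`, `PRICE_FRACTION`, and `data_price` are hypothetical and do not come from the regulations or from any NSDI system.

```python
from enum import Enum


class UserCategory(Enum):
    """User categories named in the text (identifiers are hypothetical)."""
    GOVERNMENT = "government"
    NOT_FOR_PROFIT = "not_for_profit"
    CORPORATION = "corporation"


# Fraction of the full commercial price charged to each category,
# as described in the text: free, 10 percent, and full price.
PRICE_FRACTION = {
    UserCategory.GOVERNMENT: 0.0,
    UserCategory.NOT_FOR_PROFIT: 0.10,
    UserCategory.CORPORATION: 1.0,
}


def data_price(full_commercial_price: float, category: UserCategory) -> float:
    """Return the price a user in the given category would pay."""
    if full_commercial_price < 0:
        raise ValueError("full_commercial_price must be non-negative")
    return full_commercial_price * PRICE_FRACTION[category]
```

For a data set with a hypothetical full commercial price of 1,000 units, a government agency would pay nothing, a not-for-profit institution about 100 units, and a corporation 1,000 units.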

Uses of Seismic Data and the Importance of Open Access to Major Data Centers in Seismology19

Earthquakes originate at places where stress levels in the Earth have become too high. Rocks rupture, and slip occurs on a fault surface in order to release stress. The resulting seismic waves spread throughout the Earth’s interior. Eventually they reach the Earth’s surface where they may be recorded.

Every day, about one to two hundred earthquakes are large enough for their seismic signals to be recorded more than 1,000 kilometers from the earthquake source. Also every day, some earthquakes are large enough for their signals to be recorded all over the world. The only way to study such earthquakes effectively is to work with data recorded by seismographic stations in different countries.

19

Based on a presentation by Paul Richards, Columbia University, United States, available at http://www7.nationalacademies.org/usnc-codata/PaulRichardsPresentation.pdf.

Scientists and engineers use seismic signals to make earthquake catalogs and bulletins, which provide basic information for various kinds of research and applications, including the study of earthquake hazards, the physics of the earthquake sources, and the structure of the Earth’s interior. The great progress in seismology in all these fields has been stimulated principally by the availability of improved and improving data.

Both China and the United States are developing major new networks of seismometers at a cost of hundreds of millions of U.S. dollars to each country. The experience in the United States indicates that data centers that do not have open data policies are rarely able to attract researchers who apply state-of-the-art methods of data analysis. Such data centers consequently find it difficult to maintain high-quality operations. The best research is usually associated with centers that make their data openly and easily available. For example, users draw attention to errors that inevitably arise in the data, they find ways to correct the data, they share information about how to use the data center effectively, and they contribute to new ways to process the data. Information from users of the data thus is needed to provide guidance for the management of a data center. From this perspective, it appears that an important part of providing international scientific leadership in seismology is making seismic data easily available to all interested potential users.

China too has excellent data sets of seismic waveforms, which will yield new insights into earthquake physics, tectonics, and the Earth’s internal structure. New methods of locating earthquakes have recently been applied to limited data sets. They indicate the potential for China to produce one of the best bulletins of seismicity in the world, covering a large region (more than 10,000,000 square kilometers). Bulletins are a starting point for hazard management, as well as for scientific projects in the study of the Earth’s structure and earthquake physics.

Presently there are handicaps, however, in that station coordinates are not made easily available, and waveform data are accessible for only a very limited number of stations. For many years, China has not allowed even the locations of seismographic stations to be known to western scientists, except for a network of 24 stations. A consequence of these restrictions is that the locations of earthquakes in China are not as well known as they would be if more data were made generally available.

Existing Infrastructure for International Exchange of Seismic Data20

As noted in the preceding section, knowledge of earthquake hazards can advance as a result of unrestricted sharing of seismic data, including seismic station information, bulletins, and waveforms. The infrastructure to arrange for and carry out international data exchange has existed and been used successfully for many years. The International Association for Seismology and the Physics of the Earth’s Interior (IASPEI) includes commissions to discuss specific arrangements and to establish data format standards. The International Seismological Centre (ISC) collects, merges, and redistributes seismic bulletin data. The Federation of Digital Seismic Networks (FDSN) helps broadband seismic networks to coordinate their activities. The United States and China help to fund and participate in each of these organizations, but the amount of data that China has shared lags far behind that of most other countries with similarly extensive earthquake monitoring.

It is useful to have data from more stations for a number of reasons in addition to the ones outlined in the preceding section. Seismology is advancing beyond computing “formal” estimates of uncertainty that are based on questionable statistical assumptions. To estimate “absolute” location uncertainty, seismologists must calibrate new techniques using very accurate locations. Many seismologists accept a location computed by a local network as absolutely accurate within 5 kilometers only if it is computed from arrival times at stations at least one of which is within 30 kilometers of the earthquake; at least ten of which are within 250 kilometers of the earthquake; and with a very good distribution around the earthquake.

Local networks may be reconfigured frequently to meet new needs, so using their data depends on detailed knowledge of the network. Thus, seismologists recognize that there may be advantages from arranging some exchanges of local network data individually.

Regional networks cover a broader area but with somewhat more sparsely distributed stations. Regional networks usually have a more standardized configuration, so data from them can be used with greater confidence by more seismologists. Many seismologists accept a location computed by a regional network as absolutely accurate within 20 kilometers if it is computed from arrival times at seismic stations that are all within 1,000 kilometers of the earthquake and have very good distribution around the earthquake.

20

Based on a presentation by Raymond J. Willemann, GEM Technologies, United States, available at http://www7.nationalacademies.org/usnc-codata/RaymondWillemannPresentation.ppt.

Achieving good distribution around an earthquake usually requires at least ten seismic stations. Because China is so large, several hundred stations are needed to obtain a good distribution of regional stations around all of the earthquakes in China.
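
The acceptance criteria for local and regional network locations described above can be sketched as a simple check. This is a minimal illustration, not a seismological tool: the text does not quantify "very good distribution," so the sketch assumes it can be approximated by a maximum azimuthal gap between neighboring stations, and the 120-degree default is a purely hypothetical threshold. The names `Station`, `max_azimuthal_gap`, and `accepted_accuracy_km` are likewise illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    distance_km: float  # distance from the earthquake to the station
    azimuth_deg: float  # direction from the earthquake to the station


def max_azimuthal_gap(stations: List[Station]) -> float:
    """Largest angular gap (degrees) between neighboring station azimuths."""
    if not stations:
        return 360.0
    az = sorted(s.azimuth_deg % 360.0 for s in stations)
    gaps = [b - a for a, b in zip(az, az[1:])]
    gaps.append(360.0 - az[-1] + az[0])  # wrap-around gap
    return max(gaps)


def accepted_accuracy_km(stations: List[Station],
                         max_gap_deg: float = 120.0) -> Optional[float]:
    """Return 5.0 or 20.0 (kilometers) if the criteria in the text are met.

    ASSUMPTION: "very good distribution" is modeled as an azimuthal gap no
    larger than max_gap_deg; the default value is hypothetical.
    """
    if max_azimuthal_gap(stations) > max_gap_deg:
        return None
    within_30 = sum(s.distance_km <= 30.0 for s in stations)
    within_250 = sum(s.distance_km <= 250.0 for s in stations)
    # Local criterion: at least one station within 30 km and ten within 250 km.
    if within_30 >= 1 and within_250 >= 10:
        return 5.0
    # Regional criterion: all stations within 1,000 km.
    if all(s.distance_km <= 1000.0 for s in stations):
        return 20.0
    return None
```

A location that satisfies neither criterion returns `None`, meaning only that it cannot be accepted at either accuracy level under these assumptions.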

Two recent changes offer an opportunity to improve the international contributions of seismic data from China. First, at its 2003 General Assembly in Sapporo, Japan, IASPEI passed a resolution urging all seismic networks to share information about all seismic stations. The resolution states:

RECOGNIZING the need to accurately locate earthquakes and determine earthquake size, and compile complete earthquake bulletins,

URGES all operators of seismic stations and networks to deposit unique station codes with the international registry maintained by the International Seismological Centre and by the World Data Centre for Seismology, Denver, and to freely share the coordinates of all seismic stations,

URGES all operators of seismic stations and networks to keep accurate record of instrument response and performance.

China might respond to this IASPEI resolution by starting to send the ISC a computer-readable bulletin from the China Earthquake Administration that is complete for a network of several hundred seismic stations in China. Second, the FDSN has broadened the definition of membership to include many more regional and local networks of broadband seismic stations. In response, numerous provincial networks in China might join the FDSN. This would provide the networks an opportunity to contribute data from selected stations to the FDSN archive and access to software and other assistance from the FDSN to establish their own data centers to distribute their own data on the Internet.

Digital Fujian21

The digital Earth has the allure of a diamond; people from different angles all can appreciate its prism and reflection. Consistent with the concept of digital Earth, thinking globally and acting locally has become the approach for organizing the regional e-government information sharing program in Fujian Province. The governor of Fujian Province formed the Digital Fujian program to promote regional modernization during the 2001-2005 time period. The program includes the establishment of an information network, the creation of data resources (a data warehouse with metadata, data processing, modeling, and other related functions), the formation of data standards and a data sharing policy, the training of qualified technicians, and the funding of an information applications infrastructure.

21

Based on a presentation by Qinmin Wang, Department of Science and Technology, Fujian Province, available at http://www7.nationalacademies.org/usnc-codata/Wang_qinmin_Presentation.ppt.

After three years of development, the Digital Fujian program has successfully established the e-government information networking and data sharing platform between the provincial government and the city and county government departments. It now includes approximately one terabyte of standardized data resources among 21 government agencies, 9 information application systems, 20 government information application projects, a provincial information technology technician training center, and a set of information sharing standards and regulations. Another key technical issue of the program is how to extract useful information from the huge Earth observation satellite databases. Therefore, data mining, data analysis, data integration, information extraction, information presentation, and intelligent decision making are also important parts of the program.

Local and Regional Earth System Science Applications and Associated Infrastructure: The Mid-Atlantic Geospatial Information Consortium22

The Mid-Atlantic Geospatial Information Consortium (MAGIC) is a federated consortium of universities in that region of the United States. Its mission is to develop a distributed remote sensing, applications, geospatial data and information system, serving a variety of users at the local, state, and regional levels. It extends the use of National Aeronautics and Space Administration (NASA) data. It focuses on the use of such data for NASA priorities and on the dissemination of such data through interoperable information systems that are coupled to NASA’s systems and that promote open solutions and standards. It prototypes applications for regional effects of climate phenomena and change, land use, pollution mitigation, agriculture, health, wetlands, and forestry, coupling to NASA priorities.

22

Based on a presentation by Menas Kafatos, George Mason University, United States, available at http://www7.nationalacademies.org/usnc-codata/MenasKafatosPresentation.ppt.

There are many factors related to providing open data access that must be considered by MAGIC, including the following:

  • A federated data and information management system. Support depends on individual partners and changes are not reflected at all sites.

  • Standards. MAGIC uses standards developed by the Federal Geographic Data Committee, WebGIS (Geographic Information Systems), and the Web Mapping Service.

  • Diversity of users. MAGIC serves users at the local, state, regional, and federal levels, and from the private and public sectors. These users have different needs, different expertise, and different infrastructure.

  • Large number of data types. There are over 2,000 databases with many different types of data (e.g., GIS, swath, gridded, socioeconomic). These data sources require different tools and metadata to make them useful and interoperable with federal government agencies and across all other MAGIC sources. Help desks are needed for user support.

  • Infrastructure. High-speed network access is required, but not all partners have the same access capabilities.

  • Funding. The NASA Earth System Science Applications program provided initial funding. Funding priorities change, however, and the users are a mix of public- and private-sector entities. Ultimately, the funding sources should reflect this diversity of users and include the state government and industry.

  • Training. Remote-sensing and other geospatial data are difficult to use. Users therefore frequently require training, including in the application of geographic information systems to remote-sensing data.

  • Free versus proprietary data. NASA Earth System Science data are freely and openly available. Other government data (e.g., from the U.S. Geological Survey and the National Oceanic and Atmospheric Administration) are low-cost and unrestricted. Local and state data are not free, and commercial private-sector data are proprietary. An important question is, how can a federated, Web-based system make access free to different users? This key question is still being addressed.
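
As an illustration of the kind of interoperability the standards item above refers to, the following sketch builds a Web Map Service (WMS) version 1.1.1 GetMap request URL. It is not MAGIC's implementation; the function name `build_getmap_url`, the endpoint, and the layer names in the usage note are hypothetical placeholders.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_getmap_url(base_url: str,
                     layers: Sequence[str],
                     bbox: Sequence[float],
                     width: int,
                     height: int,
                     srs: str = "EPSG:4326",
                     image_format: str = "image/png") -> str:
    """Build a WMS 1.1.1 GetMap URL (base_url is a hypothetical endpoint).

    bbox is (min_x, min_y, max_x, max_y) in the units of the chosen SRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)
```

A call such as `build_getmap_url("https://example.org/wms", ["landcover", "roads"], (-80.0, 36.0, -74.0, 40.0), 600, 400)` would produce a request that any client following the same standard could issue, which is the practical benefit of agreeing on such interfaces across federated partners.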

THEMATIC ISSUES IN SCIENTIFIC INFORMATION, JOURNALS, AND DIGITAL LIBRARIES

Policies and Mechanisms for Literature Resource Sharing—The Practice of the Chinese National Scientific and Technical Library23

Information resource sharing benefits society and the economy by the spread of knowledge at a low cost. Intellectual property rights (IPRs) stimulate and protect creative innovation, but restrict the public diffusion and application of knowledge. They are thus positive on the one hand, but negative on the other. Enhancing the resource sharing requirement may limit the interests and benefits of the IPR owner, whereas increasing the IPRs of the owner may limit the benefits that society could gain from resource sharing. A proper balance between the two needs to be found to maximize the social benefits. The policy and legal system is facing an increasing challenge in reconciling these opposing effects.

Information resource sharing is a powerful instrument for reducing the digital divide, but the threshold for doing it is quite high due to the price of literature. At the same time, the strengthening of IPRs is the current trend in international law and trade. This is related to the fact that the main exporters of knowledge and information products are in developed countries. For developing economies, the level of IPR protection should be synchronized with the level of development of the economic and legal system. Setting the protection standards too high can be harmful. There are choices and tradeoffs in enacting these policies.

If the rigorous application of copyright law requires payment for every drop of knowledge, then the capacity of public libraries will be severely limited by the shortage of funds. The urgent problem is how to observe international IPR regulations while also building scientific information resource capacity and meeting the demands of social progress.

The Practice of the National Scientific and Technical Library of China

The National Scientific and Technical Library (NSTL) of China is a virtual scientific literature service center initiated in June 2000 by the MoST and the Ministry of Finance in cooperation with five other ministries, and approved by the State Council. The NSTL is composed of seven member libraries: the Center for Literature and Information of the Chinese Academy of Sciences, the China S&T Information Institute, the Machinery Industry Information Institute, the Metallurgy Industry Information Standard Institute, the Chemical Industry Information Center, the Literature Center of the Chinese Academy for Agricultural Science, and the Information Center of the Chinese Academy for Medical Science. The NSTL office is in charge of coordinating and managing the services.

23

Based on a keynote presentation by Qiheng Hu, Vice President, Chinese Association for Science and Technology, available at http://www7.nationalacademies.org/usnc-codata/Hu_Qiheng_Presentation.ppt.

The principles for operation of the NSTL are: “unified purchasing, normalized processing, combined networking, and resource sharing.” The main goals are to:

  • Build and share the scientific literature and information resources among all members through a convenient network so that this virtual library can provide better service to the research community in the areas of basic science, engineering, agriculture, and medicine;

  • Develop a high-level scientific literature collection and service center;

  • Demonstrate effective applications of information technologies in scientific literature and information services;

  • Become a pivotal force in cooperation with the broader Chinese library system and the leader in the scientific library system of China;

  • Play a major role in exchanges with the international library community; and

  • Establish the information resource base for research, training, and the popularization of scientific education.

The responsibilities of NSTL are diverse, consistent with its main goals. These responsibilities include: planning and funding of comparatively complete literature resource acquisitions; coordination of the literature collection to avoid unnecessary redundancies; formulation of standards, norms, and formats for the unified database; providing service to users throughout the country with the NSTL network service platform; development and application of in-depth resources; and domestic and international exchanges and collaborations.

The responsibilities of the NSTL are directed by a 19-member Council, which is the decision-making and leading body, and also represents NSTL members and users. The NSTL Director is appointed by the Council. The Director General is in charge of operations and hires the office staff. The Council and the Director are advised by two advisory groups, the Expert Advisory Committee for Literature Resource Construction and the Expert Advisory Committee on Computer-Network Construction.

The NSTL has made significant progress in several areas, three of which are highlighted here. First, it has increased the total number and variety of the scientific literature resources in China. The investments in scientific literature have been used more effectively since the NSTL member libraries stopped the counterproductive duplication and competition in purchasing international scientific periodicals as a result of NSTL planning. Prior to 2000, there were only about 3,000 English-language scientific journals and periodicals in the NSTL member libraries. The NSTL increased the total number to 10,653 in 2000. By 2003, the NSTL holdings of journals and periodicals in English numbered approximately 13,500 with an additional 5,000 conference proceedings and handbooks in English. Also in English are 15 kinds of network edition periodicals and 65 kinds (physics and chemistry) of network edition subscriptions with access controls. The NSTL also has a rich variety of literature in the Chinese language, including more than 4,000 kinds of journals and periodicals, over 22,000 conference proceedings, and 470,000 theses for Masters and PhD degrees.

Second, NSTL has constructed a 1 gigabit/second broadband network linking the NSTL’s seven member libraries, thereby making the separate libraries a united library with digitized resources in the networked environment. The NSTL service network is also linked with the National Library, the China Education and Research Network, and the China Scientific and Technical Network via 100 megabit/second connections.

The third major area of NSTL’s progress is the rapid growth of online digitized resources and related services. In 2000, before the launch of the NSTL network, there were 1.7 million information items of online data. This increased to 27 million items in 39 databases by December 2003. The NSTL provides many services, including literature searches, full-text provision, periodicals cataloging, common directory query, full-text database access for the network-edition literature, literature directory database searching, expert consulting and information services, an online resources search engine, a preprints system in Chinese, and a portal for preprints in English. Services that are provided to users via the Internet include free literature searching and online payment for services (around 44 percent of payable services are paid online). Requests for full-text provision are processed within two working days.

All users can access and download for free the 15 network-edition periodicals in English for which NSTL buys access. The other 65 kinds of periodicals on physics and chemistry are available for 80 universities and research institutes according to the subscription license, based on access controls. Over 99 percent of the periodicals in English are accessed each year.

Despite these accomplishments, the NSTL, like all public libraries, continues to face a serious problem. The benefit of using information technology to decrease the costs of knowledge sharing is greatly counteracted by the rising costs of the imported information resources. From the experience of the NSTL, the network edition’s cost is typically very high and is proportionate to the scale of information sharing. There may be no other choice than to go back to printed editions for such high-cost imported information. Does this sound like progress in the age of digital networks?

Perspectives on the Future of the Library and on the Economics of Open Access24

The big question facing the circulation of scientific information is: What is going to become of the emerging open-access movement? There are growing numbers of arguments in support of increasing access to research through a variety of open-access models. The case for increasing access has components that can be roughly categorized as epistemological, historical, developmental, political, and public—as well as economic and legal, which are discussed elsewhere—and is directed primarily toward faculty members, students, librarians, policy makers, and the public.

The epistemological argument for open access, for example, has to do with how dependent a knowledge claim is on being fully open to review and critique. Anything that unduly restricts the circulation of knowledge, especially among “legitimate” participants in its construction, reduces that body of knowledge’s claims to validity and reliability. If the current subscription publishing model can be shown to contribute to suboptimal levels of access, then that model is not what we might call epistemologically conducive to the development of knowledge.

The concern with exploring new publishing models leads to the historical argument, which draws on precedents from an earlier era of publishing innovation, using Isaac Newton as a leading instance. Newton is well known for being a highly secretive scientist and a reluctant author. Nevertheless, after he had tracked for only a few years the birth and emergence of the scientific periodical, with the launching of Philosophical Transactions in 1665, he understood that this publicly circulated, relatively inexpensive 16-page journal represented something important to science. He allowed one of his letters, on optics, to be published in that journal. It was a move he came to regret and did not do again, but this early experience in open access went a long way in shaping what became the norms of the scientific article.

24

Based on a presentation by John Willinsky, University of British Columbia, Canada.

The developmental (or developing countries) argument for increasing access has everything to do with the parallel development in China of both economic growth and scientific papers published, with an increase by a factor of 10 since the 1980s. Developing countries are suffering a knowledge gap, even as their university population grows. The universities in the West are contributing to that gap, as their work becomes increasingly expensive to access (with some generous exceptions negotiated with publishers by the International Network for the Availability of Scientific Publications and some other organizations).

The political and public arguments for open access to research are about people’s basic right to know, especially in matters of publicly funded research and scholarship. The value of exercising that right is affirmed by the health revolution brought about by public access to medical information. Public access to research also speaks to greater accountability demanded of professionals (e.g., physicians, educators, lawyers) and the increasing role of interest groups in selectively presenting information to the public, against which full access to the research would act as a safeguard.

An Open-Access Future25

The open-access movement has gained momentum over the past several years, with increased visibility and recognition from the various stakeholder communities, including research and publishing communities. Since the Budapest Open Access Initiative26 began collecting signatures in February 2002, more than 3,500 individuals and organizations have signed on with their support for free access to information. The Directory of Open Access Journals27 at Lund University, which contained over 1,100 journals as of June 2004, has announced the launch of its second phase, allowing sophisticated searching of the full text of the Directory’s articles. The U.K.-based open-access publisher BioMed Central now publishes over 100 open-access journals. Since its launch in October 2003, PLoS Biology28, the first peer-reviewed, open-access journal of the Public Library of Science (PLoS), has demonstrated remarkable strength as a competitive new journal: submissions are increasing; readership, measured as visits to the site and as downloads of individual articles, is growing; and PLoS Biology’s reputation as a high-quality, peer-reviewed journal is improving among scientists, publishers, librarians, and other stakeholder groups.

25

Based on a presentation by Helen Doyle, Public Library of Science, United States.

26

See http://www.soros.org/openaccess/.

27

See http://www.doaj.org/.

In addition to a transformation of the economics of scientific publishing, open-access publishing also represents a modernization of traditional copyright laws that are based on an outdated print-based economic model. In the open-access definitions used in both the Bethesda Principles29 and the Berlin Declaration,30 an open-access article can be reused and redistributed freely and without permission from the publisher, for any responsible purpose. Authors retain their copyright. In the case of PLoS journals, the copyright license is the Creative Commons31 “attribution license,” which preserves the author’s right to be acknowledged for the original work.

Many journals that are labeled open access are in fact free access, meaning that the restrictions on use and distribution are the same as for many subscription-based journals. It is worth noting that researchers themselves virtually never benefit financially from publication of their peer-reviewed articles. Several recent policies that may appear to be a liberalization of subscription policies are in fact small concessions to the growing demand for greater access from the scientific community that produces the articles, concessions made at little economic risk to the publishers.

The sharing of data, reagents, and ideas is fundamental to the scientific process itself. Open-access publishing, including both the unfettered distribution and searching afforded by online free access and the unlimited creative reuses permitted by less restrictive copyright licenses, will facilitate the advance of science and medicine.

Other Opportunities in the Changing Information Environment32

It has been recognized that research and access to research results play a vital role in the development of all countries, and that transitional and developing countries often cannot obtain and make use of research that would benefit them. In response to this “information divide,” the International Network for the Availability of Scientific Publications (INASP) was established in 1992 as a program of ICSU to provide support to networking and partnerships with the aim of bridging the increasing information divide between the developed and developing world. It now operates a range of programs to support access to information for researchers, health workers, and rural development experts. Access to research information is facilitated through its Programme for the Enhancement of Research Information (PERI). This program not only provides access to international research, but facilitates activities to support national publications to increase their visibility and long-term sustainability. Research information is provided in many different forms (e.g., datasets, publications), but the scholarly journal remains one of the prime vehicles for accrediting and disseminating research information. Of course, both national and international information is of importance to research. PERI provides support for (1) access to global information, (2) increased visibility for local publications, (3) training in the use and management of online information, (4) support for publishers and editors, and (5) research and networking support. INASP negotiates access to as many required resources as possible with content owners and publishers. The exact cost of each resource is related to the GDP of the country, and although many of the resources are available without cost as part of PERI, others are obtainable at up to 98 percent discount on the normal subscription rates. 
Nearly all of the resources are available on a countrywide license basis, meaning that anyone in an educational, research, or nonprofit environment is eligible to access them.

The Journals OnLine (JOL) project supports a methodology to enable national publications to have an online presence, to increase their visibility, and to promote communication with readers and authors. It has been particularly successful in Africa with the African Journals Online (AJOL) service that now includes 184 journals, and also supports full-text, online publishing. This service has been operational since 1998 and was recently re-launched on a new platform. The new software enables individual publishers to load, edit, and correct their own content to further support development and use of online communication. The JOL methodology and software are available for other countries and regions to adopt for their own publications.

32

Based on a presentation by Pippa Smart, International Network for the Availability of Scientific Publications, United Kingdom, available at http://www7.nationalacademies.org/usnc-codata/PippaSmartPresentation.ppt.

Of course, there continue to be many challenges for information access, and INASP is continually updating its activities to respond to requests from partners and to increase its capabilities to develop sustainable methodologies.

Scientific Information and Digital Libraries: Can Developing Countries Become Key Players in the Information Society?33

In most developing countries, politics rather than markets drive knowledge diffusion. Whether new technology is adopted or not by a country is largely determined by political will, its patterns of power and influence, and resource-allocation policies. Politics are thus a decisive variable driving the transition of developing countries to the information society.

In addition to the political hurdles in most developing countries, there are many other factors working against preservation of and open access to scientific information in developing countries. These factors may be characterized as institutional, economic, and social, although many of them intersect and are difficult to separate out.

Generally speaking, the institutional culture creates barriers to the creation and diffusion of knowledge. For example, government officials tend to believe that access to knowledge is automatic once connectivity to digital networks is ensured. There is no institutional framework for data preservation mechanisms (e.g., the demise of the South African Data Archive). Moreover, there is no legislative mandate that obliges publicly funded researchers to make their research findings publicly accessible. The lack of comprehensive strategies and policies for data management is a major barrier to open access in developing countries.

From an economic standpoint, a large percentage of the population in developing countries currently falls below the poverty line. Income distribution remains highly unequal, and poverty and inequality continue to exhibit strong geographic and racial biases. Public funding for the development and connectivity of information and communication technologies (ICTs), from both national and global sources, continues to be inadequate. The resulting lack of access to ICTs further restricts the potential use of even the information that is otherwise openly accessible online.

33

Based on a presentation by Lulama Makhubela, National Development Agency, South Africa, available at http://www7.nationalacademies.org/usnc-codata/LulamaPresentation.ppt.

The social barriers are no less daunting. Human resources in the scientific and technical domains are not adequately developed. Moreover, there is a “brain drain” of highly trained scientists and professionals from the developing to the more developed countries. The organizational complexity of scientific communities and the esoteric language of scholarly journals create further roadblocks to the broad transfer of knowledge. Within the broader society, the reading environment is poorly developed. Only a small percentage of the population uses library facilities, and most rural areas do not have libraries. Educational gaps mean that access to information does not necessarily result in access to knowledge. The simple availability of information technologies has been shown to be insufficient to guarantee proper diffusion of scientific knowledge across society. Achieving mastery of technological change in the economy and society remains elusive.

Some actions that may be considered to help ameliorate the barriers outlined above include the following:

  • There must be a greater effort to develop human resources in research, especially in national tertiary education programs, and to connect research more effectively with productive sectors and the specific needs of society. The curricula for research professionals should include digital data and information management principles and techniques.

  • The adoption of rigorous and more collaborative approaches in addressing the lack of leadership and professional management, which has resulted in the poor implementation of data preservation in many areas, is vital. This needs to be coupled with a greater effort on developing a better understanding of the value of data and information preservation for future access among working scientists.

  • A strategic approach is also needed to leverage resources and maximize effectiveness by increased collaboration among developing countries, and to document and disseminate best practices. This goal can be promoted by forming data archiving groups at different geographic levels to overcome the isolation of individual data archivists and promote beneficial exchanges. This will help harness the human intellectual capacity with collective expertise in issues of data preservation and open access. Cooperation should be encouraged for richer quality and more impact, rather than merely for the sake of cooperation.

  • At the broader social level it is important to promote the use of scientific knowledge and of information technologies among all population strata, as well as to reach out to the literate poor. Investments in technological innovations deserve high priority because in some cases they can overcome the constraints of low incomes and weak institutions.

In conclusion, the discourse on scientific data and information resources and on digital libraries needs to be understood in the political, economic, and social context of developing countries. The future global information society may be one of widespread and beneficial international collaboration, or one of highly stratified access to knowledge. In order for the first scenario to prevail, the external barriers to access to knowledge need to be reduced, and the perverse internal dynamics preventing many developing countries from joining the global scientific community as active participants must be changed. A search for effective strategies for preservation of and open access to scientific information in developing countries will remain utopian unless those countries themselves become actively integrated in the broader information society.
