Presentations on China’s Scientific Data Sharing Policy and Project
THE DEVELOPMENT OF CHINA’S SCIENTIFIC DATA SHARING POLICY
Jinpei Cheng
Vice Minister, Ministry of Science and Technology of China
At the present rapid rate of advance in modern science and technology, many areas, such as cosmology, earth system science, cognitive science, and nonlinear science, are becoming the new scientific frontiers. The integration of information science, bioscience, and materials science, as well as the interaction between natural sciences and social sciences, marks our entrance into a decade of unprecedented intensive knowledge and innovation.
Since opening to the outside world, the Chinese government has attached much importance to the development of science and technology, as evidenced by the national strategies of “National Renewal through Science and Education” and “Realization of a Prosperous China through Human Education.” The Chinese government recognizes that knowledge and innovation are essential elements for realizing the goals outlined in these strategies, and the key components for obtaining and fostering national and international competitiveness. In recent years, China has achieved a series of advances in science and technology, including manned space flight, super-hybridized rice breeding, and supercomputer research and development. These achievements highlight China’s capacity for innovation. Science and technology are playing important roles in agricultural advancement, industrial technology upgrades, socially sustainable development, and the evolution of China’s advanced-technology industries.
In 2002, the Chinese government established a new initiative with the central objective of building an affluent society throughout the country. In 2003, the Third Plenary Session of the 16th Central Committee of the Chinese Communist Party formulated and adopted five scientific development goals that collectively direct the national economic and social development. A large gap still exists, however, between the requirements for economic and social development in China and the capacity of its science and technology to meet these requirements. For example, insufficient investment in science and technology infrastructure, lack of world-class research teams, and an outdated research management system greatly constrain China’s innovation and international competitive ability in science and technology. Among these constraining factors, the inefficient use of scientific and technological resources and the repetitiveness and duplication of research efforts have been prominent bottlenecks to China’s advancement in innovation.
In 2002, the State Council authorized the Ministry of Science and Technology (MoST) to initiate a pilot project of the national science and technology infrastructure platform in coordination with 16 other ministries and departments. Based on reforms that strengthen data sharing and scientific resource system integration, this project focuses on increasing China’s international competitiveness and science and technology innovation potential. China will use modern information technology and international resources to construct the public, fundamental, and strategic science and technology infrastructure platform. The primary objective is to create an environment that fosters scientific and technological innovation by providing the necessary infrastructure that best enables advancements in science and technology, and that best supports long-term developments that are the cornerstones of discovery and innovation. The sharing of scientific data is the core component in this project.
MoST regards scientific data sharing as a national science and technology infrastructure platform, considers it of national interest, and treats it as an important research component based on several factors. Scientific data are the most active and innovative resource in the information era. Scientific data have remarkable research and application possibilities, and decision-making potential. They are fundamental for meeting the needs for the advancement and innovation of science and technology, and for social development, economic growth, and national security. The value of scientific data resources is strongly correlated with their sharing characteristics in two respects. First, scientific research requires the use and sharing of data, information, and knowledge from other pertinent disciplines to make innovative advancements most efficiently and effectively. Solutions to world problems require an interdisciplinary approach. Second, the envisioned Chinese scientific data sharing platform would facilitate simultaneous and unlimited copying and uses of data. The scientific data-sharing platform will thus be based on the principle that all data be fully shared to facilitate the use of the data for fully realizing their maximum value.
Scientific data are derived from scientific and technological activities such as observation, monitoring, investigation, experiments, and research analysis in various organizations and institutions. The types of scientific data include numerical, spatial, graphical, and text data, which are complex, widely distributed, in multiple formats, and massive in scope.
Scientific data are a knowledge resource for the whole society, so China should manage the data sharing to best serve the entire country. China’s massive data holdings are obtained and accumulated as a result of national investment plans. As such, they are a national asset and resource shared by the whole society. The capital accumulated by the taxpayers is used to obtain scientific data. Therefore, the taxpayers should have the right to access and share the data resources, which means that the producer of public data must distribute the data to the public and serve the whole country and society. It is on this basis that the data should be shared.
The insufficient use of China’s massive data holdings has been an urgent problem. Over time, the Chinese government has organized different observations, surveys, and experiments in many scientific fields, and accumulated large quantities of scientific data. However, most of these data are not shared or used efficiently because of the lack of a policy of openness and the lack of a mechanism for sharing the data. This results in great waste since the national investment is used primarily for repeatedly collecting basic data. For example, there are 18 data receivers in the United States for the moderate resolution imaging spectroradiometer sensors on National Aeronautics and Space Administration (NASA) Earth Observing Satellites, while in China there are more than 30 stations, and their number continues to increase. Overall, the existing 5,000 to 6,000 scientific databases in China do not really support the development of the country and society. One can see that the problem of not sharing scientific data is pervasive because of policy, management, technology, and other factors. As a result, it is difficult for scientists to obtain the needed research information, and precious national resources are wasted due to repeated and redundant collection. Therefore, scientific data sharing and the means for China to use its resources more effectively are essential.
The development of international scientific data sharing policies has provided good references for the Chinese data sharing project. In recent years, many developed countries have carried out scientific data sharing activities. For example, in the United States NASA established distributed active archive centers for earth and environmental data, the White House developed an open data management policy for global change research in 1991, and the federal government has adopted other regulations and policies concerning data management, many of them since 1990. Some international scientific organizations have also strengthened the work of data exchange and sharing. For example, the World Meteorological Organization manages a data exchange system of global meteorological data. There is also the Global Disaster Information Network, and many other global data sharing projects have been established. These examples all provide approaches that China can adopt in carrying out its own scientific data sharing.
The quick growth of China’s massive data holdings and the development of information technology have provided support for scientific data sharing. According to the statistics for the past 30 years, rapid developments of science and technology in the world have produced massive scientific data, much more than the data produced during all previous history. China has entered a new phase, with its national information infrastructure and information superhighway developing quickly. A series of networks is being established successively (e.g., the Science and Technology Network, Education and Science Network, Gold Bridge Network, Chinese Public Network, and the new broadband networks). These can all help to ensure that the scientific data are extensively, conveniently, and quickly shared. Implementing scientific data sharing and creating a new order of openness and distribution for scientific data is an urgent requirement not only for the scientific and technological innovation system but also for promoting China’s participation in the global economy.
In the process of implementing its scientific data sharing policy, China must consider national strategic requirements and the world’s technological evolution. There are five major steps that need to be emphasized as China moves forward on its data sharing strategy.
1. Change and enhance perspectives, and advocate resource sharing. For a long time, the attitude that public scientific data in China are the exclusive private possessions of individuals and departments has made the development of scientific data exchange and sharing difficult. In order to raise the awareness of scientific data sharing and to enhance its effectiveness, China must break the “information exchange barrier”; that is, change the traditional view that the scientific data resources are private property, and foster a new culture of scientific resource sharing.
Chinese researchers need to recognize that public scientific data are national resources that no organization or individual is allowed to keep privately. Public scientific data are derived from observations and experiments that are established through a national investment, as a national resource. This call for open scientific data sharing and reinforcement of the national interest subordinates the individual’s interest to the national interest, facilitates the active circulation of scientific data among science departments, and develops its substantial value in the economy, society, and research.
China should encourage organizations to devote their efforts toward scientific data sharing through improved data management systems and mechanisms. MoST has established special funds to facilitate these developments of data sharing. At the same time, MoST also has created and implemented an evaluation policy for scientific data sharing to help ensure that data sharing develops in an orderly manner.
Through the demonstration of data sharing benefits, people gradually will be encouraged to change their traditional views and to make scientific data sharing part of their research activity. Some organizations that have initiated data sharing have provided scientific data conveniently and quickly to many users, which helps to change the data users’ and the data managers’ thinking and approach. This encourages more organizations and individuals to adopt a data sharing ethic and practice.
2. Plan and develop the scientific data sharing policy and system. In order to ensure the implementation of a scientific data sharing project, it is necessary to develop and integrate a national policy and operational system. “Sharing” requires a management system to regulate and ensure data exchange; to streamline scientific data linkages among data collection, integration, and application; and to promote the relationships between data owners, administrators, and users to ensure data sharing progress.
Scientific data sharing is not limited to one type of industry, department, or person; it needs the cooperation of the state and the whole society. The adoption of new laws can aid the national-level policy and regulation to help normalize the various social relationships associated with data sharing. Under a sound legal system, scientific data resources can be valued fully and serve scientific innovation and national development.
3. Strengthen the establishment of scientific data sharing standards. In order to share scientific data fully, a standard approach must be established. This should include establishing the technological system for the scientific data sharing platform, enacting standards for scientific data sharing and distribution, classifying data into different categories for different users, adopting the data distribution policy, and guiding data integration and communication. During the establishment of such a standard approach, it is necessary to import and use the related standards from other countries, as appropriate. It also is important to emphasize the combination of basic and universal standards, and their application.
4. Construct a service system for national scientific data sharing. It also is a very important task to implement scientific data sharing for existing data resources and for all new scientific data. Scientific data resources that are derived from government scientific and technological programs and distributed among research groups and individuals should be organized and shared efficiently.
Scientific data centers typically need to be organized at the department or organizational level. MoST has chosen some initial departments and organizations, such as resources and environment, medicine and sanitation, and agriculture, and established related data centers and networks. This preliminary work can demonstrate successful experiences and promote the development of scientific data sharing platforms.
5. Strengthen cooperation and improve the sharing of global data resources. International cooperation is essential for data resource sharing. Many important and complex scientific problems cannot be solved by only one country. Researchers need to share scientific data resources globally. China is reforming and opening, learning from other advanced experiences abroad, and encouraging its scientists to show their research results to the world. China also believes that the challenges it addresses can serve as an example for other countries.
MoST has signed bilateral and multilateral treaties on scientific and technological cooperation with more than 100 countries. On the basis of these treaties, the Chinese government is developing activities to encourage its scientists to cooperate with international scientific organizations and to take active part in data exchanges with international data centers. China also welcomes other international scientists to China to establish long-term cooperation in the scientific data and information fields.
The world’s scientific community is rapidly entering into a new era dominated by digital technologies. Scientific data sharing, which is a basic strategy for this coming decade, can strongly improve worldwide scientific and technological abilities with the combined efforts of international organizations, national governments, and individual scientists and engineers. A new knowledge society based on data and information sharing will be created soon.
INTRODUCTION TO THE CHINA SCIENTIFIC DATA SHARING PROJECT
Xian’en Zhang
Director General of the Division of Basic Research, Ministry of Science and Technology of China¹
Scientific data have become a new resource in the information era and will play a key role in the process of scientific and technological innovation. More reliable, comprehensive, and richer scientific data mean more opportunities for original innovation. The fact that innovations in science and technology are, for the most part, interdisciplinary indicates that such collaboration is the future of science. Successful research depends on open access to data, information, and knowledge from various fields to the greatest extent possible.
The current status for scientific data sharing is far from meeting the demands of China’s scientific and technical development, its emerging economy, and growing national power. The scarcity of scientific and technical data resources has become a major obstacle for innovation. The existing, limited resources cannot be used fully because of the outdated research management system. Funding also is not sufficient, and the supporting funding system needs to be improved immediately.
¹ Based on a presentation available at http://www7.nationalacademies.org/usnc-codata/Zhang_Xianen_Presentation.ppt.
The China Scientific Data Sharing Project (China-SDSP) was launched in 2002 in response to this situation and to meet China’s need for sustainable development based on science and technology in the information society. China-SDSP is a part of the National Facility Information Infrastructure for Science and Technology. This presentation provides a brief introduction to China-SDSP activities.
General Considerations and Objectives of China-SDSP
The major objective of China-SDSP is to establish a data sharing architecture that facilitates the use of scientific data by establishing laws, policies, and standards that are supportive of scientific data preservation and sharing and taking advantage of information and communication technology. China-SDSP is being developed through comprehensive planning at the national level; it is collecting and reorganizing data from government agencies, institutes, programs, and individual investigators while making full use of international scientific data resources through cooperation. China-SDSP should make all these data accessible to all interested users at an affordable cost, or free if possible. By utilizing modern information technology, integrating scientific data resources from all kinds of departments, establishing a data sharing policy, and improving management, scientific data resources can be integrated into a uniform framework of national scientific data sharing management. The methodology of the China-SDSP is to build up a data center cluster and a sharing-oriented network, with the goal of forming a multi-faceted, distributed scientific data sharing system that bridges the gaps between different agencies, institutes, and geographical regions. By 2020, the project is expected to achieve the following goals: to form a scientific data management and sharing system that is more user-friendly; to develop a set of supportive laws, policies, and standards; and to establish a core base of data service professionals through a career reward mechanism. By that time, 80 percent of scientific data funded by the government is expected to be made available to the general public.
Framework Architecture of China-SDSP
Logical Framework of the Project
China-SDSP has three major elements: master databases, scientific data centers and networks, and a Gateway Web site.
Master Databases. A major part of the development of the master database component is to reorganize the existing databases. Master databases disseminate authoritative and reliable data to users. The “master database” designation normally corresponds with some specific academic discipline, and contains and describes data that are important for innovations and the advance of science. Approximately 300 such databases have already been identified.
Scientific Data Centers and Networks. As with the development of the master databases, the establishment of scientific data centers and networks is also a result of re-organizing existing data centers and networks. Priority will be given to those centers or networks that have stable data sources and well-planned data archiving in fields such as natural resources and the environment.
Gateway Web Site. The Gateway Web site will link the master databases to all of the other information in every data center. It will deliver services, including content exchange, metadata, and information distribution in order to provide one-stop querying and information dissemination. Figure 1 provides a schematic diagram of the Gateway Web site.
Scope of Data Sharing Supported by China-SDSP
China-SDSP also functions as a catalyst. Its original purpose has been to integrate publicly funded data resources, but its long-term goal is to leverage all possible data resources from government to the private sectors, and make them available to the general public.
The data flow is illustrated in Figure 2. The data may come from three types of sources: large programs funded by the central government, programs funded by other agencies and institutes, and international cooperation programs. Data will be submitted and integrated into an easy-to-use format following common guidelines and standards. Qualified users will have the right to access these datasets.
Six data sharing systems will be on the priority list: natural resources and environment management, agriculture, population and health, basic and frontier science, engineering and technology, and regional development.
Service Architecture of China-SDSP
China-SDSP is expected to provide services in the following four ways:
Data management services will integrate up-to-date technologies, such as distributed databases, data warehouses, metadata, and networks, and build up a distributed database system that facilitates data submission, processing, archiving, and updating.
Content services will provide content query based on metadata so that users can obtain information about specific data in a timely and effective manner.
Data services based on a successful content query service will support data browsing and downloading of many kinds of data, either spatial or nonspatial, well-structured or nonstructured.
Extended services will provide tools for various users to search massive amounts of data, integrate data from different sources, and mine data to find new knowledge. Data centers may develop more customized services, such as subject-oriented computation.
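The content service described above can be illustrated with a minimal sketch of a metadata-based query. This is not the China-SDSP implementation; the `MetadataRecord` fields, the `query_catalog` function, and the sample catalog entries are all hypothetical and exist only to show how a user might locate a dataset through its metadata before requesting the data themselves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical metadata entry describing one dataset held by a data center."""
    dataset_id: str
    title: str
    discipline: str
    data_center: str
    keywords: tuple = field(default_factory=tuple)
    is_spatial: bool = False


def query_catalog(catalog, keyword=None, discipline=None):
    """Return the records that match an optional keyword and an optional discipline.

    The keyword is matched case-insensitively as a substring of the title
    or of any listed keyword; the discipline must match exactly (ignoring case).
    """
    results = []
    for record in catalog:
        if discipline is not None and record.discipline.lower() != discipline.lower():
            continue
        if keyword is not None:
            needle = keyword.lower()
            searchable = [record.title.lower()] + [k.lower() for k in record.keywords]
            if not any(needle in text for text in searchable):
                continue
        results.append(record)
    return results


# Illustrative entries only; identifiers, titles, and keywords are invented.
CATALOG = [
    MetadataRecord("met-001", "Daily surface temperature observations",
                   "meteorology", "Meteorological data center",
                   ("temperature", "climate"), True),
    MetadataRecord("agr-001", "Crop yield survey",
                   "agriculture", "Agriculture data center",
                   ("rice", "yield")),
    MetadataRecord("hyd-001", "River discharge records",
                   "hydrology", "Hydrology and water resources data center",
                   ("discharge", "flood"), True),
]

if __name__ == "__main__":
    for match in query_catalog(CATALOG, keyword="yield"):
        print(match.dataset_id, "held by", match.data_center)
```

In this sketch the query touches only metadata, so a one-stop portal could answer it without moving the underlying data; a separate data service would then handle browsing and downloading from the center that holds the dataset.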
Major tasks that need to be addressed include scientific data resource development, standardization, and policy and law aspects.
Scientific Data Resource Development
Data resource development is the process of data collecting, integrating, re-organizing, and mining. The master databases are the natural result of this process. The major tasks are to improve existing data resources; safeguard or rescue endangered scientific data and records; develop the master databases for large research programs funded by the government; introduce international data resources based on their scientific values, quality, and usability; integrate multi-source data; and conduct value-added research.
Standardization
Standardization is a prerequisite for scientific data sharing in the digital era. There are two kinds of standards being addressed by China-SDSP: platform technical standards and data sharing standards. Data standards within major application areas will also be on the list of priorities.
Policy and Law Aspects
The policy and law aspects for scientific data sharing are also key tasks that need to be addressed. Policy making should be considered first as a means for guiding legislation, which is a long and complex process and requires sufficient feedback from sharing practices. Thus both theoretical investigations and case studies are important inputs for a sound data sharing policy. The relevant policy and law system will be developed step-by-step and should handle such issues as guidelines, data classification for sharing, copyrights, incentives, performance evaluation, and so on.
The Chinese data sharing policies and legislation should be improved for several reasons. The existing scientific and technical legislation has not fully represented the sharing spirit and principles. Targets for data sharing are not defined in current policies and regulations. No relevant description for a specific data sharing system could be found in the current policies and regulations. Finally, the existing policies and legislation provide no guidance to research institutions about data sharing, or their proper rights and obligations.
As of this time, over 20 countries have issued relevant policies for scientific data sharing management, thereby forming a primitive global system for scientific data exchange and sharing services in some domains. The policies and laws of these countries provide useful examples.
Tasks that should be addressed in the policy arena include the establishment and implementation of several types of guidelines: the implementation and management guidelines of the China-SDSP; data submission guidelines of major science and technology programs funded by the government; guidelines for scientific and technological data classification for data sharing; and performance evaluation (merit appraisal) of scientific data sharing activities.
Scientific data sharing legislation in China will be important for several reasons. It will provide one of the basic resources for transferring basic scientific and technical results (i.e., data) into real productivity. The implementation of scientific data sharing could produce new types of information business, such as data product processing and data distribution services. Scientific data sharing could also provide open scientific data resources and a fair competitive environment for data resources in the information industry. For the purpose of fully participating in worldwide high-level collaboration and competition, China should improve its national scientific and technical competitive ability and lay a foundation for efficiently supporting its emerging economy.
Data sharing legislation and regulations therefore need to define property rights in data; encourage data sharing; and establish a system for supervising data sharing, for the technical decisions about data sharing, and for data sharing evaluation. There are several legislative and regulatory actions that should be taken in constructing a domestic legal system for scientific data sharing.
Revise the Scientific and Technical Improvement Law. As the basic national law of science and technology and the standard for sharing scientific and technical resources, this law should represent the most important “sharing principle” for platform construction. It should make sure the data resources are open and shared with the public by data managers and public research managers, give rights to relevant bodies that need access to data resources, as well as define the characteristics of national scientific and technical data resources and regulate public scientific and technical data resource sharing on proper terms.
Draft a Scientific and Technical Resource Sharing and Protection Law. This law should define basic obligations for data resource managers and owners regarding open and shared data resources with the public, regulate basic qualifications and procedures for users, and stipulate the requirements for providing service and education. Such measures are needed to prevent improper or illegal activities and to provide the national scientific and technical resource management center with adequate funding, information, and timely data.
Regulation on Scientific Data Sharing. This regulation should define scientific data, clarify the obligations for scientific data owners and managers to open and share their resources, and establish a standard management system for scientific data resource and standard data formats. The persons in charge should be held accountable for the waste of resources and other types of mismanagement.
Regulation on Opening Government Information. This national-level regulation will clarify the principle for government to freely issue its information, similar to the resource sharing in the scientific and technical fundamental platform. The resource sharing on the platform could be greatly improved with the announcement and implementation of this major regulation.
There may be other relevant laws and regulations that need to be considered or amended as well.
Summary of the China-SDSP Work Plan
The following activities are being implemented during the initial experimental period, 2001-2005: overall planning and design; research and planning on the policy and legal framework; drafting and issuing relevant policies and regulations; developing technology and standards; establishing scientific data centers and networks and initiating a data sharing pilot project; identifying the optimal mechanisms for existing data consolidation and sharing; launching the program Gateway Web site; selecting 25 data centers for the data sharing pilot project and other candidate centers for further development; summing up experiences from various aspects of the experimental period; and preparing a feasibility report about the overall implementation of public data sharing in the next period of activity.
In 2001, the meteorological data sharing project was launched,² which heralded the start of the scientific data sharing program in China.
The China-SDSP was formally initiated in 2002. By the end of 2002, another five data centers (land survey, hydrology and water resources, seismic, forestry, and agriculture) and three networks (earth system science data center network, modern agricultural technology and rural development network, and sustainable development network) had joined the pilot project.
In June 2003, a Coordinating Group and a Scientific Group were established for scientific data sharing. The main task of these groups was to develop the Plan for the China-SDSP by May 2004 with the following six major components: current status and major national requirements; overall considerations; principles and objectives; strategic framework and tasks; implementation and measurements; and supporting conditions and facilities.
² See the summary of the presentation on “Progress in Meteorological Data Sharing in China” by Dahe Qin in Chapter 4.
As of June 2004, the Gateway Web site has become pivotal to the China-SDSP. Its function and technical specifications have been clarified and the overall design and specific modules are finished. Data from the five data centers and three networks are being integrated into this Gateway and will be available by the end of 2004.
In terms of policy making, a working group for data sharing was established and investigated the current status and trend of data policy, both at home and abroad, compiled relevant materials and information, established the “Guidelines of Data Submission from Major National Programs,” began researching the framework of relevant law and policy, and finished the conceptual design for data classification for sharing.
A research group for data standards also has been established. It has investigated the current status and trend of data standards both at home and abroad, compiled relevant standards, and drafted the framework and guidelines for data sharing management. In general, the China-SDSP is still in the overall planning phase, accumulating the experiences of technology and policy making, as well as overseeing pilot data sharing projects.
During the implementation period, 2006 to 2010, the following work will need to be done: continue the establishment of data sharing technology, policy, and law; extend the program coverage of scientific data centers or networks and make them operational; gradually improve technology and standards; strengthen cooperation among data centers in different research areas; and enhance the capacity to develop high-level data products and to ensure quality. After each yearly performance evaluation of the 25 pilot data centers or networks, the qualified ones will be included in the “National Scientific Data Master Network” and will start regular operation. The amount invested in each center will depend on its merits and performance. Another 15 to 20 data centers will be launched, including 200 new master databases. By 2010, a mechanism is expected to be established through which data are submitted from various governmental agencies and programs and delivered to potential users efficiently.
In conclusion, it is important to emphasize that science respects no border. The establishment and implementation of the China-SDSP needs the support and assistance of the international community, and in turn will contribute to the development of global science and technology.