

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 73
PART FOUR: THE LIMITS AND BARRIERS TO DATA SHARING

THE CASE FOR INTERNATIONAL SHARING OF SCIENTIFIC DATA

19. Data Sharing: Limits and Barriers and Initiatives to Overcome Them – An Introduction

Roger Pfister
ICSU Committee on Freedom and Responsibility in the conduct of Science, France, and Swiss Academies of Arts and Sciences, Switzerland

Discussing the limits and barriers to data sharing with developing countries is also a discussion of their access to information. This paper suggests that the situation is improving, albeit slowly. We look at developments in this area from a historical perspective; a review of the last few decades allows us to see the current situation against that background.

Calls for a new information order

To place the idea of information sharing in historical perspective, we go back to the 1960s and to the Non-Aligned Movement (NAM). Established in 1961, this political organization set out to pursue an independent policy based on coexistence between the two power blocs, East and West, that dominated the Cold War era. NAM was essentially the instrument of the developing world, because it comprised primarily countries from Africa, Central and South America, and Asia. Following political independence, especially on the African continent during the first part of the 1960s, members of the Non-Aligned Movement strove for economic and cultural liberation from the North, which in those days could be seen as synonymous with the West. For that purpose the movement promoted two initiatives during the first half of the 1970s: the New International Economic Order and the New International Information Order. To gain international attention and support, and by means of political lobbying, the latter was taken into the United Nations Educational, Scientific and Cultural Organization (UNESCO).
The reason for this move was that the number of developing countries in that UN body had increased so much that developing-world demands could be pushed through rather easily. As a result, the Non-Aligned Movement succeeded at the UNESCO 19th General Conference in Nairobi in 1976, where a resolution was adopted that called for the free flow of information. This was facilitated by the fact that the UNESCO Director-General at the time was Amadou-Mahtar M’Bow from Senegal. However, the two initiatives came to a standstill in the mid-1980s because political quarrels over the organization’s role in this field limited UNESCO’s possibilities. A crucial loss came when the United States, by far the largest sponsor, left the organization in 1984 over that issue.

Yet the NAM’s demand for a New International Information Order had a sound basis. The notion that developing countries had been left out of the worldwide flow of information was based on realities, which we illustrate with some statistical data from the period before e-mail and the Internet emerged as tools for communication and for exchanging information. These were the days when newspapers, television, and radio were the principal means of spreading information.

Indicators of information sharing

Figures on the number of daily newspapers available in the different world regions reveal that Africa had the fewest between 1980 and 1994, with no increase over the period, and that all developing areas lagged far behind Europe and North America (Table 19-1).

TABLE 19-1 Daily newspapers per 1,000 people

Region        1980   1990   1994
N. America     247    231    209
Europe         188    220    244
Asia            87    106    116
S. America      88     95     98
Oceania         64     44     46
Africa          14     16     14

Source: UN Statistical Yearbook; UNESCO Statistical Yearbook.

Another statistic indicating the underprivileged access of developing countries to information is the low number of television receivers per one thousand inhabitants. At the same time, it is significant to note the sixfold increase in numbers in the developing parts of the world between 1980 and 1997, as compared with only some 25 percent in the developed regions (Table 19-2).

TABLE 19-2 TV receivers per 1,000 people

                        1980   1990   1997
Developing Countries      27    124    157
Developed Countries      424    492    548

Source: UNESCO Statistical Yearbook.

Radios have been an even more important source of information in the developing countries, and an almost 400 percent increase can be noted from 1980 to 1997, as compared with only 50 percent in the developed countries (Table 19-3).

TABLE 19-3 Radio receivers per 1,000 people

                        1980    1990    1997
Developing Countries     398     895   1,124
Developed Countries      986   1,181   1,308

Source: UNESCO Statistical Yearbook.

All of the above figures indicate that the issue of information flow has been of great relevance to the developing countries, and that some progress can be discerned as they gain increased access.

This background puts into perspective the access of these countries to information, which has become increasingly and more easily available since the advent of the Internet in the mid-1990s. Figures, once again, illustrate that developing countries are lagging behind. However, there was a phenomenal increase in the number of people in those regions using the Internet from 2000 to 2010 (Table 19-4). There is still a long way to go to reach levels such as those in North America or Europe, but there is progress.

TABLE 19-4 Internet growth 2000–2010

Region        Internet users growth    Penetration
              (%; 2000–2010)           (% population)
N. America              146                  77
Europe                  352                  58
Asia                    622                  22
S. America            1,033                  35
Africa                2,357                  11

Source: http://www.internetworldstats.com/stats.htm

Initiatives of the International Council for Science (ICSU)

Against this background, the International Council for Science (ICSU) recognizes that access to information is crucial both for science and for a world where science is used for the benefit of society. A cornerstone of ICSU’s mission, therefore, is to promote universal and equitable access to data and information. Several ICSU initiatives sustain this policy approach and are mentioned here in the chronological order of their establishment.

The general objectives of the Committee on Data for Science and Technology (CODATA; established in 1966) are to improve the quality and accessibility of data, to facilitate international cooperation among those collecting, organizing, and using data, and to promote an increased awareness in the scientific and technical community of the importance of these activities.
The International Network for the Availability of Scientific Publications (INASP) aims at improving access to scientific and scholarly information; fostering in-country, regional, and international cooperation and networking; and advising local organizations and funding agencies on ways to utilize information and publishing to achieve development goals.

Most recently, the World Data System (WDS) was established in 2008 to, inter alia, enable universal and equitable access to quality-assured scientific data, data services, products, and information. The WDS is being built on the foundation of two earlier international networks that were established in the context of the International Geophysical Year (1957–1958): the World Data Centers and the Federation of Astronomical and Geophysical Data Analysis Services.

Apart from these bodies with a specific remit for promoting data and information sharing, three regional offices (in Africa, Asia and the Pacific, and Latin America and the Caribbean) ensure that ICSU’s strategy and activities, among them its approach to data and information sharing, are responsive to the needs of developing countries.

Finally, the ICSU Committee on Freedom and Responsibility in the conduct of Science (CFRS) is also concerned with these issues as part of its mission to promote the Principle of Universality of Science.

This principle is about developing a truly global scientific community on the basis of equity and nondiscrimination, which also comprises equal and nondiscriminatory access to data and information. For this reason, the committee, and several of its members, cosponsored this international symposium organized by the U.S. National Academies, which forms the basis for the present publication. To further raise awareness among the global scientific community for these concerns, the CFRS issued an Advisory Note on the matter following this scientific meeting.¹

¹ The CFRS Advisory Note is available in Appendix C and at http://www.icsu.org/publications/cfrs-statements.

20. Consideration of Barriers to Data Sharing

Elaine Collier
National Institutes of Health, United States

I will give a brief overview of the things we should think about when considering the barriers to data sharing. I will focus on questions related to data, although scientific information is equally important.

First, let us look at finding out about the existence of data. Many questions come to mind, such as the following:

• How do you know whether the data that you want exist or not, and how do you find out?
• Is the database discoverable by humans or machines, and is it human or machine readable and usable?
• Do you have to know a friend of a friend of a friend to get access to the data or information, or can you discover it on the Internet or on other media?
• Once you discover that the data exist, can you discover what the characteristics of the data are, and whether the data are usable in the ways that you want to use them?
• Do you know what parameters there are among the data?
• What are the requirements for access to the database, and can you discover what those requirements are?

All of this should be easy. We should be able to know whether the data exist; whether we have access to them is a different question. Often their very existence is difficult to determine, particularly in scientific areas.

Some of the other issues relate to the actual characteristics of data, including the semantics or meaning of the data, for example:

• What are the elements and what are the fields in the data, and what do they mean?
• What formats are they in so that you can actually use them?
• Are there readable words on a page or are the data in a data field in a relational database?
• Are the data on the Web in semantic Web technology?
• What kind of information and protocols are the data in?
• Is the database complete? Are you getting the raw data, aggregate data, or derived data?
• What is the history of the data? Who collected them? On what authority? Who curated the database, added to it, or annotated it?
• Does it link to other data or other information that is out there? Is it public or private information? And again, how do you get access to that data and information?

When we talk about semantics or data quality, we are referring to the actual meaning of the data. That has to do with the content. There are also some questions in this regard:

• What is the content information in the data or information that you have? What is it about? What does it mean?
• What is the context of that content? Is it seen from the perspective of a certain country or a profession?
• What are the temporal aspects? Are you getting the early information or the latest information?
• Do you have the whole picture of the time frame of the data?
• What is the granularity of the database? Do you know the details about the data or only high-level information?

• In what language are the data? This is a particularly interesting issue when it comes to sharing information across countries and the world.
• What is the durability of the information? Will it be there tomorrow or is it ethereal and passing?

Then there are the annotations of the data, and the properties or other context information about the data. In what framework are the data? What are the community needs and requirements about the data, in the collection of the data, in your analysis, and in your use? Are there shared standards for this information, so that you can actually mix them with similar data from other places, other countries, and other areas?

Some of the technical aspects relate not only to having standards for all these data quality measures, but also to being able to share the format so you can actually share the information electronically. Or if you are going to share it on paper in human-readable form, it again depends on what language it is in. It also depends on whether it is on a piece of paper or in a file that you can download. What is its availability? Is it persistent? If it is persistent, what version of the data do you have? Is it updated?

Are there requirements for reusing the data or repurposing them? When you do that, what are your requirements of the database? Do you have to clean it? Do you have to derive it into other formats and document what you are doing with it? Did other people who did that to the data before you document it, so that you know what happened to the data? Are the data coming from a repository, where the people you are getting them from are merely serving as the repository, or are they actually the publishers of a database in the sense of publishing a dataset, not just publishing a paper? Again, are the data linked to other data, or are they able to be linked to other data?
How do we preserve the data so that we actually have a history of what is going on, particularly as technology changes?

Some of the issues relate to policy and cultural issues. Some people who collect data are concerned about their misuse or misinterpretation, and are reluctant to share them because they are afraid somebody will use them wrongly. There are confidentiality and privacy concerns. Some relate to human data from clinical studies. Some of the concerns relate to health care data, while others are related to competitive advantage and proprietary information. There are also legal considerations across countries, which are very complex and relate to privacy issues. Intellectual property is another interesting issue. What counts as intellectual property for one person or one country is not necessarily the same for another.

What resources does it take to make your data available for sharing? Even if a country wanted to make data available, how would it do so, and would it have the resources to make the data accessible? What are the resources needed to access other people’s data? What kind of approval process do you need? What are the costs? The costs here are not only technical costs, but also the cost of policy agreements that may be required to use the data or to make them available.

Some of the competing interests arise as a result of different perspectives. One perspective is that of a researcher or a collector of the data. That can be a country, a utility, or a scientific researcher. That individual has certain interests in the data, but the institution, the company, or the national government where they work may have interests that differ from the collector’s. Second, there are national issues related to having competitive advantages. Third, there are certain advantages to cooperating and sharing data.

Next, there are the public and private issues. What are public data and what are private data? Could certain information that you consider public be considered private in some areas? Some issues relate to privacy and security. What is the impact of sharing the information on the individual? Is it an individual’s information? Is it the individual who actually collected the data? What is the impact on the institution? Will the institution look bad if it shares the data? If you have hospital data and infection rates and you share them, will that reflect badly on one hospital because it has a higher infection rate than another, or is that related to the patients it sees? Public health data are critical to share in order to prevent outbreaks, but such data also can affect the reputation of countries and institutions.

Some of this is cultural. Some groups, people, and institutions are more open, share more, and are willing to risk more. How much of a database should we share? Should we share the raw or aggregate data? How do we make the data unidentifiable? Does that make them better or worse to share? Clinical care and clinical research raise particular issues related to privacy and security, and the latter are clearly very different across different countries.

First, you have questions about the audience. One audience is the people who are actually sharing their data and information; the other is the people who are using those data. Then we consider cost. Is it the cost to the institution or the cost to the person? What is the cost versus the value of the information? Is it worth spending the money to get the data or not? We also need to consider the effect of cost on the usability and the availability of the data. Free data may be very valuable or worthless. Data could also be very expensive to buy, in cost or in effort, and ultimately not be worth anything.

How do we get these issues worked out?
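One modest way to start working them out is to write the questions raised in this chapter down as a machine-readable description that travels with each dataset. The following is a minimal sketch in Python, not an established metadata standard: the class name `DatasetRecord`, its field names, and the example values are all hypothetical and chosen only to mirror the questions about access, format, meaning, provenance, language, time coverage, versioning, and reuse.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    title: str
    access_requirements: Optional[str] = None  # how does one obtain access?
    data_format: Optional[str] = None          # e.g. CSV, relational database
    field_definitions: Optional[str] = None    # what do the elements mean?
    processing_level: Optional[str] = None     # raw, aggregate, or derived
    provenance: Optional[str] = None           # who collected and curated it?
    language: Optional[str] = None             # language of the content
    time_coverage: Optional[str] = None        # period the data describe
    version: Optional[str] = None              # which version is this?
    reuse_terms: Optional[str] = None          # conditions on reuse


def unanswered(record: DatasetRecord) -> List[str]:
    """Return the names of the fields that have been left empty."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "")]


# Hypothetical example: a dataset described only partially.
example = DatasetRecord(
    title="Hospital infection rates (hypothetical)",
    data_format="CSV",
    processing_level="aggregate",
    language="en",
)
print(unanswered(example))
```

Listing the empty fields gives a prospective user, or the data provider, a quick view of which of the questions above remain open for a given dataset.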

21. Artificial Barriers to Data Sharing – Technical Aspects

Donald R. Riley
University of Maryland, United States

When I talk about technical issues, I speak as a mechanical engineering professor who taught for 22 years at the University of Minnesota. I grew up with what eventually became the Internet. I also grew up in the university environment and have remained inside it throughout my career. My view of the world is that we teach and do research that has real impact on society. To do that, we need certain kinds of tools and access to data, resources like cloud computing, and so on.

I also have been a chief information officer, meaning I was responsible for providing infrastructure, tools, and services at two major research universities. There was a need to develop advanced capabilities and create an infrastructure that had certain characteristics to serve those missions. This led to the creation of Internet2, which has become a global activity. I think it is fundamental and crucial. I am also now part-time chair of the Internet Educational Equal Access Foundation, which has the goal of getting universities and schools connected around the world.

What the statistics about regional usage of the Internet do not show is whether or not people have to drive or walk or bike 2 miles to an Internet café to get that access. Is the university connected, and at what speed? In Table 21-1 and Figure 21-1, you can look at some of the statistics, in aggregate, but numbers sometimes do not tell the whole story.

TABLE 21-1 2010 Internet World Statistics
Credit: Copyright © 2000 - 2012, Miniwatts Marketing Group. All rights reserved.

FIGURE 21-1 World Internet Penetration Rates by Geographical Region in 2010
Credit: Copyright © 2000 - 2012, Miniwatts Marketing Group. All rights reserved.

When we look at Africa, it seems like the situation is improving. People are really excited about cell phone penetration rates, because cell phones generate some revenue and foster connectivity, but each device’s speed and capability affect what you can actually do. Even these limited devices are expensive when your income is less than $100 a month. We have to consider more deeply the quality, access, and affordability. Is it about the Internet, or is it about next-generation capabilities that you can explore and develop, so that you are part of the technology generation with new information and new tools? How do we collaborate beyond e-mail and tweets?

From my perspective, universities are the cornerstone of technology development and implementation. The biggest problem we have in Africa is that there are not enough people who know how to manage routers and other such things, and how to deal with bots and other malware.

The real message is that we need to look beyond the Internet and whatever the telecommunications companies provide, and look at the future. The emphasis should be on performance: advanced capabilities instead of just more bandwidth. If we really want to have a society that is able to compete in the global information economy, then we have to focus on high performance and collaboration without barriers. We have to look beyond bandwidth. We have to look at how we support advanced collaboration, teaching, and research. We need to focus on quality and recognize that this is not just about what you can do from an educational standpoint; it is also fundamentally about creating an enabler for advanced economic development and sustainability.
The model that is evolving on the international scale is similar to our Internet2: the National Research and Education Network (NREN) that manages its own backbone and capabilities. In the United States, we have state-based regional optical networks that all tie into this big backbone, and with the Obama administration’s Broadband Technology Opportunities Program, we are now talking about reaching out and creating “community anchor” networks in, for example, rural areas, inner cities, and other areas that need improved broadband access, which then tie into this new universal community-access public-purpose backbone.

My next message concerns NRENs. If you look at the map showing international collaborators in Figure 21-2, the NRENs are mostly in the northern hemisphere and in the more economically developed countries. There also are some light gray areas, mostly in the southern hemisphere, where there is no equivalent NREN in place yet. Most of those gaps are in developing countries. Fortunately, however, there are already some networks from the developing countries represented, so it is getting better.

FIGURE 21-2 The International Reach of the Internet2 Network

If you want to know more about NRENs, the European Networking Research Organization does an annual compendium of the NRENs around the world. They also produced a study called The Case for NRENs,² which looks at where NRENs exist and what their impact has been. A significant conclusion that they drew is that cutting-edge, advanced-capability networks provide services and encourage and enable technology spillover into the commercial sector. They also concluded that the absence of an NREN hampers development and can exclude countries from advances that could help their economic development.

The Global Lambda Integrated Facility, which is a global research platform for advanced applications, has multiple 10-gigabit-per-second links, a level of connectivity that gives it the capability for shared and collaborative research connecting different parts of the world, with one such connection to Africa, in South Africa.

In African nations, and other developing countries, international connectivity is poor and expensive, because

• Internet cost is very high (Figure 21-3);
• Satellite access limits what can be undertaken, because of latencies and asymmetrical characteristics (it assumes Africa is a user of, not a generator of, new information); and
• There are significant barriers to access to information and resources, modern education, collaboration, research, and funding opportunities.

² Available at http://www.terena.org/publications/files/20090127-case-for-nrens.pdf
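The latency limitation of satellite access can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on assumptions that are not in the talk: that the link uses a geostationary satellite (altitude about 35,786 km), that the ground stations sit directly beneath it, and that the transport protocol sends at most one 65,535-byte window per round trip (the classic TCP limit without window scaling). Processing and queuing delays are ignored, so the result is a lower bound on delay and an upper bound on throughput.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786           # geostationary altitude (assumed link type)


def min_round_trip_s(altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Lower bound on round-trip time.

    A request travels up to the satellite and down again, and the reply
    does the same, so the signal covers the altitude four times.
    """
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S


def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when only one window is in flight per round trip."""
    return window_bytes * 8 / rtt_s


rtt = min_round_trip_s()
ceiling = window_limited_throughput_bps(65_535, rtt)
print(f"minimum round trip: {rtt * 1000:.0f} ms")
print(f"single-connection ceiling: {ceiling / 1e6:.2f} Mbit/s")
```

Under these assumptions the round trip is close to half a second and a single connection tops out at roughly 1 Mbit/s regardless of how much bandwidth the link nominally offers, which is one reason interactive collaboration over such links is difficult.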

22. Scientific Management and Cultural Aspects

David Carlson
University of Colorado, United States³

Suppose that, based on everything we have heard for the last day and a half, we took the best information and the most advanced and thoughtful practices and decided to conduct a big science program. Let us adhere strictly to free and open access, do it in the age of cyberinfrastructure and global connections, and aspire to set new standards for data access. What would happen?

Well, the International Polar Year (IPY) did that. It was the largest program since the International Geophysical Year in 1957–1958. Out of IPY came some interesting lessons. These lessons have nothing to do with developing versus developed countries. Rather, they come out of the science culture itself. That is what I want to talk about today: data lessons from the International Polar Year.

The two big organizations, the International Council for Science (ICSU) and the World Meteorological Organization (WMO), proclaimed the International Polar Year for 2007–2008. It spanned 2 years, because it takes a lot of time to actually do a year’s worth of work in the polar regions, and if you are going to do it in the North and the South, it takes 2 years. They convened a joint science committee to set the overall goals, and the committee immediately adopted the most advanced free and open data access policy. Using the World Climate Research Programme as a model, they established an International Program Office to organize and manage the event, including managing the data aspects. I had the challenge and privilege to lead that office.

Then we took the crucial step of inviting individual international proposals for IPY projects. This was done deliberately to bring the best ideas forward. We did not build IPY from national plans, and we did not want to write the central plan. We wanted to solicit as many ideas as possible. This led to enormous success in the diversity of science and international participation, but it left data management as a severe challenge.

Each of these projects was intended to bring forward urgent and significant research, something more than what you would do in your normal routine. Each of these projects was to stimulate international partnerships. The average project had about 15 international partners. Some had as many as 40 partners. They were to build connections across disciplines, between science and policy, and across generations. They were to store and share data. Again, IPY explicitly set a free and open data access policy. This was not a passive requirement. Every project seeking endorsement had to check a box, just as you accept an agreement when you download software. They agreed to the IPY data policies; everyone opted in. They were to take on substantial education and outreach. Each project had to show how it would expand the polar community, by which we meant reaching out not only from polar science disciplines to other science disciplines but also from polar science countries to other countries. It was a community-building and outreach exercise.

We attracted enormous interest, and ended up with 230 IPY-endorsed projects related to earth, land, people, ocean, ice, atmosphere, space, education, and outreach. For geography, we have the Arctic, the Antarctic, and projects and processes that occur in both the Arctic and the Antarctic. Of the 230 projects endorsed, 170 were funded, and most of these projects got most of what they requested.

³ Retired.

Another aspect I would like to highlight is the participation of indigenous partners, both as leaders of projects and as partners in the research team. We wanted to build the capacity of leadership from this community.

We generated $1.2 billion in science funds. That number is growing as countries start to reassess what they actually spent on IPY, and I think it will easily surpass $1.5 billion. One of our goals was to stimulate new money for polar research, and we succeeded. There were 63 countries and easily 50,000 participants. That number, we think today, is probably more like 60,000. By the standard bandwidth metrics (money, people, and countries), we did extremely well in IPY.

What I want to do now is to take a more critical look at IPY. I want to assess whether we are going to meet the IPY goals and how; that is, how are we doing today and how will we do in the future (let us say over a period of 10 years)?

• Advance polar knowledge. There is no question now that we had a huge impact and there will be an avalanche of polar science coming forward.
• Enhance facilities and infrastructure. Polar science involves huge infrastructure, such as ice breakers and bases in Antarctica. However, they are very expensive. They take a lot of fuel, and even though IPY enhanced them a lot, I think it is going to be a challenge for the polar community to keep going at the level we did in IPY.
• Inspire the next generations. We absolutely did that. It is hard to see how we could have done better. We have an association of 2,600 young scientists in 40 countries. They have a secretariat and a Web site. They are a recognized member of the Scientific Committee on Antarctic Research and the International Arctic Science Committee. They are receiving grants from the ICSU, and we are very happy about that. They have 5 years of funding for their secretariat, but I am worried about their funding in the long term.
• Attract public interest. Given the resources we had, we did extremely well. In the future, I think public interest will decrease, partly because people’s attention turns to other issues, but also partly because the networks that we put in place have already started to deteriorate.
• Integrated and accessible data. I am less optimistic here and I am scoring data low. It is that low score for data that causes me to worry about our ability to keep doing polar science in the future.

Let me explain the problem. I am going to give you a tour of IPY data. It includes:

o GPS positions, vertical and horizontal, all around Greenland and Antarctica.
o Everything about sea ice: temperature, salinity, porosity, rheology, roughness, and extent.
o Extensive measurements of contaminants: how they are transported and what the deposition processes onto the ice are.
o All kinds of lake sediment cores, from which pollen records, volcanic records, and chemical records are developed.
o For much of the work in the deep oceans, tracers, including isotopes, to understand the deep ocean circulation systems.
o Everything about ice cores: age, hardness, isotopes, dust levels, and other kinds of data.
o On-the-ground mineralogy of exposed rock in the Arctic and especially the Antarctic, and also the microbiology of these systems.
o Information about the social cultures of the inhabitants in the Arctic region.
o Satellite images of new pipelines and new roads that are barriers to migration.
o All kinds of data around the herding and meat-producing industry of the North.
o Arctic Ocean temperature, salinity, currents, warm water under the ice, and all the aspects of that complicated system.
o The great fish migrations of the North and the health of the fish. In this case, if you measure their ear bones and measure the isotopes, you can tell when they went from fresh water to sea water. Chemistry, physiology, and zoology are included.

OCR for page 73
88 THE CASE FOR INTERNATIONAL SHARING OF SCIENTIFIC DATA o For birds: long migrations, fitness, health of the birds, and population level. o Measurements of the sun. What are the solar properties through the dry polar atmosphere? Yesterday we talked about the atmospheric window. Antarctica is one of those atmospheric windows. We also want to know about all the children around the world who did the IPY-wide experiments on polar days. We want to know all the students who were exposed to IPY at exhibits around the world. We want to know the number of cities and museums that had IPY events. We want to know about students who went to summer schools and gave their first poster or gave their first presentation, and we want to know the number of young scientists who have joined this new association of polar scientists. This is just a glimpse of the IPY datasets, but you see the variety and complexity. It is literally true that, among the ICSU unions, no union could identify a science that was not included in IPY. This is its strength, but this is also its challenge from a data management point of view. Think of our data management systems on two axes: one international and the other interdisciplinary. The WMO and the World Data Centers are examples of international data activities. The WMO has meteorological and hydrological data, but it is actually relatively narrow in an interdisciplinary perspective. The World Data Centers are even more so. Canada built a relatively good interdisciplinary data portal for IPY. It does not cover satellite data or public health data, but it covers much of the range of what we would call earth science data. Canada did a very good job with interdisciplinary access. IPY is thus both highly interdisciplinary and highly international. IPY is leading in this regard and I think this is where science is going to be in the future. It is going to be widely interdisciplinary and widely international, but we also expose the gaps. 
There really are no existing services that cover both axes, and that is an institutional or international infrastructural gap.

We have projects that have good data plans and others that have adequate data plans. I define a good data plan as one that covers both storing and sharing the data. Thirty-five of the projects have good data plans. Another 30 of them have adequate data plans. That means they know where they are going to store their data, but they do not know how they are going to make it accessible. It might be stored regionally or locally. This means that of the 170 projects, about 105 of them do not have adequate plans at all.

I lead the project office. We set the overall goals, but we do not enforce the data plans. We defer to the national organization. If you were funded by the National Science Foundation, the National Research Council of Canada, or the Natural Environment Research Council of the United Kingdom, that is where the enforcement of good data practices happens. If those countries have varying practices, that is, one allows a 2-year proprietary period, and one has a more aggressive and enlightened practice, then you will get that variation propagated into the projects themselves. Institutional behavior inhibits the actual practice of good data stewardship. The enforcement of good data policies actually comes down to a national funding agency issue. In the IPY, where we have all these national funding agencies, the enforcement is very spotty.

We have identified that there are national issues related to enforcement of national policies, but there are also individual behavioral issues. The individual issues are related to incentives for data sharing. Elaine Collier identified very nicely a variety of reasons that people have for not sharing, but fundamentally there is no equivalent incentive for sharing. If you are competing, if you are trying to protect proprietary data, there is no incentive that actually pulls you the other way.
I wrote an article in Nature in early 2011, where I talked about these lessons in data sharing. I suggested publishing the data, because by doing that you can get credit for it by way of a citation, and the dataset shows on your promotion and tenure record. It is very appropriate because many of these datasets involve a huge creative effort to pull together data into quality products. This can be the same effort that we put into writing a paper. There is a new journal called Earth System Science Data. The journal itself is open access, and all the datasets that are published are at an open-access repository. We see that as a step forward on the data publication side.

We still have the issue of how to encourage everyone to share their individual datasets. We started to build a Polar Information Commons (PIC). This is a joint effort with the Committee on Data for Science and Technology and IPY to set up a sharing system for datasets. What we are trying to do is to set up an exchange system that overcomes barriers related to people saying, “I am afraid to share. Someone is going to steal my data. How will I know who used it?”

Below are the PIC guidelines for data users:

• You agree to cite the dataset through acknowledgment or even coauthorship, whatever the appropriate mechanism is.
• You agree to acknowledge that you got it off of the PIC.
• You notify others of the use and of any issues that you noticed with the dataset.
• You recognize that you as the user are responsible for determining the quality and the appropriateness.
• If you make any improvements, then you return that value-added dataset to the PIC.

Notice that there is a parallel set of responsibilities for contributors. Contributors agree to make their data openly accessible. We use a Creative Commons license for that. You also agree to use the PIC badge and provide adequate metadata. You agree under limits to respond to any inquiries about the dataset, and you agree to include into the PIC system a notification of any changes that you made.
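The two sets of responsibilities above can be pictured as a small record-keeping sketch. This is only an illustration under stated assumptions: the class `PicDataset`, the functions `contributor_gaps` and `report_use`, and the `REQUIRED_METADATA` fields are hypothetical names chosen here, not part of any actual PIC system, and the check for a Creative Commons license is a deliberately crude string test.

```python
from dataclasses import dataclass, field


@dataclass
class PicDataset:
    """Hypothetical record for a dataset shared through a PIC-like exchange."""
    title: str
    contributor: str
    license: str          # e.g. a Creative Commons identifier such as "CC0-1.0"
    has_badge: bool       # contributor displays the PIC badge
    metadata: dict = field(default_factory=dict)
    change_log: list = field(default_factory=list)   # contributor's change notices
    use_reports: list = field(default_factory=list)  # users' notifications of use


# Assumed minimum metadata for this sketch; the talk only says "adequate metadata".
REQUIRED_METADATA = ("description", "contact")


def contributor_gaps(ds):
    """List the contributor responsibilities this record does not yet meet."""
    gaps = []
    if not ds.license.startswith("CC"):
        gaps.append("open license")
    if not ds.has_badge:
        gaps.append("PIC badge")
    missing = [key for key in REQUIRED_METADATA if not ds.metadata.get(key)]
    if missing:
        gaps.append("metadata: " + ", ".join(missing))
    return gaps


def report_use(ds, user, citation, issues=""):
    """Record a user's notification of use, with the citation and any issues noticed."""
    if not citation:
        raise ValueError("users agree to cite the dataset")
    ds.use_reports.append({"user": user, "citation": citation, "issues": issues})
```

A contributor could run `contributor_gaps` before publishing, and a user could call `report_use` when citing the data. Because the same person can appear on either side of the record, one structure is enough to track both roles.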
The key here is not these guidelines, because they are really just standard rules of proper behavior that we already use in science. The key here is that a user can be a contributor and a contributor can be a user. As long as you are on both sides of this exchange, then there is incentive to cooperate with the guidelines and norms. In conclusion, the strength of the IPY is that it has a whole collection of data, which is what it takes to do science in the polar region and, I would argue, in the tropical regions and the temperate regions today. It takes this kind of breadth to do modern science, but when we do this, we are going to have to bring the international, national, and individual data behaviors up to a new standard as well. We like to say IPY polar science has had global impact. There is no question that in the short term we have had global impact on the quality of science. We had a positive effect on the public, recruited young scientists, and produced a burst of data.

23. Political and Economic Barriers to Data Sharing: The African Perspective

Tilahun Yilma
University of California, Davis, United States

Development will bring food security only if it is people-centered, if it is environmentally sound, if it is participatory, and if it builds local and national capacity for self-reliance. These are the basic characteristics of sustainable human development.
James Gustave Speth (UN Development Program, 1994).

I think you have heard a lot about the problems we are facing in Africa with data sharing. I would like to share with you information about the work of Africans who have come to the University of California (UC), Davis. They have generated very significant scientific data. So if African scientists can do this excellent work at UC Davis, why are they not capable of doing the same in Africa? The answer may be because of political problems. For example, I was not able to go back to Ethiopia, so I established an International Laboratory of Molecular Biology (ILMB) at UC Davis in 1991.

The goals of the ILMB are to (1) promote a culture of science in developing countries; (2) conduct research in tropical viral diseases, including the development of recombinant vaccines and rapid diagnostic kits; (3) transfer technologies to developing countries; and (4) address the issue of barriers to the growth of science in developing countries.

Any type of aid program that does not lead eventually to self-sufficiency is actually destructive, just like welfare, and that is what has happened with many aid programs in Africa. Much of what we have heard about Africa so far is negative. Yesterday, Dr. Yang brilliantly showed how China has advanced very far both technologically and economically in the past 30 years. My goal has been to promote similar development in Africa.
My idea was for African scientists to be trained at UC Davis and then to go back to train other Africans, thus allowing the successful transfer of technology and enhancement of science. Unfortunately, there were many political problems and barriers that interfered with this goal, and I would like to discuss these with you.

I am going to use rinderpest as an example. Rinderpest is an important disease that played a very significant role in the development of the veterinary profession. The virus was introduced into East Africa through collaborations between the British and the Italians in the 1800s. It was used as germ warfare, and in fact, 32 to 60 percent of the population and more than 90 percent of all ruminant animals, including cattle, goats, and sheep, perished. I was sent to the United States to become a veterinarian and then returned to Ethiopia to aid in the attempt to eradicate rinderpest using a tissue culture vaccine. In the 1970s we managed to vaccinate more than 120 million cattle, and we then celebrated the eradication of this disease. Unfortunately, rinderpest was not eradicated in African wildlife; it did not take very long for the virus to spread back into livestock and once again become endemic in Africa.

The major problems associated with the failure of the first attempt to eradicate rinderpest from Africa included the vaccine's lack of heat stability, the high cost of producing the vaccine, the difficulty in administering the vaccine, and a lack of continued surveillance for the disease. I decided to develop a safe, heat-stable, inexpensive, and effective vaccine by implementing the new technology that uses the smallpox vaccine (vaccinia virus) as a vector for recombinant vaccines to express the protective proteins of the rinderpest virus. Using vaccinia virus as a vector, we developed several vaccines for rinderpest; the first paper demonstrating the safety and efficacy of this vaccine was published in the journal Science in 1988.
Further refinements of the vaccine led to field trials in Africa and demonstrated that it can be given intramuscularly or intradermally. We also developed and transferred an inexpensive, companion diagnostic kit that can be used to distinguish vaccinated from infected animals. This is what African scientists have done in collaboration with scientists from the United States and developing countries at the ILMB.

Other vaccines we have developed utilizing this technology include a vaccine for vesicular stomatitis, a disease similar to foot-and-mouth disease. Unlike with the rinderpest vaccine, the level of protection was minimal. Another vaccine we developed in our lab was for HIV; we were the first to show that vaccines for HIV do not protect but only reduce the virus load in the blood. This was considered one of the three major contributions in the HIV field. These results indicate that many recombinant vaccines require enhancement of efficacy to be protective immunogens. What we needed then was to enhance the efficacy of these recombinant vaccines.

During his inaugural speech, David Baltimore, the Nobel Laureate and former president of the California Institute of Technology, stated, “When you grow up with a world like that, there is a central aspect of society that makes no sense: politics. For years, I simply could not comprehend what that meant. When people said that in making decisions, you need to consider both the rational elements of an issue and the political ones, I did not understand what they meant: why was not rationality enough? So my whole life since I left my parents’ nest has been an education in irrationality. I have had to learn that you cannot deny the passions of people, you must accommodate them; that you cannot deny history, you must accommodate it.
I think this is a perspective that all scientists who are willing to work within the larger society have to learn, and it is what sometimes limits the effectiveness of scientists when they do venture outside of their laboratories and institutions.” This is really what I have learned. Using science to make vaccines is simple, but dealing with the politics is very difficult.

The Journal of Virology is the number one international specialty journal for virology. Details about the rinderpest vaccine that was tested in Kenya were published there by a group of scientists from the International Laboratory of Molecular Biology for Tropical Disease Agents who originally came from Argentina, Brazil, Afghanistan, Pakistan, the United States, Ethiopia, and Kenya. You can see that these people who came together were able to develop what Dr. Gordon Ada described in the journal Nature (January 31, 1991)4 as one of the two outstanding recombinant vaccinia virus vaccines in the world. These results help prove my point that people from developing countries, if given the right opportunity, are quite capable of competing with scientists in developed countries.

Another contribution that people from developing countries have made is to use cytokine genes for enhancing the safety of vaccines by more than 100-million-fold. One brilliant Ph.D. student, born in Afghanistan, published her dissertation research on the topic in the Journal of Virology and the Proceedings of the National Academy of Sciences. Another Ph.D. student, from Ethiopia, has advanced a concept to develop a safer, effective vaccine for smallpox. Extending this work, we have developed a recombinant vaccine for Rift Valley fever, a disease that affects both humans and livestock, a project sponsored by the U.S. Department of Homeland Security. Rift Valley fever virus is considered a very dangerous agent that could be used as a bioterrorist weapon.
We have also developed diagnostic tests for both Rift Valley fever and foot-and-mouth disease.

The program that we started at UC Davis is to advance self-reliance. Based on that, we built labs in Egypt, Kenya, Ethiopia, and Senegal. An insect virus expression system was used to produce a recombinant rinderpest protein used in the development of a diagnostic kit at the ILMB. This reagent is produced at high levels and does not come from a virus capable of infecting mammals. Thus, what would have cost $60,000 could be produced for 5 cents. African scientists trained at the ILMB went back to Senegal, produced the diagnostic kits, and trained other African scientists from 30 different laboratories to successfully transfer the technology.5 The accomplishment of postdoctoral researchers and students in the lab is evident in publications such as Science, Nature, and other top scientific journals, including Nature Biotechnology. This is proof that these people are capable of making valuable contributions in a supportive environment.

4 Available at http://www.nature.com/nature/journal/v349/n6308/pdf/349369a0.pdf

We have received the highest award in animal science, been elected to the National Academy of Sciences, and been honored with the University of California Medal. All this was done by people from developing countries. Then the question I ask again is, why not in Africa? If we can do this in Davis, California, why can we not do it in Africa? I am not asking about Southeast Asia, because progress there is obvious. One of these days, I hope we can say the same thing about Africa.

What are the barriers internally and externally that prevent the development of science in Africa? Natural resources are sucked dry by governments from developed countries. Moreover, African countries pay billions for military goods and warfare. When I worked in Senegal, I asked a colleague, how is it that you have absolutely no natural resources, yet you have a very high per capita income on the African continent? Her response was, “We are blessed in Senegal in that we have no natural resources. Thus, we are left alone and spared from destruction.” One example I like to use is the war that was conducted between Eritrea and Ethiopia. People from both countries speak the Tigrinya language.
Each country spent more than $2 billion purchasing military weapons and jets to fight a war for a piece of desert called Badme, sacrificing more than 200,000 people. According to Africa Today, observers likened the conflict to “two bald men fighting over a comb.” What economic or strategic benefit could be gained from control of 400 square kilometers of a rocky triangle of land over which these two former allies were now locked in battle? “Eritrea already has enough rocks,” says one analyst, adding that “if rocks were worth money, Eritrea would be the richest country in the world.”

In my opinion, the countries in Africa should follow the example of China, India, and Brazil if they want to achieve development and overcome their reliance on destructive foreign aid. Africans should work toward becoming more self-sufficient.

5 Yilma et al. (2003) Inexpensive vaccines and rapid diagnostic kits tailor-made for the global eradication of rinderpest. In Vaccines for OIE List A and Emerging Animal Diseases. Developments in Biologicals. 114: 99-111.

24. DISCUSSION BY THE WORKSHOP PARTICIPANTS

PARTICIPANT: I am going to ask a question as a former president of the Committee on Data for Science and Technology (CODATA), which is an international organization concerned with advancing the use of data within science. We have heard a number of talks about various barriers, and one of the questions that we have been grappling with for a long time is what we can effectively do to break down some of these barriers, given that in the United States you buy a personal computer and you basically are ready to be a data scientist. In many parts of the developing world, that is not the case. They do not have the connectivity, software, or training to do that. What practical steps could an organization like CODATA take to actually help break down some of the barriers that we have heard discussed today?

DR. RILEY: If your university was spending 30 percent of its budget on Internet connectivity, how would you feel as a faculty member? You would probably grumble. You would probably go to your faculty governance meetings and say, “We have to do something about this.” One of the interesting things is that as these National Research and Education Networks evolve, and as this international connectivity comes into place, and networks are built and the prices start to fall, you start to free up money that can then be used on other things, like computers and software and content-related activities. It is just amazing when you look at how much money is being sucked out of the universities because of this connectivity problem. We have recognized that something needs to be done. One of the reasons I mentioned the statements from the InterAcademy Panel is that we spent a long time trying to craft messages asking, “What can you as members of the Academies do? We would like you to refer your governments to these statements.
Go into your societies, and ask for help in fixing this problem so we can get on to the things we really want to do.”

DR. CARLSON: We cannot accept the status quo, and CODATA can be an agent to identify that interface. A good idea like an information commons rapidly runs into pushback from the research infrastructure: legal pushback, behavioral pushback, and so on. CODATA can be the agent to keep pushing forward.

DR. COLLIER: I will comment from the perspective of the National Institutes of Health (NIH). The National Center for Research Resources supports a lot of development in rural parts of America. There were and still are real connectivity problems in parts of this country. I am amazed by the creativity of some of the people at those institutions who solve problems by using lower-technology solutions that actually move data. I think that we need to push forward on getting the higher performance and bandwidth, but we also need to not squash creativity and actually allow things to happen with what is available. Just saying you cannot do it until you get more is not true in many cases.

DR. RILEY: I need to clarify what I was saying. I was not saying that nothing is happening. There is a lot of great activity going on, but what could happen if we could free scientists in less-developed countries from the low network connectivity they have? It is amazing how creative people everywhere have been trying to get access. However, try doing your research onsite in the middle of Botswana, for example, and you will understand what the limitations are. What could these great students and faculty do if they had the kind of connectivity I have at home, which I do not think is even as good as when I am sitting in my office, but it is wonderful compared to what they have?

DR. KAHN: My question is directed to Professor Riley. On one extreme, he is talking about a poverty of technology, and he has just reemphasized that, whereas at the other extreme, Dr. Yilma is talking about the poverty of politics. Dr. Yilma said that in Africa there are not enough people to manage the routers and networks. Yet Africans have been more than capable of putting in the routers and the networks and running mobile telephony and making fortunes. Here are some data. The 300th wealthiest person in the world is an Egyptian named Naguib Sawiris. He runs a network. Another example is the 630th in the world, who is a former Sudanese and now lives in the United Kingdom. His name is Mohamed Ibrahim. It seems where there is a demand, the technology is not the barrier. I would like your comment in reply.

DR. RILEY: Cellular technology is generating a revenue stream that is now making it economically feasible to use submarine cables, and a small number of people are getting wealthy. They hire a few people to do those routers and switchers, and a lot of them are nonnative Africans. Many countries in Africa are using Chinese money, including loans to bring in Chinese engineers to build national backbones, and also highways and mines and petroleum centers using Chinese technology. When it is over they will be captive to owing the Chinese government money, and captive to the annual maintenance costs associated with the equipment. They will not have had much development of the human side of being able to run the routers and switchers other than maybe what ties into those cell towers. The problem is very deep and very complex. I would stand behind everything I said, because most of it comes from colleagues in Africa or people doing that infrastructure. There are new forms of colonialism going on, too.

PARTICIPANT: There were two messages that I think were not consistent in Dr. Yilma’s speech. One message was that Africa is better off being left alone. The other message was that places like the University of California, Davis, and Columbia University have resources and training and capacity to help development constructively without leaving Africa alone. An example is my center, where we do a lot of capacity-building training on how to manage data and use data for decision making in Millennium Villages and in the African context. I am just curious which message you really think is the one that is important.

DR. YILMA: I would be happy to address this. Africans lived for a millennium in harmony with the environment.
Since the colonial introduction, look at what has happened to Africa. It is in total turmoil.

PARTICIPANT: I was involved in the selection of the Millennium Villages from the ecological and sustainable development point of view. As far as I know, the villages do not have direct links to the people involved in the conflict. The approach of the whole network of Millennium Villages is to help build capacity and work with expertise in the developed countries. We have a big project in Haiti, which is a postconflict country just like ones that are currently in conflict. The point is not to ignore the concerns of postconflict countries. Those are the countries that in fact need help and need positive reinforcement separate from these other political forces. You did not really answer the question of whether you believe in groups at U.S. universities and other developed countries getting engaged with the developing world. Is that not a model that you also think is one that can work?

DR. YILMA: Absolutely. I support such activities, and I think there should be an exchange of data and exchange of scientists. This is a wonderful thing.

PARTICIPANT: We have this improvement taking place in connectivity and hopefully it will also change the possibilities for Africa. Simultaneously, however, we have a growth of databases and we get very big amounts of data. It is really hard to work with these databases if you do not have access to very good broadband. My question is: Do you think that we should consider the lack of connectivity in certain parts of the world when working on the design, structure, and accessibility of these databases so they can be easily accessible? Is that a problem?

DR. RILEY: The answer is, of course. One of the problems has been that people who do collaborative projects with African colleagues often make a decision to put their data on a server back home.
The consequence of that is that their African colleagues then have a hard time getting the data. As this connectivity comes into place, this is going to create new opportunities where that does not have to be the case. For example, initially the submarine cables will land at places like Mombasa, Dar es Salaam, and, of course, South Africa. The University of Dar es Salaam has been one of the first to get some of that bandwidth connectivity, and I think Kenyatta University and several in the Nairobi area have as well. Those become interesting places to think about hosting data centers as the infrastructure gets built up around the rest of the country.

PARTICIPANT: We heard in some of the presentations yesterday and today that at least one of the barriers to sharing data is the lack of incentives and motivation among scientists. Professor Carlson, you articulated very nicely some suggestions for changing professional institutional structures to get scientists to share data, but in my field in global health and public health, the incentives are not enough. You cannot just have carrots. You have to have sticks. For example, the NIH, Canada, a lot of public funders, and even private funders in my home country require submission of data-sharing plans, but there is not that follow-up or at least a robust system of enforcement. I was wondering what the panel thinks about having something like sticks and what those sticks might look like. If incentives are not enough to motivate data sharing, what should the penalties look like? Is that a good idea?

DR. CARLSON: I completely agree. If you look at the regulatory environment of the funding agencies and talk to the directors of those funding agencies, most of them would say, “Yes, we have those sticks in place. If scientists want to come back to us for a renewal of their grant, it already says in the grant award they must have a data plan and provide their data to a center.” The problem is not the lack of the regulations. The problem is that at the program manager level, there is no enforcement. I can say this honestly without listing any country.
I have been into a research council where the director said, “We have in place procedures so that our investigators have to provide their data in order to get a renewal. However, the program managers have no idea if any of their principal investigators have provided that data. They are not set up to track it. They have no mechanism, and there is no incentive to track it.” It is actually a culture within the agencies. The agencies already have the written regulations, but the agency culture has to change.

DR. COLLIER: I think that you can do some things with sticks, but if you did the enforcement, where would these people put their information? How would it be sustained? Do we have the infrastructure in any country to deal with that? Who do we hold accountable? Is it the institution? Is it the investigator? What if the investigator dies? What if the institution goes bankrupt? We do not have a real plan for how we are going to do this. I think that you can push, but it is going to take pushing in other places and providing some solutions as to how we are actually going to manage that information, where it is going to be, and how we are actually going to access it.

PARTICIPANT: I am wondering if the panel would be willing to address two issues that are enormously hindering, not only to data sharing but also to general science development in emerging economies. One of them is taking again what Michael was saying about cellular telephony as well as broadband availability. He mentioned a number of individuals, and we also know that the first or second richest man in the world is a Mexican who controls telecommunications throughout the continent. What we have experienced is that telecommunications in all of South America are enormously more expensive than they are for comparable services in the United States. That is one of the issues.
The other one is somewhat related, although it is strictly applied to the biological sciences, and that is the issue of intellectual property rights (IPR), either for therapeutic agents or for vaccine agents. There are a variety of agents that can be produced through recombinant technologies at extraordinarily low cost compared to the price at which they are available commercially in Africa and Southeast Asia. There is quite a bit of controversy regarding the prudence or the rationale for proceeding with producing the necessary agents regardless of IPR. Two common issues link these two: one is enormous greed, and the second is corruption. Would the panel be willing to address that?

DR. RILEY: How do you deal with greed and corruption? I am not sure that we know how to do that. Look at what is going on around the world right now because of people getting fed up with greed and corruption in their countries, and now using these tools that seem to be unstoppable, although there certainly have been attempts to stop them by shutting down cell phones and shutting down the Internet. If we could get rid of the devil, we would solve the problem.