The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 37
PART THREE: COMPELLING BENEFITS

THE CASE FOR INTERNATIONAL SHARING OF SCIENTIFIC DATA

11. Developing the Rice Genome in China
Huanming Yang
BGI (Beijing Genomics Institute), China

In this presentation, I will share with you information about the development of the rice genome as well as other genomes in China, through international collaboration and international sharing of scientific data.

The field of genomics was cultivated by the Human Genome Project (HGP). It was important to China that its scientists made a contribution, even a small one, to the HGP. China was a latecomer. One of my colleagues wrote: “The work was already underway in other countries, and China was way behind the curve. But Huanming convinced me that China’s involvement would represent a major advance for his country and for the Beijing Genome Center.” Our idea was that with just a simple contribution from us, together with all other countries, we could benefit from this project, as could all the people in the world. The project was so expensive that China alone could not afford it. That was the reason for us to support international scientific collaboration.

As a member of the International Bioethics Committee of UNESCO, I argued strongly that the most important and urgent issue in bioethics at the time was the immediate release and free sharing of the human genome data. In a period of 5 months, I submitted four proposals for free sharing of human genome data. Then on May 9, 2000, the director-general of UNESCO issued a statement on the free availability of human genome data. This led to the Group of 8 (G8) communiqué of the Okinawa meeting in July 2000, as well as the United Nations Millennium Declaration on September 19, 2000, to ensure free access to information on the human genome sequence. As the only member from the developing countries in the HGP consortium, China made a contribution that is not only a technical accomplishment, but also a recognized effort in the free sharing of human genome data.

As Michael Morgan, who was responsible for the Wellcome Trust projects on genomics, said, “China’s unswerving support of open data release was an important factor in ensuring that the human genome sequence is the property of the whole world.” John Sulston, the leader of the HGP of the United Kingdom, said, “I especially salute the Chinese colleagues, who have affirmed the Human Genome Project’s common ownership by all mankind.” The Human Genome Project has established a brilliant example for international collaboration and data sharing. Without participation in the HGP and without international data sharing, genomics in China today would not be so advanced.

With regard to the rice genome data, we freely shared and published the first draft sequence of the rice genome in April 2002. It was a big event in the history of natural science in China. That is the reason that the editorial department of Science magazine went to Beijing for the news release: “Science magazine honoring China’s sequencing of the rice genome.” I would like to quote from Science: “This team deserves enormous credit for their outstanding world-class accomplishment in a remarkably short time.” It is true that the whole job was done in 74 days. Then we released all the data. Our database on the rice genome has been one of the most popular databases in global genomics. If we regard citations as one indicator of the impact of that paper, we are proud to see that our paper has been cited about 2,000 times up to now, and the citations are still increasing. The number of specific publications in one field could be another indication of the impact of the work in that field. Since the release and the sharing of the rice genome data, the research on rice has dramatically increased and has outnumbered that of wheat, though they were more or less the same at the starting line.

For our students, it would be difficult to imagine how research could be done on rice without referring to its genome sequences. This is also the case for all the researchers in the field of genomics. I will give you two examples showing how significant the impact of the free sharing of the rice genome sequence can be. We are now collaborating with various institutes and universities within and outside China to sequence 10,000 rice accessions or strains. We believe that is an important stimulus for rice genome research and breeding.

We also have analyzed the genomes of other important crops, such as maize and potatoes. A paper about this research will be published in a prestigious journal very soon. We also have analyzed the cucumber, and sequenced the genes related to sex expression, disease resistance, and so on. As soybean curd is an important food for the Chinese, we have also completed the genome analysis on soybeans.

We have analyzed the genome for chickens. One of the most important scientific discoveries is that the genome diversity of chickens originated before their domestication. Together with other institutions, we have analyzed the genome for silkworms. The most thorough genome sequence data so far for the silkworm was released by BGI and other Chinese researchers many years ago. Now the job is continued by analyzing about 50 genomes, with the aim to reveal the domestication events and genes for this important insect. We have analyzed the panda genome. Technically it is the first successful de novo assembly of a mammal, without the help of a genetic or physical map of this animal. We have also analyzed, in collaboration with our colleagues in the United States, the genome of two ant species to reveal how they organize their social activities.

We are also concerned about global climate change, so we have finished sequencing several animals living in extreme environments, such as the polar bear, the penguin, the Tibetan antelope, and the camel. We have initiated, together with our colleagues all around the world, the International 10K Genomes Project. About 25 percent of all the vertebrates are listed in this project. We have already begun studying the first batch of more than 100 species.

Now I would like to turn back to human genome research. As I have already indicated, the Human Genome Project has shaped the field of human genomics, which is characterized by international collaboration and international data sharing. Our contribution to the HGP has been small, probably 1 percent, but we made about a 10 percent contribution with the International HapMap Project. We completed the first Asian human genome by means of the new-generation sequencers and published the results in Nature. The publication of the Asian genome revealed that the methods or sequencing technology available now are extremely powerful, as stated in a review paper in Science. Perhaps you have seen that my institute is committed to sequencing more than 10,000 individuals before the end of this year. We are also an essential part of the Human Variome Project, as well as the International Cancer Genome Project. For the International Cancer Genome Project, we are committed to gastric cancer now, and we will expand our contribution to other cancers related to digestive systems. We have initiated the 1,000 Monogenic Diseases Project, based on our own experiences working on the Mendelian disorders.

One of the most important discoveries in 2009 was the “human pan-genome”. We have identified that 0.6 to 1.5 percent of human genome sequences actually are population-specific. We also analyzed people living in two different environments to reveal how their genomes have adapted to the environment. The population study of Tibetans and Han Chinese uncovered that regulation of the hypoxia response is central to high-altitude adaptation. That paper was published in Science.

We captured more than 80 percent of the human genome sequence from the hairs of a sample which was at least 4,000 years old. Then, we published the first catalogue of the human gut metagenomes, in which we identified at least 2,000 species of bacteria. Finally, we also have initiated a 10,000 Microbial Genome Project, as well as the Earth Microbiome Project, also through international collaboration.

As scientists, we acknowledge that we have a responsibility to the world. In 1994, I told my colleagues that what we should bring back is not only the technology, but also the ethical principles. When I was co-chairman of the European Actions for Global Life Sciences, we developed this slogan: “To raise the banner of science and humanity.” I am president of BGI (formerly known as Beijing Genomics Institute), and the institute’s slogan is: “To raise the banner of innovation and ethics.” BGI’s mission is to share what we have with others to promote global genomics, to work together with those who are unable to access what they need, and to work together with others to do something bigger, faster, cheaper, and better. It is, after all, so important to build the capacity in developing countries.

I think that international sharing of scientific data is not only an issue of science, nor merely an issue of economics, but also an issue of humanity and global harmony. In genomics, we Chinese already have benefited so much from it. We are told by our ancestors that nobody can be a hero without three partners. Genomics cannot be done alone. BGI’s strategy for sustainable development requires international collaboration and data sharing. None of our achievements would have been gained without international collaboration.

We are confident in our passionate young staff, but of course, enthusiasm is not enough for science. Innovation is the key for scientific development. Bioinformatics software is our major innovation. That is how we have made rice genome assembly possible, as well as most, if not all, of the software that we are using for next-generation sequencers.

Nonetheless, humbleness is rooted in our Chinese culture. We began with nothing, though now my institute is poised to become the biggest DNA sequencing lab in the world. When we are called the sequencing factory for the world, we are happy to have this nickname. According to Science, “BGI-Shenzhen enhances its reputation as the world’s largest sequencing center, deciphering an ant, the Asian human genome, the human methylome, and a gene catalog of the human gut microbes.” At the same time, we appreciate again all those who have helped us. Just like the Chinese proverb says: “When you drink sweet water from the well, do not forget who helped dig it.”

12. Data Sharing in Astronomy
Željko Ivezić
University of Washington, United States

I am going to summarize some of the experience we have gained in astronomy with data sharing. I will first list a few questions that we deal with in astronomy just to set the stage. Then I will review what we learned from the Sloan Digital Sky Survey (SDSS) and similar surveys in the context of data sharing. My main point is that we have to submit ourselves to cost-benefit analysis and see whether or not it pays off. What we are trying to do in astronomy is fairly well summarized in these three big questions:

1. How and when did the universe begin?
2. How did the structure (planets, stars, galaxies) in the universe form and evolve?
3. Is our planetary system unique (or, is there life anywhere else)?

You can rephrase this by asking if the physics we learned on Earth is applicable to the rest of the universe, and if we can use the observations of the universe and of the heavens to learn more physics. These are, so to say, business questions, but we should never forget that, going from families visiting the Smithsonian Institution in Washington, D.C., to a small village in some corner of the world where grandparents tell their children the stories about stars, we are all fascinated by our place in the cosmos. Over the last decade or so, there has been an explosion in new tools and methods. Due to the fast development of computer and information technologies, we now have new tools, methods, and cutting-edge sky surveys that allow us to observe the whole sky in great detail. There are three frontiers in optical astronomy. The first is to build ever-larger telescopes. These are, for example, the twin Keck telescopes in Hawaii. We build large telescopes not to get more detail, but to see fainter objects.
The second frontier is to launch our telescopes in space, above the Earth’s atmosphere, which blurs the images of ground-based telescopes and absorbs all the light except visible and radio. The beautiful images from the Hubble Space Telescope are so detailed not because Hubble is exceptionally large, but because the images are not blurred by the atmosphere. Both the Keck telescopes and Hubble more or less study one object at a time, but they cannot see the whole sky. In fact, all of the area in the sky that Hubble imaged to date is less than one-thousandth of the whole sky. With new technology, we can begin to cover the entire sky and get diverse and precise data on hundreds of millions of objects. That progress brought in the third frontier in astronomy: gathering digital data for these large numbers of sources and then using statistical analysis and data-mining methods to study them. One of the key developments related to this symposium is that about a decade ago astronomers started sharing all these giant databases freely with the public. What do these data contain? First, they have images. Observers are interested in the position of the object in the sky, its brightness, the size and shape of its image. Objects are observed with different filters to get information about the spectral energy distribution. Images are processed to measure objects, such as stars and galaxies, and to construct catalogues, which are the most useful data products that we put in the public domain—of course, together with images. Why are these catalogues important? We can use them to discover new objects, classify those new objects, study them statistically, and search for unusual objects. The larger the sample is, the more unusual stars or galaxies will be found. Then we can study cosmology and try to answer these questions:

When did the universe begin? How did it develop? When astronomy databases are open to the public, more people can participate. One of the foremost examples of this new generation of surveys is the Sloan Digital Sky Survey (SDSS). It was the first time that we could get digital data, a color map of the sky, for a substantial fraction of the sky. There have been a number of projects in astronomy in the last decade that have built digital databases and made them available to the world’s public. SDSS used a camera that has 120 megapixels. It used to be the largest camera in the world. For more than half a decade we collected measurements (positions, colors, shapes, and so on) for about 400 million objects. When the database containing these measurements was released to the public, even before a substantial fraction of science analysis was completed by the project team, that was a revolution in astronomy. This approach is still not accepted in all fields, but I would argue that there are clear benefits of doing so. The SDSS public portal provides astronomical data to anyone, anywhere. Two years ago, I was vacationing in Croatia, my country of origin, and I met a friend from Serbia who is an astronomer and who had a house on the same island. We got all the data for a paper in four days by accessing this database through our laptops while drinking beer in a beach coffee bar. The portal also has a special section with many exercises for teachers and supplemental material that shows K–12 teachers how to use these data in the classroom. There are literally tens of thousands of examples where the data were incorporated in school curricula. I would like to remind you that astronomy is very effective in attracting students to science, technology, engineering, and mathematics professions.
As a result of this public data release, again even before a substantial fraction of science was done by the project team itself, several thousand papers were published by scientists not associated with the SDSS— more papers than by the people who were members of the SDSS collaboration. The total data volume that was delivered through this portal was more than 100 times the size of the full SDSS database. There were more than 300 million Web hits over 6 years. The number of unique users was about 1 million. Compare this to about 10,000 professional astronomers. It is a huge impact. As a result of SDSS and previous work, the intellectual father of SDSS, Prof. Jim Gunn from Princeton University, was awarded the National Medal of Science by President Obama. Over the last few years, even more portals have been developed. One was developed by Microsoft and called WorldWide Telescope. You can use it to get not only SDSS data, but just about every large astronomical dataset. Google Sky did a similar thing. A colleague who developed Google Sky with people from Google told me that as soon as they put it online, there were several million downloads. Google thought that someone was giving them a denial-of-service attack. That shows many people were interested. What did we learn from this exercise of releasing data even before the project team did its own science? First, it requires higher standards. When you put something out to the public, you have to have documentation. You cannot put out documentation that is not spell-checked, for example. It is a very serious job to put something in the public domain. Then there are costs to keep it public, such as servers, help desks, and so on. These are the two main issues: higher standards and the cost of curation, and because there are costs, we cannot just say, let’s do it. We have to subject this idea of publicly releasing data to a cost-benefit analysis. 
I will go through a list of the top 10 benefits that we think we extracted from our experience in astronomy. The caveat is, not all of them may apply to your field. Also, they may vary with time. For example, 20 years ago in astronomy, not many astronomers would buy the arguments that I am going to present to you, but today most of them would. Things change with time.

The primary benefit, in the view of many, is that while you are taking data, and you have a finite lifetime for your project, if your dataset is complex, you need to subject it to rigorous analysis to be certain that everything is fine with your data-taking strategy, with your instrument, and so on. Some easy things, of course, you can see immediately. In astronomy, if you get a blank image, then you know that something is wrong. But there are effects at the 1 percent level. With these statistical surveys that have hundreds of millions of objects, we study such percent-level effects. Often there can be systematic effects hidden in the data that you cannot see with the naked eye, or even with simple analysis. You need to do complex analysis and cutting-edge research to discover something special with your data. If the data collecting is already finished, then it is too late. That is the main reason why you should do it early, before the end of your project.

Then, especially in astronomy, sometimes you want to take data with different facilities at the same time. In astronomy there are objects in the sky that do not exist forever. Supernova explosions last for a month or so; similarly, there are some asteroids that pass close to Earth, and then we do not see them for decades. There are events in the sky that last a short time and we want to deploy many facilities. To enable this, you have to release your data early.

If you release your data to the whole world, you will have many more users doing your science. With SDSS in particular, outsiders wrote more papers. Many of these papers had great ideas that were not even listed in the project book that was given to funding agencies to justify this investment of close to $200 million.
More people mean more ideas. If you release your data for everyone to work on, you ensure that all the scientific results will be reproducible. They can be verified and you will preserve your data for posterity. That is, again, very important in astronomy. Astronomers strongly believe that code should be released with the data, not just the final product. In particular, in astronomy, because of the atmospheric effects, we see from the ground only optical and radio wavelengths, but we learn a lot about the heavens by observing in X-ray and infrared bandwidths, among others, and using telescopes on satellites. To take this same argument further, cross-disciplinary science is enabled. There are some fantastic examples from the astronomy–statistics–computer science boundary, where computer scientists use astronomical data to develop new ideas, which are then used once again in astronomy to do better science than astronomers could have done on our own. One can extend this also to other fields, because most disciplines have these giant datasets today. Many of the problems are the same: How do you do data mining in highly multidimensional space? How do you visualize massive datasets? How do you store and query 100 petabytes? These are all common problems. It is not efficient to try to solve them N times when we could solve them together. Sometimes collaborating and promising to release the data are the only way to secure scarce resources, especially in astronomy where tools are becoming very expensive. Most of the easy things were done in the last century—things you could do in a basement, with one professor and three graduate students. Today’s more difficult problems require major resources. Also if you look at the gross domestic product (GDP) of a big country like the United States, in the 1950s, it was 50 percent of the world’s economy. 
Today, it is only one-quarter of the world GDP; therefore, it is now much more profitable for the United States to enter international partnerships than it was 50 years ago. In astronomy, very often when you have a major undertaking, you have to get everyone on board. That is why it is good to share your data.

Going back to the team that produced the data, we are all concerned about our careers, especially when we are young postdoctoral researchers (postdocs) with small children to feed. We want to protect these young people. We want them in successful positions. Indeed, we heard many arguments against early release of data from SDSS, claiming that all of us who were postdocs at that time would not get jobs because other people would do the science. What happened in practice was that all of us who were postdocs had know-how about the dataset. We did all the early science, even though the data were public. There was a delay of a year or two for the world community to get on board. Furthermore, our know-how became a marketable commodity, so all the young postdocs who worked on SDSS (literally all of them, a few dozen) are faculty today.

Then, especially in astronomy, but in other fields as well, education and public outreach are important aspects of our science endeavors. You may have heard about the Galaxy Zoo project, where members of the public visually classify galaxies from the SDSS. The project has already recruited 200,000 volunteers, who went through images of a million galaxies. Evidently, the general public is greatly fascinated by astronomy.

Finally, there are issues of ethics and broader impact. First, sharing is nice, as we all know from our kindergarten days. Furthermore, taxpayers paid for most of our projects. They have the right to see the results at every level of understanding, from looking at astronomical images through Google Sky and WorldWide Telescope, all the way to seeing scientific results and translating these to understand better how the universe began, and the other big questions. A big aspect of sharing data worldwide is that we are enabling democratization of scientific research. In the context of developing countries, this public release of giant datasets allows small teams to do big science.
That colleague of mine and I on the beach in Croatia were a pretty small team, just two of us, and we managed to do cutting-edge science because SDSS data were made public worldwide. These are the top benefits, but then there are some other issues that we should consider when thinking about releasing data. For example, if you spend some resources to release the data, is anyone going to use them? How many customers do you expect? Of course, there are different kinds of data. Sometimes they are of great use, like astronomical data. We saw millions of hits to public data releases of people looking at images of the sky. Sometimes data are so specific that even if you release them, not many people would use them—for example, giant datastreams from particle accelerators. That is another extreme. Then there are security and proprietary issues. For example, a big astronomical project, Pan-STARRS, was done in collaboration with the Air Force. They have to mask some parts of images and not release them to the public. There are also issues of commercial gains and foreign competition. Jim Gray, who was a Principal Researcher of Microsoft Research Lab, developed parts of those tools that SDSS used to manage and release data. He liked to joke that he loved to work with astronomical data because “they are worthless.” What he meant was that they do not have commercial value. With other databases, he had to be very careful about what was released. If you are doing a major data release that can cost tens of millions of dollars, you have to justify the cost. Once, I listed these top benefits on one page. Being a scientist, I immediately asked myself if there is any predictive value in that list. Can we go through other projects and see if they are consistent with this reasoning? Should they publicly release their data? I will share with you two examples, one from astronomy and one from physics. 
The first one, an astronomical example, will be a new telescope, a successor to SDSS called the Large Synoptic Survey Telescope (LSST), for which I am the project scientist. The telescope will be in Chile, and the collaboration includes more than 30 U.S. institutions and about a half dozen European institutions. The shortest way to summarize the difference is that the SDSS gave us the first digital color map of its kind, a snapshot. With LSST, we will observe the sky in a similar fashion as SDSS, but with about 1,000 times greater resolution. Essentially, you can think of it as a digital movie, the greatest movie ever made. If you watched that movie, it would take you a full year without sleep, just staring at the screen.

This telescope will have an 8-meter aperture, not a 2.5-meter one, like SDSS. Because its sensitivity will be greater, LSST will detect many more objects. Instead of 400 million objects detected with SDSS, we will collect 20 billion objects with this telescope. It will be the first time that astronomers will have cataloged more objects than there are living people on Earth. Everyone can get their own galaxy. When you look at the data volume, SDSS was about 40 terabytes. This is roughly the data volume of the books in the U.S. Library of Congress. Now, when you want to compare LSST, at about 100 petabytes, or 100,000 terabytes, it is about the same as the volume of all the words ever printed in the world since Gutenberg. Of course, so much data can be problematic. We have 20 billion objects, and we have to track them in real time.

On my list of 10 benefits, indeed, all 10 of them apply to the LSST dataset. It is not a great surprise, because it is so similar to SDSS. But because of this clear win in the cost-benefit analysis, all of the data will be made public to the world. Phenomena that change in the sky will be released within 60 seconds. Then every year there will be a cumulative data release as well (about 10 petabytes of new data every year).

Let me tell you just a few words about another big project in physics, the Laser Interferometer Gravitational Wave Observatory (LIGO). There are two sites, in Louisiana and Washington. The project is trying to detect gravitational waves.
If it is successful, that is a Nobel Prize-class discovery that would have a great impact on our understanding of the fundamental physics. Should those data be released immediately, or not? It turns out that most of the top 10 benefits apply to LIGO, too, but it is a more difficult question. To summarize, the issue of public data sharing is a matter of cost-benefit analysis. Often when you do a detailed analysis, you are led to the conclusion that you just have to do it. Given all the parameters of the problem, all the benefits, and all other forces that act upon you, you simply have to share the data.
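
The scale comparisons quoted in this talk can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the input figures are the rounded values given by the speaker, the constant and function names are hypothetical, and the collecting-area ratio assumes an idealized unobstructed circular aperture, which is a simplification not stated in the talk.

```python
# Figures as quoted in the talk; all are rounded, order-of-magnitude values.
SDSS_OBJECTS = 400_000_000        # objects measured by SDSS
LSST_OBJECTS = 20_000_000_000     # objects expected from LSST
SDSS_VOLUME_TB = 40               # SDSS data volume in terabytes
LSST_VOLUME_TB = 100_000          # LSST data volume: 100 petabytes
SDSS_APERTURE_M = 2.5             # SDSS telescope aperture in meters
LSST_APERTURE_M = 8.0             # LSST telescope aperture in meters
PORTAL_WEB_HITS = 300_000_000     # SDSS portal Web hits
PORTAL_YEARS = 6                  # period over which the hits accumulated
PORTAL_USERS = 1_000_000          # unique SDSS portal users
PROFESSIONAL_ASTRONOMERS = 10_000 # comparison figure quoted in the talk


def survey_comparison():
    """Return simple ratios derived from the quoted figures."""
    return {
        "object_ratio": LSST_OBJECTS / SDSS_OBJECTS,
        "volume_ratio": LSST_VOLUME_TB / SDSS_VOLUME_TB,
        # Assumption: collecting area scales with the square of the
        # aperture (idealized, unobstructed circular mirror).
        "collecting_area_ratio": (LSST_APERTURE_M / SDSS_APERTURE_M) ** 2,
        # Assumption: 365-day years, hits spread evenly over the period.
        "web_hits_per_day": PORTAL_WEB_HITS / (PORTAL_YEARS * 365),
        "users_per_astronomer": PORTAL_USERS / PROFESSIONAL_ASTRONOMERS,
    }


if __name__ == "__main__":
    for name, value in survey_comparison().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, the quoted figures imply 50 times as many objects and 2,500 times the data volume for LSST compared with SDSS, and about 100 portal users for every professional astronomer.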

46 THE CASE FOR INTERNATIONAL SHARING OF SCIENITIFC DATA 13. Sharing Engineering Data for Failure Analysis in Airplane Crashes: Creation of a Web-based Knowledge System Daniel I. Cheney Federal Aviation Administration, United States What I would like to address today is an information system that the Federal Aviation Administration has developed regarding transport airplane accidents. It is an initiative to gather information and lessons learned from these accidents, because they definitely have been repeated. The origin of this effort goes back to some problems that we worked on in the late 1990s and the early 2000s. There were several accidents, some of them very large that were heavily covered in the media. When we looked at them carefully, they exposed deficiencies about the way the aircrafts were operated, shortcomings in the maintenance programs, and in the fundamentals of the design of the aircraft. The processes that linked them together were inconsistent. The closer we looked, the starker the inconsistencies were. Four accidents drew particular attention. These were the TWA Flight 800, Swissair Flight 111, Alaska Airlines Flight 261, and the American Eagle Flight 4184. From an investigator’s standpoint, the TWA Flight 800 accident investigation was a very long, complicated, and technical one. It took years to assemble the wreckage, and much research was done to understand the science behind the cause. At the end, we certainly knew more about fuel system flammability than we did before this accident. Much has been done to reduce the risk of this kind of accident on airplanes today. The Swissair Flight 111 accident was also a very complicated one. Much research was done on the subject of flammability of materials, particularly the materials used in the cabin for thermal acoustic protection. Flawed assumptions were a dominant characteristic of this accident. 
The Alaska Airlines Flight 261 accident involved a stabilizer control system that malfunctioned due to poor lubrication. Again, there were inconsistencies in the methods by which the aircraft was being maintained.

The fourth accident that caused us to initiate this accident library was the American Eagle Flight 4184 accident in Indiana. There were flawed assumptions about the way that atmospheric icing would affect the airplane. The ice actually accumulated on parts of the airfoil where accumulation had never been observed before. It was very unusual and, before this accident, it was a part of atmospheric icing that was not understood. The result was loss of flight control.

All these accidents and investigations resulted in a great deal of scientific research, testing, retesting, and evaluation. That research was documented, but was largely languishing in various archives and places not readily available to the general public. Knowledge of these and other major accidents was basically being lost with the passage of time. The more years that passed, the more people forgot the causes, and the more newcomers to this industry had no knowledge of these deficiencies and shortcomings at all. We were seeing repeated accidents, which is unacceptable. Awareness of information about previous accidents became key to understanding more recent accidents.

The challenge that we undertook was to craft an information system that could take what could be well over a decade of work in doing the research and fixing things and make it freely and easily accessible to those who can benefit from it, so that we do not need to make these mistakes again. More than a century ago, George Santayana of Harvard University wrote in The Life of Reason: Reason in Common Sense that “those who cannot remember the past are condemned to repeat it.”1

1 Available at http://iat.iupui.edu/Santayana

It really is true. We will continue to repeat things unless we are mindful of what we need to be careful about. A large transport airplane accident is an enormous human tragedy, but a second tragedy would be not to learn from it and then cause similar accidents.

Let me now talk about the barriers. Some other presenters talked about barriers to science and barriers to information flow. Aviation, particularly accidents, is driven by these very real and powerful barriers:

• Fear of negative publicity. There is probably nothing more negative than a large transport airplane accident. It is very sad for all involved.
• Lengthy investigation. Some investigations take more than a decade in order to get all the science and research together, and get a go-forward path that is solid.
• Continual workforce turnover. Many of the people who come into these accident investigations will be involved for a decade or so and then move on, and we replace them with new people and fail to build corporate memory of the problems.
• The information technology (IT) tools. If you go back 30 or 40 years, before the Internet and the computational systems, it was very hard to capture all these data.

We have developed a “Lessons Learned from Transport Airplane Accidents” library2 that is organized, threat-based, and has search-and-sort capability. Its intent is to stop and reverse the loss of lessons so that others can benefit from what has taken 40 or 50 years to accumulate. Many of the speakers at this meeting, including myself, who came from out of town flew here on one of these airplanes. We take air travel almost for granted. The biggest risk is the car, not the airplane. We want to keep it that way and maintain a very safe aviation system.

Now let me talk about the portal itself. Right now there are 57 major accident modules on the site. We are working on more all the time. We have another five being crafted.
It is relatively time-consuming to create this material and involves different stakeholders. Boeing, Airbus, General Electric, Pratt and Whitney, and the airline companies all have been very helpful. They realize that their work can benefit from getting this right. The only information on the portal is information that is already publicly available. We are looking at maybe 10 to 15 years of hard work captured in a 15-minute read. Anybody who wants to know about accidents like these can get the big issues in about 15 minutes.

Regarding the organization of the library, when we first started this, it was very tempting to only look at issues from technical and scientific aspects, such as fire, structural issues, flight control issues, and things like that. Then we looked at several major accidents and asked ourselves: Are we really getting the maximum value of the material? If not, what are we missing? After looking at half a dozen big accidents that we had prototyped, we realized that we were missing the bigger issues beyond the technical and scientific ones; we were missing what may be an organizational problem or a human error aspect.

After the iterative process of developing the library, we agreed upon three perspectives of looking at accidents. First, we look at the accident from a perspective of what we call the airplane life cycle. This is the beginning, the operational, and the maintenance and repair perspective. Second, we examine the accident threat categories. In looking at accidents since the jet era began, it turns out that we can put them into 18 technical categories that are important. The third perspective is what we call the common themes. Every accident has a strong link to one of five common themes: flawed assumptions, human error,

2 Available at http://accidents-ll.faa.gov/.
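The three perspectives and the search-and-sort capability described above can be pictured with a small data model. What follows is only a minimal Python sketch under stated assumptions: the `AccidentModule` record, the `search` function, and every tag value are hypothetical names chosen for illustration, the tags are read loosely from the talk rather than taken from the FAA's own classification, and nothing here reflects how the actual portal is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccidentModule:
    """One accident write-up, tagged from each of the three perspectives."""
    name: str
    life_cycle: frozenset = frozenset()  # airplane life-cycle stages (hypothetical labels)
    threats: frozenset = frozenset()     # threat categories (hypothetical labels)
    themes: frozenset = frozenset()      # common themes (only those named in the talk)


def search(modules, life_cycle=None, threat=None, theme=None):
    """Return the modules that match every filter supplied, sorted by name."""
    hits = [
        m for m in modules
        if (life_cycle is None or life_cycle in m.life_cycle)
        and (threat is None or threat in m.threats)
        and (theme is None or theme in m.themes)
    ]
    return sorted(hits, key=lambda m: m.name)


# Illustrative tags only; they are an assumption, not the library's real data.
LIBRARY = [
    AccidentModule("TWA Flight 800",
                   threats=frozenset({"fuel system flammability"})),
    AccidentModule("Swissair Flight 111",
                   threats=frozenset({"material flammability"}),
                   themes=frozenset({"flawed assumptions"})),
    AccidentModule("Alaska Airlines Flight 261",
                   life_cycle=frozenset({"maintenance"}),
                   threats=frozenset({"flight control"})),
    AccidentModule("American Eagle Flight 4184",
                   threats=frozenset({"icing", "flight control"}),
                   themes=frozenset({"flawed assumptions"})),
]

if __name__ == "__main__":
    # Cross-cutting query: which accidents share the "flawed assumptions" theme?
    for module in search(LIBRARY, theme="flawed assumptions"):
        print(module.name)
```

Tagging each module along all three perspectives, instead of filing it under a single technical heading, is what lets a query cut across accidents that look unrelated from a purely technical point of view.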

Malawi has the largest inequality among the six countries, but the highest life satisfaction (see Figure 16-4).

FIGURE 16-4 Six African Countries–Overall Satisfaction with Life. Source: From the speaker’s presentation at the symposium. Credit: HDR 2010. Overall Satisfaction with Life is based on responses to a question about satisfaction with life in a Gallup World Poll.

What conclusions can be made from these data? Would the data help us identify and flag key issues to address on the global scale? As we see from these statistics, life satisfaction has no strong relation to poverty and inequality.

When Malawi is compared with Norway, the country with highest HDI, the satisfaction-with-life index is 6 versus 8. (The lowest life-satisfaction level is observed in Tanzania at 2.4 and Togo at 2.6, the highest is in Costa Rica at 8.5.)

We can also expand the research agenda: What policies would make people happier and inspired to act towards change and life-quality improvement? Here, more information is needed: for example, measures of empowerment, unpaid women’s work, and domestic violence. What factors increase happiness of the population?

According to the HDR, “human development is the expansion of people’s freedoms to live long, healthy, and creative lives, to advance other goals they have reason to value, and to engage actively in shaping development equitably and sustainably on a shared planet. People are both the beneficiaries and drivers of human development as individuals and groups.”

Let us turn now to some compelling examples showing how social statistics were used to drive policy decisions. In 2009 in Mexico, the Constitution and the General Law of Social Development were amended based on a multidimensional poverty measure reflecting various deprivations the households face (National Council for Evaluation of Social Policy). The country is mapped based on the level of deprivation in at least one of six dimensions: education, health care, social security, housing quality, basic household utilities, and access to food. The Mexican government uses the data to monitor the effectiveness of national social assistance programs.

The World Bank utilizes the Participatory Poverty Assessment (PPA) approach. The PPA allows the World Bank to incorporate views of the poor into the analysis of poverty, combine the results with other types of data, and communicate the findings to the policy makers, thus allowing the poor to influence policy. In numerous cases, PPA results contributed to a shift in World Bank lending programs. For example, in Ghana, PPAs contributed to a shift in the focus of reforms to rural infrastructure, quality and accessibility of health care, and education. In the 1990s, the World Bank focus in Nigeria was shifted to water and roads. At the same time, PPAs were used in Thailand as a part of the Social Investment Project to increase the understanding of shifting patterns of vulnerability as the impact of the Asian crisis deepened, and to inform policy makers, strengthening the capacity of the country by consolidating various types of information. The Asian Development Bank also conducted similar assessments in Laos and the Philippines.

Social aspects are also key in disaster response and mitigation. Along with physical aspects, such as the probability of loss and risk, social aspects such as vulnerability and resilience should be considered. Fear, depression, despair, and post-traumatic stress can cause long-term consequences for the nation after a natural disaster. Apart from the assessment of hazards, probability, magnitude, and impact, a people-centered perspective is required to evaluate the susceptibility of the community to natural hazards. Interaction between hazard conditions and vulnerability conditions should be tracked dynamically (e.g., climate change impacts and economy impacts on migration of people to places susceptible to hazards like floods).
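Returning to the Mexican example above, the idea of mapping deprivation in at least one of six dimensions can be illustrated with a short calculation. This is a hedged Python sketch, not the official methodology: the six dimension names come from the talk, while the household records, the function names, and the simple counting rule are assumptions made purely for illustration.

```python
# The six dimensions named in the talk.
DIMENSIONS = (
    "education",
    "health care",
    "social security",
    "housing quality",
    "basic household utilities",
    "access to food",
)


def deprivation_count(household):
    """Number of dimensions in which a household is flagged as deprived."""
    return sum(1 for d in DIMENSIONS if household.get(d, False))


def share_deprived(households, min_dimensions=1):
    """Fraction of households deprived in at least `min_dimensions` dimensions."""
    if not households:
        return 0.0
    flagged = sum(1 for h in households if deprivation_count(h) >= min_dimensions)
    return flagged / len(households)


# Hypothetical survey records for one area; True marks a deprivation.
HOUSEHOLDS = [
    {"education": True, "access to food": True},
    {"health care": True},
    {},  # no deprivation recorded
]

if __name__ == "__main__":
    print(f"deprived in at least one dimension: {share_deprived(HOUSEHOLDS):.0%}")
    print(f"deprived in at least two dimensions: {share_deprived(HOUSEHOLDS, 2):.0%}")
```

Computing such a share for each area is one simple way a country could be mapped by level of deprivation; the threshold is left as a parameter because the talk does not say how the levels are graded.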
The World Health Organization points out that the concept of risk is associated with the perception of risk, and is a characteristic of society and culture. More effort is needed in community involvement in risk mapping and analysis, enhancing vulnerability assessments, understanding of risk perception, and the capacity to adjust. The example of Mozambique shows how to link early warning of disaster and early action. After the flood in early 2000 left devastating consequences for the country, the social aspects of disaster mitigation were assessed and used to train the community to understand the risk and use warning information. The consequences of the 2007 and 2008 floods were significantly less severe. Social factors contributing to cyclones’ severity in Bangladesh are cultural. Male heads of the household historically did not want to move to shelters “unsuitable for females,” with lack of privacy and poor sanitation. Families mostly adopted a wait-and-see strategy. Vulnerability was increased by late responses. Currently the issue is being addressed by educating the population. The examples above show us how social statistics can serve as an instrument and basis for decision making and research on sustainable development processes. We are looking at the management process of decision making and utilizing this powerful instrument. We are showing how this instrument works on the global scale in the computation of indexes, which help us to understand objective trends. The focus is on the level at which the statistics are used. We reviewed three levels of application of social statistics: global, national, and local decision making. Many challenges remain, such as the sharing of data, the need for global statistical standards, data communication, intellectual property rights to the research results, and utilization of the data and research in policy decision making.

The benefits of sharing social data are many, and only a selection of them are given in this presentation. We should not forget that with benefits come responsibilities: To whom do we distribute the data? How do we protect the information against misinterpretation? How do we educate communities on how to use the data? How do we protect the data against terrorism? Who is accountable for quality, timeliness, and accessibility of data?

Let me conclude by reinforcing what I said earlier. We have tremendous opportunities. With the help of powerful technologies and new innovative models, we can promote and reinforce sustainable development. We can illustrate socioeconomic and ecological aspects of the development and address key risks. To validate, calibrate, and use the models, we need accurate and timely data and collective cooperation among statistical agencies and policy makers around the world. The assessment of the contribution of policies and financing mechanisms in the improvement of people’s lives and expansion of their freedoms cannot possibly be done without extensive shared information on social statistics.

References

Human Development Report (UN), 2010.
Robb, Caroline M., Can the Poor Influence Policy? Participatory Poverty Assessment in the Developing World, 2nd ed. (International Bank for Reconstruction and Development–The World Bank, 2002).
World Disasters Report 2009, Focus on Early Warning, Early Action (International Federation of Red Cross and Red Crescent Societies, 2009).
Bakhtina, Victoria A., Sub-Saharan Africa: Sustainability Risk Discussion (paper presented at the CODATA 22 International Conference, Cape Town, October 24–27, 2010).

17. Remote Sensing and In Situ Measurements in the Global Earth Observation System of Systems
Curtis Woodcock
Boston University, United States

What I would like to present today are my experiences related to the Group on Earth Observations (GEO), specifically the Forest Carbon Tracking (FCT) task. There are examples that relate to what we are interested in here, so I would like to share those.

The whole point of the FCT task has been to try to stay ahead of the problem of determining if and how developing countries could start to report on rates of deforestation and degradation. This is important so they could be eligible for compensation for reducing those rates, all in support of reducing greenhouse gas emissions and mitigating the increase of carbon dioxide in the atmosphere. Then the main question is, can those countries report in a reasonable way? To do this accurately, there is a need for data.

The two primary kinds of data that are needed for this task are satellite observations and in situ forest data. I am mostly going to talk about the satellite observations. That is because, if you think about data policy and data sharing, in situ observations are typically collected by people in their own backyard, whereas satellite data are generally collected by somebody else. Certainly we ought to make the first step: if we are going to collect data in other people’s backyards, we ought to at least share the data with them, rather than necessarily force them to share the data they collect. If we could at least take that step, it would be a good step forward.

GEO has identified coordination among space agencies for image acquisitions as its first and top priority as it moves forward.
Different satellite programs have collected data to be contributed to this task, including those in Table 17-1.

Table 17-1 Image Acquisitions by Country

Why I call it a first priority is that there has actually been some coordination of image acquisitions across different countries’ space agencies. Those data were collected for this task and have been contributed to countries where we are trying to demonstrate these technologies. Nonetheless, there are still questions about data access, even though the data were collected specifically for this task. Free and open sharing of data remains very much an issue.

Why is this so hard? It is hard because satellite missions are expensive. They range from the hundreds of millions to billions of dollars. What makes it frustrating is that the revenues that countries get from selling data that come from these missions rarely offset anything close to a significant fraction of the cost of those missions. Just logically, it does not match up very well. I am going to try to convince you that charging for the data significantly hinders the use of the data.

The example I am going to use is the Landsat mission, which is the oldest of the land remote-sensing missions in the United States. There have been seven satellites in the Landsat program. Two are barely functioning still. The eighth is scheduled to be launched in December 2012. What we have is a dataset from 38 years of satellite observations that is a little more than 2.5 million images. An image covers 185 by 185 kilometers of the surface of the Earth. If you think about all the costs associated with building and launching the satellites, the data cost somewhere between $5 billion and $10 billion. It is an expensive dataset. We spent a lot of money on generating this dataset, and we would like to see it get used effectively in the future.

The good news is that starting in October 2008, the U.S. Geological Survey (USGS), the agency that distributes the Landsat data, stopped charging for the data and provided access on a no-cost basis to anyone in the world. The data usage has gone up by a factor of 100 since it became freely available. As a taxpayer, think about this. As shown in Figure 17-2, before the data became freely available, in the biggest year ever for sales of Landsat data, they sold about 25,000 images. The income that was coming in from selling images, on an annual basis, was somewhere between about $5 million and, at the very best, $10 million. That is one-tenth of 1 percent of the cost of the dataset, on an annual basis, that they were getting back from selling the data.

FIGURE 17-1 Landsat Web-enabled Monthly Statistics

After all these years, we are finally getting our money’s worth out of this dataset.
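The cost-recovery figures quoted above are easy to check. The following is a minimal Python sketch of that arithmetic, using only the round numbers given in the talk; the constant and function names are chosen here for illustration, and pairing the low revenue with the low cost (and high with high) is an assumption about how the one-tenth-of-1-percent figure was reached.

```python
# Round figures quoted in the talk, in U.S. dollars.
DATASET_COST_LOW = 5_000_000_000
DATASET_COST_HIGH = 10_000_000_000
ANNUAL_REVENUE_LOW = 5_000_000
ANNUAL_REVENUE_HIGH = 10_000_000

# Footprint of one Landsat image, in kilometers.
SCENE_SIDE_KM = 185


def percent_recovered(annual_revenue, dataset_cost):
    """Annual sales revenue as a percentage of the total cost of the dataset."""
    return 100 * annual_revenue / dataset_cost


# Matching the low figures with each other, and the high with the high,
# reproduces the "one-tenth of 1 percent" quoted in the talk.
matched_low = percent_recovered(ANNUAL_REVENUE_LOW, DATASET_COST_LOW)
matched_high = percent_recovered(ANNUAL_REVENUE_HIGH, DATASET_COST_HIGH)

# Crossing the extremes gives the widest range consistent with the figures.
worst_case = percent_recovered(ANNUAL_REVENUE_LOW, DATASET_COST_HIGH)
best_case = percent_recovered(ANNUAL_REVENUE_HIGH, DATASET_COST_LOW)

scene_area_km2 = SCENE_SIDE_KM * SCENE_SIDE_KM

if __name__ == "__main__":
    print(f"matched estimates: {matched_low:.2f}% and {matched_high:.2f}% per year")
    print(f"widest range: {worst_case:.2f}% to {best_case:.2f}% per year")
    print(f"area covered by one image: {scene_area_km2:,} square kilometers")
```

Even under the most favorable pairing of the figures, annual sales recovered only a fraction of a percent of what the dataset cost to produce, which is the point the comparison is meant to make.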
This is the kind of effect that data policy can have on the ability to do research and applications. The other thing that is interesting to note here is that I have been using Landsat data for my research over 35 years. All of a sudden, we are figuring out new and better ways to do a lot of things we never thought we could do in the past. It not only allows us to know what has happened in the past, but there are whole new categories of activities that are starting to show up, and it has been only 2 years since the distribution policy was changed. I think it is a transformative kind of experience for the use of satellite data.

Let me raise one sensitive issue about this data policy. Landsat is a U.S. satellite, but the United States has had international collaborators. The government sold licenses to receiving stations to get data over the years. There are about 2.5 million images archived in the United States and about 3 million images now in the international ground stations around the world. Although those other countries paid for the licenses to collect the data, the United States now is giving the data away freely. At the same time, the history of the surface of the Earth is embedded in these datasets. We cannot go back and recollect the data. It is a really valuable dataset showing part of the history of the planet’s surface.

Here is where that effort stands. The USGS made an offer to the long list of countries that have data in their archives that if they would give us the data, we would reprocess them to the highest standards and return the data to them, and we would freely make available the data, if they want them. The United States has made that offer to these countries, but I think you can appreciate the delicate nature of this sort of discussion, when, after having asked people to pay for the data all these years ago, you are now saying that you want the data returned at no cost. It really is an effort to bring into a standard state of affairs a very valuable dataset. A lot of this imagery is sitting around on media that are degrading.

If we are going to do this right in the future, because we have not really done this correctly in the past, we need to think about the dimensions of international coordination and collaboration on earth observations. We now have dozens of countries running satellite observation missions, but not in a very coordinated manner. The first step is mission planning, so that there could actually be some compatibility between the datasets produced by these different organizations and countries. Then you have acquisition strategies. Can we get some coordination of acquisition strategies? For the first time, countries are actually beginning to collect data in concert with each other.

These satellite missions are expensive; there is no question about that. A big question then is risk mitigation.
If multiple countries are going to put up comparable satellite missions, which is what is happening now, can we at least share risk with each other, so that if somebody’s mission fails, the other will provide the data that would have otherwise been collected by the other satellite mission? This is starting to happen to some extent between the European Space Agency and the U.S. Geological Survey, with the Sentinel satellites and the Landsat program. If we are really going to take advantage of having multiple satellite systems collecting similar data in space, it is equally important to get the data processed and distributed in consistent fashion, so that we can actually start to make use of them. It is difficult for individual investigators to try to take data from multiple satellite systems and combine them if they do not have any coordination at the outset. This is something that, at an organizational level, would really help. My concluding thoughts are not too surprising. One is that we need satellite observations for many societal benefit areas, such as effective research and monitoring on climate and deforestation. We are in the infancy of trying to do any significant international coordination of these missions and collaboration, and there are many benefits to doing it that way. Free and open access to Earth observation, in some ways, is most important for developing countries, because they can least afford to pay for the data. I also work in an organization called the Global Observation of Forest Cover and Global Observation of Land Dynamics. We have two sets of thematically oriented groups—the forest group and a land cover group. I cochair the latter. We also support capacity building through a set of regional networks. These are groups of regional scientists with common interests in Earth observation as used for forest monitoring and other such research and applications. 
We have been running data-sharing workshops for the last couple of years, sponsoring people from regional networks to come to the USGS Earth Resources Observation Systems Data Center in South Dakota, and letting them take away all the data they want. This removes many of the technology barriers.

They go home with hard disks full of satellite images that they can use and share in the region. It becomes a focused network. We are going to continue to do this for at least the next 2 or 3 years. There was one workshop per year for each of the last 2 years, and we are going to do two more this year and in the next 2 years. If anybody wants to try to get people into one of these workshops, or is interested in a particular region or in getting access to the data that have already been taken back to some of these regions, I would be happy to help connect people and try to coordinate those activities.

18. DISCUSSION BY THE WORKSHOP PARTICIPANTS

PARTICIPANT: This question is to Dan Cheney. Have you and the Federal Aviation Administration taken the next step? Now that you have all this wonderful scientific and technological experience data, have you started thinking about how you can exploit those data to make the entire air system both safer and more efficient?

MR. CHENEY: That really is the ultimate plan. The first step was to accumulate the knowledge. We are at the front end of taking the final step, and that is integrating the knowledge into our processes for maintaining and improving aviation safety. This initiative is only two and a half years old, so we are still in the middle stages of catching up on the history of aviation. We are laying the groundwork now for what to do with it. We know we have some gaps in knowledge in various aspects of aviation. So the question is, how do we use this to fill those gaps? We are not there yet. There is more work to do.

PARTICIPANT: I am thinking particularly about the accidents that did not happen. There is a huge amount of learning there that goes on inside the airline industry all the time, when these things are adequately reported. Sometimes they are not. There are malpractices with loading, with crew behavior in the cockpit, and so on. There are seriously overweight takeoffs and landings. We, as the general public, do not hear about those at all. There is an enormous amount of learning embedded in that.

MR. CHENEY: The aviation industry has benefited from a level of safety margins that exist in every aspect of safety. We have achieved internationally an unthinkable safety record when you look back at where we were 40 years ago and 30 years ago, and even 20 years ago. But it is because of the safety margins that are there.
When we do have the overweight takeoff, when we do have the pilot that forgets to deploy a system, or a fire occurs, there are margins upon margins that result in that airplane not having an accident. What we believe is the value in understanding the causes of yesterday’s accidents is to recognize that when you do load an airplane over gross or you do have a tired crew that forgets things, you effectively take away one of the five levels of margin or one of the three levels of margin for that flight. That in itself may not be catastrophic, but now you have put that plane at a level of vulnerability that it would not otherwise have been in. The accident is nearly always because you have eaten up four or five levels, and you have nothing left. The close calls, the accidents that did not happen (and they are certainly going on every day), are mirror images of what has happened in the last 50 years since the jet era. Whether it is an A380 or a 787 of tomorrow, those margins are the margins. I think our big return will be to have tomorrow’s decision makers and operators understand the importance of checklists, the importance of getting it right from a maintenance perspective, because getting it wrong takes the margin away on one level. Now we have a two-legged stool. One more mistake, and eventually we have an accident. The reason we are enjoying the safety we have is that it is a robust, margin-full industry. We are going to fight any threats of giving up any of the margins.

PARTICIPANT: One of the themes I got from several presentations is that a traditional problem for these kinds of data efforts is that data were locked up in disciplines, in silos. A lot of the presentations talked about how, especially for applied problems, you were able to overcome those traditional silos and put data together in new ways.
The thing I am wondering about is, now that we have new problem-oriented silos, it seems there is always a danger of repeating history and having these new problem areas become silos on their own. I am just curious if you have thoughts on how to avoid that. To what extent in each of your areas are you really paying attention to other people’s standards and looking at interoperability between biodiversity and disasters, which may not have an obvious connection now, but in fact, because of deforestation and forest cover loss, may have a connection to land use and disaster loss? To what extent do you think people are thinking, especially in developing countries, about not creating new barriers after having overcome the old barriers?

DR. CANHOS: I think that you raise a very critical point about standards and protocols for interoperability of databanks and data systems. When you look back at biological data, there were no standards and protocols for biodiversity data, although several organizations have worked in this area in recent years, such as the Global Biodiversity Information Facility (GBIF) in close collaboration with a Committee on Data for Science and Technology (CODATA) committee for developments and improvements in this area. We do have big issues with silos. You talk here about different people working in different data areas such as biological data, species-oriented biological data and integration at a molecular level, genome data, and then integration with the information derived from satellite data, like land coverage.

I think for local development, it is extremely important from the beginning to look at how the internationally agreed standards and protocols are developed. For the speciesLink network, when we started 10 years ago, our first approach was to see what was happening in the Taxonomic Databases Working Group (TDWG). Also, after looking at what they were doing regarding biodiversity information, we decided to work in close collaboration with TDWG and GBIF.

In conclusion, I think here we need the international organizations. We need CODATA and the International Council for Science (ICSU). All those efforts, like TDWG, are voluntary efforts. It is extremely difficult to develop standards and protocols on a voluntary basis. I think the major development was the support of GBIF to TDWG. This happened with $1.5 million that came from the Gordon and Betty Moore Foundation. Again, we need more involvement of international organizations.
I think ICSU has an important role to play in the definition and further development of the data-related standards and protocols.

PARTICIPANT: Dr. Canhos, are you concerned about the sustainability of your biodiversity database?

DR. CANHOS: I am very concerned about the sustainability of those efforts. For me, it is difficult to get funds from the Brazilian government to maintain the speciesLink network. As to the content, from the beginning of the development of the network, we told the collections, “You can come in or leave the network anytime you want.” Withdrawing all of your data from the network is just like pressing a button, but that did not happen. Today we have more than 200 collections. The cost is well distributed and includes the cost of development of the software and the tools. This was a well-funded project for 4 years. Now the cost is to upgrade all those tools and also to develop more tools. The more data use we get, the more requests for new applications we receive. I think that is a challenge, but now we are in an age of integrating distributed intelligence and distributed data. That lowers the cost. That is public infrastructure.

When you think about the industrialized countries, we are talking about huge legacy collections. Think about the Smithsonian Institution. They have more than 130 million biodiversity objects in the Smithsonian collections. The cost to digitize all this legacy information is very high. Finally, I think we need support from international organizations. We need ICSU and CODATA to continue advocating the importance of not only sharing the data but also finding means to gather these data and to treat the data so that they can be easily accessible by the whole community all over the world.

DR. BAKHTINA: Regarding involvement of international organizations, I want to add that the World Bank promotes democratization of development via open data access.
Last week, a new initiative, Mapping for Results, was launched. This initiative allows civil societies, various organizations, and citizens to access large World Bank databases, to contribute to the database and to use the data. It also opens a direct dialogue with governments and citizens, in terms of transparency of government policy and delivery of public services.

Furthermore, it will promote transparency related to the development, and hopefully, create incentives for various types of organizations to improve data quality and increase data sharing. PARTICIPANT: I am with the International Environmental Data Rescue Organization. We are a nonprofit organization whose mandate is to locate, rescue, and digitize every piece of historic environmental data we can find throughout the world. We have projects in 15 developing countries, and we have rescued probably 2 million to 30 million historic weather observations. One of the problems we have is that, for example, we have located about 30 million weather observations on microfiche. They were taken in about 1,000 observation sites throughout Africa. We are negotiating with the African Center for Meteorological Applications for Development in Niger to at least get the microfiche before they deteriorate to nothing. I am wondering, do any of your organizations have data in a format where you cannot share them? Right now 95 percent of our data are either on microfiche or on paper, and that is a very huge problem for us. I am wondering if anybody knows of any other organizations, other than our own, that actually seek out data on perishable media to rescue them before they are gone forever. DR. WOODCOCK: The U.S. Geological Survey is doing that for Landsat data. They do get data on all kinds of customized software and media, where they have to go back and reconstruct and reconcile them. It is hard. I do not do that myself, but I have been convinced that it is a pretty big obstacle in many countries. PARTICIPANT: I am not sure if I am the only librarian in the audience, but libraries are definitely aware of this. They have been looking at this for straight text documents, and now they are starting to look at it for both born-digital, which is also being lost at an alarming rate, and print objects. 
So there are institutions working on these problems, and the National Science Foundation is supporting some of that activity. PARTICIPANT: There is also a new CODATA task group called Data at Risk, which is working to at least identify a broader range of scientific data, not just environmental. Another group is the Minnesota Population Center. There is a lot of recovery of old census data, going back and migrating data on old media and similar activities. There are different groups in different disciplines that are doing this. PARTICIPANT: The National Oceanic and Atmospheric Administration also has a data-rescue activity. I imagine there are quite a few others. And the United Nations Educational, Scientific and Cultural Organization, I believe, has a digital heritage program. PARTICIPANT: I have a question for Dan Cheney about the difficulty in getting access to the data that you have been talking about because of the sensitive nature of them. For the FAA data, it could be potentially sensitive for the companies that either built the planes or operate them, in legal exposure. I was wondering if that is mitigated by some kind of legislation that caps the exposure to lawsuits or if there is some kind of waiver of liability associated with the disclosure of the data. MR. CHENEY: I do not think I mentioned this during the talk. There are four criteria that have to be met before we even begin to look at an accident: the official accident report is issued; the corrective action and accident results are finished; there has not been another accident or incident that would call into question the official accident findings; and litigation is finished. Only then, when all four of those are met, do we begin to work, and we only use publicly available information. Normally it is the information that was gathered during the course of the investigative process, and in the United States it is part of the public domain. 
We work with other countries’ accident-investigating bodies to secure access to their information that was gathered in the course of the investigation. We do not do any additional investigation. It is only what is already lying around in archives in pieces that is being lost. The task is to put it together in one cohesive place, look at it, and put it in a structure that makes sense. As far as litigation, the opportunity to

litigate has already come and gone. Could someone come back and say, “Well, we did not think about that. Let us review it again”? They could. Our greater-good concern, however, is that the information is so important that it cannot be lost, and we have to take this step and make sure that erosion is stopped. PARTICIPANT: Do you also get data from other countries? MR. CHENEY: We do. Not all are equally sharing, but there is outreach. As we have done networking and have improved our networking process, the efficiency is increasing, with recognition that this is a product that will benefit world aviation, not just U.S. aviation. It is getting much better. PARTICIPANT: Victoria Bakhtina, how difficult is it to get access to the data that support the statistics for the different countries? Are the data collected by the countries themselves and made available to the World Bank, or does the World Bank pay people to collect the data and work with the different ministries? It is not clear how the data are compiled, and whether the coverage is comparable. DR. BAKHTINA: In terms of access, with the launch of the open data policy, it becomes very easy. Now the set of all development reports that include underlying data will be available. Just visit the World Bank website, and you can easily download and research the statistics from different countries. Moreover, you can visualize the data and map the country statistics to the World Bank projects information. I am also using the UN data, which are also publicly available. There are various methods and strategies of collecting the data, and multiple approaches can be applied depending on what is measured. The World Bank Data Group partners with other organizations on data collection and statistical capacity-building. I would recommend that you consult the World Bank website for details. 
When using any publicly available data, it is important to understand what is behind the numbers, and depending on the end goal of your research, determine what coverage would be acceptable, and most importantly, always conduct the analysis within a specific context.