The Role, Value, and Limits of S&T Data and Information in the Public Domain for Research: Earth and Environmental Sciences
This presentation is based in part on a National Research Council report called Resolving Conflicts Arising from the Privatization of Environmental Data, which is available on the National Academies Web site.¹ First, I want to emphasize that I am a scientist, an environmental scientist insofar as that ever exists. I am a meteorologist with some experience in oceanography, but I have made it my business over the past 20 years to learn about all my other colleagues, what they do as geologists, chemists, ecosystem people, people interested in the cryosphere, and so on. They are all environmental scientists, whether they recognize it or not. The environmental sciences are not a homogeneous domain. There are many different sorts of environmental scientists. Twenty years ago, they never used to talk to each other at all. One of the great changes that has happened in the past 15 to 20 years is that there is a group of people trying to look at how the system functions as a whole and how the various pieces are interrelated.
My priority in this presentation is to explain to nonscientists the special data needs of environmental science. There are some differences from the bioinformatics area. In particular, there are few startups in the environmental sciences. There are some, but not very many. The other important difference is that our topic is fundamentally international. Many environmentalists’ views are global and other governments and countries are partners in that enterprise. To come to sensible public policies about the environment, we have to work collaboratively with those other nations. It simply is not feasible to devise a strategy for the United States and expect the rest of the world to follow that strategy.
There are a number of issues surrounding my presentation, which I am not going to deal with directly. In particular, we have heard already about some sui generis intellectual property rights in databases, which have been introduced in the European Union. That is of great concern to environmental scientists in the United States because the Europeans are our collaborators, and the Database Directive has led to restrictions on the availability of environmental data of various sorts. We are very concerned about what would happen if the United States also went the same way. If it did, I have no doubt that the world would follow.
Another issue here is that some foreign governments are trying to sell their data, typically in Europe, but not universally. Those are government agencies acting as quasi-commercial enterprises. That, likewise, gives us great concern. I do not believe they are actually going to succeed. It turns out that the market is very thin and there are no indications that any of them are even beginning to cover their costs for reasons that I will come to later.
¹ National Research Council. 2001. Resolving Conflicts Arising from the Privatization of Environmental Data, National Academy Press, Washington, D.C. Available on the National Academies Press Web site at http://www.nap.edu/catalog/10237.html?se_side.
An additional issue is that the United States has a policy of encouraging public–private partnerships in the provision of information. These partnerships have to be thought through very carefully as to the respective roles of the partners and, in particular, their relative data rights. That has to be done on a case-by-case basis; it is not something that can be written into legislation.
Finally, the U.S. policy on commercialization of space is also introducing tensions into this area. Many applications, such as satellite observations, provide useful data for environmental science and potentially have commercial applications. That interface is, in fact, troublesome.
Let me now focus on the imperatives for environmental research and education, which is the primary purpose of this talk. As I have already indicated, there has been a movement over the past 20 years among the scientific community to face up to the fundamental problem, which is understanding human interactions with the natural environment. That is a huge canvas and I am certainly not going to touch on more than small pieces of it today. What I will assert, however, is that long-term global data (by that I mean many decades) are essential to document what is going on and to unravel a lot of the interconnections that exist between, for example, the ecosystems and climate. These data are also important in the distillation of interconnections to enhance the understanding of what is occurring, which can be conveyed not only among the specialists, but also to our children and grandchildren. If we do not understand what we are doing to their futures, they are not going to be in a position to do very much about it. A central requirement within all of this is a dependable, coherent observing and information system through which researchers can synthesize core information products. I am going to come back to this again and again, but let me just introduce an analogy.
The key analogy here is to a tree, as you can see in Figure 8-1.² The roots are where the data are actually collected in many different countries with many different types of instruments. As you move into the trunk and go up the trunk, data are being collated, cross-checked, and put into higher-order information products. That conversion of data to information is really seamless. There is a key point in this process, which I have labeled “core products.” These products get distributed by a whole variety of mechanisms to the end uses, which are represented by the leaves in the tree. Core products, for example, can include calibrated and verified data derived from a single rain gauge.
When considering such an enterprise, one has to take the systems point of view and start with the end uses: what we are trying to cater to and what the priorities are. Then one comes back to what core products might be produced and what the implications are.
However, it is also clear that you do not pay for a system like that, which is expensive, simply on the basis of research. There is a lot of research money going into this, but it is not nearly enough to pay for the complete systems that we have and need. Indeed, you have to serve multiple users and applications, which will generate a broader social return for the taxpayers as a whole to justify the large costs. We also need to foster consensus on scientific understanding and policy action. This implies that other countries have to be involved in what we are talking about. They have to participate actively in the system. That includes building research capacity, particularly in developing nations that may not yet have it. That is the benefit that they get out of a system of this sort in exchange for participating in the data collection.
The public requires reliable information that is properly interpreted. If the public does not believe what is coming out of a system of this sort, it is a waste of money. I already have mentioned that many environmental issues are international and global in scope. I would like to emphasize that the contributions of foreign governments come in kind, rather than through direct payments. They are based on what those governments do within their own borders or with their own systems because money is not easily transferred internationally, as I think we all understand.
Finally, the natural environment is very complex and uncontrollable, and describing its behavior requires many observations from different places. No single scientist or group conceivably can accomplish this alone. These are the absolute imperatives for pooling resources and for sharing the data effectively.
I am now going to provide some brief examples of information systems, starting with the most familiar one, weather and climate. Think of a data buoy out in the tropical Pacific. It is measuring the winds and the atmospheric temperature, and there is a 300-meter cable below that is measuring the temperatures in the ocean. All of these data are being telemetered back through a satellite and are available on the Web. Researchers, students, and people in many other sectors who have a need for such information can look up these data on the Web site and get the complete picture of the ocean and atmospheric temperatures for the past five days.
Another example is a satellite image processed to show the type of vegetation that is present. This provides information about land use, which is a fundamental part of the environment and is frequently socially determined.
Another type of information system is one used to predict and assess fish stocks in fisheries around the world. This is a major concern because, of course, a lot of the world’s people depend on fish for their protein. The take is increasing, but the stocks are rapidly decreasing and more species are being fished out. As is the case with data about other natural resources, the same information can be used to both deplete and protect them.
Earthquake hazards provide yet another example. There is a worldwide seismological network measuring earthquakes. Of particular interest to this group of researchers is that proprietary data from the big oil and gas exploration companies are now being donated into the public domain. These are data that had significant commercial value when they were collected, but are now outdated. There are costs of assimilating and storing those data, but they can be very valuable for research purposes.³
Let us return to the tree analogy and start to fill out some of the details (see Figure 8-2). In the roots, there is a mix of systems. One is the international networks, the contributions that are being made by different countries and telemetered around the world as needed. There are also national networks doing the same things, and different types of measurements are being made, some by satellite and others in situ. To get a successful system, we need all of them, and they have to work together seamlessly. That is a major enterprise.
The main point is that the total cost is mostly in the roots. Collecting the data and pulling them together is where the cost lies. As you move up from the roots to the trunk, the primary function is the preparation of core data products, and they have to be made available in the public domain at marginal cost. Otherwise we are cheating ourselves by investing all that government money in the roots and not taking full advantage of it. The crucial thing about the trunk is that both the input data and the algorithms must be open to scientific scrutiny, because without that, the outputs are not credible. If they are not credible, then we are wasting our money.
³ See Chapter 27 of these Proceedings, “Corporate Donations of Geophysical Data,” by Shirley Dutton.
Finally, moving up to the branches and leaves, we have the distribution and use of the data and information. It is as complex as the roots. Each U.S. federal science and technology agency sponsors a system like this, such as the National Oceanic and Atmospheric Administration for weather, but it is different for different agencies. They will have their own internal agency requirements and also have their own distribution system.
The leaves represent the end uses. For example, the energy, forestry, and insurance industries are all big users of this information, as is education through the integration of data in textbooks and things like that. The general public is very interested in a lot of these data for recreational purposes. There also is a whole set of issues about setting environmental policy and regulations.
Finally, the branch represents a distribution system tailored to identifiable user groups by reformatting core products, adding additional information, or otherwise increasing value to that group. These branches are always developing and changing. Diversity of the branches is another major feature, which is not always fully appreciated.
The fundamental premise is that those products at the top of the trunk have to be in the public domain at marginal cost. Having said that, the branches do not have to be in the public domain. In fact, many of them currently are not. They are in the form, for example, of value-added weather data. That is perfectly in order, provided that all those products are starting from the same base of public-domain information coming from the core.
There are also opportunities for the private sector down in the roots. There are now commercial satellites that provide 1-meter resolution of what is going on in your own backyard. The satellite companies are selling these data. There are various legitimate purposes for these data, some of which are needed for environmental studies. The point is, if the purchase of such data from the private sector is the cheapest way to get what the government needs, it is entirely fair that the government should buy the data from the commercial concerns. However, I want to emphasize that what the government buys has to include the rights to the data it actually purchases. If it cannot afford to do that, then it has to reduce the amount of data it purchases. You cannot mix restricted data with public-domain data in the trunk because it ruins the transparency and essentially compromises the whole research enterprise.
To conclude, publicly funded, shared-use, long-term observational information is essential for sound public policy concerning human interactions with the natural environment. Core products of the trunk of such systems must be in the public domain and available at marginal cost of reproduction. Value-added, private-sector distribution systems may enhance the enterprise if they show benefits. They do not have to be, however, in the private sector. For example, scientists have their own climate and weather data distribution system, which is paid for out of research funds and justified on that basis. Finally, purchase from private vendors of all rights to a limited amount of data may under certain circumstances be cost-effective.
There is one concluding issue that I would like to note. I have presented a model of various environmental systems. It turns out that not a single one includes a recognizable mechanism by which the different stakeholders (the government agencies, the policymakers, the scientists, and the private-sector participants) can actually get together and work out mutually satisfactory, win-win resolutions of some of the conflicts that are arising. A key conflict that tends to arise is over just what the right definition of the core products at the top of the trunk should be. That is the place where conflicts most come into focus, but the point is we have no forum for working those out. That needs to be established.