Roberta Balstad, Columbia University, United States
We have heard a lot about the practical economic and applied implications of having open access to data, but we should not lose sight of the benefits to science. One of the reasons that access to data is becoming so important is, of course, that the technology has changed, and that we can deal with massive databases in a way that we could not have dealt with them 20 or 30 years ago. Another reason is in the very nature of the scientific process itself. What is science? For many people, it is simply experimentation and testing. That narrow definition has been modified in recent years to include experimentation, observation, and testing. For other people, science is really a matter of modeling and projections. If you cannot project something accurately, many believe, it is not science. So you need data for projections, too.
Equally important, scientific research is increasingly evolving into “data-intensive science.” You read about it in the field of health care, for example, where scientists combine data from 20, 30, or 100 different studies to get a larger base in order to analyze and investigate topics that are impossible to pursue in a small, intensive study of perhaps 20 individuals. This is also true in a number of other fields. Data-intensive science relies on open access to data from all sectors, because only then are scientists able to combine datasets to ask new types of questions.
Scientists are able to address much broader questions in data-intensive science than they could if they were responsible for collecting their own data for every study that they conduct. Increasingly, for example, we find that governments collect much of the scientific data that we use. In many countries, these databases are open. We would like to see them become open in even more countries so that scientists can use them.
Open access to data advances science. It improves descriptive, comparative, and observational science; it enriches modeling and prediction; and it makes it easier to test and retest propositions using the same databases. That, of course, goes back to the philosopher of science Karl Popper, who said that true science is science that can be tested, that is falsifiable, and that you can prove wrong. To do that, you have to have access to data.
A second reason for providing better access to scientific data, in addition to advancing science, is that it levels the playing field for scientists from smaller or less-developed countries so that they are able to conduct data-intensive science using publicly available data. In short, data access makes a principal resource of scientific research available to all.
Traditionally, data access was quite restrictive in terms of both policies and practices. Data were held to be the private property of a scientist. At the end of doing a dissertation, we had a body of data that we could mine for a long time. That was considered to be the property of the scientist and that was what made his or her work significant. In other cases, the kinds of data that Professor Farouk El-Baz was talking about (e.g., remote-sensing data) were often seen as a national asset that had to be protected.
Data were also seen as a commodity that had economic value for the scientist or, more often, for the government that sponsored the data collection. When science becomes a commodity, obviously, those who collect data begin to think about marketing the data, and then they easily slide into charging for data in a for-profit or even not-for-profit setting.
To summarize, the benefits of changing from restricted to open data access policies are as follows: