Data from Publicly Funded Research—The Economic Perspective

The most striking theme throughout this report is how progress in information technologies has changed the way science is accomplished. It has enabled the generation, processing, storage, and distribution of quantities of data undreamed of even a decade ago.

Sensing systems (e.g., Earth observation satellites, the Hubble Space Telescope, ground-based radars) and other forms of automated data generation (e.g., genome studies) produce enormous amounts of useful data, enabling scientists to study natural phenomena at a much greater level of detail and granularity than was hitherto possible. Science and scientists have been the main drivers of this highly sophisticated and often very expensive technology, using it to push forward the frontiers of knowledge in their respective disciplines. The continuous increases in the processing power available to analyze the data are as crucial to this evolution as the improvements in data generation capabilities. Surprisingly, these increases have not come principally from more powerful supercomputers, but rather from cheaper, more powerful workstations and PCs available throughout the scientific community. The development of inexpensive mass storage media has ensured that the preservation of these vast quantities of data, both processed and unprocessed, is both possible and affordable. Finally, the most profound change in technology has been the worldwide growth of the Internet, with its potential to make data from anywhere in the world available anywhere else in the world, instantaneously, and, increasingly, in large quantities.

These four factors, taken together, have revolutionized the way science is conducted, making it truly global. Perhaps most interesting is that this progress has changed the way scientists communicate with each other. Physical scientists



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
are leading the way among scholars in the publication of electronic journals, compressing the time between discovery and communication of the results. This phenomenon is accelerating the already rapid pace of discovery and innovation, as the cycle time of discovery, communication, and next discovery is reduced. The committee uses the term "digitization of science" as a shorthand for this phenomenon. As a consequence, the flow of scientific data and information has improved, as the cost of publication and of access to information has been drastically reduced. While not all scientists in every country have full access to modern PCs and fast Internet connections, these technologies are becoming widespread and are likely to be ubiquitous in the near future.

This digitization of science has occurred contemporaneously (and coincidentally) with the demise of the great power rivalry of the Cold War. Russian and U.S. scientific relations have become less heavily dominated by security considerations, and this factor also has led to an increase in the availability and transfer of scientific data, as noted in Chapter 3. At the same time, there have been fundamental changes in how governments in many countries see their role relative to markets. Budget pressures, plus the evident success of market economies, have led many governments to privatize activities previously delivered via the public sector, in hopes of relieving the burden on taxpayers while improving the allocation of economic resources. These pressures have begun to be felt in the area of scientific data; for example, in the United States, Landsat remote sensing was privatized in the mid-1980s, and some European countries have strongly urged limits on the sharing of meteorological and other data in order to protect the data markets for their government monopolies.

THE TREND TOWARD MARKETS—GOOD OR BAD FOR SCIENCE?
To researchers and educators in the natural sciences, this pressure toward privatization and commercialization of scientific data is of great concern. Many fear that scientific data, the lifeblood of science, will be priced beyond their means, especially in less developed countries. It is argued, correctly, that the conduct of scientific research, including the maintenance and distribution of scientific data, is a public good, provided for by government funding (see Box 4.1). This traditional model1 has worked well in the past, and many scientists2 are of the view that privatizing the distribution of scientific data will impede scientific research.

To the economist, this concern at first seems misplaced. While the conduct of scientific research certainly is a public good, one might consider the maintenance and distribution of scientific data as the provision of one of the commodities used by scientists. This view makes scientific data analogous to the chemicals, computers, and travel that each scientist is free to buy or not, as they best serve the scientist's needs. In this traditional market model, both consumers and researchers would be better served by suppliers anxious to survive by supplying the most desired data set traits: reliability, accuracy, timeliness, and so forth. Instead of the government providing financial support to data centers, why not give the funds to individual researchers, who can then make choices among the data suppliers who best serve their needs? To many economists, therefore, privatization appears at first blush to be a positive development for science.

BOX 4.1 What Is a Public Good?

Economic analysis recognizes that not all goods can be easily transacted through markets. There is an important class of economic goods, called public goods, in which markets may not work well. The term "public good" does not refer to something that is "good for the public." It refers to a product or service possessed of certain properties that lead to collective consumption or production, rather than private consumption or production. A public good is characterized by two attributes, nondepletability and nonexcludability (nonappropriability).

Nondepletability means that the product in question cannot be used up and is thus available to additional persons. If I eat a slice of cheese, less remains available for others to consume. However, if I use the latest statistics on the number of employees in the steel industry or the number of persons infected by some disease, those statistics remain just as available as before for use by others; their supply is not depleted by additional use or additional users. Nondepletability is the main reason economists conclude that free use of public goods can be justified: there is no social cost when another person uses them, and there is no justification for the disincentive to their use that is constituted by a substantial price for that use. This, then, is the user side of the dilemma of provision of public goods: economic efficiency calls for a zero or very low price for their use, but private enterprise cannot be expected to provide a costly and valuable service without charge.

The second attribute, nonexcludability, is the supplier side of the problem. Nonexcludability means that the good in question produces benefits from which others cannot be excluded and which cannot easily be constrained only to those who pay. A classic case is that of national defense; defending one American involves defending all Americans. One cannot supply the service to some people but exclude others. Information also possesses this attribute, although somewhat less fully than national defense. Once information is provided to some, it is likely, but not certain, to leak out to others. Absence of excludability makes it very difficult for the provider of such a service to collect reimbursement for the cost of supplying the service. That is why, economists conclude, it is often difficult for any supplier other than government to provide certain types of public goods.

It is generally accepted that scientific research itself has strong public-good attributes, in that the knowledge produced by such research is freely available to all (i.e., nonexcludable) and provides social and economic benefits to members of society far beyond those who produce it and those who pay for it. Such goods are usually provided by governments—or they are not provided at all.

The public-good nature of science is not limited to any particular nation. The scientific endeavor has traditionally been and will continue to be a global enterprise; the success of this endeavor depends critically on the global community of scientists and their ability to work with innovators, implementers, and users. To the extent that this global interchange is restricted or inhibited, the long-run contribution of science to the U.S. economy will decrease. Thus, the United States has an interest not only in a healthy domestic scientific community, but also in a robust global scientific community.

However, the issue in this report is not scientific research, but rather the data that science generates, either as input to scientific research (e.g., data from meteorological satellites) or as output from scientific research (e.g., description of a gene sequence). Clearly, scientific data have some aspects of a public good. On the other hand, scholarly journals have been copyrighting scientific articles for years (thereby privatizing them) without impeding the flow of science. Certainly, the provision of scientific data has important spillovers: future researchers within the field benefit from these data, researchers in other fields also may benefit (e.g., medical researchers benefit from the provision of biological research data), and commercial firms (e.g., pharmaceutical firms) may benefit as well. Unfortunately, uncertainty about who the ultimate beneficiaries are, which appears to be fundamental to science, precludes asking those beneficiaries of spillovers to pay.

A more careful economic analysis of the maintenance and distribution of scientific data, however, suggests that a somewhat different market model is more appropriate here, for a number of reasons. In some cases, the (public good) scientific research is tightly tied to the collection, maintenance, and distribution of the data generated from the research. For example, the Hubble Space Telescope (HST) project (and other space science observatories), clearly a public good, collects, maintains, and distributes the data from the HST as part and parcel of the project itself. In this case, the basic research is vertically integrated with data distribution, and separating the two functions would create more difficulties than keeping them integrated.

In other cases, frequently overlapping with the conditions described above for the HST project, the contributors of scientific data are the same as the consumers of the data, all of whom are members of the same relatively small research community. This is especially so in relatively esoteric areas of fundamental research, such as high-energy physics or paleontology. This model is closer to that of a family or clan, in which exchange is not monetized but depends on social norms specifying expected and well-understood levels of contribution. Imposing a market price system in such a situation could be not only countercultural, but also counterproductive. For example, the administrative expenses of instituting such a system might well be higher than the revenues realized. In yet other cases, the market for scientific data is not large enough to support more than a single supplier, which would mean that the data either would not be provided by the market or would be provided under monopoly conditions.

An additional and more subtle difficulty arises from the nature of the funding of scientific research. There is no question that the public-good nature of fundamental scientific research requires public funding,3 and this includes ensuring that researchers have the necessary inputs such as scientific data.4 Privatizing data distribution would not change the requirement that data acquisition be publicly funded; it would simply change the locus of funding from the supplier/distributor of data (i.e., publicly funded and operated data centers providing data either free or at low cost to researchers) to the consumer of data (i.e., the researchers would have a "data budget" as part of their grant, which they would use to shop among suppliers for their data needs).

But are these two modes of funding equivalent? The second mode might be thought by economists to be superior, in that it puts the financial power in the hands of the consumer. However, the fact that both involve public funding, subject to year-to-year political vagaries, suggests a different view. In the first mode, funding is directed to government agencies or large research institutions, which thereby have an interest in continued funding and can make the case to their legislators for such funding. In the second mode, funding is directed to individual researchers, who, while they have an interest in ensuring continued funding for purchasing data, clearly do not have the political ability to protect this funding from pressures to reduce or eliminate it. This inability is a major factor underlying scientists' concerns that if the distribution of scientific data were privatized, the increases in research grants to enable their data purchases would soon disappear, leaving researchers and perhaps their universities to pick up the bill.

The economic problem here is that government cannot commit to future funding of researchers for buying data. Neither Congress nor the Administration can make credible commitments for future funding. Ensuring that large institutions have the means and the access to argue the political case for scientific data will increase the likelihood that future funding of such data will be made available.5 The committee believes that direct appropriation or block grant support to institutions with broad responsibilities for data management, retention, and distribution, while not assured, is typically more stable and secure, and fortified by an institutional memory able to recognize and support the continued utility of archived data. Thus, a strong case can be made for funding (subsidizing) institutions to be data distributors as part of the infrastructure for government research in cases where both (1) the long-term availability of the data is essential for the conduct of research, and (2) there is no guarantee of continued financial support of the user community for acquiring data.

DETERMINANTS OF THE STRUCTURE OF SCIENTIFIC DATA DISTRIBUTION

How best to distribute scientific data depends on several economic properties of the underlying science and scientific community that both generates and

uses the data, as well as other uses for the data. To determine these properties, the key questions are as follows:

Does the scientific research depend on a substantial public investment in one or more facilities that generate the data of interest? The Hubble Space Telescope is a clear example of a single costly facility whose sole purpose is to generate basic scientific data over its useful life. Another example is weather and other Earth-observing satellites, which constitute major facilities in their own right but also contribute to a common, broader research data set. In these cases the data have significant nonresearch applications as well, which are a mix of public-good and commercial uses. The collection, processing, and distribution of the data typically are most efficiently vertically integrated in the same program. The costly observational facility is usually provided by government (e.g., NOAA, NASA, ESA), and so the distribution of the resulting data (at least the minimally processed data) is best handled by the same agency.

Is the (non-facilities-based) scientific research coordinated across researchers, possibly in different countries? An example of this situation is the Human Genome Project. Individual researchers throughout the world contribute to this effort; the results of each individual research project are made available to all researchers. In fact, the maintenance of common data sets available to all is what defines this as a project. In such cases, there typically is a lead government agency with responsibility for the entire project that also takes responsibility, either directly or indirectly, for providing for data collection and distribution. Though not facilities-based, a common repository of information and scientific data is essential to the conduct and usefulness of the entire project. An important element of such a repository is that it is the mechanism by which researchers communicate with each other. In some cases, such as paleoclimatological research, contributors and users are largely the same, and the repository acts solely as the means of professional exchange. In this situation, ensuring that the data are freely available is part of the project itself. The responsible agency may provide the distribution facility, or it may fund a university, a consortium of universities, or other nongovernmental organizations (NGOs) to operate the facility.

Is the community of users roughly the same as the community of contributors? As with the paleoclimatology and astronomy examples above, the distribution of scientific data is best viewed as a sharing of results within a community, rather than as a market opportunity. If the community of users is much broader than those making contributions, then distributing scientific data is also a publication function, possibly to private concerns. A good example of this situation is the Human Genome Project, for which the repository of research results is of interest to others beyond the immediate research community, such as pharmaceutical companies. In cases where the user community is much larger than the contributor community, governments or NGOs may still wish to make the information available

to all at no cost. This was the case for the World Meteorological Organization's World Weather Watch, discussed in the previous chapter. However, for budget reasons nonscientific users may be required to pay for the data. Under this arrangement, some form of price discrimination or product differentiation may be required, in which scientific researchers can acquire the data free or at very low cost, while nonresearch users are charged for the data.6 This can be done by the responsible agency itself. However, the agency first must determine whether the transaction costs of establishing and maintaining discriminatory prices exceed any extra income so derived, particularly for data from narrowly focused basic research projects. Obviously, an agency should implement a price discrimination scheme only if the efficiency gains (not just the revenue gains) outweigh the transaction and administrative costs of doing so. Price discrimination, in practice, will be worth the effort only with a sufficiently large commercial user base.

Is the user community large enough to support more than one data distributor? In many cases, a particular scientific data set is likely to be of interest to only a few scientists and practitioners, and a private7 market may support only one distributor, due to scale economies.8 In such cases, privatizing data distribution will result in a private monopoly with no incentive to support the public interest, replacing a public monopoly that does have such a commitment. Of course, government monopolies that do not sustain activities providing public goods are no better than private monopolies. As a quick reference, Table 4.1 lists the relevant properties mentioned above for each of a number of diverse scientific data sets.

PRIVATIZATION: WHEN DOES IT MAKE SENSE?

Given the unique nature of scientific data, it might appear that government (or NGO subcontractor) distribution of such data is always the correct choice. However, there may be opportunities for private firms to reformat, enhance, and market these publicly available data in new, added-value forms. Private firms may have capabilities not available to government agencies or NGOs that would add value for various end users. For example, NOAA distributes weather data to all users on equal terms, including to commercial firms, some of which package this information and provide forecasting services to the public via mass media. While NOAA clearly has a huge advantage in technology and meteorological science, the commercial firms have an equally clear advantage in packaging the information for maximum public impact.9 Thus, any private supplier who requests access to scientific data should be given it and should be permitted to go into competition with other suppliers and with the government itself. The marketplace will then determine the best package of service, support, and reliable data for users, including scientific researchers.10 Fortunately, there are many cases in addition to that of weather data in which private sector distribution of value-

TABLE 4.1 Properties Relevant to Distribution for a Number of Scientific Data Sets

Column key: facilities-based? (Y/N); single (S) or multiple (M) activities; number of users equal to (=) or greater than (>) number of contributors; observational (O) or experimental (E); raw (R) or processed (P); national (N) or global (G); major commercial interest?; pricing basis(a) and discrimination.

Genbank—NIH/NLM/NCBI: N; M; >; E; P (submittal/retrieval software); G; Yes; Internet access free, CD-ROM @ MC.
Genome Database—Johns Hopkins University, funded by DOE, NIH, Japan S&T: N; M; >; E; P; G; Yes; Free.
Hubble Space Telescope—NASA/ESA (operated by AURA): Y; S; = (principally space researchers); O; P; G; No (occasional use by publishers); Free.
National Space Science Data Center—NASA: Y; M; =; O; R/P; G; No; @ MC.
European Space Information System: Y; M; =; O; P (browser/imaging software); G; No; Free.
National Center for Atmospheric Research—not-for-profit, NSF funding: Y; M; =; O; R/P; N; Little; @ MC, additional services available.
T-2 Nuclear Information Service—UC(b), DOE: Y; M; =; E; P; G; No; Free.
Australian Oceanographic Data Centre—Australian Dept. of Defense: Y; M; > (but mostly researchers); O; P (free software); G; Some; Free as "public good," @ MC for commercial users.
ESA Information Retrieval Service—European Space Agency: Y; M; >; O and E (includes financial data); P; G (focused on European Union); Perhaps some; Appears to be > IC.
Earth Resources Observation Systems—USGS (DOI): Y; M; >; O; R/P; G; Some (used by government agencies); Varies, ranging from free to > IC.
Scripps Institution of Oceanography—not-for-profit: Y; M; =; O; P; G; Little; @ MC.
University of Alaska GeoData Center—university/state funds: Y; M; > (mostly public); O; R; N; Little; @ MC.
Paleoclimatology World Data Center A—NGDC, NOAA/DOC: N; M; =; O; R/P; G; No; Free for direct contributors, @ MC otherwise.
National Climatic Data Center—NOAA/DOC: Y; M; >; O; P; G; No; @ MC, with careful accounting for analysis to determine MC.
Incorporated Research Institutions for Seismology (IRIS)—university consortium, NSF funding: Y; M; >; O; P; G; Yes; Free.
Properties of Intermetallic Alloys—Purdue University CINDAS/MIAC (DOD): N; M; >; E; P; G; No; Printed books fairly expensive, higher cost for foreign users.
Evaluated Nuclear Data File/B—DOE: N; M; >; E; P; G; (not stated); Free.

(a) Pricing basis: IC = incremental cost; MC = marginal cost, or cost of reproduction; @ MC = priced at marginal cost.
(b) Abbreviations and acronyms are defined in Appendix A.

added data has worked to the benefit of science and the broader public. The Fatty Acid Methyl Ester (FAME) database in microbiology (Box 4.2) is a case in which a private firm offers alternatives to the data available from government. The committee lists below some necessary conditions for the complete privatization of scientific data distribution to be an appropriate option:

Can the distribution of data be separated easily from their generation? For the HST, the answer is "no"; for the Human Genome Project, the answer is likely "yes."

Is the scientific data set used by others beyond the research community? Again, for the HST, the answer is "no"; for the Human Genome Project, the answer is likely "yes."

Is the potential market large enough to support several data distributors? If so, then the resulting private market could be competitive, and privatization could be helpful to scientists and others. If not, then privatization could lead to monopolization, which would likely be detrimental to the interests of science.

Is it easy to discriminate prices or differentiate products between scientific users and other users? If this is possible, can low prices be mandated contractually for government-funded data for scientific users? If so, then it is likely that scientists will obtain the needed data on more favorable terms than their colleagues in private industry with more resources.

BOX 4.2 FAME: A Private-Public Sector Success

An example of how a market can be made from subsidized data generation is the Fatty Acid Methyl Ester, or FAME, system of identifying bacteria. The fatty acids are extracted from the bacteria to be identified, made volatile by methyl esterification, and subjected to chromatography. The resultant chromatogram yields a profile pattern that is analyzed statistically to identify the bacteria. The profiles and the statistics make up the database of interest. The original work was done, and the database was compiled, at the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia. The CDC staff is augmenting and actively using the database in fulfilling its mission. Much of the database is included in a publication for use by bacteriologists, especially in clinical microbiology laboratories.1 A commercial company, MID,2 adapted the CDC system, developed its own proprietary database, and apparently has been successful in commercializing the system. The two databases have gone their separate ways, competing in the intellectual, but not the economic, marketplace.

1 W.S. Weyant, C.W. Moss, R.E. Weaver, D.G. Hollis, J.G. Jordan, E.C. Cook, and M.I. Daneshvar (1996), Identification of Unusual Pathogenic Gram-negative Aerobic and Facultatively Anaerobic Bacteria, Williams and Wilkins, Baltimore, Md.
2 Located at 115 Barksdale Professional Center, Newark, DE 19711.
SOURCE: Micah Krichevsky, Bionomics International, Bethesda, Md.

Is it costly to separate the distribution of data to scientists from their distribution to other users, such as commercial users? For small or esoteric research communities, economies of scale in data distribution may make this separation costly.

If all these questions can be answered "yes," then privatizing the distribution of scientific data should be an option to be considered. Privatizing data distribution might appear attractive to a budget-constrained government agency. It certainly removes the cost of maintaining and distributing the data and may even bring in revenues that can help the agency fulfill other aspects of its mission.

BOX 4.3 The Failed Privatization of Landsat

The story of what happened to the Landsat system when it was privatized is instructive. Landsat is a series of remote sensing satellites, the first of which was launched by NASA in 1972. The Carter Administration proposed the privatization of Landsat in 1978. This led in the early 1980s to a transfer of responsibility for the system to NOAA, which attempted to build a customer base for Landsat's data products. Under both NASA and NOAA management, Landsat images were made available to all users at the marginal cost of reproduction.

The privatization process was accelerated during the Reagan Administration. Congress passed the Land Remote-Sensing Commercialization Act in 1984, setting forth the general provisions for privatization of the system. In 1985, the Earth Observation Satellite (EOSAT) Co., a joint venture of Hughes and RCA, was awarded the contract, along with an additional $250 million and a promise to pay for future satellite launches. EOSAT was thus given a de facto monopoly on all Landsat images, because there were no direct competitors. The government's policy of providing nondiscriminatory access to remote sensing data on a worldwide basis was interpreted by EOSAT to mean that as long as the company charged the same (high) price to everyone, it was following the government's nondiscriminatory access policy. The price of Landsat images increased from approximately $400 per image to $4,400 per image, a price at which EOSAT was able to attract some commercial and federal government customers, but few academic or independent researchers.

In the early 1990s, the research community became anxious to use Landsat data for global change research, and NASA and NOAA complained to Congress about the high prices. In response to those and other concerns, Congress passed the Land Remote Sensing Policy Act of 1992, which returned the Landsat system to the public sector. NASA and EOSAT negotiated a price reduction for the U.S. Government and Affiliated Users to $425 per image. However, nonfederal and foreign researchers still must pay the high image prices, effectively cutting off a large segment of the research community that would benefit from the use of Landsat images. This situation is expected to be rectified with the launch of Landsat 7, the next satellite in the series, at which time NASA is supposed to make the data available again to all users at the marginal cost of reproduction.
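The committee's screening questions for full privatization amount to a conjunctive test: privatization is worth considering only if every question is answered "yes." The sketch below is a minimal illustration of that test, not anything the report specifies; the function and field names are invented for clarity, and any example answers not stated in the text are marked as hypothetical in the comments.

```python
# Illustrative sketch (not from the report): the committee's five screening
# questions for full privatization of scientific data distribution, modeled
# as boolean answers. Privatization is an option only if every answer is "yes",
# with each question's polarity taken exactly as worded in the text.

from dataclasses import dataclass

@dataclass
class DataSetProfile:
    separable_from_generation: bool     # can distribution be split from generation?
    used_beyond_research: bool          # is the data set used outside the research community?
    market_supports_competition: bool   # could the market support several distributors?
    price_discrimination_easy: bool     # can scientific users easily be charged less?
    separation_costly: bool             # is separating scientific from other users costly?

def privatization_is_an_option(p: DataSetProfile) -> bool:
    """All five questions must be answered 'yes' for complete privatization
    of data distribution to be an option worth considering."""
    return all([
        p.separable_from_generation,
        p.used_beyond_research,
        p.market_supports_competition,
        p.price_discrimination_easy,
        p.separation_costly,
    ])

# The report's two running examples. Only the first two answers for each are
# given in the text; the remaining answers are hypothetical placeholders.
hst = DataSetProfile(
    separable_from_generation=False,    # from the text: "no" for the HST
    used_beyond_research=False,         # from the text: "no" for the HST
    market_supports_competition=False,  # hypothetical
    price_discrimination_easy=False,    # hypothetical
    separation_costly=False,            # hypothetical
)
genome = DataSetProfile(True, True, True, True, True)  # "yes" throughout (first two from text)

print(privatization_is_an_option(hst))     # False: distribution stays with the project
print(privatization_is_an_option(genome))  # True: privatization could be considered
```

Because the test is conjunctive, a single "no" (as with the HST's inseparability of data distribution from the observatory program itself) is enough to rule out full privatization.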
BOX 4.4 The Impact of Landsat Privatization on Research

Forty-five leading scientists were asked to describe the effects that their limited access to Landsat data has had on their work over the past decade.1 The list below summarizes their most important points:

- Projects in the initial concept stage often are not proposed, or are drastically scaled down, because the cost of the scenes is prohibitive.

- The development of state land cover maps for many different applied purposes has not been possible.

- Important agricultural inventories were inhibited to the point where efforts were abandoned.

- The high costs of the data significantly hindered studies involving multiple dates and scenes. Developing appropriate and automated methods of change detection has been especially hindered. The conduct of seasonal or phenologically related research was not possible.

- Many remote sensing studies have involved analysis of a single scene, and the conclusions derived relate to that (and only that) scene. The repeatability of such conclusions is thus suspect. Would the same observations have been derived using a data set from a different time period? If a given technique does not work with a July data set, should one conclude that the technique does not work at all? Only tests with a multitude of data sets will provide these answers, but high prices limit these rather basic tests.

- One impact of the continuing dilemma with Landsat and privatization has been that the technology is not state of the art. For more than a decade the United States has had no long-term commitment to land remote sensing at the national level. Landsat 5, the highest-resolution operational U.S. satellite and the only one available for civilian use, is more than 12 years old. The United States has been struggling to develop and launch Landsat 7 (with Landsat 6 having failed on launch). There is now increasing competition from other nations that have committed to long-term operational continuity for their land remote sensing systems and have stepped in to fill this void in U.S. policy.

- Graduate students frequently are restricted in the Landsat data available to them. Some students spend time writing mini-proposals for funds just for acquiring data, and these proposals are often rejected because of unavailability of funds.

- The ability to conduct low-cost basic research has been hampered. Many technological innovations have come out of such research, but because of the cost of new imagery, these opportunities have been reduced or the research has been completed with older, archived data. While technology development is somewhat independent of the age of the data, applied research and environmental managers become less interested when such efforts use data long out of date. Even now, there are numerous articles in the journals using data from the 1980s.

- Techniques such as subpixel classification, research into periodicity patterns as they affect image quality, patterns of spectral mixing in patchy landscapes, and climate/microclimate/image calibration are all significantly under-researched. Thus, the high cost of imagery has resulted in few, if any, innovations in the applications of this technology.

- The poor availability of Landsat Thematic Mapper (TM) imagery is not only due to cost, but also to the practice of operating the satellite selectively for certain land-surface areas. The cost of imagery reduced the user base, and EOSAT had to determine which images would be most marketable prior to acquiring them. That left many scientists with a very limited, high-cost archive of TM data. Thus, for certain areas of the globe there is extensive coverage, but for others it is very poor.

In addition, the more detailed narrative of a senior soil scientist, a member of the interdisciplinary Laboratory for Applications of Remote Sensing at Purdue University, is revealing:

During the mid- to late 1970s and into the early 1980s, our research group was heavily involved in interdisciplinary research (involving electrical engineers, civil engineers, computer scientists, statisticians, meteorologists, crop physiologists, soil scientists, foresters, environmental specialists), collaborating with several federal agencies and other universities. Our research focused on the study of the relationships between the data derived from the Landsat multispectral scanner (MSS) and thematic mapper, the characteristics of agricultural land surfaces, and the changes of these surfaces with time. Specific objectives were to determine the feasibility of using Landsat MSS and TM data for crop inventory and monitoring. Some questions addressed were: What quantitative changes occur in the spectral characteristics of crops (corn, soybeans, winter wheat, spring wheat, rice, many other crops) throughout phenological development? What spectral changes are a result of stress from drought, nutrient deficiency, disease, insect infestation, salinity of the soil, storm damage, wetness, and other causes? Can these changes be identified and delineated by classical pattern recognition analysis applied to multispectral data obtained by aerospace sensors? Many of these questions were addressed by our research during that period and at least partially answered.

One of the many areas of research that came to a complete halt when the price of TM data was increased manifold was the application of time-sequential remote sensing data (e.g., MSS, TM, advanced very-high-resolution radiometer (AVHRR), and others) to mapping and monitoring terrestrial ecosystems and to developing models to assess land quality, soil productivity and degradation, and erosion hazards. The anticipation that had begun to build for use of earth observation satellite data for integration with other data sets to provide national, continental, and global resource databases was suddenly dashed. It became impossible to develop the procedures, approaches, and models for doing any credible global monitoring and modeling, especially for terrestrial ecosystems, without such data.

Remote sensing research with agricultural crops slowed considerably, and much of it stopped completely as a result of the diminished support for civilian space research in the early 1980s and the subsequent commercialization of Landsat, which resulted in exorbitant data costs. Researchers in remote sensing laboratories and centers around the world, especially in universities, almost overnight went from a "data rich" situation to a condition of "data poverty." Many of the basic questions that were being addressed in the early 1980s are still being asked because the data became unaffordable to the research community addressing these questions. The resulting nonavailability of data probably played a significant role in the decline of and, in many cases, the closing of remote sensing programs at numerous universities.

It is a pity that the commercialization occurred when it did. The scientific "homework" had not been completed or carried out to the point at which marketable products had been sufficiently demonstrated. Another few years of affordable data and public research support in this area might have made the commercialization process more feasible and ultimately less painful.
1   Compiled by Michael D. Jennings, National Biological Service, Gap Analysis Program, University of Idaho, Moscow.
However, if privatization is not done in accord with the principles above, it can be disastrous for the government's basic scientific mission. Such a situation occurred with the privatization of the Landsat system in 1985. The history of this case and its impact on science are described in Boxes 4.3 and 4.4.

It also should be noted in this context that the government producers of scientific data are themselves in a potential position to act as exploitive monopolies, absent some formal restraints. In the United States, the prices that the federal government can charge are heavily regulated by law (see the following section). This is not the case in all other countries, however; some government agency monopolies are allowed to sell their data commercially without the restraints of a competitive market.

In summary, when privatization or commercialization leads to unregulated monopoly supply, it is not good public policy, and it is especially not good for science. When privatization leads to competitive supply to multiple user communities, it could well be good public policy, especially if scientific users are assured access at reasonable prices and there is a net benefit to the public from such a transfer.

PRICING PUBLICLY FUNDED SCIENTIFIC DATA

Ramsey pricing is a mechanism developed for regulated monopolies, which in this context means either government monopolies or monopolies acting as agents of the government. The British mathematician Frank Ramsey proved long ago11 that where the optimal price for a good is zero, or is insufficiently high to pay for the total cost of the product, economic efficiency requires the shortfall to be covered by differential prices, with the highest prices charged to users with low elasticity of demand, that is, users whose usage will be reduced relatively little by a given charge for the item. The reason is clear.
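Stated as a formula, the Ramsey rule sets each group's markup in inverse proportion to its demand elasticity: (p - mc)/p = lam/eps. A minimal numerical sketch follows; the marginal cost, elasticities, and Ramsey number are purely illustrative assumptions, not figures from this report:

```python
# Sketch of Ramsey (inverse-elasticity) pricing for two user groups.
# Rule: (p - mc)/p = lam/eps, hence p = mc / (1 - lam/eps).
# All numbers are hypothetical, chosen only to illustrate the ordering.

def ramsey_price(mc, eps, lam):
    """Price for a group with demand elasticity eps, given Ramsey number lam."""
    assert 0 <= lam < eps, "markup formula requires lam/eps < 1"
    return mc / (1.0 - lam / eps)

mc = 10.0   # marginal cost of distributing one data product
lam = 0.5   # Ramsey number, fixed by the revenue shortfall to be recovered

p_scientists = ramsey_price(mc, eps=5.0, lam=lam)   # elastic demand: small markup
p_commercial = ramsey_price(mc, eps=1.1, lam=lam)   # inelastic demand: large markup
```

Raising lam (a larger shortfall to cover) raises every price, but the burden always falls disproportionately on the low-elasticity group, which is exactly the allocation the text describes.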
A high price to a user whose elasticity of demand is great will cause a large cut in that person's use of the item, a major deviation from that individual's optimal use of it. Thus, any price that is likely to prevent scientists from using data because of budgetary problems means that the elasticity of demand of the scientist users is high. Ramsey analysis then confirms that these scientists should not be charged a substantial price for data. Commercial users, in contrast, if they stand to make a considerable profit from the data, will acquire them even at a substantial price, meaning that their demand is relatively inelastic. Welfare analysis confirms that these users should bear the bulk of the cost.

There are several approaches to achieving so-called "Ramsey" prices. The most straightforward approach is pure price discrimination: the supplier establishes different prices for different customer groups, using some means of identifying which customers are in which groups. For example, professional societies and scientific journals often charge different dues or subscription prices to libraries, to professors of various ranks, and to graduate students. Since student discounts are generally quite deep, some form of identification is required
for such discounts. Usually, societies depend on self-reporting to discriminate among full, associate, and assistant professors, or by income. Naturally, there is some "leakage" due to false reporting, sharing of materials, and the like. However, these schemes generally are successful in reaching multiple user groups while maintaining revenues.

In some cases, price discrimination based on observable customer characteristics is not feasible. A useful approach in such cases is for the supplier to develop data product lines, consisting of somewhat different packages of data and value-added services at different prices, directed at different user segments. The price differentials cannot be so great as to encourage excessive leakage of users from one package to another because of price; to prevent this, price differentials can be no greater than the value differentials among user segments. There are several methods of product line differentiation:

- Time. Users who demand up-to-the-minute (i.e., real-time or near-real-time) data pay more than those willing to wait and use archived or retrospective data.

- Customer support. Users who demand full access to telephone support and other types of support services pay more.

- Sample size. Users who want a higher sampling rate pay more.

- Documentation. Users who require full documentation of the data pay more.

- Scope of coverage. Are the data useful for a narrow user group or for a broader, more comprehensive audience?

However, these methods do not address how much should be charged to various users of the scientific data. The appropriate guidelines are as follows:

- For the contributor and active researcher community, data acquisition should most likely be free or, following much of current practice, available at the marginal cost of distribution. This guideline is based on the presumption that the individual contributor is actually creating the value.
In this case, the data distributor is acting as a repository of the data for the active research community, which is both contributing and using the data.12 However, it is possible that free access may generate such a great demand for service that congestion occurs, in which case some form of congestion pricing would be appropriate.

- For others, including commercial users, data acquisition should be priced to cover the costs of serving those users. The two appropriate pricing methods, incremental and marginal cost pricing, differ as follows:

- Incremental cost pricing. The price to secondary users is set so that
revenues cover the cost to provide this incremental use, including recompiling the data, perhaps maintaining a computer site for downloading, purchasing CD-ROM blanks, recording the data, shipping to the user, and customer support, but not including costs for the core service.

- Marginal cost pricing. The price to secondary users is set at the marginal cost of the specific unit sent to the user, including the cost of the CD-ROM blanks and postage and shipping. This price is lower than the incremental cost price, as long as the cost of output per unit declines when volume increases.

It is easiest to express the difference between these two ideas mathematically. Assume that the total cost to supply the quantities q1, q2 ≥ 0 of two goods is C(q1, q2) ≥ 0, with ∂C/∂qi > 0 for i = 1, 2; then

    marginal cost of good 1 = ∂C/∂q1;
    incremental cost of good 1 = [C(q1, q2) - C(0, q2)]/q1.

For example, if C(q1, q2) = 1,000 + 50√q1 + 20q2, then at q1 = 100 the marginal cost of good 1 is 25/√q1 = 2.5, while its incremental cost is 50√q1/q1 = 5, so marginal cost pricing indeed yields the lower price.

The pricing policy specified in Office of Management and Budget (OMB) Circular A-130,13 which applies to all federal government agencies, corresponds to incremental cost pricing. The tradition in the research community, and the pricing level indicated by the "full and open access" policy, corresponds to marginal cost pricing. One can argue on the basis of public good benefits that the price floor should be zero or the marginal cost. If avoiding undue subsidization were to become an overriding concern, then incremental cost would be the appropriate price floor.14 Under the OMB pricing policy for federal government data, however, the full incremental cost is also the ceiling.

ELECTRONIC ACCESS AND INTERNET CONGESTION

Perhaps the most profound change associated with the digitization of science is the ability to access scientific data worldwide from the desktop, via the Internet. But new capabilities give rise to new problems.
In this case, the recent congestion on the Internet, particularly across the Atlantic and the Pacific, has reduced the ability of scientists to access data around the world, and especially to monitor experiments overseas in real time. Once the exclusive preserve of scientists, the Internet has attracted so much interest since the advent of the World Wide Web that nonscience traffic now dwarfs scientific traffic. This phenomenon is a classic example of a "tragedy of the commons," in which use of a common property becomes so intense that its value and benefit to its users diminish, possibly even to the extent that the good becomes useless. Many scientists see
the congestion resulting from this popularization of the Web as interfering with their ability to do science across the Internet, as discussed in Chapter 2.

The Internet congestion issue is a difficult one and is likely to be with us for a long time. Before the World Wide Web became popular, the fairly low level of usage imposed on the Internet by scientists was well within the modest capacities of the network, and traffic flowed without discernible delay. Scientific users perceived the network as having no capacity constraints, because they never encountered any. Today, the situation is different: use of the Internet has increased to the point that those modest capacities, even though they are expanding, are being reached (or even exceeded in peak periods) in many areas. Delays are the norm in some situations, such as on the link to Central Europe during daytime hours on either side.

What are those capacities? Generally, capacity constraints exist within computers (servers) and the transmission pipes that connect them. These transmission pipes are leased telephone company lines and share the same physical facilities as all other telecommunications services. For example, there are several physical transmission facilities that span the North Atlantic, the most important of which are the undersea fiber-optic cables. All telephone traffic and leased lines use these facilities, with the split being determined by how many lines have been leased for Internet use.15

Additionally, local "hot spots" can occur, in which a server that hosts a particularly popular Web site becomes congested because of increased traffic. This in turn causes nearby servers to become congested with traffic attempting to reach the busy server, so network congestion can spread. The solution is straightforward: the popular server must add capacity. Note that the greatest strength of the Internet, its decentralized character, can become its greatest weakness.
In a fully decentralized network, the solution to congestion relies on the implementation of decisions to expand capacity. Unfortunately, congestion affects not only the party that causes it but others as well, so it becomes a classic "externality" problem. The owner of a site may be perfectly happy to live with congestion on his or her popular site (even though the site's visitors are not happy), but the owner's actions will cause congestion at nearby sites and perhaps even throughout the network, thereby imposing costs on others. Furthermore, congestion can also result when two parties cannot come to an agreement about expanding capacity. If the United States and European countries cannot agree on how much capacity is needed to connect their respective networks, or on how to share the costs of that capacity, then needed expansion may not occur, to the detriment of users.

How best to deal with this "tragedy of the commons"? The usual solution to congestion16 is to ensure that those who cause it bear its full cost, in the form of congestion prices. This is a mature theory in economics. The Varian and MacKie-Mason proposals for "smart pricing" are the most well developed in the context of the Internet and are specifically designed to cope with congestion problems.17 It is sufficient for the committee's purpose here to note that such pricing schemes
will generally involve some form of usage-based prices for real-time traffic (i.e., traffic that cannot be delayed until congestion has eased). This could be a per-packet charge, for example, or a per-minute charge differentiated by type of traffic (e.g., telnet, video, Internet phone).18 The effectiveness of congestion prices depends on two obvious, but critical, functions served by prices: (1) users will have an incentive to postpone traffic to less congested, lower-cost periods if they are required to pay a high price during peak periods, and (2) suppliers will have an incentive to increase the capacity of a server or a transmission route if traffic during the peak periods is highly profitable. The demand-shift effect tends to reduce traffic, while the supply effect tends to increase capacity in the long run. Additionally, users who require peak-load use in the short run will, upon payment of the peak-load price, be more likely to obtain service under this regime than with underpriced peak service.

One other possible means for dealing with congestion would involve investment in hardware and especially software and would follow a growing trend in business: the creation of one or more dedicated networks for scientific research, such as the Internet II now being developed with support from the National Science Foundation. Such a network would function much like the "intranets" being established by private firms. The committee urges that funding agencies and professional societies begin to examine and evaluate this option in greater detail.

There is no question that congestion is also its own punishment. Administrators of servers or networks that generate congestion also suffer its consequences. However, those consequences extend to others, and the individual disincentives will not reflect the "external" costs imposed. Current administrative mechanisms may help alleviate congestion, as well.
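The demand-shift effect just described can be previewed with a toy peak/off-peak model: each user compares the value of immediate, peak-period delivery with a peak surcharge, and deferrable traffic moves off-peak. The per-user delay values and the surcharge below are invented for illustration:

```python
# Toy peak/off-peak split under a congestion surcharge.
# delay_costs holds each user's (hypothetical) value of immediate delivery.

def split_traffic(delay_costs, peak_surcharge):
    """Users stay in the peak only if immediacy is worth at least the surcharge."""
    peak = [v for v in delay_costs if v >= peak_surcharge]
    off_peak = [v for v in delay_costs if v < peak_surcharge]
    return peak, off_peak

delay_costs = [0.1, 0.5, 1.0, 2.0, 5.0, 8.0]

# With no surcharge, everyone sends at the peak; a positive surcharge
# shifts the deferrable (low-value) traffic to off-peak periods.
peak_free, _ = split_traffic(delay_costs, peak_surcharge=0.0)
peak_priced, off_priced = split_traffic(delay_costs, peak_surcharge=1.5)
```

Raising the surcharge moves more of the deferrable traffic off-peak, while users who truly need real-time delivery pay and are served, which is the sorting the committee's two incentive effects describe.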
Sufficient peer pressure from network "partners" may induce managers of congestion-causing servers to increase their capacity in order to keep their standing with their colleagues. In the long run, however, the "partnership" model of the past is unlikely to provide sufficient incentives to alleviate congestion, and the current situation can be interpreted as a transition to a new regime in which more formal mechanisms, such as congestion pricing, will be required.

RECOMMENDATIONS REGARDING ECONOMIC ASPECTS OF SCIENTIFIC DATA

The committee recommends that the economic aspects of facilities for storage and distribution of scientific data generated by publicly funded research be evaluated according to the following criteria:

- Does the scientific research depend on a substantial public investment in one or more facilities that generate the data of interest? If so, the data
distribution facilities are most likely to benefit by being vertically integrated with the observational or experimental facilities themselves.

- Does the (non-facilities-based) distributed scientific research involve coordination among researchers, possibly in different countries? If so, then data distribution becomes a means of communication among contributing scientists, and for this community the price of the data alone should be zero. If the distributor subsequently adds value to the data, then the price should be no higher than the marginal cost of adding value.19

- Is the community of users roughly the same as the community of contributors? If so, then data distribution should be priced at zero (or at marginal cost, if value is added). If there are many users who are not contributors, such as commercial customers, then some form of price discrimination to ensure zero or low prices to contributing scientific users, with possibly higher prices to others, may be appropriate.

- Is the user community large enough to support more than one data distributor? If so, then privatization of data distribution may be a viable policy option. If not, then privatization should occur only if the contractual arrangements are adequately protective of the needs of the scientific community.

Necessary, but not necessarily sufficient, conditions for privatization to be desirable are as follows:

- The distribution of data can be separated easily from their generation.

- The scientific data set is used by others beyond the research community.

- It is easy to price discriminate/product differentiate between scientific users and other users, and it is easy for the government to contractually mandate low prices to scientific users for government-funded data.

- Privatization will not result in the unrestricted monopoly provision of the data.
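Read as a checklist, the necessary conditions above combine conjunctively. The sketch below is one paraphrase of that logic, not the committee's own formalization; the function and argument names are this sketch's invention:

```python
# Paraphrase of the necessary-but-not-sufficient privatization conditions above.
# The condition names and boolean framing are this sketch's own invention.

def privatization_viable(distribution_separable_from_generation,
                         used_beyond_research_community,
                         can_mandate_low_prices_for_science_users,
                         avoids_unrestricted_monopoly):
    """True only when all four necessary conditions hold; even then,
    privatization is merely an option worth evaluating, not a mandate."""
    return all([distribution_separable_from_generation,
                used_beyond_research_community,
                can_mandate_low_prices_for_science_users,
                avoids_unrestricted_monopoly])

ok = privatization_viable(True, True, True, True)
# An unrestricted monopoly outcome alone blocks privatization,
# whatever the other answers are.
blocked = privatization_viable(True, True, True, False)
```

Because the conditions are necessary but not sufficient, a True result only means privatization may be considered, subject to the contractual protections the text describes.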
The appropriate price ceiling for nonscientific users of scientific data generated through government research is incremental cost, as defined in the section above titled "Pricing Publicly Funded Scientific Data." The price of scientific data to the contributing scientific community should be zero, or at most marginal cost.

NOTES

1.   Not all scientific data are maintained and distributed by public agencies or via public funding, although this is the norm. Various not-for-profit institutions and private firms are also providers of basic scientific data.

2.   Throughout this section, the terms "scientists" and "scientific researchers" are intended to include explicitly both U.S. and foreign scientists working in the natural sciences. Scientific research is by its nature a global enterprise; actions by any single government are felt by the entire scientific community. However, the barriers to scientific information flow are exacerbated by the problem of unequal technological capabilities, important in virtually all the discipline-specific contexts. This has at least two dimensions: individuals and institutions have different levels of computing and networking skills, and they have different levels of access to the hardware and software needed to exercise those skills. Generally, scientists are relatively sophisticated in their skill sets; the problem in science is the availability of tools, both computing and networking. The digitization of science suggests that if individuals, institutions, or countries lack the tools to process, analyze, and share scientific data at the level typically enjoyed in the developed countries, they will be unable to participate fully in the scientific endeavor, to the detriment of science as a whole. The problem, of course, is not confined to science, and involves the unequal distribution of wealth and income across the globe. At this level, it is highly unlikely that the expressed needs of scientists will have much effect in changing this unequal distribution of income. However, specific science-oriented activities can make a difference; recycling hardware, for example, to scientists in developing countries may greatly improve their ability to process data at almost no cost. While U.S. scientists may view two-year-old 486 PCs as hopelessly underpowered, these cast-off computers may be a godsend to scientists in Africa who currently have nothing. Perhaps the best way to facilitate such transfers is via professional society programs and institutions, which are likely to be able to identify both needs and donors efficiently, with relatively low levels of public financial support.

3.   The external benefits of fundamental or basic research are to be contrasted with the benefits of development, generally of products and services for sale, in which the full benefits are captured by the buyer and seller.
Generally, only basic research is acknowledged to be a public good, while development is seen as a private good, at least where there is effective patent protection. Of course, the distinction between research and development is not as clean as this suggests, although it is often a useful distinction.

4.   Of course, the committee recognizes that the acquisition of data by the government through its research activities is neither a costless activity nor an activity requiring unfettered spending; it is, however, a part of the process of doing research that falls outside the charge of this study, which focuses on data distribution and access.

5.   An instructive lesson can be drawn from the shift in public policy regarding mental health in the 1960s. The strategy was to shift from institutional care to community-based care, with substantial deinstitutionalization of patients and a funding shift to community facilities. The unfortunate result was that funding for institutions dropped, but no funding was provided for community care. It is claimed that this failure to provide for local funding has added substantially to the homeless population.

6.   A more extensive introduction to the complex topic of price discrimination/product differentiation is contained in the section "Pricing Publicly Funded Scientific Data," below.

7.   The committee's use of the term "private" in this context includes only for-profit firms that are not subsidized by the government; other private institutions such as universities, educational consortia, foundations, and other NGOs are not included in this context.

8.   As a practical matter, one would expect scale economies of data distribution to be exhausted at rather low levels of demand. However, many scientific data sets may have demand lower than this threshold and thus be subject to a "natural monopoly."

9.   
In this case, of course, meteorologists still obtain their scientific data from NOAA, not commercial firms, and so there are actually separate distribution channels for scientists and the public. The point here is that there are activities in which the private sector can outperform the public/NGO sector.

10.   Implicit is the assumption that if the government and private suppliers (who may get their raw data "wholesale" from the government) are competing against one another in the marketplace, the government is constrained to set prices to recover its cost. See the section titled "Pricing Publicly Funded Scientific Data."

11.   F. Ramsey (1927), "A Contribution to the Theory of Taxation," Econ. J., 35 (March):47-61.
See also M. Boiteaux (1956), "Sur la gestion des Monopoles Publics astreints a l'equilibre budgetaire," Econometrica (Jan. 24):22-40, and W.J. Baumol and D.F. Bradford (1970), "Optimal Departures from Marginal Cost Pricing," Am. Econ. Rev., 60 (June):265-283.

12.   A zero price might seem too low to some; however, it is the contributions of this community that actually create the value in the first place. An instructive analogy is consumer banking: depositors with sufficiently large balances receive "free" checking from banks; the quid pro quo is that the bank has the use of their money. Similarly, the database provider could give free access to contributors, with the quid pro quo being the contributions themselves. It also should be noted that if the data were not supplied free, it is almost certain that some public-spirited scientist with spare server capacity would have a graduate student maintain an FTP or WWW site with the data on it for free downloading by interested colleagues.

13.   The OMB Circular A-130 policy regarding federal government information dissemination practices was codified in the Paperwork Reduction Act of 1995, P.L. 104-13, which amended 44 U.S.C. Chapter 35, effective October 1, 1995.

14.   The original reference for this finding is Gerald Faulhaber (1975), "Cross Subsidization: Pricing in Public Enterprises," Am. Econ. Rev., 65:966-977. Later references, among many others, are J.C. Panzar and R.D. Willig (1977), "Free Entry and the Sustainability of Natural Monopoly," Bell J. Econ., 8 (Spring):1-22, and W.J. Baumol (1977), "On the Proper Cost Tests for Natural Monopoly in a Multiproduct Industry," Am. Econ. Rev., 67 (December):809-822.

15.   The current situation on the North Atlantic route is that there is virtually no congestion for placing telephone calls from Europe to the United States, but serious congestion for Internet traffic.
The conclusion from this evidence is clear: there is plenty of capacity in the physical transmission facilities, but too little of that capacity is devoted to the Internet. The simple (but expensive) solution for the researcher in Europe is to place a modem telephone call to the U.S. computer, or vice versa, and conduct the research via direct connection.

16.   There are, of course, a wide variety of engineering and queuing-theory solutions, priority schemes, compression methods, and so forth. All such methods either reduce the capacity required or seek to allocate scarce capacity during congestion to more valued uses. Ultimately, however, capacity is finite, and congestion may still occur, but at higher load levels.

17.   For a discussion of the economics of network pricing, see Hal Varian and Jeff MacKie-Mason (1995), "Pricing the Internet," in B. Kahin and J. Keller, eds., Public Access to the Internet, MIT Press, Cambridge, Mass.; Hal Varian and Jeff MacKie-Mason (1995), "Pricing Congestible Network Resources," Advances in the Fundamentals of Networking, IEEE Journal on Selected Areas in Communications; and Gerald R. Faulhaber (1992), "Pricing Internet: The Efficient Subsidy," in B. Kahin, ed., Building Information Infrastructure, McGraw-Hill, New York.

18.   Another possibility for dealing with congestion is to offer a lower subscription charge to users who are willing to postpone their use to off-peak times. Currently, Lexis/Nexis offers universities a low subscription charge but denies access during peak times.

19.   By "adding value" in this case is meant any transformation of the data beyond that necessary for scientific research that increases the value of the information for some or all potential users of the data.