3
Overarching Principles

The critical functions of an integrated, end-to-end data management system can be broken into three main elements: archiving, which includes ingestion, storage, and preservation; access, which includes discovery, delivery, and integration; and stewardship, which spans archiving and access and includes all the processes and procedures that preserve and improve the information content of individual data sets and their associated metadata, as well as assuring access and understanding for users. These three functional elements are interdependent, and they need to be coordinated through a formal, ongoing planning process to ensure that the data management system is integrated, cost-effective, and successful in meeting user needs and requirements. In addition, there are certain overarching principles, laid out in this chapter, that apply to all aspects of data management. Many of the concepts introduced by these overarching principles, such as the importance of user engagement, are also discussed in subsequent chapters that explore the three main functional elements of data management (stewardship, archiving, and access) and, in the final chapter, the need for a formal ongoing planning process to integrate these elements into an effective end-to-end data management system.

The nine principles offered in this report, including the five overarching principles introduced in this chapter, are numbered sequentially for convenience, but all nine should be regarded as equally important for effective environmental data management. Some principles are accompanied by guidelines intended to provide specific recommendations for how NOAA and its partners could and should apply these principles to



their current and future data management activities. These principles and guidelines are based on a variety of sources: the large body of previous work described in Chapter 2; NOAA’s legal and administrative mandates; the data archiving and access requirements demanded by NOAA’s mission objectives; information and feedback received from NOAA, its agency partners, and its data users regarding current and planned data management activities; and the knowledge and collective experience of the members of this committee.

PRINCIPLE #1: Environmental data should be archived and made accessible.

The environmental data (including model output, derived products, and other information) collected by NOAA and its partners are an invaluable resource that should be archived and made accessible in a form that allows a diverse group of users to conduct analyses and generate products necessary to describe, understand, and predict changes in the Earth’s environment. Full and open access to data should be a fundamental tenet of all US federal agencies, including NOAA.

NOAA’s mission is “to understand and predict changes in the Earth’s environment and conserve and manage coastal and marine resources to meet our nation’s economic, social and environmental needs.” Since the Earth and its environment represent a complex, interconnected biogeochemical system, describing the current state of the system or its variability over time requires a large number of observations, and predicting its future behavior often demands the use of models built around a detailed understanding of various system components. Thus, any observational data stream, model output array, or other environmental data set that contributes to the description, understanding, or prediction of the Earth system (including derived products) should, in principle, be archived by NOAA.
Likewise, in order to realize the full benefits of Earth system measurements, analyses, and predictions, these data should be made accessible to the broadest possible range of users in a form that allows them to make informed economic, social, and environmental decisions.

Although “save all environmental data and disseminate it to all possible users” is a worthwhile goal for a data archival and access system, a number of practical considerations make this goal impossible to fully achieve. First among these is the reality of limited resources, which places restrictions on the volume and quality of data that can be archived, the ability to make these data accessible to a wide range of users, and the number of personnel dedicated to ensuring reliable data stewardship. Normally, cost-benefit analysis would be a powerful tool for identifying the most appropriate way to allocate limited resources and prioritize data sets. However, it is difficult to assess the current value of any particular environmental data stream (see, for example, Millard et al., 1998), and impossible to anticipate all its potential future uses. In addition, it is usually impossible to resample environmental data, although there are exceptions (for example, some types of geodetic data or model output). Finally, we agree with the many previous groups (e.g., NRC, 1997) who have argued that full and open access to data should be a fundamental tenet at all federal agencies. These considerations support the notion that NOAA should strive to archive the broadest possible collection of environmental data and disseminate these data to the broadest possible range of users, and they also support the following principle.

PRINCIPLE #2: Data-generating activities should include adequate resources to support end-to-end data management.

End-to-end data management (which includes acquiring, processing, storing, maintaining, updating, and providing access to data) should be planned and budgeted at the outset of any activity that will generate environmental data. This planning should explicitly address data archiving, data stewardship, and data access responsibilities, and sufficient funds should be provided to archive and provide ready and easy access to the resulting data for extended periods of time.

In general, the cost of archiving and providing access to environmental data represents only a small fraction of the total resources invested in originally collecting or generating the data.
For example, NOAA’s FY2007 budget request includes $965 million for satellite observation systems alone (acquisitions plus operations), compared to $69 million for all of NOAA’s data management activities combined (although some of the satellite observing system funding is spread out over several years to support data acquisition, processing, analysis, and distribution).[1] For digital data, hardware for data storage and for data distribution constitutes a sizable fraction of data management costs, but funding to support data management personnel is also critically important. The return on these investments in terms of societal benefits is difficult to quantify, although the value of environmental data can sometimes be inferred; for example, the economic value of weather forecasts for the household sector alone has been estimated to exceed $109 per household (or $11.4 billion total) per year, compared to an average expenditure of $13 per household per year to support the National Weather Service (Lazo and Chestnut, 2002).

[1] http://www.corporateservices.noaa.gov/~nbo/07bluebook_highlights.html.

Effective data management planning is needed to ensure that the nation receives an appropriate return on the substantial investments made in observing systems, environmental models, and other data-generating activities. Most agencies, including NOAA,[2] have begun to recognize that data management is far easier to plan and more cost-effective to implement at the outset of data-generating activities. Unfortunately, many in situ data sets, as well as high-volume radar and satellite data, have been collected in support of NOAA’s operational missions with little initial provision made for long-term preservation and access. However, data once perceived to be of little use can sometimes become quite valuable as a result of advancing technology or changing user needs. For example, many atmospheric data sets that previously had little use beyond satisfying operational needs can now be ingested by models using new data assimilation methods to yield improved atmospheric reanalyses (see, for example, Kalnay et al., 1996). It is thus essential to secure adequate funding to manage the environmental data streams that were collected or generated without sufficient resources reserved for long-term data management.

A number of previous reports, including many of those summarized in Chapter 2, have highlighted additional benefits that might be realized when sufficient resources are available to support effective and proactive environmental data management. Despite these demonstrable societal benefits, ensuring adequate and sustained levels of funding to support data management activities remains a major ongoing challenge for both NOAA and other agencies and international groups involved in archiving and providing access to environmental observations, model output, and other environmental information.
This challenge has arisen largely for three reasons: in part because data management activities require continuing costs that extend long after the data are originally collected or generated, in part because data management often requires considerable coordination among different agencies or groups, and in part because the benefits of effective data management are difficult to quantify. Considerable progress has been made in some areas, such as the recent expansion of the Comprehensive Large-Array [data] Stewardship System (CLASS), but many other potential environmental data users and applications would benefit from improved archiving and access capabilities at NOAA, especially improvements in data discoverability and integration across NOAA’s diverse data streams.

Any additional resources that can be secured to support long-term data management at NOAA should be incorporated into a comprehensive, integrated, and flexible end-to-end data management plan that is coordinated across NOAA to support Earth system research, improved environmental predictions, and other societal benefits. Additional resources can be expected if the data management system is shown to be cost-effective and to have significant, demonstrable societal benefits. Chapter 7 describes some of the potential applications and societal benefits that could be realized with additional investments in data archiving and access capabilities.

[2] See, for example, NOAA Administrative Order 212-15.

PRINCIPLE #3: Environmental data management activities should recognize user needs.

The success of any data management enterprise is judged by its usefulness to current and future users. However, for environmental data, both user needs and the data themselves are constantly evolving. Thus, all environmental data management activities, including data archiving and access decisions for specific data sets as well as the development of the overall data management system, should incorporate substantial and ongoing user input.

The ultimate measure of any data management enterprise is its benefit to society, and the most straightforward way of both ensuring and assessing these benefits is via close and continuous interaction with users. User participation in the planning and in the ongoing management process helps to ensure that the system meets societal needs, and user feedback provides a mechanism to evaluate individual system components and to identify and develop potential system improvements. Since user needs are continually evolving and new applications for data are constantly being developed, user participation is most effective when it is regular and scheduled, rather than episodic or ad hoc, and when it is solicited from the user community most familiar with the environmental data under review. Ideally, user input should include both informal feedback from a broad range of data users, for instance through user logs, and more formal forms of feedback such as external advisory groups and stakeholder panels. Regular communication with users likewise promotes a proactive approach toward data access improvements.
To the maximum extent possible, current efforts to engage users in specific data archiving and access decisions, as well as in the planning and implementation process for an enterprise-wide data management system at NOAA, should be continued and expanded. The recent formation of the Data Access and Archiving Requirements (DAAR) Working Group represents an important step toward expanded user involvement; it will be critical for the DAAR Working Group to remain actively involved in the planning and implementation process for both CLASS and the Global Earth Observation Integrated Data Environment (GEO-IDE), and for that working group, or a similar entity, to be actively engaged in guiding the future evolution of NOAA’s enterprise-wide data management activities.

Because the DAAR Working Group is only a single, ad hoc group with limited time and resources, it will not be able to offer specific advice on the large and diverse range of data and products that NOAA collects and generates. It will therefore also be essential to establish a mechanism for ongoing discipline-specific user feedback to help inform individual data archiving and access decisions at NOAA’s National Data Centers and other centers of data. Fortunately, these centers already have considerable experience working with their user communities, especially when developing and improving data access capabilities. This experience should be leveraged to facilitate further user engagement and participation in individual data management decisions.

The importance of user feedback and involvement for data stewardship, data archiving decisions, and designing effective data discovery, access, and integration tools is discussed in further detail in Chapters 4, 5, and 6, respectively.

PRINCIPLE #4: Effective interagency and international partnerships are essential.

Any environmental data management planning process or system needs to include substantial coordination and agreement among the relevant federal agencies and international partners in order to achieve maximum cost-effectiveness and to ensure proper data stewardship. NOAA should take steps to improve its relationship with several important partners and consider taking a leadership role in archiving and providing access to a broad range of environmental data, including data not traditionally regarded as falling under its operational mission.

Many different entities collect environmental data, and scientific understanding and other societal benefits cross over political and agency boundaries. Thus, NOAA needs to work closely with other agencies and international groups to make certain that all important environmental data are archived and made available to users.
In particular, there should be a clear understanding of both individual and shared responsibilities among the various agencies (including NOAA, NASA, the U.S. Geological Survey, the Department of the Interior, the Department of Defense, and others) involved in archiving and providing access to environmental data collected both domestically and internationally. Even though NOAA is probably the largest environmental data steward in the world, as measured collectively by data diversity and volume, it can and should engage other similar organizations to discuss common problems, share lessons learned, and find areas where collaborations result in mutual benefits (for example, providing mutual backup services).

Interagency coordination is fraught with practical difficulties. Different agencies have different missions with respect to collecting, archiving, and providing access to different types of data, and these missions sometimes overlap or leave gaps in responsibility for critical data sets. The transition from research to operations is one notable and longstanding challenge (see, e.g., NRC, 2000). The concept that a single agency should archive all environmental data for the United States, regardless of the source or primary users of a data set, does not appear to be a fiscally, bureaucratically, or politically feasible solution, although the concept of a single repository for all scientific data generated by U.S. agencies is apparently still being discussed (Butler, 2007).

Our view is that NOAA should consider taking a leadership role in archiving and providing access to a broad range of environmental data, including data not traditionally regarded as falling under its operational mission. Leadership on data management issues is important if NOAA wants to continue to be regarded as the lead U.S. agency for Earth System research, environmental prediction, and other important data-driven activities, such as climate services; it is recognized, however, that additional resources would be required in order for NOAA to assume such a role. At the very least, NOAA should work to improve its relationship with its agency partners to ensure that all environmental data derived from publicly funded research are archived and made available to users. Inventories and formal agreements can be time consuming and difficult to complete, but they form the foundation for all future data management decisions.
It is also essential to periodically review these data inventories and agreements and to establish extensive and ongoing coordination efforts in order to make sure that new data streams are properly archived and that the evolving set of user needs continues to be satisfied. Facilitating international cooperation is associated with additional complications, as noted in Chapter 2, but also brings the potential for more dynamic collaboration, as illustrated by the effort to bring about the Global Earth Observing System of Systems (GEOSS). Of course, funding is always an important consideration in any international or interagency agreement process, a consideration that provides further support for the principle that “data-generating activities should include adequate resources to support end-to-end data management.” Box 3-1 illustrates some of the problems that can arise when interagency coordination is lacking, and additional discussion of interagency and international issues is included in several of the chapters that follow.

BOX 3-1
Archiving Responsibilities for EOS Data

A recent example illustrates the perils of inadequate coordination of data management activities. At one of the meetings of this committee, it was revealed that NASA officials were unaware of NOAA’s decision to archive only a small subset of Earth Observing System (EOS) data (in particular, MODIS “Level 1b” data). NOAA representatives explained that NOAA’s National Environmental Satellite, Data, and Information Service (NESDIS) will archive only those data that fall under its climate monitoring or operational missions, citing both funding constraints and memorandums of understanding (MOUs) signed by NASA and NOAA in 1989 and 1992 (see Chapter 2). NASA has subsequently decided to archive the rest of the EOS data on an ad hoc basis, and there appears to be little discussion between the two agencies to guarantee that all EOS data are archived and made accessible for the remaining portion of their life cycle. This situation raises serious concerns about the stewardship and long-term preservation of data that are critical for global change research but which NOAA does not consider relevant to its mission.

While NOAA’s decision to archive and provide access only to data directly relevant to its operational and climate monitoring activities may be based on very real funding constraints, federal agencies should not be forced to rely on ambiguous, nonbinding 15-year-old agreements to make critical data management decisions. Of particular concern is the absence of any formal mechanism for NOAA and NASA to periodically review their agreements in order to ensure that important environmental data are continually archived and made available to users. These issues are explored in further detail in Chapter 5.

PRINCIPLE #5: Metadata are essential for data management.
Metadata that adequately document and describe each archived data set should be created and preserved to ensure the enhancement of knowledge for scientific and societal benefit. NOAA and its partners should continue and expand their usage of metadata standards.

Metadata are all the pieces of information necessary for data to be independently understood by users, to ensure proper stewardship of the data, and to allow for future discovery. This information should include, at a minimum, a thorough description of each data set, such as its spatial and temporal resolution and how it was originally collected or produced, as well as thorough documentation of how it has been managed and processed. Ideally, metadata should also describe appropriate applications of the data, the relationship between the data and other data within and outside of the archive, and enough high-level information to allow even inexperienced users to find, understand, and use the data. Metadata are thus the essential data management component that makes an environmental data archive useful. Whenever possible, NOAA’s metadata protocols should be made compatible with international standards to facilitate discovery and coordination with other archives, and NOAA should work with its users and partners to expand and improve these standards. The contents and utility of metadata for data stewardship, archiving, and access are described in additional detail in Chapters 4, 5, and 6, respectively.