7
Integrated Data Management at NOAA

PRINCIPLE #9: Effective data management requires a formal, ongoing planning process.


NOAA should establish and codify an enterprise-wide data management plan that explicitly incorporates all the principles in this report and is in alignment with NOAA’s mission, vision, and goal statements. The plan should be flexible and needs to include a formal, ongoing planning process—developed and integrated across NOAA with substantial user involvement and coordination with other agencies—to ensure that the system keeps pace with changes in observing systems, models, data storage and access technologies, scientific understanding, and user needs, as well as available resources.

NOAA has responsibility for a vast and growing collection of environmental data sets, some of which are critical for supporting environmental decisions, while others may be of limited value. Perhaps the biggest deficiency in NOAA’s current data management enterprise is the lack of a formal decision-making process for assigning specific archiving, stewardship, and access responsibilities for individual data sets. An additional, related concern is the absence of a clear, enterprise-wide framework to connect its many disparate data management activities. NOAA’s current Strategic Plan mentions data management in its crosscutting priorities section, but it does not explicitly acknowledge the data management functions needed to support NOAA’s mission or the challenges associated with meeting its ever-expanding data management responsibilities.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.


Thus far NOAA has been able to rely on ad hoc groups and the dedication of its personnel to provide reliable archiving and stewardship of essential data and to develop data access capabilities for specific user communities. For example, data managers at individual NOAA National Data Centers and centers of data have assumed many of the decision-making responsibilities for individual data sets. However, the recent and expected future increases in data volumes and complexity necessitate a highly coordinated, more defined process for managing the nation’s environmental data, as well as a comprehensive, user-centric framework to guide the ongoing development of NOAA’s data management activities. The principles and guidelines in this report provide not only guidance for determining which data sets to retain and provide access to, but also a foundation on which to build a comprehensive data management plan.

The next section presents one possible framework for data management at NOAA, based on a “system-of-systems” concept, along with some specific steps that describe how this framework could be implemented to take advantage of and extend existing data archiving, stewardship, and access capabilities. However, it should be noted that the principles and guidelines in previous chapters would remain applicable under any number of alternative data management frameworks. Also included at the end of this chapter are some concluding thoughts regarding the need for, and potential benefits of, a comprehensive, integrated, inclusive, and ongoing data management planning process across NOAA.

VISION AND FRAMEWORK

NOAA has already established a vision, illustrated in Figure 7-1, for an integrated, enterprise-wide data management system. The fundamental attribute of this system is the ability to link disparate data sources with the interdisciplinary applications in support of NOAA’s mission.
However, this is an extremely difficult and complex task, made all the more challenging by the volume, complexity, and diversity of NOAA’s many data sets, the myriad needs of its diverse users, the constraints of limited resources, and the underlying need to preserve and support current data management and user support activities. It should also be noted that the connection between data and mission objectives ultimately depends on scientific analysis and interpretation of data, not just hardware and software. Thus, the challenge is to define an effective framework that fills in the details of the broad and amorphous “integrated data management system” at the center of Figure 7-1.

FIGURE 7-1 NOAA’s Observation System: Target Architecture (SOURCE: Adang, 2006.)

In addition to fulfilling the basic vision of connecting environmental data to NOAA’s mission goals, the integrated data management system should be designed to fully address the main functional elements (stewardship, archiving, and access) described in Chapters 4, 5, and 6, as well as the overarching data management principles discussed in Chapter 3. The system will also need to be designed with sufficient flexibility, scalability, and adaptability to ensure that user needs and legislative and administrative mandates all continue to be met, despite growing data volumes and rapidly evolving user requirements. The elements of a continuously evolving and expanding data system include:

• stewardship of past, present, and future data, the value of which is likely to change over time;
• scalability to accommodate the demands of both existing and future user communities;
• flexibility to incorporate new data sources as they become available;
• adaptability to take advantage of transformational changes in data system technology;
• extensibility to incorporate smooth and nondisruptive upgrades for hardware, software, and data; and
• an optimal and cost-effective combination of technologies that maximally leverages off-the-shelf components with specialized components developed in-house to meet specific NOAA needs.

A number of practical considerations also need to be taken into account when designing and implementing a comprehensive, integrated, enterprise-wide end-to-end environmental data management system. For example, such a system requires persistent (or sustained) facilities, hardware, software, support staff, and, as discussed in Chapter 3, a dedicated, ongoing funding process that supports these infrastructural components. A more structured and inclusive process for user input is necessary, and acceptance and “buy in” by data providers as well as users will be necessary to make sure that the data management system meets both current and future user needs. Finally, NOAA needs a strategy for prioritizing its data management activities to focus resources on the most important management issues and thus maximize cost-effectiveness. Collectively, these practical considerations demand an organizational structure for data management that is reliable, flexible, adaptable, scalable, cost-effective, and responsive to user needs; the nature of this organizational structure is the main focus of the next section.

A System-of-Systems for Data Management

The framework for NOAA’s data management system should ultimately be driven by NOAA’s mission objectives, with the system design and components based on existing infrastructure and plans, available resources, and, most importantly, the data archiving, data stewardship, and data access services required by its evolving set of user communities.
An important lesson learned from the development of data management systems at other agencies (NASA’s Earth Observing System Data and Information System [EOSDIS], among others) is that a combination of top-down and bottom-up approaches is necessary to iterate to an effective data management solution, and NOAA would also be well advised to build upon the legacy of many years of successful support to its primary user communities. A federated system, unified by a common vision from the top and building upon small successful discipline- and instrument-focused systems, is a cost-effective design that embraces exactly this approach.

In light of these considerations, it is advisable for NOAA to continue moving toward a federated but integrated “system-of-systems” to manage its environmental data. Such an approach would leverage the many successful data management activities that NOAA is currently engaged in, facilitate the inclusion of users and other stakeholders in the planning process, and provide the top-level support and strategic guidance required to ensure adequate coordination and support while improving integration and cost-effectiveness. This approach would also mirror and support the evolution toward a system-of-systems concept in the observational network (for example, the Global Earth Observing System of Systems, or GEOSS), and it is highly compatible with the Global Earth Observation Integrated Data Environment (GEO-IDE) concept discussed in Chapter 2.

A schematic diagram of an integrated data management system that illustrates and defines some of the components of the integrated system-of-systems concept is depicted in Figure 7-2. This framework for NOAA’s data management enterprise, which combines the legacy discipline-specific National Data Centers, centers of data, and data portals with a central data archive and interconnected multi-portal data access, provides the fundamental mechanisms required to connect diverse communities of users with NOAA’s broad spectrum of environmental data. The central data archive and other data archives would focus on data storage, data integrity, standards and formats, emerging technologies, and the associated research and development, while NOAA’s three National Data Centers and centers of data would focus on data acquisition, data stewardship, quality control, and data access. It should be noted that the data centers would still maintain their existing data portals; however, as indicated by the interconnected multi-portal access block at the center of Figure 7-2, these portals should be better connected (with improved linkages to related data sets at other centers) to facilitate better utilization and integration of existing data.
Seamlessly interconnected, multi-portal access is the most essential component of the framework illustrated in Figure 7-2 because it “connects the stovepipes” present in current data management activities and thus allows improved data discovery, integration, and interoperability.

FIGURE 7-2 Elements of an integrated data management system that connects users to environmental data via an interconnected multi-portal access framework. The “interconnected multi-portal access” includes search engine and automated data mining capabilities, as well as access to multidisciplinary advisory support. The National Data Centers and centers of data are regarded as “disciplinary” data centers that are responsible for both user support and data stewardship. The central data archive and other data archives are responsible for central data storage and preservation, as well as the research and development of emerging technologies, standards, and protocols. Archiving responsibilities for individual data sets may be assigned to any two of the centers in the lower two dashed boxes to guarantee survivability. The double-headed arrows indicate generalized connections between different elements of the system, illustrating the flow of data as well as data support services. Links to other data systems are not shown.

There are several motivations for recommending that NOAA build upon existing decentralized, discipline-specific systems rather than replace them. NOAA has a proven track record of providing fundamental data records and supporting the data archive and access needs of its existing user communities. It is also apparent from past experiences with archive system development at both NOAA and NASA that predominantly top-down, centralized, technology-driven system designs have deficiencies and are limiting, largely because they are inflexible and fail to recognize and accommodate the extremely wide variety of data characteristics and user needs. These considerations strongly suggest that NOAA’s data management vision should be grounded on a bottom-up approach that leverages existing archive, stewardship, and access capabilities and facilities, as well as direct and ongoing input from different user communities. However, it is equally clear that enterprise-wide coordination in the planning, implementation, and continual refinements of the eventual system will be needed to ensure adequate interoperability and to make sure that the needs of users and stakeholders are met.

Implementation

The system-of-systems framework for integrated data management at NOAA is not radically different from either the GEO-IDE data management plan or the broader GEOSS data architecture, both of which are currently still in the development phase, although it does represent a major departure from the current implementation of data systems within NOAA. The existing legacy systems at NOAA, with decentralized, discipline-specific support, have actually been working quite well despite limited resources, especially in supporting discipline-specific questions from a range of users. However, they could and should be expanded and linked more effectively to meet the challenges associated with more complicated, multidisciplinary questions, to increase the realized value of collected data sets, and to make the overall system more cost-effective.

GEO-IDE, which was discussed in detail in Chapter 2, provides an excellent starting point from which to develop an integrated data management system-of-systems at NOAA. Indeed, the draft concept of operations and implementation plan (NOAA, 2006b, 2006c) describes GEO-IDE as an enterprise-wide system that stresses improved integration and accessibility, as well as the use of standards, user input, and a federated approach to link diverse data streams (see Figure 2-3). In particular, the implementation plan calls for NOAA to:

• continue operating existing systems, but gradually evolve them to be more interoperable, reliable, and scalable;
• create cross-organizational teams to find solutions to focused problems specified by higher management and to implement agreed-upon adoption of solutions;
• define and implement a process for adopting information management standards, and ensure that legacy programs migrate to these standards and new programs use them from the start;
• develop, publish, and maintain a NOAA Guide to Integrated Information Management; and
• invest in education to develop necessary new leadership, management, and technical skills for the integrated data environment.

In addition, the GEO-IDE implementation plan calls for NOAA’s Line Offices, across the whole organization, to participate in well-ordered, standards-based data infrastructure to improve efficiency and promote easy data discovery and access. These and other concepts in the GEO-IDE planning documents are consistent with the principles and guidelines offered in this report, and they provide an excellent foundation for planning additional improvements.

However, as discussed in Chapter 2, GEO-IDE remains very much in the planning phase, with only limited resources dedicated to its further development and almost none for its eventual implementation. In addition, the GEO-IDE planning documents will need to be expanded to more completely address several key issues, such as data stewardship responsibilities, interagency cooperation, and user input, in order to conform to all the principles and guidelines offered in this report. Specific examples of missing elements include: an inventory of existing data and a structured process to review and determine the status of each environmental data set at key points during its life cycle; a clear mechanism or mechanisms for soliciting and incorporating user input into data archiving and data access decisions; a prioritization process for applying limited resources to various data management activities; and mechanisms to improve data discoverability and accessibility. If these and other changes are made in order to bring the GEO-IDE concept into better alignment with the principles and guidelines in this report, and sufficient resources can be made available for its implementation, the resulting data management system will yield significant benefits. Some examples of how an integrated, coordinated, and user-driven data management system at NOAA would make these benefits possible are given in Boxes 7-1 and 7-2.

NOAA is also encouraged to continue exploring new technologies and alternative management structures in an ongoing effort to improve capacity, reliability, and cost-effectiveness. Data integration and exploitation through advances in information technology (such as computing power, networking capabilities, and data storage volumes) have the potential to accelerate scientific progress over the next several decades in much the same way that modeling and simulation studies have over the previous two decades.
NOAA will need not only to plan for the near- and intermediate-term requirements of these technological advances, which will define their initial system, but also to anticipate as yet undefined future possibilities and corresponding user needs. NOAA also needs to establish a significant priority, within the current development and future planning, for improving data discovery and integration capabilities. A good starting point would be creating a complete, searchable inventory of existing data holdings, together with the development of more intuitive portals that provide more direct discovery of and access to all environmental data by different user groups, including interdisciplinary scientists, K-12 professionals, and the general public.

Finally, it should be reiterated that the specific framework for data management depicted in this section is intended to assist NOAA in the planning, funding, and implementation of a comprehensive data management system that is compatible with the principles and guidelines offered in previous chapters. The proposed system-of-systems architecture provides avenues for the complementary top-down and bottom-up development approaches necessary to ensure that the system is integrated and coordinated across the agency, is cost-effective, and is sufficiently flexible and adaptable to respond effectively to increasing data volumes and evolving user requirements. Should NOAA decide that an alternative vision or approach is necessary, however, the principles and guidelines in this report are general enough that they may be applied to alternative data management frameworks. Of course, for any significant progress to be realized, NOAA needs to receive secure funding to establish and maintain these important data management activities.

BOX 7-1 Addressing Interdisciplinary Problems

To illustrate how an integrated data management system needs to work in order to address inherently interdisciplinary problems, consider a researcher who wishes to study the decline in a particular fish species in the eastern Pacific Ocean. Even if the researcher were aware of all the data they might need to study this problem, it would currently be quite difficult for them to easily find and obtain the most appropriate fish catch data, population model results, in situ sea surface temperature and salinity measurements, satellite-derived ocean color data, winds and currents from reanalyses, and other potentially important variables such as cloud cover and aerosol concentrations. A truly integrated data management system would facilitate the researcher’s ability to identify and obtain all potentially relevant data sets, including those that they might not have thought of as initially being relevant. Expanded metadata standards and mappings among discipline-specific standards are critical to this integration and would be a good starting point for any planned coordination or integration activities within NOAA. Logical presentation of the information during discovery is also important. In this particular case, the researcher might need assistance in selecting the appropriate satellite data; for example, would orbital swath, gridded multiday averages, or seasonal mean ocean color be most appropriate? This example illustrates both how advances in our understanding of the Earth System often require the merging and analysis of disparate data sets and how an integrated data management vision can be used to answer complex interdisciplinary questions.

BOX 7-2 Addressing Novel Applied Problems

Derelict fishing gear and other marine debris pose increasing hazards, including threats to commercial and recreational navigation, entanglement of endangered and protected species, and wasteful “ghost fishing.” To mitigate the potential damage, NOAA has funded several efforts that use near real-time data from ocean transport models, satellites, piloted and autonomous aircraft, and drifting buoys to guide ships to interdict the debris before it reaches land. The capacity to easily integrate the data from the diverse suite of platforms has proven critical in the efforts of the program managers to allocate resources in an efficient and effective manner. These efforts illustrate the need for coordination and flexibility in designing a data management system that can be applied toward novel applied problems.

CONCLUDING THOUGHTS

NOAA has the legacy systems, basic organizational structure, several successful prototype user support systems, and infrastructure plans (namely, GEO-IDE and the Comprehensive Large-Array [data] Stewardship System, or CLASS) to meet many of its diverse user needs. The discipline-specific NOAA National Data Centers and centers of data already provide excellent user support and continue to develop new methods and practices in response to user needs. However, the expanding demands and diversity of users, coupled with an explosive increase in data volumes, represent a formidable data management challenge. Although technological improvement will help, the resources currently allocated for data management across the agency may not be sufficient to ensure that NOAA continues to meet its basic data archiving and access requirements, and further resources may be needed to improve the discoverability, accessibility, and integration of different data sets needed to address important interdisciplinary problems.
In light of these considerations, some modifications to NOAA’s current plans and resource allocations will be needed to address all of the key issues involved in providing the focused user support demanded by NOAA’s mission. The principles and guidelines in this report are intended to provide a foundation for the design, implementation, and operation of an integrated, forward-looking data management system that meets or exceeds current user requirements and responds to new requirements as they emerge. The three main functional requirements of such a system (namely, data stewardship, data archiving, and data access) were explored in Chapters 4, 5, and 6, respectively. Chapter 3 provided five overarching principles that address the motivation, financial resources, interagency and international coordination, metadata requirements, and responsiveness to user needs that are necessary for truly effective data management. Finally, in this chapter we described an integrated, adaptable, user-driven “system-of-systems” that is compatible with these principles and guidelines; that leverages NOAA’s existing data archiving and access infrastructure and expertise; that emphasizes reliability, cost-effectiveness, and broad stakeholder involvement in data management decisions; and that maximizes the societal benefit of existing and planned observational capabilities.

Understanding and predicting environmental change will remain critically important activities throughout the 21st century. These activities depend on historical data and reliable projections of the future evolution of the Earth System, both of which rely on accurate descriptions of current environmental conditions. Hence, it is critical to archive and provide access to the environmental data needed to describe, understand, and predict changes in the Earth System and the impacts of these changes on both natural and human systems. NOAA and its partners are clearly dedicated to continuing to provide excellent service for their various user communities and to meeting their legal mandates and requirements. However, these goals can be achieved only if NOAA’s data management system enables and promotes the discovery, access, and integration of a wide variety of environmental data by a broad range of users. With such abilities, users are empowered to reap the many societal benefits of environmental data and to advance our understanding of the Earth’s environment. NOAA and its partners will, we hope, find the guidance offered in this report useful as they continue to serve as effective stewards of important national assets.