7 Integrated Data Management at NOAA

PRINCIPLE #9: Effective data management requires a formal, ongoing planning process.

NOAA should establish and codify an enterprise-wide data management plan that explicitly incorporates all the principles in this report and is in alignment with NOAA's mission, vision, and goal statements. The plan should be flexible and needs to include a formal, ongoing planning process (developed and integrated across NOAA with substantial user involvement and coordination with other agencies) to ensure that the system keeps pace with changes in observing systems, models, data storage and access technologies, scientific understanding, and user needs, as well as available resources.

NOAA has responsibility for a vast and growing collection of environmental data sets, some of which are critical for supporting environmental decisions, while others may be of limited value. Perhaps the biggest deficiency in NOAA's current data management enterprise is the lack of a formal decision-making process for assigning specific archiving, stewardship, and access responsibilities for individual data sets. An additional, related concern is the absence of a clear, enterprise-wide framework to connect its many disparate data management activities. NOAA's current Strategic Plan mentions data management in its crosscutting priorities section, but it does not explicitly acknowledge the data management functions needed to support NOAA's mission or the challenges associated with meeting its ever-expanding data management responsibilities.

ENVIRONMENTAL DATA MANAGEMENT AT NOAA

Thus far NOAA has been able to rely on ad hoc groups and the dedication of its personnel to provide reliable archiving and stewardship of essential data and to develop data access capabilities for specific user communities. For example, data managers at individual NOAA National Data Centers and centers of data have assumed many of the decision-making responsibilities for individual data sets. However, the recent and expected future increases in data volumes and complexity necessitate a highly coordinated, more defined process for managing the nation's environmental data, as well as a comprehensive, user-centric framework to guide the ongoing development of NOAA's data management activities. The principles and guidelines in this report provide not only guidance for determining which data sets to retain and provide access to, but also a foundation on which to build a comprehensive data management plan.

The next section presents one possible framework for data management at NOAA, based on a "system-of-systems" concept, along with some specific steps that describe how this framework could be implemented to take advantage of and extend existing data archiving, stewardship, and access capabilities. However, it should be noted that the principles and guidelines in previous chapters would remain applicable under any number of alternative data management frameworks. Also included at the end of this chapter are some concluding thoughts regarding the need for, and potential benefits of, a comprehensive, integrated, inclusive, and ongoing data management planning process across NOAA.

Vision and Framework

NOAA has already established a vision, illustrated in Figure 7-1, for an integrated, enterprise-wide data management system. The fundamental attribute of this system is the ability to link disparate data sources with the interdisciplinary applications in support of NOAA's mission.
However, this is an extremely difficult and complex task, made all the more challenging by the volume, complexity, and diversity of NOAA's many data sets, the myriad needs of its diverse users, the constraints of limited resources, and the underlying need to preserve and support current data management and user support activities. It should also be noted that the connection between data and mission objectives ultimately depends on scientific analysis and interpretation of data, not just hardware and software. Thus, the challenge is to define an effective framework that fills in the details of the broad and amorphous "integrated data management system" at the center of Figure 7-1.

Figure 7-1 NOAA's Observation System: Target Architecture (SOURCE: Adang, 2006.)

In addition to fulfilling the basic vision of connecting environmental data to NOAA's mission goals, the integrated data management system should be designed to fully address the main functional elements (stewardship, archiving, and access) described in Chapters 4, 5, and 6, as well as the overarching data management principles discussed in Chapter 3. In addition, the system will also need to be designed with sufficient flexibility, scalability, and adaptability to ensure that user needs and legislative and administrative mandates all continue to be met, despite growing data volumes and rapidly evolving user requirements. The elements of a continuously evolving and expanding data system include:

• stewardship of past, present, and future data, the value of which is likely to change over time;
• scalability to accommodate the demands of both existing and future user communities;
• flexibility to incorporate new data sources as they become available;
• adaptability to take advantage of transformational changes in data system technology;
• extensibility to incorporate smooth and nondisruptive upgrades for hardware, software, and data; and
• an optimal and cost-effective combination of technologies that maximally leverages off-the-shelf components with specialized components developed in-house to meet specific NOAA needs.

A number of practical considerations also need to be taken into account when designing and implementing a comprehensive, integrated, enterprise-wide end-to-end environmental data management system. For example, such a system requires persistent (or sustained) facilities, hardware, software, support staff, and, as discussed in Chapter 3, a dedicated, ongoing funding process that supports these infrastructural components. A more structured and inclusive process for user input is necessary, and acceptance and "buy in" by data providers as well as users will be necessary to make sure that the data management system meets both current and future user needs. Finally, NOAA needs a strategy for prioritizing its data management activities to focus resources on the most important management issues and thus maximize cost-effectiveness. Collectively, these practical considerations demand an organizational structure for data management that is reliable, flexible, adaptable, scalable, cost-effective, and responsive to user needs; the nature of this organizational structure is the main focus of the next section.

A System-of-Systems for Data Management

The framework for NOAA's data management system should ultimately be driven by NOAA's mission objectives, with the system design and components based on existing infrastructure and plans, available resources, and, most importantly, the data archiving, data stewardship, and data access services required by its evolving set of user communities.
An important lesson learned from the development of data management systems at other agencies (NASA's Earth Observing System Data and Information System [EOSDIS], among others) is that a combination of top-down and bottom-up approaches is necessary to iterate to an effective data management solution, and NOAA would also be well advised to build upon the legacy of many years of successful support to its primary user communities. A federated system, unified by a common vision from the top and building upon small successful discipline- and instrument-focused systems, is a cost-effective design that embraces exactly this approach.

In light of these considerations, it is advisable for NOAA to continue moving toward a federated but integrated "system-of-systems" to manage its environmental data. Such an approach would leverage the many
successful data management activities that NOAA is currently engaged in, facilitate the inclusion of users and other stakeholders in the planning process, and provide the top-level support and strategic guidance required to ensure adequate coordination and support while improving integration and cost-effectiveness. This approach would also mirror and support the evolution toward a system-of-systems concept in the observational network (for example, the Global Earth Observing System of Systems, or GEOSS), and it is highly compatible with the Global Earth Observation Integrated Data Environment (GEO-IDE) concept discussed in Chapter 2.

A schematic diagram of an integrated data management system that illustrates and defines some of the components of the integrated system-of-systems concept is depicted in Figure 7-2. This framework for NOAA's data management enterprise, which combines the legacy discipline-specific National Data Centers, centers of data, and data portals with a central data archive and interconnected multi-portal data access, provides the fundamental mechanisms required to connect diverse communities of users with NOAA's broad spectrum of environmental data. The central data archive and other data archives would focus on data storage, data integrity, standards and formats, emerging technologies, and the associated research and development, while NOAA's three National Data Centers and centers of data would focus on data acquisition, data stewardship, quality control, and data access. It should be noted that the data centers would still maintain their existing data portals; however, as indicated by the interconnected multi-portal access block at the center of Figure 7-2, these portals should be better connected, with improved linkages to related data sets at other centers, to facilitate better utilization and integration of existing data.
Seamlessly interconnected, multi-portal access is the most essential component of the framework illustrated in Figure 7-2 because it "connects the stovepipes" present in current data management activities and thus allows improved data discovery, integration, and interoperability.

There are several motivations for recommending that NOAA build upon existing decentralized, discipline-specific systems rather than replace them. NOAA has a proven track record of providing fundamental data records and supporting the data archive and access needs of its existing user communities. It is also apparent from past experiences with archive system development at both NOAA and NASA that predominantly top-down, centralized, technology-driven system designs have deficiencies and are limiting, largely because they are inflexible and fail to recognize and accommodate the extremely wide variety of data characteristics and user needs. These considerations strongly suggest that NOAA's data management vision should be grounded on a bottom-up approach
that leverages existing archive, stewardship, and access capabilities and facilities, as well as direct and ongoing input from different user communities. However, it is equally clear that enterprise-wide coordination in the planning, implementation, and continual refinements of the eventual system will be needed to ensure adequate interoperability and to make sure that the needs of users and stakeholders are met.

Figure 7-2 Elements of an integrated data management system that connects users to environmental data via an interconnected multi-portal access framework. The "interconnected multi-portal access" includes search engine and automated data mining capabilities, as well as access to multidisciplinary advisory support. The National Data Centers and centers of data are regarded as "disciplinary" data centers that are responsible for both user support and data stewardship. The central data archive and other data archives are responsible for central data storage and preservation, as well as the research and development of emerging technologies, standards, and protocols. Archiving responsibilities for individual data sets may be assigned to any two of the centers in the lower two dashed boxes to guarantee survivability. The double-headed arrows indicate generalized connections between different elements of the system, illustrating the flow of data as well as data support services. Links to other data systems are not shown.

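The interconnected multi-portal access described for Figure 7-2 can be illustrated with a small sketch. The Python below is illustrative only: the `DataCenter` and `MultiPortal` classes, the center names, and the data set records are all hypothetical and do not describe any actual NOAA system or interface. It shows one query being fanned out to several discipline-specific catalogs, plus a simple check for data sets archived at fewer than two centers, in the spirit of the two-center assignment mentioned in the figure caption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSet:
    """A catalog record; the name and keywords are illustrative."""
    name: str
    keywords: frozenset


@dataclass
class DataCenter:
    """A hypothetical discipline-specific center with its own catalog."""
    name: str
    holdings: list = field(default_factory=list)

    def search(self, keyword):
        # Each center answers queries against its own holdings only.
        return [d for d in self.holdings if keyword in d.keywords]


class MultiPortal:
    """Fans one query out to every federated center and merges the results."""

    def __init__(self, centers):
        self.centers = list(centers)

    def search(self, keyword):
        # Map each matching data set name to the centers that hold it.
        hits = {}
        for center in self.centers:
            for dataset in center.search(keyword):
                hits.setdefault(dataset.name, []).append(center.name)
        return hits

    def under_replicated(self, minimum=2):
        # Names of data sets archived at fewer than `minimum` centers.
        counts = {}
        for center in self.centers:
            for dataset in center.holdings:
                counts[dataset.name] = counts.get(dataset.name, 0) + 1
        return sorted(n for n, c in counts.items() if c < minimum)


# Hypothetical holdings spread over three hypothetical centers.
sst = DataSet("sea_surface_temperature", frozenset({"ocean", "temperature"}))
catch = DataSet("fish_catch", frozenset({"ocean", "fisheries"}))
clouds = DataSet("cloud_cover", frozenset({"atmosphere"}))

portal = MultiPortal([
    DataCenter("ocean_center", [sst, catch]),
    DataCenter("climate_center", [sst, clouds]),
    DataCenter("central_archive", [catch]),
])

ocean_hits = portal.search("ocean")
needs_second_copy = portal.under_replicated()
```

In this sketch each center keeps its own catalog and portal logic, and the federation layer only merges answers. That mirrors the bottom-up emphasis of the text: existing centers are linked rather than replaced.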
Implementation

The system-of-systems framework for integrated data management at NOAA is not radically different from either the GEO-IDE data management plan or the broader GEOSS data architecture, both of which are currently still in the development phase, although it does represent a major departure from the current implementation of data systems within NOAA. The existing legacy systems at NOAA, with decentralized, discipline-specific support, have actually been working quite well despite limited resources, especially in supporting discipline-specific questions from a range of users. However, they could and should be expanded and linked more effectively to meet the challenges associated with more complicated, multidisciplinary questions, to increase the realized value of collected data sets, and to make the overall system more cost-effective.

GEO-IDE, which was discussed in detail in Chapter 2, provides an excellent starting point from which to develop an integrated data management system-of-systems at NOAA. Indeed, the draft concept of operations and implementation plan (NOAA, 2006b, 2006c) describes GEO-IDE as an enterprise-wide system that stresses improved integration and accessibility, as well as the use of standards, user input, and a federated approach to link diverse data streams (see Figure 2-3).
In particular, the implementation plan calls for NOAA to:

• continue operating existing systems, but gradually evolve them to be more interoperable, reliable, and scalable;
• create cross-organizational teams to find solutions to focused problems specified by higher management and to implement agreed-upon adoption of solutions;
• define and implement a process for adopting information management standards, and ensure that legacy programs migrate to these standards and new programs use them from the start;
• develop, publish, and maintain a NOAA Guide to Integrated Information Management; and
• invest in education to develop necessary new leadership, management, and technical skills for the integrated data environment.

In addition, the GEO-IDE implementation plan calls for NOAA's Line Offices, across the whole organization, to participate in well-ordered, standards-based data infrastructure to improve efficiency and promote easy data discovery and access. These and other concepts in the GEO-IDE planning documents are consistent with the principles and guidelines offered in this report, and they provide an excellent foundation for planning additional improvements.

However, as discussed in Chapter 2, GEO-IDE remains very much in the planning phase, with only limited resources dedicated to its further development and almost none for its eventual implementation. In addition, the GEO-IDE planning documents will need to be expanded to more completely address several key issues, such as data stewardship responsibilities, interagency cooperation, and user input, in order to conform to all the principles and guidelines offered in this report. Specific examples of missing elements include: an inventory of existing data and a structured process to review and determine the status of each environmental data set at key points during its life cycle; a clear mechanism or mechanisms for soliciting and incorporating user input into data archiving and data access decisions; a prioritization process for applying limited resources to various data management activities; and mechanisms to improve data discoverability and accessibility. If these and other changes are made in order to bring the GEO-IDE concept into better alignment with the principles and guidelines in this report, and sufficient resources can be made available for its implementation, the resulting data management system will yield significant benefits. Some examples of how an integrated, coordinated, and user-driven data management system at NOAA would make these benefits possible are given in Boxes 7-1 and 7-2.

NOAA is also encouraged to continue exploring new technologies and alternative management structures in an ongoing effort to improve capacity, reliability, and cost-effectiveness. Data integration and exploitation through advances in information technology (such as computing power, networking capabilities, and data storage volumes) have the potential to accelerate scientific progress over the next several decades in much the same way that modeling and simulation studies have over the previous two decades.
NOAA will need not only to plan for the near- and intermediate-term requirements of these technological advances, which will define their initial system, but also to anticipate as yet undefined future possibilities and corresponding user needs. NOAA also needs to establish a significant priority, within the current development and future planning, for improving data discovery and integration capabilities. A good starting point would be creating a complete, searchable inventory of existing data holdings, together with the development of more intuitive portals that provide more direct discovery of and access to all environmental data by different user groups, including interdisciplinary scientists, K-12 professionals, and the general public.

Finally, it should be reiterated that the specific framework for data management depicted in this section is intended to assist NOAA in the planning, funding, and implementation of a comprehensive data management system that is compatible with the principles and guidelines offered in previous chapters. The proposed system-of-systems architecture provides avenues for the complementary top-down and bottom-up development approaches necessary to ensure that the system is integrated and coordinated across the agency, is cost-effective, and is sufficiently flexible and adaptable to respond effectively to increasing data volumes and evolving user requirements. Should NOAA decide that an alternative vision or approach is necessary, however, the principles and guidelines in this report are general enough that they may be applied to alternative data management frameworks. Of course, for any significant progress to be realized, NOAA needs to receive secure funding to establish and maintain these important data management activities.

BOX 7-1 Addressing Interdisciplinary Problems

To illustrate how an integrated data management system needs to work in order to address inherently interdisciplinary problems, consider a researcher who wishes to study the decline in a particular fish species in the eastern Pacific Ocean. Even if the researcher were aware of all the data they might need to study this problem, it would currently be quite difficult for them to easily find and obtain the most appropriate fish catch data, population model results, in situ sea surface temperature and salinity measurements, satellite-derived ocean color data, winds and currents from reanalyses, and other potentially important variables such as cloud cover and aerosol concentrations. A truly integrated data management system would facilitate the researcher's ability to identify and obtain all potentially relevant data sets, including those that they might not have thought of as initially being relevant. Expanded metadata standards and mappings among discipline-specific standards are critical to this integration and would be a good starting point for any planned coordination or integration activities within NOAA. Logical presentation of the information during discovery is also important. In this particular case, the researcher might need assistance in selecting the appropriate satellite data; for example, would orbital swath, gridded multiday averages, or seasonal mean ocean color be most appropriate?

This example illustrates both how advances in our understanding of the Earth System often require the merging and analysis of disparate data sets and how an integrated data management vision can be used to answer complex interdisciplinary questions.

BOX 7-2 Addressing Novel Applied Problems

Derelict fishing gear and other marine debris pose increasing hazards to commercial and recreational navigation, entanglement of endangered and protected species, and wasteful "ghost fishing." To mitigate the potential damage, NOAA has funded several efforts that use near real-time data from ocean transport models, satellites, piloted and autonomous aircraft, and drifting buoys to guide ships to interdict the debris before it reaches land. The capacity to easily integrate the data from the diverse suite of platforms has proven critical in the efforts of the program managers to allocate resources in an efficient and effective manner. These efforts illustrate the need for coordination and flexibility in designing a data management system that can be applied toward novel applied problems.

Concluding Thoughts

NOAA has the legacy systems, basic organizational structure, several successful prototype user support systems, and infrastructure plans (namely, GEO-IDE and the Comprehensive Large-Array [data] Stewardship System, or CLASS) to meet many of its diverse user needs. The discipline-specific NOAA National Data Centers and centers of data already provide excellent user support and continue to develop new methods and practices in response to user needs. However, the expanding demands and diversity of users, coupled with an explosive increase in data volumes, represent a formidable data management challenge. Although technological improvement will help, the resources currently allocated for data management across the agency may not be sufficient to ensure that NOAA continues to meet its basic data archiving and access requirements, and further resources may be needed to improve the discoverability, accessibility, and integration of different data sets needed to address important interdisciplinary problems.

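One ingredient of the discoverability and integration mentioned above, and of the mappings among discipline-specific metadata standards noted in Box 7-1, is a crosswalk that translates each community's field names into a shared vocabulary so that a single inventory can be searched. The Python below is a minimal sketch under stated assumptions: the two "standards," their field names, and the records are invented for illustration and do not correspond to any real metadata standard or NOAA holding.

```python
# Hypothetical crosswalks from two discipline-specific metadata
# vocabularies to a shared set of field names (region, year, title).
CROSSWALKS = {
    "fisheries": {"species_region": "region", "survey_year": "year", "title": "title"},
    "satellite": {"coverage_area": "region", "acq_year": "year", "product": "title"},
}


def to_common(record, standard):
    """Translate one record into the shared vocabulary, dropping unmapped fields."""
    mapping = CROSSWALKS[standard]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def build_inventory(sources):
    """Normalize (standard, record) pairs into one searchable list."""
    return [to_common(record, standard) for standard, record in sources]


def find(inventory, **criteria):
    """Return titles of records that match every given field exactly."""
    return [
        rec["title"]
        for rec in inventory
        if all(rec.get(k) == v for k, v in criteria.items())
    ]


# Invented example records, each described in its own community's terms.
inventory = build_inventory([
    ("fisheries", {"title": "Catch survey", "species_region": "eastern Pacific", "survey_year": 2005}),
    ("satellite", {"product": "Ocean color", "coverage_area": "eastern Pacific", "acq_year": 2005, "orbit": "polar"}),
    ("satellite", {"product": "Cloud cover", "coverage_area": "global", "acq_year": 2005}),
])

matches = find(inventory, region="eastern Pacific", year=2005)
```

A query phrased once in the shared vocabulary returns both the fisheries record and the satellite record, even though each was described with different field names. Real crosswalks would also need to reconcile units, controlled keywords, and spatial and temporal conventions, which this sketch ignores.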
In light of these considerations, some modifications to NOAA's current plans and resource allocations will be needed to address all of the key issues involved in providing the focused user support demanded by NOAA's mission. The principles and guidelines in this report are intended to provide a foundation for the design, implementation, and operation of an integrated, forward-looking data management system that meets or exceeds current user requirements and responds to new requirements as they emerge. The three main functional requirements of such a system (namely, data stewardship, data archiving, and data access) were explored in Chapters 4, 5, and 6, respectively. Chapter 3 provided five overarching principles that address the motivation, financial resources, interagency and international coordination, metadata requirements, and responsiveness to user needs that are necessary for truly effective data management. Finally, in this chapter we described an integrated, adaptable, user-driven "system-of-systems" that is compatible with these
principles and guidelines; that leverages NOAA's existing data archiving and access infrastructure and expertise; that emphasizes reliability, cost-effectiveness, and broad stakeholder involvement in data management decisions; and that maximizes the societal benefit of existing and planned observational capabilities.

Understanding and predicting environmental change will remain critically important activities throughout the 21st century. These activities depend on historical data and reliable projections of the future evolution of the Earth System, both of which rely on accurate descriptions of current environmental conditions. Hence, it is critical to archive and provide access to the environmental data needed to describe, understand, and predict changes in the Earth System and the impacts of these changes on both natural and human systems. NOAA and its partners are clearly dedicated to continue providing excellent service for their various user communities and to continue meeting their legal mandates and requirements. However, these goals can be achieved only if NOAA's data management system enables and promotes the discovery, access, and integration of a wide variety of environmental data by a broad range of users. With such abilities, users are empowered to reap the many societal benefits of environmental data and to advance our understanding of the Earth's environment. NOAA and its partners will, we hope, find the guidance offered in this report useful as they continue to serve as effective stewards of important national assets.