6
Data Discovery, Access, and Integration

PRINCIPLE #8: An effective data archive should provide for discovery, access, and integration.


NOAA’s environmental data should be made easily discoverable and readily accessible to a broad range of users, and its data management system should be designed to allow the integrated exploitation of data from multiple sources within and outside the agency to answer environmental questions and support NOAA’s mission.

The utility of any data archive is ultimately defined by the ability of users to make use of the data it contains. Improving access to the environmental data in NOAA’s archives is, justifiably, the main focus of many current data management activities. Simply making data available upon request, however, is only the first step toward a data management system that promotes extensive utilization of different kinds of environmental data to address a broad range of societal issues. An equally important but often neglected element of effective data management is the ability of the access system to support and promote the discovery of data that might be brought to bear on specific questions or problems, ideally without the user having prior knowledge of the contents of the archive. A third essential functional element of an effective environmental data management system is its ability to facilitate the integration of multiple data sets from different sources to address increasingly inter- and multidisciplinary environmental topics. These three crosscutting functions are the main focus of this chapter, but as a base for those discussions, we first explore the characteristics of NOAA’s data users.

USER CONSIDERATIONS

One of the overarching principles introduced in Chapter 3 is that “data management activities should be user-driven.” The previous two chapters have noted the importance of user input for data stewardship and data archiving decisions. User input is also critical for driving the design, implementation, and maintenance of data access capabilities, particularly the crosscutting capabilities of data discovery and data integration. This section explores some general characteristics of NOAA’s user community that are important to consider when improving the data management system to promote extended or enhanced capabilities for data discovery, data access, and data integration.

Guideline: NOAA’s data access systems need to account for diverse user needs and capabilities.

Effective data access should be tailored to the needs and capabilities of different kinds of users. For example, the technical capabilities and scientific sophistication of NOAA’s user base range from elementary school children looking for temperature data near their house for a school project, to inexperienced users seeking highly specific weather and climate information for legal or business purposes, to experienced modelers looking for a large volume of well-calibrated data to test a physical parameterization, to multidisciplinary scientists seeking to project the combined effect of multiple environmental stresses on a particular ecosystem. Users may also prefer data in a particular format: ASCII data to import into a spreadsheet, JPEG images, or GIS-compatible formats, to name but a few. Access systems should strive to serve as many of these levels of scientific skill, technical sophistication, and packaging preferences as possible.

One approach that has been successfully employed, which NOAA may wish to consider adopting, is to create several different access pathways for different user classifications, starting from the same access portal.1 A good example is the Archiving, Validation and Interpretation of Satellite Oceanographic data (AVISO) Web site.2

1 The term web portal, often used interchangeably with gateway, refers to a World Wide Web site whose purpose is to be a major starting point for users when they connect to the Web (Alexandrou, 2007).
2 http://www.aviso.oceanobs.com/.

Guideline: The needs of the user community should be evaluated on a continuing basis to improve the effectiveness of data access tools.

As noted in Chapters 3 and 4, user surveys and other feedback mechanisms are among the most important tools available for evaluating and improving the data in an archive, as well as for ensuring that important environmental data are preserved. User feedback can also improve many aspects of data access because it provides important insights about the effectiveness of the various discovery, search, and access tools and protocols, as well as how these processes could be improved. Since the needs of the user community are constantly evolving, this feedback needs to be collected on an ongoing basis and incorporated into the ongoing, enterprise-wide data management planning process. Feedback mechanisms should also be embedded within the system implementation to provide an avenue for immediate suggestions and comments on the effectiveness of the data management system. This feedback is invaluable because it allows NOAA to better understand the capabilities, needs, and characteristics of its customers and to design its data access tools and user interfaces to match these requirements.

As with many aspects of data archiving and access, obtaining user feedback is fraught with practical challenges. For example, users do not always know what data or data access tools they would find most useful. Detailed surveys, crafted by sociologists rather than data scientists, would be extremely useful in characterizing NOAA’s current and potential users. However, the Paperwork Reduction Act of 1995 makes it difficult for federal agencies to conduct formal user polls or surveys.3 NOAA will thus need to be creative to ensure that user needs are being met and to develop strategies for improving data access capabilities. For example, even lower-level feedback such as Web-based tools that anonymously capture the activities and search patterns of individual customers would allow NOAA to obtain a better understanding of its customers and their needs. Likewise, usage metrics can be used to capture the net activity of the data user population; NOAA should consider collecting a set of standardized metrics enterprise-wide to help expose data discovery, access, and integration gaps.

3 44 U.S.C. 3501 et seq.

To be most effective, the feedback collected from a broad range of users should be accompanied by more formal, focused feedback from external advisory groups or user panels. These focused groups, which should include scientific experts as well as representatives from other important user groups, can be tasked to take into account the broad user metrics and feedback and provide specific, balanced advice to data
managers. Focused stakeholder panels can also provide advice on data archiving requirements, as discussed in Chapter 5. Additional methods for incorporating user feedback to improve data discovery, access, and integration are described in the sections that follow.

DISCOVERY

Guideline: Environmental data should be easily discoverable by a broad range of users. In particular, data discovery should not require any specific knowledge about the data or how they are managed.

Discoverability is an essential characteristic of a data management system that promotes the full exploitation of archived data. NOAA’s current data access systems tend to be structured mainly to meet the needs of the scientific community. For example, since scientific users tend to be technologically sophisticated, with extensive knowledge of both the different data types that are available and where different data sets can be found, the structure of NOAA’s access systems tends to emphasize efficient access to contiguous or closely related data volumes. The onus is placed on users to find the appropriate portals for access to particular data sets, which can create problems even for experienced, technically proficient users (see Box 6-1).

The majority of NOAA’s current data access systems are not optimized for nonexpert users such as students or lawyers who may need data as simple as the temperature at a specific location or time, or who may need more extensive but still user-friendly data sets, or who may not even know what data they need in order to address a particular question or problem. These nonexpert users almost certainly do not know that NOAA archives Geostationary Operational Environmental Satellite (GOES) and NEXt-generation RADar (NEXRAD) data but not CloudSat data, or what data products are produced from the Advanced Very High Resolution Radiometer (AVHRR) and WindSat sensors.
However, potential users seeking environmental data should not need to know anything about mission boundaries, organizational structures, or observational campaigns in order to locate the data they require for their application.

Since current data access systems at NOAA meet the needs of many present users, they should be continued. However, efforts to link, consolidate, or expand access to potentially related data sets would improve discoverability and thus yield additional societal benefits. A good starting point would be to create a complete, publicly available inventory of NOAA’s data holdings (or, better still, all federal data holdings). The current maze of data access entry points should also be better organized and connected as part of an enterprise-wide effort to enhance discoverability for a broad range of users who only want to keep track of a single portal to obtain environmental data. The success of preliminary efforts to create problem-specific Web portals, such as the National Integrated Drought Information System (NIDIS),4 suggests that further investments in application-specific Web portals would increase the discoverability and utilization of other data that are already in NOAA’s archives. Investments in data discovery are among the highest-yielding investments that could be made to existing data access infrastructure at NOAA.

4 http://wwa.colorado.edu/themes/current_research/water_and_climate_products.html.

BOX 6-1
Data Discovery Can Pose a Challenge Even for Experts

To illustrate the need for improved data discovery in NOAA’s current data management scheme, one of the members of this committee attempted to obtain pan evaporation data for the nearest station to Norman, Oklahoma, using NOAA Web pages. Pan evaporation data would be of interest to a variety of different users, but this measurement is taken at just a small fraction of NOAA’s cooperative observer sites, so this search, while limited in sample size, represents a realistic test of discovery and search capabilities. The committee member, an experienced meteorologist with extensive knowledge of the meteorological observational network and high computer literacy, spent 30 minutes searching for the relevant data through various NOAA Web portals. The committee member found that the steps one needed to take to find this particular data set were not obvious, and he encountered several apparent dead-end links on the National Climatic Data Center’s (NCDC’s) Web site. One promising pathway, “Find a Station,” linked to “NNDC [NOAA National Data Center] Climate Data Online,” which provided an option to choose a city. A number of variables, including pan evaporation, were listed when Norman was selected. However, after choosing pan evaporation and the period 1970–2006, the query came back indicating that no data were available for that location. Moreover, there did not appear to be an option at any time during the process to list sites with specific types of data for specific periods so that a substitute site could be chosen. Attempts to access such data through other portals were also unsuccessful. It should be noted, however, that when climatologists who frequently use the NCDC Web site were consulted for advice, they were able to describe a successful procedure for obtaining the data. While the committee member may have eventually discovered this procedure, the inability to successfully locate the intended data in a reasonable amount of time suggests that significant improvements in data discoverability are possible.
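
The capability that Box 6-1 found missing, listing sites that hold a given type of data for a given period so that a substitute site can be chosen, can be sketched as a simple query over station-level discovery metadata. The following Python sketch is illustrative only: the `StationRecord` schema, the `find_stations` function, and the inventory entries are all hypothetical and do not describe any existing NOAA system or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StationRecord:
    """Discovery metadata for one variable at one station (hypothetical schema)."""
    station: str
    lat: float
    lon: float
    variable: str
    start: date  # first day of the period of record
    end: date    # last day of the period of record


def find_stations(inventory, variable, start, end, lat, lon, max_deg=1.0):
    """Return stations reporting `variable` for the whole period, nearest first."""
    hits = [
        r for r in inventory
        if r.variable == variable
        and r.start <= start and r.end >= end
        and abs(r.lat - lat) <= max_deg and abs(r.lon - lon) <= max_deg
    ]
    # Squared distance in degrees is a crude proximity proxy,
    # adequate only for ranking nearby candidate sites.
    return sorted(hits, key=lambda r: (r.lat - lat) ** 2 + (r.lon - lon) ** 2)


# Invented inventory: station names, coordinates, and periods are placeholders.
INVENTORY = [
    StationRecord("STATION_A", 35.2, -97.4, "air_temperature", date(1950, 1, 1), date(2006, 12, 31)),
    StationRecord("STATION_B", 35.6, -97.9, "pan_evaporation", date(1965, 1, 1), date(2006, 12, 31)),
    StationRecord("STATION_C", 35.3, -97.5, "pan_evaporation", date(1985, 1, 1), date(2006, 12, 31)),
    StationRecord("STATION_D", 34.7, -97.2, "pan_evaporation", date(1960, 1, 1), date(2006, 12, 31)),
]

matches = find_stations(
    INVENTORY, "pan_evaporation",
    date(1970, 1, 1), date(2006, 12, 31),
    lat=35.22, lon=-97.44,
)
names = [r.station for r in matches]  # ['STATION_D', 'STATION_B']
```

In this invented inventory the closest pan evaporation site (STATION_C) does not cover the full 1970–2006 period, so the query returns the two substitute sites that do, ordered by proximity. A query of this kind depends entirely on spatial and temporal metadata being recorded for every holding.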

Guideline: Search tools and other discovery-enhancing features could be improved at many environmental data access points. To promote data discovery, the structure of the data archive should allow searches to be intuitive, geographically tailored, thematic, and integrated across data set types.

The ubiquity of Web-based search engines has made data discovery a less daunting problem than it once was, but simple keyword searches (for instance, Google) still have limitations. Text-based searches limit searching by space and time, and typically they require the person performing the search to have some knowledge of the relevant discipline-specific terminology. NOAA should explore methods to make its data more readily searchable. These methods might include the creation and implementation of more detailed metadata and the development of formal, machine-understandable, contextual knowledge-encoding mechanisms (or ontologies). Because data are typically integrated (and sub-setted) across time and space, it is especially important to have detailed metadata describing the spatial and temporal attributes of the data. Formal thematic vocabularies, taxonomies, and ontologies can facilitate cross-disciplinary data exploration and automated reasoning. This formal semantic approach, coupled with consistently structured data, can aid sophisticated data mining or advanced search techniques.

While making good use of external search mechanisms, NOAA should also strive to improve the visibility and effectiveness of its own data access mechanisms, especially on the World Wide Web. To facilitate broad interdisciplinary discovery, these search and access mechanisms should be interoperable and linked across NOAA, where possible. Formal user-driven interface and software design techniques should be used to improve their effectiveness and intuitiveness. For example, use cases or scenarios (recall the AVISO example noted in the previous section) could be developed to make sure that the tools are designed to meet specific user needs, while formal usability testing can help ensure that they are intuitive and functional. In addition, users may want other information related to a particular data set such as data quality attributes, activity logs, and user feedback. This sort of information can be very valuable to a user assessing the appropriateness of a data set for a specific application, as discussed in additional detail in the next guideline.

Guideline: Data discovery would be improved by the use of expanded metadata.

The importance of metadata, or the additional information about a data set that allows the data to be independently understood and used,
was discussed in several previous chapters. The close relationship between metadata and data stewardship and the importance of metadata standards were discussed in Chapter 4, and the role played by metadata in more effective data archiving was noted in Chapter 5. Well-organized, standardized, and managed metadata benefits both users (through improved discovery, access, and integration) and data managers (by encouraging efficient stewardship and archiving). As discussed earlier in this chapter, metadata can also enable machine-to-machine and application interoperability. An additional, especially important application of metadata is to promote the discovery, access, and integration of different environmental data streams for a variety of users.

The spectrum of useful metadata is large and diverse. Metadata are discipline dependent and invariably expand as a data set matures, so they should always be managed in flexible systems to accommodate their expansion and improvement. Some discovery metadata are most effective if they are internally part of the data file structure (for example, netCDF formatted data), while other metadata typically reside in separate files or in databases associated with the data archive. The best systems tie all data and metadata together so that users can assess their options.

NOAA is progressing toward, and in many cases meeting the requirements for, providing standard metadata to the current recommended national and international levels. Although this is a successful trend and an accomplishment, it should not be viewed as a final end point. Many user benefits could be realized by extending metadata management and provision beyond the minimum requirements. What follows is a nonexhaustive list of metadata that is meant to be illustrative. It begins with some very basic metadata and ends with examples of ancillary metadata that could be useful for placing data in the correct context, fostering efficient discovery, and providing improved user experiences.

• A distinctive data set title
• A general description or summary of the data set
• Definition of parameters, variables, and physical units
• Observation or grid location, period of record, spatial and temporal resolutions, and update frequency
• Description of methods and procedures on how the data were derived, including calibrations, error estimates, processing, etc.
• Relevant information about the data set size, format, and location (which could be at multiple sites)
• The types of data services available (GIS, OPeNDAP, FTP, etc.)
• Reference authority and background information: who created the data, and for what purpose; associated publications and technical reports
• Directives, where possible, about appropriate and inappropriate usage
• Historical lineage about the data set version and what caused the evolution, including algorithm changes and reprocessing rationale
• Connections to related data sets either at higher or lower processing levels, within the same discipline, and across disciplines using relationships (semantic metadata) that can logically map between different terminologies and enable broad automated discovery
• Pointers to useful tools for analyzing the data, such as scripts for reading the data or applications to manipulate and display the data
• Online environments to capture and share user logs

The importance of user log information was also noted in Chapter 4, but some ideas are worth reiterating here because this information could be considered as another form of extended metadata. To the maximum extent possible, NOAA is encouraged to create and maintain open access to user feedback logs and make them part of the metadata associated with each data set. In addition to being valuable to the experts/stewards responsible for data set maintenance, assessment, and improvement, feedback logs would provide valuable information to users about the quality, utility, and appropriate applications for the data. User feedback can also provide information about the current and potential future uses of each data set and the relationship between the data and data sets at other archives. NOAA personnel could also monitor the logs for feedback and error analysis, and could take advantage of user comments and suggestions to improve the user-friendliness of data access protocols. Capturing user feedback in logs and incorporating these logs into standard metadata would thus yield benefits in data stewardship, discovery, access, and integration.

A critical concern for NOAA is assuring that at the end of the data discovery exercise, a user has been made aware of all relevant and helpful data offered by the agency. This is not an easy task. Nevertheless, the use of standardized metadata and deployment of systems that can collect, share, catalog, and publish metadata from the many diverse NOAA centers of data and National Data Centers should make it possible to more effectively support data discovery. A metric that quantifies users’ perceived success in discovering what they need would also be a powerful indicator that, over time, could be used to further improve NOAA’s data access systems.
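
The illustrative metadata list given earlier in this section could be captured as a structured record, which in turn makes it straightforward to report which elements a data set still lacks. The Python sketch below is one possible arrangement under stated assumptions: the `DatasetMetadata` class, its field names, and the `missing_fields` helper are hypothetical and are not drawn from any metadata standard or NOAA system.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetMetadata:
    """One field per item in the chapter's illustrative list (hypothetical names)."""
    title: str = ""
    summary: str = ""
    variables: dict = field(default_factory=dict)        # variable name -> physical unit
    coverage: str = ""               # location, period of record, resolution, update frequency
    methods: str = ""                # derivation, calibration, error estimates, processing
    size_format_location: str = ""
    services: list = field(default_factory=list)         # e.g. ["GIS", "OPeNDAP", "FTP"]
    authority: str = ""              # creator, purpose, associated publications
    usage_guidance: str = ""         # appropriate and inappropriate usage
    lineage: str = ""                # versions, algorithm changes, reprocessing rationale
    related_datasets: list = field(default_factory=list)
    tools: list = field(default_factory=list)             # reader scripts, display applications
    user_log_url: str = ""           # where user feedback logs are captured and shared


def missing_fields(record):
    """List the names of metadata elements that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Invented example record with only the most basic elements filled in.
record = DatasetMetadata(
    title="Example pan evaporation series",
    summary="Daily pan evaporation at a placeholder cooperative observer site.",
    variables={"pan_evaporation": "mm/day"},
    services=["FTP"],
)
gaps = missing_fields(record)
```

Here `gaps` names the nine elements that remain empty, from `coverage` through `user_log_url`. A report of this kind is one simple way a steward might track progress beyond the minimum metadata requirements.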

BARRIERS TO ACCESS

Guideline: Environmental data should be made available to users in a timely manner and should be accessible with as few barriers as possible.

Reducing the barriers on data access and use is essential in advancing scientific and societal goals. Despite the huge volume of data accessed at NOAA every month (see Figure 2-1, for example), millions of other potential users have never requested or gained access to data from a NOAA data archive. Many of these users probably do not even know about the broad range of environmental data that might be available to address their needs, while others may have encountered obstacles while trying to access NOAA data. Enhancing data discovery, providing easier data access, and promoting more extensive usage of archived data are all steps that would significantly increase the realized value of NOAA’s data holdings.

The most challenging and difficult barriers are not always technical in nature; rather, they involve the socioeconomic, legal, institutional, and political dimensions of data access. These barriers individually and collectively limit full utilization of NOAA’s data assets and impede progress on many fronts, including education and research, creation of new knowledge, and commercial and societal applications. In this section, we focus on users who are aware of the data they could potentially use and examine the various barriers that might prevent them from easily accessing the data that interest them. NOAA should address these barriers, which may be further classified into administrative, technical, and systemic barriers, and remove or at least reduce them wherever possible.

Administrative Barriers

Authorization and Security: More sophisticated access control procedures should be implemented using “smart” authorization or authentication methods. For example, in some cases cost-free access to NOAA data is available to academic users for education and research purposes. However, simple approaches, such as restricting access to certain users based on their Internet protocol (IP) addresses, may inadvertently exclude qualified users. In general, any registration or login requirement will discourage data usage. On the other hand, there are some data for which access restrictions are clearly needed, such as the location of rare in situ specimens or data with national security implications. Some access constraints may also be needed to protect data archives from unreasonable data
requests (either intentional or inadvertent) that would result in denial of service to other users. In general, however, our view is that access controls should be kept to a minimum.

Proprietary: Unavoidably, some of the data sets NOAA will want to archive are proprietary in nature, particularly data derived from international and/or commercial sources, and NOAA will not be able to provide full, free, and open access to all of them. Therefore, there should be provisions in the data management system for incorporating such proprietary data, as well as guidelines on the use of such data. In general, these proprietary barriers to access should be kept to a minimum and should exist only where they are absolutely necessary.

International Policy Issues: World Meteorological Organization (WMO) Resolution 40 (“WMO policy and practice for the exchange of meteorological and related data and products including guidelines on relationships in commercial meteorological activities”5) expressly states, as a fundamental principle, that WMO is committed to free and unrestricted exchange of meteorological and related data and products for education and research purposes. However, the resolution also provides guidance restricting reexport of some data for commercial use. Such a policy, while not draconian, does place certain barriers on the broad use of NOAA data holdings.

5 http://www.nws.noaa.gov/im/wmor40.htm.

Full and Open Access: Since the advent of the Internet, full and open access to all types of digital information, including environmental data, has been the subject of considerable discussion among a variety of stakeholders and interests, including educators, researchers, librarians, publishers, sponsoring agencies, commercial enterprises, and government officials. Many previous reports (NRC, 1995c; NRC, 1997; NRC, 1999; NRC, 2001; etc.) have discussed the benefits of full and open exchange of data, and there are already a number of disciplines with large data compilations that are freely and openly available online, such as the Sloan Digital Sky Survey, the National Virtual Observatory, and the National Institutes of Health’s GenBank. NRC (1997) offers the following recommendation regarding the importance of full and open access:

The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research. The public-good interests in the full and open access to and use of scientific data need to be balanced against the legitimate concerns for the protection of national security, individual privacy, and intellectual property.

In the view of this committee, and in accordance with applicable U.S. law and policies,6 full and open access to data should be a fundamental tenet of all U.S. federal agencies, including NOAA. Consistent with other definitions (WMO Resolution 40 and NRC, 1997 are two examples among many), “full and open” should be taken to mean nondiscriminatory, without restrictions, and without additional charges beyond the cost of reproduction and delivery of the data and products themselves. The provision of full and unrestricted (or minimally restricted) access can greatly enhance the utility of NOAA data holdings: it advances not only scientific discovery but also goals in the realms of commerce and policy making, among other benefits to society. Full and open access policies also have the ability to catalyze the development of new methods of analysis and integration, to result in increased collaborations, to reduce inefficiencies, and to accelerate the production of new knowledge.

6 For example, White House Office of Management and Budget (OMB) Circular A-130, as revised December 2000.

Technical Barriers

Digital Divide and Bandwidth Limitations: In the simplest terms, digital divide refers to the gap between those with effective access to information technology and those without such access. This division encompasses both physical access to technology hardware, such as computers and networks, and, more broadly, the skills and resources that allow for its use. The digital divide limits or excludes certain segments of society, especially underrepresented groups, from benefiting from NOAA’s data efforts. For the broadest and most effective use of NOAA data, steps are needed to reduce the gap and to devise solutions for overcoming such hurdles. As a specific example, not all users or potential users have access to high-speed Internet connections. To address this barrier to access, NOAA and its partners could consider approaches like progressive disclosure, in which a small amount of data is initially provided and the user can request progressively more data as their network resources permit. One simple way to achieve this would be with “sub-setting/decimation” routines, which allow users to access and download only the data they need (such as a certain spatial area, time period, and/or resolution). In fact, these routines should be considered for widespread use because they promote more efficient data access, especially for large data sets.

Standardized Protocols and Conventions: The lack of standardized protocols and conventions can be a formidable barrier to the interoperability and integration of diverse data sets from different disciplines. For
0 ENVIRONMENTAL DATA MANAGEMENT AT NOAA example, Internet usage grew explosively when the hyper-text transfer protocol (http) became the foundation of the Web in the 1990s. Similarly, the use of common protocols and conventions for data access and formats can lead to more effective and broader utilization of NOAA’s environmen- tal data. As an example, many teachers in the K-12 community routinely use spreadsheet applications such as Excel, but these same professionals may be less familiar with or reluctant to use the more specialized scientific application tools favored by educators and researchers in higher educa- tion. To facilitate all users, NOAA data systems should provide access via standard protocols recognized by significant user communities. Latency: NOAA needs to recognize that data latency is not just an operational consideration. For example, certain research communities need near real-time access to data in order to initiate the collection of cor- relative data during extreme events (severe weather, volcanic eruptions, earthquakes, etc.). In some cases data acquisition systems have variable collection frequencies that need to be adjusted during such events. In general, environmental data should be made available to users as soon as possible. If there are situations in which data need to be withheld (for homeland security reasons, for instance), there should be a clear rationale for the withholding period, as well as a clear mechanism for qualified users to apply for the ability to obtain the data if needed for a certain time-critical application. Systemic Barriers Competency of Users: Data systems that are not tailored to individual levels of user sophistication can pose a significant barrier to the utilization of data holdings in those systems. 
NOAA should provide multiple levels of entry to data portals, matched to different levels of user sophistication, both in terms of customer knowledge and skills (scientists, K-12 educators, the general public, decision makers, and applied or multidisciplinary users) and in technical capabilities. A multi-portal system allows for efficient access by experts, carefully explained and supervised access for beginners, and, possibly, several levels in between. For example, extensive use of discipline-specific terminology and acronym-laden data set names in data access systems should be avoided in portals designed for a wide audience.

Disability Considerations: In 1998, the U.S. Congress amended the 1973 Rehabilitation Act7 to require federal agencies to make their electronic and information technology accessible to people with disabilities. Inaccessible technology poses barriers for certain individuals to obtain and use information quickly and easily. Section 508 was enacted to eliminate such barriers in information technology, to make available new opportunities for people with disabilities, and to encourage the development of technologies that will help achieve these goals. As a federal agency, NOAA is required to comply with this regulation and to eliminate such barriers to the use of its data information technology systems.

7 29 USC Sec. 793.

Incomplete Metadata: As discussed previously in this chapter and in earlier chapters, full and complete metadata are essential to maximizing data discoverability, accessibility, and usage. Thus, incomplete metadata can be a barrier to complete and effective data access. For example, data sets that lack adequate metadata can be hard to locate, difficult to use, and difficult to integrate with other data sets. This can lead to inefficiencies and, ultimately, unrealized societal benefits. NOAA should reduce such barriers by including the appropriate metadata required for the most effective use of its data holdings.

INTEGRATION

Guideline: NOAA’s data access system should be designed to allow the integrated exploitation of data from multiple sources within and outside of NOAA to answer environmental questions and support NOAA’s mission.

Most environmental problems cannot be addressed using just a single piece or type of information. Thus, in addition to promoting the discovery and access of environmental data, NOAA’s data management system should allow for and facilitate the integration of data and other information. This integration may entail the sequential or simultaneous access of original data from a variety of different platforms, such as satellites, ground sensors, and buoys, or it may be facilitated by Web-based portals that assemble related informational products from different data centers at a single location or specified time.
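At the smallest scale, integrating two sources often comes down to aligning records on a shared key, such as the observation date, before any joint analysis is possible. The following is a minimal sketch of that step, not a description of any NOAA system. The variable names, units, and all data values are hypothetical and were invented for illustration.

```python
from datetime import date
from math import sqrt

# Hypothetical daily records from two separate sources (all values are made up).
precip_mm = {
    date(2007, 4, 1): 0.0,
    date(2007, 4, 2): 12.0,
    date(2007, 4, 3): 30.0,
    date(2007, 4, 4): 2.0,
    date(2007, 4, 5): 0.0,
}
streamflow_cms = {
    date(2007, 4, 2): 5.0,
    date(2007, 4, 3): 9.0,
    date(2007, 4, 4): 14.0,
    date(2007, 4, 5): 8.0,
    date(2007, 4, 6): 6.0,
}


def align_on_date(a, b):
    """Return (date, a_value, b_value) rows for dates present in both records."""
    common = sorted(a.keys() & b.keys())
    return [(d, a[d], b[d]) for d in common]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Keep only the days both sources cover, then compare the two variables.
rows = align_on_date(precip_mm, streamflow_cms)
r = pearson([p for _, p, _ in rows], [q for _, _, q in rows])
```

Even this toy case shows why metadata matter: the join is only meaningful if both sources document their time zone, units, and what a "daily" value represents.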
While simple in concept, data integration, like many other aspects of data access, is notoriously difficult to implement. For example, many of the questions currently being addressed by environmental stakeholders require data that are housed at other agencies and by international groups. Cooperation with these entities is essential for ensuring that user needs are met (yet another manifestation of the overarching principle that “effective interagency and international cooperation is needed”). The ability to integrate data sets, whether to determine new relationships or to create a new, value-added data set, is often critically dependent on the extended metadata discussed previously. Users with multidisciplinary skills can help data managers identify related data sets that could be included at a given data portal. Box 6-2 illustrates the value of integrated data access capabilities, as well as some of the difficulties involved in providing these capabilities.

BOX 6-2
Educational Data Access and Integration

This example raises several important issues about data access and integration. TERC, an education research and development organization, has developed an educational resource called the Earth Exploration Toolbook.a One of the chapters (activities) in the toolbook, “Investigating the Precipitation-Streamflow Relationship,”b shows teachers how to obtain precipitation data from NCDC and stream-flow data from the U.S. Geological Survey, import these data into an Excel spreadsheet, and then create graphs and other analysis tools to investigate how precipitation events impact stream flow. The chapter uses climate data from Boston and stream-flow data from the Sudbury River as an example, but it shows how data from any geographic region could be obtained. What makes this exercise so valuable to students is the ability to explore the data and thus gain insight into stream-flow dynamics, such as the variation in stream flow over the course of the year and the temporal correlation between rainfall and stream flow.

According to TERC staff, educators at all levels find the chapter useful, but teachers and students always want to access data for their town or region (education, like politics, is inherently local). In terms of accessibility, the data should be in a format that can be readily used with the tools that the teachers and students have available. The data also need to have adequate quality control to guarantee a reasonably accurate result, but should not be so highly processed that the teachers and students do not have the ability to manipulate the data. Finally, this example illustrates the societal benefits that can be realized through data discovery, access, and integration; however, the fact that the NCDC and USGS Web sites must be accessed through two different portals also points to the improvements that could be made in interagency coordination for data management.

a http://www.terc.edu/work/147.html.
b http://serc.carleton.edu/eet/module_discharge/index.html.

Guideline: Practical considerations require a distributed data system architecture and access infrastructure for environmental data.

Ideally, all environmentally relevant data would be easily discoverable and readily accessible from a single portal or access point that facilitates the integration of multiple data sources and meets the needs of all users. In reality, the diversity, complexity, and volume of environmental data, coupled with the wide range of user communities and applications of these data, necessitate having a tightly integrated data management strategy that is more broadly distributed. It is thus not surprising that a large number of different data access points have been implemented across NOAA, as well as at other agencies, each catering to a different user base, type of data, level of expertise, or application. For example, the Comprehensive Large Array-data Stewardship System (CLASS) Web interface provides direct access to large-array satellite data, while each of NOAA’s three National Data Centers and many NOAA centers of data have portals designed to provide access to popular subsets of data by a broader user community. The academic community, for instance, sometimes uses the National Geophysical Data Center’s Defense Meteorological Satellite Program data for special studies such as auroral analysis.

As shown, for example, in Figure 2-1, many users avail themselves of NOAA’s current data access systems. Therefore, these systems should be preserved. There are also a number of prototype projects in various stages of development that enhance data discovery, access, and integration for specific applications or user groups (the National Integrated Drought Information System, or NIDIS, is one such project; see Box 6-3). However, all aspects of data access would be improved if there were an enterprise-wide effort to increase the linkages between different portals, to improve and expand the search functions offered at individual portals, and to design new portals that focus on new applications.
The nature and justification for these steps are described in more detail in the next guideline, and the next chapter describes how a system-of-systems approach to data management could provide the distributed access infrastructure demanded by NOAA’s diverse data holdings and user groups.

Guideline: A distributed data access infrastructure can and should support improved data discovery and seamless integration.

In general, all aspects of data access would be dramatically improved if an enterprise-wide effort were made to increase the linkages between different portals and to improve and expand the search functions offered at individual portals. There have also been some important preliminary efforts to consolidate access to a variety of data sources at portals focused on a particular interdisciplinary problem, such as NIDIS (Box 6-3). These problem-specific portals represent a particularly effective way to improve the relevance of NOAA data for a wide variety of environmental problems and decisions and should be considered a ripe area for further system improvements. However, a truly integrated archive requires a systematic approach that begins with the conceptual design of the data management system itself. As discussed in Chapter 7, many of these goals could be achieved using the Global Earth Observation Integrated Data Environment (GEO-IDE) or a similar approach.

BOX 6-3
Integrated Information Resources for Drought Assessment on All Time Scales

When assessing and addressing issues related to drought, it is critical to have a diverse assemblage of information that bridges multiple temporal and spatial scales. Of greatest immediacy for decision makers is an evaluation of recent and current conditions on time scales ranging from daily to seasonal or even interannual. In addition to specific, drought-related indexes such as the U.S. Drought Monitor and U.S. Forest Service Fire Danger Rating, other variables of interest may include precipitation, snow pack water content, surface air temperature, humidity, evapotranspiration, soil moisture, stream flow, groundwater levels, reservoir levels, and foliage or range conditions. Because absolute values and anomalies of all variables are important, all these data may need to be accessible in tabular, graphic map, and narrative formats. Conditions specific to fish and wildlife populations or knowledge of certain climatological states (El Niño-Southern Oscillation, or ENSO, for instance) are also valuable assessment tools. Observational metadata and other ancillary information such as water restrictions, passing flow requirements in regional rivers, water usage statistics, or census figures can also be critical for making effective decisions. It is also important to place present conditions in historic perspective, which might require both instrumental records and paleoclimatic or proxy records to estimate past periods of drought, as well as the relationship between regional precipitation and other indexes.

Once the present situation is well understood, the next step is to attempt to gain some insight as to whether drought conditions might worsen or ameliorate. Thus, forecasts for atmospheric and associated surface conditions need to be accessible to the decision maker, again on daily to seasonal time steps. These forecasts can be variable specific (precipitation) or for an integrated assemblage of information (drought). For those looking well beyond the present (years and decades ahead), model projections would need to be available at appropriate spatial and temporal scales. As with current conditions, outlooks need to be provided to decision makers in formats that permit rather straightforward interpretation. This includes sufficient information regarding probabilities and confidence limits.

One-stop shopping is a useful goal as long as the data and information are discoverable and understandable. With respect to drought, work is under way through the National Integrated Drought Information System (NIDIS) program to develop a Web portal that will provide a link to connect the data, scientists, and decision makers.a It is hoped that this portal, as it develops, will serve as a model for other meteorological and climatological phenomena.

a http://wwa.colorado.edu/resources/nidis/.
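Box 6-3 notes that both absolute values and anomalies matter for drought assessment. As a minimal sketch of that distinction, an anomaly can be expressed as the departure of a current value from the mean of a reference period, or as a percent of that mean. The function names and the precipitation figures below are hypothetical and are not drawn from NIDIS or any NOAA product.

```python
from statistics import mean


def anomaly(value, baseline):
    """Departure of `value` from the mean of a reference-period record."""
    return value - mean(baseline)


def percent_of_normal(value, baseline):
    """Express `value` as a percentage of the reference-period mean."""
    normal = mean(baseline)
    if normal == 0:
        raise ValueError("percent of normal is undefined when the normal is zero")
    return 100.0 * value / normal


# Hypothetical seasonal precipitation totals (mm) for a reference period.
baseline_precip_mm = [80.0, 100.0, 120.0, 90.0, 110.0]
current_precip_mm = 60.0  # hypothetical current-season total

departure_mm = anomaly(current_precip_mm, baseline_precip_mm)
share_of_normal = percent_of_normal(current_precip_mm, baseline_precip_mm)
```

The same absolute total can imply very different conditions depending on the reference period chosen, which is one reason the baseline should be documented in the metadata.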