Toward an Integrated Arctic Observing Network

4
Data Management

A successful, international Arctic Observing Network (AON) must link sensors, observers, data, and users across space and time. The key to accomplishing this goal is data management, and data management—from data collection through distribution to users—will surely be the central challenge for integrating the AON. Previous chapters summarized the abundance and diversity of arctic observing systems and programs, but the infrastructure to integrate results from these resources is lacking. Accommodating a wide variety of users and uses will require building a data management system that is independent of nation, language, background, expertise, and subject matter.

The goal of this chapter is to provide a roadmap for building the AON data management system by discussing strategies for developing the overall system and parts thereof and then making specific recommendations for implementation. Recognizing that the meaning of “data management” varies depending on whether one is a data user, instrument developer, or employee at a national or international data archive, this discussion strives to assess aspects of AON data management from diverse perspectives.

DESIRED CHARACTERISTICS OF THE AON DATA MANAGEMENT SYSTEM

The fundamental purposes of the AON are to characterize the current state of the arctic environment and its variability and to support studies of attribution and prediction of arctic change. To accomplish these tasks, the AON data management system will need the following characteristics. Data from multiple disciplines will need to be made available to all users quickly, easily, and reliably with standardized metadata and supporting documentation. The AON will encourage full and open exchange of data, metadata, value-added products, and even instruments and platforms within the network. Data and networks must therefore be interoperable so that international scientists, engineers, arctic communities, residents, and policy makers can generate and access data in formats and languages that are understandable and useful to them. Data and supporting metadata must also conform to national and international standards for their discipline and, where feasible, be monitored to ensure high quality. The AON should provide access to both raw data and derived products to ensure that the information helps the broadest range of users. Data products derived by users would be incorporated back into the data management system and made available to help guide decisions ranging from international policy choices to what instrument to deploy to where to hunt on a given day.

Time-series analyses are crucial to recognizing and monitoring arctic environmental change; therefore, a fundamental goal of AON data management is that, from the time the system is initiated, all observations and samples are preserved. Where cost-effective, the AON could also try to rescue data. AON data will need to be managed with both short- and long-term needs in mind. In the short term, users need to be able to obtain, interpret, disseminate, and store data. In the long term, data will need to preserve the integrity of scientific disciplines and the knowledge of local people, ensuring that research and assumptions can be verified into the future and provide new insights into previous investigations. Because the spectrum of AON data is broader than the expertise presently available at any one national or world data center, AON data will need to be warehoused in a handful of existing data centers organized by discipline, with discipline-specific standards, protocols, maintenance, and management, then tied together in such a way that users can access all data through one portal (e.g., Box 4.1).
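The arrangement described above, with holdings kept at a handful of discipline-specific data centers and tied together behind one portal, can be pictured as a fan-out search. The following Python sketch is a hypothetical illustration, not a design proposed in this report: the class names, the catalogs that record only a parameter name per dataset, and the example centers are all invented, and a real portal would query remote services using agreed metadata standards rather than in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One search hit, tagged with the data center that holds it."""
    center: str
    dataset_id: str
    parameter: str


class DisciplineCenter:
    """Hypothetical discipline-specific data center with its own catalog."""

    def __init__(self, name: str, catalog: dict[str, str]):
        self.name = name
        self.catalog = catalog  # dataset_id -> parameter measured

    def search(self, parameter: str) -> list[DatasetRecord]:
        # Each center answers queries against its own holdings only.
        return [
            DatasetRecord(self.name, dataset_id, param)
            for dataset_id, param in self.catalog.items()
            if param == parameter
        ]


class Portal:
    """Single entry point that fans one query out to every registered center."""

    def __init__(self, centers: list[DisciplineCenter]):
        self.centers = centers

    def search(self, parameter: str) -> list[DatasetRecord]:
        hits: list[DatasetRecord] = []
        for center in self.centers:
            hits.extend(center.search(parameter))
        # Merge into one deterministic list, independent of query order.
        return sorted(hits, key=lambda r: (r.center, r.dataset_id))


ocean = DisciplineCenter("ocean", {"canyon-ctd": "turbidity", "mooring-7": "salinity"})
land = DisciplineCenter("land", {"river-gauge": "water_level", "delta-obs": "turbidity"})
portal = Portal([ocean, land])
print(portal.search("turbidity"))
```

Because each center answers only for its own holdings, expertise and maintenance stay with the discipline; the portal's job is limited to routing the query and merging the answers.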

DATA MANAGEMENT STRATEGY

To meet AON scientific objectives, the AON data management strategy needs to address acquisition, quality control and data standards, metadata and documentation, access, interoperability, dissemination, archiving, and processing for both extant and pending data. In developing the data management strategy of the AON, it is imperative not to reinvent the wheel. Much has been written about scientific data management (e.g., NRC, 1995; CCSDS, 2002; NSF/LOC, 2003; ICSU, 2004; Hankin and the DMAC Steering Committee, 2005; IPY, 2005; NSB, 2005), and many nations are establishing standards to promote integration and accessibility (e.g., FGDC, 1998; ISO, 2003). A successful AON data management strategy will follow nationally (or internationally) accepted guidelines and tailor data to meet the needs of the arctic user community while remaining flexible enough to allow for unanticipated use of the instruments and data.

Box 4.1
An Imaginary Journey Through the Arctic Observing Network Portal

Suppose one day you read an interesting article speculating on the contribution of processes in submarine canyons to the global carbon cycle and decide to explore arctic datasets. Entering the AON data portal, you first encounter icons for terrestrial, atmospheric, oceanic, and human dimensions that contain a summary of data holdings under each discipline. You then have the option to browse datasets by discipline or by theme. Using data exploration tools, you search for canyon processes and determine what relevant meteorological, geophysical, and oceanic datasets are archived, and their availability in space and time. Although you do not realize it, the information accessed comes from four different data centers in two different countries.

For observations that are interesting but unfamiliar, you find links to descriptions of the instrumentation, the methods, and the data processing steps. You also find links to browse images of the datasets and, after inspecting these, you decide that flow levels of the X River bear closer investigation, as the X River appears to be associated with the Y Canyon, and both the oceanic and terrestrial environments are well instrumented. Plotting the time series using the online data display tools, you observe that three years ago, in June, the gauges reported an abrupt drop in water level after a gradual rise through late spring. The screen also shows an icon that looks like the silhouette of a parka. Curious, you click on the icon, and a text box pops up describing a large ice dam that gave way at about the time of the abrupt water level drop, with a notation from the Inuit hunter who reported the event.

Now you open the relational database interface in the AON portal and frame a query requesting turbidity measurements within 100 km of the mouth of the X River during the timeframe of the ice dam collapse. Within seconds, you have links to data streams and generate another series of plots. These show an increase in turbidity within the Y Canyon two days after the ice dam collapsed. You suspect that you have identified a flow event carrying sediment into the deep Arctic Ocean. Wondering how general these events are, you search other arctic river systems for abrupt drops in tide gauge measurements coupled to local increases in turbidity and find three more candidate events.

It is almost the end of the day, and you download your time-series plots and email them to your colleagues twelve time zones away for their review tomorrow. You save your AON session using the password protection you have installed so that you can access the data again tomorrow without having to redo the data searches. Before wrapping up, you post a request to the event detection service, providing the combined tide gauge and turbidity criteria as the trigger. Finally, you post a request to the observation scheduling list, starting the process to request time on the docked autonomous underwater vehicle near the mouth of the X River, to be triggered on detection of an event. It has been a productive day.

Many different countries and organizations make observations in the Arctic.
Increasingly, the integration of consistent and high-quality international observations requires a mechanism to prepare regulatory and guidance material relating to data collection, data management, and development of data products.

Recommendation: As a first step toward implementing the AON data management strategy, a permanent AON data management committee should be established to provide (i) oversight and coordination of long-term planning for data acquisition, access, distribution, and preservation; (ii) consistency and development of data policies; (iii) oversight of data management system design and engineering; (iv) collaboration with network designers; (v) distribution of integrative and interpretative products to inform national and international policy; (vi) user outreach; and (vii) oversight for the evolution of AON standards.1

Ideally, this group would include advisory members who establish strategies for various components of the data management system as well as members who can implement the strategic recommendations: for example, selecting and disseminating value-added products to inform policy decisions or arctic communities of observed environmental change. The AON data management committee would promote shared infrastructures for AON observations and provide a central portal in a distributed environment for contribution of and access to all the observations that are a part of the AON.

1 In Chapter 6 the Committee collects its ideas about implementation steps for the AON and breaks these ideas into near-term (minimum) actions and longer-term actions for an “ideal” system. Because the topic of the present chapter (4) is one of the Committee’s four Essential Functions, and because it is this “essential function” framework on which implementation recommendations are hung, it is more convenient and effective to place the implementation ideas on data management throughout this chapter than to wait until Chapter 6, where the other essential functions are discussed in detail. Most of the ideas in Chapter 4 are considered necessary near-term actions, but two are mostly for the “ideal” system and are marked accordingly.

The data management committee would initially assess and build upon the success of similar common-purpose data portals (e.g., the Antarctic Master Directory [Leicester et al., 2001]). Organizations that already adhere to regulatory guidelines, such as World Data Centers and other regional and national data centers, could be represented on the AON data management committee. Furthermore, groups working on data management for the International Polar Year could be engaged, or their policies adopted or modified as needed. Creating this committee would have the additional benefits of involving instrument developers, scientists, and indigenous and local people in data management, and data managers in project planning and data collection.

Another critical step toward implementing the AON data management strategy is to decide whether the AON data management system will exist in a distributed or centralized data holding environment. Data holdings for the Arctic are currently highly fragmented in the sense that they are managed by a wide variety of organizations and individuals. Presently, an instrument developer hoping to widely distribute long-term monitoring data acquired by a new sensor must determine appropriate data and metadata standards and assess the ability and expertise of existing data centers to assemble, maintain, disseminate, and preserve the data. A user searching for a particular parameter must first know which organizations are making those measurements, then become familiar with each organization’s data system to search and access data. Searches that encompass multiple data sources are conducted manually and typically involve downloading archives or subsets of archives to conduct additional searches on a local machine. This process is cumbersome and does not always yield the data in a format that is easy to use. Two approaches have been championed to improve the data contribution and discovery processes. The first approach is to develop standards and tools that support distributed searches. The second approach is to centralize data holdings for single-point searches. Given the merits of each (Box 4.2), a combination of these two approaches is best.

Recommendation: The AON data holdings should be stored and maintained at a few discipline- or theme-specific data centers so that diverse arctic datasets can be managed by groups having the appropriate expertise (including arctic communities that desire to maintain their local and traditional knowledge). The AON, through its data management committee, should exercise the necessary authority and shared sense of purpose to link these data centers and identify and remove gaps in the integrated data management system.

The data management strategy will need to consider the users of the data and the level of service the AON is to provide to its stakeholders. The AON will need to provide data in a format that is friendly to all types of users, not just those who acquired the data. This includes determining what level of accessibility is needed, what tools are needed, and what data synthesis and value-added support is required.

Significant new efforts are likely to be needed to incorporate human dimensions and local and traditional knowledge (LTK), which are poorly represented in scientific data archives (Krupnik et al., 2005). One of the primary challenges facing the AON will be developing an approach in which researchers and communities learn to use and manage LTK and link it to other diverse sources of information.

The successful AON data management system will need to evolve as new observing capabilities, new understandings of arctic variability and change, and better awareness of the needs of local and global communities develop.

There are several additional key elements to consider when developing a data management strategy for the AON. To optimize cost-effectiveness, the AON will need to make use of existing metadata centers such as the Global Change Master Directory (GCMD), which contains a large number of arctic dataset descriptions and data repositories (see, e.g., Annex Table 3A.4). Previous scientific data management activities should also guide AON efforts. There will always be unanticipated uses of instruments and the data they collect, and the AON data management system will need to be sufficiently flexible to accommodate new approaches.

Data Acquisition

Linking and invigorating data acquisition is a significant potential benefit from the AON. There are many observing networks and sites in the Arctic, but no coherent organization to produce pan-arctic datasets. In developing a strategy to support AON data acquisition, historic and current data, as well as data that will be collected in the future, must be considered. The challenge of acquiring data is compounded by data ownership, the proprietary nature of some data, and the costs and efforts required to rescue observations that have been discontinued. Incentive strategies for contributing data to a network may be necessary to facilitate data submission. These could include paying arctic residents to maintain an instrument developer’s sensor over extended periods or encouraging funding agencies not to support researchers who do not contribute their data to the AON.

Implementation of the data acquisition strategy will focus on two components: assembly of data that are already in repositories and acquisition of data from sensors not presently supported by existing data centers. Data assembly is actually a significant component of data acquisition; the multidisciplinary, pan-arctic database managed by the AON can be assembled by linking information and samples at existing or developing archives.
The AON will need to review the interoperability of hardware, software, and data management technologies of contributing data centers to demonstrate the ability to locate, retrieve, and work with data across disciplines, countries, languages, etc. This approach would incorporate those institutions that have mature data management practices and minimize the work needed to resolve incompatibilities among data types. For those networks and institutions without mature data infrastructure, the AON will need to provide support, including guidelines for data and metadata production and dataset documentation, and to enforce the established standards.

Box 4.2
Comparison of Centralized and Distributed Data Holding Approaches

When should data holdings be distributed, and when should they be centralized? Any vision for the future of data management must grapple with rapidly changing foundation technologies such as computation, storage, bandwidth, and algorithmic complexity. These changes are, to first order, predictable and provide a framework that shapes data management solutions. For example, Moore’s Law (Moore, 1965), which observed that technological capability (gauged by the complexity of an integrated circuit with respect to minimum component cost) doubles about every 24 months, has held true over four decades. Critical trends for data management are increasing storage capacity (a 100-fold increase in the last decade), growth in data bandwidth (a 10-fold growth in the last decade), and increasing complexity of algorithms. The combination of these trends supports a push toward more centralized facilities (Gray et al., 2005).

The distributed model offers a number of advantages for holding scientific data. For example, a large number of funding agencies support data collection, and data archiving and management tend to stay with the funding agency; many funding sources lack natural mechanisms for supporting central data management; any one organization is unlikely to have all the scientific expertise needed to manage the extremely diverse datasets for the Arctic; quality control of many datasets is an ongoing process and is best done by the experts who acquired the data; many datasets are large, and centralization is impractical; issues of ownership or confidentiality may be involved and are best handled locally; distributed datasets have been shown to be readily searchable (the National Virtual Observatory developed by the astronomical community [NVO, 2005] is one example); having many organizations involved in data management increases the talent pool developing data solutions; and for arctic peoples, the distributed model may be desirable since there is interest in having local knowledge held in arctic countries and communities, as well as an interest in training and jobs for local people in data management.

There are, however, problems with the distributed data model. For example, effective data management requires dedicated staffing and a range of skill sets that many small organizations simply cannot muster. The rapid rate of technological change creates a continuing need to update technical expertise, which makes such dedicated staffing all the more necessary. Furthermore, a distributed search relies on the existence of common standards for metadata and data formats and on compliance of the many participating organizations with those standards. The costs of distributed data management can be high, not only because skill sets will necessarily be duplicated at different organizations, but also because the development and maintenance of the standards and tools to support distributed data search appear to require substantial investment. Distributed data holdings can also be fragile in that data management is often an ancillary activity of a scientific investigation, and thus may be maintained only through the period of funding. Thus, arguments for more centralized data management can be made on administrative as well as technical grounds.

Recommendation: Where data acquisition involves the collection of information from a sensor or network with limited infrastructure and no established ties to national or world data centers, the AON data management system should facilitate data handoff to the most relevant archive.

The AON data management committee’s role of providing a shared sense of purpose will encourage diverse data centers to work together. The data management committee will also need to identify gaps between networks and data centers and recommend approaches to ensure that AON data are managed and preserved. Additional roles for the AON data management committee include tracking treaties and guidelines for handling and preserving national and international data and metadata and encouraging funding agencies to ensure that Principal Investigators (PIs) meet their obligation to archive data and metadata.

Data Quality Control

From a strategic point of view, the first criterion for whether data should be managed by the AON should be their quality. Data of poor quality can hinder scientific analyses and thus should not be included in data archives. Timely quality control of present-day observations by monitoring centers, followed by notification of data collectors regarding errors, will stimulate corrective action. If data errors are not identified and corrected quickly, errors and biases accumulate.

Recommendation: The AON should implement an operational system to track, identify, and notify data collectors of observational irregularities, especially time-dependent biases, as close to real time as possible. This system could also be used to evaluate extant data that are submitted for inclusion in the data archives.2

The AON data management system would be the appropriate mechanism for reporting errors to monitoring centers and following up on these reports. This functionality would be especially useful for data quality problems that do not emerge until different datasets are merged (although internal consistency does not mean that data are necessarily correct; similarly, inconsistency does not mean that data are wrong; discrepancies between LTK and sensor readings, for example, do not render either dataset useless).

Besides setting up a means for tracking and identifying errors, the AON will need to promote data quality at every step in the data pipeline, from instrument development through the observations to the derived products. For example, the error characteristics of derived geospatial variables in many satellite products could be fully characterized and reported in the dataset documentation. In general, tracking data uncertainties and guarding against inappropriate use (see, e.g., Couclelis, 2003; Parsons and Duerr, 2005) will be necessary components of quality control.

Data quality among LTK contributions is an issue that would benefit significantly from early attention. Some quality issues can be addressed if local people are using scientific instruments to make AON-related measurements. However, LTK itself cannot be judged or quality-controlled by scientific standards. An effort to link LTK into the AON data management system with input from local and indigenous partners would address these issues.

Data and Metadata Protocols and Standards and Dataset Documentation

The strategy addressing standardization of data formats and transfer protocols, metadata formats, and supporting dataset documentation is central to the AON data management system. Without metadata standards, it is not possible to make queries across disparate data centers. Standardized data transfer protocols are essential to support data access, dissemination, and analysis. Dataset documentation is an important legacy, allowing information to be correctly understood and processed long after it was acquired. In contrast, data format standards present a moving target. Data formats vary from discipline to discipline. Even within a discipline, different measurement protocols may be in use, possibly for the same variable. As new instrumentation and approaches are developed, data formats evolve (SEEDS [Strategic Evolution of Earth Science Enterprise Data System] Formulation Team, 2003). However, inconsistency limits the capacity to observe both short- and long-term changes in the Arctic. For the AON data management strategy to succeed, then, it must be possible to merge and integrate different datasets across disciplines as well as across diverse user communities.

Box 4.3
Open Geospatial Consortium

Open Geospatial Consortium Inc. (OGC) is a nonprofit international industry/user/technical consortium of 298 companies, government agencies, and universities participating in a voluntary consensus process to develop publicly available standards for geospatial and location-based services. OGC members create open and extensible software interfaces for geographic information systems and other mainstream technologies that make complex spatial information and services accessible and useful with all kinds of applications.
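Merging datasets across disciplines usually means translating each source's local field names and units into a shared vocabulary before values can be compared. A minimal Python sketch of that step follows, assuming a hand-written mapping table; the center names, field names, and shared parameter names are hypothetical placeholders, and an operational system would take its vocabulary from an agreed community standard.

```python
from typing import Callable

# Hypothetical per-center mapping: local field name -> (shared parameter
# name, conversion into the shared unit). None of these names come from a
# real data center or standard vocabulary.
FieldRule = tuple[str, Callable[[float], float]]

FIELD_MAPS: dict[str, dict[str, FieldRule]] = {
    "center_a": {
        "airtemp_F": ("air_temperature_degC", lambda v: (v - 32.0) * 5.0 / 9.0),
        "depth_ft": ("depth_m", lambda v: v * 0.3048),
    },
    "center_b": {
        "T_air": ("air_temperature_degC", lambda v: v),
        "z": ("depth_m", lambda v: v),
    },
}


def harmonize(center: str, record: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Translate one record into the shared vocabulary.

    Returns the harmonized values plus any local fields that have no rule,
    so unmapped data are reported rather than silently dropped.
    """
    rules = FIELD_MAPS[center]
    shared: dict[str, float] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        if field in rules:
            name, convert = rules[field]
            shared[name] = convert(value)
        else:
            unmapped.append(field)
    return shared, unmapped


print(harmonize("center_a", {"airtemp_F": 14.0, "station": 3.0}))
print(harmonize("center_b", {"T_air": -10.0}))
```

Both calls yield the same shared parameter, `air_temperature_degC`, with a value of -10.0, so the two records can be compared directly. Reporting unmapped fields, rather than discarding them, makes gaps in the shared vocabulary visible to data managers.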
Data standards will need to focus on a key subset of common parameters whose standardization would most facilitate data interfacing. SEEDS has shown that discipline-specific data format standards are more closely followed than standards imposed by outside forces (SEEDS Formulation Team, 2003). Candidate AON standards would be those set by existing data centers for the various domains (e.g., terrestrial, ocean, atmosphere) that are tasked with developing data standards for their community, together with existing international protocols (e.g., World Meteorological Organization resolutions for hydrometeorological data). The AON will also need to look to experienced data managers to provide advice on archival standards and to the Open Geospatial Consortium (OGC), a common venue for interoperability technical specification development (Box 4.3).

Recommendation: System implementers should adopt standard specifications agreed upon by consensus, with preference given to formal international standards such as those of the International Organization for Standardization.3

2 See previous footnote. This recommendation is for the longer-term “ideal” data management system.

3 See IWGEO (2005) for a parallel recommendation.

Standardization can be as critical for data as it is for data formats. Consider, for example, common references such as time and location. Detailed measurements of ice canopy thickness can be derived by combining satellite altimetry data with upward-looking sonar measurements from an autonomous underwater vehicle. But if the two instruments were not synchronized, or if the clock on the underwater sensor drifted after it was initially synchronized, integrating the two datasets to produce measurements of sea ice thickness can become difficult or even impossible. Similarly, although Global Positioning System (GPS) satellites now provide standardized positional information on land and at the sea surface, GPS positions are unavailable for observations made beneath the surface of the Arctic Ocean.

Recommendation: The AON should establish standardized temporal and spatial reference frames to facilitate arctic research, particularly considering difficulties associated with operating under-ice and underwater sensors.

Implementation of standard formats for date, time, latitude, and longitude has already been useful for comparing and collecting data from international, multidisciplinary projects such as the Surface Heat Budget of the Arctic Ocean (SHEBA) project (Uttal et al., 2002).

Second only to the quality of the data themselves is the quality of the supporting metadata. Metadata should explicitly describe the underlying scientific purpose of each dataset and all preliminary processing applied to it, and should describe and quantify the uncertainties resulting from each processing step. At a minimum, a metadata file needs information about parameter, keywords, source, sensor, location, project, temporal coverage, spatial coverage, uncertainties, processing steps, personnel, data center, distribution and media, Directory Interchange Format information, lineage, and versioning information.
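The clock-drift problem described earlier in this section can be illustrated with a simple correction that maps sensor timestamps onto UTC and writes them in ISO 8601 form. The Python sketch below assumes that the sensor clock agreed with a reference clock when it was synchronized, was compared with that reference once more (for example, at recovery), and drifted linearly in between; real instruments may drift nonlinearly, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def correct_linear_drift(
    t_sensor: datetime,
    t_sync: datetime,
    t_check_sensor: datetime,
    t_check_ref: datetime,
) -> datetime:
    """Map a sensor clock reading onto reference (UTC) time.

    Assumes the sensor clock agreed with the reference at t_sync and drifted
    linearly until a later check, when the sensor read t_check_sensor while
    the reference clock read t_check_ref.
    """
    elapsed_sensor = (t_check_sensor - t_sync).total_seconds()
    elapsed_ref = (t_check_ref - t_sync).total_seconds()
    since_sync = (t_sensor - t_sync).total_seconds()
    # Rescale elapsed sensor time by the ratio of reference to sensor seconds.
    return t_sync + timedelta(seconds=since_sync * elapsed_ref / elapsed_sensor)


def to_iso_utc(t: datetime) -> str:
    """Format a timezone-aware time as an ISO 8601 UTC string."""
    return t.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


utc = timezone.utc
t_sync = datetime(2005, 1, 1, tzinfo=utc)
# At the later check, the sensor clock was 10 s fast relative to the reference.
t_check_sensor = datetime(2005, 1, 11, 0, 0, 10, tzinfo=utc)
t_check_ref = datetime(2005, 1, 11, tzinfo=utc)
reading = datetime(2005, 1, 6, 0, 0, 5, tzinfo=utc)
print(to_iso_utc(correct_linear_drift(reading, t_sync, t_check_sensor, t_check_ref)))
# prints 2005-01-06T00:00:00Z
```

Halfway through the deployment, the sensor has gained half of its total 10-second error, so the corrected reading loses 5 seconds.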
Recommendation: The AON should serve as the central resource linking all arctic metadata regardless of nation, program, institution, or individual contributor. In this capacity, the AON would enforce established metadata standards (e.g., ISO 19115, Geographic Information: Metadata, and the Content Standard for Digital Geospatial Metadata established by the Federal Geographic Data Committee), contribute to the development and acceptance of new standards as necessary, and coordinate and set the requirements for the metadata database. A centralized metadata repository could draw on existing sources (e.g., GCMD) to populate the initial metadata base, identify and gather missing metadata, and provide a mechanism for incorporating future metadata. Having a centralized metadata center for the Arctic would also address some of the concerns that arise when metadata are broadly distributed (Box 4.4). The metadata database will need to be designed to handle new requirements as they are defined and to promote and facilitate standardization.

Box 4.4 Metadata Concerns and How the AON Data Management System Could Help Address Them

Loss of Metadata: Data are often stored in an ASCII or binary format, with the documentation kept separately (e.g., in readme files). This is a potential problem because the metadata may become separated from the data as the data spread through the user community, which could lead to unintended misuse of the data. If one repository hosts all versions of the metadata and tracks the heritage of data and metadata, users should always be able to find information describing the various iterations of data processing and products.

Ownership of Metadata: A recurring problem with widely disseminated datasets is determining who owns the metadata, who can edit which parts of it, and how communication among multiple groups maintaining the same metadata will be handled. The policies of a centralized metadata repository can address these issues for all users.

Development of Metadata for LTK Observations: These observations may not easily conform to metadata standards set for conventional scientific datasets. The AON will need to consider the best ways to provide metadata for LTK, allowing flexibility in the various metadata requirement categories or even creating new categories, since many LTK studies are not replicable. As stated in OCEAN.US (2005): “Although the goal of [the Integrated Ocean Observing System] may be to provide automatic access to data, it may be necessary to implement this in a staged approach, particularly for historic data.” A similar staged approach may be worth considering for LTK.

Recommendation: The AON should be responsible for disseminating procedures and tools to (i) ensure the collection of adequate and appropriate metadata, (ii) ensure the quality of the metadata collected, and (iii) ease the burden on data providers and network and instrument developers in meeting their metadata obligations.4

4   See footnote on second page of this chapter. Items (ii) and (iii) in this recommendation are for the longer-term “ideal” data management system.
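Item (iii) of the recommendation above, easing the metadata burden on data providers, could include tools that fill in some metadata automatically from the observations themselves. As a hypothetical sketch (the function name, the (time, latitude, longitude) input layout, and the ISO 8601 UTC output format are all assumptions, not AON conventions), temporal and spatial coverage can be derived directly from a list of observations:

```python
from datetime import datetime, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"  # assumed output convention


def derive_coverage(observations):
    """Derive temporal and spatial coverage from (time, lat, lon) tuples.

    Times must be timezone-aware datetimes; they are converted to UTC.
    Latitudes and longitudes are assumed to be decimal degrees.
    """
    obs = list(observations)
    if not obs:
        raise ValueError("no observations supplied")
    times = []
    for t, _, _ in obs:
        if t.tzinfo is None:
            raise ValueError(f"naive timestamp {t!r}; attach a time zone first")
        times.append(t.astimezone(timezone.utc))
    lats = [lat for _, lat, _ in obs]
    lons = [lon for _, _, lon in obs]
    return {
        "temporal_coverage": {
            "start": min(times).strftime(ISO_UTC),
            "end": max(times).strftime(ISO_UTC),
        },
        # Simple min/max box: wrong for tracks that cross the 180th meridian.
        "spatial_coverage": {
            "south": min(lats), "north": max(lats),
            "west": min(lons), "east": max(lons),
        },
    }


obs = [
    (datetime(2005, 4, 2, 12, 0, tzinfo=timezone.utc), 75.5, -150.0),
    (datetime(2005, 4, 1, 6, 30, tzinfo=timezone.utc), 76.0, -152.5),
]
print(derive_coverage(obs)["temporal_coverage"])
# → {'start': '2005-04-01T06:30:00Z', 'end': '2005-04-02T12:00:00Z'}
```

Rejecting timestamps without a time zone is deliberate: silently guessing a zone would reintroduce exactly the kind of timing ambiguity that a standardized temporal reference frame is meant to remove. The min/max bounding box is a simplification; because arctic tracks often cross the 180th meridian, a real tool would need to handle that case separately.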

Given that data collectors sometimes have difficulty providing metadata with their datasets, it is advantageous for a centralized repository to develop and disseminate software for creating metadata. Because different communities will undoubtedly have different requirements (such as those for LTK observations), distributing tools that produce metadata in a standard framework will help accommodate the different needs of disparate arctic researchers.

In addition to effectively managing metadata, the AON can advocate that all datasets be accompanied by guide documents. Without well-documented data, sensors, and infrastructure, users cannot understand the limitations or special characteristics of the data, instruments, or networks they are using. Supplemental guide documentation may contain a description of the dataset; details of study design and data collection protocols (including a description of the instrument or network, where and how the data were collected, etc.); quality control procedures; any preliminary processing, derivation, extrapolation, or estimation procedures; the use of professional judgment; quirks or peculiarities in the data or a sensor; an assessment of features of the data that would constrain their use for certain purposes; and references and the project Web site. By recommending that all data be accompanied by complete documentation and by providing approaches for creating well-organized documentation (e.g., CCSDS, 2002), the AON data management system would further facilitate arctic research.

Data Access

To develop a strategy for data access, the AON needs to consider who the users of the data are and what they need. User needs vary depending on whether they are scientists, instrument developers, educators, policy makers, indigenous people, or others.
For example, scientists need to know where the data are, how to access them, the level of quality control applied, and the data uncertainties. Instrument developers need to know the requirements and processes for making their datasets available. Educators need tools and products for teaching with data, and access to data and analysis tools at little or no cost. Policy makers need derived products to help them interpret what is happening in the Arctic and make informed decisions. Indigenous peoples need data available in accessible formats (e.g., visual data products) and in their own languages. Some indigenous languages, such as Inuktitut or Gwich’in, require special fonts that will need to be available online in standardized forms.

The meaning of the word “data” differs widely among these users, reflecting their experiences, the context in which they work, and the questions they are trying to answer. Raw measurements may be data to scientists but not to educators or students. Similarly, policy makers may be more interested in model results and predictions than in raw observations.

Recommendation: The AON data management system should provide interactive, direct access to data through a single portal.

The Committee supports a Web services approach5 to portal design because it has minimal impact on the data management choices made by data contributors and is broadly adaptable to existing and new (client) applications. However, when choosing a Web service, it is important to consider that, in general, the more sophisticated the data system, the more it will require from data contributors (e.g., special formats, documentation). The AON will need to weigh these issues in deciding how to manage arctic data and work to ensure that the necessary (ever-changing) technologies are adopted. In the initial phase of data delivery, access would be provided to high-quality data that are already available online.
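The Web services approach described in footnote 5, in which data stay at their parent data center and are merged only when a user requests them, can be sketched in a few lines of Python. Everything here is hypothetical: the example.org endpoint URLs, the shared record layout (a JSON list of objects with "time" and "value" keys), and the offline stand-in for the network, which lets the sketch run without any real service.

```python
import json
from typing import Callable, Dict, List

# Hypothetical services; each is run by, and its data stay at, a parent data center.
ENDPOINTS = {
    "ice_center": "https://ice.example.org/api/observations",
    "ocean_center": "https://ocean.example.org/api/observations",
}


def merge_from_services(fetch: Callable[[str], str],
                        endpoints: Dict[str, str]) -> List[dict]:
    """Query each service and merge the returned records on timestamp.

    `fetch` takes a URL and returns the response body as a JSON string;
    each service is assumed to return a list of {"time": ..., "value": ...}.
    Nothing is copied into a central store: the merge happens per request.
    """
    merged: Dict[str, dict] = {}
    for source, url in endpoints.items():
        for record in json.loads(fetch(url)):
            row = merged.setdefault(record["time"], {"time": record["time"]})
            row[source] = record["value"]
    # ISO 8601 UTC strings of equal length sort chronologically.
    return [merged[t] for t in sorted(merged)]


# Offline stand-in for the network so the sketch runs as-is.
FAKE_RESPONSES = {
    ENDPOINTS["ice_center"]: '[{"time": "2005-04-01T00:00:00Z", "value": 2.1},'
                             ' {"time": "2005-04-02T00:00:00Z", "value": 2.3}]',
    ENDPOINTS["ocean_center"]: '[{"time": "2005-04-01T00:00:00Z", "value": 1.9}]',
}

rows = merge_from_services(FAKE_RESPONSES.__getitem__, ENDPOINTS)
for row in rows:
    print(row)
```

With live services, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Passing the fetcher in as a parameter keeps the merge logic independent of how each data center stores and serves its data, which mirrors the Committee's stated reason for preferring Web services; in practice the harder problem is likely to be agreeing on the shared record layout.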
Eventually, the AON will develop a catalog system that provides access to all data, including datasets that are available only offline, such as physical samples and historic transcripts.

For arctic communities to access the AON, those communities will need the appropriate resources and capabilities. The AON will need to consider the state of computer access in arctic communities as well as what is needed in terms of capacity building, training, and systems for troubleshooting. A useful function for the AON would be to help provide infrastructure for improved data access, encouraging communities where Internet connections remain slow to contribute their observations to the AON framework and to use AON data and products.

5   In this approach, data reside in their parent data center or other archive but, through interactions among two or more Web applications, can be seamlessly merged with data from other locations on the Web, for example to create new graphics or to support an analysis.

Long-term Archival of Data

Long-term archival (LTA) is the central pillar of systems designed to monitor environmental change in the Arctic. The Consultative Committee for Space Data Systems (CCSDS, 2002) defines “long-term” as a period of time long enough for there to be concern about the impacts of changing technologies (including support for new media and data formats) and of a changing user community on the information being held in a repository. From a strategic point of view, this definition implies that LTA needs to be a continuing program for the preservation and integrity of comprehensive data, products, and information. Simple access to the original data is not enough: any LTA system must allow for the future development of new and improved products and for uses of the data that were not originally anticipated. For example, algorithms to derive variables evolve, so raw measurements must be archived for future processing and reprocessing. Within the AON data management system, a mechanism would be needed for reprocessing from the start of a specific data record, or even for reprocessing the entire archive of a specific data type. This necessitates extensive documentation that goes well beyond typical metadata needs. Metadata for archival purposes must at a minimum contain versioning, lineage, and reference information, which are required to support modifications and corrections to archived data and to maintain reference information for them.

The Arctic System Science (ARCSS) Program at the National Science Foundation has found that the rate at which principal investigators (PIs) submit data to the ARCSS Data Coordination Center has been much lower than optimal. Lessons learned from ARCSS demonstrate that PIs need to plan for submitting data to the archive at the proposal and field collection stages. Among a number of desirable traits (Box 4.5), the LTA system within the AON will need to be proactive in encouraging PIs to plan for the submission of their data and in encouraging timely acquisition of data and metadata for LTA.

Derived Data Products

Making the data usable is typically the responsibility of the researcher or engineer who initially collected them. Users of the data, whether the original collector, those conducting assessments or educational outreach, or researchers who merge and manipulate the data with data from other sources, add value to the original data, creating more polished products that are potentially of broader interest and utility than the original elements.

Recommendation: The AON should expand the usefulness of derived data products by being responsible for disseminating value-added data.
Box 4.5 Desirable Traits for a Long-term Archiving System

Ready data discovery and use:
Datasets need to be searchable across the entire time horizon;
Increased attention is needed to recover and access past records (e.g., instrumental and paleoclimate reconstructions, and traditional knowledge records such as transcripts or tapes of deceased elders) to better establish the variability and long-term trends in the arctic environment.

Consistency upheld:
The operations of the component networks need to be monitored on a continuous basis to ensure that standards are being maintained and that observations are being received by the designated AON data management center;
Data need to be preserved and citable;
Data assimilation and reanalysis products need to be archived; for example, reanalysis products might be a primary product for many users and might be needed in “real time”;
Rescued or recovered data need to be transferred into “preservable” formats (e.g., audio tapes to CD-ROMs or digital audio for LTK, paper ice charts to digital, floppy disks to digital), and paper (maps, charts) and films need proper storage;
Products need to be considered that will be useful to local community audiences.

IPR (intellectual property rights) system in place:
This system will address who “owns” data contributed to the AON data management system. For LTK, communities might want to own their intellectual property and have potential researchers contact them for permission to use these data. Can such data still be linked into the AON? The IPR system must be flexible, but once data are deposited, are they public? Proper handling of IPR and “sensitive” data is an open question.

Ability to migrate to new systems:
Electronically stored data need to be continually migrated to newer storage devices and access software;
Qualitative data (e.g., non-numeric and context-specific data) need to be accommodated (this is especially important for archiving LTK such as audio and video interviews, text interview transcripts, artwork, maps, and drawings).

Processing and reprocessing capabilities are available.
Data flow to modeling and analysis is possible through translator software.
Education of data users and providers is ongoing.
Skilled, experienced, and technologically advanced data management staff are present.
A philosophy to embrace proven new technologies.
A philosophy to embrace feedback from users and incorporate it into an evolving system.
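The archival requirements discussed above (archiving raw measurements for reprocessing; versioning, lineage, and reference information; data that are preserved and citable) can be illustrated with a small append-only archive. This is a minimal sketch under stated assumptions, not an AON design: the `Archive` class, the `calibrate` step, the dataset name `buoy-17`, and the use of SHA-256 checksums as fixity references are all hypothetical. Each reprocessing run starts from a stored earlier version (by default version 1, assumed to hold the raw measurements) and records its parent, so any derived version can be traced back to the raw data.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ArchivedVersion:
    dataset_id: str
    version: int
    parent_version: Optional[int]  # None for the raw measurements
    processing: str                # description of the step that produced it
    sha256: str                    # fixity checksum for citation and integrity checks


class Archive:
    """Append-only archive: new versions are added; nothing is overwritten."""

    def __init__(self) -> None:
        self._meta: Dict[Tuple[str, int], ArchivedVersion] = {}
        self._data: Dict[Tuple[str, int], bytes] = {}

    def deposit(self, dataset_id: str, payload: bytes, processing: str,
                parent_version: Optional[int] = None) -> ArchivedVersion:
        if parent_version is not None and (dataset_id, parent_version) not in self._meta:
            raise KeyError(f"{dataset_id} has no version {parent_version}")
        version = 1 + max((v for d, v in self._meta if d == dataset_id), default=0)
        record = ArchivedVersion(dataset_id, version, parent_version, processing,
                                 hashlib.sha256(payload).hexdigest())
        self._meta[(dataset_id, version)] = record
        self._data[(dataset_id, version)] = payload
        return record

    def get(self, dataset_id: str, version: int) -> bytes:
        """Return stored bytes after verifying them against the checksum."""
        payload = self._data[(dataset_id, version)]
        if hashlib.sha256(payload).hexdigest() != self._meta[(dataset_id, version)].sha256:
            raise ValueError(f"checksum mismatch for {dataset_id} v{version}")
        return payload

    def reprocess(self, dataset_id: str, algorithm: Callable[[bytes], bytes],
                  description: str, from_version: int = 1) -> ArchivedVersion:
        """Rerun an algorithm from an earlier version (default: v1, the raw data)."""
        source = self.get(dataset_id, from_version)
        return self.deposit(dataset_id, algorithm(source), description, from_version)

    def lineage(self, dataset_id: str, version: int) -> List[ArchivedVersion]:
        """Walk parent links from `version` back to the raw measurements."""
        chain = []
        current: Optional[int] = version
        while current is not None:
            record = self._meta[(dataset_id, current)]
            chain.append(record)
            current = record.parent_version
        return chain


def calibrate(payload: bytes, gain: float) -> bytes:
    """Hypothetical processing step: scale comma-separated values by a gain."""
    values = [float(x) * gain for x in payload.decode().split(",")]
    return ",".join(f"{v:.2f}" for v in values).encode()


archive = Archive()
archive.deposit("buoy-17", b"1.0,2.0,3.0", "raw measurements")
archive.reprocess("buoy-17", lambda b: calibrate(b, 1.02), "calibration, gain 1.02")
archive.reprocess("buoy-17", lambda b: calibrate(b, 1.05), "recalibration, gain 1.05")
print([r.version for r in archive.lineage("buoy-17", 3)])  # → [3, 1]
```

Adding versions instead of editing them in place is one way to support corrections while keeping earlier, possibly already cited, versions retrievable and verifiable against their checksums.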

There are many instances when users have difficulty accessing large datasets or lack the knowledge to work with such observations. Derived and value-added products can make these data easier to access and understand. In addition, many products are generated by blending data from different sources, such as in situ and satellite observations, or by combining observations from several sensors. For many applications, the greatest benefit is extracted from the various observations through real-time data assimilation and reanalysis systems, which integrate different data into comprehensive and internally consistent descriptions of the state of the Arctic. Rather than requiring every user to repeat these integration efforts, the AON can provide derived products, particularly those useful for educational and policy-making purposes, through its data portal.

SUMMARY

An abundance and diversity of arctic observing systems and programs already exist, but the infrastructure to integrate results from these resources is lacking. Because this infrastructure will need to accommodate a broad spectrum of users, the AON will need a data management system that is independent of nation, language, background, expertise, and scientific interest. This is no small feat, but successfully completing this task is the most significant contribution to creating a truly integrated network.

Recommendation: An AON data management committee must immediately design and implement a data management system, initially built on existing data centers and resources, to support the major functions of the network. This system should be accessible through a single portal that connects data across disciplines and themes and should seamlessly link information from arctic sensors, historical datasets, and researchers and other users across space and time.