4
Data Stewardship

PRINCIPLE #6: Data and metadata require expert stewardship.


Scientific data stewardship, with assigned organizational responsibility, should be applied to all environmental data sets and their associated metadata to ensure that this information is preserved, remains continually accessible, and can be improved as future discoveries build understanding and knowledge.

The importance of data stewardship has been examined at length in many previous reports, including many of those discussed in Chapter 2. NOAA has also recognized the value of providing effective and ongoing stewardship for its environmental data, as evidenced by the explicit inclusion of the word in the name Comprehensive Large-Array [data] Stewardship System (CLASS), in the Global Earth Observation Integrated Data Environment (GEO-IDE) planning documents, in the terms of reference for the Data Access and Archive Requirements (DAAR) Working Group, and in the statement of task for this requested report. While there is general agreement that sustained data preservation and effective data access require careful stewardship, the term tends to be used somewhat differently in different contexts. We propose the following definition, which is adhered to throughout this report:

Data stewardship encompasses all activities that preserve and improve the information content, accessibility, and usability of data and metadata. These activities include maintaining a scalable and reliable infrastructure to support long-term access and preservation, preserving data access and archive integrity during media migration and software evolution, providing effective data support services and tools for users, and enhancing data and metadata by adding information that is established throughout the data life cycle.

Environmental data archives are not static collections, so data stewardship is an ongoing, iterative process involving users, servers, storage systems, software, interfaces, and analysis tools. Since analysis and usage can improve the accuracy and understanding of data and metadata, stewardship activities also include capturing these advances and carrying them forward for future generations. By adding a data management framework that maintains and improves the archive and ensures access and understanding for users, the stewardship function supplements and enhances data collection, spans data archiving and access, enables discovery and integration, and facilitates the realization of societal benefits. Together, these activities support the vision introduced in Global Change Science Requirements for Long-Term Archiving (USGCRP, 1999) of “a continuing program for the preservation and responsive supply of reliable and comprehensive data products and information on the Earth system for use in building new knowledge to guide public policy and business decisions.”

Stewardship should be a formal and emphasized component of NOAA’s data management planning process. In particular, each data set should have a designated data steward; in most cases, teams of skilled professionals with complementary skills are required to enable the level of stewardship needed to support NOAA’s mission.
These teams include information technology, computer, database, and software specialists; disciplinary science experts; curators; project managers; and a variety of other professionals to support the full range of NOAA’s data management activities. Multidisciplinary teams are critical to stewardship, and it has been recommended that data technology specialists receive recognition for this work as “data scientists” (NSB, 2005). Since people retire and organizations evolve, the stewardship plan for each data set also needs to provide for the continuity of stewardship throughout the data life cycle, which can last many decades.

Even though our definition of data stewardship spans a broad range of data management activities and responsibilities, the remainder of this chapter focuses on three fundamental stewardship functions: data and metadata preservation, scientific assessment and improvement, and data support services and tools. In combination, these fundamental stewardship functions ensure that the archive is maintained and that the flow of
information to users supports the realization of societal benefits. Thus, data stewardship provides the foundation for the additional, cross-cutting data management functions of data archiving and data discovery, access, and integration, topics that are touched on here and discussed in detail in Chapters 5 and 6, respectively.

DATA AND METADATA PRESERVATION

Guideline: Metadata that adequately document and describe each archived data set should be created and preserved to ensure the enhancement of knowledge for scientific and societal benefit.

Metadata are all the pieces of information necessary for data to be independently understood by users, to ensure proper stewardship of the data, and to allow for future discovery. This information should include, at a minimum: a thorough description of each data set, including its spatial and temporal resolution; the time and location of each measurement, and how the data were originally collected or produced; and thorough documentation of how the data have been managed and processed, including information about any media and format migrations, the accessibility of the data, and the algorithms or procedures used for any reprocessing, revisions, or error corrections. Collectively, these pieces of information are what make the data in an archive useful. Ideally, metadata should also describe appropriate applications of the data, the relationship between the data and other data both within and outside of the archive, and enough high-level information to allow different types of users to find and understand the data. Adding these additional pieces of information would help support the discovery and integration of data across different archives and disciplines.

In addition to supporting user needs, maintaining accurate and extensive metadata can improve the cost-effectiveness of an archive by providing for more efficient and effective data storage and access.
For example, data accessed less frequently or only by a subset of users may be more cost-effectively archived and made available in a different manner than data that are more widely used. Effective metadata management could thus be expected to help NOAA meet the challenge of increasing data volume and diversity. NOAA’s data management system should also be designed to facilitate the sharing of metadata across systems, disciplines, and programs to build more complete data catalogs. This would improve the ability of disparate users to find and use a wide variety of data. Further, because data systems will evolve to incorporate new information and to take advantage of technological improvements, the data system
philosophy needs to account for the reality that metadata continually evolve, expand, and mature. As discussed in the next guideline, these considerations necessitate the use of existing standards and the adoption of new ones, when appropriate, to facilitate services and integration across data sources and disciplines.

Guideline: The application and expansion of metadata and related standards are essential for good stewardship; NOAA and its partners should continue and expand their usage of standards and reference models.

Where possible and practical, it is advisable to use established metadata standards in order to improve cost-effectiveness and coordination with other activities, to guarantee the integrity and security of the archive, and to facilitate the cross-cutting stewardship functions of discovery, access, and integration. The PREservation Metadata: Implementation Strategies (PREMIS) Working Group recently developed standards for preservation metadata in accordance with the Open Archival Information System (OAIS) Reference Model (PREMIS Working Group, 2005). The OAIS Reference Model provides detailed guidance on data preservation and should serve as a primary reference for archiving activities at NOAA and other agencies. It is encouraging that the NOAA National Data Centers and CLASS have already adopted the OAIS model to guide their archive development. We recommend that NOAA and its partners make the OAIS model even more of a standard across their data management enterprises and routinely assess their practices against community-developed reference models.

NOAA and other agencies should also be encouraged to expand their metadata protocols beyond mandatory requirements. For example, as discussed previously, standard information about each data set could be expanded to include its relationship to other data sets both within and outside of the archive.
In the next section, it is also suggested that user feedback logs be included as part of the metadata for each data set to facilitate data and metadata improvement. In general, any information that could be useful for improving the integrity of the archive, data access capabilities, or data quality should be collected, retained, and utilized. Metadata are also essential for data discovery, access, and integration and are thus critical for facilitating the translation of environmental data to societal benefits; Chapter 6 discusses these benefits of metadata in further detail.
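As an illustration only, the minimum metadata content described above, together with the optional links to related data sets and a user feedback log, could be captured in a record like the following Python sketch. The class name, field names, and completeness check are hypothetical; they do not correspond to any NOAA, PREMIS, or OAIS schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names for the "at a minimum" metadata described in the text.
MINIMUM_FIELDS = (
    "description",
    "spatial_resolution",
    "temporal_resolution",
    "time_location_source",
    "collection_method",
)


@dataclass
class DatasetMetadata:
    """Illustrative metadata record; field names are assumptions, not a standard schema."""

    dataset_id: str
    description: str = ""
    spatial_resolution: str = ""
    temporal_resolution: str = ""
    time_location_source: str = ""  # where each measurement's time and location are recorded
    collection_method: str = ""     # how the data were originally collected or produced
    # Management and processing history: migrations, reprocessing, revisions, corrections.
    processing_history: List[str] = field(default_factory=list)
    # Optional enrichments: links to related data sets and user feedback logs.
    related_datasets: List[str] = field(default_factory=list)
    user_feedback_log: List[str] = field(default_factory=list)

    def missing_minimum_fields(self) -> List[str]:
        """Return the names of minimum fields that are still blank."""
        return [name for name in MINIMUM_FIELDS if not getattr(self, name).strip()]

    def record_processing(self, note: str) -> None:
        """Append a history entry; earlier entries are kept, never overwritten."""
        self.processing_history.append(note)


# Hypothetical usage with made-up values.
record = DatasetMetadata(
    dataset_id="example-gridded-dataset",
    description="Illustrative gridded product",
    spatial_resolution="25 km grid",
    temporal_resolution="daily",
)
record.record_processing("Copied to new storage media; fixity check passed")
record.user_feedback_log.append("User reported an error in the latitude file")
print(record.missing_minimum_fields())  # ['time_location_source', 'collection_method']
```

Reporting which minimum fields are still blank, rather than rejecting the record outright, lets a steward archive a data set promptly and fill in its documentation as knowledge accumulates.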

Guideline: NOAA should develop and maintain a scalable and reliable infrastructure that ensures long-term access and preservation of digital data assets.

Data are often subjected to substantial “environmental” changes over time, such as evolutions in storage and access technologies, changes in stewardship responsibility, relocation and repurposing, and, possibly, disruptions due to system failures, operational errors, and natural hazards and disasters. The digital preservation infrastructure for a long-term environmental data archive should therefore be able to meet the following requirements:

• Each preserved digital object should contain sufficient information to enable the application of long-term preservation policies and to handle its life cycle management (that is, an Archive Information Package, as described in the OAIS Reference Model).
• Efficient management of technological innovations and evolutions in both hardware and software, especially when technology begins to become obsolete.
• Efficient risk management and disaster recovery mechanisms for technology failures or degradation; natural disasters such as fires, floods, and earthquakes; or human-induced operational errors.
• Efficient mechanisms to guarantee the authenticity of content, context, and structure of archived information throughout the preservation period.
• The ability to provide for secure discovery and access, with automatic enforcement of authorization and security policies, throughout the life cycle of each object.
• Scalability in terms of data ingestion rates, storage capacity, processing power, and the speed at which users can discover and retrieve information.

Guideline: NOAA should establish and maintain data and metadata migration plans for all current and future long-term archive systems to adapt to information technology evolution.
All data, including digital information, must be periodically migrated to new storage media to make sure that they remain retrievable for long periods of time. When the data archive is large and growing and data transfer rates are near the upper bounds in storage systems, media migration can be a continual process; that is, the time needed to transfer data from one medium to the next approaches the reliable life expectancy of the legacy media itself. Data stewards need to make certain that no information is lost, that data access remains uncompromised, and that errors may be corrected whenever the data are migrated to new media, translated to another form, or otherwise modified from their original format during the normal course of technology and software evolution. Irreplaceable data should carry so-called fixity information (for instance, digital signatures), which can be tested during any sort of media migration or backup process to verify that the content has not been altered (CCSDS, 2002).

Because it changes the actual digital structure of the data, format migration can be more problematic than media migration. The best archive formats are those where the digital content of each data record can be described in elementary terms (for example, number of bytes, numeric type, character string, pixel, etc.). This is one feature of an open format standard that helps minimize software and computer operating system dependencies that could, in the worst case, render the data inaccessible. So-called proprietary formatted data (non-open format description) should generally not be considered good candidates for long-term archiving unless a plan and a process are in place to translate the data to an open format standard. Metadata should be stored in similarly open formats and should be tightly coupled with and managed in conjunction with the data so both are always readily available to the user.

The top priority of the data steward should always be the security and integrity of the archive. Thus, the archiving procedures of NOAA and its partners should include storage in multiple independent locations, regular tests on fixity information, and disaster recovery exercises that certify system reliability. Systems that track changes made to digital objects, often used in software development, are also applicable across data and metadata management, and they can be used to guard against irrecoverable errors.
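A fixity test during media migration can be sketched in Python as follows. This example assumes the archive is an ordinary directory tree and uses SHA-256 digests from the standard `hashlib` module as the fixity information. The text mentions digital signatures, which would additionally establish who produced each digest; this sketch does not attempt that. The function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_fixity_manifest(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def migrate_with_fixity(source: Path, target: Path) -> Dict[str, str]:
    """Copy an archive tree to new media, then re-test every file against its digest.

    Raises ValueError if any file is missing or altered on the new media.
    """
    manifest = build_fixity_manifest(source)  # fixity information taken before the copy
    shutil.copytree(source, target, dirs_exist_ok=True)  # Python 3.8+
    copied = build_fixity_manifest(target)
    failures = [name for name, digest in manifest.items() if copied.get(name) != digest]
    if failures:
        raise ValueError(f"fixity check failed for: {failures}")
    return manifest


# Hypothetical usage: migrate a one-file "archive" between two temporary directories.
with tempfile.TemporaryDirectory() as tmp:
    old_media = Path(tmp) / "old_media"
    (old_media / "1999").mkdir(parents=True)
    (old_media / "1999" / "observations.dat").write_bytes(b"example observations")
    manifest = migrate_with_fixity(old_media, Path(tmp) / "new_media")
    print(sorted(manifest))  # ['1999/observations.dat']
```

Re-running `build_fixity_manifest` on a schedule and comparing the result with the stored manifest is one simple way to carry out the regular fixity tests recommended in the text.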
NOAA should explore the possibility of expanding its use of change control software, although the long-term viability of the software should also be considered. Finally, data stewards should maintain open lines of communication with their users and their colleagues at other data repositories to ensure the consistency and coordination of archiving procedures across NOAA’s data management enterprise.

SCIENTIFIC ASSESSMENT AND IMPROVEMENT

Guideline: Good stewardship requires systematic, ongoing assessment and improvement of data and metadata.

Properly stewarded data collections almost always improve over time. Throughout the data life cycle, which is illustrated in Figure 4-1, archived information may be enhanced by recalibration, error correction, and/or the addition of supplemental data or metadata derived from scientific research, user feedback, and other knowledge-building processes. For example, calibration and validation studies typically lead to successively higher-quality versions of a given data set via improvements in processing algorithms or error-handling procedures. Of course, these repeated improvements tend to exacerbate the challenge of increasing data archive volumes, particularly with multiple reprocessed data sets. Chapter 5 contains guidelines that can help data managers deal with this issue, noting, for instance, that demand will typically be far greater for the most recent version of each data set and so it may not be necessary to provide immediate access to older versions. It is vital that data stewards be engaged in these decisions.

FIGURE 4-1 The data management life cycle. Data and associated metadata are acquired and integrated into the archive and access system. Ideally, data stewards, in collaboration with users, evaluate the data over time, leading to additional knowledge about the data, including potential improvements that might be made to the data or metadata. If improvements are possible, they should be applied, at which point the archiving, evaluation, and improvement cycle begins again.

As with most aspects of data management discussed in this report, the assessment and improvement function of data stewardship is most effective when it occurs on a regular basis and under a flexible but systematic set of rules and requirements. These rules should be advertised both to users and to data providers, who should in turn be given a chance to provide input to the process. Similarly, these rules should explicitly take into account the estimated costs and likely benefits of improvement efforts.
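One way to act on the versioning point made earlier in this section (every reprocessed version is kept, but immediate access is reserved for the newest) is sketched below in Python. The tier names and the rule that only the latest version of each data set stays online are assumptions for illustration, not a NOAA policy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class DatasetVersion:
    dataset_id: str
    version: int  # higher numbers are later reprocessings
    note: str     # what changed, e.g. recalibration or error correction


def assign_access_tiers(versions: List[DatasetVersion]) -> Dict[DatasetVersion, str]:
    """Hypothetical policy: the newest version of each data set is served online;
    every older version is retained, but only on request from slower storage."""
    newest: Dict[str, int] = {}
    for v in versions:
        newest[v.dataset_id] = max(newest.get(v.dataset_id, v.version), v.version)
    return {
        v: "online" if v.version == newest[v.dataset_id] else "on_request"
        for v in versions
    }


history = [
    DatasetVersion("example-dataset", 1, "original processing"),
    DatasetVersion("example-dataset", 2, "recalibrated"),
    DatasetVersion("example-dataset", 3, "corrected geolocation files"),
]
tiers = assign_access_tiers(history)
for v in history:
    print(v.version, tiers[v])
# 1 on_request
# 2 on_request
# 3 online
```

No version is deleted; the sketch only changes how quickly each one can be retrieved, so the full record of improvements remains available for later assessment.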

BOX 4-1 Understanding Sea Ice

Sea ice data archived and managed by the National Snow and Ice Data Center (NSIDC) provide an example of how data stewardship can lead to increased value and better understanding of environmental data. The primary sea ice products at NSIDC are derived from passive microwave remote sensing instruments on satellites. These sensors were originally used to support operational weather forecasting, but as a result of an ad hoc process of scientific assessment, researchers now use these data for many other applications, including sea ice research and monitoring. In the late 1980s, NSIDC established a program for the stewardship of passive microwave data and other related sea ice products. Here we provide examples of how the three major aspects of stewardship were applied to these data.

To ensure robust preservation of sea ice data, NSIDC established routine off-site backup procedures, as well as an ongoing media migration program, to guarantee data integrity. The center has developed and maintained detailed metadata that document the data and how they have been archived. More recently, data stewards at NSIDC have been exploring a preservation metadata scheme to improve compliance with the OAIS reference model.

The scientific assessment and improvement aspect of NSIDC’s stewardship program has been even more dynamic. The instruments and satellites used to collect passive microwave data constantly evolve, making it difficult to construct consistent time series or climate data records, so NSIDC documentation includes detailed information on the known errors and uncertainties in the data. For example, in 1999 NSIDC was notified by a user of errors in the latitude, longitude, and pixel area files supplied with some of the passive microwave data. These errors would have biased analyses of the data if left uncorrected, but NSIDC was able to correct the error and notify users of the changes. Other scientific assessment and improvement activities have required more sophisticated scientific analyses; for example, NSIDC and other scientists have prepared three special reports devoted to deeper analysis of passive microwave-derived sea ice products (Maslanik et al., 1998; Stroeve et al., 1997; and Stroeve and Smith, 2001). All told, these in-house analyses and community interactions have led to the creation of time series that are consistent and well characterized across sensors and platforms.

Those involved in data support services at NSIDC have also been quite active with sea ice data sets. NSIDC archives several dozen sea ice data sets, one dozen of which are derived from passive microwave remote sensing. This is a confusing array of data, even for a specialist in sea ice research. NSIDC also recognizes that data stewards need to guard against scientifically inappropriate use of their data (Parsons and Duerr, 2005). In total, NSIDC provides hundreds of pages of data documentation along with a Web site that compares and contrasts their sea ice products, including descriptions of the strengths, weaknesses, and appropriate applications of each product.[a] These tools, which were developed in response to feedback from the user community, both informally and through an established user working group, are intended to guide scientific users to the product that most suits their application. NSIDC has also developed specific products geared to nonexpert users, each of which is accompanied by a description of appropriate uses for the data. The best known among these is the Sea Ice Index, which provides images of average monthly ice conditions and trends, along with anomalies that compare recent conditions with the long-term averages (Fetterer and Knowles, 2004). NSIDC also provides higher-level informative products geared specifically to educational users, the media, and the general public.

[a] http://nsidc.org/data/seaice/.

Scientific data stewards should be encouraged to engage both expert and nonexpert users to understand how data are being used and should make sure that the metadata for each data set describe the appropriate uses and limitations of that data set, in addition to formal uncertainties and errors (see Box 4-1). It is also important to realize and communicate (using metadata) that despite continued improvements, some data may never be appropriate for certain applications.

Guideline: Assessment and improvement should be based on expert knowledge and user feedback.

The assessment and improvement of environmental data sets, like other stewardship functions, requires communication among data stewards, data users, and data set experts. Typically, the scientific assessment and improvement function is initiated by data set experts as they use data and metadata and uncover problems such as bias offsets, missing records, or systematic errors in spatial or temporal identification. However, input from nonexpert users can also be extremely helpful. In addition to being valuable to the experts responsible for data set maintenance, assessment, and improvement, feedback logs would provide valuable information to users about the quality, utility, and appropriate applications for the data, as well as helping them find related data sets. Chapter 6 discusses these additional benefits of user feedback and recommends including user feedback logs as a regular component of the metadata associated with
each data set. Formal efforts to engage users in data management activities, such as user panels, can also be employed to improve all the major data stewardship functions. To the maximum extent possible, these user feedback mechanisms should be made a regular part of NOAA’s data management activities.

Guideline: Stewardship plans should be consistent but flexible so that improvements in data and metadata can be captured and included in the data systems.

In order to realize the scientific and societal benefits of improvements in data and metadata over time, the data management system should capture and reflect the information that leads to increased knowledge about each data set and its uses. Data management systems should therefore be designed to support the growth of archived information that occurs during the data life cycle, including metadata that are outside of current standards and requirements. As an example, consider a data steward who comes to recognize that a particular data set has broader relevance than is currently being realized and collaborates with other data stewards to achieve wider data distribution. These stewards might work together to transfer the data from one of NOAA’s centers of data to one of the National Data Centers, either once or on a routine basis if the collection is continuously growing. Following the transfer, the data should be verified for completeness and accuracy; the metadata should be captured and standardized; an optimal organization should be established to serve current and future users; and data set provenance should be documented. Finally, the collection should be offered to users by the best available methods and scaled appropriately for the data set.

In the broader context of data discovery, access, and integration, it will also be necessary to design data management systems that can accept and disseminate a wider array of metadata as it expands over time.
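A minimal Python sketch of a system that accepts metadata beyond current standards: submitted records must contain a small set of core fields, but any additional fields are preserved in a separate extensions section instead of being rejected or silently dropped. The core field names and the function name are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical core fields that a current standard might require.
CORE_FIELDS = ("title", "spatial_resolution", "temporal_resolution")


def ingest_metadata(submitted: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a submitted record into standardized core fields and extension fields.

    Missing core fields are an error. Unrecognized fields are kept, so metadata
    that falls outside current standards is retained for future use.
    """
    missing = [name for name in CORE_FIELDS if name not in submitted]
    if missing:
        raise ValueError(f"missing core fields: {missing}")
    core = {k: v for k, v in submitted.items() if k in CORE_FIELDS}
    extensions = {k: v for k, v in submitted.items() if k not in CORE_FIELDS}
    return core, extensions


core, extensions = ingest_metadata({
    "title": "Example buoy temperature series",
    "spatial_resolution": "single station",
    "temporal_resolution": "hourly",
    "known_issue": "sensor replaced mid-record; see user feedback log",  # non-standard field
})
print(sorted(core))        # ['spatial_resolution', 'temporal_resolution', 'title']
print(sorted(extensions))  # ['known_issue']
```

Keeping extensions alongside the core record means that, if a later standard adopts a field such as `known_issue`, the information is already in the archive and can be promoted rather than reconstructed.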
Many user benefits can be derived from quality metadata, including informative Web interfaces for users and increased interoperability through implementation of standards and programming interfaces. The importance of continued metadata improvement cannot be overemphasized; it is key to enabling integration of data services across NOAA.

DATA SUPPORT SERVICES AND TOOLS

Guideline: Each data set should have an identified expert who understands the data and who is responsible for working with data providers and data users to maintain the archive and make certain that data access systems meet user needs.

Data support services work at the interface between the data archive and the data user, while data support tools are the applications used to analyze and understand the data. Favorable user experiences are strongly linked to how well these stewardship functions are provided. Of particular importance are the personnel who work both individually and collectively to acquire, preserve, improve, and disseminate the data and metadata under their purview. These data stewards are critical for successful and effective service because they serve as the bridge between data collectors/generators and end users. In addition to understanding both the data and their end uses, they can often anticipate user needs and organize the information in a manner that helps meet those needs, can work with technology experts to design optimal archival and access systems, and can understand and help resolve user problems and questions.

Stewardship starts with the investigator or investigators who first collect or obtain the data. For some data sets (for example, time series from a cruise, aircraft, or other in situ measurement), the investigators might collect the data directly, while for others (for example, surface weather observing networks), the investigators might include both volunteer observers and the NOAA staff who routinely monitor the instruments, collect the data from a number of locations, and perform initial quality control. In many cases, one or more of the investigators are already located at a center of data or a National Data Center. For all situations, there should be transfer protocols in place that describe how the data should move from the investigator to an archive. Such protocols ensure mutual understanding and increase the likelihood that a complete set of data will be archived for future use.
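Part of a transfer protocol can be checked mechanically. The Python sketch below assumes the investigator sends a manifest mapping each file name to a checksum and that the archive computes the same kind of checksum on arrival; the receipt structure and names are hypothetical, and the short checksum strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TransferReceipt:
    """Hypothetical receipt an archive could return to the sender after ingest."""

    received: List[str] = field(default_factory=list)
    missing: List[str] = field(default_factory=list)     # listed by sender, not received
    unexpected: List[str] = field(default_factory=list)  # received, not listed by sender
    mismatched: List[str] = field(default_factory=list)  # checksum differs from sender's

    @property
    def complete(self) -> bool:
        return not (self.missing or self.unexpected or self.mismatched)


def issue_receipt(sent: Dict[str, str], arrived: Dict[str, str]) -> TransferReceipt:
    """Compare the sender's manifest (file -> checksum) with what the archive holds."""
    receipt = TransferReceipt()
    for name, checksum in sorted(sent.items()):
        if name not in arrived:
            receipt.missing.append(name)
        elif arrived[name] != checksum:
            receipt.mismatched.append(name)
        else:
            receipt.received.append(name)
    receipt.unexpected = sorted(set(arrived) - set(sent))
    return receipt


sent = {"cruise_01.csv": "a1", "cruise_02.csv": "b2", "cruise_03.csv": "c3"}
arrived = {"cruise_01.csv": "a1", "cruise_02.csv": "ff", "notes.txt": "d4"}
r = issue_receipt(sent, arrived)
print(r.complete, r.missing, r.mismatched, r.unexpected)
# False ['cruise_03.csv'] ['cruise_02.csv'] ['notes.txt']
```

Returning such a receipt to the sender gives both parties a shared, checkable record of what the archive actually holds.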
Depending on the situation, the transfer protocol may range from agreements between individuals who routinely transfer data and certify each other’s actions with a data receipt to formal interagency or international submission agreements. The same principles used to transfer data from investigators into the NOAA archive system should be applied and formalized for the cases where data from centers of data need to be transferred to one of the three NOAA National Data Centers. Since data stewards are the primary contact point for information entering and exiting the archive, they should be familiar with the full spectrum of capabilities and standards of the archive and access system, as well as all legal, mission, and administrative requirements (see Chapter 2).

Well-designed systems will satisfy a majority of users with automatic data access, but there will always be questions from users, so person-to-person data usage consultation remains an important data support
service. An active consultation service has several benefits: it helps detect problems with the archive; it helps data stewards gauge how the user community is evolving; and it improves user satisfaction. By maintaining substantial and ongoing communication with users, stewards can build and maintain more effective access tools and facilitate the discovery and integration of data to meet new user needs. (These topics are covered in additional detail in Chapter 6.)

Data stewards should also work collaboratively with data experts, data providers, and principal investigators to establish agreed-upon processes for maintaining the current flow of information into the archive, identifying and planning for new data streams, and, as discussed in the previous section, assessing and improving the data and metadata already in the archive. For future data-generating activities, it is beneficial to involve data stewards early in the planning process so that data archiving, stewardship, and access can be planned ahead of time and thus more economically. Finally, data stewards need to collaborate with their counterparts at other agencies and international groups to ensure that all important environmental data are archived and made accessible.

Guideline: Decisions to upgrade legacy data support systems should be thoroughly planned, in consultation with users, with particular attention paid to preserving Web-based metadata.

Many of NOAA’s legacy data archiving and access systems are providing stable and reliable services to users. Decisions to upgrade these systems should be thoroughly planned, with input from users, to avoid costly interruptions in service and to make sure that upgrades actually result in demonstrable improvements in access, discovery, and integration. A related issue is that Web-based infrastructure currently holds vast quantities of context information for data sets and projects.
Much of this Web-based context information is not being archived in a manner that permits easy association with the data for purposes of long-term preservation. This issue is complicated by the fact that Web-based information can easily be linked to information elsewhere on the Web, making it difficult to capture a complete set of metadata for long-term preservation. New methods are needed to address this issue.

DATA STEWARDSHIP AT NOAA

Establishing and maintaining effective data stewardship is a crucial but challenging task for any program or agency with data management responsibilities, but this task is especially challenging for NOAA because of the volume and diversity of its data holdings and the number and

OCR for page 41
 DATA STEWARDSHIP diversity of its users. While stewardship is mentioned in virtually all of NOAA’s data management planning documents, there may not yet be an agency-wide appreciation of the difficulty and challenge of data steward- ship. Various guidance documents also tend to use the term stewardship in different and sometimes inconsistent ways, which can lead to confusion in determining the appropriate roles for various components of NOAA’s data management enterprise. For example, the word “stewardship” is part of the CLASS acronym, yet current plans for CLASS indicate that the data centers are expected to provide the stewardship function. While this latter division of responsibilities seems most appropriate due to the close relationship between the data centers and their users, in the absence of a comprehensive agency-wide data management plan, confusion over stewardship responsibilities is likely to persist. Based on our review of current NOAA practices, a synthesis of previ- ous reports, and application of concepts that have gained broad accep- tance in the data management community, we believe the time is ripe for the development of specific guidelines, based on the general guidelines provided in this chapter, to develop a more consistent and integrated stewardship program at NOAA. Such a common understanding, coupled with a goal of being good stewards, can help guide NOAA in making appropriate data archiving and access decisions. Box 4-2 illustrates the potential benefits of an effective data stewardship program. The next two chapters discuss these aspects of data management in further detail, while Chapter 7 describes the overarching data management plan that is needed to bring together all three functional elements of data management.

BOX 4-2
NCEP-NCAR Global Atmospheric Reanalysis: An Example of the Benefits of Data Stewardship

The National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) Reanalysis provides a standardized time series of the evolution of the global atmosphere starting in 1948 (Kalnay et al., 1996). This suite of data products, which was made possible by scientific community foresight and resources from an operational numerical weather prediction center, has proven to be an extremely valuable resource for a broad range of scientific studies, and in many respects it serves as a model for effective stewardship practices.

More than 40 years ago, long before data stewardship became part of the data management vocabulary, Roy Jenne, then at NCAR, and other like-minded data managers and providers at NOAA and NASA began sharing, collecting, documenting, and preserving a wide variety of in situ and satellite observations of atmospheric conditions. These data and metadata preservation activities established long-term collections that ultimately became a major input to the data assimilation and forecast processes used to create the NCEP–NCAR Reanalysis. In fact, it would be fair to say that the reanalysis would not have been possible over the nearly 60-year period of record, or at the fidelity we have come to expect today, without the work of Jenne and his colleagues.

Stewardship is much more than data preservation. A typical data set used in the NCEP-NCAR Reanalysis has gone through several phases of preparation, usage, and improvement, and careful stewardship is required during each of these phases. For example, user studies that analyze the data for scientific content, together with complementary examinations and quality control by data stewards, occasionally uncover fixable problems and systematic errors.
Data stewards are critical during this stage because they work with scientists to fully define the problem and to design processes to fix the data. After the data are improved, the metadata are updated and user access is refreshed with a new version of the data set. In this way, historical data sets improve in quality through a stewardship-supported life cycle (see Figure 4-1). Many data sets used in the NCEP-NCAR Reanalysis have gone through several such iterations, and this process has also benefited subsequent reanalyses, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-40 data set. The importance of data stewardship for the success of these efforts has been documented, and community-based plans have been developed to make further improvements (Schubert et al., 2006).

Stewardship also supports the NCEP-NCAR Reanalysis (and other reanalyses) by providing data security, multiple access points for products, user consulting, and routine time series extension. NCEP does not archive the Reanalysis for public access; rather, it distributes copies to both NCAR and NCDC to ensure that the data are securely preserved. These two institutions also provide data support services and a variety of access methods for obtaining the NCEP-NCAR Reanalysis. NOAA's Earth System Research Laboratory (ESRL) provides additional support, such as transforming some of the most popular data into different formats so that users can exploit the data with their preferred tools (for example, through direct access by remote applications). NCEP's collaboration with these other groups results in extensive and well-supported data availability for thousands of users worldwide.