Finding the Forest in the Trees: The Challenge of Combining Diverse Environmental Data
A Case Study Evaluation Criteria
In order to facilitate fact-finding for the case studies, the committee developed a set of criteria to function as a framework for identifying and analyzing issues involved in interfacing disparate data types. The criteria are intended to assist in learning lessons from past experiences and in developing general principles for future data integration efforts in support of environmental research.
The committee has identified five subject areas to consider in its assessments. These are:
User needs,
Study design,
Data characteristics and quality,
Data management, and
Institutional issues.
The specific criteria in each area derive from basic data management and data integration issues. Since these five areas represent a somewhat arbitrary separation of issues, there is some overlap among the specific criteria proposed for each area. In addition, these criteria and their related questions are intended to encompass as wide a range of issues as possible. Therefore, not all of them are equally relevant to every study.
USER NEEDS
Identifying all the users of data and their various needs is vitally important to the successful development and implementation of any data management plan. Given the interdisciplinary nature of much global change research, and the high cost of developing data sets, it is very likely that the user community will include not only existing study participants, but also additional future users. These future users may want to use the data for novel purposes and to interface them with data types beyond those originally envisioned. This requires defining user needs in the broadest sense possible.
The term "user needs," as used here, refers to needs to find, evaluate, access, transfer, and/or combine data. It also refers to requirements for manipulating, processing, analyzing, or otherwise working with the data. Finally, it refers to the necessity for users to respond to institutional or cultural constraints, motivations, or pressures. Questions to consider include the following.
Identifying Users
Was there a clear definition of users and user groups at the inception of the research project?
Were users at each step of the data path, from initial data collection to final analysis and archiving, clearly defined?
Understanding Users' Requirements
Were the specific requirements of users at each step of the data path clearly defined?
Were future potential users' needs predicted and accommodated?
Were there incompatibilities or conflicts among different user groups?
Were institutional structures and management mechanisms (committees, working groups) established to identify users' needs and resolve conflicts?
Did users feel as if their needs were accommodated? If not, why not?
Technical Aspects
Did the study create specialized algorithms, routines, data management procedures, or database structures to accommodate users' needs? If so, how successful were they?
Did the study, as originally envisioned, require interfacing disparate databases?
Were interfacing requirements and issues understood and allowed for?
STUDY DESIGN
There are many design principles related to the conceptual and statistical validity of scientific research. Here the committee considers only those related to interfacing among different data types. Considering potential data interfacing issues in the original study design usually lessens the problems associated with the integration of data. This section considers strictly technical study design issues. Other design issues related to program structure and management are listed below in the section "Institutional Issues." Technical questions to consider include the following.
Conceptual Framework
Was the study based on an overall conceptual model that described the relationships (both theoretical and functional) among different data types?
Was the conceptual framework pursued to a level of detail that helped identify data interfacing issues?
Was the conceptual framework explicitly multidisciplinary and multimedia?
Methodological Issues
Was the study an interdisciplinary one involving multiple data types?
Were all relevant disciplines and data types identified at the beginning of the study, or were midcourse adjustments required?
Were pilot studies performed to assess potential data integration issues and solutions?
Were data integration issues identified and planned for in the initial phase of the study? If not, at what stage of the study were they considered?
Were methodological differences among study components that created difficulties in later integration identified at the outset of the study?
What changes would the participants make in the study design if they had the opportunity to begin over again?
Data Integration
Did the study design involve using preexisting data? If so, what problems were encountered? Were enough metadata available?
Were there technical differences among disciplines that created data integration problems, e.g., requirements for different spatial scales or levels of detection?
What kind of data integration did the study's data analyses require? Were these based on the study's underlying conceptual model and were they allowed for in the study design?
DATA CHARACTERISTICS AND DATA QUALITY
Issues related to data characteristics and quality will arise from a variety of sources. Some studies will combine both new and historical data. Historical data may contain numerous errors, often lack adequate documentation, may be collected or processed inconsistently from place to place or over time, may lack critical quality control information, and may be stored in incompatible formats. New data may represent a wide variety of data types, as well as spatial and temporal scales. Data volumes are typically large for climate change studies. Quality control is of paramount importance, since errors can occur not only at the time of collection and initial processing, but also at any time the data are accessed and used. Questions to consider include the following.
Data Characteristics
Were data characteristics sufficiently documented in the metadata? If not, how difficult was it to find needed information about the data?
Quality Control
If historical data were used, what quality control problems were encountered and how were they resolved?
Were potentially problematic data characteristics known beforehand or discovered in the data integration process?
How were differences in data quality among data sets handled?
Were data quality procedures considered an integral part of data integration?
How were data verified and validated?
Data Integration
What specific data characteristics created data interfacing problems? What was the source(s) of these problems?
Were there data formatting or quality standards that proved useful?
What lessons were learned that would be applicable to other studies?
DATA MANAGEMENT
Data management refers to the provisions for handling the data at each step of the data path, from initial study design, through data collection, accessing, and analysis, to final reporting and archiving. It refers not only to specific technical procedures, but also to the overall plan for ensuring the original quality of the data and preventing their degradation over time. Data management plans should include organizational plans that specify data management functions and who has responsibility for data quality at each step of the data path. Relevant questions include the following.
Up-Front Planning
Was there an overall data management plan that supported the data integration process?
What provisions were made for data access, retrieval, and manipulation?
Were data management procedures designed to relate directly to technical issues involved in data integration?
Were quality control issues considered in all data management procedures?
Were archival needs considered at the beginning of the study?
Data Management Procedures
Were specific database tools developed to aid the database interfacing process?
Are there readily identifiable authorized versions of the different data sets? If so, how are these maintained?
What provisions were made to make metadata available to users?
Did data management requirements related to database interfacing add to project overhead?
Did data integration directly benefit project participants?
How accessible were the data?
Were there any restrictions on use of the data? If so, what was the source of these restrictions?
Planning for the Future
Did data management procedures and systems explicitly consider future potential needs?
What arrangements were made for archiving the data for future uses?
Where are the data now and are they easily accessible?
Are metadata readily available for future users?
How easy would it be to transfer existing data to different database systems?
What changes would the participants make in the data management plan if they had the opportunity to begin again?
INSTITUTIONAL ISSUES
Institutional issues often have an overriding influence on the success of data integration efforts, yet they can be difficult to identify and resolve. These issues arise, for example, from differences in agency missions and mandates, from funding restrictions, from differences in time horizons and constituencies, and from differences in organizational cultures. Relevant questions include the following.
Participants
Who were the key participants and what were their roles, responsibilities, and authority?
What was the nature of the key participants, e.g., private, governmental?
Were key players or data sources missing from the study?
Did any participants place special conditions on their participation and/or on access to data, e.g., proprietary data?
Organization and Management
What was the project's management structure, especially with regard to database interfacing? Was there a lead entity?
Did the study's organizational structure support or impede database interfacing?
What arrangements were made among the participants with regard to database interfacing? Were these formal or informal?
What was the decision-making process, again especially with regard to database interfacing?
What kinds of arrangements were made for acquiring data from other organizations?
Was adequate funding available and committed for the duration of the study?
Was there a long-term commitment to database updating and other maintenance?
Who can access the data now and are there any restrictions on this?
What agency, if any, was given responsibility for long-term management and maintenance of the data?
Data Integration
Did all participants agree with the need for data integration?
What mechanisms were established for cooperation and data integration? Were any of these novel?
Were potential conflicts and disagreements clearly identified and negotiated at the beginning of the study?
Did agency missions, mandates, and policies restrict participation or otherwise impede database interfacing?
Did existing data management practices impede data integration?