At all levels of domestic and foreign government, within academe, and in the private sector, organizations have struggled with the development of spatial data infrastructures (SDIs). There is no established, validated process for developing an SDI, and past efforts have produced mixed results. However, there are lessons to glean from past efforts that could be applied in developing and refining the SDI for the U.S. Geological Survey (USGS). In reviewing past efforts, the committee noted several relevant experiences that can provide valuable guidance for the USGS. The missions of other organizations may differ from that of the USGS, and those organizations may have unique requirements, but there are common lessons from each that can serve as a roadmap for successful SDI development at the USGS. The examples selected for this chapter have particular relevance to some aspect of the USGS requirements, and some have been notably successful.
The committee chose to look at lessons learned from efforts of several types of organizations to gain the broadest perspective possible. Fourteen organizations were examined in the following five categories: USGS analogues in other countries, multinational organizations, U.S. public and private institutions, large discipline-specific organizations, and spatial data at the USGS (see Box 3.1). For the USGS, planning a unique SDI that serves a variety of scientific domains means that no single SDI example can be translated directly to the USGS. However, the committee’s examination revealed several themes that recurred in different organizations. Other lessons are drawn from single incidents that are directly relevant to some aspect of a USGS SDI.

Examining Spatial Data Infrastructures

USGS Analogues in Other Countries — The British Geological Survey has made cultural adjustments and an impressive budget commitment to managing spatial data. Geoscience Australia is beginning to recognize the high value of scientific collaboration through data-sharing enabled by an SDI. These cases provide lessons at the organizational level and are the closest organizational analogues to the USGS.

Multinational Organizations — The Infrastructure for Spatial Information in the European Community and the Global Earth Observation System of Systems are ambitious multinational efforts at standardization and collaboration with direct relevance to USGS’s role in the National Spatial Data Infrastructure.

U.S. Public and Private Institutions — In the United States, the National Geospatial-Intelligence Agency, the National Aeronautics and Space Administration, the Texas Natural Resources Information System, and NAVTEQ each take different approaches to integrating datasets from multiple sources. Standardization plays a particularly large role, varies among these institutions, and provides a valuable comparison for the USGS.

Large Discipline-specific Organizations — The National Ecological Observatory Network and the Consortium of Universities for the Advancement of Hydrologic Science, Inc. provide lessons from large-scale data integration and access efforts.

Spatial Data at the USGS — The USGS Topographic Mapping Program is the seminal agency-wide commitment to an ambitious spatial data program that established the core value of spatial data at the USGS. Research at the Center of Excellence for Geospatial Information Science is providing much of the technology needed to implement an agency-wide SDI through its work on The National Map. The National Biological Information Infrastructure and National Hydrography Dataset are successful integrations of multiple, dissimilar datasets with direct relevance to spatial dataset integration for the USGS SDI. These programs provide examples of how SDI development has occurred at the USGS.

Geoscience Australia
Geoscience Australia (GA) is the national geoscience research and information agency for Australia. GA was formed in 2001 as a result of a merger of the Australian Geological Survey Organization with the government bodies for topographic-mapping and remote-sensing functions. Like the USGS, GA operates in a federal system, in partnership with the states and territories of Australia. Spatial data are a prime responsibility, and activities focus on providing key information for Australia with an emphasis on onshore and offshore environmental hazards
and natural resources. GA is also responsible for coordinating the implementation of the Australian government’s policy on spatial data access. The following information is synthesized from a questionnaire provided by the GA information-management team supplemented by information drawn from a recent report by the Australian National Audit Office (2010).
In designing and implementing SDIs in GA and in other state and science organizations in Australia, a number of common and important challenges have arisen, ranging from organizational and cultural concerns to policy and financial issues. SDI development has been difficult for highly competitive, inwardly focused organizations and for those that focus narrowly on the final deliverable. Discussions about SDI development were dominated by self-taught experts rather than by the highly trained technical informatics experts who fully understood an SDI and were committed to its successful implementation. Science funding has become increasingly competitive in the last 3 decades; although collaboration on issues such as data-sharing and agreement on standards is critical for the development of an SDI, competition for shrinking funding has made it difficult for scientists to collaborate. In other cases, scientists did not share data because they believed the data were unfit for release and had no timeframe for completing the data-improvement processes. Agreed policies were imperative at the organizational level: properly implemented and articulated policies can be an enabler for SDIs. Finally, spending large amounts of funds in a short period proved unsustainable for the financial health of those efforts.
GA personnel reported that the most important factors for successfully building SDIs were the ones that focused on collaboration to develop and improve data standards (in accordance with international standards) and the ones that focused on making data accessible to the broader community. In developing data standards, once the standards are defined and agreed on, they must be applied consistently.
Another factor that led to the Australian government’s successful SDI design and implementation was a well-developed roadmap that was based on sound scientific and business practices; that encompassed technological, computational, and engineering viewpoints; and that was consistently reviewed and updated as required. A well-written business case articulated the value proposition of an SDI, and the efforts were championed by a leader who was knowledgeable and respected in the community and could clearly articulate the value of an SDI in the organization. College-educated and respected professionals who understood the technology were needed. Incremental SDI implementation was also important; it was more effective to establish progressive goals than a final deadline.
The culture of the organization played a role in successful SDI implementation. The introduction of an SDI was initially disruptive, and realistic expectations were needed as inevitable improvements were made after its introduction. A policy of under-promising but over-delivering is useful in such situations. Support from the executive level can foster commitment and enthusiasm among senior, middle, and junior members of staff. Adequate funding was also important.

Excerpts of Key Findings about Geoscience Australia from the Australian National Audit Office

“Feedback from government agencies and key industry stakeholders confirmed that Geoscience Australia’s work is valued and often essential to their outcomes. Notwithstanding this positive feedback, Geoscience Australia’s website, its key interface with customers, is complex to use and more data and information could be made publicly available. In addition, the management of many product and service projects lacked project plans, risk assessments and key performance indicators.”

“In addition, there is no inventory that documents the purpose, extent and nature of Geoscience Australia’s data and information holdings and physical collections. It is therefore not well positioned to appropriately maintain and store its data holdings or make informed decisions about the accessibility of that data.”

SOURCE: Australian National Audit Office, 2010
A recent Australian National Audit Office report (2010) provides additional lessons for SDI development (see Box 3.2). The findings are pertinent for public-sector organizations, such as the USGS, that are responsible for the custodianship and delivery of public-sector data and information. The key points of the audit report echo findings stated by GA employees. GA’s value lies in its spatial data, but that value has yet to be fully appreciated: GA has not yet developed a clear spatial data plan, cataloged and shared its data, improved communication with partners, or implemented standards. In many ways, the Survey is further along than GA in SDI implementation, but many of the missed opportunities outlined in the GA audit report can also apply to the USGS.
British Geological Survey
The British Geological Survey (BGS) is the national geological survey of the United Kingdom (UK). Unlike the USGS, which covers many disciplines, BGS examines only geoscience. However, both the BGS and the USGS have national responsibility for the acquisition, analysis, management, and delivery of geoscience data in their countries. The BGS budget is roughly £48 million, approximately half of which is funded by the national government (British Geological Survey, 2011).
The UK treasury agency, HM Treasury, conducted an audit of BGS in 1992 that found its data fragmented, questioned their accuracy, and concluded that the existing information systems could not support BGS’s mission of providing geologic data. It also expressed reservations about the value of the unique data holdings as a major competitive strength of the BGS (Griew, 1990). In 1996, an additional external review of BGS found little improvement in data management. In 2000, after 2 years of pilot studies and development of a new strategy that recognized BGS as an information organization, BGS was restructured from a hierarchically managed organization to a matrix-managed organization. An Information Directorate was created, assigned one-third of the BGS budget, and given corporate priority to work on metadata, data standards, data-product development, and delivery.
The investment of one-third of the BGS budget in data and the priority given to data activity have resulted in clear benefits for its partners and data users. The result has been up-to-date, quality-assured, and interoperable national versions of all the primary geoscience datasets and internal and external access to an extensive variety of core and value-added Internet information services.
The organizational and cultural challenges that BGS faced in the 1990s in improving the poor condition of its data and information policy and practice are probably similar to those faced by the USGS. A systematic approach was lacking for setting priorities among research projects according to national needs, and the focus was instead on localized independent research projects. Fieldwork and research were accorded high priority, whereas data management was seen as an inherently tedious and unproductive task and received lower priority. Scientists claimed ownership of data, were protective of their data, and were afraid that others might misuse them. Furthermore, individual approaches to data management meant that data standards, whether technical or semantic, were neither developed nor complied with. There was also a cultural divide between the scientists who gather and use data and the information-system and technology experts who develop and understand how to manage data. Another difficulty was that scientists lacked a proper understanding of the needs of society and their stakeholders and often were unable to engage and communicate with them effectively to establish and realistically meet those needs.
In the decades before 2000, data had not received high priority and had not been highly valued. The involvement of scientists in information management is crucial, but data management typically does not have high priority, and placing that responsibility on scientists in the absence of strong and prescriptive direction has proved unsuccessful. Over the last decade, BGS has come to recognize that its Unique Selling Proposition is “national, long-term, and strategic” and that its core competence consists of both expertise and data. BGS adopted a corporate-based and asset-based approach in developing its data infrastructure, and this benefited both its staff and stakeholders (see Box 3.3). External stakeholders are generally appreciative of the benefits of a professional and corporate approach to information, and their encouraging responses have led to improvements in engagement, services, and internal processes. The cost-recovery model that funds national mapping in the UK has provided a powerful incentive for public-sector organizations and their employees to focus on customer requirements and to produce datasets that are complete and up to date. Organizations that lack this cost-recovery model, such as the USGS, will need to establish another way of incentivizing scientists to communicate with their customers. The cost-recovery model also limits the free availability of data, which is required of U.S. federal agencies. Finally, implementing an effective information strategy is not a one-time action but requires enduring responsibility.

Stakeholder Benefits from a Corporate- and Asset-based Approach to Data Infrastructure
• Reduced staff effort in finding data.
• Reduced duplication of effort (building databases and applications).
• Improved quality of data available to staff and customers.
• Allowed corporate implementation of standards and best practices.
• Facilitated collaboration within BGS.
• Provided the opportunity to integrate data of diverse types and to create innovative products and services.
• Enabled BGS to be more responsive to customer needs.
Infrastructure for Spatial Information in Europe
The Infrastructure for Spatial Information in Europe (INSPIRE) is a directive of the European Commission that establishes an infrastructure for spatial information throughout the European Union (EU). The directive went into effect in May 2007, signaling that the EU had decided that a coherent SDI was essential for environmental policy-making across its national boundaries. INSPIRE is a distributed infrastructure and will be based on existing SDIs operating in the 27 member states of the EU. In June 2010, the Krakow Declaration was approved, recommending that participating governments and organizations (1) maintain the efforts and investments needed to establish INSPIRE, (2) increase international collaboration, and (3) support implementation of SDIs in non-EU countries.
INSPIRE is being implemented in stages; full compliance is required by 2019 (see Table 3.1 for major milestones). It addresses 34 spatial data themes needed for environmental applications (see Box 3.4). Some of the themes are within the purview of the USGS, but many extend well beyond the mission of the USGS. The directive is specific and provides detailed technical implementing rules, which cover metadata, data specifications, network services, data-sharing, service-sharing, and monitoring and reporting. The intent of INSPIRE is to enable the sharing of environmental spatial information and to facilitate better access to data held by public-sector organizations throughout Europe.

Table 3.1 Major Milestones for Implementing INSPIRE
Entry of INSPIRE directive into force
Entry of INSPIRE metadata regulation into force
Entry of provisions of directive into force in all European member states
Adoption of INSPIRE regulation on discovery and view services
Adoption of rules governing access rights of use to spatial datasets and services for Community institutions and bodies
Establishment and running of geoportal at community level by the European Commission
Metadata available for spatial data corresponding to Annexes I and II
Discovery and view services operational
Transformation and download services operational
Newly collected and extensively restructured Annex I spatial datasets available
Metadata available for spatial data corresponding to Annex III
Newly collected and extensively restructured Annex II and III spatial datasets available
Other Annex I spatial datasets available in accordance with Implementation Rules for Annex I
Other Annex II and III spatial datasets available in accordance with rules
INSPIRE Spatial Data Themes

INSPIRE is being implemented in 27 countries that have different languages and cultures, different levels of geographic-information maturity, varied legal systems, and varied approaches to public-sector data access. There are many challenges in introducing an effective SDI, and INSPIRE has defined a number of technical challenges that it has addressed as part of its basic principles, including
• Collection of data only once and their being kept where they can be maintained most effectively.
• Ability to combine seamless spatial information from different sources throughout Europe and share it with many users and in many applications.
• Possibility for information collected at one level or scale to be shared at all levels and scales and to be detailed for thorough investigations and generalized for strategic purposes.
• Availability of geographic information for good governance at all levels that is transparent and readily available.
• Discoverability of the available geographic information and awareness of how the data can be used to meet particular needs.
Reaching an agreement on the scope and design of INSPIRE was a major challenge, and implementing the directive is an even greater one. As of June 2010, Cyprus, Finland, France, Greece, and Luxembourg had failed to enact key INSPIRE components in their national law (EU, 2010). Although INSPIRE is coordinated by the European Commission, it is dependent on the consent and
close involvement of stakeholders and experts in all member states. This international group develops, scrutinizes, and reviews rules and specifications before they enter into law.
INSPIRE is a multinational undertaking, and there are many lessons to glean from its implementation. Three overarching lessons are especially relevant for the USGS to consider. First is the importance of stakeholder collaborations for developing appropriate parameters for an SDI. The INSPIRE directive is a legal instrument that has been transposed into national law in 27 EU member states. To implement an SDI in a complex system in a reasonable timeframe, the EU decided that a legislative approach was necessary. However, INSPIRE still depends on open and transparent stakeholder involvement; it would not be viable without multinational collaboration to define and review specifications and processes.
Second is the importance of having relatively straightforward goals. INSPIRE has been able to distill the purpose of its SDI to making spatial data throughout Europe discoverable, viewable, interoperable, and downloadable, thereby removing barriers to the access and use of data.
Third is the importance of reasonable expectations and timelines given limited resources. The committee believes that despite the simple goals of INSPIRE, the expectations and timeline were too ambitious, and the resources necessary to carry out the goals were underestimated. With diverse stakeholders and data domains, a lesson for the Survey in implementing an SDI is the necessity of simplifying the vision and creating pragmatic objectives.
The Global Earth Observation System of Systems
In 2005, the Group on Earth Observations (GEO) launched efforts to create a Global Earth Observation System of Systems (GEOSS) that would link many different Earth observation systems into a common framework. The framework would support not only science but also decision-making, with applications in a wide array of “societal benefit areas” (SBAs). The nine defined SBAs are disasters, public health, energy, water management, weather, climate, agriculture, ecosystems, and biodiversity (GEO, 2011). With a 10-year implementation timeframe, GEOSS is still being implemented: it is building on a diverse set of contributed components, and a GEOSS common infrastructure is under development. Many challenges remain, but preliminary lessons can be derived from the experience to date, and GEO has made noteworthy progress in SDI development in several ways.
Technical interoperability is a key concern, in that it is difficult to interconnect diverse systems that were developed by different organizations and countries
for different purposes. For a system of systems to function effectively, clear and open interfaces need to be defined between systems, regardless of the specific structures and implementations of the component systems. Thus, a major thrust of GEO’s technical efforts has been developing and agreeing on an open architectural approach that could be implemented through widely accepted and transparent interoperability standards. Several groups that are responsible for standards development and implementation were involved at the outset. Prototypes and testing activities were used to develop consensus on the most appropriate standards and specifications for exchanging data and metadata and for interlinking tools and services. With GEOSS addressing diverse applications and data types, a key challenge continues to be developing and implementing standards and specifications that are interoperable among applications and disciplines while providing the flexibility to tailor outputs and interfaces to specific user needs.
The voluntary nature of GEO meant that organizational and institutional cooperation and participation would be a key challenge for GEOSS implementation (GEO, 2011). The implementation plan includes an explicit expression of GEOSS data-sharing principles, which call for full and open exchange of data, metadata, and products within GEOSS and recognition of relevant international instruments and national policies and legislation. It will have to be determined how to enable more open and flexible use of data by GEOSS users while respecting the rights and concerns of data providers, all in the context of a voluntary international initiative.
Perhaps the most useful lesson learned from GEOSS to date is that a voluntary, intergovernmental framework has the potential to create a functional, global-scale SDI. The voluntary nature of the initiative has encouraged a focus on both short- and long-term incentives for participation, cooperation, and collaboration. In the case of GEOSS, such incentives include
• The expected benefits of shared data and services.
• The need to reduce unnecessary duplication in data collection, processing, analysis, and dissemination.
• The need for cooperative decision-making on regional and global levels on pressing environmental and resource problems.
• The desire to make progress on shared international goals for poverty reduction and sustainable development through better access to vital data.
• The importance of expanding the use of Earth observation and related geospatial data in a variety of SBAs.
Incentives like those are likely to be just as important for the long-term success and sustainability of an SDI as a legal or government mandate.
National Geospatial-Intelligence Agency
The National Geospatial-Intelligence Agency (NGA) relies on imagery and geospatial information to “provide timely, relevant, and accurate geospatial intelligence (GEOINT) in support of national security” (NGA, 2011). NGA was formed in 2003 and is both a combat-support and a national intelligence agency, so it is staffed, funded, and guided by the Department of Defense and the U.S. intelligence community. Drawing on imagery intelligence, mapping, charting, and geodesy, NGA uses GEOINT to form a common operating picture (COP) for military and senior decision-makers. A COP consists of a model of Earth (such as a chart, map, or composite image taken from a variety of sources) overlaid with the locations of friendly forces, enemy positions, roads, power lines, buildings, or geologic features. This multi-layered complementary information is used to build detailed pictures and enables decision-makers to work with the best available data.
The NGA has two continuing challenges relevant to the USGS: (1) developing and maintaining data-sharing partnerships with diverse national and international stakeholders, and (2) working with standards development organizations to develop and adopt standards. The National System of Geospatial Intelligence (NSG) is a broadly defined SDI that supports GEOINT across the many defense, military, and private-sector organizations involved in it. Those partners have geographic information services specialists, imagery intelligence officers, geographers, meteorologists, and others with a GEOINT perspective. The NSG collectively harnesses the skills and energy of those agencies to tackle the highest-priority challenges of the U.S. government. An NSG Senior Management Council meets twice a year to review unified operations, improve information-sharing, re-evaluate methods, and define the most difficult challenges ahead. NGA has an international program that provides unified direction in building and maintaining international partnerships. There is a continuing need to provide information and address the concerns associated with releasing information to allies and coalition partners, and forging relationships with these partners is increasingly important because of the growth of coalition activities, evolving international threats, and the expanding globalization of GEOINT.
The second challenge facing the NGA is the critical role of universally accepted and agreed-on standards. Standardization ensures that NSG system components perform as they should and are integrated in a way that allows GEOINT to be exchanged between them. The National Center for Geospatial Intelligence Standards (NCGIS) is the coordinating organization in the NGA that is responsible for setting and implementing GEOINT standards-management policies for NGA and the NSG community. The NCGIS was established to ensure a coordinated standards-based approach to achieving data and system interoperability, implement collaborative business practices, and act as an advocate for the needs
of the NGA and the NSG community. Through strategic planning and enterprise architecture-based analysis, the NCGIS strives to optimize NGA resources as it implements a comprehensive NSG-wide standards-management policy. The NCGIS sponsors the Geospatial-Intelligence Standards Working Group, an NSG community forum that addresses issues on the latest standards that are critical for achieving the systems interoperability necessary for mission success. An NSG-wide plan for standards and continued involvement of the NSG community are crucial for developing and implementing standards that enable sharing of timely, relevant, and accurate GEOINT.
Several lessons can be taken from the NGA approach. First, it is important to establish a plan that is based on a vision and shared values among all partners involved in spatial data. The USGS could use NGA’s mission-driven approach to providing geospatial information and products to relevant users of USGS information, in the same way that NGA provides GEOINT to its stakeholder communities. Second, it is important to identify common standards for participation. Technology and data standards are key to enabling interoperability of information resources and services throughout a broad community. Third, it is critical to address the business requirements of the community. The purpose of an SDI is to provide an information infrastructure and a service to a community. The NGA explicitly recognized the service role as central to its mission, so priorities had to be set among the information needs of the users.
National Aeronautics and Space Administration
The National Aeronautics and Space Administration (NASA) is the U.S. operations and research organization for space and aeronautics. NASA provides numerous types of Earth-observing data, primarily from space-borne and airborne platforms through its Earth Observing System (EOS). The EOS is a coordinated series of satellites that produce long-term observations of land surfaces, biosphere, solid Earth, atmosphere, and oceans. The National Research Council report Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond (NRC, 2007a) recommended that NASA launch a set of 15 missions in the next decade to continue expanding its space-borne missions.
Data typically flow systematically from space-borne missions, but some mission events result in other data acquisitions in addition to or at the expense of the routine observations. That is analogous to the program-versus-project tensions described in Chapter 2. For example, an active airborne science program results in intermittent, campaign-driven observations that are less continuous and consistent in time and location. However, one advantage of the airborne program is its flexibility to tailor data collections to observed phenomena. Numerous instruments collect the
observations, and this results in a wide variety of data of different measurement types, coverage, and resolution. The diversity of data streams presents a challenge in relation to workflow, coordination, and standardization.
In collecting and analyzing data from different types of observation systems, it is necessary to have an industry data standard to ensure that data are properly integrated and georeferenced. Standardizing data formats increases the user base and makes it easier to integrate and access different types of data. NASA has learned that it is imperative to avoid frequently changing formats (Friedl and Donnellan, 2010), inasmuch as change detection or classification of data can suffer large errors from small co-registration or geo-registration errors (Townshend et al., 1992). Resampling or reprojection of data can also introduce geospatial errors. Similarly, metadata need to be in a standard format and be easily interpretable. In the absence of consistent metadata, researchers run the risk of using the data and derivative products improperly; a single convention for variable names would help to avoid improper use of data.
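As a concrete illustration of the naming problem, a data provider can map ad hoc variable names onto a single agreed convention before publication. The controlled vocabulary, aliases, and field names below are hypothetical, offered only as a sketch of the idea rather than any actual NASA schema:

```python
# A minimal controlled vocabulary of canonical variable names (assumed).
CANONICAL_NAMES = {"surface_temperature", "precipitation_rate", "land_cover_class"}

# Ad hoc aliases seen in incoming datasets, mapped to canonical names (assumed).
ALIASES = {
    "sfc_temp": "surface_temperature",
    "SurfaceTemp": "surface_temperature",
    "precip": "precipitation_rate",
}

def normalize_variables(metadata: dict) -> dict:
    """Return a copy of the metadata with variable names mapped onto the
    canonical convention; raise if a name cannot be resolved."""
    normalized = {}
    for name, attrs in metadata["variables"].items():
        canonical = name if name in CANONICAL_NAMES else ALIASES.get(name)
        if canonical is None:
            raise ValueError(f"unrecognized variable name: {name!r}")
        normalized[canonical] = attrs
    return {**metadata, "variables": normalized}

record = {
    "dataset": "example_granule",
    "variables": {"sfc_temp": {"units": "K"}, "precip": {"units": "mm/h"}},
}
clean = normalize_variables(record)
```

Rejecting unrecognized names outright, rather than passing them through, is what keeps derivative products from silently mixing conventions.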
It is important to archive existing data, so NASA uses several Distributed Active Archive Centers (DAACs). However, the distributed and locally controlled DAACs can make it difficult for users to access and use NASA data. Simple changes, such as an agency-wide single sign-on Web portal, could greatly improve data access. NASA has also learned that a consistent core set of capabilities would be beneficial, such as powerful, flexible, and consistent visualization tools that enable researchers to more fully explore and use data from NASA and other organizations. One such approach that NASA used is the Open Geospatial Consortium visualization and data-delivery services.
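Open Geospatial Consortium services of the kind mentioned above are ordinary HTTP requests, which is part of what makes them easy to standardize across agencies. As an illustrative sketch, a WMS 1.3.0 GetMap request can be assembled as a query string; the endpoint and layer name below are placeholders, not a real NASA service:

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute a real WMS service to use this sketch.
WMS_ENDPOINT = "https://example.org/wms"

def getmap_url(layer, bbox, width=512, height=512, crs="EPSG:4326"):
    """Build an OGC WMS 1.3.0 GetMap request URL for one map layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "CRS": crs,
        # In WMS 1.3.0 with EPSG:4326, axis order is latitude,longitude,
        # so bbox is (minlat, minlon, maxlat, maxlon).
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return WMS_ENDPOINT + "?" + urlencode(params)

# Hypothetical layer covering roughly the extent of Texas.
url = getmap_url("elevation", (25.8, -106.6, 36.5, -93.5))
```

Because every parameter is defined by the specification rather than by the serving agency, the same request logic works against any compliant server.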
Texas Natural Resources Information System
The Texas Natural Resources Information System (TNRIS) was established (as the Texas Water-Oriented Data Bank) by the Texas legislature in 1968; its mission was to provide a “centralized information system incorporating all Texas natural resource data, socioeconomic data related to natural resources, and indexes related to that data that are collected by state agencies or other entities” (TNRIS, 2011). TNRIS evolved into one of the first state-wide clearinghouses for Geographic Information System (GIS) data, with staff trained in the natural, computer, and library sciences to supply data to government, academe, the private sector, and the public. The TNRIS data catalog includes about 1 million frames of aerial photography and more than 50 unique datasets totaling about 50 terabytes, which it distributes through an estimated 10,000 data downloads per month (TNRIS, 2011). Datasets include data on elevation, land cover, geology, soil survey, meteorology, hydrography, mineral resources, energy resources, orthoimagery, Landsat, light detection and ranging (LIDAR), and census. TNRIS has cooperative agreements with many data-collection agencies; for example, it
regularly combines funding with the U.S. Department of Agriculture (USDA) Natural Resources Conservation Service and makes specific recommendations for the soil-survey data.
Making data discoverable and viewable and archiving data are important functions of TNRIS. TNRIS ensures that its data conform to open geospatial data standards, which allow it to provide imagery to Google and Microsoft for their Web mapping systems. TNRIS publishes metadata about its holdings according to standard federal metadata practices, but it also associates additional “tags” for search-engine optimization and discoverability via Web searches and its own Web mapping viewer. TNRIS has periodically received grants to digitize over 1 million air photos from the 1920s to the 1980s, and the scanned images are available to the public. TNRIS also maintains a limited historical map collection and has worked with other agencies to scan historical maps into accessible archives. The success of the TNRIS SDI program lies in the economic impact of making data available at no cost and the fact that TNRIS uses various metrics to track the use of the data provided, including number of downloads and frequently downloaded files. About 1 terabyte of data is downloaded monthly at an original equivalent cost of roughly $1 million. TNRIS is working to increase the granularity of the statistics to track its performance.
One of the primary challenges for TNRIS has been to maintain constant base funding for acquiring new data. To maintain funding of about $1 million per year, TNRIS has had to demonstrate the use and value of the data archives. Therefore, it is constantly seeking ways to improve how it measures data access and use. Data-storage requirements have become an issue as demand for increased frequency, accuracy, and quality of data has continued to drive the need to make more data available. Infrastructure costs are rapidly declining, and new cloud platforms present potentially efficient services for hosting and dissemination of data. Keeping pace with lowered infrastructure costs is important to offset increases in data-storage requirements. A review by the state Council on Competitive Government found that programs typically prohibited sharing of data for 2 years and presented barriers to free access to data. As published data lose currency and as changes occur, there are opportunities for incorporating community stewardship of key data, such as road and hydrography networks. Maintaining authoritative datasets with “crowdsourced” data will require more sophisticated technology and processes to strengthen the quality and accuracy of an SDI.
Some lessons can be learned from TNRIS that range from strategic programmatic and management concerns to technical ones. The alignment of large-scale data repositories with clear priority issues contributes to long-term sustainability of data-acquisition programs. In the case of Texas, current and accurate data have been associated with water-resources management in the state for over 50
years. It has been important for TNRIS to have active partnerships that serve all levels of government and the public: Texas has a long history of working with federal agencies, such as the USGS and USDA. Adaptive technology strategies are key to keeping current on migration to Web-based data services and to lowering long-term costs. Open standards for data and Web platforms are essential, and open access to public-domain data has been important for withstanding cycles in funding and priorities for public geodata. Texas’s statutory authority to designate members of a Texas Geographic Information Council has reinforced a long-standing culture of data-sharing and coordination. Adopting a state-wide data-acquisition contract has allowed greater response by state, tribal, and local governments to identify mutual projects and initiate procurement within weeks rather than months. Dedicated capital funding makes it possible to allocate funds for data purchases and supports clear and meaningful metrics for tracking data priorities and results. Organizational culture is important for developing a strong culture of sharing and value creation, so it is essential to have executive support in many agencies to foster that development. A commitment to a long-term vision of open access to data is also necessary for success.
NAVTEQ
NAVTEQ, formerly a subsidiary of Nokia Corporation and now fully integrated into it, provides highly attributed digital roadmaps with extensive coverage throughout 85 countries. The U.S. database consists of more than 5.5 million miles of roads organized into five function classes with up to 260 attributes per segment. Data are gathered principally by driving the roads with GPS-equipped vehicles to record and verify visual attributes, such as address ranges, median types, and lane markings. When possible, the company also sources data from local authorities, government records, and, increasingly, individuals who submit “map reports” online, indicating a change or error in the database. The company also processes a growing volume of data from contracted “probe” vehicles—position data gathered from vehicles equipped with positioning systems that can substitute for dedicated drives and help to identify changes in the road network. More recently, the company has fielded vehicles with scanning LIDAR and high-resolution video, providing dense point clouds and imagery of all features along the road. These collection vehicles generate a massive amount of data with each drive; each map feature is versioned, and a history of prior versions is maintained. Because of the complexity and volume of the data, a well-defined SDI is essential to NAVTEQ. NAVTEQ licenses data in a wide variety of formats to a wide variety of customers and applications; thus, data-sharing is already the nature of the business.
Worldwide, over 1 million changes or additions are made in the NAVTEQ database daily, presenting a global data quality and consistency challenge that
requires numerous coding tools and in-line checks. Field teams around the world code the data collected during a drive and upload them to the main database. Maintaining data quality over such a widely distributed data-collection system is a substantial challenge. NAVTEQ’s approach is to embed a series of redundant checks into each step of its data production and validation process. An independent quality department maintains all quality processes and conducts a large number of tests during processing and after the database is updated, just before its public release. Another challenge is integrating data from sources outside NAVTEQ. Because data quality and thoroughness vary widely beyond the core 85 countries, NAVTEQ found it necessary to introduce two lower classes of digital roadmaps in addition to the primary, fully navigable map. The two new classes of map include third-party data, which NAVTEQ must translate into the required format and process in a multistage pipeline that ensures basic soundness and validity before inclusion in the production map.
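The shape of such a multistage validation pipeline can be sketched as follows; the field names, checks, and five-class convention here are illustrative assumptions, not NAVTEQ's actual schema. Each record must pass every stage before entering the production map, and failures are diverted for rework.

```python
# Illustrative multistage validation pipeline for third-party road
# segments.  A segment enters the production map only if it passes
# every stage; otherwise it is routed back for rework.

REQUIRED_FIELDS = {"segment_id", "geometry", "function_class"}

def check_completeness(seg):
    """Stage 1: all required attributes are present."""
    return REQUIRED_FIELDS <= seg.keys()

def check_function_class(seg):
    """Stage 2: function class is one of the five valid classes."""
    return seg.get("function_class") in range(1, 6)

def check_geometry(seg):
    """Stage 3: geometry is a polyline of at least two coordinate pairs."""
    pts = seg.get("geometry", [])
    return len(pts) >= 2 and all(len(p) == 2 for p in pts)

PIPELINE = [check_completeness, check_function_class, check_geometry]

def validate(segments):
    accepted, rejected = [], []
    for seg in segments:
        (accepted if all(stage(seg) for stage in PIPELINE) else rejected).append(seg)
    return accepted, rejected

good = {"segment_id": 1, "geometry": [(0.0, 0.0), (0.0, 1.0)], "function_class": 3}
bad = {"segment_id": 2, "geometry": [(0.0, 0.0)], "function_class": 9}
accepted, rejected = validate([good, bad])
print(len(accepted), len(rejected))  # 1 1
```

The key design point is that the checks are ordered and uniform: the same stages apply whether data were collected internally or acquired from a third party.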
During the evolution of its SDI, NAVTEQ learned important lessons about the processes needed to achieve high levels of data quality. A key aspect of the quality program was attacking problems at the source: field teams, which are most familiar with their local areas, catch errors when the data are initially collected, and third-party data are screened at acquisition because some sources had provided poor-quality data that required extensive rework by NAVTEQ. It was also important to qualify all sources extensively and ensure that the same standards were applied to third-party data as to internally generated data. The analogous solution for the USGS would be to invoke standardized quality requirements at its sources, including state- and county-level data inputs.
A second key lesson learned is that the technology of the underlying database structure and tools needs to adapt constantly. NAVTEQ went through a massive reformulation of its database over several years and continues to modify its data structures with the inclusion of 3-dimensional data and imagery. The technical tools and structures used for the database need to be designed for current and known future expansion of requirements. For the USGS, that would mean that database structures will need to be planned beyond the current set to handle anticipated data types, such as multispectral data and an expansion of data layers.
National Ecological Observatory Network
The National Ecological Observatory Network (NEON) is funded by the National Science Foundation (NSF) to build on research findings of the last century and create a set of continental observations over the next 30 years. These observations would allow greater understanding of ecological changes by observing biological, biophysical, and geochemical interactions of various ecosystems
across the nation’s landscapes. The ability to understand ecological dynamics calls for a suite of observations to be conducted in an integrated manner, which few studies have been able to do. NEON was established as a structure to manage the environmental data cycle. That would include the use of sensors in the field and in space, processing and visualizing data, and sharing data and providing tools for collaboration from local to global scales. NEON is designed to answer grand questions in environmental science in a way that has not been possible with the individual investigator-driven efforts of the past. To accomplish that goal, NEON is building a cyberinfrastructure to manage large volumes of spatial data, such as physical geography, human geography, and satellite data.
The complexity of data integration and synthesis is enormous. Much thought has been given to incorporating cyberinfrastructure and geospatial analysis tools needed to analyze ecological and environmental observations and the network of sites. The information framework provides synthetic capacity and provisions for forecasting ecological dynamics at specific sites and across regions. Other agencies have data that are critical for NEON’s mission, so NEON is partnering early with these organizations to share and process data. The USGS is negotiating a memorandum of understanding to establish a relationship with NEON so that cyberinfrastructure framework development will have an interface with the USGS SDI.
Because NEON is still in the planning phase, the lessons learned are limited. However, in constructing a purpose-built cyberinfrastructure from scratch, NEON has had the advantage of customizing an SDI that would meet its specific needs. Scientific data in the environmental sciences have a spatial component, and an SDI can form the backbone for handling scientific data. NEON has recognized the importance of formatting and checking the quality of data at every step as they move through the cyberinfrastructure—an approach to spatial data quality that is similar to that taken by some private companies. It includes a multistage pipeline approach that feeds data through a series of quality checks, including an examination by a scientist before data are published on the Web portal. To accommodate data that do not meet high NEON standards, NEON also categorizes data by level of quality. Another lesson from NEON is the importance of early partnerships to ensure that SDI constructs are integrated with existing SDI infrastructures of partner agencies.
Consortium of Universities for the Advancement of Hydrologic Science
The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) was established in 2001 to further the hydrologic sciences. CUAHSI is supported by NSF and represents 122 U.S.
universities in advancing hydrologic science through programs such as a network of hydrologic observatories, the Community Hydrologic Modeling Platform, a Synthesis Center, and a Hydrologic Information System. The latter develops infrastructure and services to improve access to hydrologic data and is most relevant to a USGS SDI. Hydrologic data can be classified in several categories, including water-observations data, geographic data, climate and weather data, and remotely sensed data. The data are reported as time series measured at fixed geographic locations indexed by latitude and longitude. Major national holdings of time-series information include the USGS National Water Information System, the Environmental Protection Agency (EPA) STORET Data Warehouse, Climate Data Online from the National Climatic Data Center, and snow and soil water observations from the USDA Natural Resources Conservation Service. Additional water-observations data are collected by state and local water agencies and by academic investigators. A complete inventory of water-observations data on a particular subject and geographic region requires accessing and synthesizing the various data sources into a consistent form.
The CUAHSI Hydrologic Information System (HIS) project has designed a new language, called WaterML, for transmitting time series of water-observations data through the Internet, and the USGS now publishes its time series of real-time data and daily data by using WaterML (Zaslavsky et al., 2009). CUAHSI HIS has also designed an Observations Data Model (ODM) for storing time-series data and a Hydro Server for publishing such data in WaterML (Tarboton et al., 2008). A number of universities and NSF-supported research centers use this approach to publish their water-observations data. The Texas Water Development Board is also publishing a database of coastal observations in Texas collected by various state agencies as a series of ODM databases and corresponding WaterML time-series services. The Open Geospatial Consortium and the World Meteorological Organization have together formed a Hydrology Domain Working Group, which is evolving WaterML toward becoming an international standard for transmission of time series of water-observations data through the Internet. CUAHSI HIS has a service-oriented architecture that is organized into multiple data servers for publishing data, a centralized catalog for collecting and publishing metadata to support data discovery, and a desktop client for data retrieval and analysis. What is emerging is a service-oriented architecture for water-observations information.
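The flavor of consuming such a time-series service can be sketched with the standard library. The XML below is a heavily simplified, WaterML-inspired stand-in (real WaterML documents carry namespaces and much richer site, method, and quality metadata); only the pattern of extracting (time, value) pairs is shown.

```python
# Parsing a (much simplified) WaterML-style time series with the
# standard library.  The document, site code, and attribute names are
# illustrative, not the real WaterML schema.
import xml.etree.ElementTree as ET

DOC = """<timeSeries siteCode="08158000" variable="discharge" unit="cfs">
  <value dateTime="2011-06-01T00:00:00">142.0</value>
  <value dateTime="2011-06-01T00:15:00">139.0</value>
  <value dateTime="2011-06-01T00:30:00">137.5</value>
</timeSeries>"""

root = ET.fromstring(DOC)

# Each observation becomes a (timestamp, numeric value) pair.
series = [(v.get("dateTime"), float(v.text)) for v in root.findall("value")]

print(root.get("siteCode"), root.get("variable"), root.get("unit"))
print(series)
```

A client such as Hydro Desktop does essentially this at scale: request a series in a standard encoding, decode it into typed records, and hand the records to analysis tools.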
It has become apparent that a functional hydrologic information system in the United States will require data sources to be labeled with standard terms so that searches of multiple information sources will be consistent. At first, CUAHSI attempted to label the time series indexed at HIS Central with a standard set of concepts, but it found this to be a nearly overwhelming task. Instead, the community agreed that a standard hydrologic ontology is necessary so that this task can be
performed by the data providers, which demonstrates the importance of semantics and ontologies as a necessary component of a dynamic SDI.
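The provider-side tagging approach described above can be sketched as a controlled-vocabulary lookup: each data provider maps its local variable names onto shared concepts, so that a single concept query finds series from every source. The vocabulary, synonyms, and provider names below are hypothetical.

```python
# Sketch of provider-side semantic tagging against a shared controlled
# vocabulary.  Concepts and synonym sets here are invented examples.

CONTROLLED_VOCABULARY = {
    "streamflow": {"discharge", "flow", "q_cfs"},
    "precipitation": {"precip", "rainfall", "ppt_mm"},
}

def standard_concept(local_name):
    """Resolve a provider's local variable name to a standard concept."""
    name = local_name.lower()
    for concept, synonyms in CONTROLLED_VOCABULARY.items():
        if name == concept or name in synonyms:
            return concept
    return None  # unmapped: flag for human curation rather than guessing

# Heterogeneous catalog entries from different (hypothetical) providers.
catalog = [("USGS", "discharge"), ("StateAgency", "Q_cfs"), ("Campus", "rainfall")]
tagged = [(provider, var, standard_concept(var)) for provider, var in catalog]
print(tagged)
```

The design point is where the mapping lives: the provider, who knows what "Q_cfs" means locally, performs the tagging, rather than a central catalog trying to reverse-engineer thousands of local conventions.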
It is also apparent that a fairly formidable “digital divide” exists between GIS data and time-series water-observations data and between GIS data and continuous spatial arrays of weather, climate, and remote sensing data. With Hydro Desktop, it is not yet possible to ingest continuous arrays of information; smoothly combine them with time series of water data defined on discrete spatial objects, such as points, lines, areas, and volumes; and then link the resulting datasets with simulation models and analysis routines. That would create a true hydrologic information system, but this goal is still some distance away in research and technology development.
A key lesson learned from CUAHSI’s experience is the importance of standards. There is a vast array of time-series data on water observations, and a common language makes it possible to access them consistently. However, a hydrologic scientist still needs to find and access the specific data in a user-friendly form. In the emerging service-oriented architecture for water-observations information, a client application, such as Hydro Desktop, is built specifically to search, view, and download water-observations data in WaterML. The generality of this construct is a way of organizing the functionality present in many spatial data information systems.
Spatial Data at the USGS
The USGS has successfully provided surveys and maps in support of the nation’s science and economy for the last 125 years. In fulfilling its mission to map the nation, the USGS has learned many lessons in conducting successful mapping programs. There have also been relevant lessons in conducting successful partnerships, ensuring the continuity of data and information, and keeping pace with changing needs and technologies. And there have been lessons in the importance of adequate and enforced specifications and standards, the benefits and difficulties of integrating data of disparate types, and the importance of conducting research on data needs, sources, production, and applications. This section describes several SDI-related initiatives at the USGS that are in addition to The National Map discussed in the previous chapter and outlines their key challenges and the lessons learned from them.
National Biological Information Infrastructure
The National Biological Information Infrastructure (NBII) was developed by the USGS as a platform to enable federal, state, and public partners to coor-
dinate ecological and biological data holdings through the use of protocols that enhance data-sharing, data transfer, and geographic investigations (Rugg, 2004; NBII, 2011). The partner community provides work on standards, tools, and technologies that make it easier to find, integrate, and apply biological resource information in a geographic framework. A key goal is to make these data available to land managers, other scientists, and the public.
The system includes the use of the Global Ecosystems Data Viewer to perform customized viewing and data selection and to download ecosystems data layers. Ecological and biological data are available as a continuous raster in which each pixel value represents a class code that is described in the metadata for each dataset. The effort to map standardized, meso-scale ecosystems for the contiguous United States provides a nationwide biophysical stratification system. The data used to develop the mapping process were not all of the same quality or spatial resolution, but each dataset obtained and used for the mapping was considered to contain “best available data” for a given theme at a national extent.
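The class-coded raster convention works as follows: the raster itself stores only small integers, and the metadata supply the code-to-class legend. A minimal sketch (codes and labels invented for illustration, not the actual NBII legend):

```python
# Decoding a class-coded ecosystems raster: pixel values are integer
# class codes whose meanings live in the dataset's metadata legend.
from collections import Counter

# Hypothetical legend, as it might appear in the dataset metadata.
CLASS_CODES = {11: "open water", 41: "deciduous forest", 81: "pasture"}

# Tiny raster of class codes (one list per row of pixels).
raster = [
    [11, 11, 41],
    [41, 41, 81],
    [81, 81, 81],
]

# Tally pixels per code, then translate codes to human-readable classes.
counts = Counter(code for row in raster for code in row)
summary = {CLASS_CODES[code]: n for code, n in counts.items()}
print(summary)  # {'open water': 2, 'deciduous forest': 3, 'pasture': 4}
```

Without the metadata legend, the raster is just numbers, which is why the report stresses that each pixel's class codes must be documented per dataset.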
The NBII has developed various geographic data tools, but these have not been consistently applied among the diverse holdings of the NBII. An example of a GIS spatial modeling tool that links various databases can be viewed on the National Institute of Invasive Species Science Web site. This online statistical tool helps users develop predictive models by using various user-defined regression techniques, and it generates a predictive surface based on the selected model. The results can be overlaid on Google Maps, allowing the spatial distribution of a given species to be visualized with the original species-occurrence data that were used to create the predictive surface. The output results can be saved as a map or in a PDF file, depending on a user’s needs. Although this is one success, a truly integrated data search and analysis portal is still not available.
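The underlying workflow of such a tool — fit a regression of species occurrence against an environmental predictor, then evaluate the model over a grid of predictor values to produce a predictive surface — can be sketched in a few lines. This is a minimal stand-in with synthetic data, not the NIISS tool's actual method or interface.

```python
# Minimal predictive-surface workflow: ordinary least squares fit of a
# (synthetic) occurrence index against one environmental predictor,
# then evaluation of the fitted model over a grid of predictor values.

def ols_fit(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic training data: occurrence index rises with elevation (km).
elev = [0.1, 0.4, 0.7, 1.0, 1.3]
occ = [2 * e + 1 for e in elev]  # exactly linear, so the fit is exact

slope, intercept = ols_fit(elev, occ)

# "Predictive surface": the model evaluated over a grid of predictor values.
grid = [e / 10 for e in range(0, 15, 2)]
surface = [slope * e + intercept for e in grid]
print(round(slope, 6), round(intercept, 6))
```

A real species-distribution tool would use many predictors and techniques such as logistic regression, but the fit-then-predict-over-a-grid structure is the same.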
One key lesson from the NBII experience is that establishing and distributing standards for the biological data community has been critical to the NBII’s success. Another lesson applicable to a USGS SDI is that even the best integrated data are of little value if they are not easily discoverable. The NBII Web portal will need to make progress in this regard, and it is a formidable challenge when the data are as diverse as those maintained by the NBII.
National Hydrography Dataset Plus
The National Hydrography Dataset (NHD) was developed by the USGS as the surface-water map layer for The National Map. The National Hydrography Dataset Plus (NHDPlus) was first released in late 2006 and is a suite of geospatial
products that builds on and extends the capabilities of the NHD. The NHDPlus integrates the NHD (1:100,000 scale) with the National Elevation Dataset (30 m) and the Watershed Boundary Dataset (WBD). Interest in estimating NHD stream flow volume and velocity to support pollutant fate-and-transport modeling was the driver behind the joint EPA–USGS efforts to develop the NHDPlus. The NHDPlus includes improved NHD names and networking; value-added attributes (such as stream order) that enable advanced query, analysis, and display; elevation-derived catchments that integrate the land surface with the stream network; catchment attributes (such as temperature, precipitation, and land cover); stream discharge and velocity estimates for pollutant-dilution modeling; and associated flow direction and accumulation grids. The NHDPlus represents the initial implementation of the national surface-water geospatial framework envisioned by the Subcommittee on Spatial Water Data, a group cosponsored by the Federal Geographic Data Committee and the Advisory Committee on Water Information.
A major production-related challenge was integrating the vector-based NHD and raster-based National Elevation Dataset to produce the NHDPlus catchment (local drainage area) for each NHD stream segment. The catchments were used to associate temperature and precipitation attributes with each stream segment in estimating stream flow volumes. The underlying method used to produce the NHDPlus catchments is described in USGS SIR 2009-5233 (Johnston et al., 2009). One step in addressing this challenge is to better align vector streams with hydrologically conditioned elevation data during catchment production. Improved integration of data across international boundaries with both Canada and Mexico is also needed. For both countries, coarse representations of the portions of drainage areas that fell outside the United States border were used in the initial NHDPlus.
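The raster side of this work — the flow direction and accumulation grids mentioned above — can be illustrated with a D8-style accumulation pass over a tiny flow-direction grid. The grid, directions, and outlet here are invented for illustration; production tools operate on elevation-derived direction grids at national scale.

```python
# D8-like flow accumulation on a tiny flow-direction grid.  Each cell
# stores the (dy, dx) offset of the neighbor it drains to (None at an
# outlet); accumulation counts the cell itself plus all upstream cells.

ROWS, COLS = 3, 4

# Every cell drains one cell east; the rightmost column is the outlet.
flow_dir = [[(0, 1) if x < COLS - 1 else None for x in range(COLS)]
            for y in range(ROWS)]

# Invert the drainage pointers: upstream[cell] = cells draining into it.
upstream = {(y, x): [] for y in range(ROWS) for x in range(COLS)}
for y in range(ROWS):
    for x in range(COLS):
        d = flow_dir[y][x]
        if d is not None:
            upstream[(y + d[0], x + d[1])].append((y, x))

memo = {}
def accumulation(cell):
    """Contributing-cell count: the cell itself plus everything upstream."""
    if cell not in memo:
        memo[cell] = 1 + sum(accumulation(c) for c in upstream[cell])
    return memo[cell]

acc = [[accumulation((y, x)) for x in range(COLS)] for y in range(ROWS)]
print(acc)  # accumulation grows eastward along each row: 1, 2, 3, 4
```

Associating each stream segment with the cells that drain to it (its catchment) is the vector-raster integration step that made NHDPlus production difficult.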
Good stewardship of the underlying data used to produce the NHDPlus is crucial. The federated data model with stewardship has been working well for the NHD, but it is threatened by limited resource support in the USGS. There is also concern that private efforts based on the NHD could eventually supplant the dataset rather than build on it. A potential solution is to encourage private-sector entities to compete vigorously to provide useful services based on the data but not allow them to own the data. The public could maintain ownership of the data themselves and keep them free and in the public domain. The NHD and WBD stewards have made major commitments of resources to support their side of stewardship, but it remains a challenge for the USGS to continue finding the necessary resources to support its obligation to data stewardship.
From an organizational perspective, there is much to be gained from multiagency cooperation in spatial data development. The NHDPlus team was able to leverage the collective interest and resources of EPA and USGS to complete what
has since become a highly valued data source for the water-resources community. One of the biggest technical lessons learned is the need for the production process to be as automated as possible so that it can be updated regularly as underlying ingredient datasets are improved through the stewardship process. That is the impetus behind the current EPA–USGS effort to develop new NHDPlus production tools on the basis of the latest GIS technology.
There is tremendous demand for consistently produced nationwide and continental datasets, and the user community has been very supportive of the NHDPlus in particular, because it is easily digested by existing computer applications. Although it is challenging to find the necessary resources to produce such nationwide datasets, the long-term benefits will probably exceed the costs.
The federated data model appears to work and has been beneficial to all involved, but no single entity can afford to provide all the resources required to improve the data. The community currently supports the data through a stewardship process, and each partner benefits from improved data and reduced duplication of effort.
The Topographic Mapping Program
The USGS Topographic Mapping Program (TMP) began in 1884, and its topographic maps have become the signature product recognized by the public and industry as a versatile tool for viewing the nation’s landscape. It has served as an essential instrument in integrating and analyzing place-based information and is a seminal model of a federal agency that has successfully created and supported a comprehensive SDI for the United States. Almost from its beginning, topographic mapping was a cooperative effort of federal and state governments: Massachusetts and New Jersey cooperated with USGS in topographic mapping as early as 1885–1887 (Kelmelis et al., 2003). Since then, all states and many federal agencies have worked with USGS to make topographic mapping a cooperative effort.
Technological advancements have transformed topographic mapping science from printed products to digital data and online-based applications for accessing digital topographic maps. The USGS began developing the National Digital Cartographic Database (NDCDB) by converting existing maps to both raster and vector forms and developing new data to update them or create new maps. The USGS began to develop those data to be used in geographic information systems as well. A timeline of recent USGS developments in topographic mapping and GIS is provided in Box 3.5.
The continual and necessary co-evolution of topographic mapping, technology, and emerging applications has presented a series of challenges for the USGS. For example, integration of existing data layers has included transitioning from
analogue to digital maps for the NDCDB, separating topographic maps into data layers for the National Spatial Data Infrastructure, and recombining data layers to form The National Map (TNM) (Kelmelis, 2003). An additional challenge, one that the USGS continues to make progress on, is ensuring consistency with each new product release. For example, the National Land Cover Dataset (NLCD) released in 2001 incorporated many improvements learned in developing the previous release in 1992 (Homer et al., 2004). These improvements resulted in slight incompatibilities between these two releases that have since been rectified, but improving datasets while maintaining consistency among releases remains an ongoing challenge.
As previously mentioned, the USGS in 2001 released its vision for TNM, the topographic map of the 21st century. TNM is a seamless, continuously maintained, nationally consistent set of base geographic data that is available on the Internet. As a source of revised topographic maps, TNM serves as a national spatial data foundation for a broad array of issues such as land and resource management and homeland security, and the USGS recognizes the potential for TNM to serve as the nation’s trusted resource for current, consistent, and integrated topographic information. There are eight layers of topographic information provided in TNM: boundaries, elevation, geographic names, hydrography, land cover, orthoimagery, structures, and transportation (Usery et al., 2010). TNM uses data from seamless databases developed in the 1990s and early 2000s, and has added data from federal, state, local, and tribal sources. The USGS Center of Excellence for Geospatial Information Science (CEGIS) has spent much of the last decade finding ways to integrate these layers across the various spatial scales.
A challenge for TNM is the eventual integration of mapping and scientific data beyond the current data layers. Developing an SDI to organize, integrate, access, and use scientific data within the scope of the USGS Science Strategy requires the technical advancement of present capabilities of TNM, which was developed to meet a different set of objectives and is less focused on complex multifaceted geoscience domain databases. Adding geoscience domain data involves more than simply adding layers to available GIS records. The TMP has propelled the creation of 1:24,000-scale and 1:100,000-scale topographic maps for most states, but there is not yet a consistent standard data format for geologic map legends across state boundaries. National bedrock geology with a resolution sufficient to satisfy USGS scientific staff is a large challenge.
Research in energy and mineral resource geology requires much more complicated datasets that convey rock-forming processes, areal distribution, age relationships, geochemical and geophysical data, and resource attributes. One potential solution is a raster-format geoTIFF geologic coverage of the United States on 1:24,000 and 1:100,000 scales. Later, vector data formats could be developed that could support functionalities beyond viewing, including searchable formats. Standardized formatting and metadata could replace the less productive efforts involved in reformatting large datasets. However, there is still a discernible cultural divide among USGS scientists in how they perceive, share, and use interdisciplinary data. There is a need for incentives that serve the process of integrating science; save time in discovering, visualizing, and handling data; and propel the effective use of information.
Timeline of Recent USGS Efforts in Topographic Mapping and GIS
1987 — Introduction of the digital orthophotograph quadrangle (DOQ). The USGS generated digital images with correct map geometry, which were created using photographic stereo pairs. The USGS partnered with USDA to generate digital orthophotographs at 1-meter resolution, and the USGS’s DOQ became the standard base image for GISs in the 1990s.
1991 — Completion of analogue map coverage of the contiguous United States. The USGS generated a map of the contiguous United States on a 1:24,000 scale, which included more than 55,000 7.5-minute quadrangles for the National Mapping Program. The most recent versions of the 1:24,000-scale topographic maps were converted to digital raster graphics (DRGs), which are geocoded and are a critical layer in GIS and used for applications such as feature extraction and image rectification.
1991-1992 — Transition to seamless nationwide layer-based datasets. Using seamless nationwide layer-based datasets, the USGS was able to first complete the National Elevation Dataset (NED) and then the National Land Cover Dataset (NLCD). The NED was created by using existing USGS databases to provide a seamless, nationwide, multi-resolution mosaic of elevations, with improvements now available to show 10-meter horizontal spacing and even 3-meter horizontal spacing with LIDAR-generated elevations. The NLCD was created by using Landsat Thematic Mapper (TM) images to provide a seamless mosaic of land cover for the United States. The USGS cites the NLCD as one of its most frequently downloaded datasets, with the most recently released version in 2011 based on 2006 TM satellite data.
SOURCE: Usery et al., 2010.
There are many lessons to learn from the evolution of the TMP. First, partnerships with state and federal agencies are essential and allowed the USGS to share costs and to access data that would not otherwise be available. Second, compiling and managing spatial data require a long-term investment in evolving technology; it took over 100 years to complete the first coverage of the 48 contiguous states. An SDI would be best viewed as an ongoing initiative that would adapt with changing user needs and technical advances. Third, an SDI would need to be designed with the future in mind. The rate of spatial data collection is increasing exponentially: more data have been collected in the last decade than in
the entire previous history of the TMP. Establishing standards for data formatting that anticipate needs many years down the road can enable useful data integration in the future.
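The point about designing data formats for the future can be made concrete: a record that carries an explicit schema version, units, and coordinate reference system lets software written years later interpret old data, or at least fail loudly instead of silently. The sketch below is illustrative only; the field names and schema are invented and are not an actual USGS format.

```python
import json

# A minimal sketch of "designing for the future": every record carries an
# explicit schema version, units, and coordinate reference system so that
# software written years later can interpret (or safely reject) it.
# The field names here are illustrative, not an actual USGS schema.

SCHEMA_VERSION = "1.0"

def make_record(layer, crs, resolution_m, values):
    """Package a data layer with self-describing metadata."""
    return {
        "schema_version": SCHEMA_VERSION,
        "layer": layer,
        "crs": crs,                  # e.g., an EPSG code
        "resolution_m": resolution_m,
        "values": values,
    }

def read_record(text):
    """Refuse data written under an unknown (future) major schema version."""
    record = json.loads(text)
    major = record["schema_version"].split(".")[0]
    if major != SCHEMA_VERSION.split(".")[0]:
        raise ValueError(f"unsupported schema {record['schema_version']}")
    return record

rec = make_record("elevation", "EPSG:4269", 10, [311.2, 315.7])
round_trip = read_record(json.dumps(rec))
print(round_trip["layer"], round_trip["resolution_m"])  # elevation 10
```

The key design choice is that the reader checks the version before trusting the payload, so a format change years down the road breaks loudly rather than corrupting downstream analyses.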
Center of Excellence for Geospatial Information Science
CEGIS was created in 2006 to “identify, conduct, and collaborate on geospatial information science research issues of national importance; assess, influence, and recommend for implementation technological innovations for geospatial data and applications; and maintain world-class expertise, leadership, and a body of knowledge in support of the National Spatial Data Infrastructure” (USGS, 2011). The role of CEGIS is not to collect or process data but to develop technology that aids data processing, specifically tools and data formulations for TNM. CEGIS plays a major scientific role in defining the standards and structure of the USGS SDI and thus is responsible for implementing the SDI for the agency. Following the recommendations of the National Research Council report A Research Agenda for Geographic Information Science at the United States Geological Survey (NRC, 2007b), CEGIS now places strong emphasis on three high-priority research areas: (1) investigating new methods for information access and dissemination; (2) supporting integration of data from multiple sources; and (3) developing data models and knowledge organization systems.
The USGS underwent transitions in recent years that led to a declining ability to coordinate national-level geospatial research. As a result of waning leadership, the USGS lacked a dynamic, nimble, cutting-edge research unit that could lead national efforts and harness capabilities in academia, government, and industry (NRC, 2007b). Furthermore, the 2007 National Research Council report recognized that challenges inherent in geographic information science would need to be addressed before TNM could be successfully implemented. TNM requires data to be generalized and fused with different scales, resolutions, and quality, and the standardization and integration of such disparate spatial data sources for TNM has been a serious challenge for CEGIS (NRC, 2007b).
In recommending how CEGIS could realize its potential, the 2007 National Research Council report emphasized the importance of collaboration with other agencies and organizations that carry out geographic information science research and emphasized the critical need for CEGIS and the USGS to establish effective leadership in geographic information science (NRC, 2007b). It would be difficult to coordinate an effective research agenda without external networks of partners and without cohesive leadership to drive the agenda.
A science organization’s core competence consists of expertise and data, and it follows that implementing an SDI would require both for success. Through its examination of SDI implementation in various agencies and countries and their key challenges and lessons learned, the committee found similar themes that are relevant as the USGS moves forward in implementing its own SDI. The committee found that successful implementation of an SDI depends on an agency’s roadmap and strategy, organizational leadership and culture, standardization, technical competence, funding and contracting, workforce competence, and cooperation and partnerships. Individuals who provided testimony to the committee expressed great hope that large benefits can come from a fully functioning SDI at the USGS (Box 3.6).
Roadmap and Strategy
The committee found that developing a roadmap and strategic goals is integral to implementing an SDI. In the case of INSPIRE, legislation was necessary for implementing an SDI in a complex federated system in a reasonable timeframe. In the absence of a legislative mandate, the committee found that the SDI roadmaps that were well developed and consistently reviewed and updated were the most successful. The BGS and Geoscience Australia are the closest analogues to the USGS, and the BGS in particular has a well-written business plan that clearly articulates the merit of an SDI and the community that it would serve. As demonstrated by the assortment of SDI roadmaps the committee examined, it was important for the accompanying strategic goals to be straightforward and to undergo periodic evaluation because compiling and managing spatial data require a long-term investment in evolving technology. In addition, incentives (such as reduced discovery costs and reduced duplication) are likely to be just as important for the long-term success and sustainability of an SDI as a legal or government mandate.

BOX 3.6
Sample Testimony Provided to the Committee
(See Appendix D for additional responses.)
“Correctly organized, an SDI will give the USGS the flexibility and agility to increase its capability in the rapidly emerging field of computational geosciences and enable it to unlock the breadth and depth of its scientific data to a far wider group of clients and stakeholders.”—State-level respondent
“Carefully structured an SDI will give the USGS the flexibility and adaptability to meet not only its current 6 key strategic science directions: it will also enable the USGS to rapidly change directions to meet new Geo-scientific challenges in the decades beyond 2017.”—Federal-level respondent
“Properly managed an SDI will enable the USGS to conduct multidisciplinary, collaborative science projects that are focused on delivering influential scientific solutions to the current six key strategic science directions identified in the document US Geological Science in the Decade 2007-2017.”—International respondent
Organizational Leadership and Culture
The committee also found that organizational leadership and culture influence how roadmaps and strategic goals are carried out on a daily basis and probably shape the success of SDI implementation. Incremental SDI implementation was key to success: leadership that established progressive goals rather than a single final deadline found greater adherence to those goals and thus greater success. Executive support was essential for developing an institutional commitment to a long-term vision of open access to data and value creation, and it drove the commitment and enthusiasm of senior, middle, and junior staff. Agencies that found success were the ones that had knowledgeable and respected leaders in the community who could champion and articulate a strong case for an SDI in the organization. The committee also found that the organizations most successful in building SDIs were the ones that had a long history of collaborating with others and a culture focused on making data and information accessible to the broader community. Also important was that these organizations developed a mantra of under-promising and over-delivering on deadlines and products.
Standardization

Standardization was another key theme that echoed through the various case studies. Establishing standards for the data community and distributing them are critical for SDI success because technology and data standards enable information resources and services to be interoperable. Implementation was more seamless and effective for SDIs that incorporated the needs of the user community to develop and improve standards and that also accepted the need for data products to conform to international standards. For example, metadata standards included standardized variable and parameter definitions. Labeling data with standard terms allows searches of multiple information sources to proceed consistently. It was also essential to have open standards for data and Web platforms. Finally, it was important for data to be properly formatted and quality-checked as they entered the system and at each step as they moved
through the cyberinfrastructure pipeline. Once standards have been determined, the committee found, it was essential to move forward by consistently applying the standards and to avoid indecision over which standards to follow.
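The value of labeling data with standard terms can be sketched in a few lines: when incoming records are validated against a controlled vocabulary, a single query finds equivalent data from different sources. The vocabulary and records below are invented for illustration and do not reflect an actual USGS catalog.

```python
# A sketch of why standardized terms matter for discovery: records labeled
# with a controlled vocabulary can be searched consistently across sources.
# The vocabulary and records below are invented for illustration.

CONTROLLED_VOCABULARY = {"elevation", "land_cover", "hydrography", "orthoimagery"}

def validate_theme(record):
    """Normalize a theme keyword and reject non-standard terms at ingest."""
    theme = record["theme"].strip().lower().replace(" ", "_")
    if theme not in CONTROLLED_VOCABULARY:
        raise ValueError(f"non-standard theme: {record['theme']!r}")
    record["theme"] = theme          # store the normalized term
    return record

catalog = [validate_theme(r) for r in (
    {"source": "agency_a", "theme": "Land cover"},
    {"source": "agency_b", "theme": "land_cover"},
)]

# Both records now index under the same term, so one query finds both.
matches = [r["source"] for r in catalog if r["theme"] == "land_cover"]
print(matches)  # ['agency_a', 'agency_b']
```

Checking terms at the point of entry, rather than at query time, mirrors the lesson above that data should be quality-checked as they enter the system.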
Technical Competence

In examining the various SDIs and how they were implemented across different agencies, the committee found that agencies had to overcome several technical concerns. Data quality was an issue, and it was best addressed at the time of collection, before data were propagated through the SDI. On a technical point, the primary requirement for fusing data is accurate georeferencing of data products; small co-registration or geo-registration errors can produce large errors in the detection or classification of change. The committee found that, with evolving technology, the tools and underlying database structure would need to adapt constantly in anticipation of data types beyond the current set, such as multispectral data and an expansion of data layers. In this case, automation of the stewardship process is valuable so that updates can occur regularly, and it is imperative to avoid frequently changing formats. Large-scale data repositories with clear priorities depended on the long-term sustainability of data-acquisition programs. In addition to data collection and analysis, it is important to archive data: the best integrated data in the world are of little value if they are not easily discoverable.
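The co-registration point can be illustrated with a toy example in the spirit of Townshend et al. (1992): differencing two identical land-cover grids that are offset by a single pixel flags spurious "change" along every class boundary, even though nothing actually changed. The 6x6 grid and class codes below are invented for illustration.

```python
# A toy illustration of how small co-registration errors inflate apparent
# change: comparing a land-cover grid against a one-pixel-shifted copy of
# itself flags "change" wherever classes abut. Grid values are invented
# class codes (e.g., 1 = forest, 2 = water, 3 = urban).

grid = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [3, 3, 3, 2, 2, 2],
    [3, 3, 3, 2, 2, 2],
    [3, 3, 3, 2, 2, 2],
]

def false_change_fraction(grid, shift=1):
    """Fraction of pixels flagged as changed when one image is shifted
    horizontally by `shift` pixels, although nothing actually changed."""
    rows = len(grid)
    cols = len(grid[0]) - shift          # compare only the overlap region
    differing = sum(
        grid[r][c] != grid[r][c + shift]
        for r in range(rows) for c in range(cols)
    )
    return differing / (rows * cols)

print(false_change_fraction(grid))  # 0.2
```

Even this tiny grid reports 20 percent spurious change from a one-pixel offset; with finer class boundaries the error grows, which is why accurate georeferencing must precede any data fusion.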
Funding and Contracting
The committee found that funding and contracting mechanisms affected how well implementation could be carried out. One key factor was funding that was adequate for the activities at hand: overfunding can lead to waste, whereas underfunding can lead to frustration and an inability to reach goals in a reasonable time. The exact level of adequate funding for the USGS SDI will vary with each phase of the roadmap suggested in Chapter 5. With dedicated capital funds, resources can be properly allocated for data purchases and for developing clear metrics to track data priorities and results. An organization-wide purchasing contract allowed one organization to acquire technology and data in weeks instead of months. An open-data policy is fundamental for long-term support by stakeholders, and this long-term approach was necessary to withstand cycles in funding and priorities for public geodata.
Workforce Competence

The committee observed that workforce competence contributed to successful implementation of SDIs. Training and retaining a skilled workforce will be critical for introducing and maintaining an SDI. An SDI introduction will
initially be disruptive, and the organization will need to understand that fully and to set realistic expectations. As new data become available and new fields that are relevant to an SDI emerge (for example, data science), it will be important to recruit talented experts in those fields who will continue to make the SDI useful and relevant. The committee also found that it was important to have highly trained and respected professionals who understood the technology.
Cooperation and Partnerships
Partnerships with state and federal agencies are essential for SDI implementation and for the long-term sustainability of an SDI. An SDI partnering plan can be successful if it is based on a common vision among its partners; there is much to be gained from multiagency cooperation on spatial data development. In the case of GEOSS, a voluntary intergovernmental framework has the potential to create a working global-scale spatial data infrastructure.
References

Australian National Audit Office. 2010. The Auditor-General Annual Report 2009-2010. Report No. 22. Available online at http://www.anao.gov.au/uploads/documents/2009-10_Audit_Report_No.22.pdf (accessed June 8, 2011).
British Geological Survey. 2011. Annual Report of the British Geological Survey 2009-2010. Nottingham, UK. 80 pp.
EU (European Union). 2010. Environment: Six member states face Court for failing to apply EU laws on their statute books. June 3. Available online at http://europa.eu/rapid/pressReleasesAction.do?reference=IP/10/686 (accessed June 9, 2011).
Friedl, L., and A. Donnellan. 2010. Earth Observations Missions Applications Workshop: Charge to Workshop and Overview of Needed Products and Outcomes. February 1-3. Available online at http://appliedsciences.larc.nasa.gov/pdf/2010EOMAW/Overview.pdf (accessed June 9, 2011).
GEO (Group on Earth Observations). 2011. About GEO. Available online at http://www.earthobservations.org/about_geo.shtml (accessed June 9, 2011).
Griew, P.V. 1990. IS Strategy Scoping Study for the British Geological Survey. Consultancy Report.
Homer, C., C. Huang, L. Yang, B. Wylie, and M. Coan. 2004. Development of a 2001 National Land-Cover Database for the United States. Photogrammetric Engineering and Remote Sensing 70(7):829-840.
Johnston, C.M., T.G. Dewald, T.R. Bondelid, B.B. Worstell, L.D. McKay, A. Rea, R.B. Moore, and J.L. Goodall. 2009. Evaluation of Catchment Delineation Methods for the Medium-Resolution National Hydrography Dataset. U.S. Geological Survey Scientific Investigations Report 2009-5233. 88 pp. Available online at http://pubs.usgs.gov/sir/2009/5233/pdf/sir2009-5233.pdf (accessed June 13, 2011).
Kelmelis, J.A. 2003. To The National Map and Beyond. Cartography and Geographic Information Science 30(2):185-198.
Kelmelis, J.A., M.L. DeMulder, C.E. Ogrosky, N.J. Van Driel, and B.J. Ryan. 2003. The National Map: From Geography to Mapping and Back. Photogrammetric Engineering & Remote Sensing 69(10):1109-1118.
McMahon, G., S.P. Benjamin, K. Clarke, J.E. Findley, R.N. Fisher, W.L. Graf, L.C. Gundersen, J.W. Jones, T.R. Loveland, K.S. Roth, E.L. Usery, and N.J. Wood. 2005. Geography for a Changing World: A Science Strategy for the Geographic Research of the U.S. Geological Survey, 2005-2015. Circular 1281. Sioux Falls, SD: U.S. Geological Survey. 76 pp.
NBII (National Biological Information Infrastructure). 2011. National Biological Information Infrastructure at a Glance. Available online at http://www.nbii.gov/portal/server.pt/community/nbii_home/236 (accessed June 13, 2011).
NGA (National Geospatial-Intelligence Agency). 2011. National Geospatial-Intelligence Agency: Vision, Mission, and Goals. Available online at https://www1.nga.mil/About/WhoWeAre/VisionMissionGoals/Pages/default.aspx (accessed June 9, 2011).
NRC (National Research Council). 2007a. Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond. Washington, DC: The National Academies Press.
NRC. 2007b. A Research Agenda for Geographic Information Science at the United States Geological Survey. Washington, DC: The National Academies Press.
Rugg, D.J. 2004. Creating FGDC and NBII Metadata with Metavist 2005. General Technical Report NC-255. St. Paul, MN: U.S. Department of Agriculture, Forest Service, North Central Research Station. Available online at http://www.ncrs.fs.fed.us/pubs/gtr/gtr_nc255.pdf (accessed June 13, 2011).
Tarboton, D.G., J.S. Horsburgh, and D.R. Maidment. 2008. CUAHSI Community Observations Data Model (ODM), Version 1.1, Design Specifications. Available online at http://his.cuahsi.org/documents/ODM1.1DesignSpecifications.pdf (accessed May 18, 2012).
TNRIS (Texas Natural Resources Information System). 2011. About TNRIS. Available online at http://www.tnris.state.tx.us/About/Index.aspx (accessed June 9, 2011).
Townshend, J.R.G., C.O. Justice, C. Gurney, and J. McManus. 1992. The Impact of Misregistration on Change Detection. IEEE Transactions on Geoscience and Remote Sensing 30(5):1054-1060.
Usery, E.L., D. Varanka, and M.P. Finn. 2010. A 125 Year History of Topographic Mapping and GIS in the U.S. Geological Survey 1884-2009, Part 2: 1980-2009. Available online at http://nationalmap.gov/ustopo/125history_part_2.html (accessed March 10, 2012).
USGS (U.S. Geological Survey). 2011. Center of Excellence for Geospatial Information Science. Available online at http://cegis.usgs.gov/about_us.html (accessed June 14, 2011).
Zaslavsky, I., D. Valentine, D. Maidment, D.G. Tarboton, T. Whiteaker, R. Hooper, D. Kirschtel, and M. Rodriguez. 2009. The Evolution of the CUAHSI Water Markup Language (WaterML). EGU General Assembly 2009, Geophysical Research Abstracts, Vol. 11, April 21, EGU2009-6824-3.