4
Building and Maintaining a Modern Infrastructure
Efficient investment in scientific infrastructure requires long-term planning and clear and transparent decision making.
—UK House of Lords Science and Technology Committee, 2013
The infrastructure provided by field stations is essential to advance science in a rapidly changing world. The National Research Council’s report Critical Infrastructure for Ocean Research and Societal Needs in 2030 (NRC 2011) identifies next-generation categories of infrastructure that should be included in planning, provides advice on criteria that could be used to set priorities for asset development or replacement, recommends ways in which federal agencies could maximize the value of ocean infrastructure investments, and addresses societal issues. Because many parallels can be drawn between infrastructure needs in ocean research and those in field station–based research, the committee developed a modified definition of infrastructure on the basis of the 2011 NRC report (Box 4-1).
BOX 4-1
Definition of Infrastructure
Field station infrastructure is the full portfolio of resources and assets that include technology, facilities, data, people, and institutions that can be brought to bear in answering questions about Earth, the oceans, and the atmosphere and that are (or could be) shared by or accessible to the research community as a whole.
Field station infrastructure has two tiers:
- Tier 1. Field stations themselves as collective elements of a nation’s broader scientific infrastructure.
- Tier 2. Individual components of field stations, such as laboratory space, scientific equipment, biological collections, cyberinfrastructure, and historical data records.
To ensure that field stations are adequately equipped to address and adapt to rapidly changing needs in science and education, consideration must be given to the organization and maintenance of both tiers of field station infrastructure. The question of how to maximize the value of field stations as components of a larger scientific infrastructure was addressed to a great extent in Chapter 3. The present chapter touches briefly on Tier 1 infrastructure but focuses primarily on Tier 2.
Recognizing the importance of science infrastructure, the European Union (EU) established the European Strategy Forum on Research Infrastructures (ESFRI) to enhance the use and management of large-scale and mid-scale research infrastructure and to facilitate scientists’ access to research sites throughout the EU with the intent of strengthening its international reach (Figure 4-1). Eight large-scale facilities form the Partnership for European Environmental Research.
The U.S. National Science Foundation (NSF) maintains multiple programs that provide funding for science infrastructure. However, the United States does not have a central body that oversees scientific infrastructure, and it stands to learn from the ESFRI effort.
The groundwork has been laid in part by the call in the National Research Council report on critical infrastructure (NRC 2011) for “a coordinated national strategic plan for critical shared ocean infrastructure investment, maintenance, and retirement.” A similarly coordinated strategic plan is needed for field stations.
There is no single list of infrastructure needs that fits all field stations. Field station infrastructure needs are driven by the strategic missions of the stations, the ecosystems within which they are embedded, the research questions they are addressing, and the levels of financial support they receive. Field stations vary along a continuum, from ones that have relatively simple infrastructural needs to those that have complex and sophisticated needs. The committee identified three basic types of field stations that reflect the continuum:
- Field stations that include little more than restricted access to research and teaching sites, parking, simple rustic housing or camping facilities, and a caretaker. These stations are used mainly for short-term visits by researchers that may recur over many years.
- Field stations that have laboratory space and housing, some autonomous environmental sensing equipment or data loggers, and an array of basic laboratory and field equipment, from microscopes and freezers to surveying equipment, small boats, and a support staff for maintenance. These stations often are used by researchers and classes for short- to intermediate-term stays.
- Field stations that have infrastructure resembling that of modern research laboratories that are engaged in cutting-edge science relevant to the study of ecosystems. They can incorporate a wide array of platforms (such as small and large boats and cyberinfrastructure), sensor networks, and other specialized facilities for accessing remote or extreme environments, including those in tree canopies, deep sediments, ice-covered habitats, and the open ocean; and they have resident faculty and support personnel. These stations support a wide array of users, from resident researchers and site-based classes to day visitors and community events.
Every field station has strengths that make it appropriate and attractive for conducting particular kinds of research, education, and public outreach. That a field station has relatively simple infrastructure should not belie its value for research, teaching, or outreach. Indeed, the very nature of the site—its remote location, secure and rapid access to a particular ecosystem, and the absence of public disturbance—might be its greatest asset. The diversity and range of programs at field stations and their settings provide access to critical habitats, research opportunities on resident species, and sensor data (e.g., weather data and webcam videography) of interest and importance to local, regional, and national communities. Each field station’s infrastructure should align with its vision and mission and the needs of its users.
FIGURE 4-1. Large-scale research infrastructures funded by the European Union to provide transnational access to scientists across Europe. Map depicts the countries and categories of research conducted at large-scale research centers (pie charts) overlain with the locations of biological field stations (red triangles). The pie-chart diameters reflect the number of research facilities in each country (maximum = 139, Germany). Data to produce the pie charts were extracted from the website http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=mapri, ©European Union, 2013.
Challenges of Maintaining and Upgrading Infrastructure
Field station managers and users have long recognized the need for safe, functional housing and properly equipped workspaces. Their two primary challenges in this regard are maintaining aging facilities and keeping up with rapid advances in technology. The latter is particularly important because laboratory-based research is increasingly integrated with field research. All infrastructure requires preventive maintenance, replacement, upgrading, or some combination of the three. That is not peculiar to field stations. However, the sites in which field stations are embedded and that make them attractive—along coasts, in mountains, in forests, or in deserts—often expose their infrastructure to extreme, highly variable environmental conditions that can take a toll. In addition, many stations are located in areas that are vulnerable to wildfires, earthquakes, tornados, hurricanes, or other natural hazards. These vulnerabilities add to the cost of maintaining the facilities and pose a risk to research equipment (e.g., laboratory equipment such as microscopes, autoclaves, and ultra-cold freezers, and field equipment such as nets, boats, and environmental sensors), biological collections, and data stored at these stations. Field station facilities can degrade much more rapidly than equipment found in environmentally controlled laboratories, and this is a financial burden on field station managers and in some cases compromises the research.
The recent survey by the National Association of Marine Laboratories (NAML) and the Organization of Biological Field Stations (OBFS) reveals common infrastructure priorities among field stations (NAML-OBFS 2013b). The top priorities are electricity, Internet access, support staff, laboratory space, storage, long-term monitoring, classroom capacity, housing, on-site maintenance, and engineering capacity. Respondents suggested that increased support for Internet access would improve scientists’ ability to use field stations while providing potential visiting scientists with access to specific data catalogs that are critical for developing research programs. According to the survey, a major problem is that basic data catalogs—species lists, maps, weather data, and land-use history—often are lacking at field stations. Some respondents indicated that field stations had insufficient space for laboratories, classrooms, and storage (including refrigeration). Data-management systems were considered excellent by a few respondents but ineffective by others. In addition, field researchers may require transport to and from field sites. Transport needs can vary from a golf cart to a submersible, depending on the site and the research being conducted. Field stations with increasingly sophisticated scientific equipment and automated sensors will also have to make investments in the capacity to capture, process, store, and share increasingly large datasets. Consideration must also be given to data that do not typically lend themselves to classic deposition in accessible databases, such as video recordings of animal behavior or deep-sea observations. Upgrading data-management systems was also identified as a high priority in the survey.
Investments in maintaining existing infrastructure clearly are a primary concern for field station administrators but often are a relatively low priority for their host institutions, particularly if a field station is remote. Only 14 percent of respondents (N = 197) to the NAML-OBFS survey noted that financial planning for field stations included depreciation of buildings and equipment. That is a remarkably low percentage, considering the respondents’ overwhelming sense of vulnerability to anticipated funding losses in operational revenue (76%) and in federal (65%), state (60%), administrative (54%), and donor (54%) support over the next 5 years. There is a clear need for every field station to develop a comprehensive infrastructure-management plan that is integrated with its strategic mission, its science plan, and its business plan (Lohr 2001).
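To make the depreciation point concrete, a straight-line calculation shows the annual reserve a station might set aside so that an asset can be replaced at the end of its service life. The sketch below is illustrative only: the asset names, costs, service lives, and salvage values are hypothetical and are not drawn from the NAML-OBFS survey, and a station's accounting office may well use a different depreciation method.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """A depreciable item; all values used below are hypothetical."""
    name: str
    replacement_cost: float      # estimated cost to replace, in dollars
    service_life_years: int      # expected useful life
    salvage_value: float = 0.0   # expected resale value at end of life


def annual_depreciation(asset: Asset) -> float:
    """Straight-line depreciation: an equal charge in each year of service."""
    if asset.service_life_years <= 0:
        raise ValueError("service life must be positive")
    return (asset.replacement_cost - asset.salvage_value) / asset.service_life_years


def annual_reserve(assets: list[Asset]) -> float:
    """Total amount to budget each year across all assets."""
    return sum(annual_depreciation(a) for a in assets)


# Hypothetical inventory for a small station.
inventory = [
    Asset("sensor network", 50_000, 10),
    Asset("small boat", 30_000, 15, salvage_value=3_000),
    Asset("ultra-cold freezer", 12_000, 12),
]

for item in inventory:
    print(f"{item.name}: ${annual_depreciation(item):,.0f} per year")
print(f"total annual reserve: ${annual_reserve(inventory):,.0f}")
```

Even a rough table of this kind gives a station a defensible figure to include in its business plan when negotiating with a host institution.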
Cyberinfrastructure and Connectivity
The inclusion of data as a type of infrastructure represents a paradigm shift for many field stations. Data constitute a primary product of field stations; if these data were made easily available, they could serve a broad audience. Long-term and baseline natural-history data should be an attraction to scientists and educators and be counted as part of a field station’s value (see Chapter 6). Making such data available would also move field stations from serving merely as environmental sentinels to being active participants in solving ecological and economic problems at a variety of scales. The acquisition of data is only one part of the equation. Data must be stored, managed, and integrated to ensure that they can be mined, visualized, and accessed through high-performance Internet connections, all of which fall within the domain of cyberinfrastructure.
Cyberinfrastructure consists of the assortment of information technologies that enable data storage, management, integration, and analysis. It is increasingly recognized as essential to science in that it dramatically improves scholarship and research productivity. Efficient cyberinfrastructure generally requires reliable Internet connectivity and modern computer hardware. At a minimum, field stations need adequate Internet connectivity to facilitate user access and collaboration. The availability of adequate cyberinfrastructure attracts scholars who are interested in cross-disciplinary research and fosters new scientific endeavors in emerging fields. Every field station should provide—whether on site, at a selected hub location (such as a host institution, the National Ecological Observatory Network (NEON), or other research centers), or through a collaborative network—online access to the complete historical datasets of its natural and human history and provide means by which its users can contribute to these datasets. This type of interactive access to databases can provide quality control of data in that scientists can monitor data input and output in real time and respond to anomalies. Scientists, students, or even visitors can see how data that they collect fit into larger temporal and spatial contexts.
Infrastructure to organize, archive, and share data collected at a field station could expand the impact of the station’s research by making the data available to other researchers and by making it possible to track data use and impact. Many tools for ecological data storage and recovery have been developed by the Long Term Ecological Research Network (LTER), the National Center for Ecological Analysis and Synthesis (NCEAS), the Knowledge Network for Biocomplexity (KNB), and others. Ecological Metadata Language, developed by KNB and NCEAS, has been widely used and is compatible with the larger aggregators (such as DataONE and Google). The National Park Service (NPS) has a research permitting and reporting database and a website that allows investigators to request reports and research data from specific national parks. The NPS website can be searched by park, taxon, or investigator.
Sharing the data products from field stations broadly would add value to the data and to the field station where the data were collected. Without centralized repositories, data developed at field stations are easily lost. If they are archived and made widely available, however, they gain value over time as a source of perspective on environmental change. Archiving and sharing data from field stations are critical. The committee agrees with NSF’s current policy that data become publicly available within 2 years of the completion of NSF-funded projects and believes that field stations should adhere to this standard regardless of the funder.
Most institutions that fund research have a basic expectation that recipients will have specific data-management and data-sharing plans that will advance scientific objectives, maximize learning, and improve understanding of the outcomes of public investment by providing timely and long-term public access to, and relatively straightforward retrieval of, their data. With the shift from “small science” to “big science” (Meyer 2009) and the advent of large-scale, long-term interdisciplinary projects, such as LTER and NEON, collaborators grew to expect not only access to each other’s data but data-management and data-sharing protocols built into the specific projects. That expectation was heightened in February 2013 when the Obama administration directed federal agencies to develop—in collaboration, if possible—plans to make federally funded research data freely available to the public within 1 year of publication as allowed by law.
The stricter guidelines for data management raise two critical questions: How are data to be stored? Who bears the cost? Some types of data (such as biological distribution data in spreadsheets) lend themselves to classical data-deposition methods, whereas others (such as video recordings) often do not. An example of the former is the data-management and data-sharing practices of VertNet,32 a publicly accessible database of vertebrate distributions compiled by 86 institutions worldwide. The site is maintained by NSF and managed by a small staff, but the contributing institutions serve as the authoritative sources, providing and controlling the data that appear on the website. Exponentially increased exposure, use, and correction of the data result in higher data quality and greater intellectual exchange among participating researchers (Constable et al. 2010). This system incentivized collaboration and data sharing to great effect.
Typically, support for data management starts when the institutions provide research funds and persists only for the lifetime of the award. The continuing costs of data management fall to the home institutions, which generally consider them to be fundamental to the conduct of research, preserving both research quality and academic integrity (see, e.g., the University of Oxford research data management website33). Universities that have extensive research activities can afford this approach, but it is unlikely that many small independent field stations can bear the additional economic burden of even the most basic data-management system.
Researchers at field stations often record data in their logbooks and spreadsheets either by hand or electronically. They take the data with them and, historically, rarely share them with the field stations where they conducted their research. Some of the raw data are eventually analyzed and the results incorporated into peer-reviewed publications; some may be lost when a researcher is no longer active, and these fall into the realm of “dark data34”—data that are inaccessible to the broader scientific community that relies on new, more sophisticated data-management tools. Salvaging the large body of historical dark data that still reside in notebooks, file cabinets, or memories of aging investigators is a challenge, but worth pursuing.
The Berkeley Ecoinformatics Engine,35 funded by the Keck Foundation at $3.5 million, could serve as a model for addressing both the dark-data challenge and the problem of integrating diverse databases within regional networks of field stations. The intent of the program was to organize and unify the wealth of data in University of California, Berkeley laboratories, natural-history museums, and field stations and to merge them with diverse environmental baselayers on climate, land cover or use, vegetation indexes, hydrology, and fire and other freely available datasets. The results are available for rapid exploratory analyses, tests for correlation, and visualizations that communicate results to a broad community of users. The Ecoinformatics Engine unites previously disconnected perspectives from Earth and atmospheric scientists, geographers, paleoecologists, and ecologists and enables tests of predictive models of global change. This constitutes a critical advance in making the science more rigorous.
Field station cyberinfrastructure is physically and technically diverse, ranging from digital sensor arrays to high-speed communication networks, and varies widely among field stations. Because of the diversity, a comprehensive infrastructure-management plan is best constructed around broad categories of use rather than type (physical, technical, and cyberinfrastructural). Three such use categories, modified from those described by ESFRI for large-scale research infrastructures (such as that of CERN, the European Organization for Nuclear Research), are (1) single sites, including infrastructure on the site of the field station itself; (2) networked sites, distributed resources and databases and infrastructure that are shared, possibly through collaborative networks; and (3) global infrastructure, available through online networks.
__________________
33 http://www.admin.ox.ac.uk/rdm
34 Data that are not systematically indexed or stored in a manner that is accessible to the broader scientific community, such as biological specimen collections, analog data (e.g., observations recorded in laboratory notebooks), and data only found in research publications. Such data are “nearly invisible” and probably will be underused or lost (Heidorn 2008).
Those categories could be used to outline infrastructure in the context of individual field station needs and services. For example, the infrastructure-management plan would describe how and when data collected with a place-based infrastructure are to be stored (remain part of the field station infrastructure) and how and when they are to be shared in a distributed framework (a service provided by the field station to the scientific community).
Sharing information and resources among field stations is critical in a world in which technological advances and expenditures increase at a rapid rate. The resources may include datasets on soil types, land-use history, climate, and aspects of biology. Through networking, field stations and researchers can share resources and collaborate on common topics and scientific questions. Sharing of data requires use of standard formats and metadata.
When a field station follows current best practices for data storage and metadata registry at the network level, its data can become part of a much larger, global infrastructure as modern data aggregators and information-management tools develop (e.g., DataONE and Google). As future information technology allows greater access to multiple data sources, the need increases for uniform data collection on target organisms and environmental properties and processes to allow analyses on regional and national scales.
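One simple way a station might enforce a minimum metadata standard before a dataset is shared is a completeness check on each record. The sketch below is an illustration under stated assumptions: the field names and the sample record are hypothetical and do not reproduce Ecological Metadata Language or any other published schema.

```python
# Hypothetical minimum set of descriptive fields; a real station would
# take these from whichever metadata standard its network has adopted.
REQUIRED_FIELDS = ("title", "creator", "station", "start_date", "end_date", "methods")


def missing_fields(record: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Hypothetical record with one blank field and one absent field.
record = {
    "title": "Daily air temperature",
    "creator": "J. Doe",
    "station": "Example Field Station",
    "start_date": "2001-01-01",
    "end_date": "",
}

print(missing_fields(record))  # lists "end_date" and "methods"
```

A check of this kind could run when a visiting researcher deposits data, so that gaps are caught while the person who collected the data is still on site.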
Field stations vary in scope, size, and purpose; each contributes to the national research and education portfolio in critical ways. No array of infrastructure is applicable to all field stations, although there are similarities within each range of size and complexity. What is clear is that financial demands on field stations are increasing as they upgrade to meet today’s science challenges. Installation of new cyberinfrastructure requires data-management and data-sharing plans and data that conform to widely used metadata standards. Such infrastructure requires a long-term commitment of experienced technical support. High-tech infrastructure generally has a relatively short life cycle (about 10 years), and provision for timely upgrading and replacement of any newly funded infrastructure is needed. Staff at many field stations, particularly smaller ones, do not have the required technical expertise.
Recommendation: Because of their wide variety in purpose, size, and scope, each field station should assess and define its own infrastructure needs.
However, Internet connectivity and cyberinfrastructure should be included in all infrastructure-management plans to allow field stations to facilitate collaborative research and participate in broader networking efforts. The process of archiving dark data into digitally accessible formats is critical, and should begin with the most recent datasets and progress back in time so that field stations can expand their sets of continuous longitudinal data.