AIM Data Environment
The innovative observational and modeling programs described in this chapter will provide essential new information about how the AIM system works. However, this information will be embedded in the relationships between data sets as well as within the individual data sets themselves. It will be buried in petabytes of simulation data and in large volumes of heterogeneous data from new and ongoing space missions, from major new ground-based facilities, from suborbital platforms, and from arrays of ground-based all-sky cameras, lidars, radars, magnetometers, GPS receivers, ionosondes, imagers, and other instruments. There must be a parallel effort to develop the tools needed to convert the volumes of data into new knowledge about the AIM system. The challenge is to combine these heterogeneous data sources, housed in archives distributed around the world, into new browse summaries and new data products that contain information about AIM system behaviors in addition to the regional and process information contained in the individual data sets themselves. In turn, this global view will provide needed context for the interpretation of small-scale features and local observations.
Because of these requirements, new efforts in data exploitation and data synthesis are essential ingredients for the future of the AIM research environment. As data sources grow in size and complexity, exploiting the data requires being able to (1) locate specific pieces of information within a large distributed set of worldwide data archives, (2) manipulate and visualize the data while retaining knowledge of version information and all supporting analysis programs, and (3) combine data sources to create new data products while maintaining linkages back to the original data sources. This is where the development of data synthesis capabilities is essential. In the face of constrained budgets, much of this effort can be accomplished by using robust and sustainable commercial off-the-shelf (COTS) technologies that keep pace with new developments. Making heliophysics data “findable” by these commercial technologies requires the development of standard text-based metadata descriptors. Much of the groundwork for this capability has been laid in the last few years with the creation of an array of virtual observatories and the continuing development and implementation of the international Space Physics Archive Search and Extract (SPASE) data model.
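The three requirements above (locating data through text-based descriptors, retaining version information, and keeping linkages back to the original sources) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Descriptor` fields, the `find` and `combine` helpers, and the `example://` identifiers are hypothetical and do not reproduce the actual SPASE schema or any virtual observatory interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical text-based metadata record (not the real SPASE schema)."""
    resource_id: str
    description: str
    version: str
    parents: Tuple[str, ...] = ()  # "id@version" of each source data set


def find(catalog: List[Descriptor], keyword: str) -> List[Descriptor]:
    """Requirement (1): locate records by a case-insensitive keyword search."""
    kw = keyword.lower()
    return [d for d in catalog if kw in d.description.lower()]


def combine(new_id: str, description: str,
            sources: List[Descriptor], version: str = "1.0") -> Descriptor:
    """Requirements (2) and (3): describe a derived product that records the
    identifier and version of every source it was built from."""
    links = tuple(f"{s.resource_id}@{s.version}" for s in sources)
    return Descriptor(new_id, description, version, links)


# Illustrative catalog; identifiers and versions are made up.
catalog = [
    Descriptor("example://ground/all-sky-camera",
               "Ground-based all-sky camera images", "2.1"),
    Descriptor("example://space/plasma-density",
               "Satellite in situ plasma density", "1.4"),
    Descriptor("example://ground/magnetometer",
               "Ground-based magnetometer array", "3.0"),
]

hits = find(catalog, "ground-based")
product = combine("example://derived/global-view",
                  "Global view from ground-based instruments", hits)
```

In this sketch the derived record carries the linkage itself, so anyone who finds `product` can trace it to the exact versions of the data sets it came from without consulting a separate log.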
Full implementation of SPASE opens the way for the development of an array of shared software tools for essential capabilities such as data mining, pattern recognition, statistical analysis, and data visualization, both on the client side and, in the case of large data volumes, on the server side. Virtual observatories also enable the development of tools that require detailed information about instruments not easily obtained by individual investigators. One example is the calculation of common volumes in which in situ and remote-sensing measurements are made or in which space- and ground-based observations intersect.
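As a rough illustration of the common-volume idea, the Python sketch below tests whether a spacecraft's in situ sampling point falls near the line of sight of a ground-based remote-sensing instrument. It is a simplified geometric sketch under stated assumptions, not an operational method: positions are in a local Cartesian frame in kilometers, the instrument's field of view is approximated by a fixed offset tolerance around a single ray, and the function names and thresholds are hypothetical. A real tool would need the detailed instrument information mentioned above, such as beam patterns and pointing knowledge.

```python
import math
from typing import Sequence, Tuple


def offset_from_line_of_sight(origin: Sequence[float],
                              direction: Sequence[float],
                              point: Sequence[float]) -> Tuple[float, float]:
    """Return (perpendicular offset, range along the ray) of `point`
    relative to a ray starting at `origin` and pointing along `direction`."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    rel = tuple(p - o for p, o in zip(point, origin))
    # Project onto the ray; clamp at zero because the instrument looks
    # only forward along its line of sight.
    along = max(sum(r * u for r, u in zip(rel, unit)), 0.0)
    closest = tuple(o + along * u for o, u in zip(origin, unit))
    return math.dist(point, closest), along


def in_common_volume(origin, direction, point,
                     max_offset_km: float, max_range_km: float) -> bool:
    """True if `point` lies within `max_offset_km` of the line of sight
    and no farther than `max_range_km` along it (assumed tolerances)."""
    offset, along = offset_from_line_of_sight(origin, direction, point)
    return offset <= max_offset_km and along <= max_range_km


# Illustrative case: a zenith-pointing ground instrument at the origin
# and a spacecraft at 400 km altitude, displaced 5 km horizontally.
ground_site = (0.0, 0.0, 0.0)
zenith = (0.0, 0.0, 1.0)
spacecraft = (3.0, 4.0, 400.0)

offset_km, range_km = offset_from_line_of_sight(ground_site, zenith, spacecraft)
overlaps = in_common_volume(ground_site, zenith, spacecraft,
                            max_offset_km=10.0, max_range_km=1000.0)
```

Treating the field of view as a tolerance around a ray keeps the example short; replacing it with a cone or a measured beam pattern would change only the `in_common_volume` test.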
AIMI Priority: Develop a data environment that preserves important elements of the current heliophysics data environment, while expanding the capabilities in directions that enhance data exploitation to maximize the scientific value of the data sets.
Essential to many of the AIM science frontiers identified in this chapter is the ability to synthesize information from multiple data sets into new knowledge about the AIM system. This includes, for example, mapping between geospace data sets using magnetic fields from continuously running magnetohydrodynamic simulations, browse products that superpose observations along satellite tracks onto global patterns from constellations or imagers, maps that combine information from a large number of individual ground-based instruments into global views, and combinations of ground- and space-based observations that address space-time ambiguities, among others.