
Massive Data Sets: Proceedings of a Workshop (1996)
Commission on Physical Sciences, Mathematics, and Applications (CPSMA)

"Earth Observation Systems: What Shall We Do with the Data we Are Expecting in 1998?" Massive Data Sets: Proceedings of a Workshop. Washington, DC: The National Academies Press, 1996.


thinking about data handling and analysis. This is followed by discussions of some issues relating to specific classes of data, and a summary of areas to which the statistics community may be well equipped to contribute.

2 Data Classification Scheme

The Committee on Data Management and Computing (CODMAC) defines five general classes of spacecraft data, based on the degree of processing involved (CODMAC, 1982, and subsequent refinements):

  • Level 0 - The raw data stream from the spacecraft, as received at Earth

  • Level 1 - Measured radiances, geometrically and radiometrically calibrated

  • Level 2 - Geophysical parameters, at the highest resolution available

  • Level 3 - Averaged data, providing spatially and temporally "uniform" coverage

  • Level 4 - Data produced by a theoretical model, possibly with measurements as inputs

This paper focuses on Level 2 and Level 3 data, which are the main concerns of most global change research scientists working on EOS instrument teams. Level 2 products are reported on an orbit-by-orbit basis. For a polar-orbiting satellite such as EOS, the Level 2 sampling of Earth is highly non-uniform in space and time, with coverage at high latitudes much more frequent than near the equator. Level 2 data are needed when accuracy at high spatial resolution is more important than uniformity of coverage. These situations arise routinely in validation studies of the satellite observations, in the analysis of field campaign data, and when addressing other local- and regional-scale problems with satellite data.

The spatially and temporally uniform Level 3 data are needed for global-scale budget calculations, and for any problem that involves deriving new quantities from two or more measurements which have different sampling characteristics. To derive a Level 3 product from Level 2 data, spatial and temporal scales must be chosen. It is to this issue that we turn next.

3 Gridding and Binning to Create Level 3 Data

The creation of Level 3 data has traditionally involved the selection of a global, 2- or 3-dimensional spatial grid, possibly a time interval as well, and "binning" the Level 2 data into the grid cells. The binning process for large data sets usually entails taking the arithmetic mean and standard deviation of all Level 2 data points falling into a grid cell, with possible trimming of outliers or of measurements flagged as "low quality" for other reasons. Typically, all points included in a grid cell average are given equal weight. Occasionally a median value will be used in place of the mean.
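
The binning step described above can be illustrated with a minimal Python sketch. It is not taken from any EOS processing system. The function name `bin_level2`, the `(lat, lon, value, good)` tuple layout, and the simple equal-angle latitude-longitude grid are all assumptions made for illustration. Each unflagged point is assigned to a cell with equal weight, and the per-cell count, arithmetic mean, and sample standard deviation are returned.

```python
import math
from collections import defaultdict


def bin_level2(points, cell_deg=1.0, skip_flagged=True):
    """Bin Level 2 samples into an equal-angle lat/lon grid (illustrative only).

    points: iterable of (lat_deg, lon_deg, value, good) tuples, where `good`
    is a hypothetical quality flag (False means "low quality").
    Returns {(row, col): (count, mean, sample_std)}.
    """
    n_rows = int(180.0 / cell_deg)
    n_cols = int(360.0 / cell_deg)
    cells = defaultdict(list)
    for lat, lon, value, good in points:
        if skip_flagged and not good:
            continue  # trim measurements flagged as low quality
        # Clamp so that lat = +90 and lon just below 360 stay inside the grid.
        row = min(int((lat + 90.0) // cell_deg), n_rows - 1)
        col = min(int((lon % 360.0) // cell_deg), n_cols - 1)
        cells[(row, col)].append(value)

    stats = {}
    for key, vals in cells.items():
        n = len(vals)
        mean = sum(vals) / n  # every point carries equal weight
        var = sum((v - mean) ** 2 for v in vals) / (n - 1) if n > 1 else 0.0
        stats[key] = (n, mean, math.sqrt(var))
    return stats
```

Replacing the mean with `statistics.median(vals)` would give the median variant mentioned above.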

The leading contender for the standard EOS Level 3 grid is a rectangular-based scheme similar to one that has been used by the Earth Radiation Budget Experiment (ERBE) (Green and Wielicki, 1995a). In the proposed implementation for EOS, the Earth is divided zonally into 1.25 degree strips (about 140 km in width). Each strip is then divided into an integral number of quadrilaterals, each approximately 140 km in length, with the origin at the Greenwich meridian. This produces a nearly equal-area grid.
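
A rough Python sketch of such a grid follows. The text does not say how the integral number of cells per strip is chosen, so the rule used here is an assumption: round the strip's circumference at its central latitude, divided by the strip width, to the nearest integer, with a minimum of one cell. The function names and the spherical Earth radius of 6371 km are also illustrative choices, not part of the proposed EOS specification.

```python
import math

STRIP_DEG = 1.25                   # zonal strip width in degrees of latitude
N_STRIPS = int(180 / STRIP_DEG)    # 144 strips from south pole to north pole


def cells_in_strip(strip):
    """Number of cells in a strip (assumed rule: round at the central latitude)."""
    lat_c = -90.0 + (strip + 0.5) * STRIP_DEG
    return max(1, round(360.0 / STRIP_DEG * math.cos(math.radians(lat_c))))


def cell_index(lat, lon):
    """Map a point to (strip, column); column 0 starts at the Greenwich meridian."""
    strip = min(int((lat + 90.0) // STRIP_DEG), N_STRIPS - 1)
    n = cells_in_strip(strip)
    col = min(int((lon % 360.0) / (360.0 / n)), n - 1)
    return strip, col


def cell_area_km2(strip, radius_km=6371.0):
    """Area of one cell in a strip on a sphere of the given radius."""
    south = math.radians(-90.0 + strip * STRIP_DEG)
    north = math.radians(-90.0 + (strip + 1) * STRIP_DEG)
    strip_area = 2 * math.pi * radius_km ** 2 * (math.sin(north) - math.sin(south))
    return strip_area / cells_in_strip(strip)
```

Under these assumptions the strip just north of the equator holds 288 cells and the polar strips hold 3 each, and `cell_area_km2` differs by only a few percent between them, which is the sense in which the grid is nearly equal-area.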
