Statistical Analysis of Massive Data Streams: Proceedings of a Workshop

Description

Massive data streams, large quantities of data that arrive continuously, are becoming increasingly commonplace in many areas of science and technology. Consequently, the development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the National Research Council (NRC) to hold a workshop to explore methods for analyzing streams of data and to stimulate progress in the field. This report presents the results of that workshop. It contains presentations that focused on five research areas where massive data streams are present: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant statistical science activity.

Suggested Citation

National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. https://doi.org/10.17226/11098.

Publication Info

395 pages | 8.5 x 11
DOI: https://doi.org/10.17226/11098

Table of Contents

Front Matter i-vi
Sallie Keller-McNulty Welcome and Overview of Sessions 4-4
TRANSCRIPT OF PRESENTATION 5-5
James Schatz Welcome and Overview of Sessions 6-6
TRANSCRIPT OF PRESENTATION 7-9
Douglas Nychka, Chair of Session on Atmospheric and Meteorological Data Introduction by Session Chair 10-10
TRANSCRIPT OF PRESENTATION 11-11
John Bates Exploratory Climate Analysis Tools for Environmental Satellite and Weather Radar Data 12-13
2. Philosophy of the use of remote sensing data for climate monitoring 14-15
TRANSCRIPT OF PRESENTATION 16-27
Amy Braverman Statistical Challenges in the Production and Analysis of Remote Sensing Earth Science Data at the Jet 28-28
TRANSCRIPT OF PRESENTATION 29-41
Ralph Milliff Global and Regional Surface Wind Field Inferences from Spaceborne Scatterometer Data 42-42
TRANSCRIPT OF PRESENTATION 43-51
Global and Regional Surface Wind Field Inferences Given Spaceborne Scatterometer Data Ralph F. Milliff 52-52
GLOBAL AND REGIONAL SURFACE WIND FIELD INFERENCES FROM SPACE-BORNE SCATTEROMETER DATA 53-53
Blending QSCAT and Weather-Center Analysis Winds 54-54
Bayesian Hierarchical Model for Surface Winds in the Tropics 55-55
A Bayesian Hierarchical Air-Sea Interaction Model 56-56
Figure Captions 57-62
Summary 63-63
Report from Breakout Group 64-65
David Scott, Chair of Session on High-Energy Physics Introduction by Session Chair 66-67
TRANSCRIPT OF PRESENTATION 68-68
Robert Jacobsen Statistical Analysis of High Energy Physics Data 69-69
TRANSCRIPT OF PRESENTATION 70-89
Paul Padley Some Challenges in Experimental Particle Physics Data Streams 90-90
ABSTRACT OF PRESENTATION 91-91
TRANSCRIPT OF PRESENTATION 92-113
Miron Livny Data Grids (or, A Distributed Computing View of High Energy Physics) 114-114
TRANSCRIPT OF PRESENTATION 115-133
Report from Breakout Group 134-135
Daryl Pregibon Keynote Address: Graph Mining - Discovery in Large Networks 136-136
ABSTRACT OF PRESENTATION 137-137
TRANSCRIPT OF PRESENTATION 138-164
Sallie Keller-McNulty, Chair of Session on Integrated Data Systems Introduction by Session Chair 165-165
TRANSCRIPT OF PRESENTATION 166-166
J. Douglas Beason Global Situational Awareness 167-167
ABSTRACT OF PRESENTATION 168-168
TRANSCRIPT OF PRESENTATION 169-176
Kevin Vixie Incorporating Invariants in Mahalanobis Distance-Based Classifiers: Applications to Face Recognition 177-177
TRANSCRIPT OF PRESENTATION 178-183
II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES 184-185
III. FACE RECOGNITION RESULTS 186-186
IV. CONCLUSIONS 187-188
REFERENCES 189-189
John Elder Ensembles of Models: Simplicity (of Function) Through Complexity (of Form) 190-190
TRANSCRIPT OF PRESENTATION 191-206
Report from Breakout Group 207-209
Mark Hansen Untitled Presentation 210-210
TRANSCRIPT OF PRESENTATION 211-222
Wendy Martinez, Chair of Session on Network Traffic Introduction by Session Chair 223-223
TRANSCRIPT OF PRESENTATION 224-224
William Cleveland FSD Models for Open-Loop Generation of Internet Packet Traffic 225-225
ABSTRACT OF PRESENTATION 226-226
TRANSCRIPT OF PRESENTATION 227-249
Johannes Gehrke Processing Aggregate Queries over Continuous Data Streams 250-250
ABSTRACT OF PRESENTATION 251-251
TRANSCRIPT OF PRESENTATION 252-260
Edward Wegman Visualization of Internet Packet Headers 261-262
ABSTRACT OF PRESENTATION 263-263
TRANSCRIPT OF PRESENTATION 264-279
Paul Whitney Toward the Routine Analysis of Moderate to Large-Size Data 280-280
TRANSCRIPT OF PRESENTATION 281-294
Leland Wilkinson, Chair of Session on Mining Commercial Streams of Data Introduction by Session Chair 295-295
TRANSCRIPT OF PRESENTATION 296-297
Lee Rhodes A Stream Processor for Extracting Usage Intelligence from High-Momentum Internet Data 298-298
TRANSCRIPT OF PRESENTATION 299-307
1. INTRODUCTION 308-308
2. BUSINESS CHALLENGES FOR THE NSPs 309-309
3.2 SESSION MEs 310-310
4. DATA STREAMS AND RIVERS 311-312
5. IUM HIGH-LEVEL ARCHITECTURE 313-313
6. STREAM COLLECTION AND NORMALIZATION 314-314
7. STREAM RULE PROCESSING 315-315
8. RULE CHAINS AND ASSOCIATED DATA STRUCTURES 316-317
9.1 CAPTURE MODELS 318-319
9.3 DRILL FORWARD 320-321
9.4 USER INTERACTION WITH STREAMING MODELS 322-323
10. SUMMARY 324-324
REFERENCES 325-325
Pedro Domingos A General Framework for Mining Massive Data Streams 326-326
TRANSCRIPT OF PRESENTATION 327-341
1 The Problem 342-343
2 The Framework 344-344
3 Time-Changing Data 345-345
Reference 346-346
TRANSCRIPT OF PRESENTATION 347-361
Andrew Moore kd-, R-, Ball-, and Ad-Trees: Scalable Massive Science Data Analysis 362-362
TRANSCRIPT OF PRESENTATION 363-388
Concluding Comments 389-389
Rights

Copyright Information

The National Academies Press and the Transportation Research Board have partnered with Copyright Clearance Center to offer a variety of options for reusing our content. You may request permission to:

  • Republish or display in another publication, presentation, or other media
  • Use in print or electronic course materials and dissertations
  • Share electronically via secure intranet or extranet
  • And more

For most academic and educational uses, no royalties will be charged, although you are required to obtain a license and comply with the license terms and conditions.

Translation and Other Rights

For information on how to request permission to translate our work, and for any other rights-related query, please click here.

Copyright.com Customer Service

For questions about using the Copyright.com service, please contact:

Copyright Clearance Center
22 Rosewood Drive
Danvers, MA 01923
Tel (toll free): 855/239-3415 (select option 1)
E-mail: info@copyright.com
Web: https://www.copyright.com