Proceedings

Massive data streams, large quantities of data that arrive continuously, are becoming increasingly common in many areas of science and technology. Consequently, the development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the NRC to hold a workshop to explore methods for analyzing streams of data and to stimulate progress in the field. This report presents the results of that workshop. It includes presentations focused on five research areas in which massive data streams arise: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant activity in statistical science.

Suggested Citation

National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. https://doi.org/10.17226/11098.

Publication Info

395 pages | 8.5 x 11 | DOI: https://doi.org/10.17226/11098
Chapters
Front Matter i-vi
Sallie Keller-McNulty Welcome and Overview of Sessions 4-4
TRANSCRIPT OF PRESENTATION 5-5
James Schatz Welcome and Overview of Sessions 6-6
TRANSCRIPT OF PRESENTATION 7-9
Douglas Nychka, Chair of Session on Atmospheric and Meteorological Data Introduction by Session Chair 10-10
TRANSCRIPT OF PRESENTATION 11-11
John Bates Exploratory Climate Analysis Tools for Environmental Satellite and Weather Radar Data 12-13
2. Philosophy of the use of remote sensing data for climate monitoring 14-15
TRANSCRIPT OF PRESENTATION 16-27
Amy Braverman Statistical Challenges in the Production and Analysis of Remote Sensing Earth Science Data at the Jet 28-28
TRANSCRIPT OF PRESENTATION 29-41
Ralph Milliff Global and Regional Surface Wind Field Inferences from Spaceborne Scatterometer Data 42-42
TRANSCRIPT OF PRESENTATION 43-51
Global and Regional Surface Wind Field Inferences Given Spaceborne Scatterometer Data Ralph F. Milliff 52-52
GLOBAL AND REGIONAL SURFACE WIND FIELD INFERENCES FROM SPACE-BORNE SCATTEROMETER DATA 53-53
Blending QSCAT and Weather-Center Analysis Winds 54-54
Bayesian Hierarchical Model for Surface Winds in the Tropics 55-55
A Bayesian Hierarchical Air-Sea Interaction Model 56-56
Figure Captions 57-62
Summary 63-63
Report from Breakout Group 64-65
David Scott, Chair of Session on High-Energy Physics Introduction by Session Chair 66-67
TRANSCRIPT OF PRESENTATION 68-68
Robert Jacobsen Statistical Analysis of High Energy Physics Data 69-69
TRANSCRIPT OF PRESENTATION 70-89
Paul Padley Some Challenges in Experimental Particle Physics Data Streams 90-90
ABSTRACT OF PRESENTATION 91-91
TRANSCRIPT OF PRESENTATION 92-113
Miron Livny Data Grids (or, A Distributed Computing View of High Energy Physics) 114-114
TRANSCRIPT OF PRESENTATION 115-133
Report from Breakout Group 134-135
Daryl Pregibon Keynote Address: Graph Mining - Discovery in Large Networks 136-136
ABSTRACT OF PRESENTATION 137-137
TRANSCRIPT OF PRESENTATION 138-164
Sallie Keller-McNulty, Chair of Session on Integrated Data Systems Introduction by Session Chair 165-165
TRANSCRIPT OF PRESENTATION 166-166
J. Douglas Beason Global Situational Awareness 167-167
ABSTRACT OF PRESENTATION 168-168
TRANSCRIPT OF PRESENTATION 169-176
Kevin Vixie Incorporating Invariants in Mahalanobis Distance-Based Classifiers: Applications to Face Recognition 177-177
TRANSCRIPT OF PRESENTATION 178-183
II. COMBINING WITHIN CLASS COVARIANCES AND LINEAR APPROXIMATIONS TO INVARIANCES 184-185
III. FACE RECOGNITION RESULTS 186-186
IV. CONCLUSIONS 187-188
REFERENCES 189-189
John Elder Ensembles of Models: Simplicity (of Function) Through Complexity (of Form) 190-190
TRANSCRIPT OF PRESENTATION 191-206
Report from Breakout Group 207-209
Mark Hansen Untitled Presentation 210-210
TRANSCRIPT OF PRESENTATION 211-222
Wendy Martinez, Chair of Session on Network Traffic Introduction by Session Chair 223-223
TRANSCRIPT OF PRESENTATION 224-224
William Cleveland FSD Models for Open-Loop Generation of Internet Packet Traffic 225-225
ABSTRACT OF PRESENTATION 226-226
TRANSCRIPT OF PRESENTATION 227-249
Johannes Gehrke Processing Aggregate Queries over Continuous Data Streams 250-250
ABSTRACT OF PRESENTATION 251-251
TRANSCRIPT OF PRESENTATION 252-260
Edward Wegman Visualization of Internet Packet Headers 261-262
ABSTRACT OF PRESENTATION 263-263
TRANSCRIPT OF PRESENTATION 264-279
Paul Whitney Toward the Routine Analysis of Moderate to Large-Size Data 280-280
TRANSCRIPT OF PRESENTATION 281-294
Leland Wilkinson, Chair of Session on Mining Commercial Streams of Data Introduction by Session Chair 295-295
TRANSCRIPT OF PRESENTATION 296-297
Lee Rhodes A Stream Processor for Extracting Usage Intelligence from High-Momentum Internet Data 298-298
TRANSCRIPT OF PRESENTATION 299-307
1. INTRODUCTION 308-308
2. BUSINESS CHALLENGES FOR THE NSPs 309-309
3.2 SESSION MEs 310-310
4. DATA STREAMS AND RIVERS 311-312
5. IUM HIGH-LEVEL ARCHITECTURE 313-313
6. STREAM COLLECTION AND NORMALIZATION 314-314
7. STREAM RULE PROCESSING 315-315
8. RULE CHAINS AND ASSOCIATED DATA STRUCTURES 316-317
9.1 CAPTURE MODELS 318-319
9.3 DRILL FORWARD 320-321
9.4 USER INTERACTION WITH STREAMING MODELS 322-323
10. SUMMARY 324-324
REFERENCES 325-325
Pedro Domingos A General Framework for Mining Massive Data Streams 326-326
TRANSCRIPT OF PRESENTATION 327-341
1 The Problem 342-343
2 The Framework 344-344
3 Time-Changing Data 345-345
Reference 346-346
TRANSCRIPT OF PRESENTATION 347-361
Andrew Moore kd-, R-, Ball-, and Ad- Trees: Scalable Massive Science Data Analysis 362-362
TRANSCRIPT OF PRESENTATION 363-388
Concluding Comments 389-389

Copyright Information

The National Academies Press (NAP) has partnered with Copyright Clearance Center's Marketplace service to offer a variety of options for reusing NAP content. Through Marketplace, you may request permission to reprint NAP content in another publication, course pack, secure website, or other media. Marketplace allows you to obtain permission instantly, pay related fees, and print a license directly from the NAP website. The complete terms and conditions of your reuse license can be found in the license agreement that is made available during the online order process. To request permission through Marketplace, you must create an account by filling out a simple online form. The following list describes the license reuses offered by the NAP through Marketplace:

  • Republish text, tables, figures, or images in print
  • Post on a secure Intranet/Extranet website
  • Use in a PowerPoint Presentation
  • Distribute via CD-ROM
  • Photocopy

If you have questions or comments concerning the Marketplace service, please contact:

Marketplace Support
International +1.978.646.2600
US Toll Free +1.855.239.3415
E-mail: support@copyright.com
marketplace.copyright.com

To request permission to distribute a PDF, please contact our Customer Service Department at customer_service@nap.edu.