On December 13 and 14, 2002, the Committee on Applied and Theoretical Statistics of the National Research Council conducted a two-day workshop that explored methods for analyzing streams of data, with the aim of stimulating further progress in this field. To encourage cross-fertilization of ideas, the workshop brought together a wide range of researchers who deal with massive data streams in different contexts. The presentations focused on five major areas of research: atmospheric and meteorological data, high-energy physics, integrated data systems, network traffic, and the mining of commercial data streams.
The workshop was organized to allow researchers from different disciplines to share their perspectives on how to use statistical methods to analyze massive streams of data. The meeting focused on situations in which massive amounts of data arrive continually, making it necessary to analyze or reanalyze the data very frequently. Often there is so much data that only a short time window’s worth can be stored economically, so summarization strategies are needed.
The overall goals of this CD report are to improve communication among various communities working on problems associated with massive data streams and to increase relevant activity within the statistical sciences community. Included in this report are the agenda of the workshop, the full and unedited text of the workshop presentations, and biographical sketches of the speakers. The presentations represent independent research efforts on the part of academia, the private sector, federally funded laboratories, and government agencies, and as such they provide a sampling rather than a comprehensive examination of the range of research and research challenges posed by massive data streams. In addition to these proceedings, a set of more rigorous, technical papers corresponding to the workshop presentations has also been published separately as a 2003 special issue of the Journal of Computational and Graphical Statistics.
These proceedings represent the viewpoints of their authors only and should not be taken as a consensus report of the Board on Mathematical Sciences and Their Applications or the National Research Council.