National Academies Press: OpenBook
Suggested Citation:"Report from Breakout Group." National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11098.


Report from Breakout Group

Instructions for Breakout Groups

MS. KELLER-McNULTY: There are three basic questions, issues, that we would like the subgroups to come back and report on. First of all, what sort of outstanding challenges do you see relative to the collection of material that was in the session? In particular there, we heard in all these cases that there are real specific constraints on these problems that have to be taken into consideration. We can't just assume we get the process infinitely fast, whatever we want.

The second thing is, what are the needed collaborations? It is really wonderful today. So far, we are hearing from a whole range of scientists. So, what are the needed collaborations to really make progress on these problems?

Finally, what are the mechanisms for collaboration? You know, Amy, for example, had a whole list of suggestions with her talk. So, the three things are the challenges, what are the scientific challenges, what are the needed collaborations, and what are some ideas on mechanisms for realizing those collaborations?

Report from High-Energy Physics Breakout Group

GROUP TWO PRESENTER: I only took a few notes, so I am trying to stall, but I am glad to see Mark Hansen has arrived. So, we talked about experimental physics. What is interesting is that there is sort of a matrix in my mind of what we discussed. I think Paul had mentioned there was a conference in Durham earlier this year in March, in which there were 100 physicists and two statisticians starting to scratch the surface of issues. There is a follow-up meeting in Stanford in September. Somebody named Brad Efron is the keynote speaker. So, presumably, there will be at least one statistician.

I think what was clear is that, sort of in the current context of what experimental physics is doing, there is a list of very specific questions that they think they would like answered. What we had discussed went beyond that.
We were really asking, gee, if we had some real statisticians involved, what deeper issues could we get into? I think that, after a good round of discussion for an hour, we decided there were probably a lot of really neat, cool things that could be done by somebody who would like to have a career-changing event in their lives. Alan Wilkes is feeling a little old, but he thinks he might be willing to do this.

On another good note, collaborations are clearly in their infancy. There are only a few statisticians in the world, is sort of my observation. So, there is a reason why there are not as many collaborations as there should be, perhaps. If you look at Doug's efforts in climatology, there are really some very established efforts. If you look at astronomy, you have had some efforts in the last four years that have really escalated to the next level, and I think physics is high on the list of making it to the next step. I think there are probably a lot of agencies here in this town that would help make that happen.

The thing that gets more to the issue at hand here is that there are a whole lot of statistical things involved in what is called triggering. So, things are going on in this detector and the question is when to record data, since they don't record all 22 terabytes a second, although they would like to, I guess, if they could. The interesting statistic that I heard was that, with what they do now, they think they get 99.1 percent of the interesting events among all the billions of ones that turn out not to be interesting. So, 99.1 is perhaps not a bad collection ratio.

So, much of the really interesting statistics that we have talked about is sort of the off-line type. In other words, once you have stored away these gigabytes of data, there are lots of interesting pattern-recognition problems and stuff. On the real-time data mining issue, we didn't pursue that particular issue very deeply.

What struck everybody was how time-sensitive the science is here, and that the way statisticians do science is sort of at the dinosaur pace and the way physicists do it is, if they only sleep three hours a night, the science would get done quicker, and it is a shame they can't stay up 24 hours a day. There was lots of discussion about magic tricks to make the science work quicker.

All in all, I think the conversation really grew in intensity and excitement for collaborations, and almost everybody seemed to have ideas about how they could contribute to the discussion. I think I would like to leave it there and ask anybody else in the group if they wanted to add something.
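
The triggering decision mentioned in the report, recording only events that look interesting and then asking what fraction of the truly interesting ones were kept, can be illustrated with a toy simulation. This is a minimal sketch, not the physicists' actual trigger: the Gaussian score model, the threshold of 2.0, the signal rate, and all function names are assumptions made for illustration.

```python
import random


def simulate_events(n, signal_rate, rng):
    """Generate n hypothetical events, each with a truth label and a noisy score.

    Assumption: interesting events score around 3.0, uninteresting ones
    around 0.0, both with unit-variance Gaussian noise.
    """
    events = []
    for _ in range(n):
        interesting = rng.random() < signal_rate
        mean = 3.0 if interesting else 0.0
        events.append({"interesting": interesting, "score": rng.gauss(mean, 1.0)})
    return events


def run_trigger(events, threshold):
    """Record only the events whose score meets the threshold."""
    return [e for e in events if e["score"] >= threshold]


def trigger_efficiency(events, kept):
    """Fraction of the truly interesting events that the trigger recorded."""
    interesting = sum(1 for e in events if e["interesting"])
    kept_interesting = sum(1 for e in kept if e["interesting"])
    return kept_interesting / interesting if interesting else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    events = simulate_events(100_000, signal_rate=0.001, rng=rng)
    kept = run_trigger(events, threshold=2.0)
    print(f"recorded {len(kept)} of {len(events)} events")
    print(f"efficiency on interesting events: {trigger_efficiency(events, kept):.3f}")
```

Lowering the threshold raises the efficiency but records more uninteresting events, which is the trade-off a trigger has to make when the full data rate cannot be stored.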


Massive data streams, large quantities of data that arrive continuously, are becoming increasingly commonplace in many areas of science and technology. Consequently, development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the NRC to hold a workshop to explore methods for analysis of streams of data so as to stimulate progress in the field. This report presents the results of that workshop. It provides presentations that focused on five different research areas where massive data streams are present: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant statistical science activity.
