National Academies Press: OpenBook
Suggested Citation:"TRANSCRIPT OF PRESENTATION." National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11098.


INTRODUCTION BY SESSION CHAIR

TRANSCRIPT OF PRESENTATION

MR. WILKINSON: All right, this session is on mining commercial streams of data, with Lee Rhodes, Pedro Domingos, and Andrew Moore. Only one of our speakers is from a company, although the other two, as you know, are involved in developing procedures, high-dimensional searches, mining, and other areas that are highly relevant to what is done by businesses.

I just want to highlight the three major market shares of applications of streaming data analysis, and these are quite large. Monitoring and process control involves such applications as General Electric with its turbines worldwide. There are many, many turbines, and shutting down a turbine can cost millions of dollars per day in their system. So, they need to maintain continuous multivariate data stream monitoring on those turbines, and they have real needs for display, alert, and analysis capabilities. E-commerce goes without saying. We all know pretty much where that lies. Many are putting e-commerce data and Web logs into databases, but Amazon and other companies are analyzing these in real time. Financial is another huge area for streaming data.

I thought I would give you a quick illustration of how that gets used. This is a Java application called Dancer that is based on the graphics algebra. We happen to be offline, of course, but the data feed we are putting into it now is simulating a natural stream coming in. These are Microsoft stock trades, and they are coming in at roughly 5 to 10 per second. On the right, you see the list of trading houses, like Lehman Brothers, and so on. For these trades, the symbol size is proportional to the volume of the trade. An up arrow is a buy order, a down arrow is a sell order, and a cross trade is a rectangle. These traders want to be able to do things like alter time, back it up, and reverse it.
Those of you who have seen the TiVo system for TV, video, know that these kinds of manipulations of time can be critical. This application, by the way, is not claiming this as a visualization. It is actually doing the calculations as soon as the real-time feed comes in. Notice all the scaling is being done on the fly. You can speed up the series. If you speed this up fast enough, it is a time machine, but I won't go into that.

I will show you just one more aspect of real-time graphics, and these are the kinds of graphics that you plug into what the rest of you guys do. When you develop algorithms, you can plug them into graphic displays of this sort. This one simulates the way I buy stock. Actually, I don't buy stock for this reason. It is just a simple exponential forecast. You can see the behavior. This is trading in Oracle and SBSS. This type of forecast represents exactly what I do, and probably some of you as well, which is: as soon as it starts going up a little bit, buy.

What is being done here is that the model is being computed in real time. So, in this kind of system, you get anywhere from 10 updates a second to 10,000 data events per second, and 90 percent of the effort in developing software in this area is in the data handling. How do you buffer 10,000 events per second and then render in roughly frames per second using the graphic system? So, the rendering system is a lot simpler than the actual data handling system.
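As an illustration of the two ideas in the preceding remarks, here is a minimal sketch in Python of a simple exponential forecast that is updated as each event arrives, fed from a buffer that is drained once per rendered frame. It is not the Dancer implementation, which the transcript does not describe. The class names `ExponentialForecaster` and `FrameBuffer`, the function `render_frame`, the smoothing weight `alpha`, and the buffer capacity are all assumptions made for the example.

```python
from collections import deque


class ExponentialForecaster:
    """Simple exponential smoothing, updated one observation at a time."""

    def __init__(self, alpha):
        # alpha is an assumed smoothing weight; larger values track recent prices more closely.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.level = None  # smoothed value, also used as the one-step-ahead forecast

    def update(self, price):
        if self.level is None:
            self.level = price  # the first observation initializes the forecast
        else:
            self.level = self.alpha * price + (1.0 - self.alpha) * self.level
        return self.level


class FrameBuffer:
    """Hypothetical buffer: accepts events at any rate, is drained once per frame."""

    def __init__(self, max_events):
        # A bounded deque silently drops the oldest events when it is full.
        self.events = deque(maxlen=max_events)

    def push(self, event):
        self.events.append(event)

    def drain(self):
        batch = list(self.events)
        self.events.clear()
        return batch


def render_frame(buffer, forecaster):
    """Consume every event received since the last frame.

    Returns (number of events consumed, current forecast).
    """
    batch = buffer.drain()
    for price in batch:
        forecaster.update(price)
    return len(batch), forecaster.level


if __name__ == "__main__":
    buffer = FrameBuffer(max_events=10_000)
    forecaster = ExponentialForecaster(alpha=0.5)
    for price in (10.0, 12.0, 14.0):  # made-up prices, not real trade data
        buffer.push(price)
    print(render_frame(buffer, forecaster))
```

The design choice sketched here is that the data-handling side (`push`) and the rendering side (`render_frame`) run at different rates, so the renderer only ever sees one batch per frame, however many events arrived in between.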

So, now we are going to see some presentations that will highlight how these systems work, and we will begin with Lee Rhodes from Hewlett-Packard, who will tell you about data collection on the Web.

Statistical Analysis of Massive Data Streams: Proceedings of a Workshop

Massive data streams, large quantities of data that arrive continuously, are becoming increasingly commonplace in many areas of science and technology. Consequently, development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the NRC to hold a workshop to explore methods for analysis of streams of data so as to stimulate progress in the field. This report presents the results of that workshop. It provides presentations that focused on five different research areas where massive data streams are present: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant statistical science activity.
