National Academies Press: OpenBook

Statistical Analysis of Massive Data Streams: Proceedings of a Workshop (2004)

Chapter: 5. IUM HIGH-LEVEL ARCHITECTURE

Suggested Citation: "5. IUM HIGH-LEVEL ARCHITECTURE." National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11098.
Page 313

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

A STREAM PROCESSOR FOR EXTRACTING USAGE INTELLIGENCE FROM HIGH-MOMENTUM INTERNET DATA

and queries anticipated. Unfortunately, Internet data can have a high number of dimensions, the variables can be highly skewed in both frequency and value, and some of the events or patterns of high interest can be very rare (e.g., a slow address scan by a potential intruder). To make matters worse, with the constant evolution of viruses and worms, the priority of what is important to examine is constantly changing. These complicating factors make the selection of data reduction techniques something of an art form.

Broadband service providers find themselves between a rock and a hard place. They need much richer information about their subscriber usage behavior, with strong business rationale on both the revenue and the cost side. The rock is the very high cost of building and managing these large datasets. The hard place is that most general-purpose data analysis tools presume that the data to be analyzed exists or will exist in a database. No database, no analysis.

What if you could extract some meaningful information about a data stream before you had to aggregate and commit it to hard storage? This idea, by itself, is not exactly new. But what is needed in a number of these high-momentum, complex data stream situations is a high-performance, flexible, and adaptive stream processing and analysis platform as a pre-processor to long-term storage and other conventional analysis systems.

In this context, high performance means the ability to collect and process data at speeds much faster (>10X) than most common database systems; flexible implies a modular architecture that can be readily configured with new or specialized components as needs evolve; adaptive implies that certain key components can change their internal logic or rules on the fly.
These changes could be the result of a change in the input stream or a detected change in the reference data from the environment, or could come from an analyst's console.

Starting in 2000, we set out to build a platform with these goals in mind. The remainder of this article discusses the progress we have made.

5. IUM HIGH-LEVEL ARCHITECTURE

Figure 2 is a high-level view of the IUM architecture. Streams of data flow left to right. The purple boxes on the left represent different sources of raw data within a service provider's network infrastructure. The blue boxes on the right represent the target business applications or processes that require distinctly different algorithms or rule sets applied to the streams of data. The gold triad of a sphere, a rectangular prism, and a cylinder represents a single instance of an IUM server software agent that we call a Collector. Each Collector is capable of merging multiple streams of input data and producing multiple output streams, each of which can be processed by a different set of rules.

The basic unit of scalability is the Collector. The first dimension of scaling is horizontal (actually front to back in the graphic) in that different input streams can be processed in parallel by different Collectors on the left. The second dimension of scale can be achieved through the processing speed of the hardware hosts. The third dimension of scale can be achieved by using pipelining techniques that partition the overall processing task for the various target applications into smaller sequential tasks that can execute in parallel. The
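
The Collector behavior described in this section (several input streams merged, several output streams produced, each governed by its own rule set, with rules replaceable while the system runs) can be illustrated with a minimal Python sketch. This is not the IUM implementation: the `Collector` class, its `run` and `replace_rules` methods, the `(timestamp, subscriber, bytes)` record layout, and the stream names `billing` and `heavy_usage` are all hypothetical, chosen only for illustration. The sketch also assumes that each input stream is already ordered by timestamp.

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (timestamp, subscriber, bytes transferred).
Record = Tuple[float, str, int]
Rule = Callable[[Record], bool]


class Collector:
    """Toy stand-in for a Collector: merge inputs, fan out by rule set."""

    def __init__(self, rule_sets: Dict[str, List[Rule]]) -> None:
        # One named output stream per rule set.
        self.rule_sets = dict(rule_sets)

    def merge(self, *streams: Iterable[Record]) -> Iterator[Record]:
        # heapq.merge is lazy and assumes every input is already sorted
        # by the key (here, the timestamp in position 0).
        return heapq.merge(*streams, key=lambda record: record[0])

    def replace_rules(self, name: str, rules: List[Rule]) -> None:
        # "Adaptive" in the simplest sense: swap a rule set between runs.
        self.rule_sets[name] = rules

    def run(self, *streams: Iterable[Record]) -> Dict[str, List[Record]]:
        outputs: Dict[str, List[Record]] = {name: [] for name in self.rule_sets}
        for record in self.merge(*streams):
            for name, rules in self.rule_sets.items():
                # A record reaches an output only if every rule accepts it.
                if all(rule(record) for rule in rules):
                    outputs[name].append(record)
        return outputs


# Two made-up input streams, each sorted by timestamp.
router_a = [(1.0, "alice", 500), (3.0, "bob", 20_000)]
router_b = [(2.0, "carol", 70_000), (4.0, "alice", 900)]

collector = Collector({
    "billing": [lambda r: r[2] > 0],
    "heavy_usage": [lambda r: r[2] >= 10_000],
})
results = collector.run(router_a, router_b)

# Tighten one rule set without touching the other output stream.
collector.replace_rules("heavy_usage", [lambda r: r[2] >= 50_000])
updated = collector.run(router_a, router_b)
```

Here `results["billing"]` holds all four records in timestamp order, while `results["heavy_usage"]` holds only the two large transfers; after the rule swap, `updated["heavy_usage"]` keeps just the 70,000-byte record. A production system would process unbounded streams incrementally rather than collecting outputs in lists.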


Massive data streams, large quantities of data that arrive continuously, are becoming increasingly commonplace in many areas of science and technology. Consequently development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the NRC to hold a workshop to explore methods for analysis of streams of data so as to stimulate progress in the field. This report presents the results of that workshop. It provides presentations that focused on five different research areas where massive data streams are present: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant statistical science activity.
