Suggested Citation: "3.2 SESSION MEs." National Research Council. 2004. Statistical Analysis of Massive Data Streams: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/11098.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

A STREAM PROCESSOR FOR EXTRACTING USAGE INTELLIGENCE FROM HIGH-MOMENTUM INTERNET DATA

factor of ~0.7). The top 20% of subscribers generate ~80% of all traffic. The top 5% of the subscribers generate ~50% of all traffic. Another way to look at this is that 95% of the subscribers can end up subsidizing the top 5%. Simple flat-rate pricing plans for unlimited-usage broadband services will naturally force the NSP to charge high monthly fees, which in turn restricts the economic accessibility and uptake of the broadband services.

3. SOURCES AND TYPES OF DATA

Data sources can also be grouped by different device types, which vary considerably by the specific application. Device examples include network equipment (routers, switches, and gateways), application servers (Web, e-mail, game servers), general purpose computers, network probes, and database management systems (DBMS). We have found it useful to classify the types of data sources into usage, session, and reference categories based on how the data need to be processed. For many real-time sources of data we have defined the term metered event (ME) as an atomic data structure that encapsulates information about or relevant to usage of a service at a specific point in time or within a specific window in time.

3.1 USAGE MEs

Usage MEs contain metadata, which are data about data. At the lowest level of collection, usage MEs are often grouped into small records where each of the fields contains basic statistics about an atomic usage event such as a single phone call or an Internet data transfer. Typical fields found in usage MEs are source, destination, usage volume, start time, and end time. Depending on the context and applications involved, a usage ME may also include fields such as service type, quality of service level, termination conditions, or error codes.
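The fields named for a usage ME can be sketched as a small record type. The following Python sketch is illustrative only and is not the authors' implementation: the class name `UsageME`, the field names, the helper `total_usage_by_source`, and the sample values are all assumptions introduced here, and the unit of `usage_volume` (bytes, minutes, and so on) is left to the service being metered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UsageME:
    """Hypothetical record for one atomic usage event (all names are assumptions)."""
    source: str                # who originated the usage
    destination: str           # where the usage was directed
    usage_volume: float        # unit depends on the service being metered
    start_time: datetime
    end_time: datetime
    # Optional, context-dependent fields mentioned in the text.
    service_type: Optional[str] = None
    qos_level: Optional[str] = None
    termination_code: Optional[str] = None

    def duration(self) -> timedelta:
        """Length of the window in time covered by this event."""
        return self.end_time - self.start_time


def total_usage_by_source(events: Iterable[UsageME]) -> Dict[str, float]:
    """Sum usage volume per source across a collection of usage MEs."""
    totals: Dict[str, float] = {}
    for me in events:
        totals[me.source] = totals.get(me.source, 0.0) + me.usage_volume
    return totals


# Made-up sample events, for illustration only.
_t0 = datetime(2004, 1, 1, 12, 0, 0)
sample_events = [
    UsageME("subscriber-a", "server-1", 1000.0, _t0, _t0 + timedelta(seconds=30)),
    UsageME("subscriber-a", "server-2", 500.0, _t0, _t0 + timedelta(seconds=5),
            service_type="web"),
    UsageME("subscriber-b", "server-1", 250.0, _t0, _t0 + timedelta(seconds=2)),
]
usage_totals = total_usage_by_source(sample_events)
```

Keeping the record immutable reflects the description of an ME as an atomic data structure; per-subscriber aggregation then becomes a single pass over the events.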
In telephony, a common usage ME is the Call Detail Record (CDR) that is produced by the originating switch and records key information about the calling number (source), the called number (destination), and the length of the call in minutes (usage), among other fields. It is from CDRs that telephone companies construct their billing records and perform extensive analysis of subscriber behavior. In the Internet context a single ME might capture the usage details of a large file download or a small GIF image of a button. Because Web pages can be containers for references to many other Web pages or objects, clicking on a few pages of a complex Web site can result in hundreds of MEs. Considering this “session” of browsing a Web site as roughly comparable to a telephone call, it is easy to see that the number of MEs generated will be considerably higher than the single CDR produced from a phone call.

3.2 SESSION MEs

Session MEs provide accounting and state information about the user originating a


Massive data streams, large quantities of data that arrive continuously, are becoming increasingly commonplace in many areas of science and technology. Consequently development of analytical methods for such streams is of growing importance. To address this issue, the National Security Agency asked the NRC to hold a workshop to explore methods for analysis of streams of data so as to stimulate progress in the field. This report presents the results of that workshop. It provides presentations that focused on five different research areas where massive data streams are present: atmospheric and meteorological data; high-energy physics; integrated data systems; network traffic; and mining commercial data streams. The goals of the report are to improve communication among researchers in the field and to increase relevant statistical science activity.
