Frontiers in Massive Data Analysis

National Research Council

doi:10.17226/18374

Consensus Study Report

Frontiers in Massive Data Analysis

(2013)

Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity, and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data.

Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale--terabytes and petabytes--is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge--from computer science, statistics, machine learning, and application disciplines--that must be brought to bear to make useful inferences from massive data.

Contributor(s): National Research Council; Division on Engineering and Physical Sciences; Board on Mathematical Sciences and Their Applications; Committee on the Analysis of Massive Data; Committee on Applied and Theoretical Statistics

Suggested Citation

National Research Council. 2013. Frontiers in Massive Data Analysis. Washington, DC: The National Academies Press. https://doi.org/10.17226/18374.

Publication Info

190 pages | 6 x 9

ISBNs:

Paperback: 978-0-309-28778-4
Ebook: 978-0-309-28781-4

DOI: https://doi.org/10.17226/18374

Chapters	Pages
Front Matter	i-xiv
Summary	1-10
1 Introduction	11-21
2 Massive Data in Science, Technology, Commerce, National Defense, Telecommunications, and Other Endeavors	22-40
3 Scaling the Infrastructure for Data Management	41-57
4 Temporal Data and Real-Time Algorithms	58-65
5 Large-Scale Data Representations	66-81
6 Resources, Trade-offs, and Limitations	82-92
7 Building Models from Massive Data	93-119
8 Sampling and Massive Data	120-132
9 Human Interaction with Data	133-145
10 The Seven Computational Giants of Massive Data Analysis	146-160
11 Conclusions	161-166
Appendixes	167-168
Appendix A: Acronyms	169-170
Appendix B: Biographical Sketches of Committee Members	171-176

Videos

Scott Weidman, director of the Board on Mathematical Sciences and Their Applications at the NRC, explains the charge and key recommendation of the report, along with the challenges and opportunities that massive data presents.
