IBM believes that such applications are only beginning, demonstrating the start of a possible next wave of business applications. Further, the combination of astute data analytics from IBM and continued contact with customers is key to success. Smith posited that integrating social media analytics is critical to reducing time to value.

DISCUSSION OF BIG DATA

Darrell Long and Gilman Louie

Workshop attendee Gilman Louie and committee member Darrell Long next led a discussion on big data challenges. The challenges of big data are difficult to categorize, primarily because the exact definition of "big data" varies according to the intentions of the speaker. As such, several participants noted that it is important to specify precisely which problem applies in which context and then to approach the problem from that definition. The robust dialogue that followed covered four different ways to view the problem: volume³ of data (too much data), ubiquity of sensor data, data fusion challenges, and too much of a certain type of data.

Big Data as Too Much Data

Beginning with the problem of big data as simply too much data, or overwhelming amounts of data, many participants felt that the challenge was not new. It was pointed out that too much data had always been a problem, particularly for aggressive collectors of data, as large governments tend to be, and that an excess of data typically inspires new approaches for handling it. Several participants, however, noted two elements of the current era of big data that seemed to be different from previous eras. The first element was the relationship of data to individuals, uniquely and globally; i.e., the big data challenges of this era seem to be mostly about the data associated with individual movements, preferences, sentiments, and thoughts. This situation differs from previous eras, in which big data tended to be generated as a result of economic activities, wars, and science. The individualization of big data stems primarily from the social networking phenomenon but is also enabled by the credit and debit card industries and the logistics industry, particularly point-of-sale applications. Other workshop participants noted that the second element was the importance of algorithmic analysis of data; i.e., the use of math, machine learning, and human emotion-behavior analyses (such as sentiment analyses) seems to have made both a quantitative and a qualitative difference in how data is used and interpreted.

Big Data as Ubiquitous Sensor Data

Several workshop participants noted that part of the forcing function of the big data problem is the so-called data ingest challenge: more data is originating from more sensors. These sensors range from social network updates (Facebook posts, tweets, blog posts, etc.) to embedded, distributed utility functions (Wi-Fi repeaters, cameras, financial transactions). Some participants suggested that the greater variety of sources of data and data inputs requires different approaches to data integration and analysis, and also contributes to data communication and storage challenges.


³ Some believe that, unlike in the “old world,” where volume was a problem, in the big data world volume is a friend: even dirty data can increase the “resolution” of an entity. In the big data world, data is processed differently. In the old world, data was processed by reducing collections down to semi-finished and finished intelligence (known as the INTs) and then re-integrating it (all-source analysis) to produce knowledge; in the big data world, data is computed all at once and across different data types to reveal or allow discovery of knowledge and intelligence.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.