After the first presentations of the day, workshop participants began several hours of open discourse recapping and further exploring topics raised in the morning and on the previous day. Numerous anecdotes demonstrated that “drowning in data” is not a new problem for the intelligence community or for DoD at large, and that the coming paradigm shift arises because big data presents more than just a “volume” problem. According to many workshop participants, the big data challenge has at least three aspects—technical, temporal, and personnel—each with very different implications.
The technical aspect of big data encompasses a range of obstacles that hide under the labels of “just hardware” or “just software” or “just human factors.” Investment by both the government and the private sector is ongoing in each of these areas, with a growing understanding that the greatest progress lies in attending to all three from a unified perspective rather than treating them as independent investments.
Discussion among the participants revealed two very different time-based challenges: real-time (an increasing rate of data production accompanying some real-time sensor applications) and retrospective (an increasing amount of data accumulated over an ever-longer period of time). Many contributed examples of how rising sensor data acquisition rates are making it more and more difficult to transmit data in real time to another point for analysis. This reality has prompted a great deal of attention to methods of digesting, parsing, or triaging data so that only the actionable parts are transmitted. The discussion also touched on the exponentially growing size of historical data sets, which is fueling interest in inference techniques.
Many workshop participants argued that individuals who are trained in and work at the cutting edge of big data are currently in short supply and that the supply is dwindling even as demand continues to grow. Given that a large number of new advanced degrees in this area are awarded to foreign nationals who then return to their countries of origin, some argued that efforts to recruit and retain such individuals in positions in the United States should be redoubled.
Some workshop participants said that machines and humans must learn to work together to exploit the burgeoning world of big data. For example, Garry Kasparov’s 2005 free-style chess tournament, in which teams could be composed of any combination of humans and computers, was won by two amateur chess players running open-source chess engines on simple off-the-shelf laptops—not by grandmasters, prodigies, or chess supercomputers. Big data analysis requires much human interaction and guidance, and the optimal combination is not necessarily the best machines and the brightest humans; it may instead be the right interface between human and machine.
Asher Sinensky of Palantir Technologies
Asher Sinensky of Palantir opened his talk with a chess metaphor for human and machine interaction. Humans, he noted, have an uncanny and uniquely human ability to make decisions and analyze ideas. He stated that it is important for the technological enterprise that humans remain in the analysis process: they are key to conceptualizing new innovations and new ideas after data analysis. See Box 3-1 for Sinensky’s full comments.
David Thurman of Pacific Northwest National Laboratory
David Thurman, computing strategy lead at PNNL, asserted that future computing applications will move toward architectures that bring together the different strengths of customized hardware and software capabilities to evaluate different types of distributed data (bringing together many types of computing). A key issue is being able to derive results from data generated across different agencies.
He said that in 2005 PNNL created a new computer architecture that analyzes data where it resides rather than copying the data and transferring it to one central location. This architecture assumes the availability of new, highly efficient algorithms tailored to distributed, heterogeneous data sets. Many of these algorithms are derived from the rapidly evolving commercial off-the-shelf (COTS) big data processing applications.
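The "analyze data where it resides" idea can be sketched in a few lines. The sketch below is purely illustrative and is not PNNL's actual architecture (the node names and data are invented): each site computes a small local summary, and only those summaries, not the raw records, travel to the central point.

```python
# Illustrative sketch of in-place analysis: compute partial results at each
# node holding the data, and transmit only those partial results centrally.

# Hypothetical per-site data stores (invented for illustration)
node_data = {
    "site_a": [3, 5, 8, 13],
    "site_b": [2, 2, 9],
    "site_c": [7, 1, 4, 4, 6],
}

def local_summary(records):
    """Runs at the node that holds the data; only (count, total) leaves the site."""
    return len(records), sum(records)

# The central site combines the small summaries instead of the raw data
summaries = [local_summary(recs) for recs in node_data.values()]
count = sum(c for c, _ in summaries)
total = sum(t for _, t in summaries)
print(total / count)  # global mean computed without centralizing raw records
```

The design point is that the per-site summaries are tiny and fixed-size regardless of how much raw data each site holds, which is what makes the approach attractive when transmission, not computation, is the bottleneck.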
At the end of the April 2012 workshop, the chair asked committee members and speakers in attendance to offer any final comments on what they had heard over the two days. These comments serve as a summary of the workshop:
Ken Kress—“‘Big data’ is more than just a change of scale—it is a more persistent threat than we have previously observed.”
Al Velosa—“Progress in the human-machine interface will reduce friction and will allow capability enhancement for the individual, but we will most likely experience a fluidity of people more pronounced than we have ever seen.”
David Thurman—“I am struck by how different are the threat and impact of big data versus ballistic missiles and other classical threats because of the acceleration of commercially driven offerings, none of which are as controllable as the classical threat domains.”
Asher Sinensky—“Now more than at any time in history, we must demand flexibility and adaptability in the tool sets we create for the problem at hand, because those problems are changing faster than ever before, and we don’t have time to create a new generation of inflexible tools to counter each new twist.”
Mikhail Shapiro—“The highest value should be placed on the human capital, the engineer, and that asset is an asymmetric economic issue.”
Brian Ballard—“The big question is how to organize the data and make it accessible to the problem solvers. Cyber is its own category, but big data is a force multiplier of massive scale, with far-reaching implications. Succeeding here will allow us to ‘own the net,’ delivering advantages that we posit today but even more importantly, advantages of which we are not yet even aware.”
One of the most important years in the history of big data was 1997, the year that Deep Blue beat Garry Kasparov at chess. At first blush, this might not seem like a big data challenge; chess after all has only 64 squares, 32 pieces, 6 different types of pieces, and only two players. However, when chess is analyzed more deeply, its true complexity emerges. Claude Elwood Shannon, the so-called father of information theory, showed that the number of legal configurations a chess board could realize is approximately 10^43. This is obviously an enormous number and is sometimes referred to as the Shannon number. A study* by the University of Southern California in early 2011 estimated the world’s total digital storage to be on the order of 10^21 bytes. In this light, chess is clearly huge when considered against the scale of the digital world. Even beyond that, a Dutch computer scientist, Louis Victor Allis, estimated the game-tree complexity of chess to be approximately 10^123. That number is roughly 40 orders of magnitude greater than the estimated number of atoms in the entire universe. The act of computationally playing chess is clearly a “big data” problem, and 1997 showed us that computers can do this better than humans can.
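The magnitude comparisons above reduce to simple exponent arithmetic. The check below uses the approximate figures cited in the text, plus the commonly cited estimate of about 10^80 atoms in the observable universe (a figure assumed here, since the text states only the resulting gap):

```python
import math

# Approximate order-of-magnitude estimates from the text
shannon_number = 1e43    # legal chess positions (Shannon's estimate)
digital_storage = 1e21   # world digital storage in bytes (USC study, early 2011)
game_tree = 1e123        # game-tree complexity of chess (Allis's estimate)
atoms_universe = 1e80    # commonly cited estimate, assumed for this comparison

# Legal positions vs. total digital storage: chess dwarfs the digital world
print(math.log10(shannon_number / digital_storage))  # ~22 orders of magnitude

# Game tree vs. atoms in the universe: the "roughly 40 orders" gap
print(math.log10(game_tree / atoms_universe))        # ~43 orders of magnitude
```

With the 10^80 atoms figure the gap comes out to about 43 orders of magnitude, consistent with the text's "roughly 40."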
The next important year in this story is 2005. In that year, Garry Kasparov decided to host his own chess tournament. In light of Deep Blue, Kasparov had become extremely interested in the capabilities of computational systems but also in the ways that computers and humans approach problem solving. Kasparov’s 2005 chess tournament was a free-style tournament in which teams could be composed of any combination of humans and computers available. Grandmasters, prodigies, and chess supercomputers could team up to form super teams. By 2005, it had already been shown that a chess master teamed with a chess supercomputer was far more capable than a supercomputer alone. Computers and humans have different and complementary analytic strengths: computers don’t make mistakes and are highly precise, while humans can use intuition and lateral thinking. These skills can be combined to build truly formidable chess opponents. However, 2005 was different. The winning team, ZackS, performed so well that many thought it was actually Kasparov’s team. The truth was much more intriguing. It turns out that ZackS was actually two amateur chess players running open-source chess engines on simple off-the-shelf laptops—no grandmasters, no prodigies, no chess supercomputers.
This was a remarkable outcome that surprised everyone, including Kasparov himself. Kasparov drew the only conclusion he could: “Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.” This revelation marks the essential evolution of the lesson of Deep Blue in 1997: humans working together with machines can solve big data challenges better than computers alone. Tackling big data means more than just algorithms, high-performance computing, and massive storage—it means leveraging the abilities of the human mind.
* See http://www.computerworld.com/s/article/9209158/Scientists_calculate_total_data_stored_to_date_295_exabytes. See also ZackS - http://chessbase.com/newsdetail.asp?newsid=2461; “Friction in Human-Computer Symbiosis: Kasparov on Chess” at http://blog.palantir.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/; and “A Rigorous Friction Model for Human-Computer Symbiosis” at http://blog.palantir.com/2010/06/02/a-rigorous-friction-model-for-human-computersymbiosis/.