3
Second-Day Discussion
After the first presentations of the day, workshop participants began several hours of open
discourse recapping and further exploring topics raised in the morning and on the previous day.
Numerous anecdotes demonstrated that “drowning in data” was not a new problem for the
intelligence community or for DoD at large. What drives the coming paradigm shift is that big data
presents more than just a “volume” problem. According to many workshop participants, the big data
challenge has at least three aspects (technical, temporal, and personnel), each with very different
implications.
TECHNICAL
The technical aspect of big data encompasses a range of obstacles that are often filed under the
labels “just hardware,” “just software,” or “just human factors.” Investment by both the government and
the private sector is ongoing in each of these areas, with a growing understanding that the greatest
progress lies in attending to all three from a unified perspective rather than treating them as
independent investments.
TEMPORAL
Discussion among the participants revealed two very different time-based challenges: real-time
(the increasing rate of data production that accompanies some real-time sensor applications) and
retrospective (the increasing amount of data accumulated over longer and longer periods of time). Many
participants offered examples of how rising sensor data acquisition rates were making it more and more
difficult to transmit data in real time to another point for analysis. This reality has prompted a great
deal of attention to methods of digesting, parsing, or triaging data so that only the actionable parts
are transmitted. The discussion also touched on the exponentially increasing size of historical data
sets, which is fueling interest in inference techniques.
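The report does not say which triage methods participants had in mind. As one minimal illustration, assuming a stream of scalar sensor readings and a hypothetical rule that a reading is "actionable" when it departs sharply from recent behavior, the filter below forwards only those readings and discards the rest at the sensor. The `Reading` type, the exponential-moving-average baseline, and the `alpha` and `threshold` values are all assumptions chosen for clarity, not anything described at the workshop.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Reading:
    """One sensor sample (hypothetical record layout)."""
    sensor_id: str
    timestamp: float
    value: float


def triage(readings: Iterable[Reading],
           alpha: float = 0.1,
           threshold: float = 5.0) -> Iterator[Reading]:
    """Yield only readings that deviate sharply from a running baseline.

    The baseline is an exponential moving average (EMA). alpha and
    threshold are illustrative tuning knobs, not values from the report.
    """
    baseline: Optional[float] = None
    for r in readings:
        if baseline is None:
            baseline = r.value  # the first sample only seeds the baseline
            continue
        if abs(r.value - baseline) > threshold:
            yield r  # "actionable": worth spending bandwidth on
        # Update the baseline with every sample, forwarded or not.
        baseline = alpha * r.value + (1 - alpha) * baseline


if __name__ == "__main__":
    samples = [Reading("s1", float(t), v)
               for t, v in enumerate([10.0, 10.5, 9.8, 30.0, 10.2])]
    sent = list(triage(samples))
    print([r.value for r in sent])
    print(f"transmitted {len(sent)} of {len(samples)} readings")
```

Only the spike at 30.0 is forwarded in this run; in practice the rule for "actionable" would come from the mission, and a simple deviation test like this one is only a stand-in.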
PERSONNEL
Many workshop participants argued that individuals who are trained in and work at the cutting
edge of big data are currently in short supply and that the supply is dwindling even as demand
continues to grow. Given that a large number of new advanced degrees in this area are awarded to
foreign nationals who then return to their countries of origin, some argued that efforts to recruit
and retain such individuals in positions in the United States should be redoubled.
Some workshop participants said that machines and humans must learn to work together to
exploit the burgeoning world of big data. For example, Garry Kasparov’s 2005 freestyle chess
tournament, in which teams could be composed of any combination of humans and computers, was
won by two amateur chess players running open source chess engines on simple off-the-shelf
laptops—not by grandmasters, prodigies, or chess supercomputers. Big data analysis requires much
human interaction and guidance, and the optimal combination is not necessarily the best machines
and the brightest humans. It may instead be the right interface between human and machine.
10 REPORT OF A WORKSHOP ON BIG DATA
BLUE PROCESS
Asher Sinensky of Palantir Technologies
Asher Sinensky of Palantir started his talk by using a chess metaphor for human and machine
interaction. He argued that humans have an uncanny ability to make decisions and analyze ideas, an
ability unique to humankind, and that for the technological enterprise it is therefore important to
keep humans in the analysis process. Humans are key to conceptualizing new innovations and ideas
once the data have been analyzed. See Box 3-1 for Sinensky’s full comments.
David Thurman of Pacific Northwest National Laboratory
David Thurman, computing strategy lead at PNNL, asserted that computing applications will
increasingly move toward architectures that combine the different strengths of customized hardware
and software to evaluate different types of distributed data, bringing many types of computing
together. A key issue is the ability to derive results from data generated across different
agencies.
He said that in 2005 PNNL created a new computer architecture that analyzes data where it
resides rather than copying the data and transferring it to one central location. This
architecture assumes the availability of new, highly efficient algorithms tailored to distributed,
heterogeneous data sets. Many of these algorithms are derived from rapidly evolving commercial
off-the-shelf (COTS) big data processing applications.
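The report does not describe how the PNNL architecture works internally. As a minimal sketch of the general idea of analyzing data in place, assuming the question being asked is a simple summary statistic, each site below reduces its own records to a tiny partial summary, and only those summaries are combined centrally. The site names, the `Summary` type, and the choice of mean and variance are illustrative assumptions; the merge step uses the standard pairwise variance update of Chan, Golub, and LeVeque, not anything attributed to PNNL.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Summary:
    """Compact partial result a site ships instead of its raw data."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from this summary's mean


EMPTY = Summary(0, 0.0, 0.0)


def summarize_locally(values: Sequence[float]) -> Summary:
    """Runs at the site that owns the data; raw values never leave it."""
    if not values:
        return EMPTY
    mean = sum(values) / len(values)
    return Summary(len(values), mean, sum((v - mean) ** 2 for v in values))


def combine(a: Summary, b: Summary) -> Summary:
    """Merge two partial summaries with the pairwise (Chan et al.) update."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Summary(n, mean, m2)


def global_mean_and_variance(sites: Dict[str, List[float]]) -> Tuple[float, float]:
    """Combine one small Summary per site into global mean and population variance."""
    total = reduce(combine, (summarize_locally(v) for v in sites.values()), EMPTY)
    if total.count == 0:
        raise ValueError("no data at any site")
    return total.mean, total.m2 / total.count


if __name__ == "__main__":
    # Hypothetical sites; in a real deployment each summary is computed remotely.
    sites = {"agency_a": [1.0, 2.0, 3.0], "agency_b": [4.0, 5.0], "agency_c": [6.0]}
    print(global_mean_and_variance(sites))
```

Each site transmits three numbers no matter how much data it holds. Statistics that do not decompose this neatly (exact medians, for example) would need different techniques.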
CLOSING REMARKS
At the end of the April 2012 workshop, the chair asked committee members and speakers in
attendance to make any final comments on what they had heard over the two days. These comments
serve as a summary of the workshop:
Ken Kress—“‘Big data’ is more than just a change of scale; it is a more persistent threat than we
have previously observed.”
Al Velosa—“Progress in the human-machine interface will reduce friction and will allow
capability enhancement for the individual, but we will most likely experience a fluidity of people
more pronounced than we have ever seen.”
David Thurman—“I am struck by how different are the threat and impact of big data versus
ballistic missiles and other classical threats because of the acceleration of commercially driven
offerings, none of which are as controllable as the classical threat domains.”
Asher Sinensky—“Now more than at any time in history, we must demand flexibility and
adaptability in the tool sets we create for the problem at hand, because those problems are changing
faster than ever before, and we don’t have time to create a new generation of inflexible tools to
counter each new twist.”
Mikhail Shapiro—“The highest value should be placed on the human capital, the engineer, and
that asset is an asymmetric economic issue.”
Brian Ballard—“The big question is how to organize the data and make it accessible to the
problem solvers. Cyber is its own category, but big data is a force multiplier of massive scale, with
far-reaching implications. Succeeding here will allow us to ‘own the net,’ delivering advantages that
we posit today but, even more importantly, advantages of which we are not yet even aware.”
BOX 3-1
Chess Analogy
Asher Sinensky
One of the most important years in the history of big data was 1997, the year that Deep Blue
beat Garry Kasparov at chess. At first blush, this might not seem like a big data challenge; chess,
after all, has only 64 squares, 32 pieces, 6 different types of pieces, and only two players. However,
when chess is analyzed more deeply, its true complexity emerges. Claude Elwood Shannon, the
so-called father of information theory, showed that the number of legal configurations a chess
board could realize is approximately 10⁴³. This is obviously an enormous number and is
sometimes referred to as the Shannon number. A study* by the University of Southern California
in early 2011 estimated the world’s total digital storage to be on the order of 10²¹ bytes. In this
light, chess is clearly huge when considered against the scale of the digital world. Even beyond that,
a Dutch computer scientist, Louis Victor Allis, estimated the game-tree complexity of chess to
be approximately 10¹²³. That number is roughly 40 orders of magnitude greater than the estimated
number of atoms in the entire universe. The act of computationally playing chess is clearly a “big
data” problem, and 1997 showed us that computers can do this better than humans can.
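The orders-of-magnitude comparisons in this paragraph can be checked with a few lines of arithmetic. This sketch recomputes Shannon's rough position count, 64!/(32!·(8!)²·(2!)⁶), which treats same-colored pieces of the same kind as interchangeable and ignores captures, promotions, and legality, and then compares exponents. The atom count of about 10⁸⁰ is an assumption (a commonly quoted estimate), not a figure stated in the box.

```python
import math

# Shannon's rough count: place 32 pieces on 64 squares, leaving 32 empty.
# Per side: 8 identical pawns, and identical pairs of rooks, knights, and
# bishops (six pairs in total); kings and queens are unique.
positions = math.factorial(64) // (
    math.factorial(32) * math.factorial(8) ** 2 * math.factorial(2) ** 6
)

exp_positions = round(math.log10(positions))  # order of magnitude of the count
exp_storage = 21      # world digital storage in bytes (order of magnitude), per the box
exp_game_tree = 123   # Allis's game-tree complexity estimate, per the box
exp_atoms = 80        # ASSUMPTION: common estimate of atoms in the observable universe

print(f"positions ~ 10^{exp_positions}")
print(f"positions exceed world storage by ~{exp_positions - exp_storage} orders of magnitude")
print(f"game tree exceeds atom count by ~{exp_game_tree - exp_atoms} orders of magnitude")
```

Under the 10⁸⁰ assumption the game-tree estimate exceeds the atom count by about 43 orders of magnitude, consistent with the box's "roughly 40."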
The next important year in this story is 2005. In that year, Garry Kasparov decided to host his
own chess tournament. In light of Deep Blue, Kasparov had become extremely interested in the
capabilities of computational systems but also in the ways that computers and humans approach
problem solving. Kasparov’s 2005 chess tournament was a free-style tournament in which teams
could be composed of any combinations of humans and computers available. Grandmasters,
prodigies, and chess supercomputers could team up to form super teams. By 2005, it had already
been shown that a chess master teamed with a chess supercomputer was far more capable than a
supercomputer alone. Computers and humans have different and complementary analytic
strengths: computers are highly precise and don’t make mistakes, while humans can use
intuition and lateral thinking. These skills can be combined to build truly formidable chess
opponents. However, 2005 was different. The winning team, ZackS, performed so well that many
thought it was actually Kasparov’s team. The truth, however, was much more intriguing. It turns
out that ZackS was actually two amateur chess players running open source chess engines on
simple off-the-shelf laptops—no grandmasters, no prodigies, no chess supercomputers.
This was a remarkable outcome that surprised everyone, including Kasparov himself.
Kasparov drew the only conclusion he could: “Weak human + machine + better process was
superior to a strong computer alone and, more remarkably, superior to a strong human + machine
+ inferior process.” This revelation marks an essential evolution of the lesson of Deep Blue in
1997: humans working together with machines can solve big data challenges better
than computers alone. Tackling big data means more than just algorithms, high-performance
computing, and massive storage—it means leveraging the abilities of the human mind.
______________________________________
*See http://www.computerworld.com/s/article/9209158/Scientists_calculate_total_data_stored_to_date_295_exabytes.
See also ZackS: http://chessbase.com/newsdetail.asp?newsid=2461; “Friction in Human-Computer
Symbiosis: Kasparov on Chess” at http://blog.palantir.com/2010/03/08/friction-in-human-computer-symbiosis-kasparov-on-chess/;
and “A Rigorous Friction Model for Human-Computer Symbiosis” at
http://blog.palantir.com/2010/06/02/a-rigorous-friction-model-for-human-computer-symbiosis/.
Appendixes