GLOBAL SITUATIONAL AWARENESS

TRANSCRIPT OF PRESENTATION

MR. BEASON: Thanks, Sallie. This will be from the perspective of a physicist. While I was flying out here, I sat by a particle physicist. I, myself, am a plasma physicist. I found out that, even though we were both from the same field, we really couldn't understand each other. So, good luck on this.

Global situational awareness is a thrust that the government is undertaking, largely due to the events surrounding September 11, 2001. Basically, it is a thrust to give decision makers the ability to assess the socioeconomic or tactical battlefield situation in near-real time. The vision is to be able to do it anywhere, any time, as opposed to everywhere, all the time. Now, everywhere all the time may never be achieved, and we may never want to achieve that, especially because of the legal ramifications.

With that vision of being able to monitor nearly everywhere, any time, what I am going to do is walk you through some of the logic involved. First of all, what do we mean by that? What do we mean by the sensors? Then, really, get to the core of the matter, which is how we handle the data. That really is the big problem: not only the assimilation of the data, but understanding it and trying to fuse it together. We will mine it, fuse it, and then try to predict what is going to happen.

Here is an outline of the talk. What are the types of capabilities that I am talking about, in the concept of operations? Then, I will spend a little bit of time on the signatures. That is kind of the gravy here. Again, I am a physicist, and this is where the fun part is. What do we collect and why, a little bit of the physics behind it, and then how do we handle the data, how do we mine it, and how do we fuse it?

What scale of problem am I talking about? If we try to decouple the problem from law enforcement and from space situational awareness, and just look, for example, at battlefield awareness, what people are talking about in the Defense Department is some kind of grid on the order of 100 by 100 kilometers, which is 10^4 square kilometers, then up to 50 kilometers high, with the resolution known down to a meter. That is something like 10^14 points, and that is just the battlefield itself. So, the problem just staggers your mind.

Let me give some examples of what we mean by global capabilities. First of all, the argument is being made that it is more than visible; that is, it is more than looking at photographs, more than looking at imagery. It includes all types of sensors, and I am going to walk you through this in a minute. It also includes cyberspace, Web sites and e-mail traffic, especially if there is a flurry of activity.

What you would like is the ability to look at what we call known sites, and to revisit them on a time scale over which things don't change very much. It could be a building going up, and you may only have to revisit this site perhaps weekly, daily or even hourly, if you would like. These are sites where something may be going on, or even Web sites, but you know that the change over that interval is not very much, so you don't have to revisit it too often.

The second thing is that you really want to have the capability of monitoring for specific events. If there is a nuclear explosion, if missiles are being moved around, if terrorists are meeting somewhere, you want to have those specific events.
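As a quick back-of-the-envelope check of the grid size just described (a sketch added for illustration; the 100 km by 100 km by 50 km volume and 1-meter resolution come from the talk, everything else is assumed):

```python
# Rough count of grid points for the battlefield-awareness example:
# a 100 km x 100 km footprint, 50 km high, resolved to 1 meter.
nx = 100_000   # 100 km at 1 m spacing
ny = 100_000
nz = 50_000    # 50 km at 1 m spacing

points = nx * ny * nz
print(f"{points:.1e} grid points")  # 5.0e+14, i.e. on the order of 10^14
```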
You want to be able to telescope down to them, to be able to tap into them. You want to be able to do it on a global scale. Second of all, for those kinds of activities, you may have to have some kind of a tip-off. You may not know through any kind of intercepts, like telephone conversations, or through visual intelligence; you may have to have human intelligence direct you to what is going to be happening. So, that is what you would like on a global scale.

On a local scale, you would want very specific things. For example, when equipment is being turned on or off, if this is a terrorist that you have located who is communicating, you want to be able not only to geolocate the terrorist, but also to determine some of the equipment that they may be using.

This business of dismounts is, right now, one of DARPA's largest problems. A dismount is this: imagine a caravan going through the desert. You are tracking this caravan and, all of a sudden, somebody jumps off. You don't want to divert your observation asset away from that caravan, and yet this person who jumped off may be the key person. So, you would want the capability not only of following that caravan, but of following this person across the desert as they jump into a car, perhaps, drive to an airport, get on a plane and then fly somewhere to go to a meeting.

So, how do you do something like that? Again, it is not just visual. Imagine an integrated system of sensors that combines, say, acoustic sensors embedded in the ground that can follow the individual, and then hands off to some kind of RF—a radio-frequency—sensor that could follow the car, and that could, again, follow the plane that the person gets into. What types of sensors are needed, and how do you integrate them in a way so that you don't have a bunch of scientists sitting in a room, each person looking at an oscilloscope, saying, okay, this is happening and that is happening, and then handing it off to the next person? What type of virtual space do you need to build to be able to assimilate all this information, integrate it and then hand it off? These are some of the problems I will be talking about.

The traditional way of looking at this problem is to build a layered system of systems. That is, you have to balance everything from sensitivity and resolution to coverage and data volume. I will give you a quick example. Back in the Bosnian War, the communications channels of the military were nearly brought to their knees. The reason was not the high information density going back and forth on the communications channels. It was because, when people would send out, say, air tasking orders or orders to go after a certain site, they would send them on PowerPoint slides with 50 or 60 emblems, each bitmapped, all around the slide. So, you had maybe 20 or 30 megabytes of a file that had maybe 20 bits of information on it. So, you have to be smart about this. The point is that the answer is not just to build bigger pipes and bigger iron to calculate what is going on. You have to do it in a smart way.

Again, this is part of the problem, and now what I am going to do is walk you through, first of all, some of the sensors and some of the ways that people think we may be able to attack this problem. First of all, there is more to the sensing than visual imagery. Let me walk you through some examples. The case I am trying to build up here is that, for this global situational awareness, the problem is not really inventing new widgets.
It is the information, and the information is really the key. It is where the bottleneck is. So, I am going to walk you through just some examples of sensors that already exist. Some of them are already being used. It is not, again, a case of building new technology all the time.

On the lower left-hand side, what you are looking at is a Defense Threat Reduction Agency project of ours (so, this is why I am here, to translate that). It is a hard and deeply buried target project. We are basically looking at an underground cavern and trying to determine where assets are in this underground cavern. Of course, that is a timely question today. You can do this by using acoustic sensors. That is, you know the resonances that are built up in these three-dimensional cavities. Just as you can calculate the surface of a drumhead when it is struck, in a three-dimensional cavity, if you know where the resonances are located, you can back out where the assets are, if you know that trucks are in there, for example, or that people are walking around. It is kind of like a three-dimensional pipe organ. This just shows some unique characteristics that arise from the power spectrogram of that.

The upper right-hand side shows that, believe it or not, there is a difference in the acoustic signatures of solid-propellant and liquid-fueled rockets. You can back out what the differences are, and you can identify not only what type of rocket somebody has shot off, whether it is solid or liquid fueled, but also the unique rocket itself.

So, there are other types of sensors using sonics, and I will talk a little bit more about this when I talk about distributed networks. If you have the ability to geolocate your sensors in a very precise manner—say, by using differential GPS—then what you can do is correlate the acoustic signatures that you get. You can, for example, geolocate the position of snipers. You can imagine, then, that those distributed sensors don't even have to be stationary; they could also be moving, if you have a time resolution that is high enough.

What are the types of sensors that we are talking about? Well, radio-frequency sensors. For example, the lower left-hand side shows a missile launcher that is erecting. Most of the examples I am using for global situational awareness are military in nature, but that is because of the audience that this was pitched at. What occurs, in a physics sense, is that any time you have a detonation in an engine, a very low temperature plasma is created. Plasmas are not perfect; that is, they are not ideal MHD plasmas. You have charge separation, which means that you have radio-frequency emissions, and you are able to pick that up. In fact, the emissions are dependent upon the cavity that they are created in. So, there is a unique signature that you can tag not only to each class of vehicle, but also to the individual vehicle itself.

Up on the right-hand side, it shows the same type of phenomenology being used to detect high explosives when they go off. I am not talking about megaton-class high explosives; I am talking about the pound class, 5- to 10-pound classes of explosives. Again, a very low temperature plasma is created, an RF field is generated, and you can not only detect that RF field but also geolocate it. These are, again, examples of sensors that can be added to this global sensor array.

We have two examples here of spectral-type data. On the right-hand side is data from a satellite known as the Multispectral Thermal Imager. It is a national laboratory satellite, a joint effort between us and Sandia National Laboratories. It uses 15 bands in the infrared.
It is the first time that something like this has been up and has been calibrated. What you are looking at is the actual dust distribution the day after the World Trade Center went down. This is from the hot dust that had diffused over lower Manhattan. I can't tell you exactly what the calibration is on this, but it is extremely low.
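The sniper-geolocation remark above comes down to time-difference-of-arrival multilateration: precisely surveyed microphones hear the same impulse at slightly different times, and those offsets pin down the source. A minimal sketch of the idea, with made-up sensor positions and a brute-force grid search standing in for a real solver:

```python
import numpy as np

# Hypothetical layout: four ground sensors with differential-GPS-quality positions (meters).
sensors = np.array([[0.0, 0.0], [120.0, 10.0], [15.0, 130.0], [140.0, 125.0]])
c = 343.0  # speed of sound, m/s

def arrival_times(src):
    """Time of flight from a candidate source location to each sensor."""
    return np.linalg.norm(sensors - src, axis=1) / c

# Simulated shot at an unknown location, with a little timing noise.
true_src = np.array([80.0, 60.0])
rng = np.random.default_rng(0)
t_meas = arrival_times(true_src) + rng.normal(0, 1e-4, size=len(sensors))

def tdoa_cost(src):
    """Mismatch between predicted and measured arrival-time differences (w.r.t. sensor 0)."""
    t = arrival_times(src)
    return np.sum(((t - t[0]) - (t_meas - t_meas[0])) ** 2)

# Coarse grid search over the surveyed area (a fielded system would use an iterative solver).
xs = np.linspace(0, 150, 301)
ys = np.linspace(0, 150, 301)
costs = np.array([[tdoa_cost(np.array([x, y])) for x in xs] for y in ys])
iy, ix = np.unravel_index(np.argmin(costs), costs.shape)
print("estimated source:", xs[ix], ys[iy])  # lands close to (80, 60)
```

With sensor positions good to centimeters and sub-millisecond timing, the residual minimum in this toy setup falls within about a meter of the true location, which is the point of nailing the node positions down with differential GPS.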

On the lower left-hand side, what you are looking at is an example of another sensor, a single-photon counter. This sensor works in either a passive or an active mode. This is the active mode, where we have used a source to illuminate a dish. This is a satellite dish, and it was actually illuminated from above, from an altitude of about 1,000 meters. What we are looking at are the returns, the statistical returns, from the photon system that came back. The technical community now has the ability to count photons one photon at a time, and to do so in a time-resolved way with a resolution of less than 100 picoseconds. What that means is that we can now get not only a two-dimensional representation of a view but also, with that time resolution, a three-dimensional representation as well. The reason you can get the pictures from the bottom side is a technique using what are called ballistic photons. That is, you know when the source was illuminated, and from the return of the photons you can calculate the path of each of those individual photons. So, basically, what this is saying is that you can build up three-dimensional images now. You can, in a sense, look behind objects. It is not always true, because you need a backlight for the reflection.

Again, there is more to sensors than visual imagery. That is kind of the fun part of this, as far as the toys and being able to look at the different things we are collecting. The question then arises, how do we handle all this data, and finally, how do we go ahead and fuse it together?

I talked earlier about one paradigm for handling the data, which is to build bigger pipes and bigger computers and run different kinds of algorithms to assess what is going on. Another way to do this is to let the power of the computer help us, by fusing the sensor with the computer at the source. This is possible now because of a technique that was developed about 10 years ago, field-programmable gate arrays—that is, being able to hardwire instructions into the registers themselves—which achieves speeds anywhere from a hundred to a thousand times greater than what you can achieve using software, because you are actually executing the code in the hardware instead of in software. Since these things are reprogrammable—that is, they are reconfigurable computers—you can do this on the fly. What this means is that you can make the sensors themselves part of the computation at the spot, and take away the need for such a high bandwidth for getting the data back to some kind of unique facility that can process the information.

Plus, this gives you the ability to change these sensors on the fly. What I mean by this is, consider a technology such as the software radio. As you know, a radio is basically a receiver, and then there is a bunch of electronics on the radio to change the capacitance and the inductance. All the electronics really do is change the bandwidth of the signal, sample different bits in the data stream, and that type of thing. It is now possible—because computers, and especially reconfigurable computers, are fast enough—to make a reconfigurable computer that does all the things that the wires, and years ago the tubes, and now the transistors do.
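To make the software-radio point concrete, here is a minimal sketch of a digital front end doing in code what the tuner hardware used to do: mix a chosen carrier to baseband, band-limit it, and keep only the samples that narrow channel needs. The sample rates and frequencies are made up for illustration.

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 1_000_000          # hypothetical ADC sample rate, 1 MS/s
f_carrier = 200_000     # the channel we want, 200 kHz
bandwidth = 10_000      # desired channel bandwidth, 10 kHz

t = np.arange(fs) / fs
# Simulated antenna input: a narrowband signal on the carrier plus wideband noise.
rng = np.random.default_rng(4)
message = np.sin(2 * np.pi * 1_000 * t)
antenna = message * np.cos(2 * np.pi * f_carrier * t) + rng.normal(0, 0.5, fs)

# 1. Mix to baseband with a numerically controlled oscillator.
baseband = antenna * np.exp(-2j * np.pi * f_carrier * t)
# 2. Low-pass filter to the channel bandwidth.
taps = firwin(numtaps=257, cutoff=bandwidth, fs=fs)
filtered = lfilter(taps, 1.0, baseband)
# 3. Keep only the samples the narrow channel actually needs.
decim = fs // (4 * bandwidth)
channel = filtered[::decim]
print("output rate:", fs // decim, "samples/s")  # ~40 kS/s instead of 1 MS/s
```

Retargeting this "radio" to a different channel or bandwidth is just a parameter change, which is the sense in which the nature of the sensor can be altered on the fly, and the 25-fold drop in output rate is the bandwidth saving from computing at the sensor.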
What this means is that, if you have a sensor, say, like a synthetic aperture array, and you want to change the nature of the sensor from something detecting in the RF to, say, an infrared radiometer, you can do it on the fly. What this provides is that, if you have platforms that you are going to put out in the field, be they ground-, sea-, air- or space-based, you don't have to figure out, 5 to 10 years ahead of time, what these sensors are going to be. That is, if it is going to be an RF sensor, then all that is important is the reception and the bandwidth that you have for the reconfigurable computer. You can change the nature, the very nature, of the sensor on the fly. That is a long explanation of what this chart is, but what it shows is that, by putting the power of the computation next to the sensor, you greatly reduce the complexity of the problem and of the data streams that you need. You are still going to need the ability to handle huge amounts of data, because remember, I was talking about 10^14 different nodes, but what this does is help solve that problem.

I don't want to go on too much longer about that, but you all know about distributed arrays; I talked a little bit about the power of that earlier. Basically, you have non-centralized access to each of these, and once you have the positions of these things nailed down—say, by using GPS or, even better, differential GPS—then they don't even have to be fixed, if you can keep track of them. What is nice about distributed networks is that every node should automatically know what every other node knows, because that information is transmitted throughout. So, it degrades very gracefully. This also gives you the power not only to take information in a distributed sense but also, if you know the positions of these sensors well enough, to phase them together and transmit. What this means is that, if you have very low-power transmitters at each of these nodes, say, even at the watt level, by phasing them together you get the beauty of phasing, if you can manage to pull it off. It is harder to do at the shorter wavelengths, but at the longer wavelengths it is easier.

Once you have all this data, how are you going to move it around in a fashion where, if it is intercepted, you know that it is still secure? Using new technologies that are starting to arise, such as quantum key distribution, this is really possible. For example, two and only two keys are created, but the keys are only created when the wave function of one of the two is collapsed. This is something that arises from the EPR paradox—Einstein, Podolsky, Rosen—and I would be happy to talk to anybody after this about it. It involves quantum mechanics, and it is a beautiful subject, but we don't really have too much time to get into it. Anyway, keys have now been transmitted 10 kilometers here in the United States, and the Brits, through a collaboration, I think, with the Germans, have transmitted keys up to, I think, 26 kilometers through the air.

We also have the ability to use technologies other than lasers to transmit the data. Why not lasers? Well, certain atmospheric conditions are very opaque to laser energy. We know that RF can transmit through the atmosphere, especially where there are holes in the spectrum. So, being able to tap into regions of the electromagnetic spectrum that have not been touched before, in the so-called terahertz regime—this is normally about 500 gigahertz up to about 10 terahertz—is possible now, with advances in technology.
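The payoff of phasing the nodes together, mentioned just above, is worth putting a number on: N emitters whose fields add coherently at the target give roughly N^2 times the single-node intensity there, versus roughly N when the phases are random. A small, idealized sketch of that scaling (ignoring geometry and propagation loss; the node count is hypothetical):

```python
import numpy as np

n = 100          # hypothetical number of watt-level nodes in the distributed array
amplitude = 1.0  # field amplitude per node, arbitrary units

# Coherent case: all phases aligned at the target, so fields add before squaring.
coherent_intensity = (n * amplitude) ** 2  # scales as n**2

# Incoherent case: random phases, so powers add only on average.
rng = np.random.default_rng(2)
phases = rng.uniform(0, 2 * np.pi, n)
incoherent_intensity = np.abs(np.sum(amplitude * np.exp(1j * phases))) ** 2  # ~n on average

print(coherent_intensity, incoherent_intensity)  # e.g. 10000 vs. something of order 100
```

So, very roughly, a hundred one-watt nodes phased together can look like a ten-kilowatt source in the beam direction, which is the attraction of doing this with cheap, precisely located, distributed hardware.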
What I have shown here is something that was initially developed at SLAC. It is called the klystrino, and it is based on their klystron RF amplifier. The key thing here is that the electron beam is not a pencil beam but, rather, a sheet beam, which spreads out the energy density. So, you don't have a lot of the problems that you used to have with the older-type tubes.

So, we have talked a little bit about handling the data, and we have talked about the sensors. Let me talk about the data mining. What is envisioned here is having some kind of what we call a distributed data hypercube. That is, it is a multidimensional cube with all sorts of data on it. You have probably heard in the news that this is Poindexter's push at DARPA: tapping into everything from credit cards on one axis, to airline transactions on another axis, another axis being perhaps telephone intercepts, another being RF emissions, another perhaps visual information or human intelligence. Tapping into that, and being able to do so in a legal way—because there are large legal implications in this, as well as things that are prohibited by statute, as it turns out, especially when you talk about intelligence databases—and being able to render that using different types of algorithms, compute on it and then feed it back in does two things. First of all, it gives you the state of where you are today and, second of all, it tries to predict what is going to happen. I will give you a very short example here, about five minutes' worth, of something that is going on that takes disparate types of databases and tries to do something like this.

So, that is kind of the mining problem, and there are various ways to mine large types of data. Let me talk to you about two examples here where, again, you are not relying just on the algorithm itself; you are relying on the computer to do this for you. This is an example of a program called GENIE that, in fact, was developed by a couple of the individuals who are here in this room. It is a genetic algorithm that basically uses an array of kernels to optimize the algorithm that you are going to render, and it does so in a sense where the algorithm evolves to find you the optimal algorithm. It evolves because you tell the algorithm at the onset, or tell the computer, what are the features, or some of the features, that you are looking for.

On the left-hand side, this is an aerial overhead of San Francisco Bay. What you want to do is look for golf courses on it. Let's say that we paint each of the known golf courses green and then we tell the algorithm, okay, go off and find those salient characteristics which define what a golf course is. Now, as it turns out, it is very tough to do, because there is not a lot of difference between water and golf courses. A lot of golfers, I guess, think that is funny. The reason is the reflectivity and the edge of the surface, which has no straight lines in it. So, it is kind of tough. Especially when you are talking about something like global situational awareness, if you find a golf course, you have got to make absolutely sure that it is a golf course. You can imagine that this could be something else that you are going after. What you do is, you let the computer combine those aspects, especially if you have hyperspectral data. That could be information in the infrared that may show the reflectivity, for example, of chlorophyll. It could be information about the edges. The computer assembles this data, using, again, this basis of kernels that you have, and evolves a unique, optimized algorithm to search for these things.
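For a flavor of the evolutionary idea being described, here is a toy stand-in, not the actual GENIE code: it evolves weights over a few hypothetical per-pixel features instead of composing image-processing kernels, and it is scored against labeled training pixels in the same spirit as painting the known golf courses.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical training set: per-pixel feature vectors (e.g. band ratios, edge strength)
# and 0/1 labels marking the painted "target" pixels.
n_pix, n_feat = 2000, 4
X = rng.normal(0, 1, (n_pix, n_feat))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = (X @ true_w + rng.normal(0, 0.5, n_pix) > 0).astype(int)

def fitness(w):
    """Fraction of training pixels classified correctly by thresholding w . x at zero."""
    pred = (X @ w > 0).astype(int)
    return np.mean(pred == y)

# Simple genetic loop: keep the fittest weight vectors, breed them with mutation.
pop = rng.normal(0, 1, (40, n_feat))
for gen in range(50):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]                                   # elitist selection
    children = parents[rng.integers(0, 10, 30)] + rng.normal(0, 0.2, (30, n_feat))  # mutation
    pop = np.vstack([parents, children])

best = max(pop, key=fitness)
print("training accuracy of evolved classifier:", fitness(best))
```

The painted golf courses in the talk correspond to the labels y here, and the evolved weights play the role of the assembled, optimized algorithm.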
Now, this is different from neural nets, because you can actually go back through and, through a deconvolution process, find out what the algorithms are, and they do make sense when you look at them. Basically, by using the power of computation to help you, you are reducing the complexity of the problem. Now, if you have done that, you can go to the next step and accelerate that algorithm, not just running it on a computer, but hardwiring it into a reconfigurable computer with that field-programmable gate array I told you about. What I will do is show you an example of how you can do things on a—I don't know if I am going to be able to do this. Say that you have streaming data coming in as video arrays. What is occurring here is that we have asked it to locate the car that you will see—that is the one in green on the right-hand side—at rates that are approaching 30 frames per second. The point is that, by marrying different types of technology, we will be able to help you do things in a near-real-time manner.

There are other techniques for pulling very low signal-to-noise data out. What I have shown here, on the lower left-hand side, is pulling some data out by using a template. On the right-hand side, it is looking at some spectrographic data and pulling out some chemical species. So, these are all examples of data-fusing techniques.

Let me wrap this up and leave time for some questions. The whole goal of this is to be able to synthesize what we call a complete view of a situation from disparate databases: pulling things together to give people a wide range of ability, from the national scene down to the individual, who could be a law enforcement officer who may only want to know about things that are happening 30 or 40 feet around him. I have put double-headed arrows on there to show that there should be a capability for the sensors themselves to be tasked, which kind of makes the people who run the sensors scared. On the other hand, if you don't have that ability, then you don't have the ability to allow feedback into the system.

There is an example of something like this going on that is not nearly as complex, where Forest Service data on wildfires (where they start, how they originate) is being combined with climatology data, looking at soil wetness, wind data and soil information. Then, that is combined with Department of Justice data on known arsonists: where they have started forest fires before and where they are located now. What this is attempting to do, with this very small database, is to combine these disparate databases, first of all, to give you the situation as it is now, and then perhaps to be able to predict whether an arsonist might strike at a particular place. This is a real, no kidding, national security problem, forest fires, because, you can imagine—well, we at Los Alamos ourselves had devastating fires two years ago that nearly wiped out a national lab. So, by using small test problems like this, we will show not only the larger problems that will arise, but also, hopefully, that doing something like this is not merely a pipe dream.

In conclusion, I think it has been determined that a need for global situational awareness really exists. Again, this is a synthesis and an integration of space situational awareness, battlefield situational awareness and law enforcement situational awareness, to be used in anti-terrorist activities. A lot of work needs to be done. This is not a one-lab or a one-university project. It is something that I think will really tap the S&T base of the nation. The key here is seamless integration.
It is the data, not really the sensors; it is integrating the data and showing it in a way that makes sense, in a seamless sense. So, that is the talk. It is accelerated about 10 minutes faster than I normally give it. Might I answer any questions you might have?

AUDIENCE: In that golf course example, you actually had training data, spectral data from real golf courses, and trained the model on that and predicted other golf courses?

MR. BEASON: Actually, what we did on that was, we had that picture and we located—we knew where the golf courses were, and we painted those with whatever technique we used, and then let the computer itself use that as its training aid. So, we allowed it to pick out—we didn't give it any information a priori as to what might be a course. We let it decide itself. You did see some errors, so there is an effort to push down the number of errors involved. Also, we were able to find—for example, we went back and looked at some forest fire data that had occurred around Los Alamos, and what we were able to find was that there were three instances where the Forest Service had started fires before and had not told us about it, and we were able to pick those out. That really ticked us off, when we found that out.

MS. KELLER-McNULTY: I would like to point out that Nancy David and James Theiler over here are the GENIE genies, if some people have some questions about that.

AUDIENCE: I was going to ask a question about this term, data fusion. Is that the same as data assimilation?

MR. BEASON: I am not sure what you mean by data assimilation. I think I know. Fusion is doing all of this in a smart way, because you can't just bring things together. You have to know what the context is.
