The team should consider active use of multiple sensor inputs. This would involve joint, simultaneous training and classification across multiple sensor streams, rather than merely fusing the best outputs of classifiers run on individual streams. The team should also consider testing on data sets to which the learning algorithm was never exposed. Although it is common to train on one part of a data set and test on the remainder, it is far less common to train on one data set and test on an entirely new one; doing so would add value to the work. If possible, the team should acquire data from the field, perhaps video, LADAR, and radar footage from deployments in Iraq and Afghanistan. Future effort should address in-field processing in support of real-time decision making. There is also a need to explore related work at other organizations, such as the Air Force Research Laboratory, the Defense Advanced Research Projects Agency (DARPA), and industrial laboratories.
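The structural difference between the two fusion strategies can be illustrated with a minimal sketch. Everything here is hypothetical (synthetic "video" and "radar" feature vectors, a simple nearest-centroid classifier) and is not drawn from the team's actual pipeline; it only shows where the combination happens: after per-stream classification (decision-level fusion) versus before training (joint training on concatenated streams).

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_fit(X, y):
    # One centroid per class label.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    labels = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in labels])
    return np.array(labels)[dists.argmin(axis=0)]

# Two hypothetical sensor streams observing the same 200 events; class 1
# shifts every feature by 1.0 in both streams.
y = rng.integers(0, 2, size=200)
video = rng.normal(size=(200, 4)) + y[:, None] * 1.0
radar = rng.normal(size=(200, 4)) + y[:, None] * 1.0

# Decision-level fusion: classify each stream on its own, then combine only
# the per-stream outputs (here, a simple OR of the two binary decisions).
pred_v = nearest_centroid_predict(nearest_centroid_fit(video, y), video)
pred_r = nearest_centroid_predict(nearest_centroid_fit(radar, y), radar)
fused = ((pred_v + pred_r) >= 1).astype(int)

# Joint training: concatenate the streams so that cross-sensor structure is
# visible to a single classifier at training time.
joint = np.hstack([video, radar])
pred_joint = nearest_centroid_predict(nearest_centroid_fit(joint, y), joint)
```

A cross-dataset test of the kind recommended above would correspond to evaluating `pred_joint`-style classifiers on an entirely separate collection of events, not on a held-out slice of the same data set.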
The purpose of this effort is to reason about what is in the environment (for example, whether a group of pixels is a car or a window) and about what things in the environment are doing (for example, moving in a threatening way). If successful, the system should be able to perform these tasks automatically, albeit with offline human supervision or input.
The presentation clearly captured the technical barriers that make this work challenging: because perception is noisy, complex interpretation leads to computationally intractable optimization problems; fully supervised training is unrealistic; and integrating non-sensory sources of information (e.g., domain knowledge), though clearly important, is difficult.
The presentation described, at a high level, the technical approach as a focus on objects and activities relevant to robots and soldiers—not general objects and activities—using contextual cues (e.g., external information and domain-specific information). The work aims to develop new learning and optimization techniques to make perception problems tractable. At present, the problems are intractable even when the focus is restricted to objects and activities relevant to robots and soldiers, and when context is built in.
The presentation provided a clear evaluation strategy for reviewing the state of the art and setting and achieving specific quantitative goals for the program. The goals include standardizing metrics and data sets (the data sets will be made public—a laudable goal) and producing publications.
The concept of semantic perception connects machine visual perception, which is largely driven by physical objects in the visual scene, with knowledge (i.e., domain knowledge, mission knowledge, and cultural information). This connection represents an advance because it allows context to drive expectations and reduce the space of possibilities produced by bottom-up visual recognition of edges and vertices.
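How context can prune the space of bottom-up hypotheses is easy to illustrate with a toy Bayesian sketch. The labels, scores, and the "on a road" prior below are invented for illustration only; the point is that a nearly uninformative detector becomes decisive once a context-dependent prior is applied, and low-posterior hypotheses can be discarded before any expensive downstream reasoning.

```python
import numpy as np

# Hypothetical bottom-up detector scores (likelihoods) for one pixel region;
# on their own they barely discriminate among the candidate labels.
labels = ["car", "window", "person", "foliage"]
likelihood = np.array([0.30, 0.28, 0.22, 0.20])

# Hypothetical context prior: mission/domain knowledge says the platform is
# on a road, where ground-level windows and foliage are unlikely.
prior_road = np.array([0.55, 0.05, 0.30, 0.10])

# Posterior is proportional to likelihood times prior (Bayes' rule).
posterior = likelihood * prior_road
posterior /= posterior.sum()

# Context-driven pruning: keep only hypotheses with non-negligible posterior.
viable = [lab for lab, p in zip(labels, posterior) if p > 0.1]
```

In this toy example the detector alone slightly prefers "car" over "window" (0.30 vs. 0.28), but the road context collapses the viable set to "car" and "person", which is the expectation-driven reduction described above.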
The team has made clear progress in all areas. Although sensing was not addressed in the presentation, the following topics were addressed in some detail:
• Semantic understanding of static areas: terrain and object classification, which works very well over large data sets;
• Semantic understanding of dynamic areas: activity recognition, which appears to work reasonably well on small data sets and a restricted set of activities; and
• Distributed and collaborative perception: multiple robots, and robots and people, which is a work in progress.