The Panel on Mechanical Science and Engineering at the Army Research Laboratory (ARL) conducted its review of ARL’s vehicle intelligence (VI) programs—intelligence and control, machine-human interaction, and perception—at Aberdeen, Maryland, on July 18-20, 2017. This chapter provides an evaluation of that work.
Among the priorities that the intelligence and control effort supports are artificial intelligence and autonomy, robotics and autonomous systems, autonomous and intelligent ground systems to extend warfighter reach, collaborative and intelligent air systems with improved maneuverability, and autonomous and intelligent agents to achieve mission command and intelligence operations at all echelons.
Work done under intelligence and control addresses gaps in the area of learning in complex data, including artificial intelligence (AI) and machine learning (ML) with small samples, dirty data, and high clutter; AI and ML with highly heterogeneous data; and adversarial AI and ML in contested and deceptive environments.
Accomplishments and Advancements
The work presented to the review panel addressed issues in pattern recognition and segmentation in video streams, linking perception and cognition, development and implementation of world models, and learning of terrain characteristics for improved robot navigation.
Overall, the efforts attempt to demonstrate (over a 5-7 year time frame) the operation of a heterogeneous team of largely autonomous robots, both on the ground and in the air, providing a “security bubble” around a dismounted team of soldiers in a military operations in urban terrain setting.
Based on that scenario, many of the studied technical tasks have immediate relevance. These address perception that adapts to the environment, controls that learn from perception how to move on that terrain, planning that considers threats, and architectures that integrate world knowledge with perception.
Other parts of the work are more basic. These address certain issues that are less likely to be tackled by academic researchers—issues such as development of world models and linking perception with cognition.
Novel neural nets or perception algorithms, for instance, may help in a wide variety of tasks if they prove successful. The cognitive architecture work is of much longer range. It is unlikely that it will be integrated in the 5-7 year time frame; nevertheless, it is important to properly address longer term objectives and anticipated warfighter battle scenarios. As such, it is important that the Army continue to pay attention to such issues.
The research group included several early-career individuals who earned their Ph.D. degrees during the last decade or are currently working toward the degree. In general, the researchers exhibited substantial familiarity with the literature, with current and future needs, and with relevant existing techniques. Furthermore, the researchers were aware of the importance of presenting their work in professional conferences and publishing it in refereed journals. Evidence was presented of collaboration with academic researchers of significant relevant experience.
Four projects presented to the panel demonstrated a multifaceted, strong effort in the area of automated and joint human-robot path planning. These efforts are autonomous mobile information collection using a value of information-enriched belief approach, context-driven visual search in complex environments, air-ground team surveillance for complex three-dimensional (3D) environments, and unsupervised semantic scene labeling for streaming data. These four projects complement each other well. They offer an opportunity to develop and use joint performance benchmarks and to compare the performance and complexity of different approaches and algorithms on similar benchmark scenarios.
Unsupervised Semantic Scene Labeling for Streaming Data
This work has made significant progress over the past 2 years—now fully unsupervised, with a rational basis for selecting the parameters, and good results on merging nonadjacent blobs that belong to the same class. The results exceed the state of the art. The method processes single images and streams of image data. The researcher is developing important unsupervised capabilities.
This is a relatively small effort, trying to address a relatively big problem. World models for robots have gone through phases for many decades. The four-dimensional (4D) RCS model was proposed some three decades ago, before current robot capabilities had been developed. It is a good idea to once again think through the needs of a robot world model, and which elements could be made explicit in a central database versus distributed or kept implicit. This work is still in its infancy, and it is hard to predict where it will have its greatest impact going forward.
Cognitive Robotics: Linking Perception and Cognition
The cognitive model used here, ACT-R, has a wealth of data connecting it to human cognition. That makes it a reasonable basis on which to build a cognitive robot. This is a many-year effort. Over the past 2 years, this effort has made good progress on showing the generality of the underlying framework for relevant robotics tasks in perception and reasoning. The next technical tasks proposed are importing graphical data structures and doing probabilistic reasoning. If those capabilities can be integrated, that will enable more advances. The overall cognitive vision is exciting and important.
Online Learning for Robust Navigation
This project tries to learn control parameters for a tracked skid-steer vehicle, where the parameters are a function of different surfaces as inferred by an inexpensive onboard vision system. The system demonstrates a learning approach that measurably improves the estimate of future paths. The researcher is using a (disturbance estimation) technique that has worked well in many application areas. The system and the approach certainly show promise; it will be important to quantify their capabilities.
Autonomous Mobile Information Collection Using a Value of Information-Enriched Belief Approach
This project uses value of information (VoI) as a central organizing principle for robot planning. The VoI can be used to guide robot exploration (where it should go, where it should point its sensors). The same principle can be used to infer value judgements made by human operators. The basic framework seems to be sound, and it can tie together the work of several other projects. It will be important to find out what the actual performance advantages and limitations are once this is fully connected to real robot perception and mobility modules in a realistic environment and mission. The Partially Observable Markov Decision Process (POMDP)-based reward function (capturing mission-specific human knowledge) approach taken by this researcher has potential. POMDPs have proven successful in many application arenas.
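To make the principle concrete, a minimal sketch follows; the two-state belief and the sensor models are hypothetical, not drawn from the ARL project. The VoI of a sensing action can be scored as the expected reduction in belief entropy, which lets a planner rank candidate sensing locations.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a discrete belief over world states."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def value_of_information(belief, obs_model):
    """Expected entropy reduction from taking one observation.

    belief: P(state) over discrete states.
    obs_model[z][s]: P(observation z | state s).  All values hypothetical.
    """
    prior_h = entropy(belief)
    expected_posterior_h = 0.0
    for likelihood in obs_model:                 # each possible observation z
        joint = [l * p for l, p in zip(likelihood, belief)]
        p_z = sum(joint)
        if p_z == 0:
            continue
        posterior = [j / p_z for j in joint]     # Bayes update
        expected_posterior_h += p_z * entropy(posterior)
    return prior_h - expected_posterior_h

# Two candidate sensing locations: target equally likely in area A or B.
belief = [0.5, 0.5]
informative = [[0.9, 0.1], [0.1, 0.9]]           # sensor mostly resolves the state
uninformative = [[0.5, 0.5], [0.5, 0.5]]         # sensor says nothing

# A VoI-driven planner would send the robot to the higher-scoring location.
voi_good = value_of_information(belief, informative)
voi_bad = value_of_information(belief, uninformative)
```

The same score can be maximized over sensor pointing directions as well as positions.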
Deductive, Analogical, and Associative Reasoning in a Semantic Vector Space
This work is performing deductive and analogical reasoning in a 300-dimension vector space, where the location in that space is decided by data mining in a large corpus. Words are then connected to other words throughout that high-dimension space. This gives a statistical basis for semantic queries, such as looking for analogous concepts. The semantic vector space approach taken by this researcher has been found to be useful in other domains.
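The flavor of such queries can be sketched in a few lines; the toy 2-D vectors below are hypothetical stand-ins for the mined 300-dimension embeddings. Analogical reasoning reduces to vector arithmetic followed by a nearest-neighbor search under cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy 2-D stand-in for the mined space: one axis loosely "royalty,"
# one loosely "gender."  Real embeddings come from corpus statistics.
vecs = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector arithmetic, excluding the cue words."""
    target = [vecs[b][i] - vecs[a][i] + vecs[c][i] for i in range(2)]
    candidates = (w for w in vecs if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vecs[w], target))
```

With corpus-derived vectors, the same nearest-neighbor query supports the semantic lookups described above.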
Air-Ground Robot Team Surveillance of Complex 3D Environments
This system combines human hints and automatic planning to provide time-constrained coverage plans for a mobile robot doing surveillance. This project is addressing a very interesting problem, and is just at its beginning. This system will give soldiers a forward air and ground surveillance capability. Moreover, it will permit the injection of human knowledge to enhance route planning. This could help address the mathematical intractability of the NP-hard 3D surveillance task. Work was evaluated using a cluttered 3D urban environment. This is an excellent piece of work offering great potential and opportunities.
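One plausible sketch of how injected human knowledge can tame the intractable coverage problem is a weighted greedy viewpoint selection; the viewpoints, visibility sets, and operator hint below are invented, and greedy selection carries the classic (1 - 1/e) approximation guarantee for budgeted maximum coverage.

```python
def plan_coverage(viewpoints, hinted, budget):
    """Greedily pick viewpoints for time-constrained surveillance coverage.

    viewpoints: dict name -> set of map cells visible from that viewpoint.
    hinted: cells flagged by a human operator (weighted 2.0 vs. 1.0).
    budget: number of viewpoints the robot can visit in the allotted time.
    """
    weight = lambda cell: 2.0 if cell in hinted else 1.0
    covered, plan = set(), []
    for _ in range(budget):
        # Marginal weighted gain of each unused viewpoint.
        gains = {v: sum(weight(c) for c in cells - covered)
                 for v, cells in viewpoints.items() if v not in plan}
        best = max(gains, key=gains.get)
        if gains[best] == 0:
            break
        plan.append(best)
        covered |= viewpoints[best]
    return plan, covered

# Hypothetical scene: the operator hints that cell "d" (say, a suspected
# entry point) matters most, which pulls the plan toward the alley first.
viewpoints = {"rooftop": {"a", "b"}, "alley": {"c", "d"}, "doorway": {"d"}}
plan, covered = plan_coverage(viewpoints, hinted={"d"}, budget=2)
```

The hint enters only through the cell weights, so the planner degrades gracefully to plain greedy coverage when no human input is available.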
Opportunities and Challenges
It appears that all efforts can benefit from reference to benchmarks and test scenarios, and that the development of common benchmarks and test problems would benefit the group (and researchers in parallel efforts). A technique for detection or classification is never going to be “universal,” in the sense that the scene has characteristics that affect the definition of the problem and that the detection, estimation, or classification objectives are problem-dependent. The lack of common benchmarks and test scenarios makes it difficult to assess the usefulness of the presented algorithms and solutions, and it prevents meaningful technical exchange between researchers. Moreover, such benchmarks would significantly assist the group in meeting precise short-term objectives and planning for longer term goals.
A related issue is scenario relevance. A number of the reviewed studies could benefit from real or simulated data that are designed intentionally to be close to the scenarios anticipated by the warfighter. Development of such realistic “battle” scenarios would highlight challenges that may not be adequately addressed in the general literature, thereby increasing the value of the work to customers.
Another related issue is the complexity of the studied images. Detecting the movement of a single dark silhouette of a dancer against a white background is a very different problem from detecting hidden snipers in a noisy gray-scale video stream of a large, cluttered urban scene during combat. It may be useful to seek a common understanding of complexity for a library of images and video clips that can then be used jointly to assess the performance of algorithms, techniques, and associated speed-complexity tradeoffs. Such characterization of the difficulty of the detection, classification, or estimation problem in the context of the expected applications is important for assessing the progress of the research over time.
The problems attacked by the intelligence and control research group require that solutions address the following characteristics of the developed algorithms (as well as the algorithms to which they are compared)—complexity, scalability, robustness, and operation in noise. Most studies presented included some of these elements in the analysis, but their inclusion needs to be routine and systematic in all studies where they are applicable. An issue that the group could consider is quantifying complexity. Some problems will be solved as a consequence of technological (computing) progress (and it is therefore of interest to quantify the time frame). Other problems and algorithms are inherently of intractable complexity, and would therefore require either polynomial-time approximations or injection of side information (such as aid by human intelligence) in order to make the problem mathematically tractable.
The following additional opportunities can be pursued to help the group’s researchers, management, and ARL leadership. ARL needs to develop a more precise statement of group goals (long- and short-term) and customers. The group and its members can benefit by articulating what will be realistically achievable within 2, 5, and 10 years. Systematic benchmark studies can greatly assist with this type of technology projection and planning.
Whenever applicable, studied techniques and algorithms could be presented along with an estimate of their complexity and scalability. It needs to be clear if the studied techniques can benefit from ongoing progress in computing technology (e.g., when their complexity is described by low-order polynomials) or if they still require low-complexity approximations or side information in order to become practical.
Whenever applicable, studied techniques and algorithms could address robustness to both small-scale changes and drifts and to larger scale failures and structural changes. Operation of algorithms and techniques could include a report on their nominal performance as well as on their performance in the presence of noise and interference (including the use of and development of realistic models of such noise and interference).
The general tendency in robotics research is to require that physical implementation become part of the presentation of new techniques and algorithms. While this approach has sometimes been criticized as thwarting theoretical research, it is increasingly recognized for providing an often much needed “reality check.” ARL owns and maintains a number of robotic platforms and, to the extent practicable, these need to be used to test and demonstrate new approaches and new proposed methods. Moreover, testing on real robots is likely to reveal practical challenges for which new theory is needed. Work with physical platforms can also allow researchers to address fundamental hardware and performance limitations.
Discussion of techniques and algorithms could state fundamental limitations associated with the approach and methods being taken. This discussion can be very illuminating to the researchers and management, as well as future panels. Moreover, it can lead to substantive directions for future research.
ARL could consider pursuing the following areas of research. Physical interactions with the environment other than planned manipulation—pushing, sliding, kicking, running into, and other forms of manipulation not using end effectors can often be useful. Robot models of human behavior—a robot needs to understand the mental model of its human teammates—may answer the following questions: What can the human see? How busy is the human? What is the human trying to do? While human modeling is difficult, it is of great importance for properly addressing the critical soldier-robot trust factor. The development of benchmark human-robot interaction/mutual-awareness models can be particularly useful. Interplay between robot motion and autonomy and the communications infrastructure—robots can communicate implicitly by their behaviors, or they can maneuver to enable chains of communications, including line-of-sight communications that are harder to jam or intercept. A small fleet of robots can, for example, maneuver so as to maximize communications connectivity subject to interception and threat constraints.
Unsupervised Semantic Scene Labeling for Streaming Data
This research could benefit by addressing the following: How, and into what, will agglomerative clustering methods be integrated? How can supervision and expert knowledge be systematically incorporated? What benchmarks could be used to guide future developments? What is the plan for transitioning to the Tank Automotive Research, Development, and Engineering Center (TARDEC)?
Additional questions on this research that, when answered, will lead to more progress in this important area might include the following: How can the work be generalized to segment objects that do not have a homogeneous appearance? The method does a good job of handling objects with variance, but an object composed of disjoint classes (e.g., a Dalmatian dog, with black spots on a white background) will not be properly segmented as a distinct object—could the work be extended to pick one or more of the feature descriptors to ignore? What is an example use for this technique? If one of the potential uses, for example, is segmenting a dirt road from its surroundings, then that application suggests some performance criteria (speed, accuracy, reliability) and a demonstration scenario.
A world model will be very useful for future collaboration activities. The relationship of this work to future Collaborative Research Alliance (CRA) projects and to other architecture projects needs to be clarified. Collaboration with others is needed.
Cognitive Robotics: Linking Perception and Cognition
The researcher and team could benefit from more precise near-term goals and a 5-year benchmark. It would be helpful to have a roadmap to show what additional work needs to be accomplished in this task before it becomes useful for a real robot demonstration scenario.
Online Learning for Robust Navigation
It will be important to do the following going forward: Implement a state-of-the-art control system as a benchmark, to see if the learning system really provides a performance advantage. Perform tests on a wider variety of surfaces to measure the effectiveness of the perception system on system performance. Examine how the disturbance estimation approach taken performs vis-à-vis an approach employing a higher fidelity slip model. Given its importance, other ground modeling issues could also be systematically addressed.
Autonomous Mobile Information Collection Using a Value of Information-Enriched Belief Approach
Injection of human knowledge can significantly increase the mathematical tractability of motion planning problems. As such, this research could investigate how this can be done systematically as a mission progresses (constraints permitting).
Intelligent Mobility (Minitar and RoboSimian)
It appears that the hardware is supported by useful simulation models and lower order control-relevant models. This could be carefully quantified by showing hardware data alongside supporting model-based simulation data. It would be good to see these platforms being more fully exploited by team members.
Deductive, Analogical, and Associative Reasoning in a Semantic Vector Space
The biggest question on this work is who else is doing related projects. There is certainly related work that is kept proprietary within Google and other companies; there is almost certainly relevant work within the federal government. It will be important to stay in touch, as much as possible, with relevant work in those and other communities. This research could more fully utilize relevant semantic vector space literature in order to achieve the desired “Watson-like” reasoning. The research could examine what types of questions are mathematically tractable (polynomial time) and which are not (nonpolynomial time). The insertion of real-time expert knowledge can potentially (and significantly) help with the latter.
Context-Driven Visual Search in Complex Environments
This project will develop “focus of attention” based on mission constraints, temporal constraints, semantic constraints, and 3D depth cues. It has direct applicability for robot sensor pointing and processor allocation in performing real robotics missions.
It is straightforward to see how individual constraints will get integrated. It is more interesting to see how constraints will evolve over time: once the perception system detects a window, for example, it is likely that other windows will be nearby and aligned; how will that sort of information be incorporated into the sensor aiming strategy?
The use of a pan-tilt-zoom camera for context driven search is very important. The researcher could show how the pan-tilt-zoom camera reinforcement learning approach can be used to incorporate battle-relevant temporal and semantic mission constraints.
Air-Ground Robot Team Surveillance of Complex 3D Environments
Many other extensions to the research are possible, such as multiple hints coming in from the human, perhaps asynchronously during mission execution requiring real-time replanning; multiple robots doing the exploration; mixtures of air and ground vehicles; and surveillance of moving targets.
This research needs to carefully examine time-accuracy-geometry-human-intervention trade-offs and plan a transition to TARDEC and other customers.
Parsimonious Online Learning with Kernels
This is a very basic machine learning and function approximation technique. It appears to do near-optimal clustering with online learning to create an efficient and sparse data representation. It is not clear how sensitive the parameters are, or how much tuning needs to be done for different applications. Future work could demonstrate the advantages of this method in a concrete and practical example. The research uses a kernel approach for online sequential sparse classification. The research could systematically compare this approach with other approaches and also examine battle-relevant applications.
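The exact sparsification rule was not specified in the presentation; as a hedged sketch of the general idea, a budgeted kernel perceptron stays parsimonious by storing a point only when it is misclassified and evicting the oldest stored point once a budget is exceeded. The surface-classification data below are invented.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

class BudgetKernelPerceptron:
    """Online kernel classifier kept sparse by (a) adding points only on
    mistakes and (b) evicting the oldest support point past a budget.
    A generic stand-in for the sparsification used in the ARL work.
    """
    def __init__(self, budget=10, gamma=1.0):
        self.budget, self.gamma = budget, gamma
        self.support, self.alpha = [], []        # stored points and labels

    def predict(self, x):
        s = sum(a * rbf(z, x, self.gamma) for z, a in zip(self.support, self.alpha))
        return 1 if s >= 0 else -1

    def update(self, x, y):                      # y in {-1, +1}
        if self.predict(x) != y:                 # grow dictionary only on error
            self.support.append(x)
            self.alpha.append(y)
            if len(self.support) > self.budget:  # enforce the sparsity budget
                self.support.pop(0)
                self.alpha.pop(0)

# Invented toy stream: two well-separated classes in 2-D feature space.
model = BudgetKernelPerceptron(budget=10)
for _ in range(3):
    for x, y in [((0.0, 0.0), -1), ((3.0, 3.0), 1),
                 ((0.2, 0.1), -1), ((2.9, 3.1), 1)]:
        model.update(x, y)
```

The dictionary size is the natural place to study the sensitivity and tuning questions raised above.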
Optimized Output Codes for Deep Constrained Neural Nets
The work proposes a new encoding of neural network outputs that shows superior performance in terms of producing similar outputs for similar inputs and in terms of resistance to perturbation. The examples shown are of a relatively small case, learning to recognize hand-written digits. It will be important to show similar advantages on larger and more diverse data sets. This research examined how output codes can be optimized to assist with classification and lifelong learning issues—for example, reduction of catastrophic forgetting. It examined logistic regression pairs, spectral hashing, and latent codes (zero-shot learning). Two data sets were used. This research could show how this optimized code approach can be used within a realistic intelligence and control group battle-relevant benchmark.
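The specific optimized codes were not reproduced here; the resistance-to-perturbation property can be illustrated with a classic error-correcting output code, in which classes receive well-separated binary codewords so that a single flipped classifier output still decodes to the correct class. The three classes and 5-bit codewords below are invented.

```python
# Invented 5-bit codewords with minimum pairwise Hamming distance 3,
# so any single flipped output bit still decodes to the right class.
CODES = {
    "cat": (0, 0, 0, 0, 0),
    "dog": (1, 1, 1, 0, 0),
    "car": (0, 1, 1, 1, 1),
}

def hamming(u, v):
    """Number of positions at which two bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def decode(output_bits):
    """Assign the class whose codeword is nearest to the network's binary outputs."""
    return min(CODES, key=lambda c: hamming(CODES[c], output_bits))
```

Optimizing the codeword assignment, rather than fixing it by hand as here, is what distinguishes the approach under review.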
The Army is positioning itself for a future in which humans and machines will closely collaborate to accomplish mission goals in dynamic, unpredictable environments. To this end, the machine-human interaction (MHI) effort focuses on developing basic research that will allow a machine to effectively and safely team with humans. The specific challenges being addressed include creating models of shared cognition in order to improve team performance, using multimodal sensors and technologies to promote effective communication, predicating a robot’s behavior on social and cultural information, and verifying the safety of new, artificial intelligence-driven technology in simulation prior to deployment.
ARL makes a distinction between machine-human interaction and human-machine interaction (HMI). MHI focuses on the development of algorithms that will allow a machine to more effectively communicate and act as a teammate to humans. HMI, on the other hand, examines the ways that people interact with robots. These closely related subfields of human-robot interaction (HRI) use different perspectives to look at the central HRI problem: How do and should people and robots interact? The work understandably combines simulation and research on real robots. Simulation experiments offer a means for rapid prototyping and inexpensive experimental validation. Research on real robots, although more challenging and expensive, is critical for verification of preliminary simulation experiments.
Transparency with respect to machine or robot behavior is an important underlying goal of ARL’s MHI campaign. Projects within the campaign attempt to produce not only accurate machine behavior, but behavior that will be understandable to a human operator and result in greater team performance, trust, and lower human workload.
Accomplishments and Advancements
ARL has a variety of MHI projects crosscutting a number of different human-robot interaction (HRI) problems. The projects that the panel examined are each at different stages of development and maturity. As a whole, the projects demonstrate both opportunities and challenges for the efforts in this area.
Wingman Software Integration Laboratory
The Wingman Software Integration Laboratory project integrates autonomous control and targeting of an unmanned high-mobility multipurpose wheeled vehicle weapon system with a manned vehicle, resulting in a human-machine combat team. The project’s initial focus has been on the creation of an integrated simulation and testing environment that utilizes data from test locations in Michigan to generate realistic simulation and training situations for soldiers. The simulation environment successfully integrates the Unity game engine for targeting and the ANVEL simulation engine for autonomous vehicle control, using standard methods for communicating between the engines. This integrated system acts as a realistic software-in-the-loop simulation test-bed for rapid prototyping in real-world vehicles and realistic experimentation on human-machine decision making and behavior.
Although only 6 months into the project, it is evident that this project is progressing rapidly and has already generated a number of significant accomplishments. The project has produced and demonstrated an initial version of the simulation environment for testing. These accomplishments are partially due to the partnerships between the research team and the teams responsible for fielding autonomous components. Potential operational environments and concepts are discussed and developed at weekly meetings with a number of Department of Defense (DOD) partners, including the U.S. Army TARDEC, the U.S. Naval Surface Warfare Center Dahlgren Division, the U.S. Army Armament Research, Development and Engineering Center (ARDEC), the U.S. Army Maneuver Center of Excellence, the U.S. Army Test and Evaluation Command, and DCS Corporation. This large number of collaborators is evidence of significant Army support for the project. The project has resulted in the creation of several technical reports; while there are no publications to date, this is not surprising given the short duration of the project.
Data collection that is planned includes a simulation event at ARL to assess a warfighter machine interface for the roles in January-February 2018 and use of the Wingman System Integration Laboratory for training and human subjects’ data collection during warfighter experimentation in June-August 2018. The results from these experiments will be critical for determining the long-term applicability and impact of this research.
Leveraging Mutual Information to Enable Human in the Loop
This project explores the development of a type of sensor fusion application that combines physiology-based human neural classifiers, generated from electroencephalogram (EEG) data, with computer vision-based classifiers for object detection, to create classifier ensembles that accurately identify objects in an image. The project examines a rapid serial visual presentation task in which a human subject is briefly presented with an image and must identify target items in the image. A significant portion of this research uses mutual information to evaluate the relevance and redundancy of the generated classifiers in relation to the target. Ideally, a set of highly relevant, minimally redundant classifiers will be identified, maximizing performance. Results to date have been a mix of theoretical and empirical studies. These results indicate that the approach has strong potential to identify relevant classifiers but is less able to identify redundant classifiers.
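The relevance/redundancy computation can be sketched as the generic maximum-relevance, minimum-redundancy recipe; the classifier names and binary trial outcomes below are invented and are not necessarily ARL's exact criterion.

```python
import math
from collections import Counter

def mutual_info(xs, ys):
    """Empirical mutual information (bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys)); px = Counter(xs); py = Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in joint.items())

def select_classifiers(outputs, target, k):
    """Greedy ensemble selection: maximize MI with the target (relevance)
    minus mean MI with the already-chosen classifiers (redundancy)."""
    chosen = []
    while len(chosen) < k:
        def score(name):
            relevance = mutual_info(outputs[name], target)
            redundancy = (sum(mutual_info(outputs[name], outputs[c]) for c in chosen)
                          / len(chosen)) if chosen else 0.0
            return relevance - redundancy
        chosen.append(max((n for n in outputs if n not in chosen), key=score))
    return chosen

# Invented trial data: "eeg" tracks the target well, "eeg_dup" duplicates
# it (fully redundant), and "weak" is weakly but independently informative.
target  = [0, 0, 1, 1, 0, 1, 0, 1]
outputs = {
    "eeg":     [1, 0, 1, 1, 0, 1, 0, 1],
    "eeg_dup": [1, 0, 1, 1, 0, 1, 0, 1],
    "weak":    [0, 1, 0, 1, 0, 1, 0, 1],
}
```

Note that the redundant duplicate is skipped in favor of the weaker but independent classifier, which is exactly the behavior the project seeks.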
Toward Natural Dialogue with Robots: BOT Language
This project examines natural bidirectional dialogue for human-machine teaming and collaboration. The project focuses on a search-and-navigation task in which a human commander uses dialogue to direct a robot during a search task. To date, the project has focused on the collection of a large corpus of data that will be used to automate portions of the overall system. One problem identified is that a large percentage of the data has not been collected from realistic users of the system, such as soldiers and Army operators. The directions generated by real users may differ significantly from the directions created by university students or other naïve populations.
Assessing Vulnerabilities in Autonomy
This project attempts to assess the vulnerability of an autonomous vehicle convoy to attack by opposing forces. Although the project is well motivated and important, it lacks a well-defined research problem, realistic scenario, articulated metrics of success, and scientific approach. The project’s primary results to date are the recognition that a trade-off exists between system safety and vulnerability and the recognition that subsystems of the overall system will also play a role in overall system vulnerability.
Challenges and Opportunities
ARL has a unique opportunity to conduct soldier-centric MHI studies. Conducting soldier-based studies would ensure both that the research conducted is impactful for the end user and that the work is grounded with respect to real-world applications. It has been noted, however, that there are significant challenges associated with the use of soldiers as human subjects. Among these are the facts that soldiers are an Institutional Review Board-protected population, that soldiers are no longer located at the ARL Aberdeen location, and that soldiers may not have an incentive to participate in such studies. On the other hand, it is also noted that ARL often hosts West Point cadets, who could serve as an important human subject population. Overall, ARL is better positioned than other research centers to perform soldier-centered human subject studies, and the verification of these technologies could be accomplished using soldiers. It will be important to include soldiers, to the extent possible, from the beginning of a project, in order to ensure that the work proceeds in a direction of value for its end customer.
It is also critical that ARL MHI research focuses on realistic army missions as study scenarios. While these missions need not be so realistic as to warrant being classified, notional realism is nevertheless critical for ensuring the development of technologies of value to the Army and the DOD.
Given ARL’s expertise and resources, an opportunity exists to investigate and solve problems that, in some cases, are more complex than the problems tackled by academic researchers. Academic researchers may be drawn to collaborate with ARL on these realistic problems. One challenge to developing and maximizing the potential of these collaborations is the limited ability of ARL researchers to share code and data with extramural partners. ARL has had some success creating an Open Campus Initiative allowing extramural researchers to more easily work with ARL researchers. But the success of the Open Campus Initiative is uneven, with, for example, easier access for researchers at Adelphi than at Aberdeen. Further, not having an open network makes code and data sharing difficult.
Also, it is important that projects have a metric of success from the outset. These metrics will help researchers remain focused on the project’s research question, even in the case of Blue Sky projects.
Leveraging Mutual Information to Enable Humans in the Loop
This work would strongly benefit by using, to the extent possible, soldiers and intelligence analysts as human subjects. Subject matter experts (SMEs) and realistic operators may use unique heuristics resulting in significantly different performance than naïve subjects. This would likely impact the types of classifiers identified as relevant. Moreover, using images from realistic scenarios could also influence performance and the types of classifiers identified as relevant and redundant.
Toward Natural Dialogue with Robots: BOT Language
The project employs a “Wizard of Oz”-style setup to create dialogue data and intends to move away from real environments toward simulation environments in order to speed up the data collection process. While the move toward simulation is understandable, it was noted that the experimenters will need to keep the subjects convinced that they are controlling a real robot. Providing computer-generated images of a simulated robot and environment may alert the subjects to the fact that they are not really controlling a robot.
Assessing Vulnerabilities in Autonomy
This project needs a well-defined and scoped research question with significant customer commitment. One possible approach might be to look at case studies of convoy attacks and identify common sets of vulnerabilities from these attacks. The project also needs precise metrics and measures of success. It is not currently clear whether the project is attempting to generate as many vulnerabilities as possible, to consider unique vulnerabilities resulting from autonomy, or some mix of the two. Lastly, the project needs to increase communication with and buy-in from the customer. Weekly conference calls focused on specific, incremental project goals, rather than on vulnerability brainstorming, might keep the project focused.
ARL seeks to be a premier research organization whose discoveries and innovations successfully transition to the field and support the Army’s long-term strategy of land power dominance. In pursuit of this vision, the perception group has committed to a long-term program emphasizing the key campaign initiative—Force Projection and Augmentation through Intelligent Vehicles.
The perception group’s activities have focused on its fiscal year (FY) 2020 goal: “Semantic labeling of an increasingly larger vocabulary of objects and behaviors to permit a richer, more detailed description of the environment.” Additional activities emphasize the practical aspects of ensuring correct spatial interpretation of sensory signals, so that the environmental descriptions are spatially accurate. For this review, the group’s output is assessed according to the stated FY 2020 goal, to its potential impact on the field of perception, and to its long-term link to the key campaign initiative, as well as to the current campaign—Sciences for Maneuver.
Accomplishments and Advancements
The perception group’s research activities are tightly focused on advancing theoretical and practical aspects of learning and estimation tasks of importance to Army capabilities and domains. Areas of study include object detection and learning, action learning, environmental learning, perception on robot platforms, and online parameter estimation for calibration of robot sensor suites. Importantly, all of the projects support progress toward the stated FY 2020 goal.
Overall, the presentations and demonstrated projects represent solid advances in their fields. The majority of the work indicates awareness of current research activities in the field of computer vision. The group’s research has stayed at the cutting edge, embracing new successful methodologies in computer vision and advancing existing approaches through intelligent analytical insights into the problem at hand.
The group has successfully leveraged past external collaborations to strengthen its intellectual capacity, in turn strengthening those ties. The group demonstrates a commitment to maintaining high visibility of the work through dissemination of research in conferences and, to a lesser degree, journals.
The perception research can be broadly categorized into several key topical areas, which are summarized and discussed in the following paragraphs.
The recently initiated APPLE project focuses on techniques for object learning and refinement using fused color and depth data. To date, this project has generated useful reports on the state of the art, good progress on technology selection, and a roadmap for data collection. In the future, this project will integrate multiple technologies for shape and appearance capture, model creation and refinement, and
representation. It is necessary to carefully choose data and modeling domains that are maximally relevant to Army needs and scenarios. A related (and also new) project in object learning seeks to assess the impact of embodiment on agent-assisted training of humans to provide useful object views to an image-based model learner. As these projects mature, there will be an opportunity for cross-pollination of ideas.
One project and demonstration addressed the task of object detection—specifically, techniques for improvement of deep learning-based detection. A novel region-of-interest proposal mechanism provides “side information” that can be used to augment the set of examples used during training. Use of this additional information demonstrates some improvement in performance on an academic computer vision benchmark, and the demonstration included a close-to-real-time performance level. This work is interesting, but demonstration and performance characterization on data sets with close relevance to Army operations would be useful to assess the potential impact of the work.
Two projects address learning and representation of actions. One of these projects contributes a notable advance over the state of the art—namely, an unsupervised method for learning action attributes from data and segmenting video sequences into action primitives, which serve as a compact signature for the activity. A topically related project integrates textual features during training to improve the performance of a deep learning-based activity recognizer. There is an opportunity for these two projects to enhance one another.
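As a purely illustrative sketch of the segmentation idea described above (not the low-rank attribute method the project actually uses), a feature sequence can be cut into primitives wherever successive frame features jump by more than a threshold; the function name and numbers below are hypothetical:

```python
# Toy change-point heuristic for cutting a 1-D feature sequence into
# "action primitives." Illustrative only; real systems operate on
# high-dimensional learned features with statistical change-point tests.

def segment_primitives(features, threshold):
    """Return [(start, end)] index ranges where successive features stay close."""
    segments = []
    start = 0
    for i in range(1, len(features)):
        if abs(features[i] - features[i - 1]) > threshold:
            segments.append((start, i - 1))  # close the current primitive
            start = i                        # begin a new one
    segments.append((start, len(features) - 1))
    return segments

# Usage: two flat "activities" separated by an abrupt feature jump.
seq = [0.1, 0.12, 0.11, 0.9, 0.92, 0.88]
print(segment_primitives(seq, threshold=0.3))  # -> [(0, 2), (3, 5)]
```

The resulting index ranges serve as a compact signature of the activity, in the spirit of the primitives described above.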
ARL staff demonstrated an impressive integrated sensor suite for environmental sensing. This platform includes multiple depth and red, green, blue (RGB) sensors, easily and jointly calibrated using a fiducial. Data from this suite and lessons learned from integration could inform sensor platform configurations for future mapping robot designs. Also in this category is a project that integrates terrain and map data with hyperspectral measurements to perform improved classification of water and nonwater regions. Although narrowly scoped at present, affordable near-infrared (near-IR) hyperspectral imaging may add a useful new tool to sensing suites used in support of land operations.
Online Calibration of Proprioceptive and Exteroceptive Sensors
Known positioning of onboard sensors is essential to the correct spatial interpretation of data for modeling and estimation purposes. Correctly calibrated sensors guarantee proper description of detected objects and recognized activities within a world frame. ARL staff presented two projects covering calibration of onboard sensors: proprioceptive calibration using recursive filtering, and exteroceptive calibration using graph optimization. In both cases, the approaches are grounded in strong theoretical principles and numerical methods. Furthermore, the research activities are focused on problems that have contemporary value and are still insufficiently investigated by the perception community. The work is mature and demonstrates strong potential to transition to use within ARL’s robotic platforms, as well as to lead to new algorithms for maintaining correct sensor calibration during operations.
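To illustrate the recursive-filtering flavor of proprioceptive calibration, the following sketch (a hypothetical scalar example, not ARL's implementation) maintains an online estimate of a gyro's bias with a one-dimensional Kalman filter, treating readings taken while the platform is known to be stationary as direct observations of the bias:

```python
# Minimal sketch: online gyro-bias estimation with a scalar Kalman filter.
# Hypothetical illustration only; operational calibrators estimate many
# parameters jointly and handle motion, not just stationary intervals.

def kalman_bias_update(bias, var, z, r, q=1e-6):
    """One recursive update of a gyro-bias estimate.

    bias, var : current bias estimate and its variance
    z         : rate measured while the platform is stationary
                (true rate is zero, so z observes the bias directly)
    r         : measurement noise variance
    q         : process noise (lets the bias drift slowly over time)
    """
    var += q                 # predict: bias modeled as a slow random walk
    k = var / (var + r)      # Kalman gain
    bias += k * (z - bias)   # correct toward the stationary measurement
    var *= (1.0 - k)         # shrink uncertainty
    return bias, var

# Usage: feed stationary-interval readings; the estimate converges to the bias.
bias, var = 0.0, 1.0
for z in [0.051, 0.049, 0.052, 0.048, 0.050]:
    bias, var = kalman_bias_update(bias, var, z, r=0.01)
```

Running the update continuously during operations is what allows calibration to be maintained online rather than redone offline.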
Perception on Robot Platforms
Several robotic demonstrations incorporating perception elements were provided, and they were generally impressive examples of technology integration. In addition to the multisensor platform and the object detection demonstrations mentioned earlier, the RoMan, Minitar, and RoboSimian platforms will offer highly useful test-beds for embedding joint sensing and manipulation tasks.
Challenges and Opportunities
Perception research has gone from being model based to data driven on the strength of statistical machine learning and deep learning algorithms. At the moment, industry needs dominate the target application domains of researchers and the data being collected. In the case of deep learning, ever more massive data sets are created by the collective efforts of the research community, sometimes with funding from industry. This situation is both a challenge and an opportunity. It is a challenge because Army needs are not necessarily served by these data sets. It is an opportunity because an appealing and challenging data set with long-term Army significance, properly targeted, could readily be picked up by the research community. In doing so, ARL might even benefit from community contributions to the data set, as well as from the creativity of the research community with regard to ARL-relevant applications.
In a related vein, deep learning architectures are diverse and their engineering has supplanted the traditional feature engineering strategy of the previous decade. ARL’s ability to attract new, knowledgeable talent will determine whether it keeps current with the trends in perception. Due to the rising popularity of machine learning, and deep learning in particular, ARL is in a good position to hire more experts in this area by exploiting existing academic relationships and cultivating new ones. The perception group has the opportunity to take advantage of the pipeline of experts graduating in machine learning, for both robotics and perception. Achieving critical mass in this area would help promote their stated goals.
The mid-term (FY 2026) goal embodies difficult challenges that need significant research effort, including inference, dealing with context, and extracting relationships between objects. The goal is stated as follows: “Creation of the ability to infer purpose from the relationships between objects in the environment and behaviors (activity) exhibited by people (teammates, adversaries, and noncombatants) and place objects and behaviors into context.” The perception research community has not yet fully embraced activities that would support this end-goal. It is not clear to what degree model-based methods and data-driven methods will be needed, or combined, in achieving it. ARL has the opportunity to define canonical problems in this arena, as well as to curate unclassified data sets and scenarios that could both help push the state-of-the-art in this area of perception and be of utility to ARL mission scenarios.
In general, research presentations and posters were professional, logical, content-rich, and useful. Clear growth in knowledge content by ARL researchers and support staff was demonstrated. Significant advances in the use of analytical and simulation tools were observed. The collaborative interactions—for example, the Collaborative Technology Alliances (CTAs) and Collaborative Research Alliances (CRAs)—continue to be productive. The Panel noted the various director-level responses to previous Panel recommendations. These positive responses are also reflected in the continuous improvement in Campaign research performance.
Several research programs were observed to be outstanding. Three such research programs stand out—research on low-ranked representation learning of action attributes (flexibility and extensibility) in focusing on human action attributes; research on autonomous mobile information collection using a value
of information-enriched belief approach (projected functional stochastic gradient-based approach with teams of robots); and research and simulation work on Wingman Software Integration Laboratory, which has a clear path to Army-relevant static and dynamic scenarios and multiple-machine and multiple-human interactions.
The overall technical quality of the intelligence and control effort is good, and has shown continual improvement—particularly since the 2015-2016 assessment by the ARLTAB. The group has benefited from the hiring of highly skilled postdoctoral researchers, some of whom are being groomed to become full-time ARL employees. Publication in peer-reviewed journals and participation at professional conferences has continued to grow, coupled with increasing participation in other professional activities. Collaborations with peer communities and reputable academic groups appear to be healthy, and provide the researchers with invaluable networking opportunities and options to leverage quality research elsewhere. The investment in quality R&D, especially in areas less likely to be pursued by academia, has increased the potential for impact. The connections between the individual research projects and the Collaborative Technology Alliance (CTA) and Collaborative Research Alliance (CRA) programs are very useful and are highly commended. While it would be a mistake to expect all basic research to be tied into the CTAs, the CRAs provide rich sources of data and research problems, and ready platforms for integration and testing in a research-friendly environment. The CTAs and CRAs may naturally serve as a starting point for the benchmarks mentioned earlier.
The research generated by the MHI group is generally of high quality, and is focused on important MHI areas and largely comparable to university-led research. In particular, the posters and presentations typically contained acceptable technical content, experimental methods, presentation of data, and statistical analysis of results. The research reflected a broad understanding of the science and references to related work, indicating knowledge of research conducted elsewhere. The qualifications of the research teams were well matched to the research problems and employed acceptable and often state-of-the-art equipment and models. The research typically utilized an appropriate mix of theory and experimentation to arrive at well-reasoned conclusions. The Wingman Software Integration Laboratory was identified as a promising project potentially resulting in outstanding data and knowledge that could ultimately be transitioned to the field. The project is focused on an important topic, necessary for the deployment and implementation of human-machine teams with automated targeting. ARL has a strong set of well-qualified MHI researchers addressing important, Army-related problems. These researchers have a unique opportunity to generate mission-critical data from a population of specifically trained human subjects. Doing so would increase the impact and applicability of the research while also helping the researchers better understand the needs of the population they serve.
Overall, perception research is addressing cutting-edge problems, with meaningful and relevant results. The group demonstrated an appropriate mix of theory, computation, and experimentation. The group’s publication list and strategy spans the gamut from respected, application-based conference venues to well-regarded academic conferences and publications. There is an opportunity for the group to extend and enhance key projects to yield publications in the field’s very best journals with some regularity. When considering the collective portfolio of researchers at individual, leading universities or laboratories, the work achieved by ARL is comparable in scope and outcome. Together, the perception group’s projects reflect an understanding of relevant state of the art, while demonstrating a commitment to pursuing key open questions of Army relevance. It is clear that ARL has attracted well-qualified research staff and provided them with excellent facilities for conducting cutting-edge research in perception. Several of the projects were particularly well presented and showed strong promise to transition to Army use. One such project is the online gyro calibration algorithm, while another is the embodied training project. They demonstrated solid understanding of the tactical ARL end point while bringing together the proper theory or practice, as needed.
Several opportunities are identified for even greater advancement in Campaign research productivity. These include the need to increase the level of effort in several 6.1 and 6.2 research projects and in internal collaborations; to increase mentoring of junior research staff; to make greater use of Army (soldier) field experiences and scenarios, robots, and more relevant data sets in all Campaign research; to address systematically complexity, scalability, robustness, uncertainty, and operation in noise and interference (that is, boundary operations); to establish metrics and benchmarks within a push-pull research context; to increase ARL presence in journal publications and conference papers so as to help define the problem set; and to increase strategic, collaborative engagements with industry, including via CTAs and CRAs.
According to the U.S. Army Operating Concept,1 human-machine interaction and teaming will be an important near- and intermediate-term focus of research for the Army. The projects that were reviewed appear well positioned to provide mission-critical data and technologies toward developing enhanced human-machine teams. For the most part, these projects are progressing well and generating technical quality comparable to academic research. This finding is supported by the fact that the four projects reviewed have collectively published eight papers at peer-reviewed conferences or workshops, in the same venues used by academic researchers.
To date, the MHI research reviewed underutilizes military personnel as human subjects. This finding is supported by the fact that the only ARLTAB-reviewed project that has used military personnel is Toward Natural Dialogue with Robots: BOT Language, which will continue to recruit a mix of military personnel and civilians as human subjects, as available. MHI project researchers could, to the extent possible, use soldiers, cadets, and realistic Army operators.
The MHI research reviewed also currently underutilizes realistic mission scenarios with military relevance. This finding is supported by the fact that three of the four projects reviewed did not employ a realistic mission scenario and instead relied on somewhat contrived notional missions. MHI project researchers could use realistic, relevant mission scenarios whenever possible.
ARL is conducting high-quality perception research that addresses important issues toward the near-term (FY 2020) goal of “semantic labeling of an increasingly larger vocabulary of objects and behaviors to permit a richer, more detailed description of the environment.” ARL has built up a strong group of researchers in perception, with appropriate resources and facilities to conduct important basic research. ARL has done a good job of disseminating their research through top-notch conferences, and has formed strong collaborations with external partners.
Given that much of the ARL perception research is currently being evaluated on commercial or public domain data sets with limited obvious relevance to Army missions, an open question is whether the ARL research will perform similarly when it is ultimately transitioned to Army applications. It is well known that the performance of machine learning algorithms varies when different data sets are used. ARL may discover that its research is not addressing the right problem, or the right solution, for Army-specific applications. It is thus critically important for ARL perception research to be validated using Army-relevant data. While the challenges of generating such data are recognized, performance evaluation of the developed approaches to perception for Army-relevant missions is not possible without it.
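The concern about cross-data-set performance can be made concrete with a toy example (hypothetical names and numbers throughout): a classifier fit on one data set can degrade sharply on a domain-shifted one, which is exactly why validation on Army-relevant data matters.

```python
# Toy illustration of domain shift: a nearest-centroid classifier trained on
# a "benchmark" data set loses accuracy on a shifted "mission" data set.
# All names, features, and numbers are invented for illustration.

def nearest_centroid_fit(samples):
    """samples: {label: [1-D feature values]} -> {label: centroid}."""
    return {lbl: sum(xs) / len(xs) for lbl, xs in samples.items()}

def accuracy(centroids, labeled_points):
    """Fraction of (label, feature) pairs assigned to the nearest centroid's label."""
    correct = 0
    for lbl, x in labeled_points:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == lbl)
    return correct / len(labeled_points)

# "Benchmark" training data: classes well separated near 0.0 and 1.0.
train = {"clutter": [0.0, 0.1, 0.2], "target": [0.9, 1.0, 1.1]}
model = nearest_centroid_fit(train)

in_domain = [("clutter", 0.1), ("target", 1.0)]
shifted = [("clutter", 0.6), ("target", 0.7)]  # mission domain: features shifted

print(accuracy(model, in_domain))  # high on the benchmark distribution
print(accuracy(model, shifted))    # degrades under domain shift
```

The same train-on-A, test-on-B check, run with genuine Army-relevant data, is the validation step the text argues for.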
Recommendation: ARL should develop a research emphasis on the generation of Army-relevant data suitable for advancing data-driven perception methods and evaluating research in perception against mission-relevant outcomes.
1The U.S. Army Operating Concept (AOC): Win in a Complex World—2020-2040, TRADOC Pamphlet 525-3-1, U.S. Army Training and Doctrine Command, Fort Eustis, Va., October 31, 2014, http://www.tradoc.army.mil/tpubs/pams/tp525-3-1.pdf.
Many, but not all, of the active research projects are clearly motivated and informed by an Army-relevant use case. While motivation arising from general improvements in the state of the art and from academically inspired problems can be useful, a closer linkage to Army priorities and to the technical roadmaps articulated by the group will help bind research and researchers more closely to organizational goals. Further, the research can be conducted with the ultimate sensor or platform constraints in mind, since vision-based research could eventually be embedded on mobile platforms in the field; such deployment constraints might influence the choice of approaches pursued in perception research.
Recommendation: ARL should closely match all current projects and all new starts to a service-relevant goal in the organizational roadmap and employ platform/deployment constraints as research planning parameters. ARL should consult existing robotics roadmaps and organizational priorities, and develop a story line showing how existing and new efforts feed together to develop the desired future capabilities.
While the ARL vehicle intelligence research is of high quality, most of this research is at the single-investigator level. Even projects that have closely related objectives or approaches are conducted without strong connections between them. Furthermore, it is not clear to what degree existing functional robot demonstrations capitalize on prior ARL research. Identifying synergistic ties between related projects (such as the use of common data sets, platforms, frameworks, and benchmarks) can speed innovative progress and increase potential impact. This could be achieved by identifying important Army-relevant use cases that inform the research projects, and then comparing and contrasting related ARL research in that context. The research would still be fundamental and basic, but informed by the specific Army application, as well as the advances made in other relevant areas of ARL research.
Recommendation: ARL should explore incentives and goals for increased internal collaboration across closely related projects, forming tighter connections among algorithms, sensor suites, robot platforms, and Army-relevant use cases. ARL reports and studies should describe how the work fits into the overall ARL mission (with specific customers targeted); what major testbeds will be exploited; how younger scientists are working with more senior scientists; and what expertise will be developed in house versus what will be imported from industry, academia, and other laboratories.
In addition, ARL could accelerate progress in related research areas by developing a strategy for centralizing internal expertise on nontrivial tools and techniques, such as methods for deep learning. Centralized expertise would enable projects and experts to synergistically benefit each other, and shorten the learning curve for investigators who are using techniques closely related to other projects.
Recommendation: ARL should centralize internal expertise on nontrivial tools and techniques, to shorten the learning curve and accelerate progress on related projects in perception.
While the ARL research in perception is being disseminated in conference publications, the ARL record is more limited for journal publications. In general, however, the quality of the ARL work is sufficient for publication in top-quality journals. Such publications would increase the visibility of this work, as well as the potential impact of ARL research on the state of the field.
Recommendation: ARL should increase the dissemination of its perception research through top-quality journal publications.
While some benchmarks can emanate from the academic or scientific community, others need to come from the military research community. The team could use test-beds and benchmarks to
serve as unifying umbrellas for the Sciences for Maneuver Campaign. A 5-year benchmark mission (possibly virtual or involving, for example, Building 570 with multiple ground and air robots) would help focus ARL’s efforts and the efforts of individual researchers. The benchmarking effort is expected to enhance essential and foundational modeling, simulation, and theoretical components.
Recommendation: ARL should identify several benchmarks that can be used to assess methods being presented and facilitate comparisons of ARL efforts as well as other state-of-the-art methods. These benchmarks should be ordered in terms of complexity of addressed scenarios (e.g., images and video clips) in order to systematically assess definitive and appreciable progress over time and from benchmark to benchmark. ARL should use these benchmarks to facilitate and measure the associated incremental or successive progress.