Key Challenges for Effective Testing and Evaluation Across Department of Defense Ranges
Proceedings of a Workshop—in Brief
To protect itself from attacks by foreign forces, the United States relies upon its armed services, which in turn rely upon weapons and other systems to provide them with the tools they need to successfully neutralize adversaries’ combat capabilities. Maintaining the armed services’ warfighting advantage requires a steady stream of new and improved weapons and technologies. A crucial step for acquiring and using these assets is testing their effectiveness and suitability on Department of Defense (DoD) ranges. The DoD has testing ranges that span the globe, where new military technologies are tested based on real threats, tasks, and environments to ensure their combat readiness. These ranges are a vital aspect of the nation’s defense, but will they be able to adequately test the increasingly complex military technologies of the future?
Against this backdrop, the DoD’s Office of the Director, Operational Test and Evaluation (DOT&E), requested that the Board on Army Research and Development (BOARD) of the National Academies of Sciences, Engineering, and Medicine perform a study assessing the physical and technical suitability of DoD test and evaluation (T&E) ranges and infrastructure. As part of that task, the study committee convened a 2-day workshop on January 28–29, 2021, to gather information on the challenges facing the nation’s military ranges. The workshop brought together experts from the military, industry, and government, who discussed the current status of T&E on military ranges and what will be required to ensure their effectiveness in coming decades. This Proceedings of a Workshop—in Brief summarizes the presentations and discussions that took place at the workshop, with an emphasis on those aspects focused specifically on the challenges facing the nation’s military ranges, as opposed to possible ways of addressing those challenges—a topic that will be covered in the committee’s final report, expected in summer 2021.
SETTING THE STAGE
Rapid technological change is fundamentally reshaping our world, with digital technologies among the most revolutionary. These changes have driven improvements in weapons and other military capabilities as well as the development of entirely new types of weapons. As Raymond O’Toole, the Acting Director of Operational Test and Evaluation (OT&E) at DoD, said in his opening remarks, DoD has a new wave of defense technologies that includes hypersonics, artificial intelligence (AI), autonomous systems, directed energy weapons, and space systems. O’Toole added that our adversaries are also developing novel and powerful weapons. Furthermore, the very nature of military conflicts is evolving, as AI and autonomous systems make possible new capabilities, and the speed at which actions take place and decisions are made is increasing
rapidly. The nation’s armed services are moving toward a more mobile, intelligent, and autonomous integrated network of kill chains.
Meanwhile, the need to test weapons and other military systems has not lessened. O’Toole noted that because of the systems’ increasing complexity and sophistication, testing needs have grown. However, the network of military ranges used for these tests has not kept pace with technological change. As O’Toole explained, much of the nation’s complex of testing ranges was built for World War II and updated during the Cold War, but the Cold War ended 30 years ago, and range modernization has lagged behind technology development ever since. The ranges need to adapt to test hypersonic technologies, artificial intelligence, autonomous systems, directed energy weapons, and space systems. He noted that DOT&E is deeply concerned that the T&E community is not adequately positioned to execute its mission over the next 10 to 15 years.
At the same time, the nation’s ranges face a variety of nonmilitary pressures. O’Toole explained that the ranges cannot expand because of surrounding development and ecological concerns. In addition, physical encroachment, growing demands on the electromagnetic spectrum, pressure for land and water use, and concerns about endangered species and other conservation issues put the size and usable space of these ranges perpetually at risk. Security constraints on what sorts of tests can be carried out on the ranges are also increasing; the near-constant presence of overhead satellites, for example, makes secrecy ever more difficult.
Despite these pressures, it is vitally important that the nation’s military forces have the appropriate infrastructure for realistic operational testing. O’Toole said that one of his key priorities is making sure that they have the capabilities and the infrastructure that support thorough testing and evaluation of systems that are currently in the acquisition pipeline and that are expected to be in the acquisition pipeline over the next 10 to 15 years.
CHALLENGE: DEVELOPING NEW TESTING CAPABILITIES
Perhaps the clearest challenge—and the one mentioned by the largest number of speakers—is the need to develop the capability to test emerging technologies. For example, Devin Cate, the Director of Test and Evaluation for the U.S. Air Force, listed hypersonics, directed energy, autonomous systems, artificial intelligence, and cyber as emerging technologies that will require enhanced T&E infrastructure, but he noted that this is only a partial list.
The areas identified by speakers as requiring new OT&E capabilities are summarized below.
Increasing the Pace of Testing and Evaluation. A number of presenters commented that it is not sufficient merely to develop new testing capabilities; it will also be important to increase the speed of the T&E process itself.
Joshua Marcuse, the head of strategy and innovation in the global public sector at Google, recalled a conversation he had with Robert Behler, the former DoD Director of OT&E. Improving software development is important, Behler told him, but the major limiting factor is how quickly software can be adapted for testing. “You could update the F-35 plane as fast as an iPhone app with the push of a button,” Behler told him, “and you wouldn’t actually be any faster relative to your adversary because you would still need to wait a year for me to test it.”
Ed Greer, president of Greer Consulting and former Deputy Assistant Secretary of Defense for Developmental Test and Evaluation, noted that another challenge is keeping military systems as up to date as possible. Greer said that an average of 3 to 5 years passes between the time that intelligence is collected on a threat and the time that the threat is instantiated into testing. Our adversaries can build new radar systems much faster than our intelligence centers can build models of them, Greer said, and he concluded that it is vital to shorten the time it takes to produce new testing threats.
Conrad Grant, the chief engineer of the Johns Hopkins University Applied Physics Laboratory, offered another specific area in which speed is vital—the testing of “pop-up operational evolutions.” To illustrate, he described Operation Burnt Frost, a 2008 operation that shot down a non-functioning U.S. satellite because of various dangers that it posed. The decision was made to use
a Standard Missile 3 to shoot it down—something the missile was never intended to do—and there were only 6 weeks between making the decision and firing the missile. The operation required sensors, instrumentation, modeling and simulation, and exquisite weapons system knowledge, all employed in a different way than usual, as well as the establishment of connections among ranges, various research centers, laboratories, industry, the military, and various government agencies in order to carry out the collaboration and perform the necessary analyses. Grant predicted that this sort of pop-up operation will become more common in the future as the United States is faced with asymmetric threats requiring rapid action, which will pose a serious OT&E challenge.
AI-Enabled Systems. Several speakers mentioned the challenges that ranges face in testing systems that rely on artificial intelligence. For example, Cate noted that AI and autonomous systems pose unique testing challenges because, as learning systems, they necessarily change and evolve throughout testing, making it difficult to characterize their performance in a repeatable manner. It will be necessary, he said, for the testing enterprise to tie in closely with developers so that AI-enabled and autonomous systems are designed from the start with testing in mind.
Jane Pinelis, the Chief of Testing, Evaluation, and Assessment at DoD’s Joint Artificial Intelligence Center (JAIC), said that the military’s testing and evaluation capabilities have not been keeping pace with the speed of AI technology development. Given that failures of AI-enabled systems can have very high consequences, she said, it is crucial that DoD “push the test and evaluation for AI-enabled systems to where it needs to be with respect to science, data, knowledge, skills, workforce, and infrastructure.” There are a variety of challenges to doing this, she said. One is simply setting design performance goals, because of the difficulty of tying accuracy constraints and other characteristics of AI-enabled systems to specific operational or mission outcomes and needs. Other testing challenges arise when humans team with AI-enabled systems. There is no well-established framework for testing such teams, but for complex problem sets it is critical to test the human–machine team in an integrated fashion, rather than separately. Yet another complication of testing AI-enabled systems is the potential appearance of emergent behavior—actions that arise from the interaction of a system’s components and that could not be predicted by examining the behavior of the components individually. Pinelis stated that methods for defining, diagnosing, and understanding emergent behavior are needed, as well as training so that operators can identify emergent behavior as it occurs and respond to undesirable behavior.
Marc Bernstein, the chief scientist under the Assistant Secretary of the Air Force for Acquisition, Technology, and Logistics, said that a particular issue will be what should be considered “passing” when one is testing an AI-driven system. Speaking specifically of the Advanced Battle Management System (ABMS) under development by the Air Force, he said that the complex environments necessary to test the ABMS effectively will not yield a single answer to any particular battlefield situation, but rather a set of options with various advantages and disadvantages and issues for each. Bernstein raised the following challenge: How do you set up your operational testing and evaluation in such an ambiguous, gray environment? A further complication is that AI-enabled systems may create an option that human testers had not even thought of, and that might be the best option. Bernstein added that it is hard to judge that ahead of time.
Grant made a related point about future weapons systems on autonomous vehicles where AI may control the weapons themselves. How, he asked, can these weapons and vehicles be controlled on the ranges in such a way that range safety is maintained?
Space Warfighting Systems. Perhaps the most obvious and most critical gap in the DoD’s T&E capabilities, O’Toole said, is the lack of a national space testing and training range. He stated that DoD has no operationally realistic way to test space-based systems or to train U.S. Space Force guardians. Space is unquestionably a warfighting domain now, and so the T&E community will need to develop techniques and facilities for testing and evaluating space-related systems to give warfighters and decision makers confidence in those capabilities.
Col. Eric Felt, Director of the Air Force Research Laboratory Space Vehicles Directorate, expanded on some of the challenges involved in space warfighting system T&E. To begin with, he said,
many threats will need to be digitally modeled because they cannot be carried out in real-life scenarios. A nuclear weapon exploding in the atmosphere is an obvious example, he said, but other threats, such as anti-satellite weapons and laser weapons, must also be tested via simulations, so the threat models and the synthetic environments in which they operate must be anchored in reality if the results are to be trusted. This sort of challenge is not new, Felt noted: those testing systems in the surface, air, and underwater domains have developed synthetic test environments to address such problems. What is needed, he said, is an equivalent capability for the space environment, which does not yet exist.
Software. Marcuse said that a fundamental challenge facing DoD is that, despite the digital revolution of the past several decades, testing remains optimized for hardware. The implications of that revolution have not permeated DoD’s rules, processes, institutions, or personnel. Military T&E needs to focus more on the digital elements of systems. Similarly, O’Toole listed “dramatically increasing and improving the test and evaluation of software-intensive systems” as one of DoD’s priorities in OT&E. Arun Seraphin, a professional staff member of the Senate Armed Services Committee, added that the committee has a real concern about the department’s ability to test software, a concern that extends both to the technical skills of the testing workforce and to the range infrastructure available for testing software. The T&E areas that Seraphin listed as challenges included software for embedded systems, for emerging AI systems, for command-and-control systems, and for business systems.
Cyber Warfare. John Garstka, Director of Cyber in the DoD Chief Information Security Office for Acquisition and Sustainment, spoke about one of the chief challenges related to cyber warfare—understanding the degree to which the critical infrastructure underlying defense systems is defensible against cyberattack. DoD systems rely on a variety of infrastructure, including DoD information technology and networks, DoD critical infrastructure, and commercial critical infrastructure such as water supplies and electrical power distribution, any of which could be the target of a cyberattack. Garstka noted the need to figure out how to either operate on the existing system or create a cyber range (a test or training range) with the right scope, scale, and diversity. He explained that defenders need a cyber twin of the object they want to understand, which can enable a variety of simulations to examine whether it can be protected or defended. Testing must also cover tasks that typically go unexamined: if a maintenance laptop is a potential attack vector into a weapons system, for example, then the maintenance processes themselves must be tested in a cyber-contested environment.
Electronic Warfare and Electronic Protection. The testing of electronic protection is a related challenge. As David Tremper, the Director of Electronic Warfare (EW) at the Office of the Secretary of Defense, explained, electronic protection is the ability of a spectrum-using system, such as radar or radio, to continue using the spectrum without interference in a contested environment. Due to a lack of funds, the developers of such systems often do not perform sufficient electronic protection testing, Tremper said, but a bigger issue is the lack of ranges where such testing can be done. Tremper said that there is no environment in which to take those systems to see how survivable they are in a truly contested and congested environment. Even the EW ranges where jamming and protection systems are tested do not suffice, because these ranges tend to be in remote locations with pristine spectrum environments, not the types of environments in which radars and other spectrum-using systems would actually operate. Currently, no testing environment for spectrum-using systems adequately mirrors real-world conditions.
CHALLENGE: THE DEVELOPMENT OF REALISTIC THREATS
Closely related to the development of new testing capabilities is the development of realistic threats to test weapons and systems. Although threat replication is not a particular focus of the
current committee’s work—it will be examined in the subsequent phase of this study—several presenters did touch on this issue.
Grant remarked that today’s range targets are generally threat-representative in some flight profiles but that threats are constantly evolving and it is difficult for the target providers to keep up with that evolution. Targets have become a very limited resource, he added, so it is not possible to do as much testing as desired with many weapons systems.
A number of speakers said that many of the current threat scenarios tested are dated and need to be upgraded. For example, John Pearson, the operational test director for the Joint Strike Fighter Operational Test team, spoke about some of the challenges related to threats for the F-35. In 2014–2015, he said, DOT&E worked to get funding for radar signal emulators onto the Nevada Test and Training Range and the Point Mugu sea test range. The resulting radar signal emulators, which are still used at those ranges today, replicate threats that are about 10 years old. The upgrades brought the emulated threats closer to a relevant threat environment, he said, and do provide the F-35 with a challenging test. Still, the country’s adversaries are steadily producing new threats with new capabilities, and the United States has to catch up to and then stay ahead of them.
Greer said that there is a need to upgrade the threats available on open-air ranges, in ground-based simulations, and in modeling and simulation laboratories. The current threats on open-air ranges are mostly old, and he noted that this is a huge challenge. In particular, he listed several specific threat environments that are important to keep current: surface-to-air missile models, threat aircraft models, and threat weapon models. Furthermore, he said, the current open-air ranges cannot handle the threat density necessary to truly challenge today’s weapons systems and provide them with a suitable test. The ranges can handle only multiple tens of complex threats today, he said, but they really need to handle multiple dozens, and one could easily argue that the number is closer to multiple hundreds of complex threats if the weapons systems that have already been built today are to be fully characterized.
A similar issue arises in electronic warfare testing and evaluation, Tremper said. Most tests are run against old threat systems; the ranges lack the software-defined, agile threat systems that would allow testing against more representative threats.
Bernstein added that the challenge of updating threat scenarios involves more than keeping up to date with the relevant technologies; it will also involve staying current on threat command and control and on the adversary’s concept of war. Adversary tactics respond to the systems deployed against them, so it is crucial to understand how these adversaries will build and use their command-and-control systems. Furthermore, he added, AI must be part of the equation. If we are going to use AI to improve our ability to go fast and smart, our adversaries are surely going to be doing that too. It will be AI against AI, and T&E must model those threats.
Last, Grant said, it is vital to have operational fidelity of threat representations, including targeting, flight profiles and evasive maneuvers, electronic warfare characteristics, and communications. In this vein, he noted, it will eventually be important to employ large numbers of coordinated, multidomain threats designed to overwhelm defenses, which will be yet another challenge.
CHALLENGE: LIMITED SPACE AND ENCROACHMENT
Carroll “Rick” Quade, the Director of Test and Evaluation for the U.S. Navy, was among several presenters who spoke about the challenge of securing sufficient space to carry out all OT&E. A number of emerging military technologies, such as hypersonic weapons, require large amounts of relatively isolated space, often more than is available on today’s ranges, and as weapons become longer-range and more complex, the challenge will only grow. Furthermore, even as the need for space grows, various forces are encroaching on military ranges and limiting the space, physical and otherwise, available for testing. Thus, a major challenge will be to carry out the increasingly complex and sophisticated tests required for forthcoming systems within these increasingly constrained ranges.
Pearson illustrated this challenge by sharing how creating realistic threats for the F-35 and other aircraft requires more space than is available within the current boundaries of the ranges being used for these tests, such as Naval Air Weapons Station China Lake or Point Mugu.
Grant made a similar point. Noting that the Pacific Range is regularly used to test missile defense, long-range strike weapons, and strategic deterrence systems, he said that it is difficult to run other sorts of tests that require more space, such as those involving large-scale forces that need to test (and train) as they fight. Many ranges do not have enough space to replicate the maneuvers that these forces would undertake in warfighting. Various sorts of weapons require large areas for testing, Grant said, including boost-glide hypersonic vehicles and ballistic missile defense systems.
Michael White, the Principal Director for Hypersonics in the Office of the Under Secretary of Defense for Research and Engineering, elaborated on testing issues for hypersonic systems. The main reason that hypersonic weapons challenge military ranges, he said, is that they move so fast and so far. Hypersonic cruise missiles can fly between Mach 5 and Mach 6, boost-glide vehicles fly as fast as Mach 20, and both fly for hundreds—or, in the case of intermediate-range systems, even thousands—of miles. Furthermore, they fly at an altitude higher than that of conventional cruise missiles but much lower than that of ballistic missiles, close enough to the earth that ground-based radars lose them over the horizon relatively quickly. A hypersonic vehicle under test may fly several thousand miles while the radar horizon is only a few hundred miles, making it necessary to string together a series of test ranges—what White referred to as a “string of pearls”—to gather the necessary data from the vehicle throughout its flight. This strategy poses a number of challenges for integrating the ranges, which must remain in touch with the hypersonic vehicle and with each other throughout the vehicle’s trajectory.
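The “few hundred miles” figure can be checked with the standard line-of-sight horizon geometry. The sketch below is purely illustrative: it uses the common 4/3-effective-Earth-radius approximation for radio propagation, and the 100,000-foot altitude is a notional value chosen for the example, not a figure from the workshop.

```python
import math

def radar_horizon_miles(altitude_ft: float) -> float:
    """Approximate line-of-sight radar horizon for a target at the given
    altitude, using the 4/3-effective-Earth-radius model for radio waves."""
    effective_earth_radius_m = 6_371_000 * 4 / 3  # 4/3 Earth radius model
    altitude_m = altitude_ft * 0.3048             # feet -> meters
    # Horizon distance d ~= sqrt(2 * R_eff * h) for h << R_eff
    horizon_m = math.sqrt(2 * effective_earth_radius_m * altitude_m)
    return horizon_m / 1609.344                   # meters -> statute miles

# A target at a notional 100,000 ft is visible to a surface radar only out
# to roughly 450 miles, far short of a multi-thousand-mile trajectory,
# consistent with the need to chain ranges together.
print(round(radar_horizon_miles(100_000)))
```

With many surface radars chained along the flight path, each covering only its own horizon, continuous tracking of a long-range hypersonic trajectory becomes a range-integration problem rather than a single-sensor problem.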
A variety of encroachment issues place additional strain on testing ranges. Some encroachment is geographic, such as residential and commercial growth in areas near ranges. Another issue, Quade noted, is the presence of endangered species in the waters or land areas of military ranges. Electromagnetic encroachment poses a different sort of challenge. Many testing scenarios require a certain level of quiet in particular bands of the electromagnetic spectrum, and such quiet is becoming increasingly difficult to find as commercial and consumer technologies fill the nation’s airwaves.
Tremper offered a specific example of such electromagnetic encroachment. According to a report released in October 2020 by the Radio Technical Commission for Aeronautics, there is a spectrum interference problem between 5G transmitters and radar altimeters on commercial and military aircraft. There are two electromagnetic bands reserved for 5G communications, and the frequencies used by radar altimeters fall between those bands. Although the radar altimeter frequencies are theoretically distinct from the 5G bands, there are complex interactions between the frequencies that cause problems for the altimeters.
CHALLENGE: PROVIDING REPRESENTATIVE TEST ENVIRONMENTS
As Cate noted in his presentation, it is critical for operational tests to provide representative test environments. Test ranges must provide faithful representations not only of the relevant operational environment but also of the system under test, other systems with which the system under test interoperates, and threat systems. Integrating all of the relevant components into a realistic test environment poses numerous challenges.
One class of challenges relates to combining ranges when, for geographical or other reasons, a single range is insufficient to provide a realistic test environment. For example, Bernstein argued that the scale of future conflicts will be such that there is no way a single range could simulate them. In some cases, two or three ranges combined may suffice for a test, but even that will sometimes not be enough. Given how global communication systems will be used to monitor and command sensors and weapons across thousands of miles around the globe and given that many capabilities will be in space, Bernstein said, it will be necessary to combine many, if not all, of the nation’s test ranges into a very complex “range of ranges.” This in turn will raise a number of challenges, because combining the ranges in this way will likely require that they be controlled
remotely and coordinated from a central location. White said that in the case of hypersonic testing, many of the challenges are logistical and related to coordinating a number of ranges, often including ranges of allies such as Australia.
A second class of challenges arises from the need to bring together multiple entities—different systems, different domains, and different services—into a single test. As a number of presenters noted, while this strategy is the most effective way to test systems, it is difficult to execute.
Col. Jason Eckberg, the Deputy Director of Electromagnetic Spectrum Dominance Deputy Directorate (A5L) at the U.S. Air Force, said that there must be a shift from one-versus-one tests in which the singular focus is vehicle survivability toward tests with multiple components that assess overall force effectiveness. Such a shift will be challenging for a number of reasons, he said, mentioning specifically that it will run counter to much of the military’s acquisition system, which bases incentives on discrete individual survivability as opposed to contributions to the larger force as a whole.
Speaking specifically of electronic warfare (EW), Tremper said that EW testing has been historically conceptualized in one-on-one terms: “I’ve got a threat, I’ve got an EW system, can I detect it, can I jam it, how well does that work?” The test ranges were designed for one-on-one tests, but there is a growing realization that this is not sufficient, because real-world warfighting is not one-versus-one but many-versus-many. Warfighting involves complex interactions and long-range operations, Tremper said. Both the threats and the spectrum change so quickly, he explained, that we need to have an environment within which we can model or accurately represent the agility of that environment so that we can effectively test our systems. This is critical to ensure that EW systems are truly ready for an operational environment.
Bernstein described how ABMS is being developed based on the premise that overcoming certain adversaries will require that the United States bring all of its warfighting capabilities across the services, agencies, and intelligence communities together in a unified fashion. Individual capabilities, no matter how advanced, are insufficient against a technologically capable modern adversary, he said. Bernstein noted that military digital technologies will become most effective when they are highly interconnected through a “military Internet of Things.” ABMS will apply the same sorts of digital Internet capabilities used in the consumer world—including cellular and satellite networks, computing centers, and data storage in the cloud—to tie military systems together in a seamless fashion.
Grant echoed Bernstein’s point, contending that it will be important to carry out multidomain, multitheater force-level OT&E including space operations, missile and air defense, surface warfare, and undersea warfare. Grant raised the issue of characterizing interconnected systems from an operational perspective. Although ranges do well at weapon-on-weapon tests, multiwarfare testing stresses their capabilities. There are also range safety considerations that depend on the size of the range, its location, and what is happening commercially around it. Another challenge involved in carrying out complex, multiwarfare force OT&E, Grant said, will be determining meaningful performance parameters. How can the operational capability and utility of a force be assessed across multiple theaters?
James Cooke, the U.S. Army Director of Test and Evaluation, offered a cautionary note about integrated testing. The Army tried performing multiple operational tests at the same time at the White Sands Missile Range and observed frustrations from product managers about performance problems with components outside their program, suggesting the need for early integration prior to testing.
CHALLENGE: MODELING AND SIMULATION
Modeling and simulation will become increasingly important for military testing in the coming years for two basic reasons. First, with the rapid increases in computing power and improvements in software sophistication, the capabilities of models are growing rapidly. Second, a variety of factors are pushing the OT&E enterprise away from direct testing and toward modeling. For example, security concerns associated with open-air testing, such as observation by adversaries, are driving
the need for some tests to be replaced by modeling. Running simulations is generally less expensive than running tests with expensive pieces of equipment, and as military weapons and systems become increasingly complex, modeling and simulation are increasingly necessary to guide the design of testing on ranges.
Several speakers made the point that the interplay between simulation and live range testing is crucial to OT&E. Quade shared that neither the testing ranges nor the laboratories alone have the full range of capabilities needed for combining simulation and live testing. In the future, he continued, there needs to be a hybrid approach where the open-air ranges are collecting the data to validate the simulations conducted in the laboratories. White referred to live–virtual integration as a crucial part of testing and evaluation. The data gathered from testing can inform models, the modeling needs guide data collection strategies, and integrating these two aspects of testing effectively can accelerate learning. Offering hypersonics as an example, White said that flight tests are expensive and stress the ranges, so it will never be possible to flight test and demonstrate the full envelope for the operation of hypersonic weapons. What we are really trying to do in our developmental and operational flight test, he said, is demonstrate key points in the performance envelope and then be able to verify our models and expand our models to evaluate the full envelope.
Presenters described a number of ways in which models will be important over the coming decades, including the following:
- Creation of cutting-edge threat environments. Specifically, the Joint Simulation Environment is intended to provide more realistic and denser threat environments than can be created on physical test ranges (Greer).
- Representation of threats that are too dangerous to be reproduced in the real world, such as nuclear weapons and certain anti-satellite and laser threats (Felt).
- Use of virtual systems informed by data from real-world tests to carry out testing that would not otherwise be possible. For example, when budget constraints limited the number of test fires that could be performed with a surface-to-air missile recently developed by the Navy, data from those few test fires were fed into a high-fidelity model of the missile, and more than 900,000 runs were carried out to demonstrate the efficacy of the weapon system. Without such virtual testing, it would not be affordable to conduct the necessary realistic testing on the ranges (Grant).
- Use of models in operational testing. Resources like the Virtual Warfare Center in St. Louis are necessary to develop and refine joint tactics. These can then be brought to the ranges, where they are tested in the real world (Grant).
- Modeling and simulation of holistic unit actions. In the Army, it is important that testing of new equipment be done at the unit level—with a company (150 soldiers), battalion (1,000 soldiers), or even a brigade (4,500 soldiers). New technology is valuable to the Army only to the extent that it makes these units more effective, so operational testing requires first training everyone in a unit on how to use the new system and then carrying out tests with the entire unit, which can take months. Various constraints at bases make it difficult to carry out this training and testing at the soldiers’ home bases. Modeling may hold the answer (Cooke).
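The virtual-testing approach described in the surface-to-air missile example above (a handful of live test fires calibrating a model that is then exercised many thousands of times) can be illustrated with a toy Monte Carlo sketch. Everything here is hypothetical: the function names, the live-test outcomes, and the 10,000-run campaign size are invented for illustration and are not drawn from the Navy program described at the workshop.

```python
import random

def live_test_results():
    """Stand-in for a handful of live test fires (hypothetical outcomes)."""
    return [True, True, False, True, True]  # 4 hits in 5 live shots

def calibrate_hit_probability(results):
    """Estimate a single-shot hit probability from the live-test data."""
    return sum(results) / len(results)

def virtual_campaign(p_hit, n_runs=10_000, seed=42):
    """Exercise the calibrated model over many simulated engagements
    and return the observed fraction of successful intercepts."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_runs) if rng.random() < p_hit)
    return hits / n_runs

p = calibrate_hit_probability(live_test_results())
effectiveness = virtual_campaign(p)
```

A real high-fidelity model would simulate the engagement physics rather than draw from a single calibrated probability, but the overall workflow is the same: a small number of expensive live events anchor the model, and the large virtual campaign explores the envelope that live testing cannot affordably cover.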
The increased use of modeling and simulation will pose various challenges, beginning with the basic issue of building models that accurately predict the behavior of the systems they represent. Specific challenges identified by various speakers included
- The need to validate models (Fiore, Grant).
- The building and validation of threat models, which will depend on the collection of intelligence (Quade, Pearson, Felt).
- Balancing different test approaches—modeling versus live—in different situations and integrating live and virtual testing (Grant, White).
One workshop presenter offered a sobering warning about the difficulty of modeling autonomous systems. Missy Cummings, a professor in the Department of Electrical and Computer Engineering at Duke University, described the issues that arise in testing autonomous systems that rely on some sort of computer vision. “If a convolutional neural net [typically used to analyze images] is in your system at all, it means that you cannot rely on simulation,” she said. “Simulation can maybe help you do some baby testing early in the phases of autonomous systems, but it simply cannot represent the uncertainty of the real world.”
CHALLENGE: MEASUREMENT AND DATA ISSUES
The burgeoning power of digital technologies is opening many opportunities in T&E, but these technologies present many challenges as well. Many of those challenges arise from the basic fact that, with increasing digitalization, tests generate ever larger volumes of data. As James Amato, executive test director of the Army Test and Evaluation Command, observed, the amount of data that has to be moved within and between ranges has grown exponentially. Unfortunately, he continued, they do not have the [data] infrastructure, technology, and solutions in place today to do that at scale and at the speeds that will be required.
Grant listed a number of data and measurement challenges that arise in large-scale tests, such as those carried out across multiple ranges. He said that testers need instrumentation, telemetry, data collection, data handling, and data analysis that will work at the scale of these large ranges. This is made difficult, he continued, by the sheer volume of data being collected from the system under test and the desire to make those data available for analysis very quickly. Typically, data collected from range tests on one day are analyzed to decide what tests to run the next day, which means large amounts of data must be moved to analysis centers each day.
Google’s Marcuse said that if the ranges are to handle the necessary testing data effectively, planning must start early in the design phase. In observing DoD, he found that too often program officers build systems without a data strategy, so they fail to collect much of the data that should exist to inform operational testing. Thinking about the data requirements for a digital engineering approach has to begin at the beginning, Marcuse said, not arrive as a requirement at the end, when the system is handed over the wall to the group that will test it and the testers discover what is missing. He said this will require a new mind-set on the part of program officers.
Another challenge, Marcuse added, will be assembling the necessary digital resources to handle the data-intensive and computation-intensive aspects of OT&E. For example, performing modeling and simulation properly requires substantial computing capacity, generally more than the DoD has. Some military ranges seem to have barely entered the digital age at all, Marcuse observed, describing an encounter at an Army testing facility where staff complained about the difficulty of keeping track of all the paper copies of test results they needed to collect from the range they were inspecting.
Speakers described a number of other challenges related to measurement and data. These included
- Moving beyond fixed receivers and mobile platforms to using autonomous vehicles such as wave runners, aircraft, and satellites to collect data. This will be crucial for handling data from large-scale, multidomain tests (Grant).
- Collecting the necessary data from tests carried out in integrated environments with multiple ranges tied together (Pearson).
- The lack of an efficient data infrastructure. There is a sense that the ranges are generating large volumes of test data but not managing those data well, because no infrastructure exists to make the best use of them (Seraphin).
- The lack of common data standards. Manually figuring out how to tie data from different ranges together decreases efficiency (Amato).
CHALLENGE: SECURITY ISSUES
Workshop participants identified a number of security-related issues, many of which fell into the following categories:
- Security related to the integration of testing. Working and sharing data across multiple ranges or multiple services raises a variety of security issues (Quade). The lack of a “robust common IT infrastructure that can support multilevel security and the switching of classification levels quickly” makes it difficult to share data securely among various entities (Greer). Although the security barriers between the different ranges and services are partly technological in nature, much of the challenge is actually rooted in existing policies (Eckberg).
- Security in open ranges. Security concerns may preclude using open-area ranges (Quade). Space testing faces the same issue: there is no place to test in space where adversaries cannot see you (Felt).
Regarding cybersecurity, George Rumford, the principal deputy director of the Test Resource Management Center, warned that the nation needs a highly cyber-resilient test infrastructure so that America’s adversaries cannot exploit any vulnerabilities they find in defense infrastructure. Cybersecurity is not a yes-or-no question, he added, but a matter of how much risk people are willing to accept in carrying out a particular task.
CHALLENGE: WORKFORCE ISSUES
Although the issue of ensuring a suitable workforce for DoD ranges was not included in the committee’s statement of task or in the issues suggested for the workshop, several presenters broached the topic, arguing that there is a vital need to improve the workforce at the military ranges, particularly in the area of digital skills.
Google’s Marcuse, who had identified digital shortcomings as a major challenge for OT&E, emphasized the importance of a workforce that is digitally skilled. Improving the OT&E system and even boosting its funding will have little effect without improvements to the workforce. He stated that with less money and better-qualified people, you would have better outcomes than with less-qualified people and twice the budget. A particular problem, he said, is that there seem to be very few data scientists in developmental and operational testing and evaluation. With the growing volume of data generated from these tests, it is increasingly important to have a workforce capable of managing and harnessing the large amounts of data that are available.
Cate made a similar point. To succeed in a future in which digital engineering, agile software development, and emerging technologies such as AI, autonomous systems, and hypersonic weapons are more prevalent, it will be important to train the T&E workforce in new disciplines. It will become more important, for instance, for software testers to have coding experience, as they will be involved in agile development and testing. Testers will also need to be comfortable using digital engineering tools and able to understand the systems they test.
CHALLENGE: FINANCIAL ISSUES
A number of speakers observed that finding money to support the necessary improvements to OT&E will be a major challenge. Cade shared that the services and the Office of the Secretary of
Defense have identified billions of dollars in investment backlogs that must be funded in order to meet today’s needs in OT&E. Moving forward, he added, taking on emerging technologies and the digital engineering revolution will require an additional major financial investment. Of course, he concluded, that will be a significant challenge in the current fiscal environment.
As an example of these issues, O’Toole pointed to the test ranges planned for the Space Force, for which $42 million has been requested for fiscal year 2021, with a total spending of $105 million anticipated through 2025. Yet DoD plans to spend $100 billion on space assets in coming years, and, O’Toole said, a rule of thumb in the test and evaluation world is that 1 percent of planned acquisition spending should be allocated to test and evaluation. That would mean that the department should invest $1 billion for space-based test capabilities and infrastructure, and so far the plan falls considerably short of that. He concluded his remarks by expressing concern that we will not be properly prepared for operationally realistic space test and evaluation.
A major problem, Quade said, is that in the budget process, investments in ranges and test capabilities are often not a priority, and with budgets expected to shrink, it will become even more difficult for OT&E to compete with other priorities. Greer observed that any additional funding for testing and evaluation will most likely have to be accompanied by cuts elsewhere. Funders, he predicted, will entertain new investments in the future, but they will probably ask for some kind of efficiencies to be identified. Investing without identifying funding offsets will most likely not be supported by Pentagon leadership.
In closing, committee chair Keoki Jackson summed up the workshop, saying, “We heard we’re on the cusp of a need for a fundamental reinvention of the range capabilities to support a fundamental reinvention of tests.” The world is changing, so how is the DoD going to reinvent tests for this kind of world? That will be the question the committee must address going forward as it prepares its consensus report.