Testing of Software-Intensive Systems
Early in the panel's work, it became clear that software is a critical path through which systems achieve their performance objectives. We therefore recognized the need for special attention to software-intensive systems and better understanding of how operational testing is conducted on software-intensive systems across the military services. It has been reported that in the last three years, over 90 percent of Army initial operational test and evaluation slipped because software was not ready (U.S. Army Materiel Systems Analysis Activity, 1995). Since the 1970s, software problems discovered during operational testing have adversely affected the cost, schedule, and performance of major defense acquisition systems. In some cases, significant performance shortfalls have been identified after systems have been produced and put into operational use. Findings show that software-intensive systems generally do not meet user requirements because the systems are certified as ready for operational testing before their software is fully mature.
Several barriers have been identified that limit effective software test and evaluation. One such barrier is that DoD has not acknowledged or addressed the criticality of software to systems' operational requirements early enough in the acquisition process. There is a perception that software is secondary to hardware and can be fixed later. Other barriers to effective test and evaluation of software include the following: (1) DoD has not developed, implemented, or standardized decision-making tools and processes for measuring or projecting weapon system cost, schedule, and performance risks; (2) DoD has not developed testing and evaluation policy that provides consistent guidance regarding software maturity; and (3) DoD has not adequately defined and managed software requirements. Although DoD has carefully studied what needs to be done to develop and test quality software and to field software-intensive systems, it has not effectively implemented long-standing recommendations. On the other hand, despite the lack of a DoD-wide coordinated strategy, the individual military services have made attempts to improve their software development processes (U.S. General Accounting Office, 1993).
Given the above concerns, the panel formed a working group to focus on defense systems that are either software products or systems with significant software content. The group's goal is to prescribe statistical methods that will support decisions on the operational effectiveness and suitability of software-intensive defense systems. The notion is that these methods will identify unfit systems relatively quickly and inexpensively through iterative use of techniques that promote progress toward passing operational test. A checklist associated with the methods would also be helpful to developers in knowing when their systems are ready for operational testing.
The remainder of this chapter addresses the potential role for statistical methods in operational testing of software-intensive systems and describes the panel's activities to date and planned future work in this area.
ROLE FOR STATISTICAL METHODS
The panel sees a strong role for statistical methods in the test and evaluation of software-intensive systems. Recognizing that not every scenario can be tested, we have formulated the following set of questions in order to understand current practices for operational testing of software-intensive systems and to identify areas where statistical methods might be applied:
How does one characterize the population of scenarios to test and the environments of use?
How does one select scenarios to test from the population?
How does one know when to stop testing? What are the stopping criteria?
How does one generalize from the information gained during testing to the population of scenarios not tested?
How does one plan for optimal use of test resources and adjust the test plan as the testing unfolds?
These questions are quite similar to those asked in understanding experimental design issues. However, sample sizes are typically much larger for software systems than for hardware systems, and therefore the answers to these questions will likely lead to different procedures.
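As a purely illustrative sketch, and not a method the panel has prescribed, two of the questions above (how to select scenarios and when to stop testing) can be made concrete under strong simplifying assumptions. The scenario names and usage frequencies below are hypothetical. The code assumes that scenarios are drawn independently at random according to an assumed profile of field use, and that each test either passes or fails with a fixed, unknown probability. Under those assumptions, a standard zero-failure argument gives the number of consecutive failure-free tests needed to support a claim about the failure probability at a stated confidence level.

```python
import math
import random

# Hypothetical operational profile: scenario classes and their assumed
# relative frequency in field use. These values are illustrative only.
OPERATIONAL_PROFILE = {
    "routine_tracking": 0.70,
    "high_load_tracking": 0.20,
    "degraded_comms": 0.08,
    "operator_error_recovery": 0.02,
}


def select_scenarios(profile, n_tests, seed=0):
    """Draw n_tests scenario classes at random, weighted by assumed usage."""
    rng = random.Random(seed)  # fixed seed so a test plan can be reproduced
    names = list(profile)
    weights = [profile[name] for name in names]
    return rng.choices(names, weights=weights, k=n_tests)


def failure_free_tests_needed(max_failure_prob, confidence):
    """Smallest n with (1 - max_failure_prob)**n <= 1 - confidence.

    If n independent, randomly selected tests all pass, the claim
    "per-test failure probability <= max_failure_prob" is supported
    at the given confidence level.
    """
    if not (0 < max_failure_prob < 1 and 0 < confidence < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_failure_prob))


def failure_prob_upper_bound(n_failure_free, confidence):
    """Upper confidence bound on failure probability after n passing tests."""
    if n_failure_free < 1:
        raise ValueError("at least one failure-free test is required")
    return 1 - (1 - confidence) ** (1 / n_failure_free)


if __name__ == "__main__":
    n = failure_free_tests_needed(max_failure_prob=0.01, confidence=0.95)
    plan = select_scenarios(OPERATIONAL_PROFILE, n_tests=n)
    print("failure-free tests needed:", n)
    print("first five scenarios in the plan:", plan[:5])
    print("bound after n passes:", failure_prob_upper_bound(n, 0.95))
```

Even this simple rule shows why sample sizes for software can be large: supporting a failure probability of at most 0.01 at 95 percent confidence requires 299 consecutive failure-free tests, and any observed failure, or any subsequent change to the software, invalidates the count. The sketch does not address the remaining questions of characterizing the scenario population or adapting the test plan as testing unfolds.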
The panel believes that its greatest potential for significant contribution in this area will be achieved by concentrating on complex future systems, since there is potential for greater impact in targeting systems that have not yet passed through various developmental or operational test phases. The current paradigm appears to be bottom-up, with software emerging from a bewildering variety of methods, tools, and cultures, and each operational testing project having to struggle to find the money, time, and methods to test and evaluate the software to the extent necessary to put it into field use. Currently, operational testing of software-intensive systems is compromised because its methods are allowed to be driven by software development practices. The record of the software development community does not warrant adoption of their methods for operational testing. It is in the nature of software that it can be made needlessly complex beyond any threshold of evaluation and testability.
However, the panel sees a shift taking place to a new, top-down paradigm driven by the concept of “intended use” as articulated in the definition of operational testing. To implement this idea, it would be necessary to prescribe certain criteria that, if met, would support a decision that the software (or system containing the software) is fit for field use. These criteria might include experiments, tests, and other means of evaluation. The criteria, including costs, schedules, and methods, would have to be prescribed in technical detail, in turn becoming requirements and constraints on the design and development process. While such constraints will not limit the potential of software, they may induce change in development methods, schedules, and budgets, making them more effective and realistic for developing systems that will pass operational testing. Software can be designed so that it will satisfy well-defined criteria for evaluation and testing.
In an effort to gain a full understanding of test and evaluation of software-intensive systems, the panel sought analogies between the methods and problems encountered in DoD testing of such systems and those encountered in testing by other federal agencies and industry. Process was an apparent common theme among all the analogies examined, which included the Food and Drug Administration; the nuclear industry; commercial aviation; and such for-profit industries as telecommunications, banking, and highly automated manufacturing industries that have serious dependencies on software (RTCA, Inc., 1992; Food and Drug Administration, 1987; U.S. Nuclear Regulatory Commission, 1993; Scott and Lawrence, 1994).
It is important to recognize that most software errors stem from design flaws that can ultimately be traced to poor requirements. Correcting design flaws is especially difficult because one must ensure that the changes do not have unintended side effects that create problems in other segments of the code.
The panel recognizes that there are several important software engineering issues involved in the defense system life cycle that are not in our purview. These include configuration control during development and after deployment, so that every change made to a program is controlled and recorded to ensure the availability of the correct version of the program, as well as such issues as software reliability and upgrades. It is beyond the panel's charge to address directly the fundamentals of software engineering and current best practice for creating and maintaining software across the full system development life cycle. We view operational testing as a special moment or “snapshot” in the total life cycle of a software system. On the other hand, we recognize that software engineering is critical to successful software development and revision. If the software engineering process is flawed, then the statistical measurements and analysis used in operational testing will be out of context.
ACTIVITIES TO DATE
A primary goal of the panel has been to identify and develop working relationships with representatives from the different services who have primary operational test and evaluation responsibility for designing and performing experiments for software-intensive systems. These relationships help us understand operational testing from their perspective, the difficulties they experience, and the areas where they might seek panel assistance. We have also sought to establish contacts with those who in effect work with the results of operational testing and have responsibility for reporting those results to Congress. Through our activities to date, we have identified and established contact with several key players who are involved in the operational testing of software-intensive systems within both the individual services and DoD.
The panel has been engaged in information gathering through meetings, focused group discussions, conference calls, and a few site visits. In late April 1995, we made a one-day visit to Navy Operational Test and Evaluation Force headquarters to learn about the Navy's approaches to testing software-intensive systems. In conjunction with that visit, we also visited the Dam Neck Naval Surface Warfare Center to get a first-hand look at some of the Navy's software-intensive systems. In addition, we held an interservice session in Washington, D.C., with representatives from the services and DoD. This session allowed us to learn more about the services' approaches to testing software-intensive systems, while providing an opportunity for the service representatives in attendance to exchange views and share their experiences.
The panel has learned about several ongoing efforts to resolve long-standing software testing problems. The Army's Software Test and Evaluation Panel is seeking to streamline and unify the software test and evaluation process through the use of consistent terms, the identification of key players in the software test and evaluation community, and the definition of specific software test and evaluation procedures (U.S. General Accounting Office, 1993; Paul, 1993, 1995). Additional objectives of the Software Test and Evaluation Panel are to improve management visibility and control by quantifying and monitoring software technical parameters using software metrics, and to promote earlier user involvement. Also with regard to the Army, we were informed of a fairly new operational testing strategy for expediting the fielding of software-intensive systems. This new strategy allows partial fielding of software-intensive systems once successful operational testing of a representative sample has been accomplished. Traditional operational testing of weapon systems requires that the entire system successfully complete operational testing of production-representative items before fielding (Myers, 1993). In late 1991, the Air Force developed a process improvement program that was used in its software development activities. The Air Force has also been developing standardized procedures for software test and evaluation for use in both developmental and operational tests. The Navy has been engaged in an effort to improve test and evaluation of software, and has taken actions to improve its software development and testing processes.
Through our meetings and conversations with DoD personnel, the panel has become aware of a category of acquisition known as evolutionary acquisition. With evolutionary acquisition systems, such as the Naval Tactical Command System-Afloat,¹ the software code that is evaluated in operational testing is not necessarily what is deployed on the ship. Although several versions of the software code may be tested, it is possible that none of those versions is representative of the software to be used in the field. We are concerned that evolutionary acquisition compromises the utility of operational testing. We plan to pursue this issue further and attempt to develop a better understanding of the concept of evolutionary acquisition.
A major goal of the panel's future work will be to obtain more information about the development of software-intensive systems in the four services. We will also be developing a recommended protocol for software development and software testing, drawing from state-of-the-art industrial practice.
The panel is planning more interaction with Air Force Operational Test and Evaluation Center software experts and continued interaction with software contacts in the other services. In addition, we will further examine the Naval Tactical Command System-Afloat as a potential case study.
¹ The Naval Tactical Command System-Afloat is the Navy's premier command and control system and is described as an all-source, all-knowing system that is installed on most ships and at many more shore sites. It provides timely, accurate, and complete all-source information management, display, and dissemination activities, including distribution of surveillance and intelligence data and imagery to support warfare mission assessment, planning, and execution. It is a current segment of a large strategy system known as the Joint Maritime Command Information System.