INNOVATIONS IN SOFTWARE ENGINEERING

1 Motivation for and Structure of the Workshop

Recent rough estimates are that the U.S. Department of Defense (DoD) spends at least $38 billion a year on the research, development, testing, and evaluation of new defense systems, and that approximately 40 percent of that cost, at least $16 billion, is spent on software development and testing (Ferguson, 2001; Aerospace Daily, 2003). Given the costs involved, even relatively incremental improvements to the software development process for defense systems could represent a large savings in defense funds, in addition to producing higher-quality defense systems. Therefore, a high priority needs to be accorded to the identification of software engineering methods that can be used to provide higher-quality software at reduced costs.

In addition to impacts on quality improvement and cost savings, software problems are known to cause delays in the delivery of new defense systems and to result in reduced functionality in comparison to initial specifications. More importantly, field failures can lead to mission failure and even loss of life. These are all important reasons to support broad-based investigations into various approaches to improve the process of engineering software for defense systems.

In opening remarks at the workshop, Delores Etter, Deputy Under Secretary for Defense (Science and Technology), described the complicated software systems embedded in the Comanche RAH-66 Helicopter, in the NAVSTAR global positioning system, in the AEGIS weapon system, and in the Predator Unmanned Aerial Vehicle. For example, for AEGIS, there are 1,200K lines of code for display, 385K lines of code for the test, 266K lines of code for the weapon, 110K lines of code for training, and 337K lines of code for the command and decision making. Complicated software systems are ubiquitous in defense systems today.

The Workshop on Statistical Methods in Software Engineering for Defense Systems grew out of the work of the Panel on Statistical Methods for Testing and Evaluating Defense Systems. This Committee on National Statistics panel, funded by the DoD Director of Operational Test and Evaluation (DOT&E), examined the use of statistical methods in several separate arenas related to the design and evaluation of operational tests, and, more broadly, the process used for the development of defense systems (NRC, 1998). The panel identified a number of areas of application in which the problem-solving approach of statistical science (in contrast to simple, omnibus techniques) could be used to help provide extremely useful information to support decisions on defense system development. Two of the arenas examined by this panel were testing software-intensive systems and testing software architecture (see NRC, 1998, Chapter 8).

Following this panel's recommendation to continue the examination of the applicability of statistical techniques to defense system development at a more detailed level, DOT&E, along with the Office of Acquisition, Technology, and Logistics of the Office of the Secretary of Defense, initiated a series of workshops to explore in greater depth issues raised in the different areas of focus of the National Research Council (NRC, 1998). The first workshop, held June 9-10, 2000, addressed the issue of reliability assessment (see NRC, 2002, for details).
The second workshop, held July 19-20, 2001, and jointly organized with the NRC's Committee on Applied and Theoretical Statistics, dealt with the use of statistical methods for testing and evaluation of software-intensive systems and is the chief basis for this report. (For related work, see NRC, 1996.)

STRUCTURE OF THE WORKSHOP

The Workshop on Statistical Methods in Software Engineering for Defense Systems was structured to correspond roughly with the steps to
carry out a statistical assessment of the functioning of any industrial system, whether hardware or software. The steps of a statistical analysis are: (1) specification of requirements, (2) selection of an experimental design to efficiently select test cases to collect information on system performance in satisfying requirements, and (3) analysis of the resulting experimental data (to estimate performance, check compliance with requirements, etc.). This structure was used to organize the workshop presentations, and we use the same structure to organize this workshop report.

The linear structure of this report is unfortunate in that it does not communicate well the overlapping aspects of many of the methods described, given that research in one area is often relevant to others. This is one of the justifications for optimism that tools will be developed in the very near future that would further combine, say, requirements specifications, testing, cost estimation, and risk assessment as methods emanating from a unified framework. For example, overlap can already be seen in the development of software tools that test software functionality; known as test oracles, these tools are needed for use with model-based testing and could be assisted by tools developed for requirements specification. Conversely, the graph on which model-based testing relies can be used to support requirements specification.

We now provide some detail on the topics examined at the workshop in these three areas of software engineering. (See also the workshop agenda in Appendix A.)

Specification of requirements and software testability: A requisite for data collection on system performance is the specification of requirements, which defines the successful functioning of a system. Requirements need to be examined with respect to: (1) the possibility of verification, (2) completeness, (3) consistency, (4) correctness, and (5) complexity.
It is useful to judge how complicated a system is in order to estimate how many replications might be needed to adequately test a software system (given an agreed-upon definition of adequate). The complexity of software is dependent on its architecture, which is related to the specification of requirements. To address both whether requirements are well specified and whether the software is structured in a way to facilitate testing, the workshop included a session concerning specification of requirements for software systems.

Selection of test cases (experimental design): The better one understands software performance, the better one is able to predict the expected behavior of systems. The better one understands the software development process, the better one can intelligently allocate scarce resources (including personnel, tools, hardware, and machine cycles) to address any deficiencies. By collecting data and analyzing them, one can develop a better understanding of both software performance and potential deficiencies in the software development process. To collect data on how a software system operates, the system must be exercised on selected test scenarios, i.e., a suite of test inputs. It is sometimes believed, particularly with automated testing, that since many software systems execute in small fractions of a second, a software system could be executed a very large number of times without needing to consider any notions of optimal selection of test scenarios. However, inherent system complexity leads to an astronomical number of test inputs for virtually all defense software systems, hence the need to carefully select test inputs. Furthermore, many systems either do not execute quickly or else have long set-up times for test cases. For example, testing the interoperability of a system that is composed of various subsystems which can be represented by different versions or releases is typically not an automated activity and therefore necessitates the careful selection of test inputs. Thus concepts and tools from the statistical field of experimental design are relevant to consider.

Two statistically oriented approaches for the selection of inputs for software testing have been successfully applied in a variety of industrial settings. They are both examples of "model-based" testing methods, which rely (possibly implicitly) on a graphical representation of the software in action.
The nodes of the graph represent user-relevant states of the software, and various user actions are represented as transitions from one node to another. One of the two approaches to input selection presented at the workshop relies on a Markov chain model of software use in transitioning through the graphical representation, described below. The second approach, referred to as combinatorial design, identifies a surprisingly small collection of software test inputs that includes all k-wise combinations of input fields (typically at a given node of the graphical representation), where k is typically chosen to be small, often 2 or 3. Both approaches have been used in industrial applications to help provide high-quality software with substantial cost savings and schedule reductions. Given the different advantages of these two methods, there is interest in developing a hybrid method that combines and retains the advantages of
both, and so some possibilities for achieving this are discussed. In addition, since Markov chain usage testing, while quite efficient, often requires a substantial test sample size to provide reliable statistics on software performance, test automation remains crucial, and so such methods were also briefly examined at the workshop.

A common problem that is a high priority for the defense acquisition community is that of software systems that are composed of subsystems that have versions that are collectively incompatible. (This is referred to as the interoperability problem.) Many software-intensive defense systems utilize commercial-off-the-shelf (COTS) systems as subroutines or as lower-level component systems. Different COTS systems have separate release schedules and, combined with the fact that any custom software systems included as components will also be subject to modifications, there is the possibility that some of these configurations and modifications may be incompatible. A session at the workshop provided a description of many of the tools used in industry to enhance interoperability, and showed that combinatorial design methods can be used to measure the extent of interoperability in a system of systems.

Analysis of test data: Data that are collected (possibly using the above experimental design procedures) consist of test inputs, which can be considered from a statistical perspective as "independent variables," and the associated results of the software system exercised on those inputs, which can be thought of as "dependent variables." These input-output pairs can be used both to help determine whether the software functioned successfully for specifically chosen inputs and to measure the extent of, and identify the sources of, defects in a software system.
Furthermore, based on these data and the assessment of whether the software functioned properly, a variety of additional analyses can be carried out concerning the performance of the system and the software development process. Accordingly, a session at the workshop was organized to consider various uses of data collected on software performance and development. This session included presentations on measurement of software risk, software aging, defect analysis and classification, and estimation of the parameters of models predicting the costs of software development.

Two examples of performance measurement not covered at the workshop are reliability growth modeling and decision making on when to release a product. Dozens of models on reliability growth have been suggested in the literature with varying interpretations of the impact on system
reliability associated with code changes. Some assumptions imply better performance after changes, while others imply worse performance. Some assumptions imply linear growth, while others imply exponential growth. The test data are used to fit the parameters of these models and then projections are made. This literature includes contributions by Musa, Goel, Littlewood, and Verrall, among others. (For specific references to their work, see Lyu, 1996.) With respect to decisions on release dates for new software products, IBM uses test data not only to decide on the optimal release date but also to estimate field support budgets for the life of the product or until the next version is released (for details, see Whittaker and Agrawal, 1994). The literature on when to release a product is considerable; two recent contributions are Dalal and Mallows (1988) and Dalal and McIntosh (1994).

The workshop focused on methods that were already in use, for which mature tools existed, and that were generally believed to be readily adaptable to defense systems. Many of the techniques described in this report have already enjoyed successful application in DoD or DoD-like applications. For example, a Raytheon application of testing based on Markov chain usage models resulted in an 80 percent reduction in the cost of automated testing along with a substantial reduction in the percentage of project resources required for testing (17-30 percent versus 32-47 percent). Another Raytheon application of combinatorial design for a satellite control center experienced a 68 percent reduction in test duration and a 67 percent savings in test labor costs. And IBM Storage Systems Division reported on the first industrial use of model-based statistical testing for mass storage controllers. To help show the maturity of the methods, the workshop concluded with demonstrations of many of the software tools that had been described in the presentations.
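
To make the two input-selection approaches summarized in this chapter more concrete, the following is a minimal Python sketch, not the tooling demonstrated at the workshop. It assumes a small hypothetical usage graph (the state names and transition probabilities in USAGE_MODEL are invented) and a hypothetical set of input fields (FIELDS). The first function draws one test scenario as a random walk through a Markov chain usage model; the second uses a simple greedy heuristic, chosen here only for brevity, to build a suite that covers every pairwise (k = 2) combination of field values.

```python
import itertools
import random

# Hypothetical usage model: states and probabilities are invented for
# illustration and do not come from the workshop report.
USAGE_MODEL = {
    "Start":  [("Login", 1.0)],
    "Login":  [("Menu", 0.9), ("Exit", 0.1)],
    "Menu":   [("Search", 0.6), ("Report", 0.3), ("Exit", 0.1)],
    "Search": [("Menu", 0.7), ("Report", 0.3)],
    "Report": [("Menu", 0.5), ("Exit", 0.5)],
}

# Hypothetical input fields at one node of the graph.
FIELDS = {
    "os": ["linux", "windows", "solaris"],
    "protocol": ["tcp", "udp"],
    "payload": ["small", "large"],
    "auth": ["none", "token"],
}


def sample_usage_path(model, rng, start="Start", end="Exit", max_steps=1000):
    """Random walk through the usage graph; one walk is one test scenario."""
    path = [start]
    while path[-1] != end and len(path) < max_steps:
        next_states, weights = zip(*model[path[-1]])
        path.append(rng.choices(next_states, weights=weights)[0])
    return path


def pairs_of(test, names):
    """All (field, value) pairs exercised together by a single test input."""
    return {((a, test[a]), (b, test[b]))
            for a, b in itertools.combinations(names, 2)}


def pairwise_suite(fields):
    """Greedy heuristic: repeatedly add the test covering the most new pairs."""
    names = list(fields)
    # Every pair of values from two different fields must appear in some test.
    uncovered = {((a, va), (b, vb))
                 for a, b in itertools.combinations(names, 2)
                 for va in fields[a]
                 for vb in fields[b]}
    # Candidate tests are drawn from the full factorial design (small here).
    candidates = [dict(zip(names, combo))
                  for combo in itertools.product(*(fields[n] for n in names))]
    suite = []
    while uncovered:
        best = max(candidates,
                   key=lambda t: len(pairs_of(t, names) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best, names)
    return suite


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled scenario is repeatable
    print("usage scenario:", " -> ".join(sample_usage_path(USAGE_MODEL, rng)))
    suite = pairwise_suite(FIELDS)
    print(len(suite), "pairwise tests versus", 3 * 2 * 2 * 2, "exhaustive tests")
```

In this sketch the exhaustive design has 24 inputs, while the pairwise suite needs at least 6 (one per os/protocol pair) and the greedy heuristic typically stays close to that bound. Enumerating the full factorial design as the candidate pool is only practical for small examples like this one; larger problems would need a different search strategy.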

WORKSHOP LIMITATIONS

This workshop was not designed to, and could not, address all of the recent innovations that are already implemented or under development as part of software engineering practice in industry. In no respect should the workshop or this report be considered a comprehensive examination of tools for software engineering. The purpose was to examine some promising statistically oriented methods in software engineering, necessarily omitting many important techniques. The workshop presentations can therefore be used to support only relatively broad recommendations concerning next steps for DoD in its adoption of software engineering methods.

Some topics that might be covered in a follow-up workshop of this type are design for testability, linking specifications requirements to testing models, development of certification testing protocols, and the behavior of systems of systems. The latter topic would entail a much broader look into issues raised by the development of systems of systems, rather than the narrow problem of testing for interoperability, which is mainly to identify faults. Another topic that could be covered is the selection and application of appropriate testing tools and strategies to achieve specific quality assurance requirements (as opposed to functional specification requirements for the software system). This topic could be examined more generally in terms of software process simulation models, which have obtained some use in industry and can be applied to optimize testing strategies as well as other aspects of the software development life cycle (for example, see Raffo and Kellner, 1999).

In the remainder of this report, the methods presented at the workshop for possible use by DoD are described. In addition, some recommendations are included relative to the implementation of these and related methods of software engineering practice for use with software-intensive defense systems. The report is organized as follows. This introduction is followed by chapters on requirements analysis, model-based testing methods and methods to address interoperability, and performance and data analysis to support software evaluation and improvement of software development practice. The report concludes with a short summary chapter containing recommendations for next steps.