1
Motivation for and
Structure of the Workshop
Recent rough estimates are that the U.S. Department of Defense
(DoD) spends at least $38 billion a year on the research, development,
testing, and evaluation of new defense systems, and that
approximately 40 percent of that cost, at least $16 billion, is spent on
software development and testing (Ferguson, 2001; Aerospace Daily, 2003).
Given the costs involved, even relatively incremental improvements to the
software development process for defense systems could represent a large
savings in defense funds, in addition to producing higher-quality defense
systems. Therefore, a high priority needs to be accorded to the identifica-
tion of software engineering methods that can be used to provide higher-
quality software at reduced costs.
In addition to impacts on quality improvement and cost savings, soft-
ware problems are known to cause delays in the delivery of new defense
systems and to result in reduced functionality in comparison to initial speci-
fications. More importantly, field failures can lead to mission failure and
even loss of life. These are all important reasons to support broad-based
investigations into various approaches to improve the process of engineering
software for defense systems.
In opening remarks at the workshop, Delores Etter, Deputy Under
Secretary for Defense (Science and Technology), described the complicated
software systems embedded in the Comanche RAH-66 Helicopter, in the
NAVSTAR global positioning system, in the AEGIS weapon system, and
INNOVATIONS IN SOFTWARE ENGINEERING
in the Predator Unmanned Aerial Vehicle. For example, for AEGIS, there
are 1,200K lines of code for display, 385K lines of code for the test, 266K
lines of code for the weapon, 110K lines of code for training, and 337K
lines of code for the command and decision making. Complicated soft-
ware systems are ubiquitous in defense systems today.
The Workshop on Statistical Methods in Software Engineering for
Defense Systems grew out of the work of the Panel on Statistical Methods
for Testing and Evaluating Defense Systems. This Committee on National
Statistics panel, funded by the DoD Director of Operational Test and Evalu-
ation (DOT&E), examined the use of statistical methods in several sepa-
rate arenas related to the design and evaluation of operational tests, and,
more broadly, the process used for the development of defense systems
(NRC, 1998). The panel identified a number of areas of application in
which the problem-solving approach of statistical science (in contrast to
simple, omnibus techniques) could be used to help provide extremely use-
ful information to support decisions on defense system development. Two
of the arenas examined by this panel were testing software-intensive systems
and testing software architecture (see NRC, 1998, Chapter 8).
Following this panel's recommendation to continue the examination
of the applicability of statistical techniques to defense system development
at a more detailed level, DOT&E, along with the Office of Acquisition,
Technology, and Logistics of the Office of the Secretary of Defense, initi-
ated a series of workshops to explore in greater depth issues raised in the
different areas of focus of the National Research Council (NRC, 1998).
The first workshop, held June 9-10, 2000, addressed the issue of reliability
assessment (see NRC, 2002, for details). The second workshop, held July
19-20, 2001, and jointly organized with the NRC's Committee on Applied
and Theoretical Statistics, dealt with the use of statistical methods for test-
ing and evaluation of software-intensive systems and is the chief basis for
this report.¹
¹ For related work, see NRC (1996).
STRUCTURE OF THE WORKSHOP
The Workshop on Statistical Methods in Software Engineering for
Defense Systems was structured to correspond roughly with the steps to
carry out a statistical assessment of the functioning of any industrial sys-
tem, whether hardware or software. The steps of a statistical analysis are:
(1) specification of requirements, (2) selection of an experimental design to
efficiently select test cases to collect information on system performance in
satisfying requirements, and (3) analysis of the resulting experimental data
(to estimate performance, check compliance with requirements, etc.). This
structure was used to organize the workshop presentations, and we use the
same structure to organize this workshop report.
The linear structure of this report is unfortunate in that it does not
communicate well the overlapping aspects of many of the methods de-
scribed, given that research in one area is often relevant to others. This is
one of the justifications for optimism that tools will be developed in the
very near future that would further combine, say, requirements specifica-
tions, testing, cost estimation, and risk assessment as methods emanating
from a unified framework. For example, overlap can already be seen in the
development of software tools that test software functionality; known as
test oracles, these tools are needed for use with model-based testing and
could be assisted by tools developed for requirements specification. Conversely,
the graph on which model-based testing relies can be used to support
requirements specification.
We now provide some detail on the topics examined at the workshop
in these three areas of software engineering. (See also the workshop agenda
in Appendix A.)
Specification of requirements and software testability: A requisite
for data collection on system performance is the specification of require-
ments, which defines the successful functioning of a system. Requirements
need to be examined with respect to: (1) the possibility of verification, (2)
completeness, (3) consistency, (4) correctness, and (5) complexity. It is
useful to judge how complicated a system is in order to estimate how many
replications might be needed to adequately test a software system (given an
agreed-upon definition of adequate). The complexity of software is dependent
on its architecture, which is related to the specification of requirements.
To address both whether requirements are well specified and
whether the software is structured in a way to facilitate testing, the work-
shop included a session concerning specification of requirements for soft-
ware systems.
Selection of test cases (experimental design): The better one understands
software performance, the better one is able to predict the expected
behavior of systems. The better one understands the software development
process, the better one can intelligently allocate scarce resources
(including personnel, tools, hardware, and machine cycles) to address any
deficiencies. By collecting data and analyzing them, one can develop a
better understanding of both software performance and potential deficien-
cies in the software development process. To collect data on how a software
system operates, the system must be exercised on selected test scenarios,
i.e., a suite of test inputs. It is sometimes believed, particularly with auto-
mated testing, that since many software systems execute in small fractions
of a second, a software system could be executed a very large number of
times without needing to consider any notions of optimal selection of test
scenarios. However, inherent system complexity leads to an astronomical
number of test inputs for virtually all defense software systems, hence the
need to carefully select test inputs. Furthermore, many systems either do
not execute quickly or else have long set-up times for test cases. For ex-
ample, testing the interoperability of a system that is composed of various
subsystems which can be represented by different versions or releases is
typically not an automated activity and therefore necessitates the careful
selection of test inputs. Thus concepts and tools from the statistical field of
experimental design are relevant to consider.
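To make the scale of the problem concrete, the following back-of-the-envelope sketch counts the exhaustive input combinations for a single hypothetical input screen. The number of fields, the values per field, and the execution rate are all illustrative assumptions, not figures reported at the workshop.

```python
# Hypothetical sizing exercise: every number below is an illustrative assumption.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_count(values_per_field):
    """Number of distinct inputs when every field value is combined with every other."""
    total = 1
    for n_values in values_per_field:
        total *= n_values
    return total


def years_to_run(n_tests, tests_per_second):
    """Wall-clock years needed to execute n_tests at a fixed execution rate."""
    return n_tests / tests_per_second / SECONDS_PER_YEAR


# Assume one input screen with 30 fields, each taking 4 possible values.
fields = [4] * 30
n_tests = exhaustive_test_count(fields)
print(f"exhaustive inputs: {n_tests:.3e}")

# Assume an optimistic rate of 1,000 test executions per second.
print(f"years required:    {years_to_run(n_tests, 1000):.3e}")
```

Under these assumptions the exhaustive suite contains 4^30 = 2^60 inputs (about 1.15 x 10^18), which would take tens of millions of years to run even at 1,000 executions per second. Fast execution alone therefore does not remove the need to choose test inputs deliberately.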
Two statistically oriented approaches for the selection of inputs for
software testing have been successfully applied in a variety of industrial
settings. They are both examples of "model-based" testing methods, which
rely (possibly implicitly) on a graphical representation of the software in
action. The nodes of the graph represent user-relevant states of the soft-
ware, and various user actions are represented as transitions from one node
to another. One of the two approaches to input selection presented at the
workshop relies on a Markov chain model of software use in transitioning
through the graphical representation, described below. The second ap-
proach, referred to as combinatorial design, identifies a surprisingly small
collection of software test inputs that includes all k-wise combinations of
input fields (typically at a given node of the graphical representation), where
k is typically chosen to be small, often 2 or 3. Both approaches have been
used in industrial applications to help provide high-quality software with
substantial cost savings and schedule reductions.
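The combinatorial design idea for k = 2 can be illustrated with a minimal sketch. The code below uses a simple greedy heuristic, which is an assumption made here for illustration and not necessarily the construction used by the tools presented at the workshop; the input fields and their values are hypothetical.

```python
from itertools import combinations, product


def uncovered_pairs(fields):
    """All (field_i, value_i, field_j, value_j) pairs that a 2-wise design must cover."""
    pairs = set()
    for (i, vals_i), (j, vals_j) in combinations(enumerate(fields), 2):
        for a, b in product(vals_i, vals_j):
            pairs.add((i, a, j, b))
    return pairs


def pairs_in(test):
    """Pairs covered by one complete test input (a tuple with one value per field)."""
    return {(i, test[i], j, test[j]) for i, j in combinations(range(len(test)), 2)}


def greedy_pairwise(fields):
    """Greedy heuristic: repeatedly pick the candidate covering the most uncovered pairs.

    Enumerates the full Cartesian product, so it is only suitable for small examples.
    """
    remaining = uncovered_pairs(fields)
    candidates = list(product(*fields))
    suite = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_in(t) & remaining))
        suite.append(best)
        remaining -= pairs_in(best)
    return suite


# Hypothetical input fields, for illustration only.
fields = [
    ["radar", "sonar", "infrared"],
    ["manual", "auto"],
    ["day", "night"],
    ["low", "medium", "high"],
]
suite = greedy_pairwise(fields)
print(f"{len(suite)} pairwise tests versus {len(list(product(*fields)))} exhaustive tests")
for test in suite:
    print(test)
```

Every pair of values from two different fields appears together in at least one selected test, yet the suite is much smaller than the 36 exhaustive combinations. The gap widens rapidly as fields are added, which is why the resulting collections are described as surprisingly small.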
Given the different advantages of these two methods, there is interest
in developing a hybrid method that combines and retains the advantages of
both, and so some possibilities for achieving this are discussed. In addition,
since Markov chain usage testing, while quite efficient, often requires a
substantial test sample size to provide reliable statistics on software perfor-
mance, test automation remains crucial, and so such methods were also
briefly examined at the workshop.
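As a rough illustration of Markov chain usage testing, the sketch below draws test cases as random walks through a small usage graph and shows why the sample size matters. The states, transition probabilities, seed, and pass rate are hypothetical assumptions chosen for the example; they do not come from any system discussed at the workshop.

```python
import math
import random

# Hypothetical usage model: each state maps to (next state, probability) pairs.
USAGE_MODEL = {
    "Start":      [("MainMenu", 1.0)],
    "MainMenu":   [("Search", 0.6), ("Configure", 0.3), ("Exit", 0.1)],
    "Search":     [("ViewResult", 0.7), ("MainMenu", 0.3)],
    "ViewResult": [("Search", 0.4), ("MainMenu", 0.5), ("Exit", 0.1)],
    "Configure":  [("MainMenu", 1.0)],
}


def generate_test_case(model, rng, start="Start", end="Exit"):
    """Random walk through the usage model; the visited states form one test case."""
    path = [start]
    while path[-1] != end:
        next_states, weights = zip(*model[path[-1]])
        path.append(rng.choices(next_states, weights=weights, k=1)[0])
    return path


def standard_error(pass_rate, n_tests):
    """Standard error of an observed pass proportion from n_tests independent tests."""
    return math.sqrt(pass_rate * (1 - pass_rate) / n_tests)


rng = random.Random(2001)  # fixed seed so the sample is reproducible
suite = [generate_test_case(USAGE_MODEL, rng) for _ in range(5)]
for case in suite:
    print(" -> ".join(case))

# Precision of the estimated pass rate improves only with the square root of n.
for n in (100, 10_000):
    print(n, standard_error(0.9, n))
```

Because frequently used paths are sampled more often, the observed pass rate estimates reliability under expected use. At an assumed 90 percent pass rate, the standard error falls from 0.03 with 100 tests to 0.003 with 10,000, which suggests why automation is needed to reach usefully precise statistics.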
A common problem that is a high priority for the defense acquisition
community is that of software systems that are composed of subsystems
that have versions that are collectively incompatible. (This is referred to as
the interoperability problem.) Many software-intensive defense systems
utilize commercial-off-the-shelf (COTS) systems as subroutines or as lower-
level component systems. Different COTS systems have separate release
schedules and, combined with the fact that any custom software systems
included as components will also be subject to modifications, there is the
possibility that some of these configurations and modifications may be in-
compatible. A session at the workshop provided a description of many of
the tools used in industry to enhance interoperability, and showed that
combinatorial design methods can be used to measure the extent of
interoperability in a system of systems.
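One way to read "measuring the extent of interoperability" is as the fraction of cross-component version pairs that have been exercised together in at least one tested configuration. The sketch below computes that fraction. This interpretation, the component names, and the version labels are assumptions made for illustration; the measure presented at the workshop may differ.

```python
from itertools import combinations, product

# Hypothetical components and their release versions (illustrative assumptions).
COMPONENTS = {
    "database": ["v1", "v2"],
    "map_display": ["v3", "v4", "v5"],
    "message_bus": ["v1", "v2"],
}


def all_version_pairs(components):
    """Every pair of versions drawn from two different components."""
    pairs = set()
    for (c1, versions1), (c2, versions2) in combinations(sorted(components.items()), 2):
        for v1, v2 in product(versions1, versions2):
            pairs.add(((c1, v1), (c2, v2)))
    return pairs


def pairs_exercised(configuration):
    """Version pairs exercised by one tested configuration (component -> version)."""
    return set(combinations(sorted(configuration.items()), 2))


def pairwise_coverage(components, tested_configurations):
    """Fraction of cross-component version pairs seen together in at least one test."""
    required = all_version_pairs(components)
    seen = set()
    for configuration in tested_configurations:
        seen |= pairs_exercised(configuration)
    return len(seen & required) / len(required)


tested = [
    {"database": "v1", "map_display": "v3", "message_bus": "v1"},
    {"database": "v2", "map_display": "v4", "message_bus": "v2"},
]
print(f"pairwise coverage: {pairwise_coverage(COMPONENTS, tested):.3f}")
```

A coverage value below 1 identifies version pairings that have never been run together and are therefore candidates for undetected incompatibilities.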
Analysis of test data: Data that are collected (possibly using the above
experimental design procedures) consist of test inputs, which can be con-
sidered from a statistical perspective as "independent variables," and the
associated results of the software system exercised on those inputs, which
can be thought of as "dependent variables." These input-output pairs can
be used both to help determine whether the software functioned success-
fully for specifically chosen inputs and to measure the extent of, and iden-
tify the sources of, defects in a software system. Furthermore, based on
these data and the assessment of whether the software functioned properly,
a variety of additional analyses can be carried out concerning the perfor-
mance of the system and the software development process. Accordingly, a
session at the workshop was organized to consider various uses of data col-
lected on software performance and development. This session included
presentations on measurement of software risk, software aging, defect analy-
sis and classification, and estimation of the parameters of models predict-
ing the costs of software development.
Two examples of performance measurement not covered at the work-
shop are reliability growth modeling and decision making on when to re-
lease a product. Dozens of models on reliability growth have been sug-
gested in the literature with varying interpretations of the impact on system
reliability associated with code changes. Some assumptions imply better
performance after changes, while others imply worse performance. Some
assumptions imply linear growth, while others imply exponential growth.
The test data are used to fit the parameters of these models and then projections
are made. This literature includes contributions by Musa, Goel,
Littlewood, and Verrall, among others. (For specific references to their work,
see Lyu, 1996.) With respect to decisions on release dates for new software
products, IBM uses test data not only to decide on the optimal release date
but also to estimate field support budgets for the life of the product or until
the next version is released (for details, see Whittaker and Agrawal, 1994).
The literature on when to release a product is considerable; two recent
contributions are Dalal and Mallows (1988) and Dalal and McIntosh
(1994).
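To illustrate what "fitting the parameters and projecting" can look like, the sketch below fits one well-known exponential-type reliability growth model, the Goel-Okumoto model with mean value function m(t) = a(1 - e^(-bt)), by maximum likelihood. The choice of this model is an assumption made for illustration, and the failure times are invented; neither is taken from the workshop.

```python
import math


def fit_goel_okumoto(failure_times, test_end):
    """Maximum-likelihood fit of m(t) = a * (1 - exp(-b * t)) to failure times.

    a is the expected total number of failures and b the per-fault detection rate.
    failure_times are positive cumulative times observed up to test_end.
    """
    n = len(failure_times)
    total = sum(failure_times)
    if total >= n * test_end / 2:
        raise ValueError("data show no reliability growth; no finite estimate exists")

    def score(b):
        # Derivative of the log-likelihood in b after substituting
        # a = n / (1 - exp(-b * test_end)).
        decay = math.exp(-b * test_end)
        return n / b - total - n * test_end * decay / -math.expm1(-b * test_end)

    lo, hi = 1e-9 / test_end, 1.0 / test_end
    while score(hi) > 0:      # expand the upper end until the root is bracketed
        hi *= 2
    for _ in range(200):      # bisection on the score function
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2
    a = n / -math.expm1(-b * test_end)
    return a, b


# Hypothetical cumulative failure times (hours) from a 250-hour test campaign.
times = [5, 12, 20, 31, 45, 62, 85, 115, 150, 200]
a, b = fit_goel_okumoto(times, test_end=250)
print(f"estimated total failures a = {a:.2f}, detection rate b = {b:.5f}")
print(f"projected failures remaining = {a - len(times):.2f}")
```

The difference between the fitted total a and the number of failures already observed is the projection of failures still to be found. Other models in this literature make different assumptions and can yield different projections from the same data.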
The workshop focused on methods that were already in use, for which
mature tools existed, and that were generally believed to be readily adapt-
able to defense systems. Many of the techniques described in this report
have already enjoyed successful application in DoD or DoD-like applica-
tions. For example, a Raytheon application of testing based on Markov
chain usage models resulted in an 80 percent reduction in the cost of auto-
mated testing along with a substantial reduction in the percentage of project
resources required for testing (17-30 percent versus 32-47 percent). An-
other Raytheon application of combinatorial design for a satellite control
center experienced a 68 percent reduction in test duration and a 67 percent
savings in test labor costs. And IBM Storage Systems Division reported on
the first industrial use of model-based statistical testing for mass storage
controllers. To help show the maturity of the methods, the workshop con-
cluded with demonstrations of many of the software tools that had been
described in the presentations.
WORKSHOP LIMITATIONS
This workshop was not designed to, and could not, address all of the
recent innovations that are already implemented or under development as
part of software engineering practice in industry. In no respect should the
workshop or this report be considered a comprehensive examination of
tools for software engineering. The purpose was to examine some promis-
ing statistically oriented methods in software engineering, necessarily omit-
ting many important techniques. The workshop presentations can there-
fore be used to support only relatively broad recommendations concerning
next steps for DoD in its adoption of software engineering methods.
Some topics that might be covered in a follow-up workshop of this
type are design for testability, linking specification requirements to testing
models, development of certification testing protocols, and the behavior of
systems of systems. The last topic would entail a much broader look into
issues raised by the development of systems of systems, rather than the
narrow problem of testing for interoperability, which is mainly to identify
faults. Another topic that could be covered is the selection and application
of appropriate testing tools and strategies to achieve specific quality assur-
ance requirements (as opposed to functional specification requirements for
the software system). This topic could be examined more generally in
terms of software process simulation models, which have obtained some
use in industry and can be applied to optimize testing strategies as well as
other aspects of the software development life cycle (for example, see Raffo
and Kellner, 1999).
In the remainder of this report, the methods presented at the work-
shop for possible use by DoD are described. In addition, some recommen-
dations are included relative to the implementation of these and related
methods of software engineering practice for use with software-intensive
defense systems. The report is organized as follows. This introduction is
followed by chapters on requirements analysis, model-based testing meth-
ods and methods to address interoperability, and performance and data
analysis to support software evaluation and improvement of software devel-
opment practice. The report concludes with a short summary chapter con-
taining recommendations for next steps.