5
Testing Methods
Robust testing is an important part of effective system development.
It can lead to early detection and correction of design deficiencies, and it
facilitates high quality and reliability throughout system development.
Testing at the U.S. Department of Defense (DOD) has been the subject of
several previous National Research Council (NRC) reports. This section
summarizes the conclusions and recommendations from those reports
that are relevant to this panel’s charge and offers additional analysis and
suggestions.
TESTING AS A CONTINUOUS PROCESS FOR LEARNING
Operational testing and evaluation (OT&E) is intended to support
a decision to pass or fail a defense system before it goes into full-scale
production, but this practice has not been consistently followed by DOD.
The National Research Council (1998) proposed a new paradigm,
recommending that testing be viewed as a continuous process of information
gathering and decision making in which OT&E plays an integral role.
The new paradigm stressed the importance of adding operational
realism to developmental testing. A key motivation for this focus, which
is relevant to this report, is to discover design flaws much earlier in
system development than currently occurs, when such defects are much
less expensive to fix. It is well known that operational testing unearths
many design problems missed in earlier developmental testing due to
the better representation of operational realism. Adding operational
realism to developmental testing is very likely to help discover these flaws
earlier in the development process. Another benefit of adding operational
realism to developmental testing is that it provides a closer connection
between developmental and operational testing, thereby facilitating the
combination of information from the two forms of testing.
We also note that operational testing as currently done is typically too
short to be able to discover many reliability deficiencies, such as fatigue.
The time for developmental testing is also typically too short to find some
of these flaws. These weaknesses in the current testing approach motivate
the discussion below on accelerated testing, which, when properly
implemented, can effectively expedite the discovery of design flaws.
A later report (National Research Council, 2006:15) noted that
continuous testing is especially appropriate for systems that are acquired in
stages, as one “learns about strengths and weaknesses of newly added
capabilities or (sub)systems, and uses the results to improve overall
system performance.” This report also recommended that DOD documents
and processes be revised “to explicitly recognize and accommodate [this]
framework” (p. 3) so that the testing community is engaged in a joint
effort to learn about and improve a system’s performance. Although such
formal changes have not been made, practices within DOD appear to be
moving in this direction, one that is consistent with commercial industry
practices.
There are a number of challenges in implementing the above paradigm.
Test data from various sources need to be readily available, including field
data from similar systems, data from previous stages of development,
contractor data, developmental data, and data from modeling and simulation.
Information from these sources can then be combined and exploited for
effective test planning, design, analysis, and decision making. There are,
however, major obstacles to meeting the challenges and accomplishing this
approach in DOD: lack of data archives (see discussion below); use of
multiple databases (with their own formats and incompatibilities); lack of access
to data; and perhaps most importantly, lack of an incentive structure that
emphasizes early detection of faults and sharing of information. As noted in
the NRC report (2006:19): “incentives need to be put in place to support the
process of learning and discovery of design inadequacies and failure modes
early.” In addition, the NRC recommended that DOD require that
contractors share all relevant data on system performance and results of modeling
and simulation developed under government contracts. Similarly, Adolph
et al. (2008:219) noted: “Sharing and access to all appropriate system-level
and selected component-level test and model data by government DT
[developmental testing] and OT [operational testing] organizations” should
be required in defense contracts. Despite these recommendations, there has
been a lack of progress in this key area.
COMBINING INFORMATION
The importance of collecting and using all available data for effective
decision making has been emphasized in several NRC reports.¹ Furthermore,
it was the major focus of a subsequent report (National Research
Council, 2004). Chapter 2 in that report deals with combining information
to improve test planning and test design as well as analysis, and
Chapter 3 discusses methods and examples related to reliability and
suitability assessment. There is also an extensive statistical literature on this
topic; in particular, an earlier NRC report (1992) is still a very useful
reference.
Our contribution in this section is to provide some concrete ideas on
how to parametrize the test space in order to improve test design and to
combine results from different testing environments.²
A defense system is typically designed with some specific missions in
mind. These missions can be characterized (at least partially) by variables
that describe the environment of use (temperature, precipitation, wind
speed, day/night, terrain, speed during use, weight of cargo, etc.). Other
relevant factors include presence of countermeasures and enemy systems
and the amount of training that the test personnel will have (which can
vary widely from the so-called golden crews to the amount of training
users will receive when a system is fielded). These factors may be ordered
categorical variables or continuous variables. All possible combinations
of these factors characterize the intended operational environment and
hence the test space. These characterizations will often be incomplete
in some respects since there may be some nominal (unordered) factors
or some nuisance or noise factors that cannot be fully captured. The
more effort that is put into identifying and characterizing this space,
the more efficient the testing program will be.
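As a minimal sketch of the parametrization described above, the test space can be written down as the set of all combinations of factor levels. The factor names and levels below are hypothetical placeholders chosen for illustration only; a real program would take them from the system's mission profiles.

```python
from itertools import product

# Hypothetical factors and levels, loosely based on the variables listed above.
FACTORS = {
    "temperature_c": [-20, 10, 40],            # ordered levels of a continuous variable
    "terrain": ["road", "cross-country"],      # nominal (unordered) factor
    "time_of_day": ["day", "night"],
    "crew_training": ["fielded-level", "golden crew"],
}


def enumerate_test_space(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


test_space = enumerate_test_space(FACTORS)
print(len(test_space))   # 3 * 2 * 2 * 2 = 24 candidate test conditions
print(test_space[0])     # the first combination of levels
```

Even this small example shows how quickly the space grows with the number of factors, which is one reason a designed subset of combinations is usually tested rather than the full grid.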
Both operational and developmental tests can be viewed as points in
this space. Operational testing will use typical scenarios in the field and
so may fall in the middle region in the test space (at least for some of the
factors). Often, a systematic approach, such as statistical design of
experiments, is used to select the combinations of factor settings. Developmental
testing is more ad hoc and will not examine the space systematically.
Furthermore, it is likely to be based on more extreme scenarios, or what
is often referred to as testing at the edge of the envelope.
¹ See Recommendation 7.8 in National Research Council (1998:120) and Recommendation 2
in National Research Council (2003:53).
² The National Research Council (2006:18) report discussed such a test space: “We think
that for test purposes, ‘edge of the envelope’ can be defined fairly rigorously. The space of
conceivable military scenarios for operational testing includes a number of uncontrollable
dimensions (e.g., environmental characteristics, potential missions, threat objectives and
characteristics, etc.), and these dimensions can be usefully parameterized to identify the
edge of the envelope.” Bonder (1999) discusses parametric operational situation (POS) space
formulation: “Each point in this space represents an operational situation that U.S. forces
might have to be deployed to and operate in. Some of these situations are more stressful
than others.”
Most of the operational test studies that we have seen are simple
analyses that do not model the behavior of the factors over the test
space. There is clearly some value in such analyses that do not make any
assumptions and treat all the factors as nominal. But it would be very
useful to also conduct additional analyses in which the effects of the
factors are modeled parametrically (fitting parametric functions). Such
analyses will allow a framework in which data from developmental
tests (which may be isolated points in the test space) can be combined
with data from operational tests to improve the information. Of course,
part of the exploratory analysis will include checking for consistency
among the developmental testing, operational testing, and other types
of data, both empirically using extrapolations and using knowledge
of the similarities and differences in the testing environments—and
even for components and subsystems when available. If developmental
testing includes scenarios at the edge of the envelope, the data can be
interpolated to check for consistency with operational test data before
they are combined. This framework also allows for the use of sequential
testing during developmental testing with the aim of collecting more
information in areas of the test space in which there are higher levels
of uncertainty.
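The consistency check described above can be sketched for the simplest possible case: one continuous factor and a straight-line model. This is an illustrative assumption, not a method prescribed by the report. All data values, the response variable, and the tolerance are made up for the example. A model is fitted to developmental test (DT) points at the edge of the envelope, interpolated to the operational test (OT) conditions, and the two sources are pooled only if they agree.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up illustrative data: (temperature in deg C, detection range in km).
dt_points = [(-30, 9.0), (-25, 8.8), (45, 6.1), (50, 5.9)]  # edge of the envelope
ot_points = [(5, 7.7), (10, 7.4), (15, 7.3)]                # typical field scenarios

a, b = fit_line([x for x, _ in dt_points], [y for _, y in dt_points])

# Interpolate the DT-based model at the OT conditions and compare with OT results.
residuals = [y - (a + b * x) for x, y in ot_points]
max_gap = max(abs(r) for r in residuals)
TOLERANCE_KM = 0.5  # hypothetical threshold; in practice set by subject-matter judgment

consistent = max_gap <= TOLERANCE_KM
print(f"slope={b:.4f}, max |residual|={max_gap:.2f}, consistent={consistent}")

if consistent:
    # Only then pool the two sources and refit over the whole test space.
    pooled = dt_points + ot_points
    a, b = fit_line([x for x, _ in pooled], [y for _, y in pooled])
```

A real analysis would involve several factors, a model chosen with subject-matter input, and a formal assessment of uncertainty rather than a fixed tolerance. The sketch only shows the order of operations: fit, check, then combine.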
The panel recognizes that there are inherent dangers in combining
data across heterogeneous sources without carefully considering the
differences in the data sources and the reasons for the differences. Moreover,
the ideas described here may not be applicable in all situations. For
example, developmental test data may often be available only on components
or subsystems. Nevertheless, it is important to examine different ideas on
how to effectively combine data and effectively use test resources.
ACCELERATED TESTING
As the term suggests, accelerated testing involves conducting tests at
conditions that are more extreme than normal operating conditions. Testing at
the edge of the envelope, discussed above, can be viewed as one example.
The discussion in this section deals mainly with reliability testing for
suitability assessment.
The main goal in accelerated testing is to induce failures or degrade
performance rapidly. Highly accelerated tests are commonly used by
reliability engineers to identify failure modes. We focus here on the use of
moderate acceleration regimens to estimate product or system reliability.
(An important caveat in these situations is that the acceleration should
not induce failure modes that would not occur during normal
operation.) Accelerated tests have been used extensively in industry. They are
needed to estimate the reliability of highly reliable components or
systems since few failures will occur during the (short) test phase of product
development.
There are two common types of acceleration schemes: (1) increasing
usage rate and reducing idle time and (2) using higher stress levels, such
as temperature, voltage, humidity, and pressure. In the latter case, the
appropriate stress factor(s) will depend on the component and failure
mode of interest—corrosion, fatigue, mechanical wear, etc. There is an
extensive discussion of stress factors corresponding to different types of
failure mechanisms in the engineering literature.³
There is also considerable literature on the planning, design, and
analysis of accelerated testing for life tests, where the outcome is lifetime
data. The approach has also been used with degradation data (continuous
measures of performance) although this literature is not as extensive (see
Meeker and Escobar, 1998:Chs. 13, 21). Accelerated testing relies
critically on the use of models to extrapolate the test results to normal use
conditions. The literature emphasizes the need for using subject-matter
knowledge and caution in extrapolating and suggests the use of extensive
sensitivity analyses to assess the effects of using different models.
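To make the role of the extrapolation model concrete, the sketch below uses the Arrhenius relationship, a standard model for temperature acceleration. It is chosen here as an illustrative assumption and is not singled out in the text above. The activation energies and temperatures are hypothetical. The loop is a very small version of the sensitivity analysis the literature recommends: it shows how strongly the extrapolation depends on an assumed model parameter.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev, use_temp_c, stress_temp_c):
    """Ratio of the failure-mechanism rate at the stress temperature to the rate
    at the use temperature, under the Arrhenius model."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


def equivalent_use_hours(test_hours, acceleration_factor):
    """Hours at use conditions represented by test_hours at stress conditions."""
    return test_hours * acceleration_factor


# Sensitivity analysis over hypothetical activation energies (eV).
for ea in (0.5, 0.7, 0.9):
    af = arrhenius_acceleration_factor(ea, use_temp_c=25.0, stress_temp_c=85.0)
    hours = equivalent_use_hours(1000, af)
    print(f"Ea={ea:.1f} eV  AF={af:8.1f}  1000 test h ~ {hours:,.0f} use h")
```

Because the acceleration factor is exponential in the activation energy, modest changes in that assumed value change the extrapolated life by large factors, which is why subject-matter knowledge about the failure mechanism matters so much.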
Accelerated testing is well known in the reliability community, and
the panel expects that it is used extensively by defense contractors.
However, given the inherent assumptions involved in these studies, it would
be desirable for testers from DOD to either participate in their planning
and analyses or have access to the test schemes in advance. Accelerated
testing can and should play a prominent role in suitability assessment by
DOD.
SOFTWARE SYSTEMS
Software systems are a major part of defense acquisition programs,
either as exclusive systems or as critical parts of hardware systems.
Software problems are also ubiquitous in poorly performing defense
systems.⁴ Although the use of processes such as agile development may lead
to higher software quality, testing will remain crucially important. There
is a substantial literature on software testing, and so we do not provide
an overview here. In particular, the NRC (2003:Ch. 3) has described
techniques for software testing and related issues, including model-based
testing, Markov chain usage models, and the use of combinatorial
experimental designs.
³ For an example, see Reliability, Life Testing and the Prediction of Service Lives: For Engineers
and Scientists (Saunders, 2007).
⁴ For example, see the report of the Defense Science Board, Task Force on Defense Software
(2000).
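The Markov chain usage models mentioned above can be illustrated with a short sketch. The states, transition probabilities, and seed below are hypothetical and describe no real system. The idea is that expected operational use is written as a Markov chain and test cases are generated as random walks from a start state to an exit state, so that frequently used paths are tested more often.

```python
import random

# Hypothetical usage model: state -> list of (next_state, probability).
USAGE_MODEL = {
    "Start":      [("AcquireFix", 1.0)],
    "AcquireFix": [("Navigate", 0.8), ("LostSignal", 0.2)],
    "Navigate":   [("Navigate", 0.5), ("LostSignal", 0.2), ("Exit", 0.3)],
    "LostSignal": [("AcquireFix", 0.9), ("Exit", 0.1)],
}


def generate_test_case(model, rng, start="Start", end="Exit", max_steps=1000):
    """Random walk through the usage model; returns the sequence of visited states."""
    path = [start]
    while path[-1] != end and len(path) < max_steps:
        next_states, weights = zip(*model[path[-1]])
        path.append(rng.choices(next_states, weights=weights, k=1)[0])
    return path


rng = random.Random(2024)  # fixed seed so the generated test suite is reproducible
test_cases = [generate_test_case(USAGE_MODEL, rng) for _ in range(5)]
for case in test_cases:
    print(" -> ".join(case))
```

Each generated path would then be turned into a concrete test script. In practice the transition probabilities would come from field data or expert judgment about how the system is actually used.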
There are some unique challenges with embedded systems, in which
the software is embedded in hardware and has limited functionality (e.g.,
a GPS receiver) or is intended to react to a wide range of stimuli, such as
the avionics for a jet fighter. These and other factors will determine if the
software should be considered as simply a component of the full system
during either developmental or operational testing or if the software
needs to be tested separately from the remainder of the system. There
is only a limited literature on the testing of embedded systems (but see
Bringmann and Kramer, 2008, for some possibilities).