3
Testing Methods and Related Issues
Testing of software in development has two primary purposes. First,
testing methods are used to assess the quality of a software system.
Second, while practitioners stress that it is not a good idea to try to
"test quality into" a deficient software system since some defects are likely
unavoidable, testing is very important for discovering those that have oc-
curred during software development. So for both verifying the quality of
software and identifying defects in code, testing is a vital component of
software engineering. But software testing is expensive, typically requiring
from one-fifth to one-third of the total development budget (see
Humphrey, 1989), and can be time-consuming. As a result, methods that
would improve the quality and productivity of testing are very important
to identify and implement. Because testing is an area in which statistically
oriented methods are having an important impact on industrial software
engineering, the workshop included a session on those methods.
INTRODUCTION TO MODEL-BASED TESTING
Two of the leading statistically oriented methods for the selection of
inputs for software testing are forms of model-based testing, and therefore
it is useful to provide a quick overview of this general approach.
Prior to the development of model-based testing, the general ap-
proaches used for software testing included manual testing, scripted auto-
mation, use of random inputs, finite state models, and production gram-
mar models. Each of these approaches has serious deficiencies. For ex-
ample, manual testing and scripted automation are extremely time-con-
suming since they require frequent updating (if they are not updated, they
are likely to catch only defects that have already been identified). Random
input models are very wasteful in that most of the inputs fail to "make
sense" to the software and are immediately rejected. Finite state models,
which operate at the same level of detail as the code, need to be extremely
detailed in order to support effective testing.
The idea behind model-based testing is to represent the functioning of
a software system using a model composed of user scenarios. Models can
be implemented by a variety of structured descriptions, for example, as
various tours through a graph. From an analytical and visual perspective it
is often beneficial to represent the model as a structured tree or a graph.
The graph created for this purpose uses nodes to represent observable, user-
relevant states (e.g., the performance of a computation or the opening of a
file). The arcs between a set of nodes represent the result of user-supplied
actions or inputs (e.g., user-supplied answers to yes/no questions or other
key strokes, or mouse clicks in various regions of the desktop) that corre-
spond to the functioning of the software in proceeding from one state of
use to another. (The graph produced for this purpose is at a much higher
level and not code based, as opposed to that for a finite state model.)1
It should be pointed out before proceeding that even if every path were
tested and no defects found, this would not guarantee that the software
system was defect-free since there is a many-to-one relationship of inputs to
paths through the graph. Furthermore, even if there are no logical errors in
the software, a software system can fail due to a number of environmental
factors such as problems with an operating system and erroneous inputs.
Thus, to be comprehensive, testing needs to incorporate scenarios that in-
volve these environmental factors.
To identify defects, one could consider testing every path through the
graph. For all but the simplest graphical models, however, a test of every
path would be prohibitively time-consuming. One feasible alternative is to
efficiently choose test inputs with associated graphical paths that collec-
tively contain every arc between nodes in the graphical model. There are
algorithms that carry out this test plan; random walks through the graph
1One side benefit of developing a graphical model of the functioning of a software
system is that it helps to identify ambiguities in the specifications.
INNOVATIONS IN SOFTWARE ENGINEERING
are used, and more complicated alternatives are also available (see Gross
and Yellen, 1998, for details). It is important to point out that the graph
produced for this purpose is assumed to be correct, since one cannot iden-
tify missing functionality, such as missing nodes or edges, using these tech-
niques. The graphical representation of the software can be accomplished
at varying levels of detail to focus attention on components of the system
that are in need of more (or less) intensive testing. (For more information
on model-based testing, see Rosaria and Robinson, 2000.)
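The arc-coverage idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not one of the algorithms cited in the text: the four-state graph, its action names, and the helper functions are all hypothetical, and each walk simply chooses uniformly among the outgoing arcs. Walks are kept in the test suite only if they traverse an arc that no earlier walk has covered.

```python
import random

# Hypothetical usage graph (illustrative only): state -> {action: next state}.
MODEL = {
    "Start": {"open_file": "FileOpen", "quit": "Exit"},
    "FileOpen": {"edit": "Editing", "close": "Start"},
    "Editing": {"save": "FileOpen", "discard": "Start"},
    "Exit": {},  # terminal state: no outgoing arcs
}


def random_walk(model, start, rng):
    """Walk from start to a terminal state; return the arcs traversed."""
    path, state = [], start
    while model[state]:
        action = rng.choice(sorted(model[state]))
        nxt = model[state][action]
        path.append((state, action, nxt))
        state = nxt
    return path


def cover_all_arcs(model, start="Start", seed=0, max_walks=10_000):
    """Collect random walks until every arc of the model has been traversed."""
    rng = random.Random(seed)
    all_arcs = {(s, a, t) for s, acts in model.items() for a, t in acts.items()}
    covered, suite = set(), []
    for _ in range(max_walks):
        if covered == all_arcs:
            break
        path = random_walk(model, start, rng)
        new_arcs = set(path) - covered
        if new_arcs:  # keep only walks that add coverage
            suite.append(path)
            covered |= new_arcs
    return suite, covered == all_arcs


suite, complete = cover_all_arcs(MODEL)
print(len(suite), "walks kept; all arcs covered:", complete)
```

Note that this sketch inherits the limitation stated in the text: it can only cover arcs that appear in the model, so functionality missing from the graph is never exercised.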
Test oracles,2 in this context, are separate programs that take small
sections of the sequence of user-supplied inputs and provide output that
represents the proper functioning of a software system as defined by the
system's requirements. (In other contexts, other test oracle architectures
and configurations are possible, e.g., architectures where individual oracles
cooperate to constitute an oracle for the whole. See Oshana, 1999, for
details.) When they exist, test oracles are placed at each node of the graph
to permit comparison of the functioning of the current system with what
the output of the correct system would be. Through use of an oracle,
discrepancies between the current state of the system and the correct sys-
tem at various nodes of the graph can be identified.3
Graphical models are initially developed at a very early stage in system
development, during the identification of system requirements. A number
of tools (Rational, Visio) and language constructs (Unified Modeling Lan-
guage) are available for creating graphical models, which, once created, can
be modified based on issues raised during system development. One can
also develop a procedure for the generation of test cases based on the model
and then update the procedure as the system and the graphical model ma-
ture. Many users have found that the graphical model is a useful form of
documentation of the system and provides a useful summary of its features.
An illustrative example of a graphical model for a long-distance tele-
phone billing system is provided in Figure 3-1, which has two parts: (a) a
model of the general process and (b) a detailed submodel of the primary
component. This illustrative example suggests the hierarchical nature of
these models in practice.
2Software cost reduction (SCR) can be used to produce test oracles.
3Since model-based testing is not a direct examination of the software code, it is consid-
ered a type of "black box" testing.
[Figure 3-1 graphic not reproduced. Panel A is titled "Model: Long Distance
Platform" and panel B "Model: Play Billing Prompt." Legible node labels include
Session begins, Receive inbound call, Determine billing, Play billing prompt,
Float call, Credit card, Collect, Enter card number, Billing data verified,
No input required, Release call, and Session end.]
FIGURE 3-1 Example of a model used in model-based testing of long-distance tele-
phone flows: (a) high-level model and (b) detailed submodel of billing option prompts.
SOURCE: Adapted from Apfelbaum and Doyle (1997).
Major benefits of model-based testing are early and efficient defect
detection, automatic generation (very often) of test suites, and flexibility
and coverage of test suites. Another primary benefit of model-based testing
is that when changes are made to the system, typically only minor changes
are needed to the model and thus test scenarios relative to the new system
can be generated quickly. In addition, since the graphical model can be
modified along with the software system, model-based testing works
smoothly in concert with spiral software development. On the downside,
testers need to develop additional skills, and development of the model
represents a substantial up-front investment.
Testing, which requires substantial time and resources, is very costly,
both for the service test agencies and, much more importantly, for the soft-
ware developers themselves, and so it is essential that it be done efficiently.
DoD software systems in particular are generally required to be highly de-
pendable and always available, which is another reason that testing must be
highly effective in this area of application. The testing procedure often
used for defense systems is manual testing with custom-designed test cases.
DoD also contracts for large, complex, custom-built systems and demands
high reliability of their software under severe cost pressures. Cost pressures,
short release cycles, and manual test generation have the potential for nega-
tively affecting system reliability in the field.
A recent dynamic in defense acquisition is the greater use of evolution-
ary procurement or spiral development of software programs. Particularly
with these types of procurement, it is extremely efficient for testing to be
integrated into the development process so that one has a working
(sub)system throughout the various stages of system development and so
that one can adjust the model to specifically test those components added
at each stage.
Model-based software testing has the potential for assuring clients that
software will function properly in the field and can be used for verification
prior to release. The workshop presented two methods among many for
model-based testing that have been shown to be effective in a wide variety
of industrial applications. The first example is Markov chain usage model-
ing, and the second is automatic efficient test generation (AETG), often
referred to as combinatorial design.4
MARKOV CHAIN USAGE MODELS
Markov chain usage modeling was described at the workshop by Jesse
Poore of the University of Tennessee (for more details, see Whittaker and
Poore, 1993, and Whittaker and Thomason, 1994). Markov chain usage
models begin from the graphical model of a software program described
4We note that SCR can also be extended to automatic test set generation, albeit in a
very different form than discussed here. A lot of techniques used in protocol testing also use
model-based finite state machines (Lee and Yanakakis, 1992).
above. On top of this graphical model, a Markov chain probabilistic struc-
ture is associated with various user-supplied actions shown as arcs in the
graphical model that result in transitions from one node to the nodes
that are linked to it. Given the Markovian assumption, the probabilities
attached to the various transitions from node to node are assumed to be (1)
independent of the path taken to arrive at the given node and (2) unchang-
ing in time. These (conditional) probabilities indicate which transitions
from a given node are more or less likely based on the actions of a given
type of user. Importantly, these transition probabilities can be used in a
simulation to select subsequent arcs, proceeding from one node to another,
resulting in a path through the graphical model.
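The simulation step described above can be illustrated with a short Python sketch. The usage model below, including its state names, action names, and transition probabilities, is entirely hypothetical; it only shows how transition probabilities attached to arcs can drive the selection of a path from the initial state to a terminal state.

```python
import random

# Hypothetical usage model (illustrative only):
# state -> list of (action, next state, transition probability).
USAGE = {
    "Invoke": [("login", "Menu", 1.0)],
    "Menu": [("query", "Result", 0.7), ("logout", "Terminate", 0.3)],
    "Result": [("back", "Menu", 0.9), ("logout", "Terminate", 0.1)],
    "Terminate": [],  # terminal state
}


def sample_test(usage, start="Invoke", rng=random):
    """Sample one test case (a sequence of actions) from the usage model.

    The next arc depends only on the current state, which is the Markov
    assumption described in the text.
    """
    state, actions = start, []
    while usage[state]:
        arcs = usage[state]
        weights = [p for _, _, p in arcs]
        action, nxt, _ = rng.choices(arcs, weights=weights)[0]
        actions.append(action)
        state = nxt
    return actions


rng = random.Random(42)  # fixed seed so the sampled suite is reproducible
for _ in range(3):
    print(sample_test(USAGE, rng=rng))
```

Because the draw at each state uses only that state's outgoing probabilities, frequently used arcs appear in sampled tests more often than rarely used ones.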
Testing Process
The basics of the Markov chain usage model testing process are as
follows. There is a population of possible paths from the initial state to the
termination state(s) of a program. The Markov chain usage model ran-
domly samples paths from this population using the transition probabili-
ties, which are typically obtained in one of three ways: (1) elicited from
experts, (2) based on field data recorded by instrumented systems, or (3)
resolved from a system of constraints. By selecting the test inputs in any of
these three ways, the paths that are more frequently utilized by a user are
chosen for testing with higher probability. Defects associated with the more
frequent-use scenarios are thus more likely to be discovered and eliminated.
An important benefit of this testing is that, based on the well-understood
properties of Markov chains, various long-run characteristics of system per-
formance can be estimated, such as the reliability remaining after testing is
concluded. Additional metrics based on Markov chain theory are also pro-
duced. (See Poore and Trammell, 1999, for additional details on Markov
chain model-based usage testing.) A representation of a Markov chain
model is shown in Figure 3-2.
While the assumptions of conditional independence and time homo-
geneity are capable of validation, it is not crucial that these assumptions
obtain precisely for this methodology to be useful. It is very possible that
the conditional independence assumption may not hold; for instance, it
may be that the graphical model is at such a fine level of detail that move-
ment from prior nodes may provide some information about movement to
subsequent nodes. Also, it is possible that the time homogeneity assump-
tion may not hold; for example knowledge of the number of nodes visited
prior to the current node may increase the probability of subsequent termi-
nation. However, these two assumptions can often be made approximately
true by adjusting the graphical model in some way. Furthermore, even if
the time homogeneity and conditional independence assumptions are vio-
lated to a modest extent, it is quite likely that the resulting set of test inputs
will still be useful to support acquisition decisions (e.g., whether to release
the software).
[Figure 3-2 graphic not reproduced: a state-transition diagram whose arcs are
each labeled with an input (X, Y, or Z) and a transition probability, for
example X (0.25) and Y (0.75).]
FIGURE 3-2 Example of Markov chain usage model.
SOURCE: Workshop presentation by Stacy Prowell.
The elements of the transition probability matrix can be validated
based on a comparison of the steady-state properties of the assumed Markov
chain and the steady-state properties associated with anticipated use. Test
plans can also be selected to satisfy various user-specified probabilistic goals,
e.g., testing until each node has been "visited" with a minimum probability.
The Markov chain model can also be used to estimate the expected cost
and time to test. If the demand on financial resources or time is too high,
the software system under test or its requirements may be modified or re-
structured so that testing is less expensive. Finally, the inputs can be strati-
fied to give a high probability of selecting those inputs and paths associated
with functionality where there is high risk to the user.
With defense systems, there are often catastrophic failure modes that
have to be addressed. If the product developers know the potential causes
of catastrophic failures and their relationship to either states or arcs of the
system, then those states or arcs can be chosen to be tested with certainty.
Indeed, as a model of use, one could construct the "death state," model the
ways to reach it, and then easily compute statistics related to that state and
the paths to it. (Markov chain usage model testing is often preceded by arc
coverage testing.)
Automated test execution, assuming that the software has been well
constructed, is often straightforward to apply. To carry out testing, the arcs
corresponding to the test paths, as represented in the graphical model, must
be set up to receive input test data. In addition, the tester must be able
both to control inputs into the system under test and to observe outputs
from the system. Analysis of the test results then requires the following:
(1) the tester must be able to decide whether the system passed or failed on
each test input, (2) since it is not always possible to observe all system
outputs, unobserved outputs must be accounted for in some way, and (3)
the test oracle needs to be able to determine whether the system passed or
failed in each case. These requisites are by no means straightforward to
obtain and often require a substantial amount of time to develop.
Stacy Prowell (University of Tennessee) described some of the tools
that have been developed to implement Markov chain usage models. The
Model Language (TML) supports definitions of models and related infor-
mation, and allows the user to develop hierarchical modeling, with subrou-
tines linked together as components of larger software systems; such link-
ing is useful as it supports reuse of component software. In addition,
JUMBL (Java Usage Model Building Library) contains a number of tools
for analysis of Markov chain usage models, e.g., tools that compute statis-
tics that describe the functioning of a Markov chain, such as average test
length. This analysis is particularly useful for validating the model. JUMBL
also contains a number of tools that generate test cases in support of various
test objectives. Examples include the generation of tests constrained to
exercise every arc in the model, tests generated by rank order probability of
occurrence until a target total probability mass is reached, and custom-
designed tests that meet contractual requirements.
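As an illustration of the kind of statistic mentioned above, the following Python sketch computes the average test length (the expected number of arcs traversed before termination) for a small usage model. It is not JUMBL code and does not reflect JUMBL's interface; the model, its names, and its probabilities are hypothetical. The expected length L(s) from a state s satisfies L(s) = 1 + sum over arcs of p * L(next state), with L = 0 at a terminal state, and the sketch solves these equations by repeated substitution.

```python
# Hypothetical usage model (illustrative only):
# state -> list of (action, next state, transition probability).
USAGE = {
    "Invoke": [("login", "Menu", 1.0)],
    "Menu": [("query", "Result", 0.7), ("logout", "Terminate", 0.3)],
    "Result": [("back", "Menu", 0.9), ("logout", "Terminate", 0.1)],
    "Terminate": [],  # terminal state
}


def expected_test_length(usage, tol=1e-12, max_iter=100_000):
    """Expected number of arcs from each state to termination.

    Iterates L(s) = 1 + sum(p * L(t)) until the largest change in one sweep
    falls below tol. Assumes termination is reachable from every state.
    """
    length = {state: 0.0 for state in usage}
    for _ in range(max_iter):
        delta = 0.0
        for state, arcs in usage.items():
            if not arcs:
                continue  # terminal states keep length 0
            new = 1.0 + sum(p * length[nxt] for _, nxt, p in arcs)
            delta = max(delta, abs(new - length[state]))
            length[state] = new
        if delta < tol:
            break
    return length


lengths = expected_test_length(USAGE)
print(round(lengths["Invoke"], 4))  # 5.5946
```

For this toy model the equations can be checked by hand: L(Menu) = 1.7 / 0.37, so L(Invoke) = 1 + 1.7 / 0.37, or about 5.59 actions per test.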
An interesting alternative use of Markov chain modeling by Avritzer
and Weyuker (1995) bases the transition probabilities on data collection.
Another feature in Avritzer and Weyuker is the deterministic, rather than
probabilistic, selection of the test suite, i.e., choosing those inputs of high-
est probability, thus ensuring that the most frequently traversed paths are
tested.
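Deterministic selection of the most probable paths can be sketched with a best-first search. The code below is a generic illustration and should not be read as the procedure of Avritzer and Weyuker or of any tool named in this chapter; the usage model and all names in it are hypothetical. Because extending a path can never increase its probability, complete paths leave the priority queue in order of decreasing probability, and the search stops once the selected tests account for a target share of the total probability mass.

```python
import heapq
from itertools import count

# Hypothetical usage model (illustrative only):
# state -> list of (action, next state, transition probability).
USAGE = {
    "Invoke": [("login", "Menu", 1.0)],
    "Menu": [("query", "Result", 0.7), ("logout", "Terminate", 0.3)],
    "Result": [("back", "Menu", 0.9), ("logout", "Terminate", 0.1)],
    "Terminate": [],  # terminal state
}


def most_probable_tests(usage, start, target_mass, max_tests=1000):
    """Return (tests, mass): complete paths in decreasing probability order.

    target_mass should be below 1.0; max_tests guards against models in
    which the target cannot be reached with a reasonable number of paths.
    """
    tie = count()  # tie-breaker so heap entries never compare path tuples
    heap = [(-1.0, next(tie), start, ())]
    tests, mass = [], 0.0
    while heap and mass < target_mass and len(tests) < max_tests:
        neg_p, _, state, actions = heapq.heappop(heap)
        if not usage[state]:  # reached a terminal state: a complete test
            tests.append((-neg_p, list(actions)))
            mass += -neg_p
            continue
        for action, nxt, p in usage[state]:
            if p > 0:
                heapq.heappush(
                    heap, (neg_p * p, next(tie), nxt, actions + (action,))
                )
    return tests, mass


tests, mass = most_probable_tests(USAGE, "Invoke", target_mass=0.5)
for prob, actions in tests:
    print(round(prob, 5), actions)
```

The same loop also illustrates test generation by rank order of probability until a target probability mass is reached; only the stopping rule would change for other objectives.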
Industrial Example of the Benefits of Markov Chain Usage Models
Markov chain usage models have been shown to provide important
benefits in a variety of industrial applications. Users that have experienced
success include IBM, Microsoft, U.S. Army Tank-automotive and Arma-
ments Command, computerized thermal imaging/positron emission to-
mography systems, the Federal Aviation Administration Tech Center,
Alcatel, Ericsson, Nortel, and Raytheon. Ron Manning reported at the
workshop on Raytheon's successful application of Markov chain usage test-
ing.
Raytheon's software development group, which has broad experience
in developing large software systems, until recently used structured analy-
sis/structured design methodology. For testing, Raytheon used unit test-
ing, program testing, and subsystem testing. The unit testing emphasized
code coverage, and the program and subsystem testing used formal test
procedures based on requirements. Using this conventional approach to
testing, each software system eventually worked, but many defects were
found during software and system integration (even after 100 percent code
coverage at the unit level). The effectiveness of unit testing was marginal,
with many defects discovered at higher levels of software integration. Re-
gression testing was attempted using the unit testing but was too expensive
to complete. It was typical for approximately 6 defects per 1,000 lines of
code to escape detection.
The lack of success with conventional testing was a concern for
Raytheon, and the cost and time demands seemed excessive to support a
successful conventional testing system. Furthermore, a great deal of effort
was required for integration at each system level because of the undiscov-
ered defects. To address these problems, cleanroom software methods, in-
cluding Markov chain usage-based testing, were examined for possible use.
The results of applying Markov chain usage-based testing to eight major
software development projects were a greater than tenfold reduction in de-
fects per 1,000 lines of code, software development costs within budget,
and expedited system integration. For Raytheon's systems, it was found
that automated test oracles could be developed to facilitate automated test-
ing. The additional tools required were minimal, but careful staff training
was found to be vital for the success of this switch in testing regimen.
When this phase of the testing was completed, there was still a role for
some conventional testing, but such testing was significantly expedited by
the reduction in the defects in the software.
In its implementation of Markov chain usage-based testing, Raytheon's
approaches included three profiles: (1) normal usage, (2) a usage model
focused on error-prone modules, and (3) a usage model that explored hard-
ware errors. It also seemed evident that the graphical models must be kept
relatively simple, since the potential for growth of the state space could thus
be kept under control. In this specific area of application, the typical graphi-
cal model had 50 or fewer states. Finally, while development of an auto-
mated test oracle did represent a major expense, it could be justified by the
reductions in time and costs of development that resulted from the imple-
mentation of this methodology.
In addition to Markov chain usage-based testing, Raytheon initially
augmented its testing with purposive sets of test inputs to cover all nodes
and arcs of the graphical model for each system under test. This method
was also used to debug the test oracle. It was discovered that this purposive
testing was less necessary than might have been guessed, since relatively
small numbers of Markov chain-chosen paths achieved relatively high path
coverage. In the future, Raytheon will examine implementing usage-based
testing methods at higher levels of integration. JUMBL has tool support
for composing graphical models at higher levels of integration from
submodels, which will correspond to the integration of systems from com-
ponents. A poll of Raytheon software developers unanimously endorsed
use of Markov chain-based usage testing for their next software develop-
ment project.
AETG TESTING
AETG (see, e.g., Cohen et al., 1994, 1996; Dalal et al., 1998) is a
combinatorial design-based approach to the identification of inputs for soft-
ware testing. Consider an individual node of the graphical representation
of a software system described above. A node could be a graphical user
interface in which the user is asked to supply several inputs to support some
action of the software system. Upon completion, very typically based on
the inputs, the software will then move to another node of the model of the
software system. For this example, the user may be asked to supply cat-
egorical information for, say, seven fields. If all combinations are feasible,
the possible number of separate collections of inputs for the seven fields
would be a product of the seven integers representing the number of pos-
sible values for each of the fields. For even relatively small numbers of
fields and values per field, this type of calculation can result in a large
number of possible inputs that one might wish to test. For example, for
seven dichotomous input fields, there are potentially 128 (2^7) test cases.
With 13 input fields, with three choices per field, there are 1.6 million
possible test cases. For most real applications, the number of fields can be
much larger, with variable numbers of inputs per field.
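The counts quoted above follow directly from multiplying the number of possible values per field. A short Python check is shown below; the function name is a hypothetical helper introduced only for this illustration.

```python
from math import prod


def exhaustive_count(values_per_field):
    """Number of input combinations when every combination is feasible."""
    return prod(values_per_field)


print(exhaustive_count([2] * 7))   # 128 test cases for seven two-valued fields
print(exhaustive_count([3] * 13))  # 1594323, i.e., about 1.6 million
```

Fields with differing numbers of values are handled the same way, for example `exhaustive_count([2, 3, 4])` for three fields with two, three, and four values.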
Cohen et al. (1994) provide an example of the provisioning of a tele-
communications system where a particular set of inputs consisted of 74
fields, each with many possible values, which resulted in many billions of
possible test cases. Furthermore, there are often restricted choices due to
constraints for inputs based on other input values, which further compli-
cates the selection of test cases. For example, for a credit card-based trans-
action one needs to supply the appropriate credit card category (e.g.,
Mastercard, Visa, etc.) and a valid card number, while for a cash transaction
those inputs have to be null. This complicates the selection of test sce-
narios. This issue is particularly critical since almost two-thirds of code is
typically related to constraints in stopping invalid inputs. For this reason,
test cases, besides testing valid inputs, also need to test invalid inputs.
As argued at the workshop by Ashish Jain of Telcordia Technologies,
rather than test all combinations of valid and invalid inputs, which would
often be prohibitively time-consuming, AETG instead identifies a small set
of test inputs that has the following property: for each given combination
of valid values for any k of the input fields, there is at least one input in the
test set that includes this combination of values. In practice, k is often as
small as 2 or 3. For example, in the pairwise case, for input fields I and J,
there will exist in the set of test inputs at least one input that has value i for
field I and value j for field J, for every possible combination of i and j and
every possible combination of two fields I and J. For invalid values, a more
complicated strategy is utilized.
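The pairwise property described above can be made concrete with a small Python sketch. The greedy procedure below is not the AETG algorithm, whose details are not given here; it is a simple stand-in that enumerates all combinations, which is feasible only for small examples, and repeatedly picks the test covering the most still-uncovered pairs. All function names are hypothetical, and constraints among fields and invalid values are ignored.

```python
from itertools import combinations, product


def pairs_of(test):
    """Set of ((field, value), (field, value)) pairs exercised by one test."""
    return {
        ((i, test[i]), (j, test[j]))
        for i, j in combinations(range(len(test)), 2)
    }


def greedy_pairwise(fields):
    """Build a test set covering every pair of values for every two fields.

    fields is a list of value lists, one per input field.
    """
    candidates = list(product(*fields))
    uncovered = set().union(*(pairs_of(t) for t in candidates))
    suite = []
    while uncovered:
        # Pick the candidate that covers the most pairs not yet covered.
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


def covers_all_pairs(suite, fields):
    """Check the pairwise property directly against all combinations."""
    needed = set().union(*(pairs_of(t) for t in product(*fields)))
    covered = set().union(*(pairs_of(t) for t in suite))
    return needed <= covered


FIELDS = [[0, 1]] * 7  # seven dichotomous fields: 128 exhaustive tests
suite = greedy_pairwise(FIELDS)
print(len(suite), "tests instead of 128; pairwise property holds:",
      covers_all_pairs(suite, FIELDS))
```

The checker `covers_all_pairs` is independent of how the suite was built, so it can also be used to verify a test set produced by any other method.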
A key empirical finding underlying this methodology is that in many
applications it is the case that the large majority of software errors are ex-
pressed through the simultaneous use of only two or three input values.
For example, a detailed root cause analysis of field trouble reports for a
large Telcordia Technologies operation support system demonstrated that
most field defects were caused by pairwise interactions of input fields. Nine
system-tested input screens of a Telcordia inventory system were retested
using AETG-generated test cases, and 49 new defects were discovered
through use of an all-pairwise input fields test plan. Given this empirical
evidence, users typically set k equal to 2 or 3.
To demonstrate the gains that are theoretically possible, for the situa-
tion of 126 dichotomous fields (a total of approximately 10^38 paths), AETG
identified a set of only 10 test cases that included all pairwise sets of inputs.
More generally, with k fields, each with l possible values, AETG finds a set
of inputs that has approximately l^2 log(k) members. The set of inputs
identified by the AETG method has been shown, in practice, to have good
code coverage properties.
Industrial Examples of the Benefits of AETG Testing
Two applications of AETG were described at the workshop. The first,
presented by Manish Rathi of Telcordia Technologies, was its application to
the testing of an airplane executing an aileron roll (which could be input
into either an operational test or the test of an airplane simulator). The
objectives of the test were to: (1) assess the pilot's ability to respond to
various disturbances produced by the aileron roll, (2) detect unwanted side-
slip excursions and the pilot's ability to compensate, and (3) test inertial
coupling. Key inputs that affect the "rolling" characteristics of an airplane
are the air speed, the Mach number, the altitude, and the position of the
flaps and landing gear. (Various additional factors such as air temperature
also affect the rolling characteristics but they were purposely ignored in this
analysis.) For each of these four inputs, a finite number of possible values
were identified for testing purposes. (All possible combinations testing was
not feasible for two reasons: first, it would represent too large a set of test
events to carry out; second, there were additional constraints on the inputs,
prohibiting use of some combinations of input values.) Even given the
constraints, there were more than 2,000 possible legal combinations of test
values, i.e., more than 2,000 possible test flights. AETG was therefore used
to identify a small number of test events that included inputs containing all
pairwise combinations of input values for pairs of input fields (while ob-
serving the various constraints). AETG discovered 70 test flights that in-
cluded tests with all possible combinations of pairwise input values for
pairs of input fields, a reduction of more than 96 percent, which is ex-
tremely important given the cost of a single test flight.
The second application, described at the workshop by Jerry Huller of
Raytheon, was for the software used to guide a Raytheon satellite control
center combined with a telemetry, command, and ranging site, which are
both used to communicate with orbiting satellites. The system contains a
great deal of redundancy to enhance its overall reliability. A primary test
problem is that, given the redundancy, there are many combinations of
equipment units that might be used along possible signal paths from the
ground system operator to the satellite and back, and it is necessary to
demonstrate that typical satellite operations can be performed with any of
the possible paths using various combinations of equipment units. Ex-
haustive testing of all combinations of signal paths was not practical in a
commercial setting. An efficient way of generating a small number of test
cases was needed that provided good coverage of the many possible signal
paths. AETG generated test cases that covered all pairwise combinations of
test inputs, and it also handled restrictions on allowable input combina-
tions. In this application, there were 144 potential test cases of which
AETG identified 12 for testing, representing a 92 percent reduction in
testing. Taking into consideration some additional complications not men-
tioned here, the AETG strategy provided an overall 68 percent savings in
test duration and a 67 percent savings in test labor costs.
INTEGRATION OF AETG AND MARKOV
CHAIN USAGE MODELS
For a graphical model with limited choices at each node and for a
relatively small number of nodes, Markov chain usage model testing is a
methodology that provides a rich set of information to the software tester
along with an efficient method for selecting test inputs. However, the
derivation of test oracles and user profiles can be complicated, especially for
graphical models that have a large number of nodes and for nodes that have
a large number of arcs due to many input fields and values per field. One
possibility for these graphical models is to substitute, for the set of all pos-
sible transitions, just those transitions that are selected by AETG. This
would reduce the number of separate nodes and arcs needed. Therefore,
AETG would be used to reduce the number of paths through a software
system and the Markov chain usage model would provide a probabilistic
structure only for the AETG selections. Another way to combine these
methods is where the usage models would handle the transitions from one
node to another and AETG would determine the possible choices for user
inputs at each node as they are encountered in the graphical model. In
other words, AETG could operate either at the level of the entire graphical
model or at the level of individual nodes. Other approaches may also be
possible. This is an area in which further research would likely provide
substantial benefits.
TEST AUTOMATION
Clearly, without test automation, any testing methodology, including
model-based testing methods, will be of limited use. All of the steady-state
validation benefits and estimates of the costs and number of replications
needed for Markov chain usage model testing are predicated on the ability
to carry out a relatively large number of tests. Reliability projections that
can be computed based on usage models reveal that the number of tests
needed to demonstrate reliability of 99 percent or more is almost always
beyond the budget for most complicated systems. Therefore, since test
automation is a virtual necessity for modern software testing, the topic was
examined at the workshop.
Mike Houghtaling of IBM presented a number of automation tools
that the company has used in managing its tape systems and libraries. The
company uses Markov chain usage models comprising roughly 2,000 states,
which is difficult to represent on paper. A software tool called TeamWork
is therefore used to provide a graphical capability for constructing the col-
lection of state transition diagrams. There is also a tool for the implemen-
tation of the test oracles. For selection of test cases through random sam-
pling and for composing the test results and generating the summary
measures, ToolSet_Certify is used. ToolSet_SRE is used to break up the
model into subsystems for focused sampling and testing and to reaggregate
the subsystems for full system analysis. To manipulate the device drivers
for the test execution, CBase is used. In addition, CORBA middleware (in
particular ORBLink) is used to increase automation, and QuickSilver is
used as a test visualizer to help depict results in a graphical environment.
Houghtaling noted several complicating factors with automation.
First, three concurrent oracle process strategies need to be supported: (1)
postoracle analysis, (2) concurrent oracle analysis, and (3) preoracle certifi-
cate generation. Second, several similar activities need to be separately
addressed: (1) failure identification versus defect diagnosis, since the cause
of the failure may have happened many events prior to the evident failure;
(2) usage and model abstractions versus implementation knowledge, be-
cause implementers sometimes make assumptions about uses or failures
that are inconsistent with new uses of systems (for example, some years ago
data began to flow over telephone systems); and (3) test automation com-
ponent design for interoperability versus analysis of the functioning of the
complete system. (Some test automation equipment is designed primarily
to watch the interface between two components so that errors are not made
in the interface itself, but there may not be automated support for testing
the full system.) Besides these, there is an obvious challenge in restoring
the environment and preconditions before the tests can be run.
The separation of failure identification and defect diagnosis is impor-
tant since fault diagnosis requires in-depth knowledge of the system imple-
mentation. Test teams, especially black box-oriented and system-level test
teams, might not possess this knowledge. Test automation tools should
also be composed from the same perspective. Tools that are directed to-
ward usage certifications and that are consequently failure profile-oriented
should not be encumbered with diagnostic-oriented artifacts. The ability
to leverage the information collected during failure identification, such as
automatically replaying the usage scenario in an execution environment
that is configured with diagnostic tools, is beneficial, but should not
subvert the usage testing environment.
In addition, designers and implementers of test automation tools em-
ploying high-level usage models need to be aware of the gap between the
usage model vocabulary and the more concrete implementation vocabulary
that is meaningful to the developers of the system. Attempts to document
justifications for claims about test case failures will need to bridge the gap.
Finally, with respect to test automation component design for
interoperability, a cost-effective test automation environment will need to
be based on industry standards and possess the capability of collaborating
within a framework supporting many of the aspects associated with devel-
opment and test processes (planning, domain modeling, test selections,
test executions and evaluations, system assessments, and test progress
management).
METHODS FOR TESTING INTEROPERABILITY
It is typical for software-intensive systems to comprise a number of
separate software components that are used interactively, known as a sys-
tem of systems. Some of these component systems may be commercial-off-
the-shelf (COTS) systems, while others may be developed in-house. Each
of these software components is subject to separate update schedules, and
each time a component system is modified there is an opportunity for the
overall software system to fail due to difficulties the modified component
system may have in interacting with the remaining components. Difficul-
ties in interactions between components are referred to as interoperability
failures. (The general problem involves interaction with hardware as well
as software components, but the focus here is on software interactions.) A
session at the workshop addressed this critical issue given its importance in
Department of Defense applications; the presentations examined general
tools to avoid interoperability problems in system development, and how
one might test the resulting system of systems for defects.
Amjad Umar of Telcordia Technologies discussed the general inter-
operability problem and tools for its solution, which he referred to as inte-
gration technologies. Consider as an example an e-business activity made
up of several component applications. Customer purchases from this e-
business involve anywhere from a dozen to hundreds of applications, which
need to interoperate smoothly to complete each transaction. Such
interoperability is a challenge because transactions may involve some sys-
tems that are very new as well as some that are more than 20 years old. In
addition to being of different vintages, these components may come from
different suppliers and may have either poor or nonexistent documenta-
tion. Integration is typically needed at several levels, and if the systems are
not well integrated, the result can be an increase in transaction errors and
service time. On the other hand, if the systems are well integrated, human
efforts in completing the transaction will be minimized. The overall chal-
lenge then is to determine good procedures for integrating the entire sys-
tem, to set the requirements to test against, and to test the resulting system.
A considerable amount of jargon is associated with this area. A glos-
sary is included in Appendix B that defines some of the more common
terms used in this context.
Amjad Umar's presentation provided a general framework for the wide
variety of approaches and techniques that address interoperability
problems. This is possible because the number of distinct concepts in this
area is much smaller than the jumble of acronyms might suggest. The overall
idea is to develop an "integration bus" with various adapters so that differ-
ent applications can plug into this bus to create a smoothly interactive
system.
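The bus-and-adapter idea can be illustrated with a minimal sketch. All
class names, the message format, and the fixed-width legacy record layout
below are hypothetical and are not taken from Umar's presentation; the
point is only that each application is wrapped by an adapter that
translates between a common message format and the application's native
interface.

```python
from typing import Callable, Dict


class Adapter:
    """Wraps one application: translate in, call it, translate back."""

    def __init__(self, to_native: Callable, call: Callable,
                 from_native: Callable) -> None:
        self.to_native = to_native
        self.call = call
        self.from_native = from_native

    def handle(self, message: dict) -> dict:
        return self.from_native(self.call(self.to_native(message)))


class IntegrationBus:
    """Routes common-format messages to whichever adapter is plugged in."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        self._adapters[name] = adapter

    def send(self, target: str, message: dict) -> dict:
        return self._adapters[target].handle(message)


def legacy_inventory(record: str) -> str:
    """Hypothetical legacy system: 8-char SKU + 4-digit quantity in, OK/NO out."""
    stock = {"WIDGET01": 10}
    sku, qty = record[:8].strip(), int(record[8:12])
    return "OK" if stock.get(sku, 0) >= qty else "NO"


bus = IntegrationBus()
bus.register("inventory", Adapter(
    to_native=lambda m: f"{m['sku']:<8}{m['qty']:04d}",  # dict -> fixed-width
    call=legacy_inventory,
    from_native=lambda r: {"available": r == "OK"},      # reply -> dict
))

if __name__ == "__main__":
    print(bus.send("inventory", {"sku": "WIDGET01", "qty": 3}))
```

In this arrangement a new order-processing application needs to know only
the common message format, and replacing the legacy inventory system
requires changing only its adapter.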
Integration Technologies
To start, integration is needed at the following levels: (1) cross-enter-
prise applications, (2) internal process management, (3) information trans-
formation, (4) application connectivity, and (5) network connectivity. So-
lution technologies that can be implemented at these various layers include,
from high to low: (1) business-to-business integration platforms, (2) enter-
prise application integration platforms, (3) converters, (4) middleware and
adapters, and (5) network transport.
With respect to low-level integration between two systems, interconnection
technologies might work as a mediator between the order processing,
provided by a Web browser or Java applets, and the inventory, managed
by user interfaces, application code, and a data source. Interconnection
technologies may involve a remote user connector, a remote method
connector, and a remote data connector. At midlevel integration, there are
object wrappers (e.g., CORBA) that function in combination with screen
scrapers, function gateways, and data gateways (e.g., ODBC). At high-level
integration, but within an enterprise, software tools such as adapters
provide a smooth integration between order processing and the inventory
system. Finally, at very high-level integration, again tools such as several
kinds of adapters smooth the integration between inventory systems and
order processing systems and trading hubs across firewalls for external
organizations.
General platforms that attempt to minimize integration problems are
available to oversee the process; leading examples include Sun's J2EE
platform, IBM's e-business framework, and Microsoft's .NET. The choice of
integration technology to implement depends on the number of applica-
tions, the flexibility of requirements, the accessibility of applications, and
the degree of organizational control.
Clearly, integration raises persistent challenges. It is popular because it
permits use of existing software tools. On the other hand, it is problematic
because it increases the workload on existing applications, may not be good
for long-range needs, and creates difficult testing challenges (however, see
below for a possible approach to testing systems of systems). Many tools
are being developed to address these problems, but they are specific to
certain areas of application, and whether they will be portable to the de-
fense environment is an important question to address. Also, it is impor-
tant to note that addressing interoperability problems may sometimes not
be cost-effective compared to scrapping an old system and designing all
system components from scratch.
Additional information on integration technologies can be found in
Lithicum (2001) and Umar (2002, 2003). Information can also be found
at the following Web sites: (1) www.vitria.com, (2) www.webmethods.com,
and (3) www.tibco.com.
Testing a System of Systems for Interoperability
Consider a system that is a mixture of legacy and new software compo-
nents and that addresses interoperability through use of middleware tech-
nology (e.g., MOM, CORBA, or XML). Assume further that the applica-
tion is delivered using Web services, and so is composed of services from
components across the Internet or other secured networks. As an example,
consider a survivable, mobile, ad hoc tactical network with a variety of
clients, namely infantry and manned vehicles, that need to obtain informa-
tion from a command and information center, which is composed of vari-
ous types of servers. These are often high-volume and long-distance infor-
mation transactions. The distributed services are based on a COTS system
of systems with multiple providers. These systems themselves evolve over
time, producing a substantial amount of code and functionality churn.
The corresponding system can thus get very complex.
In an example described at the workshop by Siddhartha Dalal, for a
procurement service created on a system consisting of many component
systems, there was a procurer side, a provider side, and a gateway enabling
more than 300,000 transactions per month with 10,000 constantly chang-
ing rules governing the transactions. There were additions and deletions of
roughly 30 rules a day, and there was on average one new software compo-
nent release per day. As a result of these dynamics, 4,000 errors in process-
ing occurred in 6 months, but, using traditional testing methods, only 47
percent of the defects ultimately discovered were removed during testing.
The problem does not have an easy solution. In interviews with a
number of chief information officers responsible for Web-based services,
the officers stated that service failures reported by customers had the fol-
lowing distribution of causes:
(1) lack of availability (64 percent)
(2) software bugs (55 percent)
(3) bad or invalid data (47 percent)
(4) software upgrade failure (46 percent)
(5) incorrect, unapproved, or illegal content (15 percent)
(6) other (5 percent)
(Some of these causes can occur simultaneously, so the sum of the
percentages is greater than 100 percent.)
In moving from software products to component-based services, the
traditional preproduction product-based testing fails because it is not fo-
cused on the most typical sources of error from a service perspective. When
testing a system of systems, it is critical to understand that traditional test-
ing does not work for two reasons: first, the owner of a system of systems
does not have control over all the component systems; second, the compo-
nent systems or the rules may change from one day to the next.
While product testing usually only needs to implement design for test-
ability with predictable churn, in the application of Web-based services one
needs a completely different testing procedure designed for continuous
monitorability, with a service assurance focus, to account for unpredictable
churn.
With this new situation, there are usually service-level agreements con-
centrating on availability and performance based on end-to-end transac-
tions. To enforce the service agreements, one needs constant monitoring of
the deployed system (i.e., postproduction monitoring) with test transac-
tions of various types. However, one cannot send too many transactions to
test and monitor the system as they may degrade the performance of the
system.
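The trade-off between monitoring coverage and added load can be made
concrete with a small sketch. The overhead cap, the latency threshold,
and all names below are hypothetical and serve only to illustrate the
idea of budgeting synthetic test transactions and checking the probe
results against a service-level target.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    ok: bool            # did the end-to-end test transaction succeed?
    latency_ms: float   # observed end-to-end response time


def probes_allowed(production_tx_per_min: int, max_overhead_percent: int) -> int:
    """Synthetic transactions per minute permitted under an overhead cap."""
    return production_tx_per_min * max_overhead_percent // 100


def sla_met(results: List[ProbeResult], min_availability: float,
            max_latency_ms: float) -> bool:
    """True if the share of successful, fast-enough probes meets the target."""
    if not results:
        return False  # no evidence, so the agreement cannot be confirmed
    good = sum(1 for r in results if r.ok and r.latency_ms <= max_latency_ms)
    return good / len(results) >= min_availability


if __name__ == "__main__":
    print(probes_allowed(1000, 2))
    sample = [ProbeResult(True, 120.0)] * 9 + [ProbeResult(False, 0.0)]
    print(sla_met(sample, min_availability=0.9, max_latency_ms=500.0))
```

A small probe budget of this kind is one reason the choice of test
transactions matters: each permitted transaction should be as informative
as possible.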
One approach for postproduction monitoring and testing was proposed
by Siddhartha Dalal of Telcordia Technologies (see Dalal et al., 2002).
In this approach, a number of end-to-end transactions are initially cap-
tured, and then a number of synthetic end-to-end transactions are gener-
ated. The idea is to generate a minimal number of user-level synthetic
transactions that are very sensitive for detecting functional errors and that
provide 100 percent pairwise functional coverage at the node level. This
generation of synthetic end-to-end transactions can be accomplished
through use of AETG.
To see this, consider an XML document with 50 tags. (A tag in XML
identifies and delimits a text field.) Assuming only two values per tag,
this could produce 2^50 possible test cases. The
AutoVigilance system, which was created by Dalal and his colleagues at
Telcordia, produces nine test cases from AETG that cover all pairwise
choices of tags in this case. These synthetic transactions are then sent
through probes, with the results analyzed automatically and problems
proactively reported using alerts. Finally, possible subsystems causing
problems are identified using an automated root cause analysis. This entire
process can be automated and is extremely efficient for nonstop monitoring
and testing as well as regression testing when needed. The hope, by
implementing this testing, is to find problems before end-users do.
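To illustrate how a small suite can cover every pairwise combination of
tag values, the following sketch implements a simple greedy pairwise
generator. It is not the AETG algorithm and assumes nothing about the
AutoVigilance system; the function names, the candidate count, and the
tag values are hypothetical. A basic greedy heuristic of this kind will
generally need more tests than the best known constructions, so no
particular suite size is claimed.

```python
import random
from itertools import combinations, product


def all_pairs(params):
    """Every (i, value_i, j, value_j) combination a pairwise suite must cover."""
    pairs = set()
    for i, j in combinations(range(len(params)), 2):
        for vi, vj in product(params[i], params[j]):
            pairs.add((i, vi, j, vj))
    return pairs


def pairs_in(test):
    """The value pairs exercised by a single test case."""
    return {(i, test[i], j, test[j])
            for i, j in combinations(range(len(test)), 2)}


def greedy_pairwise(params, candidates=50, seed=0):
    """Build a suite in which every pair of parameter values appears together."""
    rng = random.Random(seed)
    uncovered = all_pairs(params)
    suite = []
    while uncovered:
        pool = sorted(uncovered)
        best, best_gain = None, -1
        for _ in range(candidates):
            # Fix one uncovered pair so each candidate makes progress,
            # then fill the remaining parameters at random.
            i, vi, j, vj = rng.choice(pool)
            test = [rng.choice(values) for values in params]
            test[i], test[j] = vi, vj
            gain = len(pairs_in(test) & uncovered)
            if gain > best_gain:
                best, best_gain = tuple(test), gain
        suite.append(best)
        uncovered -= pairs_in(best)
    return suite


if __name__ == "__main__":
    tags = [("absent", "present")] * 50   # 50 tags, two values each
    suite = greedy_pairwise(tags)
    print(len(suite), "tests instead of", 2 ** 50)
```

Each synthetic transaction in the resulting suite would then be sent
through the monitoring probes in place of exhaustive combinations.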
Based on the various methods discussed during this session, the panel
concluded that model-based testing offers important potential for improv-
ing the software testing methods currently utilized by the service test agen-
cies. Two specific methods, Markov chain usage-based testing and AETG,
were described in detail, and their utility in defense or defense-type appli-
cations was exhibited. There are other related approaches, some of which
were also briefly mentioned above. The specific advantages and disadvantages
of widespread implementation of these relatively new methods for
defense software systems need to be determined. Therefore, demonstration
projects should be carried out.
Test automation is extremely important to support the broad utility of
model-based methods, and a description of automation tools for one spe-
cific application suggests that such tools can be developed for other areas of
application.
Finally, interoperability problems are becoming a serious hurdle for
defense software systems that are structured as systems of systems. These
problems have been addressed in industry, and various tools are now in use
to help overcome them. In addition, an application of AETG was shown
to be useful in discovering interoperability problems.
Representative terms from entire chapter:
chain usage