
Suggested Citation: "3 Testing Methods and Related Issues." National Research Council. 2003. Innovations in Software Engineering for Defense Systems. Washington, DC: The National Academies Press. doi: 10.17226/10809.



3 Testing Methods and Related Issues

Testing of software in development has two primary purposes. First, testing methods are used to assess the quality of a software system. Second, while practitioners stress that it is not a good idea to try to "test quality into" a deficient software system, some defects are likely unavoidable, and testing is very important for discovering those that have occurred during software development. So for both verifying the quality of software and identifying defects in code, testing is a vital component of software engineering. But software testing is expensive, typically requiring from one-fifth to one-third of the total development budget (see Humphrey, 1989), and can be time-consuming. As a result, methods that would improve the quality and productivity of testing are very important to identify and implement. Because testing is an area in which statistically oriented methods are having an important impact on industrial software engineering, the workshop included a session on those methods.

INTRODUCTION TO MODEL-BASED TESTING

Two of the leading statistically oriented methods for the selection of inputs for software testing are forms of model-based testing, and therefore it is useful to provide a quick overview of this general approach.

Prior to the development of model-based testing, the general approaches used for software testing included manual testing, scripted automation, use of random inputs, finite state models, and production grammar models. Each of these approaches has serious deficiencies. For example, manual testing and scripted automation are extremely time-consuming since they require frequent updating (if they are not updated, they are likely to catch only defects that have already been identified). Random input models are very wasteful in that most of the inputs fail to "make sense" to the software and are immediately rejected. Finite state models, which operate at the same level of detail as the code, need to be extremely detailed in order to support effective testing.

The idea behind model-based testing is to represent the functioning of a software system using a model composed of user scenarios. Models can be implemented by a variety of structured descriptions, for example, as various tours through a graph. From an analytical and visual perspective it is often beneficial to represent the model as a structured tree or a graph. The graph created for this purpose uses nodes to represent observable, user-relevant states (e.g., the performance of a computation or the opening of a file). The arcs between a set of nodes represent the result of user-supplied actions or inputs (e.g., user-supplied answers to yes/no questions or other key strokes, or mouse clicks in various regions of the desktop) that correspond to the functioning of the software in proceeding from one state of use to another. (The graph produced for this purpose is at a much higher level and not code based, as opposed to that for a finite state model.)¹ It should be pointed out before proceeding that even if every path were tested and no defects found, this would not guarantee that the software system was defect-free, since there is a many-to-one relationship of inputs to paths through the graph.
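As a minimal sketch of the graph representation described above, the following Python fragment encodes a small usage model as a dictionary in which each node is a user-visible state and each arc is a user-supplied input. The state names, input names, and the helper simple_paths are hypothetical and chosen only for illustration; they are not taken from the workshop material.

```python
# Hypothetical usage model: nodes are user-visible states, arcs are user inputs.
# Each entry maps a state to {input: next_state}.
USAGE_GRAPH = {
    "start":          {"receive_call": "choose_billing"},
    "choose_billing": {"collect": "connected", "credit_card": "enter_card"},
    "enter_card":     {"valid_number": "connected",
                       "invalid_number": "choose_billing"},
    "connected":      {"hang_up": "end"},
    "end":            {},
}


def simple_paths(graph, state="start", goal="end", seen=()):
    """Yield each loop-free input sequence that leads from state to goal."""
    if state == goal:
        yield []
        return
    seen = seen + (state,)
    for user_input, next_state in graph[state].items():
        if next_state not in seen:  # skip arcs that would revisit a state
            for rest in simple_paths(graph, next_state, goal, seen):
                yield [user_input] + rest


paths = list(simple_paths(USAGE_GRAPH))
for path in paths:
    print(" -> ".join(path))
```

Only loop-free paths are enumerated here. Because the invalid_number arc leads back to an earlier state, the number of paths is unbounded once loops are allowed, which is one reason the choice of paths to test matters.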
Furthermore, even if there are no logical errors in the software, a software system can fail due to a number of environmental factors such as problems with an operating system and erroneous inputs. Thus, to be comprehensive, testing needs to incorporate scenarios that involve these environmental factors.

To identify defects, one could consider testing every path through the graph. For all but the simplest graphical models, however, a test of every path would be prohibitively time-consuming. One feasible alternative is to efficiently choose test inputs with associated graphical paths that collectively contain every arc between nodes in the graphical model. There are algorithms that carry out this test plan; random walks through the graph are used, and more complicated alternatives are also available (see Gross and Yellen, 1998, for details). It is important to point out that the graph produced for this purpose is assumed to be correct, since one cannot identify missing functionality, such as missing nodes or edges, using these techniques. The graphical representation of the software can be accomplished at varying levels of detail to focus attention on components of the system that are in need of more (or less) intensive testing. (For more information on model-based testing, see Rosaria and Robinson, 2000.)

¹ One side benefit of developing a graphical model of the functioning of a software system is that it helps to identify ambiguities in the specifications.

Test oracles,² in this context, are separate programs that take small sections of the sequence of user-supplied inputs and provide output that represents the proper functioning of a software system as defined by the system's requirements. (In other contexts, other test oracle architectures and configurations are possible, e.g., architectures where individual oracles cooperate to constitute an oracle for the whole. See Oshana, 1999, for details.) When they exist, test oracles are placed at each node of the graph to permit comparison of the functioning of the current system with what the output of the correct system would be. Through use of an oracle, discrepancies between the current state of the system and the correct system at various nodes of the graph can be identified.³

Graphical models are initially developed at a very early stage in system development, during the identification of system requirements. A number of tools (Rational, Visio) and language constructs (Unified Modeling Language) are available for creating graphical models, which, once created, can be modified based on issues raised during system development. One can also develop a procedure for the generation of test cases based on the model and then update the procedure as the system and the graphical model mature.
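A test-generation procedure of this kind, aimed at the arc-coverage goal described earlier in this section, might look like the following sketch. It draws uniform random walks through a hypothetical usage graph and keeps a walk only if it exercises an arc not yet covered. This is a naive illustration, not one of the algorithms in Gross and Yellen (1998); the graph, the function names, and the fixed seed are all assumptions.

```python
import random

# Hypothetical usage graph: state -> {input: next_state}. "end" is terminal.
GRAPH = {
    "start":   {"open": "editing"},
    "editing": {"type": "editing", "save": "saved", "quit": "end"},
    "saved":   {"edit": "editing", "quit": "end"},
    "end":     {},
}


def random_walk(graph, rng, start="start", goal="end"):
    """Return one start-to-goal path as a list of (state, input) arcs."""
    state, arcs = start, []
    while state != goal:
        user_input = rng.choice(sorted(graph[state]))  # uniform choice of arc
        arcs.append((state, user_input))
        state = graph[state][user_input]
    return arcs


def cover_all_arcs(graph, seed=0):
    """Collect random walks until every arc appears in at least one walk."""
    rng = random.Random(seed)  # fixed seed keeps the suite reproducible
    uncovered = {(s, i) for s, outs in graph.items() for i in outs}
    suite = []
    while uncovered:
        walk = random_walk(graph, rng)
        if uncovered & set(walk):  # keep only walks that add coverage
            suite.append(walk)
            uncovered -= set(walk)
    return suite


suite = cover_all_arcs(GRAPH)
print(len(suite), "walks cover all arcs")
```

If the model changes, only GRAPH needs to be edited and the suite can be regenerated, which is the sense in which such a procedure can be updated as the model matures.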
Many users have found that the graphical model is a useful form of documentation of the system and provides a useful summary of its features.

An illustrative example of a graphical model for a long-distance telephone billing system is provided in Figure 3-1, which has two parts: (a) a model of the general process and (b) a detailed submodel of the primary component. This illustrative example suggests the hierarchical nature of these models in practice.

² Software cost reduction (SCR) can be used to produce test oracles.

³ Since model-based testing is not a direct examination of the software code, it is considered a type of "black box" testing.

[Figure 3-1. Panel A: "Model: Long Distance Platform." Panel B: "Model: Play Billing Prompt."]
FIGURE 3-1 Example of a model used in model-based testing of long-distance telephone flows: (a) high-level model and (b) detailed submodel of billing option prompts. SOURCE: Adapted from Apfelbaum and Doyle (1997).

Major benefits of model-based testing are early and efficient defect detection, automatic generation (very often) of test suites, and flexibility and coverage of test suites. Another primary benefit of model-based testing is that when changes are made to the system, typically only minor changes are needed to the model and thus test scenarios relative to the new system can be generated quickly. In addition, since the graphical model can be modified along with the software system, model-based testing works smoothly in concert with spiral software development. On the downside, testers need to develop additional skills, and development of the model represents a substantial up-front investment.

Testing, which requires substantial time and resources, is very costly, both for the service test agencies and, much more importantly, for the software developers themselves, and so it is essential that it be done efficiently. DoD software systems in particular are generally required to be highly dependable and always available, which is another reason that testing must be highly effective in this area of application. The testing procedure often used for defense systems is manual testing with custom-designed test cases. DoD also contracts for large, complex, custom-built systems and demands high reliability of their software under severe cost pressures. Cost pressures, short release cycles, and manual test generation have the potential for negatively affecting system reliability in the field.

A recent dynamic in defense acquisition is the greater use of evolutionary procurement or spiral development of software programs. Particularly with these types of procurement, it is extremely efficient for testing to be integrated into the development process so that one has a working (sub)system throughout the various stages of system development and so that one can adjust the model to specifically test those components added at each stage.

Model-based software testing has the potential for assuring clients that software will function properly in the field and can be used for verification prior to release. The workshop presented two methods among many for model-based testing that have been shown to be effective in a wide variety of industrial applications.
The first example is Markov chain usage modeling, and the second is automatic efficient test generation (AETG), often referred to as combinatorial design.⁴

⁴ We note that SCR can also be extended to automatic test set generation, albeit in a very different form than discussed here. A lot of techniques used in protocol testing also use model-based finite state machines (Lee and Yanakakis, 1992).

MARKOV CHAIN USAGE MODELS

Markov chain usage modeling was described at the workshop by Jesse Poore of the University of Tennessee (for more details, see Whittaker and Poore, 1993, and Whittaker and Thomason, 1994). Markov chain usage models begin from the graphical model of a software program described above. On top of this graphical model, a Markov chain probabilistic structure is associated with various user-supplied actions shown as arcs in the graphical model that result in transitions from one node to the nodes that are linked to it. Given the Markovian assumption, the probabilities attached to the various transitions from node to node are assumed to be (1) independent of the path taken to arrive at the given node and (2) unchanging in time. These (conditional) probabilities indicate which transitions from a given node are more or less likely based on the actions of a given type of user. Importantly, these transition probabilities can be used in a simulation to select subsequent arcs, proceeding from one node to another, resulting in a path through the graphical model.

Testing Process

The basics of the Markov chain usage model testing process are as follows. There is a population of possible paths from the initial state to the termination state(s) of a program. The Markov chain usage model randomly samples paths from this population using the transition probabilities, which are typically obtained in one of three ways: (1) elicited from experts, (2) based on field data recorded by instrumented systems, or (3) resolved from a system of constraints. By selecting the test inputs in any of these three ways, the paths that are more frequently utilized by a user are chosen for testing with higher probability. Defects associated with the more frequent-use scenarios are thus more likely to be discovered and eliminated. An important benefit of this testing is that, based on the well-understood properties of Markov chains, various long-run characteristics of system performance can be estimated, such as the reliability remaining after testing is concluded. Additional metrics based on Markov chain theory are also produced. (See Poore and Trammell, 1999, for additional details on Markov chain model-based usage testing.)
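The sampling step described above can be illustrated with a short simulation. The sketch below assumes a small, hypothetical usage model whose arcs carry made-up transition probabilities; the function sample_test_case and the seed value are likewise assumptions, not part of any tool named in this chapter. Each test case is one random walk from the initial state to the termination state, and the next arc depends only on the current state, as the Markovian assumption requires.

```python
import random

# Hypothetical usage model: state -> list of (input, next_state, probability).
# The probabilities on the arcs leaving each state sum to 1.
MODEL = {
    "start":   [("login", "menu", 1.0)],
    "menu":    [("query", "results", 0.7),
                ("update", "form", 0.2),
                ("logout", "end", 0.1)],
    "results": [("back", "menu", 0.6), ("logout", "end", 0.4)],
    "form":    [("submit", "menu", 0.9), ("cancel", "menu", 0.1)],
}


def sample_test_case(model, rng, start="start", end="end"):
    """Walk the chain from start to end, picking each arc by its probability."""
    state, inputs = start, []
    while state != end:
        arcs = model[state]
        label, next_state, _ = rng.choices(
            arcs, weights=[p for _, _, p in arcs])[0]
        inputs.append(label)
        state = next_state
    return inputs


rng = random.Random(2003)  # fixed seed so the sampled suite is reproducible
suite = [sample_test_case(MODEL, rng) for _ in range(1000)]

# Frequently used functions should appear in more of the sampled test cases.
share_with_query = sum("query" in t for t in suite) / len(suite)
share_with_update = sum("update" in t for t in suite) / len(suite)
print(share_with_query, share_with_update)
```

In this toy model the query function is far more likely than the update function, so it turns up in a larger share of the sampled test cases, which is the property that directs testing effort toward frequent-use scenarios.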
A representation of a Markov chain model is shown in Figure 3-2.

While the assumptions of conditional independence and time homogeneity are capable of validation, it is not crucial that these assumptions obtain precisely for this methodology to be useful. It is very possible that the conditional independence assumption may not hold; for instance, it may be that the graphical model is at such a fine level of detail that movement from prior nodes may provide some information about movement to subsequent nodes. Also, it is possible that the time homogeneity assumption may not hold; for example, knowledge of the number of nodes visited prior to the current node may increase the probability of subsequent termination. However, these two assumptions can often be made approximately true by adjusting the graphical model in some way. Furthermore, even if the time homogeneity and conditional independence assumptions are violated to a modest extent, it is quite likely that the resulting set of test inputs will still be useful to support acquisition decisions (e.g., whether to release the software).

FIGURE 3-2 Example of Markov chain usage model. SOURCE: Workshop presentation by Stacy Prowell.

The elements of the transition probability matrix can be validated based on a comparison of the steady-state properties of the assumed Markov chain and the steady-state properties associated with anticipated use. Test plans can also be selected to satisfy various user-specified probabilistic goals, e.g., testing until each node has been "visited" with a minimum probability. The Markov chain model can also be used to estimate the expected cost and time to test. If the demand on financial resources or time is too high, the software system under test or its requirements may be modified or restructured so that testing is less expensive. Finally, the inputs can be stratified to give a high probability of selecting those inputs and paths associated with functionality where there is high risk to the user.

With defense systems, there are often catastrophic failure modes that have to be addressed. If the product developers know the potential causes of catastrophic failures and their relationship to either states or arcs of the system, then those states or arcs can be chosen to be tested with certainty. Indeed, as a model of use, one could construct the "death state," model the ways to reach it, and then easily compute statistics related to that state and the paths to it. (Markov chain usage model testing is often preceded by arc coverage testing.)

Automated test execution, assuming that the software has been well constructed, is often straightforward to apply. To carry out testing, the arcs corresponding to the test paths, as represented in the graphical model, must be set up to receive input test data. In addition, the tester must be able both to control inputs into the system under test and to observe outputs from the system. Analysis of the test results then requires the following: (1) the tester must be able to decide whether the system passed or failed on each test input, (2) since it is not always possible to observe all system outputs, unobserved outputs must be accounted for in some way, and (3) the test oracle needs to be able to determine whether the system passed or failed in each case. These requisites are by no means straightforward to obtain and often require a substantial amount of time to develop.

Stacy Prowell (University of Tennessee) described some of the tools that have been developed to implement Markov chain usage models. The Model Language (TML) supports definitions of models and related information, and allows the user to develop hierarchical modeling, with subroutines linked together as components of larger software systems; such linking is useful as it supports reuse of component software. In addition, JUMBL (Java Usage Model Building Library) contains a number of tools for analysis of Markov chain usage models, e.g., tools that compute statistics that describe the functioning of a Markov chain, such as average test length.
This analysis is particularly useful for validating the model. JUMBL also contains a number of tools that generate test cases in support of various test objectives. Examples include the generation of tests constrained to exercise every arc in the model, tests generated by rank order probability of occurrence until a target total probability mass is reached, and custom-designed tests that meet contractual requirements.

An interesting alternative use of Markov chain modeling by Avritzer and Weyuker (1995) bases the transition probabilities on data collection. Another feature in Avritzer and Weyuker is the deterministic, rather than probabilistic, selection of the test suite, i.e., choosing those inputs of highest probability, thus ensuring that the most frequently traversed paths are tested.
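The idea of generating tests in rank order of probability until a target probability mass is reached can be sketched with a best-first search over a usage model. The code below is an illustration only: it is not the procedure used by JUMBL or by Avritzer and Weyuker, and the model, its probabilities, and the function name most_probable_paths are hypothetical. Because every arc probability is at most 1, the probability of a partial path bounds the probability of any of its completions, so complete paths leave the priority queue in non-increasing order of probability.

```python
import heapq
from itertools import count

# Hypothetical usage model: state -> list of (input, next_state, probability).
MODEL = {
    "start":   [("login", "menu", 1.0)],
    "menu":    [("query", "results", 0.7), ("logout", "end", 0.3)],
    "results": [("back", "menu", 0.5), ("logout", "end", 0.5)],
}


def most_probable_paths(model, target_mass, start="start", end="end"):
    """Return (probability, inputs) pairs, most probable first, until the
    probabilities of the returned paths sum to at least target_mass."""
    tie = count()  # tie-breaker so the heap never has to compare paths
    heap = [(-1.0, next(tie), start, ())]
    chosen, mass = [], 0.0
    while heap and mass < target_mass:
        neg_p, _, state, inputs = heapq.heappop(heap)
        if state == end:
            chosen.append((-neg_p, list(inputs)))
            mass += -neg_p
            continue
        for label, next_state, p in model[state]:
            # Extending a path multiplies its probability by the arc's.
            heapq.heappush(
                heap, (neg_p * p, next(tie), next_state, inputs + (label,)))
    return chosen


chosen = most_probable_paths(MODEL, target_mass=0.75)
for probability, inputs in chosen:
    print(round(probability, 4), inputs)
```

With the probabilities assumed here, the three most probable complete paths together account for a little over three-quarters of the usage probability, so the search stops after selecting them.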

Industrial Example of the Benefits of Markov Chain Usage Models

Markov chain usage models have been shown to provide important benefits in a variety of industrial applications. Users that have experienced success include IBM, Microsoft, U.S. Army Tank-Automotive and Armaments Command, computerized thermal imaging/positron emission tomography systems, the Federal Aviation Administration Tech Center, Alcatel, Ericsson, Nortel, and Raytheon. Ron Manning reported at the workshop on Raytheon's successful application of Markov chain usage testing.

Raytheon's software development group, which has broad experience in developing large software systems, until recently used structured analysis/structured design methodology. For testing, Raytheon used unit testing, program testing, and subsystem testing. The unit testing emphasized code coverage, and the program and subsystem testing used formal test procedures based on requirements. Using this conventional approach to testing, each software system eventually worked, but many defects were found during software and system integration (even after 100 percent code coverage at the unit level). The effectiveness of unit testing was marginal, with many defects discovered at higher levels of software integration. Regression testing was attempted using the unit testing but was too expensive to complete. It was typical for approximately 6 defects per 1,000 lines of code to escape detection.

The lack of success with conventional testing was a concern for Raytheon, and the cost and time demands seemed excessive to support a successful conventional testing system. Furthermore, a great deal of effort was required for integration at each system level because of the undiscovered defects. To address these problems, cleanroom software methods, including Markov chain usage-based testing, were examined for possible use.
The results of applying Markov chain usage-based testing to eight major software development projects were a greater than tenfold reduction in defects per 1,000 lines of code, software development costs within budget, and expedited system integration. For Raytheon's systems, it was found that automated test oracles could be developed to facilitate automated testing. The additional tools required were minimal, but careful staff training was found to be vital for the success of this switch in testing regimen. When this phase of the testing was completed, there was still a role for some conventional testing, but such testing was significantly expedited by the reduction in the defects in the software.

In its implementation of Markov chain usage-based testing, Raytheon's approaches included three profiles: (1) normal usage, (2) a usage model focused on error-prone modules, and (3) a usage model that explored hardware errors. It also seemed evident that the graphical models must be kept relatively simple, since the potential for growth of the state space could thus be kept under control. In this specific area of application, the typical graphical model had 50 or fewer states. Finally, while development of an automated test oracle did represent a major expense, it could be justified by the reductions in time and costs of development that resulted from the implementation of this methodology.

In addition to Markov chain usage-based testing, Raytheon initially augmented its testing with purposive sets of test inputs to cover all nodes and arcs of the graphical model for each system under test. This method was also used to debug the test oracle. It was discovered that this purposive testing was less necessary than might have been guessed, since relatively small numbers of Markov chain-chosen paths achieved relatively high path coverage. In the future, Raytheon will examine implementing usage-based testing methods at higher levels of integration. JUMBL has tool support for composing graphical models at higher levels of integration from submodels, which will correspond to the integration of systems from components. A poll of Raytheon software developers unanimously endorsed use of Markov chain-based usage testing for their next software development project.

AETG TESTING

AETG (see, e.g., Cohen et al., 1994, 1996; Dalal et al., 1998) is a combinatorial design-based approach to the identification of inputs for software testing. Consider an individual node of the graphical representation of a software system described above.
A node could be a graphical user interface in which the user is asked to supply several inputs to support some action of the software system. Upon completion, very typically based on the inputs, the software will then move to another node of the model of the software system. For this example, the user may be asked to supply categorical information for, say, seven fields. If all combinations are feasible, the possible number of separate collections of inputs for the seven fields would be a product of the seven integers representing the number of possible values for each of the fields. For even relatively small numbers of fields and values per field, this type of calculation can result in a large number of possible inputs that one might wish to test. For example, for seven dichotomous input fields, there are potentially 128 ($2^7$) test cases. With 13 input fields, with three choices per field, there are 1.6 million possible test cases. For most real applications, the number of fields can be much larger, with variable numbers of inputs per field.

Cohen et al. (1994) provide an example of the provisioning of a telecommunications system where a particular set of inputs consisted of 74 fields, each with many possible values, which resulted in many billions of possible test cases. Furthermore, there are often restricted choices due to constraints for inputs based on other input values, which further complicates the selection of test cases. For example, for a credit card-based transaction one needs to supply the appropriate credit card category (e.g., Mastercard, Visa, etc.) and a valid card number, while for a cash transaction those inputs have to be null. This complicates the selection of test scenarios. This issue is particularly critical since almost two-thirds of code is typically related to constraints in stopping invalid inputs. For this reason, test cases, besides testing valid inputs, also need to test invalid inputs.

As argued at the workshop by Ashish Jain of Telcordia Technologies, rather than test all combinations of valid and invalid inputs, which would often be prohibitively time-consuming, AETG instead identifies a small set of test inputs that has the following property: for each given combination of valid values for any k of the input fields, there is at least one input in the test set that includes this combination of values. In practice, k is often as small as 2 or 3.
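For k = 2, the covering property just stated can be illustrated with a naive greedy search. The sketch below is not the AETG algorithm itself; it enumerates the full set of combinations, which is practical only for toy examples, and it ignores constraints between fields. The field names, their values, and the function names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical input fields for a payment screen; not taken from the report.
FIELDS = {
    "payment":   ["cash", "visa", "mastercard"],
    "currency":  ["USD", "EUR"],
    "receipt":   ["none", "email", "paper"],
    "gift_wrap": [False, True],
}


def pairs_of(test):
    """Every pair of (field, value) assignments contained in one test input."""
    return set(combinations(sorted(test.items()), 2))


def pairwise_suite(fields):
    """Greedily pick full inputs until every value pair of every two fields
    appears in at least one chosen input."""
    names = sorted(fields)
    candidates = [dict(zip(names, values))
                  for values in product(*(fields[n] for n in names))]
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    suite = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


suite = pairwise_suite(FIELDS)
print(len(suite), "inputs instead of", 3 * 2 * 3 * 2)
```

Exhaustive testing of these four fields would need 36 inputs. Any pairwise suite needs at least 9, because the payment and receipt fields alone have nine value pairs; the greedy suite falls between those two figures.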
For example, in the pairwise case, for input fields I and J, there will exist in the set of test inputs at least one input that has value i for field I and value j for field J, for every possible combination of i and j and every possible combination of two fields I and J. For invalid values, a more complicated strategy is utilized.

A key empirical finding underlying this methodology is that in many applications it is the case that the large majority of software errors are expressed through the simultaneous use of only two or three input values. For example, a detailed root cause analysis of field trouble reports for a large Telcordia Technologies operation support system demonstrated that most field defects were caused by pairwise interactions of input fields. Nine system-tested input screens of a Telcordia inventory system were retested using AETG-generated test cases, and 49 new defects were discovered through use of an all-pairwise input fields test plan. Given this empirical evidence, users typically set k equal to 2 or 3.

To demonstrate the gains that are theoretically possible, for the situation of 126 dichotomous fields (a total of approximately $10^{38}$ paths), AETG identified a set of only 10 test cases that included all pairwise sets of inputs. More generally, with k fields, each with l possible values, AETG finds a set of inputs that has approximately $l^2 \log(k)$ members. The set of inputs identified by the AETG method has been shown, in practice, to have good code coverage properties.

Industrial Examples of the Benefits of AETG Testing

Two applications of AETG were described at the workshop. The first, presented by Manish Rathi of Telcordia Technologies, was its application to the testing of an airplane executing an aileron roll (which could be input into either an operational test or the test of an airplane simulator). The objectives of the test were to: (1) assess the pilot's ability to respond to various disturbances produced by the aileron roll, (2) detect unwanted sideslip excursions and the pilot's ability to compensate, and (3) test inertial coupling. Key inputs that affect the "rolling" characteristics of an airplane are the air speed, the Mach number, the altitude, and the position of the flaps and landing gear. (Various additional factors such as air temperature also affect the rolling characteristics but they were purposely ignored in this analysis.) For each of these four inputs, a finite number of possible values were identified for testing purposes. (All possible combinations testing was not feasible for two reasons: first, it would represent too large a set of test events to carry out; second, there were additional constraints on the inputs, prohibiting use of some combinations of input values.) Even given the constraints, there were more than 2,000 possible legal combinations of test values, i.e., more than 2,000 possible test flights.
AETG was therefore used to identify a small number of test events that included inputs containing all pairwise combinations of input values for pairs of input fields (while observing the various constraints). AETG identified 70 test flights that included tests with all possible combinations of pairwise input values for pairs of input fields, a reduction of more than 96 percent, which is extremely important given the cost of a single test flight.

The second application, described at the workshop by Jerry Huller of Raytheon, was for the software used to guide a Raytheon satellite control center combined with a telemetry, command, and ranging site, which are both used to communicate with orbiting satellites. The system contains a great deal of redundancy to enhance its overall reliability. A primary test problem is that, given the redundancy, there are many combinations of equipment units that might be used along possible signal paths from the ground system operator to the satellite and back, and it is necessary to demonstrate that typical satellite operations can be performed with any of the possible paths using various combinations of equipment units. Exhaustive testing of all combinations of signal paths was not practical in a commercial setting. An efficient way of generating a small number of test cases was needed that provided good coverage of the many possible signal paths. AETG generated test cases that covered all pairwise combinations of test inputs, and it also handled restrictions on allowable input combinations. In this application, there were 144 potential test cases of which AETG identified 12 for testing, representing a 92 percent reduction in testing. Taking into consideration some additional complications not mentioned here, the AETG strategy provided an overall 68 percent savings in test duration and a 67 percent savings in test labor costs.

INTEGRATION OF AETG AND MARKOV CHAIN USAGE MODELS

For a graphical model with limited choices at each node and for a relatively small number of nodes, Markov chain usage model testing is a methodology that provides a rich set of information to the software tester along with an efficient method for selecting test inputs. However, the derivation of test oracles and user profiles can be complicated, especially for graphical models that have a large number of nodes and for nodes that have a large number of arcs due to many input fields and values per field. One possibility for these graphical models is to substitute, for the set of all possible transitions, just those transitions that are selected by AETG. This would reduce the number of separate nodes and arcs needed.
Therefore, AETG would be used to reduce the number of paths through a software system and the Markov chain usage model would provide a probabilistic structure only for the AETG selections. Another way to combine these methods is where the usage models would handle the transitions from one node to another and AETG would determine the possible choices for user inputs at each node as they are encountered in the graphical model. In other words, AETG could operate either at the level of the entire graphical model or at the level of individual nodes. Other approaches may also be possible. This is an area in which further research would likely provide substantial benefits.

TEST AUTOMATION

Clearly, without test automation, any testing methodology, including model-based testing methods, will be of limited use. All of the steady-state validation benefits and estimates of the costs and number of replications needed for Markov chain usage model testing are predicated on the ability to carry out a relatively large number of tests. Reliability projections that can be computed based on usage models reveal that the number of tests needed to demonstrate reliability of 99 percent or more is almost always beyond the budget for most complicated systems. Therefore, since test automation is a virtual necessity for modern software testing, the topic was examined at the workshop.

Mike Houghtaling of IBM presented a number of automation tools that the company has used in managing its tape systems and libraries. The company uses Markov chain usage models comprising roughly 2,000 states, which are difficult to represent on paper. A software tool called TeamWork is therefore used to provide a graphical capability for constructing the collection of state transition diagrams. There is also a tool for the implementation of the test oracles. For selection of test cases through random sampling and for composing the test results and generating the summary measures, ToolSet_Certify is used. ToolSet_SRE is used to break up the model into subsystems for focused sampling and testing and to reaggregate the subsystems for full system analysis. To manipulate the device drivers for the test execution, CBase is used. In addition, CORBA middleware (in particular ORBLink) is used to increase automation, and QuickSilver is used as a test visualizer to help depict results in a graphical environment.

Houghtaling noted several complicating factors with automation.
First, three concurrent oracle process strategies need to be supported: (1) postoracle analysis, (2) concurrent oracle analysis, and (3) preoracle certificate generation. Second, several similar activities need to be separately addressed: (1) failure identification versus defect diagnosis, since the cause of the failure may have happened many events prior to the evident failure; (2) usage and model abstractions versus implementation knowledge, because implementers sometimes make assumptions about uses or failures that are inconsistent with new uses of systems (for example, some years ago data began to flow over telephone systems); and (3) test automation component design for interoperability versus analysis of the functioning of the complete system. (Some test automation equipment is designed primarily to watch the interface between two components so that errors are not made in the interface itself, but there may not be automated support for testing the full system.) Besides these, there is an obvious challenge in restoring the environment and preconditions before the tests can be run.

The separation of failure identification and defect diagnosis is important since fault diagnosis requires in-depth knowledge of the system implementation. Test teams, especially black box-oriented and system-level test teams, might not possess this knowledge. Test automation tools should also be composed from the same perspective. Tools that are directed toward usage certifications and that are consequently failure profile-oriented should not be encumbered with diagnostic-oriented artifacts. The ability to leverage the information collected during failure identification, such as automatically replaying the usage scenario in an execution environment that is configured with diagnostic tools, is beneficial but should not subvert the usage testing environment.

In addition, designers and implementers of test automation tools employing high-level usage models need to be aware of the gap between the usage model vocabulary and the more concrete implementation vocabulary that is meaningful to the developers of the system. Attempts to document justifications for claims about test case failures will need to bridge the gap.

Finally, with respect to test automation component design for interoperability, a cost-effective test automation environment will need to be based on industry standards and possess the capability of collaborating within a framework supporting many of the aspects associated with development and test processes (planning, domain modeling, test selections, test executions and evaluations, system assessments, and test progress management).
METHODS FOR TESTING INTEROPERABILITY

It is typical for software-intensive systems to comprise a number of separate software components that are used interactively, known as a system of systems. Some of these component systems may be commercial-off-the-shelf (COTS) systems, while others may be developed in-house. Each of these software components is subject to a separate update schedule, and each time a component system is modified there is an opportunity for the overall software system to fail due to difficulties the modified component system may have in interacting with the remaining components. Difficulties in interactions between components are referred to as interoperability failures. (The general problem involves interaction with hardware as well as software components, but the focus here is on software interactions.) A session at the workshop addressed this critical issue given its importance in Department of Defense applications; the presentations examined general tools to avoid interoperability problems in system development, and how one might test the resulting system of systems for defects.

Amjad Umar of Telcordia Technologies discussed the general interoperability problem and tools for its solution, which he referred to as integration technologies. Consider as an example an e-business activity made up of several component applications. Customer purchases from this e-business involve anywhere from a dozen to hundreds of applications, which need to interoperate smoothly to complete each transaction. Such interoperability is a challenge because transactions may involve some systems that are very new as well as some that are more than 20 years old. In addition to being of different vintages, these components may come from different suppliers and may have either poor or nonexistent documentation. Integration is typically needed at several levels, and if the systems are not well integrated, the result can be an increase in transaction errors and service time. On the other hand, if the systems are well integrated, human efforts in completing the transaction will be minimized. The overall challenge then is to determine good procedures for integrating the entire system, to set the requirements to test against, and to test the resulting system.

A considerable amount of jargon is associated with this area. A glossary is included in Appendix B that defines some of the more common terms used in this context.

Amjad Umar's presentation provided a general framework for the wide variety of approaches and techniques that address interoperability problems.
This is possible because the number of distinct concepts in this area is much smaller than the jumble of acronyms might suggest. The overall idea is to develop an "integration bus" with various adapters so that different applications can plug into this bus to create a smoothly interactive system.

Integration Technologies

To start, integration is needed at the following levels: (1) cross-enterprise applications, (2) internal process management, (3) information transformation, (4) application connectivity, and (5) network connectivity. Solution technologies that can be implemented at these various layers include, from high to low: (1) business-to-business integration platforms, (2) enterprise application integration platforms, (3) converters, (4) middleware and adapters, and (5) network transport.

With respect to low-level integration between two systems, interconnection technologies might work as a mediator between the order processing, provided by a Web browser or Java applets, and the inventory, managed by user interfaces, application code, and a data source. Interconnection technologies may involve a remote user connector, a remote method connector, and a remote data connector. At midlevel integration, there are object wrappers (e.g., CORBA) that function in combination with screen scrapers, function gateways, and data gateways (e.g., ODBC). At high-level integration, but within an enterprise, software tools such as adapters provide a smooth integration between order processing and the inventory system. Finally, at very high-level integration, tools such as several kinds of adapters again smooth the integration between inventory systems and order processing systems and trading hubs across firewalls for external organizations.

General platforms that attempt to minimize integration problems are available to oversee the process; leading examples include Sun's J2EE platform, IBM's e-business framework, and Microsoft's .NET. The choice of integration technology to implement depends on the number of applications, the flexibility of requirements, the accessibility of applications, and the degree of organizational control.

Clearly, integration raises persistent challenges. It is popular because it permits use of existing software tools. On the other hand, it is problematic because it increases the workload on existing applications, may not be good for long-range needs, and creates difficult testing challenges (however, see below for a possible approach to testing systems of systems).
Many tools are being developed to address these problems, but they are specific to certain areas of application, and whether they will be portable to the defense environment is an important question to address. Also, it is important to note that addressing interoperability problems may sometimes not be cost-effective compared to scrapping an old system and designing all system components from scratch.

Additional information on integration technologies can be found in Lithicum (2001) and Umar (2002, 2003). Information can also be found at the following Web sites: (1) www.vitria.com, (2) www.webmethods.com, and (3) www.tibco.com.
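
The "integration bus" idea, in which each application plugs in through its own adapter, can be illustrated with a very small sketch. Everything in it is hypothetical: the class name, the shared dictionary message format, and the two adapters (a fixed-width record for an imagined legacy inventory system and an XML fragment for an imagined Web order system) are invented for illustration and do not describe any of the platforms named above.

```python
from typing import Callable, Dict

# Hypothetical common message format shared by everything on the bus.
Message = Dict[str, str]


class IntegrationBus:
    """Toy bus: applications plug in through adapters that translate messages."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Callable[[Message], str]] = {}

    def plug_in(self, app_name: str, adapter: Callable[[Message], str]) -> None:
        """Register the adapter that converts bus messages for one application."""
        self._adapters[app_name] = adapter

    def send(self, app_name: str, message: Message) -> str:
        """Translate a common-format message into the target application's format."""
        if app_name not in self._adapters:
            raise KeyError(f"no adapter plugged in for {app_name!r}")
        return self._adapters[app_name](message)


def legacy_inventory_adapter(message: Message) -> str:
    # Imagined legacy system: 10-character item name, 5-digit quantity.
    return f"{message['item']:<10}{int(message['quantity']):05d}"


def web_order_adapter(message: Message) -> str:
    # Imagined Web order system: a small XML fragment.
    return (f"<order><item>{message['item']}</item>"
            f"<quantity>{message['quantity']}</quantity></order>")


bus = IntegrationBus()
bus.plug_in("inventory", legacy_inventory_adapter)
bus.plug_in("orders", web_order_adapter)

order = {"item": "widget", "quantity": "3"}
print(repr(bus.send("inventory", order)))
print(bus.send("orders", order))
```

The point of the design is that adding or replacing an application means writing one adapter against the common format rather than one converter per pair of applications.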

Testing a System of Systems for Interoperability

Consider a system that is a mixture of legacy and new software components and that addresses interoperability through use of middleware technology (e.g., MOM, CORBA, or XML). Assume further that the application is delivered using Web services, and so is composed of services from components across the Internet or other secured networks. As an example, consider a survivable, mobile, ad hoc tactical network with a variety of clients, namely infantry and manned vehicles, that need to obtain information from a command and information center, which is composed of various types of servers. These are often high-volume and long-distance information transactions. The distributed services are based on a COTS system of systems with multiple providers. These systems themselves evolve over time, producing a substantial amount of code and functionality churn. The corresponding system can thus get very complex.

In an example described at the workshop by Siddhartha Dalal, for a procurement service created on a system consisting of many component systems, there was a procurer side, a provider side, and a gateway enabling more than 300,000 transactions per month with 10,000 constantly changing rules governing the transactions. There were additions and deletions of roughly 30 rules a day, and there was on average one new software component release per day. As a result of these dynamics, 4,000 errors in processing occurred in 6 months, but, using traditional testing methods, only 47 percent of the defects ultimately discovered were removed during testing.

The problem does not have an easy solution.
In interviews with a number of chief information officers responsible for Web-based services, the officers stated that service failures reported by customers had the following distribution of causes:

(1) lack of availability (64 percent)
(2) software bugs (55 percent)
(3) bad or invalid data (47 percent)
(4) software upgrade failure (46 percent)
(5) incorrect, unapproved, or illegal content (15 percent)
(6) other (5 percent)

(Some of these causes can occur simultaneously; thus the sum of the percentages is greater than 100 percent.)

In moving from software products to component-based services, the traditional preproduction product-based testing fails because it is not focused on the most typical sources of error from a service perspective. When testing a system of systems, it is critical to understand that traditional testing does not work for two reasons: first, the owner of a system of systems does not have control over all the component systems; second, the component systems or the rules may change from one day to the next. While product testing usually only needs to implement design for testability with predictable churn, in the application of Web-based services one needs a completely different testing procedure designed for continuous monitorability, with a service assurance focus, to account for unpredictable churn.

With this new situation, there are usually service-level agreements concentrating on availability and performance based on end-to-end transactions. To enforce the service agreements, one needs constant monitoring of the deployed system (i.e., postproduction monitoring) with test transactions of various types. However, one cannot send too many transactions to test and monitor the system, as they may degrade the performance of the system.

One approach for postproduction monitoring and testing was proposed by Siddhartha Dalal of Telcordia Technologies (see Dalal et al., 2002). In this approach, a number of end-to-end transactions are initially captured, and then a number of synthetic end-to-end transactions are generated. The idea is to generate a minimal number of user-level synthetic transactions that are very sensitive for detecting functional errors and that provide 100 percent pairwise functional coverage at the node level. This generation of synthetic end-to-end transactions can be accomplished through use of AETG.
To see this, consider an XML document with 50 tags. (A tag in XML identifies and delimits a text field.) Assuming only two values per tag, this could produce 2^50 possible test cases. The AutoVigilance system, which was created by Dalal and his colleagues at Telcordia, produces nine test cases from AETG that cover all pairwise choices of tags in this case. These synthetic transactions are then sent through probes, with the results analyzed automatically and problems proactively reported using alerts. Finally, possible subsystems causing problems are identified using an automated root cause analysis. This entire process can be automated and is extremely efficient for nonstop monitoring and testing as well as regression testing when needed. The hope, by implementing this testing, is to find problems before end-users do.

Based on the various methods discussed during this session, the panel concluded that model-based testing offers important potential for improving the software testing methods currently utilized by the service test agencies. Two specific methods, Markov chain usage-based testing and AETG, were described in detail, and their utility in defense or defense-type applications was exhibited. There are other related approaches, some of which were also briefly mentioned above. The specific advantages and disadvantages from widespread implementation of these relatively new methods for defense software systems need to be determined. Therefore, demonstration projects should be carried out.

Test automation is extremely important to support the broad utility of model-based methods, and a description of automation tools for one specific application suggests that such tools can be developed for other areas of application.

Finally, interoperability problems are becoming a serious hurdle for defense software systems that are structured as systems of systems. These problems have been addressed in industry, and various tools are now in use to help overcome them. In addition, an application of AETG was shown to be useful in discovering interoperability problems.
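
The counts quoted for the 50-tag XML example in this section can be checked with a short sketch. The construction below does not reproduce AETG's own search procedure; it is a combinatorial construction for two-valued fields, used here only to confirm that nine test cases are enough to cover every pairwise choice across 50 tags. The function names are hypothetical. The idea is to keep one all-zero test and, in the remaining eight tests, give each tag the value 1 in a distinct set of five tests; any two such sets overlap and differ, so every pair of tags sees all four value combinations.

```python
from itertools import combinations

NUM_TAGS = 50   # from the text: an XML document with 50 tags, two values each
NUM_TESTS = 9   # from the text: nine test cases


def binary_pairwise_suite(num_tags, num_tests):
    """Build num_tests rows of 0/1 values that cover all pairs of tags.

    Row 0 is all zeros. Each tag is set to 1 in a distinct subset of
    ceil(num_tests / 2) of the remaining rows.
    """
    ones_per_tag = (num_tests + 1) // 2
    columns = []
    for subset in combinations(range(1, num_tests), ones_per_tag):
        columns.append(set(subset))
        if len(columns) == num_tags:
            break
    if len(columns) < num_tags:
        raise ValueError("too few tests for this many tags with this construction")
    return [[1 if row in col else 0 for col in columns] for row in range(num_tests)]


def covers_all_pairs(suite):
    """True if every pair of tags takes all four value combinations somewhere."""
    num_tags = len(suite[0])
    for i, j in combinations(range(num_tags), 2):
        if len({(row[i], row[j]) for row in suite}) < 4:
            return False
    return True


suite = binary_pairwise_suite(NUM_TAGS, NUM_TESTS)
print("exhaustive test cases:", 2 ** NUM_TAGS)
print("pairwise test cases:  ", len(suite))
print("all pairs covered:    ", covers_all_pairs(suite))
```

With eight tests the same construction runs out of distinct subsets before reaching 50 tags, which is consistent with nine being a small but sufficient number for this example.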

Innovations in Software Engineering for Defense Systems (2003)

Chapter: 3 Testing Methods and Related Issues
