APPENDIX B
Combining Information for Test Design with Staged Development
The extent to which information from previous stages can be used to design experiments or tests in the current stage clearly depends on how similar the system is across stages. This section considers several simple situations and describes some technical ideas on how information from previous stages can be exploited for efficient test design.
A SIMPLE EXAMPLE WITH TWO FACTORS
Suppose that at stage (k – 1), the test design for the system had two factors (test scenarios), each at two settings: factor 1 corresponded to sunny versus rainy conditions, with x1 being the indicator variable for sunny/rainy; similarly, factor 2 corresponded to day versus night, with x2 being the indicator variable for day/night. Let Y be some performance measure of interest, and suppose the performance at the four combinations of test scenarios can be captured by the following simple model:
$$Y = \beta_0 + \beta_{10} x_1 + \beta_{20} x_2 + \varepsilon,$$
where ε is a random variable with mean 0 and constant variance. Note that there is no interaction term in this model.
Assume that the current stage (stage k) involves addition of a new night vision capability. Based on the subject-matter knowledge, we think that there will not be much change in performance for rainy-versus-sunny conditions and that the only major change will be for day-versus-night test scenarios. Then, it is intuitively clear that most of the test resources in stage k should be used to assess the improvement in night-versus-day scenarios, with substantially fewer resources allocated to testing the effect of sunny-versus-rainy conditions for the new system. The actual allocation can be decided informally, depending on the level of confidence in the knowledge that the addition of the new night vision capability will not affect the rainy-versus-sunny comparison.
A more formal way of allocating test resources can be based on a Bayesian framework for design of experiments (Chaloner and Verdinelli, 1995). The estimates $\hat{\beta}$ and $\mathrm{Var}(\hat{\beta})$ from stage (k – 1) can be viewed as prior information for the unknown parameters in stage k. Since we do not expect the new capability to affect the sunny-versus-rainy comparison, we can expect the parameter $\beta_{10}$ to remain relatively unchanged in stage k. So the prior information from stage (k – 1) for $\beta_{10}$ is quite reliable, and $\hat{\beta}_{10}$ and $\mathrm{Var}(\hat{\beta}_{10})$ will serve as good estimates of the mean and variance of the prior distribution. However, we expect $\beta_{20}$ to change considerably, so we have to place much less weight on the prior information from stage (k – 1) for this parameter. The exact decision will depend on our belief about how relevant the data from the previous stage are. For example, if we believe that the data from the previous stage still provide unbiased information, we can use $\hat{\beta}_{20}$ as the prior mean and an inflated value of $\mathrm{Var}(\hat{\beta}_{20})$ as the prior variance, with the inflation factor depending on our judgment of the relevance. In the extreme case in which the previous data provide no useful information, we can use a “noninformative” prior for $\beta_{20}$. Alternatively, if the performance is expected to improve with the new night vision capability, the comparison of night-versus-day from the previous stage can be used as a lower bound for the prior distribution. The Bayesian theory for optimal design can now be used to obtain an appropriate allocation of test resources to the various scenarios. There are several optimality criteria, such as the D-, A-, and G-criteria. A good review can be found in Chaloner and Verdinelli (1995).
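As a rough illustration of the allocation idea in the preceding paragraph, the following Python sketch searches for a Bayesian D-optimal split of a small number of runs among the four scenarios. It is a minimal sketch under stated assumptions, not a prescribed method: it assumes the additive two-factor model, independent normal priors, a known error variance, ±1 coding of the factors, and the criterion log det(R + X'X/σ²), where R is the prior precision matrix. All numerical values (prior variances, inflation factor, number of runs) and all function names are hypothetical.

```python
import itertools
import numpy as np

# Four test scenarios in +/-1 coding: (x1, x2) = (weather, time of day).
CELLS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def posterior_precision(counts, prior_var, sigma2=1.0):
    """Prior precision plus data information for Y = b0 + b10*x1 + b20*x2 + error."""
    rows = [[1.0, x1, x2] for (x1, x2), n in zip(CELLS, counts) for _ in range(n)]
    X = np.array(rows, dtype=float).reshape(-1, 3)
    return np.diag(1.0 / np.asarray(prior_var, dtype=float)) + X.T @ X / sigma2

def bayes_d_criterion(counts, prior_var, sigma2=1.0):
    """Log-determinant of the posterior precision (larger is better)."""
    _, logdet = np.linalg.slogdet(posterior_precision(counts, prior_var, sigma2))
    return logdet

def best_allocation(n_runs, prior_var, sigma2=1.0):
    """Exhaustive search over all ways to split n_runs among the four scenarios."""
    candidates = (c for c in itertools.product(range(n_runs + 1), repeat=4)
                  if sum(c) == n_runs)
    return max(candidates, key=lambda c: bayes_d_criterion(c, prior_var, sigma2))

# Hypothetical stage (k - 1) variances of the two effect estimates.
var_b10_prev, var_b20_prev = 0.01, 0.01
inflation = 1e4  # hypothetical: old day/night data judged barely relevant
# Prior variances for (intercept, b10, b20); the intercept prior is taken as vague.
prior_var = [100.0, var_b10_prev, inflation * var_b20_prev]

counts = best_allocation(6, prior_var)
runs_x2_low = counts[0] + counts[2]   # scenarios with x2 = -1
runs_x2_high = counts[1] + counts[3]  # scenarios with x2 = +1
print(counts, runs_x2_low, runs_x2_high)
```

With six runs, perfect balance on both factors is not attainable, and under these assumed priors the search favors balancing the day/night contrast at the expense of the weather contrast. When the number of runs allows a fully balanced factorial and all runs cost the same, the balanced design maximizes this criterion whatever the prior, so the prior mainly matters when runs are scarce or balance is otherwise constrained.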
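The same ideas can be used if we have additional interaction terms of the form
$$Y = \beta_0 + \beta_{10} x_1 + \beta_{20} x_2 + \beta_{12} x_1 x_2 + \varepsilon,$$
or a model with factors or test scenarios with more than two levels and quadratic terms:
$$Y = \beta_0 + \beta_{10} x_1 + \beta_{20} x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \varepsilon.$$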
EXTENSIONS TO MORE REALISTIC SCENARIOS
The above discussion can be generalized to address a variety of interesting, more complex cases. For example:
1. Suppose that we have new factors or test scenarios (new x’s) in stage k. In other words, the system at the new stage will be asked to perform well in entirely new environments, for new missions, or against new threats. This situation could arise in the above example, for instance, if there was no night vision capability in stage (k – 1), so that day versus night was not a factor in the previous operational test. With the new capability added to the requirements, it becomes necessary to test for it. In this case, one will have to examine how the parameters in the model from stage (k – 1) map into those for stage k, since only one level (daylight) was tested before. One can use a noninformative prior for the parameters to represent this state of ignorance.
2. Suppose that the functional form of Y = f(x; β) changes. If the change is arbitrary, then there is no linkage through which to borrow information from earlier stages of development. Often, however, there will be some common elements that we can usefully exploit. For example, suppose we have an additive model of the form
$$Y = f^{(k-1)}(x_1; \beta_1) + g^{(k)}(x_2; \beta_2) + \varepsilon,$$
where the first term is the component carried over from stage (k – 1) and $g^{(k)}(x_2; \beta_2)$ represents the effect of the new factors in the current stage. Then the problem decouples into two independent ones, and we can focus more resources in the current stage on estimating the second component. This is just a more complex version of the earlier, simple example with two factors. If there is, in addition, interaction between the two terms, then resources must also be allocated to estimating that term.
3. Suppose now that new measures of performance or new measures of effectiveness become of interest in the intermediate stages of system development. For example, suppose the amount of collateral damage becomes a new metric used to evaluate a system. Again, the relationship between the new measures and those from the previous stages can be exploited for improved efficiency. This will require careful modeling of the relationships.
4. In many situations, additional capabilities acquired in later stages can be viewed as new subsystems. If the effect of these new subsystems on system performance can be assumed to be additive (as in point 2 above), then the majority of the experimentation resources in the k-th stage should be spent on the new subsystems with limited resources devoted to system integration (interactions). More specifically, the following strategy should be useful in such situations:
- Do full operational testing and integration testing only after substantial stages.
- Do limited integration testing at intermediate stages at which modifications are small to moderate.
- Build in realism in developmental tests and carry out full component testing in developmental test.
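The decoupling described in point 2 above can be illustrated with a small Python sketch. It is only a sketch under assumptions that go beyond the text: both components are taken to be linear in a single ±1 coded factor, the priors are independent and normal, the error variance is known, and the hypothetical stage k design is a replicated 2 × 2 factorial, so the columns for the old and new components are orthogonal. Under those assumptions the posterior covariance is block diagonal, and the precision attained for the new component does not depend on how much is known about the old one.

```python
import numpy as np

def posterior_cov(X, prior_var, sigma2=1.0):
    """Posterior covariance for a normal linear model with independent normal priors."""
    precision = np.diag(1.0 / np.asarray(prior_var, dtype=float)) + X.T @ X / sigma2
    return np.linalg.inv(precision)

# Hypothetical stage k design: a 2 x 2 factorial replicated twice.
x1 = np.array([-1, -1, 1, 1] * 2, dtype=float)  # factor carried over from stage k - 1
x2 = np.array([-1, 1, -1, 1] * 2, dtype=float)  # factor introduced in stage k
X = np.column_stack([x1, x2])

# Same vague prior on the new effect; tight versus vague prior on the old effect.
tight = posterior_cov(X, prior_var=[0.01, 100.0])
vague = posterior_cov(X, prior_var=[1e6, 100.0])

print(tight[0, 1])               # cross-covariance between old and new effects
print(tight[1, 1], vague[1, 1])  # posterior variance of the new effect in each case
```

If the two sets of columns were not orthogonal, or if an interaction between the components were present, the off-diagonal block would no longer vanish and the two estimation problems would have to be treated jointly.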
As pointed out at the workshop by Steve Vardeman, one can use more formal decision-theoretic methods to combine both costs and data from test results at different stages of development, including results from developmental, operational, and field tests. Examples of such methods were described at the workshop (see, for example, Gaver et al., 2005). These techniques require inputs (typically costs) that are often difficult to estimate or quantify; examples include the cost of fielding a system late, the cost of having a fielded system perform poorly in operations, the benefit of deploying a good system earlier, and the benefit of winning a battle faster as a result of a newly fielded system. Nevertheless, analyses based on such approaches can provide useful insights into the trade-offs involved, especially when they are coupled with sensitivity analyses on the robustness of the conclusions to the inputs.
REFERENCES
Chaloner, K., and I. Verdinelli. 1995. Bayesian experimental design: A review. Statistical Science 10:273-304.
Gaver, D., P. Jacobs, and E. Seglie. 2005. Modern Military Evolutionary Acquisition and the Ramifications of “RAMS.” Technical Report. Monterey, CA: Naval Postgraduate School.