4
Prerequisites for Combining Information
The development and implementation of techniques for combining
information, whether in design or in evaluation, are often sophis-
ticated activities. What may conceptually seem to be relatively
straightforward applications often require original thought, nontrivial
modification of existing techniques, and software development. But the
use of methods for combining information can be made easier if the appro-
priate methodological and logistic frameworks are in place. This chapter
discusses several key steps that should be taken to establish these frame-
works: broader definitions of data so that nontest data (e.g., expert judg-
ment and computer models) can be formally and correctly included in
analyses; development of test data archives so that what is learned about a
system continues to be of use to future evaluators; use of graphical
representations of complex systems to aid in the understanding of overall
reliability and performance; and use of formal statistical methods for
information combination.
This chapter also identifies the statistical capabilities required to
implement such strategies. There is no clear evidence that the service test
agencies have these capabilities in place today, and so if they find the advantages
presented here compelling, it will be necessary for them, with help from
higher-level officials within the services, to acquire the capabilities described
in this chapter.
IMPROVED OPERATIONAL TESTING AND EVALUATION
NEED FOR A BROADER DEFINITION OF DATA
When performing an assessment of a complex system, the most com-
monly used data are test data, whether from operational, developmental, or
contractor tests. Other sources of information about the system include
training exercises, field use, computer models and simulation, and military
and engineering judgment.
Figure 4-1 is a schematic diagram of a system and its available data
sources. If resources were available, it would be desirable to collect test data
on every part of the system and to perform system tests under a variety of
conditions. For large and complex systems, however, that is seldom pos-
sible, and so the assessment often resembles that shown in Figure 4-1, where
some parts are not tested at all, some have computer modeling and simula-
tion data, some have historical data, others have test data, and some have
multiple sources of data.
The challenges in methods for combining information are to (1) repre-
sent the system under test in a way that all of the stakeholders can under-
stand (in Figure 4-1, a fault tree is used, one of many useful representa-
tional schemes); (2) collect data (broadly defined) to assess the system and
map them onto the representation; and (3) perform appropriate statistical
analyses to combine the available information into estimates of the metrics
of interest. All of these steps are performed in some way by ATEC's current
operational evaluation; this chapter provides suggestions for additional ca-
pabilities. For example, the graphical representation of Figure 4-1 could be
used to facilitate understanding of the system evaluation plan data source
matrix, to suggest areas where data are (or will be) missing and where data
combination is possible, and to provide a structure for test planning.
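The three steps above can be sketched in a few lines of Python. The fragment below is only an illustration under stated assumptions: the tree shape, the component names, the failure probabilities, and the data-source labels are all hypothetical (they are not taken from Figure 4-1), and basic events are assumed to fail independently.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    """A fault-tree node: a basic event (leaf) or an AND/OR gate."""
    name: str
    gate: str = "LEAF"       # "LEAF", "AND", or "OR"
    p_fail: float = 0.0      # used only for leaves
    source: str = "none"     # hypothetical label for where the estimate came from
    children: list = field(default_factory=list)


def failure_probability(node):
    """Probability that the node's event occurs, assuming independent children."""
    if node.gate == "LEAF":
        return node.p_fail
    child_probs = [failure_probability(c) for c in node.children]
    if node.gate == "AND":   # occurs only if every child event occurs
        return prod(child_probs)
    if node.gate == "OR":    # occurs if at least one child event occurs
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown gate type: {node.gate}")


# Hypothetical system: fails if power fails OR both redundant radios fail.
system = Node("system", "OR", children=[
    Node("power", p_fail=0.02, source="operational test"),
    Node("comms", "AND", children=[
        Node("radio_a", p_fail=0.10, source="developmental test"),
        Node("radio_b", p_fail=0.10, source="expert judgment"),
    ]),
])

top = failure_probability(system)
print(f"estimated system failure probability: {top:.4f}")
```

Recording a source on each leaf is one simple way to see which parts of the top-level estimate rest on test data and which rest on judgment or simulation.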
It is important to acknowledge and account for possible weaknesses in
different kinds of data. The use of nontest data for evaluation can be con-
tentious, although it is done routinely. Military, engineering, and statistical
judgments are required to design test plans and interpret data; and com-
puter modeling and simulation are applied to test data collected under
certain scenarios to extrapolate the scope of their validity to other scenarios
or to larger fighting units. Methodological contention arises when attempts
are made to use military judgment or computer modeling and simulation
results formally as data, instead of using them only to inform design,
modeling, or interpretation.
The use of expert judgments, in particular, is especially vulnerable to
inappropriate application due to procedural or cognitive biases. For
example, suppose that test data are not available for a particular component
but that engineering judgment considers the system design "unreliable"
(Figure 4-1). Methods have been proposed (Meyer and Booker, 2001) to
formally elicit and quantify engineering judgment for inclusion in statisti-
cal calculations, and there is a growing body of literature by statisticians,
decision analysts, social scientists, and cognitive psychologists, developed
over the past two decades, describing methods for eliciting and using expert
judgment. Using information based on expert judgment requires consider-
able care, explicit documentation, and careful sensitivity analysis. With the
recognition that all statistical analyses depend, to some degree, on subjec-
tive judgment (Berger and Berry, 1988) comes the obligation to ensure that
such judgments are made in a rational and defensible manner.
It is well established that the major barrier to successful elicitation is
the presence of biases inherent in the process used to evoke expert responses
(see, for example, the pioneering work of Tversky, Slovic, and Kahneman,
1985). These biases are often characterized as cognitive or motivational,
and attributed to a variety of sources, including: intrinsic cognitive failures,
the instrument used to elicit responses, the social or institutional setting
within which the expert operates, and the response mode.
Cognitive biases are evident in effects such as anchoring, the tendency
not to adjust from a first response even after receiving information contrary
to the position; availability, the elicitation of event probabilities or other
values based on what readily comes to mind; conservatism, a reluctance or
inability to draw inferences agreeing with those that would be obtained
using Bayes' rule; and underestimation, an understatement of the uncer-
tainty of an assessment. Motivational biases include group think, whereby
experts tend to slant their assessments to what they perceive to be a consen-
sus; and misinterpretation, in which the method or instrument of elicitation
affects the expert's responses (as when, for example, the framing of a ques-
tion cues the expert to provide a preferred response).
The test and evaluation environment contains strong institutional incentives
and is therefore possibly subject to equally pervasive motivational
biases on the part of experts asked to provide their judgment. These experts
can be specifically trained in methods to avoid or mitigate an array of cog-
nitive biases, and the elicitations themselves can be structured to minimize
the effects of bias. A growing literature of methods addresses these issues. In
particular, Meyer and Booker (2001) and Booker and McNamara (2003)
provide exemplary guides to such ameliorative methods as indirect prob-
ability assessment, use of documented processes for elicitation, expert iden-
tification, motivation and training, modes of communication, and appro-
priate framing. If expert judgment is used in methods for combining infor-
mation, it is extremely important that these or similar techniques be used,
especially when arriving at prior distributions for critical parameters, such
as failure rates.
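To make the role of sensitivity analysis concrete, the following sketch turns an elicited failure probability into a Beta prior and shows how the posterior mean moves as the weight given to the expert changes. It is a minimal illustration, not a method prescribed in this report: the elicited value (0.10), the test outcome (1 failure in 20 trials), and the idea of expressing the expert's confidence as a number of "equivalent trials" are all assumptions made for the example.

```python
def beta_prior(mean, equiv_trials):
    """Beta(a, b) with the elicited mean, weighted as `equiv_trials` pseudo-trials."""
    return mean * equiv_trials, (1.0 - mean) * equiv_trials


def posterior_mean(a, b, failures, trials):
    """Posterior mean of the failure probability under a Beta(a, b) prior
    and binomial test data (conjugate update)."""
    return (a + failures) / (a + b + trials)


# Hypothetical elicitation: the expert's best guess is a 10% failure probability.
elicited_mean = 0.10
# Hypothetical test data: 1 failure observed in 20 trials.
failures, trials = 1, 20

# Sensitivity analysis: vary how many trials the expert's opinion is "worth".
sensitivity = {
    n: posterior_mean(*beta_prior(elicited_mean, n), failures, trials)
    for n in (2, 10, 40)
}
for n, m in sensitivity.items():
    print(f"prior worth {n:>2} trials -> posterior mean {m:.3f}")
```

If the conclusion of an evaluation changes materially across a plausible range of prior weights, that dependence should be documented alongside the result.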
Some industrial organizations have become comfortable using such
techniques, while being aware of and adjusting for potential biases, in high-
profile, politically sensitive analyses. For example, General Motors Corpo-
ration reports on its ability to assess technical success probabilities in
Bordley (1998) and has used panels of over 40 experts to develop cumula-
tive prior probability distributions for the improvement of fuel economy
by using a novel powertrain concept.
Computer modeling and simulation, which can be thought of as com-
bining the original data with the knowledge incorporated in the model, can
also provide a cost-effective way of expanding the use of the data. The
appropriate use of computer modeling and simulation methods depends
crucially on the trustworthiness of the models in transporting data to other
scenarios. Although simulation can generate a large amount of new data, it
is a serious mistake to combine these generated data directly with the origi-
nal data to increase the sample size. Instead, more sophisticated statistical
methods (e.g., as described by Reese et al., 2000) should be employed.
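The danger of naive pooling can be seen in a small numerical sketch. The code below does not reproduce the approach of Reese et al. (2000); as a simpler stand-in, it discounts the simulation runs so that they act as a prior worth a modest number of real trials. All counts, and the choice of 10 equivalent trials, are hypothetical.

```python
def beta_posterior(successes, trials, a=1.0, b=1.0, weight=1.0):
    """Beta posterior for a success probability.
    `weight` in [0, 1] discounts the data (1.0 means taken at face value)."""
    return a + weight * successes, b + weight * (trials - successes)


def mean_and_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var ** 0.5


# Hypothetical data.
test_s, test_n = 17, 20        # operational test: 17 successes in 20 trials
sim_s, sim_n = 9500, 10000     # simulation: 9500 successes in 10000 runs

# Naive pooling: simulated runs are counted as if they were real trials.
naive = mean_and_sd(*beta_posterior(test_s + sim_s, test_n + sim_n))

# Discounted: the simulation enters only as a prior worth about 10 trials
# (an assumed value that would itself need justification).
w = 10 / sim_n
a0, b0 = beta_posterior(sim_s, sim_n, weight=w)
discounted = mean_and_sd(*beta_posterior(test_s, test_n, a=a0, b=b0))

print(f"naive pooling: mean {naive[0]:.3f}, sd {naive[1]:.4f}")
print(f"discounted:    mean {discounted[0]:.3f}, sd {discounted[1]:.4f}")
```

Under naive pooling the 20 real trials are swamped and the stated uncertainty becomes implausibly small; the discounted analysis keeps the test data dominant while still borrowing some information from the model.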
NEED FOR A TEST DATA ARCHIVE
Given the wide variety of data sources available when performing a
system assessment, a mechanism should be developed to archive the data
and make them available for current and future assessments. At present,
such data are not saved in a readily accessible database along with contex-
tual information. This is true even for previous development stages of a
system. Once a system has been fielded, the absence of rigorous information
on system performance greatly limits the effectiveness of feedback loops
relating performance in the field to performance during testing, feedback
that could be very useful for improving system designs, the system develop-
ment process, and operational and developmental test design.
A data archive of military system performance could be put to several
uses that would assist in test design and system evaluation. In support of
test design, data archiving can be used to:
· help set the requirements for the test design and develop the operational
mission summary/mission profile (OMS/MP);
· determine the set of conditions and miniscenarios to be included in
the developmental and operational tests;
· identify scenarios in which the new system is expected to perform
better than previous systems (e.g., by providing information on how
other systems performed in similar scenarios);
· similarly, identify scenarios in which the new system may perform
poorly;
· identify factors that have an important impact on system performance;
· understand the factor levels that stress the system weakly, moderately,
and severely; and
· determine adequate sample sizes through power calculations.
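As an illustration of the last item, the sketch below finds the smallest number of trials for which an exact one-sided binomial test reaches a target power. The baseline success probability (0.80, which archived data on a predecessor system might supply), the hoped-for probability (0.95), the significance level, and the target power are all hypothetical values chosen for the example.

```python
from math import comb


def upper_tail(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | p0) <= alpha; c = n + 1 means 'never reject'."""
    for c in range(n + 2):
        if upper_tail(n, c, p0) <= alpha:
            return c


def power(n, p0, p1, alpha):
    """Probability of rejecting H0: p = p0 in favor of p > p0 when p = p1."""
    return upper_tail(n, critical_value(n, p0, alpha), p1)


def smallest_n(p0, p1, alpha=0.05, target=0.80, n_max=500):
    """First sample size at which the exact test reaches the target power."""
    for n in range(1, n_max + 1):
        if power(n, p0, p1, alpha) >= target:
            return n
    return None


n_needed = smallest_n(0.80, 0.95)
print(f"trials needed: {n_needed}")
```

Because the binomial distribution is discrete, power does not increase smoothly with the number of trials, so in practice a few sample sizes just above the reported value should also be checked.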
In support of system evaluation, data archiving can be used to provide
information to support analysis of the validity of computer models and
simulations used in test evaluation; support identification of appropriate
statistical models for use in system evaluation; and support pooling and
other forms of information combining. With the increasing development
of statistical methods and models for information combining, this last
reason has become increasingly compelling.
Data archiving can also contribute to improvement of defense system
assessment by providing a means to better understand the differences be-
tween failure modes and failure frequencies in moving from developmental
to operational testing and from operational testing to field use; understand
the sources of system deficiencies identified in the field, which can then be
used to guide design improvements; improve both developmental and op-
erational testing and evaluation, e.g., by understanding how deficiencies
identified in the field escaped detection in the developmental and opera-
tional tests; and estimate system and component residual lifetimes and life
cycle costs.
The current lack of priority for data archiving, given the above advan-
tages, suggests that the primary purpose of test data is to evaluate a system
for promotion to the next stage of the milestone process of defense system
development. Processes and techniques for combining data across acquisi-
tion stages either for a given system or across systems are not currently
envisioned or well supported. However, such data, often acquired at enor-
mous cost (e.g., operational tests can cost many millions of dollars), could
and should be stored in an accessible form that would facilitate the above
uses. Averaged over all defense systems in development, the cost of such an
archive would be extremely small, but its value, as has been discovered in
many industrial settings, could be substantial.
A test data archive would need to contain a rich set of variables to
adequately represent the test environment, the system under test, and the
performance of the system. Failure to initially include such a comprehen-
sive set of variables should not be used as an argument for not getting
started, since many of the potential benefits from such an archive could be
derived from a subset of what is described here, with increasing detail added
over time.
In order to accurately represent system performance, including the ap-
pearance of various failure modes and their associated failure frequencies,
the circumstances of the test must be understood well enough that the test,
training exercise, or field use can be effectively replicated, including the
environment of use (e.g., weather, terrain, foliage, and time of day) and
type of use (e.g., mission, intensity, and threat). This information is not
easy to collect in controlled settings such as operational testing, and is con-
siderably more difficult to collect in less controlled types of use, such as
training exercises or field use. However, much in this direction can be ac-
complished. In addition, contextual information that might be relevant for
an operational test might have little relevance in the developmental test,
because often only particular components are under test.
While a system is under development, the system design is often under
constant modification. Given the need, stated above, to be able to replicate
a test event in the database, it is crucial to represent with fidelity the system
that was in operation during the event so that proper inference is possible.
Since modifications can and do occur during late-stage operational testing
and after fielding, this is not only a concern for the developmental test.
Even for systems produced at the same stage of development, knowledge of
the order and location of manufacture can be useful to understanding why
some prototype systems perform differently from others.
In addition to storing the length of time between system failures, it is
also important to identify which hardware or software component mal-
functioned; the maintenance (including repair) record of the system; the
time of previous failures; the number of cycles of use between failures; the
degree of failure; and any other variables that indicate the stresses and strains
to which the system was subjected, such as speed and payload. It is also
useful to include the environments and stresses to which individual system
prototypes have been exposed historically (e.g., in transport, storage, and
repeated on/off cycling), in order to support comprehensive failure mode
analysis, especially if an apparent declining trend in system reliability ap-
pears. This sort of information is difficult to collect in less controlled set-
tings; however, in many industries sensors have been attached to systems to
collect much of the information automatically.
The information stored should be both quantitative and qualitative.
The latter is important to include because the contextual information
needed to help recreate the environment of use often includes qualitative
information. To facilitate use across services, such an archive should make
use of terminology common across services and, in its design and accessi-
bility, should address classification issues.
With respect to the structure and function of the database, it should be
able to track failures over time and identify systems that, while considerably
different, have similar components. These needs argue for a database in
which these linkages are facilitated. An analysis of similar data archives in
industry would enable the DoD to build on existing processes and tech-
niques.
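One possible shape for such records is sketched below. The field names, the example systems, and the two query functions are hypothetical and are meant only to show how contextual variables, configuration information, and component linkages could sit together in one structure; an actual archive would need a far richer schema and a real database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Failure:
    component: str            # hardware or software component that malfunctioned
    hours_since_last: float   # operating time since the previous failure
    severity: str             # degree of failure, e.g. "minor" or "mission-abort"


@dataclass
class TestEvent:
    system: str
    config_version: str       # system design in operation during the event
    phase: str                # "developmental", "operational", "training", "field"
    environment: dict         # e.g. weather, terrain, foliage, time of day
    use: dict                 # e.g. mission, intensity, threat
    components: set
    failures: list = field(default_factory=list)
    notes: str = ""           # qualitative context


def systems_sharing(events, component):
    """Systems, however different otherwise, that contain the given component."""
    return sorted({e.system for e in events if component in e.components})


def failures_by_phase(events, component):
    """Count failures of one component in each phase of testing or use."""
    counts = defaultdict(int)
    for e in events:
        counts[e.phase] += sum(f.component == component for f in e.failures)
    return dict(counts)


# Hypothetical archive contents.
archive = [
    TestEvent("truck-A", "v1", "developmental",
              {"weather": "dry", "terrain": "paved"}, {"mission": "resupply"},
              {"engine-1", "radio-X"},
              [Failure("radio-X", 40.0, "minor")]),
    TestEvent("truck-A", "v2", "operational",
              {"weather": "rain", "terrain": "mud"}, {"mission": "resupply"},
              {"engine-1", "radio-X"},
              [Failure("radio-X", 12.0, "mission-abort"),
               Failure("engine-1", 30.0, "minor")]),
    TestEvent("uav-B", "v1", "operational",
              {"weather": "wind", "terrain": "desert"}, {"mission": "recon"},
              {"radio-X", "rotor-2"},
              [Failure("radio-X", 8.0, "minor")]),
]

print(systems_sharing(archive, "radio-X"))
print(failures_by_phase(archive, "radio-X"))
```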
The panel is pleased to note that there are defense databases that satisfy
some of the above needs; the ATEC Distributed Data Archive and Retrieval
System and several servicewide reliability or failure reporting databases
are leading examples. However, those that the panel has seen support
only a few of the potential benefits listed above, rather than the breadth,
structure, and accessibility that we envision.
The marginal costs of data collection, input, and maintenance could
be easily met through routine allocation of a small percentage of the devel-
opment funds from every ACAT I program. The initial fixed costs for the
Army might be funded by the Army Materiel Command and other related
groups.
Finally, as mentioned in Chapter 5 for the Future Combat System,
systems developed using evolutionary acquisition provide an additional ar-
gument for the establishment and use of a test (and field) data archive,
since it is vital to link the performance of the system as it proceeds through
the various stages of development. This test and field data archive could
(1) assist in operational test design for the various stages of system develop-
ment, (2) help in diagnosing sources of failure modes, and (3) assist in
operational evaluation.
Recommendation: The Department of Defense should provide the
funds to establish a test data archive that will be a prerequisite for
combining information for test and evaluation of future systems.
REPRESENTATIONS
The fault tree represented in Figure 4-1 captures logically how the
parts of the system under study interact. The same can be conveyed in
reliability block diagrams. These and other classes of representations can be
quite useful when assessing systems as large and complex as those evaluated
by ATEC. For large, complex systems with heterogeneous data sources,
representations of a system have several advantages: they set out a common
language that all communities can use to discuss the problem; heteroge-
neous data sources can be explicitly located in the representation; and the
representation provides an explicit mapping from the problem to the data
to the metrics of interest.
If the system and its assessment are to be put in a decision context (for
example, an overall assessment of system effectiveness and suitability
supporting an acquisition decision), the fault trees and block diagrams may
need to be embedded in a representation that supports these broader goals
and connects the disparate and heterogeneous sources of data. Within the
data archive one can use representations of the test environments to under-
stand and compare variables such as the environment of use (weather, ter-
rain, foliage, and time of day) and type of use (e.g., mission, intensity, and
threat) across multiple tests.
It is important to develop a set of higher-level representations of the
system under evaluation for use both within the data archive and more
broadly in the system assessment. These representations, of necessity, change
over time as the system and the context of the evaluation change. Standard
reliability assessment methods focus on individual parts or simple groups
of parts within a system. Assessing the overall reliability and performance
of a complex system, however, involves understanding and integrating the
reliabilities associated with the subsystems and parts, and this understand-
ing and integrating are not always straightforward. Multiple and heteroge-
neous data types may exist, and the wider community that owns the system
may not understand all the features and relationships that can affect system
reliability. One way to illustrate all of the factors that characterize and im-
pinge upon system reliability is by building qualitative graphical systems
representations that can be migrated to graphical statistical models to assess
reliability. For this reason, it is important that the information stored in a
data archive be both quantitative and qualitative, as noted above.
Most groups developing complex systems do develop compartmental-
ized graphical representations of reliability. These representations may in-
clude reliability block diagrams; timelines, process diagrams, or Gantt/
PERT charts dealing with mission schedule and risk; and engineering sche-
matics of physical systems and subsystems. But none of these disparate
representations capture all aspects or concerns of the integrated complex
system. Moreover, since the system is likely under development with users,
procurers, planners, managers, designers, manufacturers, testers, and evalu-
ators spanning multiple organizations, geographical locations, and fields of
expertise, these numerous, specialized, and compartmentalized representa-
tions foil attempts for the multiple groups to meaningfully discuss (or even
understand) total system reliability or performance.
There are sets of methods and graphical representations that capture
the full range of features and relationships that affect system reliability. For
example, Leishman and McNamara (2002) employ ethnographic methods
to elicit a model structure from the pertinent communities of experts in-
volved in developing the system. The information on system reliability can
initially be captured using "scratch nets" (Meyer and Paton, 2002) (i.e.,
simple diagrams that sketch out the important features of the system and
its decision frame), which also allow a preliminary mapping of the key
relationships between features. These scratch nets form the basis for more
formalized representations called conceptual graphs (Sowa, 1984), which
are a formal graphical language for representing logical relationships; they
are used extensively in the artificial intelligence, information technology,
and computer modeling communities. Similar to the less formal scratch
nets, conceptual graphs use labeled nodes (which represent any entity, at-
tribute, action, state, or event that can be described in natural language)
and arcs (relationships) to map out logical relationships in a domain of
knowledge.
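The node-and-arc structure described above can be held in a very small data structure. The sketch below is a deliberate simplification (formal conceptual graphs treat relations as nodes in their own right), and the concepts and relation labels in the example are hypothetical.

```python
class ConceptGraph:
    """Labeled concept nodes joined by labeled, directed arcs (relations)."""

    def __init__(self):
        self.arcs = {}  # concept -> set of (relation, target concept)

    def add(self, source, relation, target):
        """Record that `source` stands in `relation` to `target`."""
        self.arcs.setdefault(source, set()).add((relation, target))
        self.arcs.setdefault(target, set())

    def related(self, concept, relation):
        """Concepts linked from `concept` by arcs carrying the given label."""
        return sorted(t for r, t in self.arcs.get(concept, ()) if r == relation)

    def downstream(self, start):
        """All concepts reachable from `start` by following arcs."""
        seen, stack = set(), [start]
        while stack:
            for _, target in self.arcs.get(stack.pop(), ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hypothetical scratch net for a communications subsystem.
g = ConceptGraph()
g.add("cold weather", "degrades", "battery")
g.add("battery", "part of", "radio")
g.add("radio", "supports", "mission communications")
g.add("operator training", "influences", "radio")

print(g.downstream("cold weather"))
```

Even this simple form lets different communities check whether a factor they care about is connected to the metric being assessed.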
The example in Figure 4-2 is a typical use of a conceptual graph to
convey the meaning of natural language propositions within the context of
a complex system. Generally, representations of complex systems are used
to capture higher-level concepts, but the grounding of conceptual graphs in
both natural language and formal logic also allows them to be used for
expert judgment elicitation (and even potentially for text mining) to build
formal logic models that can then become formal mathematical and statis-
tical models.
From the initial scratch nets, conceptual graphs are used to create an
ontology (i.e., a representation of high-level concepts and main ideas relat-
ing to a particular problem domain) for the system. This ontology repre-
sents the major areas of the system and its decision frame, such that any
pertinent detail that needs to be added to the representation can be added
hierarchically under one of the existing nodes. The ontology is also used as
a boundary object (i.e., an information object that facilitates discussion
and interaction between divergent communities that share common inter-
ests but have different perspectives on those interests) so that the diverse
stakeholders involved with the project can understand and agree on the
features that must be taken into account when assessing system reliability
and performance.
Building on the ontology, important features and relationships from
the various existing representations (e.g., engineering diagrams, timeline
and process diagrams) are integrated in a conceptual graph (or series of
graphs). One of the strengths of conceptual graphs is that they are an effec-
tive common format to capture diverse concepts and relationships and thus
provide an effective structure for combining information. If the concept or
relationship can be described in natural language, it can be represented
logically and eventually mathematically (the process works backward as
well). The graphical statistical models developed in this process can be eas-
ily explained to stakeholder communities because they are representations
of natural language in which relationships can be understood without hav-
ing to explain the underlying mathematical and statistical notation.
Unlike reliability block diagrams and fault trees, conceptual graphs do
not correspond directly to a particular statistical model. There must be a
translation from the qualitative conceptual graph model to a quantitative
model. Bayesian networks, in particular, are a flexible class of statistical
graphical models that capture causal relationships (Jensen, 1996) in a way
that meshes well with conceptual graphs; they are considered flexible be-
cause standard reliability diagrams (like block diagrams and fault trees) can
easily be represented as Bayesian networks (Almond, 1995). The Bayesian
network can also be used to model the conditional dependence and inde-
pendence relationships important for specifying the more complex Baye-
sian, hierarchical, and random effects models mentioned earlier in this re-
port. (For an example of the development of representations and the
subsequent use of a Bayesian network for analysis, see Appendix C.)
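The claim that a fault tree can be written as a Bayesian network can be illustrated with a two-component example. In the sketch below, which is not the analysis of Appendix C, an OR gate becomes a deterministic conditional probability table, and relaxing that table gives a model that a plain fault tree cannot express. The component failure probabilities and the "noisy" table entries are hypothetical.

```python
from itertools import product

# Hypothetical prior failure probabilities for two independent components.
p_fail = {"A": 0.05, "B": 0.10}


def or_gate(a, b):
    """Deterministic table: the system fails exactly when A or B has failed."""
    return 1.0 if (a or b) else 0.0


def noisy_or_gate(a, b):
    """Relaxed table: a component failure usually, but not always, fails the
    system, and the system can also fail from unmodeled causes."""
    return 0.90 if (a or b) else 0.01


def joint(a, b, s, cpt):
    """Joint probability of one configuration of the network A -> S <- B."""
    pa = p_fail["A"] if a else 1.0 - p_fail["A"]
    pb = p_fail["B"] if b else 1.0 - p_fail["B"]
    ps = cpt(a, b) if s else 1.0 - cpt(a, b)
    return pa * pb * ps


def p_system_fails(cpt):
    """Marginal probability of system failure, by enumeration."""
    return sum(joint(a, b, True, cpt)
               for a, b in product([False, True], repeat=2))


def p_a_given_system_fails(cpt):
    """Diagnostic query: probability that A failed, given a system failure."""
    num = sum(joint(True, b, True, cpt) for b in (False, True))
    return num / p_system_fails(cpt)


print(f"P(system fails), OR gate:  {p_system_fails(or_gate):.4f}")
print(f"P(A failed | system failed): {p_a_given_system_fails(or_gate):.4f}")
print(f"P(system fails), noisy OR: {p_system_fails(noisy_or_gate):.4f}")
```

The diagnostic query runs "backward" from an observed system failure to its likely cause, which is the kind of inference that makes the network form useful for diagnosing sources of failure modes.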
Not every performance and reliability assessment requires the careful
development of an integrated series of representations. However, these kinds
of representations do help accomplish the goals of a system evaluation plan
by making explicit the relationships among parts of the system and the
analysis and by providing an explicit mapping of the evaluation to the data
sources and the metrics of interest. The conceptual graph representations
are flexible enough to achieve these goals within a framework that can
change dynamically as the system and evaluation goals develop.
COMBINING INFORMATION FOR COMPLEX SYSTEMS
One of the steps required in an industrial reliability assurance program
is the use of failure modes and effects analysis (FMEA) and reliability block
diagrams to quantify the relationships between a system's subsystems, com-
ponents, interfaces, and potential environmental effects. (As noted in the
previous section, other representational methods can also be employed to
create a unified picture of the system and decision space under consider-
ation.) These representations result in a reliability model.
For large, complex, changing systems, however, developing and quan-
tifying a reliability or performance model can be an extremely challenging
problem. Complex systems tend to have complex problems, which usually
exhibit one or more of the following characteristics (Booker and
McNamara, 2003): a poorly defined or understood system or process, such
as high cycle fatigue effects on a turbine engine; a process characterized by
multiple exogenous factors whose impacts are not fully understood, such as
the effects on a new system of changing combat missions; an engineered
system in the very early stages of design, such as a new concept design for a
fuel cell; a system, process, or problem that involves experts from different
disciplinary backgrounds, who work in different geographical locations,
and/or whose problem-solving tools vary widely (as is the case in the work
involved to ensure the reliability of a manned mission to Mars); and any
new groups of experts in novel configurations brought together for its solu-
tion.
Any time these sorts of complexities are involved, stakeholders may
have difficulties coming to a common understanding of the problem to be
addressed. As discussed previously, experts are always involved in the devel-
opment and justification of assumptions used for modeling and analyses.
In complex systems, one approach to dealing with the difficulties of formulating
the model is to formalize the involvement of the experts. Briefly, the
stages of expert involvement include the following:
1. Identifying the problem: What is the system under consideration?
What are the primary metrics that must be evaluated? Who are the relevant
stakeholders and what are their needs and expectations?
2. Operationalizing the problem: What are the operational definitions
of the metrics? How will the metrics be evaluated? What are the constraints
on the evaluation?
3. Developing the model: What are the core concepts that structure
this problem? How are they related to one another? What classes of qualita-
tive and quantitative models best fit the problem as currently structured?
4. Integrating and analyzing information: What data sources can be
used to characterize this problem? Who owns those data sources? Can the
postulated model answer the questions identified previously?
5. Statistical analysis: What are the appropriate techniques to combine
the information from the available data sources? What are the appropriate
graphical displays of information? What predictions or inferences must be
made to support decisions?
NEED FOR ADDITIONAL STATISTICAL CAPABILITIES
The procedures, both informal and formal, for combining informa-
tion comprise a broad set of techniques ranging from those that are ex-
tremely easy to apply (being robust to the precise circumstances of the
application and with solutions in a simple, closed form) to very sophisti-
cated models that are targeted to a specific application, require great imagi-
nation and technical expertise to identify and construct, and often require
additional technical expertise to implement, possibly involving software
development.[1] The more sophisticated category contains the rich collection
of hierarchical and random effects models that have enjoyed recent,
very rapid development and that have been successfully applied to a large
number of new situations.
[1] Flexible, public-use software currently exists for a rich set of models, greatly
simplifying software development. For example, R (http://cran.r-project.org/),
BUGS (Bayesian Inference Using Gibbs Sampling), and WinBUGS
(http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) are widely used
in a variety of fields of application and are available
at no cost on the Internet.
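At the simple, closed-form end of this range, a basic normal random effects model can be fitted in a few lines. The sketch below uses a standard method-of-moments estimate of the between-source variance (often attributed to DerSimonian and Laird); it is offered only as an example of the "easy to apply" category, not as a method endorsed in this report, and the estimates and variances are hypothetical.

```python
def random_effects(estimates, variances):
    """Combine estimates from several sources under a normal random effects model.

    Returns the overall mean, the moment estimate of the between-source
    variance, and a shrunken estimate for each source.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Heterogeneity statistic and its moment-based variance estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight using total (within plus between) variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    # Each source is pulled toward mu, more strongly when it is noisier.
    shrunk = [(v * mu + tau2 * yi) / (v + tau2)
              for v, yi in zip(variances, estimates)]
    return mu, tau2, shrunk


# Hypothetical hit-probability estimates from three test events,
# with their sampling variances.
y = [0.80, 0.90, 0.70]
v = [0.004, 0.002, 0.008]
mu, tau2, shrunk = random_effects(y, v)
print(f"overall mean {mu:.3f}, between-event variance {tau2:.5f}")
print("shrunken estimates:", [round(s, 3) for s in shrunk])
```

More realistic applications would typically call for a fully Bayesian hierarchical model fitted with software such as that named in the footnote.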
This report has described the potential applicability of various infor-
mation-combining techniques to operational test design and evaluation for
defense systems. Given the great complexity of ACAT I defense systems,
and the many important facets of their development and evaluation, it is
likely that many of these systems will require technically complicated meth-
ods to support their evaluation. It is also likely that use of these compli-
cated methods will provide tangible benefits in operational evaluation. With
respect to both hierarchical and random effects modeling, while there are
some standard models that have been repeatedly applied and that may be
useful for some defense applications, it is very likely that procedures at the
leading edge of research will often be needed for high-quality operational
test designs and evaluations.
Operational test evaluation is carried out under fairly substantial time
pressures, in circumstances where errors can have extremely serious
consequences. Experts with demonstrated proficiency in methods for
combining information are required to ensure that these methods are
applied quickly and correctly. Furthermore, the ultimate decision makers
must be able to fully understand findings based on these techniques. This
requires careful articulation of the methods and findings, including
sensitivity analyses that show how the conclusions change when the
assumptions do not hold. It also argues for the involvement of individuals
with a complete understanding of the methods and their strengths and
weaknesses.
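As one illustration of such a sensitivity analysis, the sketch below varies how much weight earlier test results receive when combined with operational test results for a pass/fail measure. It is a hypothetical example that assumes a beta-binomial model. The function names, the weighting scheme, and the uniform Beta(1, 1) baseline are assumptions made for the illustration, not procedures taken from this report.

```python
def posterior_mean(prior_successes, prior_failures, successes, failures, weight):
    """Posterior mean of a success probability under a Beta prior.

    Earlier test results enter as pseudo-counts scaled by `weight`
    (0 = ignore the earlier data, 1 = count it fully). A uniform Beta(1, 1)
    baseline keeps the prior proper even when weight is 0.
    """
    a = 1.0 + weight * prior_successes
    b = 1.0 + weight * prior_failures
    return (a + successes) / (a + b + successes + failures)


def sensitivity(prior_successes, prior_failures, successes, failures, weights):
    """Posterior mean for each candidate weight on the earlier data."""
    return {
        w: posterior_mean(prior_successes, prior_failures, successes, failures, w)
        for w in weights
    }
```

Calling `sensitivity` over a grid of weights shows how far the estimate moves as the earlier data are trusted more or less. A large swing signals that the conclusion depends heavily on the assumption that the earlier and operational tests are comparable.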
Though there are exceptions, these sophisticated techniques are ordi-
narily not fully understood and correctly applied by those with a master's
degree in statistics. A doctorate in statistics or a closely related discipline is
generally required. This raises the question as to how ATEC can gain access
to such expertise. The 1998 NRC report (National Research Council, 1998)
mentioned available expertise at the Naval Postgraduate School, the Insti-
tute for Defense Analyses, RAND, and other federally funded research and
development centers, as well as academia. This panel generally supports the
recommendations contained in that report. One complication with the use
of statisticians either on staff or, even more crucially, as consultants, is that
more than statistical expertise will be required. Statisticians working on the
methodology for system evaluation will need to work in close collaboration
with experts in defense acquisition, military operations, and the system
under test. Knowledge of physics and engineering would also be extremely
useful.
Given the need for collaboration and acquired expertise, having appro-
priately qualified on-site staff is clearly the best option. Another option that
should be considered, especially in the short term, is to offer sabbaticals
and other temporary arrangements to experts in industry and academia.
The panel suggests one approach that would institutionalize the use of
sabbatical arrangements, though it would require the cooperation of the
services: each service would create an Intergovernmental Personnel Act
(IPA) position for a statistical expert for test and evaluation, reporting to
the head of that service's operational test agency. DOT&E could also create
a similar position. These statistical experts would serve as resident
experts in their respective service test agencies and would also
be available to work jointly on the most challenging test and evaluation
issues in the DoD. These temporary positions would rotate every three
years and would have sufficient salary and prestige to attract leading statis-
ticians from academia and industry.
In addition, ATEC should consider making available all sources and
types of information for a candidate defense system to a selected group of
qualified statisticians in industry and academia as a case study to under-
stand the potential advantages of combining information for operational
evaluation.
Recommendation: ATEC should examine how to increase statistical
capabilities to support future use of techniques for combining infor-
mation. As a first step, ATEC should consider providing all sources of
information for a candidate defense system to a group of qualified
statisticians in industry and academia as a case study to understand the
potential advantages of combining information for operational
evaluation.