3 Testing and Analyses of the ASP and PVT/RIID Systems

The committee was asked to evaluate the adequacy of past testing and analyses of the advanced spectroscopic portal (ASP) systems performed by the Department of Homeland Security’s (DHS’s) Domestic Nuclear Detection Office (DNDO), and the scientific rigor and robustness of DNDO’s testing and analysis approach. The Joint Explanatory Statement from Congress states that the intent of the Secretary of Homeland Security’s consultation with the National Academies is to “bring robustness and scientific rigor to the procurement process.”

As noted at the beginning of this report, when the committee ended its information gathering for this interim report in mid-January, the testing and analyses were incomplete and DNDO had not provided written reports describing test results. No one on the study committee observed ASP tests before the committee was formed in May 2008. This chapter is based on the committee’s observations in visits to ports of entry and test sites, reports of testing done before 2008 and documented plans for 2008 tests, observations of performance tests conducted in 2008 at the Nevada Test Site, and a briefing (October 8, 2008) on preliminary results from performance tests done in 2008.

The Government Accountability Office (GAO), DHS’s Independent Review Team (IRT), and Congress already have reviewed and criticized pre-2008 testing of ASPs and PVT/RIIDs. The criticism resulted in the requirement for additional testing to support a decision about procurement of ASPs. Another factor that led to the requirement for DNDO to revisit testing in 2008 is that Customs and Border Protection (CBP) was dissatisfied with the ASP systems’ reliability and compatibility with other CBP systems. Systems qualification testing, and particularly systems integration testing, were more rigorous and demanding in 2008. These tests took much longer than expected, and only one vendor had successfully completed systems integration testing as of January 2009.

DNDO, CBP, and their contractors have conducted many tests over the last three years. A list of the major tests conducted on the ASPs and RPMs can be found in Table 3.1. DNDO has a complex set of criteria to evaluate. The characterization of a system is a process, and no one set of tests is expected to describe thoroughly all variables. Indeed, the scientific method describes a cycle of hypothesis and experimentation, which, when applied to instrument development, allows for an iterative process of identification and mitigation of weaknesses. How the tests could be better crafted to carry out this process is described in detail later in this chapter.

The process for testing radiation portal monitor systems, such as the ASP systems, begins at the component level and progresses to the subsystem and system level. Initial testing is conducted with components and subsystems in the laboratory, such as functional and environmental testing of individual detector elements, graduating to larger subsystems and full systems in systems qualification testing. The last of these is done at Pacific Northwest National Laboratory. Overall systems performance is measured with live radiation sources and a simulated port of entry at the Nevada Test Site (NTS, see Figure 3.1), and field validation testing is conducted outdoors at U.S. ports of entry with representative container cargo loadings.
EVALUATING TESTING, COSTS, & BENEFITS OF ASPs: INTERIM REPORT

Table 3.1 Tests and Key Questions

NYCT Tests
Description: ASP and PVT portals were installed in primary and secondary screening sites. The data collected were used for modeling and injection studies.
Objective: To collect data (spectra) on stream of commerce cargo containers to feed into injection studies.
Key Questions: What does radiation in the stream of commerce look like? What is the range and variation in radiation emitted by typical cargo?

Special (“Blind” or “Demo”) Testing
Description: Set of 12 “relatively blind” test configurations. Tests performed at NTS. Anticipated results were compared to results given to the operator. When available, underlying data (raw spectra) were evaluated by third party isotope identification algorithm. These results were compared to operator results. Statistical analysis was performed by NIST to determine how special test results compared to standard test results.
Objective: To assess vulnerabilities in the performance test plan. To evaluate the possibility that bias had been introduced into the test results by vendors or the test team. To provide additional data to the vendors for system development.
Key Questions: Has bias been introduced into the ASP test results by either vendors or the test team? Does the test plan contain enough of a diversity of sources and test configurations?

Phase 3 Tests
Description: Tests performed at NTS with various sources and attenuating materials in cargo containers moving at different speeds.
Objective: To aid in development of secondary screening operations and procedures.
Key Questions: How do known areas for improvement affect the performance of ASPs, and what can be done to address them?

Environmental Product Qualification Testing
Description: Tests took place at the vendor’s facility and at a National Recognized Test Laboratory and witnessed by government representatives.
Objective: Verify that the system can function within the environment, including weather and climate, in which the system will be operated and maintained.
Key Questions: Are all components of the ASP system durable enough to withstand the climate and environmental stresses at ports of entry (POEs) across the country?

Systems Qualification Tests
Description: A series of tests designed by the vendors and approved by DNDO to assure that the system requirements of the performance specification have been met. Tests took place at the vendor’s facility and PNNL’s 331G facility and were witnessed by government representatives.
Objective: Verify technical achievement of the system requirements as described in the Performance Specification for ASPs.
Key Questions: Have the basic system requirements been met? Is the system ready to enter performance testing? Is the ASP system suitable and deployable within the existing nuclear detection architecture?

Performance Tests at NTS
Description: Cargo containers loaded with varying configurations of shielding material, masking material, threat objects, and surrogate sources are run on a roadway flanked by the PVT and ASP detectors in sequence. Secondary RIID screening is carried out in the staging area.
Objective: Evaluate system performance and collect data to support operational test and evaluation.
- Compare ASP system performance with that of the PVT and RIID systems
- Characterize the effect of shielding and masking on ASP and RIID performance against threat objects and NORM
- Collect data to support verification of system requirements
- Collect data in support of operational testing and evaluation requirements
Key Questions: How do the ASP systems perform relative to the current generation of detection and identification systems? What are the thresholds for detection of threat materials? How do the systems perform with threat sources in the presence of masking and attenuating material?

Integration Tests
Description: Tests conducted by DNDO at PNNL’s 331G test facility. Test systems were placed in a simulated port of entry environment and evaluated for compatibility with CBP standard operating procedures (SOP) and other equipment, such as gate arms and traffic lights. Both hardware and software were evaluated.
Objective: Demonstrate that the ASP systems are ready to be integrated into the interdiction systems at U.S. POEs for field validation in primary and secondary configurations.
Key Questions: Do the ASP systems meet the necessary integration requirements associated with their deployment, and are they suitable for operator use?

Field Validation Test
Description: Test conducted at ports of entry. Conducted by CBP with ASP systems in place screening the stream of commerce trucks. PNNL will draft the final report.
Objective:
- Perform system installation procedures and process
- Train officers in the use of the system
- Familiarize officers with operations of ASP systems with PVT systems
- Conduct operations with ASP alone
Key Questions: Does the ASP system fit readily into the existing POE RPM sites? Are they suitable for operator use? Is the ASP system interoperable with users/stakeholders to execute the nuclear detection and reporting mission?

Operational Test
Description: ASP systems will be placed at a POE in both primary and secondary locations in conjunction with PVT monitors to screen stream of commerce cargo containers. The systems will be operated by CBP officers using standard operating procedures. A survey of CBP personnel will also occur.
Objective: Validate the operational effectiveness and suitability of ASP at ports of entry under realistic operating conditions.
Key Questions: How effective is the ASP system in terms of time to conduct screening, number of referrals to secondary screening, involvement of LSS, and reliability, availability, and maintenance of the system? Have CBP personnel identified any concerns or limitations of the system? Is the ASP system interoperable with users/stakeholders to execute the nuclear detection and reporting mission? Is the ASP system suitable and deployable within the existing nuclear detection architecture?
Figure 3.1 (a) Computer rendering of the PNNL 331-G site; and (b) ASP test track at the Nevada Test Site.

Because certain masking or shielding materials can interfere with the ability of the warning system to detect or identify objects containing special nuclear material (SNM), tests are also conducted at NTS with such masking or shielding materials and SNM. Fully integrated operational tests follow the field validation tests and also are conducted outdoors at selected U.S. ports of entry.

The committee has focused much of its attention on performance testing. This is not because the other tests are unimportant: Regardless of the performance, the portals will be of little use if they cannot operate in real conditions (rain, for example) or if they are incompatible with CBP’s computer systems. However, the design, execution, and evaluation of these tests are comparatively routine, even if solutions to problems revealed by the tests are not. The design, execution, and evaluation of performance tests for the portals are more challenging and involve more of the science and engineering principles on which the committee has advice to offer.

Some types of testing for ASPs are constrained in ways that testing of many Department of Defense procurement subjects (for example) is not. The main restrictions arise from the DOE security regulations for SNM and health and safety requirements. These requirements result in the need to separate the testing venues to meet the security needs and not impact health, safety, and commerce at operational ports. While it was hoped that later testing would address the criticisms of the earlier testing, DHS still has to operate under the limitations and constraints of security required for SNM and minimal impact to the flow of commerce. Furthermore, it is neither possible nor desirable to test every possible combination of cargoes and configurations.
Physical testing with radiation sources, especially special nuclear material, is expensive and time consuming, and procurement decisions must be made in a timely fashion. For all of these reasons, the tests need to be designed strategically to answer questions about performance across the vast space of possible cargo and threat objects, rather than testing that space comprehensively through gross effort.

As a general principle, the goals of testing and criteria for evaluation need to be clear and testable for a test and evaluation program to be effective. In some past testing, the goals and criteria were not clear, or they shifted with time. This is one factor that led to test designs the results of which did not adequately answer key questions about performance. Furthermore, to be useful, the goals and criteria need to be relevant. In this case, relevance means that the tests need to reflect conditions in real world cargo, real environments, and the actual operation of detectors in the field. DNDO did base some of its test design on data collected on the stream of commerce using a PVT system and an ASP system at NYCT. Much more information relevant to test design could have been elicited from data collected on alarms, correlated to shipping manifests at ports of entry around the country, even without ASP data.

One set of goals has been articulated following Congress’ language that requires that the ASPs demonstrate “a significant increase in operational effectiveness.” DHS was responsible for defining these terms and in July 2008 issued the definition, found in Sidebar 3.1. The criteria in the definition pertain to detection, identification, referrals from primary screening to secondary screening, and speed of screening.

SIDEBAR 3.1 DHS definition of Significant Increase in Operational Effectiveness of the ASP-C

Criteria for Significant Increase in Operational Effectiveness [SIOE] of the ASP-C when deployed for Primary Screening

If ASP-C satisfies all of the following four criteria for primary screening, then a SIOE has been demonstrated, independent of whether the criteria for deployment to secondary screening have been satisfied. These enhancements would increase CBP's capability to interdict SNM as well as reduce the volume of traffic requiring secondary screening.

1. When Special Nuclear Material [SNM] is present in cargo without NORM, the probability of a correct operational outcome for the ASP-C must be equal to or greater than [a] the PVT RPM.
2. When SNM is present in cargo with NORM, the ASP-C in primary must increase the probability of a correct operational outcome compared to the current end-to-end system as defined above.
3. When licensable medical or industrial isotopes are present in cargo, the probability of a correct operational outcome for the ASP-C must be equal to or greater than the PVT RPM.
4. When the only radioactive source present in the cargo is NORM, the ASP-C must refer at least 80% fewer conveyances for further inspection than the PVT RPM.

Criteria for Significant Increase in Operational Effectiveness of the ASP-C when deployed for Secondary Screening

If ASP-C satisfies both of the following criteria for secondary screening, then a SIOE has been demonstrated, independent of whether the criteria for deployment to primary have been satisfied. These enhancements would increase CBP's capability to interdict SNM while more consistently and expeditiously executing secondary screening operations.

1. When compared to the handheld Radioactive Isotope Identification Device (RIID), ASP-C must reduce, by at least a factor of two, the probability that SNM is misidentified as NORM, a medical/industrial radionuclide, unknown, or no source at all.
2. When compared to the handheld RIID, the ASP-C must reduce the average time required to correctly release conveyances from secondary screening.

[a] For HEU, ASP-C must show improved performance compared to PVT RPMs at operational thresholds.

SOURCE: Oxford et al. (2008)
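The criteria in Sidebar 3.1 are conjunctive: all four must hold for primary screening, and both must hold for secondary screening. A minimal Python sketch of that logic follows. The class, function, and field names are hypothetical and chosen only for illustration; the sketch compares point estimates, ignores statistical uncertainty in the measured probabilities, and omits the HEU condition in footnote a.

```python
from dataclasses import dataclass


@dataclass
class PrimaryResults:
    # Probabilities of a correct operational outcome (hypothetical field names).
    asp_snm_no_norm: float
    pvt_snm_no_norm: float
    asp_snm_with_norm: float
    current_system_snm_with_norm: float
    asp_med_ind: float
    pvt_med_ind: float
    # Conveyances referred for further inspection when only NORM is present.
    asp_norm_referrals: int
    pvt_norm_referrals: int


def primary_sioe(r: PrimaryResults) -> bool:
    """Return True only if all four primary-screening criteria hold."""
    c1 = r.asp_snm_no_norm >= r.pvt_snm_no_norm
    c2 = r.asp_snm_with_norm > r.current_system_snm_with_norm
    c3 = r.asp_med_ind >= r.pvt_med_ind
    # "At least 80% fewer" referrals means at most one fifth of the PVT count;
    # integer arithmetic avoids floating-point rounding at the boundary.
    c4 = 5 * r.asp_norm_referrals <= r.pvt_norm_referrals
    return c1 and c2 and c3 and c4


def secondary_sioe(asp_misid: float, riid_misid: float,
                   asp_release_time: float, riid_release_time: float) -> bool:
    """Return True only if both secondary-screening criteria hold."""
    c1 = asp_misid <= riid_misid / 2.0          # reduced by at least a factor of two
    c2 = asp_release_time < riid_release_time   # shorter average release time
    return c1 and c2
```

Because the criteria are stated as direct comparisons, a real evaluation would also need a rule for deciding when a measured difference is statistically meaningful; that rule is not part of the sidebar and is not modeled here.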
PAST TESTING

FINDING

Performance tests prior to 2008 had serious flaws that were identified by the Government Accountability Office and the Secretary’s ASP Independent Review Team. Tests prior to 2008 did not adequately establish the full capabilities of the ASP systems compared with the currently deployed PVT and RIID screening systems, nor whether the ASP systems met criteria for procurement.

This finding is based on several factors, which are discussed in some detail below. In briefings to the committee in 2008, DNDO staff agreed with several of the criticisms of its prior tests and stated that its 2008 tests were designed to address those deficiencies. The 2008 testing approach is described in the next section.

The GAO in 2007 stated that DNDO used biased test methods that enhanced the performance of the ASPs; DNDO’s NTS tests were not designed to test the limitations of the ASPs’ detection capabilities; and DNDO did not objectively test the performance of handheld detectors because they did not use a critical CBP standard operating procedure that is fundamental to this equipment’s performance in the field (GAO 2007b). Specifically, GAO wrote “DNDO conducted numerous preliminary runs of almost all of the materials, and combinations of materials, that were used in the formal tests and then allowed ASP contractors to collect test data and adjust their systems to identify these materials.”

With respect to bias, the IRT (2008) stated:

“However the IRT’s assessment is that the system’s configurations were locked and the test results were derived from automated systems that had not been modified to benefit from the reduced set of possible outcomes. Operators were given no advance guidance on the sequence in which threat objects were presented. In short, the IRT did not find any evidence to support the notion that the NTS test procedure resulted in the manipulation or biasing of test results, nor does the committee believe that the NTS data needs to be discarded on the basis of this issue.” [Page 91.]

The committee did not independently verify these facts (e.g., that the configurations in 2007 were locked). The committee’s understanding of the operational use of the ASP and PVT is that the systems provide alarm outputs based on programmed algorithms, not on operator decisions, so no intentional real-time biasing of results by test operators was possible during the tests. However, DNDO utilized the same sources, masking material, attenuating material, and configurations in performance testing that were used in the set up for testing (dry runs and dress rehearsals). If the vendors were allowed to calibrate their equipment and adjust their algorithms using the test threat objects, then the equipment could more easily recognize the spectra. The number of sources available was small, but this is not sufficient reason to use the same sources for both set up and testing. Device setup and any calibration must use separate sources from those used for testing.

In contrast with the ASP, the RIID requires much more operator interaction. DNDO performance tests prior to 2008 did not follow all of the relevant standard operating procedures for use of the RIIDs. According to the test plan (DNDO test plan) and briefings to the committee, this error was corrected in the 2008 performance tests. Regarding those procedures, the committee observed in visits to ports of entry that the operator actions with RIID and Laboratories and Scientific Services (LSS) are inconsistent, which could affect results and would even permit bias (either positive or negative) when comparing PVT/RIID and ASP in secondary screening, although the committee observed no operator bias. Based upon observations at operational ports and during the testing at NTS in 2008, even under the best circumstances (ideal technical performance by the RIID), the effective use of the RIID depends on the actions of the operator and decisions on the spot, which may not be consistent. The committee observed variations in procedure, from one inspection to another, even with the same operator. The committee therefore concludes that the RIID is susceptible to ineffective use.

The committee agrees that pre-2008 tests did not examine the limitations of the ASP’s detection capabilities. If all of the results from a particular test are either positive (able to detect) or negative (unable to detect), the examiner does not know how close the detector is to the transition between positive and negative. The transition can be quite steep, and can be affected by other factors that are not controlled in an operational environment. Furthermore, it is useful to identify cases in which the ability to detect is poor, both because it could help to provide guidance on how to improve the system and because there is good reason to believe that smugglers will choose smuggling strategies that result in poorer detection. A good physical test of the capabilities and performance of a detector system maps the output of the system (the result) as one parameter, such as the shielding, is increased stepwise and the detector transitions from being able to detect to not being able to detect the radiation of interest.
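The point about stepwise mapping can be illustrated with a small Python sketch. It takes detection counts at increasing shielding steps and reports the pair of adjacent steps between which the observed detection fraction falls through a chosen level. The function name, the data, and the thickness units are hypothetical; none of the numbers come from DNDO tests.

```python
def transition_bracket(steps, level=0.5):
    """Find where the observed detection fraction drops through `level`.

    `steps` is a list of (shielding_thickness, detections, trials) tuples
    sorted by increasing thickness. Returns the (lower, upper) thicknesses
    that bracket the transition, or None if the fraction never crosses.
    """
    fractions = [(t, d / n) for t, d, n in steps]
    for (t0, f0), (t1, f1) in zip(fractions, fractions[1:]):
        if f0 >= level > f1:
            return (t0, t1)
    return None


# Hypothetical data: every run detected, so the test says nothing about the limit.
all_detected = [(0, 10, 10), (1, 10, 10), (2, 10, 10)]

# Hypothetical data: shielding stepped far enough to pass through the transition.
stepped = [(0, 10, 10), (1, 10, 10), (2, 8, 10), (3, 3, 10), (4, 0, 10)]

bracket_all = transition_bracket(all_detected)
bracket_stepped = transition_bracket(stepped)
```

With the first data set no bracket is found, which is the situation the committee describes for tests whose results are all positive. With the second, the transition is localized between two adjacent shielding steps, and further runs could then be concentrated there.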
For example, according to the IRT review (IRT 2008), the average NORM used in the 2007 NTS tests was comparable to the average NORM in cargo observed at NYCT. But a small percentage of cargo observed at NYCT had much higher levels, which may be sufficient to mask at least some of the threat objects identified by DOE and DNDO.

SCIENTIFIC RIGOR AND ROBUSTNESS OF DNDO'S 2008 TESTING AND ANALYSIS APPROACH

FINDING

The 2008 performance tests were an improvement over previous tests. DNDO physically tested some of the limits of the systems. However, the following shortcomings remain. (1) Without modeling to complement the physical experiments, the selected test configurations are too limited; (2) the sample sizes are small and limit the confidence that can be placed in comparisons among the results; and (3) in DNDO's analysis, some of the performance metrics are not the correct ones for comparing operational performance of screening systems.

Many of the flaws in past testing were addressed in 2008 tests. For example, in 2008 performance tests, real CBP officers conducted the RIID screening of containers referred to secondary screening, and DNDO included LSS analysis in evaluating the outcomes of those screens. The threat objects (highly enriched uranium and plutonium sources) used in 2008 tests had not been used in any previous tests or calibrations, which addressed another criticism of the 2007 NTS tests. Also, more challenging masking material was used for some cases. Appendix D lists the combinations of threat objects, shielding material, and masking material, and their configurations used in the 2008 performance tests.
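Shortcoming (2) in the finding, that small sample sizes limit the confidence in comparisons, can be made concrete with a binomial confidence interval. The Python sketch below uses the Wilson score interval, which is a standard choice for small samples but is not stated in the source as the method DNDO or the committee used. The run counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical comparison: the same observed fractions at two sample sizes.
small_a = wilson_interval(8, 10)     # system A: 8 of 10 detected
small_b = wilson_interval(6, 10)     # system B: 6 of 10 detected
large_a = wilson_interval(80, 100)   # system A: 80 of 100 detected
large_b = wilson_interval(60, 100)   # system B: 60 of 100 detected

small_overlap = small_a[0] <= small_b[1]
large_overlap = large_a[0] <= large_b[1]
```

In this illustration the intervals overlap at 10 runs per system but not at 100, even though the observed fractions are the same. Checking for overlap is only a rough heuristic for comparing two proportions; a formal comparison would use a test designed for that purpose.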
However, even with these improvements, shortcomings remain. These include structural problems with the testing.

Without modeling to complement the physical experiments, the selected test configurations are too limited

DNDO was limited by time and resources in what could be evaluated. For example, the number and type of threat objects available to the testers through NTS and the Device Assembly Facility (DAF) was small, and only one was the same mass and shape as the objects described in the threat guidance. [32] DNDO and its supporting scientists adapted to the lack of a threat source that corresponds to the guidance threat by using computer simulations to model the sources and determine what mass of threat material in a standard shape would emit equivalent radiation. The number and type of sources tested cannot be considered “canonical,” i.e., they do not comprise a “complete set” from which any possible source in a cargo container can be constructed. Although a complete set is not practical or feasible, in the context of modeling described below it is likely that a useful subset that spans the space of possible threats can be identified.

Because the number of possible permutations of cargo material is very large, loading and unloading the shipping containers during the tests to cover all possible shielding and masking variants is impossible, and the fact that the test sources are only available at NTS precluded the assessment of background effects at multiple sites. In light of these limitations, the tests were designed to evaluate the response of the detectors to containers with different configurations: empty, a radiation source without additional shielding, a radiation source with shielding, and a radiation source with masking material.
The test design takes advantage of factorial design, which allows for multiple factors to be tested and evaluated at one time, and is considered a sound method of experimental design to obtain much information in a limited number of test runs (see Appendix C). [33] However, while the test design is reasonable as far as it goes, the tests performed are not adequate to fully characterize the instruments nor to predict their performance when monitoring the stream of commerce.

In part to address this problem, DNDO engaged scientists at Pacific Northwest National Laboratory, Sandia National Laboratories, the Johns Hopkins Applied Physics Laboratory, and Los Alamos National Laboratory to carry out “injection studies.” These are virtual tests in which the gamma spectra of additional test sources, which were experimentally recorded at the national labs under controlled circumstances, are added to (“injected into”) spectra of cargo in the stream of commerce collected by ASPs during the 2007 New York Container Terminal test. These combined spectra were then used to challenge the threat identification algorithms of the ASPs. For example, of the 22 radiological and industrial isotopes of concern to DNDO, 13 were acquired for testing, and nine were considered impractical or unnecessary to obtain for physical testing. The response of the detectors to these nine radioisotopes is assessed by “an inspection of the threat algorithm” alone. (Description of Medical and Industrial Radionuclides in version 4.10 of the ASP-C Performance Specification April, 2008)

[32] The committee was told that DNDO selected among the few SNM sources available from the DAF.

[33] Practical constraints on the performance testing prevented DNDO from conducting random trials. In other words, the same threat object and configuration was passed through the portals repetitively in a linear sequence. Such a testing approach is unlikely to detect some kinds of systematic errors, although the committee could not identify credible, significant systematic errors that would be missed. Randomness is important because the usual methods for assigning uncertainties to the results assume random trials and do not account for possible systematic effects. However, there are good reasons why these tests could not be random and the committee was unable to identify a significant consequence of the non-random tests.
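The two design ideas discussed above, a factorial layout and randomized run order, can be sketched briefly in Python. The factor names and levels are hypothetical placeholders and are not the configurations in Appendix D. The randomized schedule shows what random trials would look like in principle; as the footnote explains, practical constraints prevented DNDO from running the tests this way.

```python
import itertools
import random


def full_factorial(**factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in itertools.product(*factors.values())]


def randomized_order(runs, repeats, seed):
    """Repeat each configuration `repeats` times and shuffle the run order."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = [run for run in runs for _ in range(repeats)]
    rng.shuffle(schedule)
    return schedule


# Hypothetical factors and levels, for illustration only.
runs = full_factorial(
    source=["source_A", "source_B"],
    shielding=["none", "light", "heavy"],
    masking=["none", "NORM"],
)
schedule = randomized_order(runs, repeats=3, seed=0)
```

Here 2 x 3 x 2 levels give 12 configurations and 36 scheduled runs. The count grows multiplicatively with each added factor, which is why physical testing cannot cover the full space and why reloading containers between randomized runs is costly.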
This type of testing is appropriate, and calculations of this kind seem to have helped DNDO address the problems from 2007, when the performance tests did not chart the performance across detection thresholds. The preliminary 2008 test results that the committee has seen suggest that the tests found the transition ranges from undetectable to detectable. The committee concludes, however, that DNDO should go beyond the existing tests and model a set of test sources that represents the spectrum of possible sources and compare the results of the studies to the physical data acquired during testing to identify flaws in the modeling and algorithms.

For baseline information, DNDO needs to characterize the performance of the ASP and PVT detection systems for the cases of highly enriched uranium, plutonium, uranium-238, with and without NORM, and shielding, as well as NORM without threat material. In addition, DNDO needs characterization data for the background spectra for non-radioactive containers at both NTS and one or more of the representative ports. These data will provide basic detector characterization information, which will assist in the development and assessment of computerized system models.

The committee recognizes that the security and health and safety restrictions for using SNM in tests preclude doing realistic tests at operational ports of entry and that some calculational bridge is needed to explore a detection system’s capability. At the time of this interim report the committee had not received a full description of the “Injection Studies,” but the briefing the committee received indicates that they were done by adding experimental threat-object spectra to data collected on actual commerce traffic with NORM present and using the algorithms to see what the detection probability would be for the superposed spectra.
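The superposition step described in the briefing can be sketched in a few lines of Python. The `inject` function adds a recorded source spectrum to a cargo spectrum channel by channel. The `roi_alarm` function is a toy stand-in for an identification algorithm (a simple excess-counts test in a region of interest); it is an assumption made for illustration and bears no relation to the vendors' actual algorithms. All spectra and thresholds are hypothetical.

```python
import math


def inject(background, source, scale=1.0):
    """Superpose a source spectrum on a background spectrum, channel by channel."""
    if len(background) != len(source):
        raise ValueError("spectra must share the same channel binning")
    return [b + scale * s for b, s in zip(background, source)]


def roi_alarm(spectrum, reference, roi, k_sigma=3.0):
    """Toy decision rule: alarm if counts in the region of interest exceed the
    reference counts by more than k_sigma times the square root of the reference."""
    lo, hi = roi
    observed = sum(spectrum[lo:hi])
    expected = sum(reference[lo:hi])
    return observed - expected > k_sigma * math.sqrt(expected)


# Hypothetical 8-channel spectra (counts per channel).
cargo = [100] * 8
threat = [0, 0, 0, 40, 40, 0, 0, 0]

strong = roi_alarm(inject(cargo, threat), cargo, roi=(3, 5))
weak = roi_alarm(inject(cargo, threat, scale=0.25), cargo, roi=(3, 5))
clean = roi_alarm(cargo, cargo, roi=(3, 5))
```

Scaling the injected source, as in the `weak` case, is one way to probe how the decision changes as the source signal shrinks relative to the cargo background. A simple sum like this does not model attenuation or scattering in the cargo.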
The committee would like to see this approach extended to a more robust modeling approach that uses simulations of the radiation source, radiation transport through the material in the container and to the detector, and the response of the detector to generate the spectrum. These simulations need experimental validation and so should be compared to the performance data collected at NTS. If they do not agree within statistical uncertainties, then the reasons for disagreement should be examined and corrected. When broad agreement has been obtained, then examples of observed NORM and medical and industrial radiation sources can be integrated in a model with threat material to explore the capabilities of the ASPs and PVTs against a much larger, more multidimensional threat space. These new simulations are distinct from the isotope identification step. DNDO has required that the detector systems record data in a standard format, which represents the gamma spectrum. The isotope identification software algorithm analyzes the spectrum in that data file. Any isotope identification software should be able to analyze the spectrum from any detector and from any simulations. There are other important elements of the software, such as reading the occupancy sensor and operating the gate arms. Those pertain to integration with the physical system, but the isotope identification module is the essential piece for performance of the system and is separable from the rest of the system (see Figure 3.2).
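The validation step described above, checking whether simulations agree with NTS performance data within statistical uncertainties, can be illustrated with a simple consistency check in Python. The sketch compares a simulated detection probability with an observed detection count using a normal approximation to the binomial distribution. The function name, the two-standard-deviation tolerance, and the example numbers are assumptions for illustration; the approximation is crude for small numbers of runs or probabilities near 0 or 1.

```python
import math


def agrees(predicted_p, detections, trials, z_max=2.0):
    """Return True if the observed count is within z_max standard deviations
    of the count expected from the simulated detection probability."""
    expected = predicted_p * trials
    sd = math.sqrt(trials * predicted_p * (1 - predicted_p))
    if sd == 0:
        # A prediction of exactly 0 or 1 allows no spread at all.
        return detections == expected
    return abs(detections - expected) / sd <= z_max


# Hypothetical cases: simulated probability versus 20 physical runs.
consistent = agrees(0.5, 12, 20)      # expected 10, observed 12
inconsistent = agrees(0.9, 10, 20)    # expected 18, observed 10
```

A disagreement such as the second case would be a prompt to examine the source model, the transport calculation, or the detector response model, in line with the committee's recommendation.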
[Figure 3.2 diagram. Top: radiation detection portal with cargo container, source, gamma ray and scattered gamma ray, detector, pulse analyzer, spectrum in standard output format, and isotope ID algorithm software module. Bottom: new modeling that produces a spectrum in standard output format.]

Figure 3.2 Illustration of the physical system that generates a detected gamma ray spectrum (top) and the suggested new modeling to simulate the same process and generate a spectrum (bottom). Note: This drawing is not to scale and does not show all of the elements or components of the detector system.

To overcome the inherent limitations of physical testing, modeling of the ASP systems’ responses would be invaluable to the DNDO testing and analysis. With these models, many test geometries could be evaluated and the selected results compared to the actual physical tests to verify the modeling. Modeling can help to identify configurations for physical testing, and the physical tests can be used to validate the models. Accurate modeling could help identify the limitations inherent to the technology and the detectors and can assist in the development of new technology over time.

In the current round of testing, the effects of shielding and masking were assessed separately. While this allows for characterization of instrument response when faced with each scenario, it does not reflect a realistic scenario in which both masking and shielding material could be used to conceal radioactive material. The effects of the two types of concealment are not simply additive, and a combination of the two should be investigated.

The number of test configurations that can be tested physically is finite. Loading and unloading of containers with shielding and masking material is time-consuming, and time spent on testing is costly.
Here again is a case where a thorough modeling of the well-characterized spectral response of the ASP systems would be beneficial in assessing a wider range of scenarios for concealment of radioactive material. Data from the shielded-only, masked-only, and shielded + masked sources would enable DNDO to assess the validity of the simulations and their ability to accurately reflect detector performance capabilities. Using modeling calculations with the vendors’ algorithms, test scientists can determine configurations of shielding and masking that
would likely result in detection and identification in primary and identification in secondary with a probability of 50 percent. This would enable DNDO to identify the critical portion of the performance curve, that is, the transition from correct to incorrect results from the ASP system, and to confirm these calculations by measurements at NTS. The probability of each outcome can be tested at the NTS to confirm the accuracy of the models for select cases and either cause a re-evaluation of the models or build confidence. The subset of configurations for physical testing to validate models would be chosen to test the cases where the expected results, based on simulations, are most sensitive (transition regions). In other words, the simulations would be used to predict the configurations that are in the detectors’ performance transition (from high-confidence detection to low- or no-confidence detection), and the physical tests would be run to test that hypothesis. Each set of physical tests would be used to validate the performance of the models in different regions of the test space. Tests that DNDO has already done (including the pre-2008 tests, which used a wider range of source materials) could be used in this effort, despite their shortcomings as performance tests.
Performance testing takes place only at NTS, and DHS’s operational testing of the ASPs is planned to take place at only one location: the Port of Long Beach. The committee believes that it is important to evaluate the effects of a variation in background intensity and spectra because significant variations are expected among the ports of entry across the United States. Computer modeling would be able to assist in the identification of limits of the algorithms’ ability to differentiate threat materials from the background radiation.
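The idea of using simulations to pick physical test configurations in the performance transition can be sketched as follows. This is an illustration under stated assumptions, not the committee's or DNDO's procedure: the function name, the 0.2 to 0.8 window, the configuration labels, and the predicted probabilities are all hypothetical.

```python
def select_transition_configs(predicted, low=0.2, high=0.8):
    """Return configuration labels whose simulated detection probability lies
    in the transition window [low, high], ordered by closeness to 0.5.

    `predicted` maps a configuration label to a simulated detection
    probability. The window bounds are illustrative assumptions.
    """
    in_window = [(label, p) for label, p in predicted.items() if low <= p <= high]
    in_window.sort(key=lambda item: abs(item[1] - 0.5))
    return [label for label, _ in in_window]


# Hypothetical simulation output for four shielding/masking configurations.
simulated = {
    "config_A": 0.98,  # well inside the high-confidence detection region
    "config_B": 0.74,
    "config_C": 0.45,
    "config_D": 0.12,  # well inside the low-confidence region
}
print(select_transition_configs(simulated))  # ['config_C', 'config_B']
```

Configurations selected this way are the ones where a physical test at NTS is most informative about whether the model is right, because a small modeling error there changes the predicted outcome the most.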
There are many factors that can affect a radiation detector’s capability, but it is not possible to test all of the possible variations to threat material configurations, background, shielding, and masking within the stream-of-commerce at all ports of entry. The current round of physical testing does not reflect realistic scenarios well, although it does provide important information about the response of the detectors to specific, controlled cases. A thorough consideration of the methods of concealment of nuclear and radiological material that could reasonably be expected from an adversary would better characterize the performance of ASPs for the cargo-screening mission. The models could better cover the full test space of scenarios that need to be evaluated, a goal that cannot be attained practically by physical testing alone.
The sample sizes were small and limit the confidence that can be placed in comparisons among the results
The time and resource constraints mentioned above limited the number of runs for each configuration (the sample size) severely: as few as 6 and as many as 12. With such small sample sizes, the uncertainties associated with the results are relatively large. This is mostly a concern in the performance transition range for the detectors (where the detection probability is neither 1 nor 0). The number of runs (sample size) for each configuration needs to be large enough that the uncertainties (error bars) are small enough for reasonable comparisons to be made to each other and to results of simulations. The size of the sample needed can depend on the results of the tests.
In its analysis, some of the performance metrics are not the correct ones for comparing operational performance of screening systems.
Test system performance usually is characterized in terms of detection probabilities, measuring the probability that the test system alarms (the test result is positive), given that the
screened cargo truly contains threat material, or that it does not alarm (the test result is negative), given that the screened cargo does not contain threat material. Because measurement of the detection probabilities relies on true knowledge of the cargo contents, one can estimate those probabilities only from a designed experiment. In real life, however, with real trucks, one observes only the result (alarm status) of the screening system. Either the system alarms or it does not, but one does not know the true state of the cargo. The result of an accurate system ("alarm" or "no alarm") would be a reliable indicator of the cargo contents (SNM or no SNM), but an inaccurate system would be an unreliable indicator. One is concerned especially with this question: Given that the test system did not alarm, what is the probability that the cargo contained SNM? That is, what risk does CBP take by allowing a "no-alarm" cargo to pass? This "false-negative rate" (FNR) has serious consequences. But translating from the measured probabilities to the false-negative rate and the false-positive rate requires some mathematical manipulation and introduction of an additional parameter: the prevalence of threat material in cargo. Given that this parameter is neither known nor measurable, comparisons between the performance of two screening systems are best made using ratios between the rates for the systems being compared. Such a metric will more accurately reflect the relative performance of the screening systems. This issue is described in detail in Appendix B.
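The translation from measured detection probabilities to the false-negative rate, as defined above (the probability that cargo contains SNM given no alarm), can be sketched with Bayes' rule. This is an illustration only and may differ from the treatment in Appendix B: the function names, the two systems' detection and false-alarm probabilities, and the prevalence values are hypothetical.

```python
def prob_threat_given_no_alarm(p_detect, p_false_alarm, prevalence):
    """P(cargo contains SNM | no alarm), computed with Bayes' rule.

    p_detect      : P(alarm | SNM present), from a designed experiment
    p_false_alarm : P(alarm | no SNM present), from a designed experiment
    prevalence    : P(SNM present), unknown in practice (assumed here)
    """
    missed = (1.0 - p_detect) * prevalence               # SNM present, no alarm
    cleared = (1.0 - p_false_alarm) * (1.0 - prevalence)  # no SNM, no alarm
    return missed / (missed + cleared)


def fnr_ratio(system_a, system_b, prevalence):
    """Ratio of the two systems' false-negative rates at a given prevalence.

    Each system is a (p_detect, p_false_alarm) pair.
    """
    return (prob_threat_given_no_alarm(*system_a, prevalence)
            / prob_threat_given_no_alarm(*system_b, prevalence))


# Hypothetical systems: A detects 95% of threats, B detects 80%; both alarm
# on 2% of benign cargo.
system_a = (0.95, 0.02)
system_b = (0.80, 0.02)
for prevalence in (1e-6, 1e-9):
    print(prevalence, fnr_ratio(system_a, system_b, prevalence))
```

Each false-negative rate scales with the unknown prevalence, but for very small prevalence the ratio stays essentially fixed (here close to 0.25), which is why a ratio is a usable comparison metric even though prevalence is neither known nor measurable.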
Performance Testing Results and Evaluation
FINDING: Because they have large detectors and because of their configuration, ASPs would be expected to improve isotope identification, and provide greater consistency in screening each container, greater coverage of each container, and increased speed of screening over that of the PVT/RIID combination when used in secondary screening. Consequently, tests of ASPs in secondary screening are focused on confirming and quantifying that advantage for a variety of threat objects, cargos, and configurations.
The greater consistency, better coverage, and increased speed of secondary screening are the results of the configuration of the ASP systems. The ASPs have larger sodium iodide crystals than the RIIDs. That size results in higher gamma count rates than in a handheld RIID examining the same source, which compensates for the greater standoff distance and the shorter exposure time for the ASP. The ASPs have better coverage of the containers. The consistency of ASP screening depends on the speed of the truck through the portal. As noted elsewhere in this report, different CBP officers using the handheld RIID place it differently. Preliminary results from 2008 tests confirmed that this is true for the tested cases, but the physical tests could not demonstrate that ASPs are superior to the screening system currently in place over the whole operational envelope.
As noted above, when used for primary screening, an ASP system should be compared to the existing combined primary and secondary screening system (both PVT and RIID) because of differences in standard operating procedures for primary screening. DNDO’s preliminary analysis appears to have accounted for this difference. It is not clear to the committee how DNDO will interpret the performance test results in the context of the criteria for a “significant increase in operational effectiveness.”
Each tested configuration is distinct, and averaging across configurations is not meaningful without applying normalization or weighting factors. DNDO could use the NYCT data as weighting factors,
although there are two challenges associated with this approach: (1) the relevant features are multidimensional (gamma flux, radionuclides in cargo, density of attenuating material, composition of attenuating material) and (2) NYCT data reflect cargo passing through one large port at the time of the data collection, and cargo is different in different ports of entry and changes with time. Even if these challenges are addressed, weighting factors may be valid only for evaluating likely referral rates, not performance against threat objects in containers in commerce. The configurations could be weighted according to their frequency in the actual stream of commerce (if that could be determined). However, there is no reason to think that malefactors will choose the configuration of a cargo container for smuggling a nuclear weapon randomly from configurations in the stream of commerce.
Finally, as noted above, there are large uncertainties in the results of these tests. The numbers of conveyances for each source were small, and the uncertainty associated with a small sample is large. The costs of conducting larger sample tests with the same number of configurations may have been prohibitive, which simply highlights the need to select the physical test configurations carefully to maximize the information gained from those tests.
Operational Testing
The current plans call for operational testing of the ASP systems that is of short duration and limited breadth. ASP systems will be installed at only one site for three weeks. This limited testing and subsequent analysis does not allow DNDO to take full advantage of the opportunity to collect information about real-world stream-of-commerce effects on detector performance.
While Pier A at the Port of Long Beach, the location for the test, does have a high volume of cargo traffic, it is a location where the weather generally does not vary a great deal, and the type of container coming through the terminal is predictable and not representative of all ports of entry (POEs). By limiting operational testing to the environment and the cargo mix at a single site, the curtailed field test is missing a prime opportunity to assess detector performance in the real world.
Operational testing is designed to determine if the system is effective and fully useful in field, operational settings and when operated by regular users, not just in a laboratory or test setting. Operational test and evaluation means the field test, under realistic operational conditions, of any equipment item or system intended for use by typical DHS users in defending the U.S. homeland; and the evaluation of the results of such tests. Realistic operational testing is intended to be independent from the contractor or developer of the system being tested, with the evaluation of the results also reported independently. Realistic operational testing is intended to use production representative systems, operated by typical users who may not have the same training or expertise as the scientists and engineers who developed the system in the first place. To the extent possible, the system or equipment under test is to be operated under realistic stress and operational tempo, in an end-to-end manner, using the same procedures as would be expected in everyday use, in an operationally realistic environment, with the other interfacing systems with which the proposed system is to be interoperable on line. In the case of an RPM, the “threat” is to be as realistic as possible, including both the types of radioactive materials defined in the threat, and the naturally occurring radioactive materials that are found in routine commerce.
If the system under test might be vulnerable to interferences, such as radio communications or other electromagnetic interference, those sources should be present in the test also. Finally, because it may not be
practicable to conduct a statistically significant number of operational tests, the test challenges to the system are to be at the edges of the operating envelope and not only at the center of the operating envelope. Contractor involvement in these operational tests is to be strictly avoided to eliminate a possible source of bias, to avoid the effects of having a highly trained “golden crew” operating the system, and to gauge the effectiveness of the system when operated by expected users.
At the time that this Interim Report was written, the operational tests planned by DNDO had not been conducted, and the committee does not know whether the general guidelines for operational testing described above will be followed.
Changes to the DNDO Approach to Testing
RECOMMENDATION: For a more rigorous approach, DNDO should use theory and models of threat objects, radiation transport, and detector response to simulate performance and predict outcomes, and should use physical experiments to validate or critique the models’ fidelity to reality and enable developers to refine the models iteratively. With validated models, DNDO can evaluate the performance of the ASP systems over a larger, more meaningful range of cases than is feasible with physical tests alone.
To make the testing and evaluation more scientifically rigorous, the committee recommends an iterative approach with modeling and physical testing complementing each other. As is noted earlier in the report, the threat space (that is, the set of possible threat objects, configurations, surrounding cargoes, and conditions of transport) is so large and multidimensional that DNDO needs an analytical basis for understanding the capabilities of detectors for screening cargo.
DNDO’s current approach is to physically test small portions of the threat space and to use other experimental data to interpolate within the threat space to test the identification algorithms in the detector systems. Computer models are essential to the testing process: It is not feasible to examine all of the relevant permutations of cargo and threat materials with physical tests alone. Computer modeling can examine detector-system and algorithm behavior for a large number and breadth of cases with a relatively modest commitment of funds and time. However, the models need to be validated against results of physical tests that are carefully designed and selected to represent cases covering the test space (the full domain of configurations and compositions of cargo, masking material, shielding material, and threat objects). The injection studies that DHS and DOE have sponsored enable scientists to test the isotope identification algorithms, but the role of injection studies in the overall test plan is still very limited and does not establish an analytical basis for understanding the detector systems’ capabilities, so a fuller and more fully integrated approach to modeling and physical testing is needed.[34]
[34] GAO describes a PNNL report that discusses the limitations of injection studies. According to a Pacific Northwest National Laboratory report submitted to DNDO in December 2006, injection studies are particularly useful for measuring the relative performance of algorithms, but their results should not be construed as a measure of (system) vulnerability. To assess the limits of portal monitors’ capabilities, the Pacific Northwest National Laboratory report states that actual testing should be conducted using threat objects immersed in containers with various masking agents, shielding, and cargo. (GAO 2007b)
DHS and DOE are both deploying detectors that screen vehicles and cargo for nuclear and radiological material, and both have an interest in better understanding the capabilities of deployed and proposed detection systems. The committee recommends that DHS and DOE integrate the modeling and testing in a scientific, iterative approach: theory and models would be used to predict outcomes of tests; the test outcomes would then be used to validate or critique the models; and the models would be used to explore a variety of possible threats, the full range of which is very large and cannot be individually tested. This kind of interaction between computer models and physical tests is essential for building scientific confidence. DOE and its national laboratories have extensive experience with both detector development and iterative simulation and experimental validation of models, most prominently in the stockpile stewardship program.
The performance tests conducted to date provide some validation points for modeling as well as some assessment of detection capability for parameters such as the effects of source, shielding, masking, speed, and background radiation level on ASP system performance. These existing results are a sensible starting point for validation, but large uncertainties remain in these parameters due to limited experimental conditions and small sample sizes. For all of the reasons cited above about 2008 performance tests, DHS cannot conclude definitively whether ASPs will consistently outperform the current PVT-RIID systems in routine practice until the shortcomings are addressed.
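The effect of the small sample sizes cited in this chapter (as few as 6 and as many as 12 runs per configuration) can be made concrete with a confidence interval on a detection probability. The sketch below uses the Wilson score interval, which is one common choice for small binomial samples and is an assumption of this example, not a method attributed to DNDO; the function name and the assumed outcomes (half of the runs detecting) are hypothetical.

```python
import math


def wilson_interval(detections, runs, z=1.96):
    """Wilson score interval for a binomial proportion.

    detections : number of runs in which the system alarmed
    runs       : total number of runs for the configuration
    z          : normal quantile (1.96 corresponds to about 95% confidence)
    """
    if runs <= 0 or not 0 <= detections <= runs:
        raise ValueError("need runs > 0 and 0 <= detections <= runs")
    p_hat = detections / runs
    denom = 1.0 + z * z / runs
    center = (p_hat + z * z / (2.0 * runs)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / runs + z * z / (4.0 * runs * runs)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical outcomes in the transition region: half of the runs detect.
for detections, runs in ((3, 6), (6, 12)):
    low, high = wilson_interval(detections, runs)
    print(f"{detections}/{runs}: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval spans roughly 0.19 to 0.81 for 6 runs and roughly 0.25 to 0.75 for 12 runs, so two configurations or two systems would need very different observed detection fractions before the difference could be distinguished from sampling noise.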
Better measurement and characterization are a necessary first step but may not be sufficient to enable DHS to conclude that the ASPs meet the criteria DHS has defined for achieving a “significant increase in operational effectiveness.” The committee recommends modifications to the current DHS approach to the evaluation procedure. These modifications would influence subsequent procurement steps.
Recommended Approach to the ASP Procurement Process
RECOMMENDATION: DHS should develop a process for incremental deployment and continuous improvement, with experience leading to refinements in both technologies and operations over time, rather than a single product purchase to replace current screening technology.
In attempting to meet a procurement schedule, DNDO has approached the development of the ASP systems as a point goal rather than the beginning of a longer-term process of technological improvement. The DNDO approach limits the possibility of iterative improvements to the technology and could result in unnecessary constraints on the ability to deploy future nuclear detection systems that would have improved performance characteristics.
The committee agrees that injection studies and modeling cannot be seen as valid without physical tests with threat objects. Physical tests are needed for validation, as noted above, but they also can reveal engineering or manufacturing flaws. Modeling tells how a system should perform, assuming that the equipment as built matches the modeled detector, but confirmatory tests are needed with different units of the same equipment and under different conditions. The committee’s recommendation above states that well-validated models can and should be used in conjunction with well-selected physical tests when it is impractical to do sufficiently comprehensive testing by physical tests alone.
The passive radiation screening of cargo at ports of entry is expected to operate for a long time. Although this capability may be enhanced with scanning or interrogation equipment,[35] Congress has directed CBP to deploy passive detectors as part of the screening procedures for cargo entering the United States. CBP has put RPMs in place at hundreds of ports of entry. The threat environment, the composition of container cargo, technological and analytical capabilities, and the nature of commerce at the ports of entry have changed significantly over the last decade and are expected to evolve in both predictable and unpredictable ways in the coming years. Containerization changed the nature of shipping in recent decades. Patterns of flow in commerce continue to evolve as international trade changes, the world economy adjusts, and production shifts among different countries. Patterns of transport also shift in response to costs and incentives; for example, rail transport may increase relative to truck transport as pressures to reduce carbon emissions and other environmental impacts increase.
Rather than focusing on the single decision about the deployment of ASPs, the current testing should be viewed as a first step in a continuous process of improvement and adaptation of the systems. DHS should develop a process for continuous improvement able to address and exploit these changes, rather than a single product to replace current screening technology. This would enable the system to be updated continuously so that it is not outdated or obsolete by the time all of the systems are deployed.
RECOMMENDATION: DHS should deploy its currently unused low-rate initial production ASPs for primary and secondary inspection at various sites. This would allow extended operational testing with a small investment.
Such deployment, even on this limited scale, would provide additional data concerning their operation, reliability, and performance, and allow DHS to better assess their capabilities in multiple environments without investing in a much larger acquisition at the outset. The committee has heard DNDO staff say that under current law such deployments are not permitted prior to certification. The committee did not examine this question and cannot offer a legal opinion, but the committee considers a phased deployment to be a sensible approach. The committee recommends that DNDO reexamine the perceived restrictions and, if DNDO concludes that such deployments are not permitted, ask for permission to go ahead with them.
RECOMMENDATION: DHS should match the best hardware to the best software (particularly the algorithms), drawing on tools developed for the competition and elsewhere, such as the national laboratories. This should be applied to ASPs and also to improved RIIDs.
The development of the hardware for radiation detection and the development of the software for analyzing the signals from the detectors are separable. It has been useful to have a competitive approach for the systems and to see the results. However, as DHS moves forward, it should match the best hardware to the best software (particularly the algorithms). In doing so, DHS should draw on tools developed for the competition and elsewhere, such as the national laboratories.
[35] Scanning is a process that actively irradiates the subject with x-rays or gamma rays to generate images of the interior of the container. Interrogation systems, if deployed, would use pulsed neutrons or gamma rays to irradiate a container and would alarm on particular radiations from the irradiated cargo.
The NaI detectors used in the ASP are a mature technology, but continued improvements in the detection and analysis algorithms can occur with research supported by DOE, DHS, and others. The vendors’ algorithms are somewhat limited compared to algorithms developed at government expense. With data from the hardware in a standard format, it would be straightforward to later incorporate new and improved detection and analysis algorithms.
Further, improved algorithms, or even current ASP algorithms, could be used to substantially improve the performance of handheld RIIDs. ASPs will not eliminate the need for handheld detectors with spectroscopic capabilities. The greatest deficiency of the RIIDs currently in use is their software. Because some of the improvement in isotope identification offered by the ASPs over the RIIDs results from software improvements, the best software package should also be incorporated into improved handheld detectors. Newer RIIDs with better software might significantly improve their performance and expand the range and flexibility of deployment options available to CBP for cargo screening. If integration of improved software in handheld devices is deemed impractical because of the computational limitations of a low-power, handheld device, the computational capabilities of a handheld device could be replaced or enhanced with a nearby desktop computer system that receives data from the handheld detector by wireless transmission.
In 2006, DNDO rolled out a program to improve RIID software, called the Human Portable Radiation Detection System (HPRDS). However, the committee saw no evidence that this effort was linked to the ASP program or that potential improvements in the RIID were being considered in cost-benefit analyses (CBA). Linkage makes sense for the technology development, as noted above, and also for the CBA.
If the HPRDS yields improved RIIDs in the next few years, then the ASP performance tests will have compared the ASPs to outdated technology, which can lead to poor choices in cost-benefit tradeoffs. By separating the software and hardware elements and engaging the broader science and engineering community,[36] DHS would have increased confidence in its procurement of the best product available with current technology, and simultaneously could advance the state of the art.
Correlation of Models and Simulations with Physical Test Results
In addition to operational testing to demonstrate the performance of the system under realistic conditions, one must develop faithful models and simulations to examine scenarios that may not have been attempted in the field. The process of validating these models and simulations will include predictions of systems performance under conditions that are well-defined and can be tested in the field. Only if the models and simulations actually predict observed performance under conditions that are amenable to testing (within statistical uncertainties) will DHS have confidence that the models and simulations might be dependable for describing other configurations. Even then, there may be some configurations which the models and simulations do not predict adequately. This would not be surprising. To minimize the number of potential non-conforming configurations in this set, physical testing needs to explore informative, challenging cases.
[36] Even short of the innovation that might arise from broader scientific perspectives, better documentation and peer review of the algorithms would make it easier to compare the algorithms and to evaluate this critical part of the system.
The RPMs must be tested as a complete operational system (not just as components), and under conditions that reproduce a fully integrated installation under a range of conditions to demonstrate correlation between test results and models and simulations. Similarly, test objects must be selected to adequately represent the threat that the system is meant to address. If the threat is nuclear terrorism, then the test objects and configurations would include nuclear materials in the quantities, shapes, and intensities, along with shielding or masking materials designed to foil the RPM, such as might be expected from an inventive terrorist. In addition to the improved understanding such testing affords, it can offer operational solutions to problems arising from the limitations of the detectors. If the threshold that would mask threat objects were known, then all cargo containers that are above that threshold could be referred to secondary screening and more thorough analysis. (As noted earlier in this chapter, DNDO revised its performance testing for 2008 to address this problem, and preliminary results suggest that the tests found the transition ranges.)
The committee believes that by approaching the test, evaluation, and future technology development as an iterative process, the limited deployment of the existing ASP systems could be a vital tool in improving the technology prior to blanket deployment at U.S. ports of entry. Distribution of the existing ASP systems to ports and border crossings in a variety of locations and environments (Port of LA/LB, NYCT, and Detroit, for example) would provide information about the variables in the real-world system that could be fed back into models and could be used to develop future generations of the hardware, software, and analytical algorithms. At the very least, operational testing should be expanded to take advantage of some of these opportunities.
Other considerations
RECOMMENDATION: Scenarios identified by red-teaming efforts should be used in developing new models and physical tests of detection systems to learn ways of improving the technologies and their deployment.
DNDO already has a red-teaming capability that is applied to operations, and the test programs are already intended to identify systematically the detection capabilities of the ASP systems. Red teams suggested here as part of an ongoing testing and development program could help DNDO (a) identify strategies that smugglers without detailed knowledge of the systems are more likely to try and what the adversaries’ adaptation might look like; (b) identify new vulnerabilities that the new technologies and CONOPs introduce; and (c) identify what technological changes affect the effectiveness of the systems and their applications. Similarly, this approach is valuable in test design, ensuring that a realistic range of cases is examined and validating the testing protocols. The Special Tests (see Table 3.1) may have served some of this function, although they were designed for a slightly different purpose and appear not to have been as systematic as what one would expect from a red-teaming effort.
As noted earlier in this report, DNDO, CBP, and DOE have similar and overlapping missions and needs for screening vehicles and cargo. They use and are considering procuring much of the same equipment. DNDO has consulted and cooperated with DOE on some aspects of the ASP development, but these efforts should be expanded. A wealth of experience dealing with algorithm development and archives of data relating to radioactive material and spectral analysis exists within the DOE national laboratories. A call to the labs and other agencies for a
survey of past research and information, assistance, and collaboration could help DNDO tap into the expertise within those institutions.