3
Lessons from Other Large-Scale Systems

The many large-scale biometric systems in use today are deployed in a broad range of technical and social contexts. The successes and failures of these systems illustrate what can be gained from careful consideration of the larger system context, and not merely of technological or component-level aspects, during planning. Common characteristics of successful deployments include good project management and definition of goals, alignment of biometric capabilities with the underlying need and operational environment, and a thorough threat and risk analysis of the system under consideration. Common contributors to failures include the following:

  • Inappropriate technology choices,

  • Lack of sensitivity to user perceptions and requirements,

  • Presumption of a problem that does not exist,

  • Inadequate surrounding support processes and infrastructure,

  • Inappropriate application of biometrics where other technologies would better solve the problem,

  • Lack of a viable business case, and

  • Poor understanding of population issues, such as variability among those to be authenticated or identified.

Many of these factors apply in any technology deployment, biometrics-related or not.

While much can be learned from studying biometric systems themselves, it seems appropriate, given their scale and scope, to consider whether the biometrics community can learn lessons from large-scale systems that have been deployed in other domains. This chapter explores some of the technical/engineering and societal lessons learned from large-scale systems in manufacturing and medical screening and diagnosis. In each case, the discussion points out useful analogies to biometric systems and applications.

MANUFACTURING SYSTEMS

Manufacturing systems convert initial materials into finished products that must meet quality specifications. Each step in the conversion may consist of a complex process sensitive to multiple characteristics of the input materials and processing conditions. Each step also represents an economic investment; modifications to the process that can achieve equal or higher quality at lower cost are every company's goal.

Production-line systems have been studied systematically since before World War II from the perspectives of industrial engineering, statistics, experimental design, operations research, and quality control. Insights gained from the study of such systems have been generalized to better understand and improve the performance of systems for product development and other industrial processes and to facilitate improvements in corporate management.

A simple example, used in a 2005 briefing to the study committee by Lynne Hare of Kraft, Incorporated, is the development of a new sensor for a manufacturing production line. The process begins with identifying the business need for the sensor and proceeds through its implementation and then deployment in the production line. The stages include explicit translation of the business need into the scientific requirements for the sensor, fabrication of a prototype sensor, preliminary (static) testing, formal static and dynamic testing, pilot installation and testing, and production line implementation and validation.
The process never ends, because revalidation is scheduled at periodic intervals. At each stage of testing and data collection, the information obtained may send the development process back to an earlier stage to correct any observed deficiencies and improve robustness of the sensor to varying conditions.

This example can be interpreted directly or as analogy. Directly, it gives a model for developing and implementing devices required by any biometric system to sense biometric traits, for example, fingerprint scanners, iris scanners, and audio recorders. There is also an analogy between development and validation of a sensor and the development and implementation of a biometric system. In this analogy, the multiple levels of testing—preliminary static, formal static and dynamic, and production line testing—are counterparts to technology, scenario, and operational evaluations of a biometric modality. Motivations for these three levels of testing in the sensor development environment can be informative for the development and testing of an entire biometric system.

Additionally, a biometric system may be considered as a production line, the inputs as individuals presenting for recognition, and the output as a series of decisions that will achieve a high quality, reflected in low values of the false match rate and false nonmatch rate, and in a ratio appropriate for the system's intended purpose(s). When a biometric system is looked at in this way, it can be seen that the methods of industrial engineering and statistical quality control can be applied to achieve system quality.

At least three fundamental insights into managing industrial processes are also relevant to biometrics. The discussion below paraphrases selected core concepts from the work of Deming, Shewhart, Box, and their many successors.1

One insight is that careful articulation of requirements, preferably in measurable terms and derived from an end product or process, is exceptionally important to the successful development and implementation of component parts. In the case of a production line, for example, a requirement might be for the sensor to respond reliably and repeatedly only to stimuli in the desired range and to measure stimuli accurately, under conditions in the production environment. The range of stimuli, sensor sensitivity and resolution, and resistance to environmental disturbances must be accurately specified during the design process in order for the sensor to properly identify defective units, which is its ultimate purpose. By analogy, biometric system design should be driven by clear objectives for the recognition task in the context of the broader application rather than merely by the existence of an attractive technology.
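Treating recognition decisions as production output invites standard statistical process control. The sketch below applies the conventional three-sigma limits of a proportion (p) chart to a week of false-nonmatch counts; the daily counts, sample sizes, and function name are all invented for illustration.

```python
import math

def p_chart_limits(error_counts, attempts):
    """Three-sigma control limits for a daily error proportion (p-chart)."""
    pbar = sum(error_counts) / sum(attempts)   # center line: pooled error rate
    n = sum(attempts) / len(attempts)          # average daily sample size
    sigma = math.sqrt(pbar * (1 - pbar) / n)
    return pbar, max(0.0, pbar - 3 * sigma), pbar + 3 * sigma

# Hypothetical daily false-nonmatch counts out of 1,000 verification attempts.
fnm = [12, 9, 15, 11, 40, 10, 13]
attempts = [1000] * 7
center, lcl, ucl = p_chart_limits(fnm, attempts)

# Days whose error proportion falls outside the control limits.
out_of_control = [i for i, (e, n) in enumerate(zip(fnm, attempts))
                  if not (lcl <= e / n <= ucl)]
print(round(center, 4), round(lcl, 4), round(ucl, 4), out_of_control)
```

In this invented series only the fifth day (index 4, 40 errors) falls outside the limits; it would be flagged for investigation, while the ordinary day-to-day fluctuation would not.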
A second insight is that a scientific approach is invaluable to understanding systems, particularly the interrelatedness of system components. The hallmark of the scientific approach is exploration through both theory and data. The performance of complex systems can often be improved by identifying and correcting bottlenecks or other localized problems whose negative effects may not have been fully perceived and articulated. Such problems and other aspects of interrelatedness and individual component performance can be identified by planned data collection guided by careful theorizing about the system. Such data may be collected by observation, as exemplified by the use of statistical control charts in a production line, or by direct experimentation on the system itself. In such experimentation, system inputs or conditions are systematically modified to learn their effects on the functioning of the system and the quality of its output. Such experiments can be carried out from time to time or can be built into the system itself. Evolutionary operation (EVOP), an example of the latter, refers to the regular alteration of baseline system parameters by small amounts during production runs. The changes made are too small to disrupt system operation, and the system is run with these changes in place just long enough to assess the effects on product quality and other aspects of performance. Changes that most improve performance may then be retained, and the process continued from new baseline values of parameters. Iterations of such experimentation gradually nudge the system toward optimal parameter values by exploring nonlinear regions of the "response surface" that relates performance to different combinations of parameter values.

A third insight, stressed in statistical quality control and one of the four pillars of Deming's "system of profound knowledge,"2 is the importance of understanding background variation in system performance and identifying separable contributors to it. The first and foremost meaning of "understanding" in this context is recognition that systems exhibit natural variability due to random influences, and that inordinate reaction to such short-run variability is often wasteful and of little benefit. Although dramatic but relatively brief slumps and streaks are a major source of discussion by sports analysts and some stock traders, basing major decisions on such brief events rarely leads to prosperity for a baseball team or an investor.

1 George E.P. Box and Owen L. Davies, The Design and Analysis of Industrial Experiments, Edinburgh: Oliver and Boyd (1954); Walter A. Shewhart, Economic Control of Quality of Manufactured Product, Milwaukee, Wisc.: Quality Press (1980); W. Edwards Deming, Out of the Crisis, Cambridge, Mass.: MIT Press (1986).
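The EVOP scheme described above can be sketched in a few lines. This is a deliberately simplified illustration: the quality function, parameter names, and step size are invented, and the averaging over replicated production runs that real EVOP uses to separate small true effects from process noise is omitted.

```python
def evop(quality, params, step=0.02, cycles=20):
    """Evolutionary operation sketch: nudge one parameter at a time by a
    small step and keep the new setting only when measured quality improves.
    (Real EVOP averages replicated runs so that small true effects emerge
    from process noise; that averaging is omitted here.)"""
    baseline = dict(params)
    best = quality(baseline)
    for _ in range(cycles):
        for name in params:
            for delta in (step, -step):
                trial = dict(baseline)
                trial[name] += delta
                score = quality(trial)
                if score > best:   # retain only improvements
                    baseline, best = trial, score
    return baseline, best

# Invented quadratic "response surface" peaking at threshold=0.6, gain=1.2.
q = lambda p: 1 - (p["threshold"] - 0.6) ** 2 - (p["gain"] - 1.2) ** 2
final, score = evop(q, {"threshold": 0.5, "gain": 1.0})
print(final, round(score, 4))
```

Because each accepted change becomes the new baseline, the settings drift one small step at a time toward the peak of the response surface without ever disrupting the running process.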
A deeper level of understanding develops from the awareness that random variation in output typically comes from multiple sources that persist even as their momentary influences fluctuate. In technical parlance, these sources and the measures of their strength are often referred to as components of variation (or variance), and in industry parlance as "the voice of the process." In a manufacturing process they might include variability in raw material batches, calibration drift of instruments guiding system processes, problems with machinery maintenance, and human error. Reducing the variation from such common sources can improve product quality over the long run. Some variation arises, however, from "special" sources that would not be expected to recur and against which changes to the system can do little to protect. Thus, identification and reduction of the largest components of variance from common sources is generally accepted as critical to quality improvement in industrial production systems. Some version of Deming-Shewhart plan-do-study-act (PDSA) cycles is generally used to bring a scientific approach to bear on this task.

2 See W.E. Deming, The New Economics for Industry, Government, Education, Cambridge, Mass.: MIT Press (2000).

The insights sketched above apply to biometric recognition systems no less than to any other systems. But since they provide an approach rather than a prescription for learning about and improving systems, their implications will vary greatly according to context. Moreover, for many operational biometric systems the "ground truth"—that is, the "correct" answer in terms of system objectives—is indeterminate for many transactions. The approach described above is invaluable in developing such systems. The emphasis on examination of process variables in an operational mode is potentially very helpful. Its potential benefits are even greater if challenge experiments can be superimposed on the operational system. There is substantial precedent for such challenge experiments in other contexts, including evaluations of Internal Revenue Service tax assistance and Transportation Security Administration airport passenger and baggage screening.

So, independent of the particular biometric modality and its application, the following lessons can be drawn from the experience and methodologies that have evolved in industrial production:

  • System objectives must be clarified at the outset if the system is to be designed efficiently and if the ability to evaluate system performance is to be preserved. In particular, the often-interrelated but distinct goals of improving convenience, controlling access, detecting threats, lowering costs, tracking and managing employees, and deterrence must be distinguished and prioritized in system planning.

  • The operational environment, including the range within which environmental characteristics and characteristics of the populations presenting to the system will vary, should be anticipated as much as possible in systems development. This includes consideration of operation under routine conditions; under unusual conditions unconnected to any specific threat; under realistic threat scenarios for attempts to defeat the system at the individual level; and under realistic threat scenarios for penetrating, degrading the performance of, or shutting down operations at the system level.

  • To the extent that systems are mission-critical, large-scale, and addressed to national security, controlled observation at the operational level, including ongoing challenge experimentation, is essential. In routine operation, many errors are likely to occur in which an individual making a true recognition claim is at first erroneously restricted but the mistake is later discovered and corrected, at which point these errors become visible and available for analysis. However, when an individual gains unauthorized access because, for example, a false claim of identity goes undetected, the error may remain undiscovered for a long time. Challenge experiments, which observe and compile system responses to inputs representing (1) typical experience, (2) variations in conditions, (3) difficult presentations requiring adjudication or system adaptation, and (4) attack modes, are the best way to identify the potential for such errors and ways to prevent them. Erroneous rejections of true recognition claims and erroneous acceptances of false claims should be documented and subjected to rigorous fault analysis, just as would take place in the investigation of a transportation crash. Such analysis should include comparison with a sample of correct recognitions used as controls in order to distinguish factors predisposing to errors from those predisposing to correct decisions.

  • Studies of system behavior, including those attempting to discover and reduce the largest contributors to system error and the most variable components of intermediate products that contribute to recognition decisions, may be as revealing and helpful for biometric systems as they have been for systems involving other repeatable processes.

MEDICAL SCREENING SYSTEMS

Medical screening systems collect diagnostic information, generally in a staged sequence, in an attempt to locate individuals with an undetected disease that can be more effectively treated early in its course. The input to such a system is data from a population of individuals, some with disease but most without. Results of the stages generally are classified as positive or negative, and only individuals who test positive at each stage are labeled by the system as having the disease.

Consider the following simplified (and not necessarily medically realistic) view of a prostate cancer screening system. Screening is initiated by a digital rectal examination. The patient with a normal exam is not screened further. Abnormal palpation results, however, are followed by a prostate-specific antigen (PSA) test. Patients with a PSA level below a certain point are not screened further. When the PSA level is at or above this point, the prostate is biopsied. When the pathologist finds the biopsy to be negative for cancer, the patient is so informed. When the pathologist finds it to be positive for cancer, the diagnosis is considered to be established, and the patient is referred for consideration of treatment alternatives. The alternatives, and indeed the importance of any treatment, may vary depending on the age of the patient, the stage of the cancer, and the rate of progression, which may often be determined by a watch-and-wait period.3

3 The point here is that the disposition of a medical screening result may vary as a function of patient factors, and what happens after that is, nonetheless, appropriately viewed as a system output. Similarly, the disposition of a biometric recognition (or lack of recognition) might vary according to situational and subject factors; because the consequences of the system's results affect system output, they are important in evaluating a biometric system.

Each component test of this progression will detect some prostate cancers and miss others, its false negatives. Some men without prostate cancer, perhaps with another disease such as benign prostatic hyperplasia (BPH), will be classified positive at one or more steps. The proportion of cancers detected is known technically as "sensitivity," and the proportion of prostate cancer-free individuals classified as negative is known as the "specificity." The complementary proportions—that is, the proportion of prostate cancers missed and the proportion of men without cancer who are identified as positives—are called the false negative and false positive rates. These are analogous to the false nonmatch and false match rates, respectively, in a biometric recognition application. Note that each component of the screening system will have its own values of these numerical characteristics describing the performance of that component and that another set of values characterizes the performance of the screening system overall. In practice, the true values are generally unknown, but hypothesized or estimated values coupled with well-established mathematical relationships can provide useful guidance for screening policies.

Screening systems have been extensively studied in a medical context. Their general characteristics are well understood, but their specific performance levels may be unclear. The following lessons are among those that have been learned:

  • Individual components in general usage are rarely as sensitive and specific as they were under development, because tests are usually developed and evaluated by researchers exceptionally skilled in their use, on subjects whose states of health or disease are well known.

  • The value of each component to the screening system is determined not just by its individual properties but by the information it contributes in addition to the contribution of the other components. For instance, confirming the result of a test by repetition is less valuable than confirming it by a different test that screens for a different disease marker.

  • Limitations of individual components can vitiate the effectiveness of other components. For instance, in the system described above, a pathologist who cannot detect true prostate cancer renders the accuracy of earlier components in the sequence virtually irrelevant.

  • Effectiveness of a system is highly population-specific, even when the system's overall sensitivity and specificity are exceptionally high. This is easily seen by considering a screening system implemented in a population from which the disease in question is absent. No matter how high the sensitivities and specificities of the system components and of the system as a whole, all positives will be false positives and the screening system will provide no health benefit.

  • In view of the preceding item, the performance of a system is best represented by its population-specific predictive values—that is, the proportion of screen-positive individuals who truly have the disease (positive predictive value) and the proportion of screen-negative individuals who truly do not have the disease (negative predictive value). Alternatively, the ratio of screen-positives with the disease to screen-positives without the disease and the ratio of screen-negatives without the disease to screen-negatives with it may also be used to represent performance. These measures combine information on the accuracy of the testing (sensitivity and specificity) with information on the composition of the population, since both are critical to determining whether screening is informative.

  • The ability of a system to detect disease and the importance of detection may vary by characteristics of the disease and of the patients in whom the disease occurs. For instance, screening is more likely to detect slowly progressing (indolent) than rapidly progressing (aggressive) disease, because the symptom-free period is longer for the former. But sensitivity is less important in detecting indolent disease, because subsequent rounds of screening may detect it before it has progressed much further. In the case of prostate cancer, elderly men with the indolent form may be more likely to die from something else before the cancer kills them.

These observations are general, and the analogy to biometric systems is imperfect. They do, however, have some implications for biometric systems:

  • Laboratory and scenario testing are apt to underestimate field error rates of biometric applications.

  • Combinations of independent or minimally dependent characteristics and processes generally incorporate more information, and thus offer higher potential for improved performance, than combinations of more correlated components. Hence, in biometric systems design, independent features, components of multimodal biometrics, and components of decision-making scores are preferable to combinations of correlated alternatives of comparable cost.

  • A poor adjudication process, or an ineffective backup process for dealing with failures-to-acquire (see Chapter 2) in a biometric system, may negate the benefits of good error rates in the basic biometric technology.

  • Biometric technologies must be calibrated to the environment and population in which they will be implemented. For instance, one might expect different operational characteristics for biometric border-control systems using identical technology on the Mexican border with Texas and the Canadian border with New York, in part because the frequency of attempted illegal border crossings in these places is so different.

  • System performance characteristics may vary by major population subgroups and by the types of challenges presented to the system. Extrapolation of technological or system performance characteristics across settings or challenges—for example, (1) from laptop access control to auto theft control to border control or (2) from illegal immigrants to narcotics smugglers to terrorists—is unlikely to be reliable.