Hypothetical and Illustrative Applications of the Framework to Various Scenarios
This appendix illustrates how elements of the framework described in Chapter 2 might be applied to various hypothetical scenarios. Each scenario posits a particular kind of terrorist threat, a possible technological approach to addressing the threat, and some of the possible impacts on privacy entailed by that scenario. The scenarios are intended to illustrate how application of the framework draws out important questions to consider and answer when deciding on the deployment of a program. They are by no means exhaustive in their application of the framework, and they do not exemplify all the technologies considered in this report.
NOTE: The committee emphasizes that the descriptions of technological approaches in this appendix are NOT an endorsement of or a recommendation for their use.
Terrorists continue to regard air travel as a high-value target. For the foreseeable future, aviation authorities will have to guard against the threat of an armed hijacking or the destruction of one or more fully loaded passenger planes.
A Possible Technological Approach to Addressing the Threat
Checkpoint screening of airport passengers and their baggage to prevent the transport of weapons (e.g., firearms, explosives) will continue. However, with advancing technologies, future security checkpoints could be different from today’s checkpoints in several important respects:
Use of new sensors. New imaging sensors could be introduced to reveal whether weapons are being hidden under clothing, although these sensors might also reveal anatomical features of the body. Retinal scans and other biometrics could be introduced to help validate passenger identity. Sensors for thermal imaging of the body or portions of the body could be introduced to detect signs of nervousness or excitement, and additional video cameras could be introduced with new software for face recognition and for analyzing body motion to search for signs of nervousness and other suspicious activity. Some of these sensors could be positioned so that passengers are aware they are being sensed, while others might be positioned so that passengers have no specific, explicit warning that they are being sensed.
Use of real-time networking to share data instantaneously across multiple airport security checkpoints (both within the same airport and at different airports), and to integrate data with information in other databases. This approach would enable real-time sharing and fusion of information such as the detection that a nonstandard homemade briefcase containing unacceptable materials was found in airport A, and another similar event occurred in airport B, resulting in immediate transmission of information about the briefcase that would enable detecting other copies of it at other airports.
Use of data mining methods to draw inferences from a large shared data set, and to provide guidance to the human checkpoint operators. For example, computer-based screening profiles for luggage and passengers might be improved continuously based on experience with millions of passengers across many airports. As one example, consider that today a human operator decides to hand inspect a certain fraction of luggage after it has passed through the x-ray scanner, perhaps because a suspicious-looking object is seen in the x-ray scan. Each time this occurs, the result of the hand inspection could be provided as a training example to a data mining program so that it could learn, from hundreds of thousands of such experiences, which x-ray images correspond to truly dangerous objects as opposed to false alarms. Computer-based machine learning algorithms could use such training data, collected from many security checkpoints at many airports, to formulate a potentially more accurate profile that could automatically estimate a risk level for each object seen in an x-ray scan and to assist the human screener with the goal of reducing the number of false alarms leading to invasive manual searches.
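To make the last idea concrete, the sketch below trains a toy logistic-regression risk model from simulated hand-inspection outcomes (label 1 = the hand inspection found a truly dangerous object, label 0 = false alarm). Everything here is an assumption for illustration: the two features (density and irregularity of an x-ray object) and all of the data are invented, and a real screening system would use rich image features and far more rigorous validation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_risk_model(features, labels, epochs=300, lr=0.5):
    """Fit a tiny logistic-regression risk model by stochastic gradient
    descent on pooled hand-inspection outcomes."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def risk_score(model, x):
    """Estimated probability that an x-ray object is dangerous."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Synthetic stand-in for experience collected across many checkpoints:
# hypothetical (density, irregularity) features for each flagged object.
random.seed(0)
dangerous = [(random.uniform(0.7, 1.0), random.uniform(0.6, 1.0)) for _ in range(200)]
benign = [(random.uniform(0.0, 0.4), random.uniform(0.0, 0.4)) for _ in range(200)]
model = train_risk_model(dangerous + benign, [1] * 200 + [0] * 200)
```

On this well-separated synthetic data, the learned model assigns high risk scores to dangerous-looking objects and low scores to benign ones, mirroring the stated goal of reducing the number of false alarms that lead to invasive manual searches.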
This hypothetical future security checkpoint grounds many of the generic issues that arise when deciding whether and how to introduce new information collection, fusion, and analysis systems, and how to manage their potential impacts on civil liberties, both in a specific implementation and in terms of general policies and legal frameworks.
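The real-time sharing sketched earlier (the homemade-briefcase example) can be illustrated as a shared store keyed by an item "signature." The class name, signature format, and alert behavior below are illustrative assumptions, not a description of any deployed system.

```python
class SharedDetectionNetwork:
    """Hypothetical fusion hub: each checkpoint reports a signature for a
    suspicious item (e.g., a hash of distinctive x-ray features), and a
    repeat sighting at a second airport triggers a network-wide alert."""

    def __init__(self):
        self.sightings = {}  # signature -> set of reporting airports

    def report(self, airport, signature):
        airports = self.sightings.setdefault(signature, set())
        airports.add(airport)
        if len(airports) > 1:
            # A second, independent sighting is the fusion event of interest.
            return f"ALERT: signature {signature} seen at airports {sorted(airports)}"
        return None

hub = SharedDetectionNetwork()
first = hub.report("A", "briefcase-x17")   # first sighting: no alert
second = hub.report("B", "briefcase-x17")  # repeat at another airport: alert
```

The design choice here is that each checkpoint shares only a compact signature rather than raw images, which limits what the network learns about any individual passenger.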
Possible Privacy Impacts
The privacy impact of detection technologies can vary significantly depending on choices made during deployment. The committee suggests that future regulations should differentiate systems and deployments based on features that can significantly affect perceived privacy impact, including:
Which data features are collected. For example, when capturing images of baggage contents, the images might or might not be associated with the name or image of the passenger. Anonymous images of baggage, even if stored for future data mining, might be perceived as less invasive than baggage images associated with the owner.
Covertness of collection. Images of passengers might be collected covertly, without the awareness of the individual, throughout the airport, or alternatively with the passenger’s awareness and implicit consent at the security checkpoint. Many will consider the former to be more invasive of their privacy.
Data dissemination. Data might be collected and used only for local processing, or disseminated more widely. For example, images of bags and passengers might be used only locally, or disseminated widely in a nationwide data store accessible to many agencies.
Retention. Data might be required by regulations to be destroyed within a specified time interval, or kept forever.
Use. Data might be restricted to a particular use (e.g., anatomically revealing images of airport passengers might be available for the sole purpose of checking for hidden objects), or unrestricted for arbitrary future use. The perceived impact on privacy can be very different in the two cases.
Use by computer versus human. The data might be used (processed) by a computer, or alternatively by a human. For example, anatomically revealing images might be accessible only to a computer program that determines whether there is evidence of a hidden object under clothing. Alternatively, these images might be examined by the human security screener to manually search for hidden objects. The former case may be judged as less invasive by many passengers. Note that if a computer examination identifies a suspicious case, then a manual examination can be the next step. If the computer examination is sufficiently accurate, such
a two-stage computer-then-human process might significantly reduce the perceived privacy impact.
Control of permissions. If data are retained for future uses, regulations might be placed on who can grant permission for subsequent dissemination and use (e.g., the collector of the data, a court, or the subject of the data). If the subject of the data is given a hand in granting permission, then the perceived privacy impact may be lessened.
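The deployment dimensions above could in principle be encoded as an explicit, machine-checkable policy attached to a system, as the following sketch suggests. The field names and the single enforced rule (retention) are hypothetical illustrations, not a proposed standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass(frozen=True)
class DeploymentPolicy:
    """One illustrative encoding of the privacy-relevant deployment choices."""
    link_to_passenger: bool            # which data features are collected
    covert_collection: bool            # covertness of collection
    share_nationally: bool             # data dissemination
    retention: Optional[timedelta]     # None = kept indefinitely
    restricted_use: bool               # single articulated purpose only
    computer_screens_first: bool       # computer processing before any human view
    subject_controls_permission: bool  # control of permissions

def must_purge(policy: DeploymentPolicy, captured_at: datetime, now: datetime) -> bool:
    """True when a retained record has exceeded its mandated retention period."""
    return policy.retention is not None and now - captured_at > policy.retention

# A hypothetical low-impact checkpoint configuration.
checkpoint_policy = DeploymentPolicy(
    link_to_passenger=False, covert_collection=False, share_nationally=False,
    retention=timedelta(days=90), restricted_use=True,
    computer_screens_first=True, subject_controls_permission=False)
```

Making such choices explicit in code would let an auditor check a deployment against regulation mechanically rather than by reading design documents.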
Applying the Framework
To illustrate the use of the framework proposed in Chapter 2 for evaluating the potential deployment of new systems, consider how it might be used to evaluate the possible deployment of one of the technologies suggested above. In particular, consider that company X comes to the U.S. government with a proposal to deploy a system that would (1) create a network to share images of baggage that are currently collected at all U.S. airport checkpoints, as well as the outcome of any manual searches of those bags by security screeners, and (2) use this nationwide database for two purposes: first, to perform data mining to identify homemade versus mass-produced luggage bags (based on their relative frequency of appearance at airports), and second, to use the results of the thousands of manual searches performed nationwide to automatically train more accurate software to spot suspicious items in x-ray images of baggage. How would the proposed framework apply to evaluating such a proposal?
First, the framework asks for a clearly articulated purpose for the new system, an evaluation of why it may outperform current methods, and a thorough experimental evaluation of the system before full deployment. Note that one might experimentally evaluate whether the data mining software of company X is capable of distinguishing homemade versus mass-produced luggage without going to the step of a full network deployment, by testing its use in one or two individual trial airports first. However, in many data mining applications, including this one, proving the value of collecting the full data set by testing on small sets is difficult, because performance sometimes improves as the size of the data set grows.
The framework would also raise issues regarding the rational basis for the program (Is it of significant value to spot custom-made luggage or custom-modified mass-produced luggage? Does data mining the results of manual luggage inspections actually lead to more accurate automated luggage inspections, and if so, does this lead in turn to safer or less invasive screening?). It would raise issues about scalability (Can the computer system and human infrastructure handle the large volume of data from all U.S. airports in real time?) and data stewardship (Who will be responsible for the data collection, and how will it be administered?).
Compliance with Laws and Values
The framework asks whether an information-based program is consistent with U.S. law and values. The criteria for such consideration have been divided into three categories: data, programs, and administration and oversight. Does the proposed system operate with the least personal data consistent with the goals of the system? Note that this question raises the issue of whether the owner of the luggage should be identified with each luggage image, and of evaluating the impacts of this choice on both system utility and personal privacy. Does the system produce a tamper-resistant audit trail of who accesses which data? Is it secured against illegal tampering? What process is in place to ensure monitoring of the deployed system’s performance in terms of false positives and of likely impacts on individuals? The framework asks questions about the agency collecting and deploying the system, perhaps the Transportation Security Administration in this case. Does this agency have a policy-level privacy officer, are its employees and others who might access the data trained appropriately, and are all of the uses of this nationwide luggage image dataset clearly articulated and in compliance with existing laws?
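One of the questions above asks for a tamper-resistant audit trail. A common way to approximate tamper evidence in software is hash chaining, sketched below; the record fields are hypothetical, and a production system would also need secure storage and external anchoring of digests.

```python
import hashlib
import json

class AuditTrail:
    """Minimal hash-chained audit log: each entry commits to the previous
    entry's digest, so altering or deleting any past record breaks the
    chain and is therefore detectable."""

    def __init__(self):
        self.entries = []  # list of (record_json, digest_hex)

    def append(self, who, what):
        prev = self.entries[-1][1] if self.entries else "0" * 64
        record = json.dumps({"who": who, "what": what, "prev": prev}, sort_keys=True)
        digest = hashlib.sha256(record.encode()).hexdigest()
        self.entries.append((record, digest))

    def verify(self):
        """Recompute the chain; any mismatch means tampering occurred."""
        prev = "0" * 64
        for record, digest in self.entries:
            if json.loads(record)["prev"] != prev:
                return False
            if hashlib.sha256(record.encode()).hexdigest() != digest:
                return False
            prev = digest
        return True
```

The chain does not prevent tampering; it only makes tampering evident on verification, which is the property the framework's audit-trail question is after.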
A major issue for those concerned with ensuring public health is the early detection of an outbreak or attack capable of causing widespread disease, injury, or death. The presumption behind most early detection systems is that early warning would aid the rapid deployment of emergency resources and the initiation of public health and medical responses that would help to limit the spread of disease or any ill effects.
A Possible Technological Approach to Addressing the Threat
In the past, officials have relied on hospitals and doctors to signal outbreaks by reporting disturbing or unusual trends and troubling cases or indicators. Today, however, with the increasing sophistication of technology, including data mining, sensors, and communications capabilities, many officials are investigating better ways of getting earlier warning of
outbreaks or attacks. For example, could it be useful to monitor pharmacy sales of over-the-counter (OTC) drugs to get early warning for, say, something like an influenza epidemic? Perhaps it could be helpful to monitor school or work absentee rates for indications of widespread illness or biological attack. These forms of so-called “syndromic surveillance” are geared toward achieving the earliest possible detection of public health emergencies.1
Syndromic surveillance requires access to many different kinds of data. For example, in a large city, the data streams into a syndromic surveillance system might include digital records of common, OTC sales of medicines from pharmacies in the city, absentee records from city schools and some select businesses, counts of 911 calls to the city categorized into more than 50 call types (e.g., “influenza like illness,” “breathing problems,” and so on), and records of chief complaints from hospital emergency departments. In addition, these data streams could contain temporal and spatial information.
Such data streams would be monitored periodically (say, every 24 hours) and compared automatically to archived data collected over the past. Changes from expected values would be automatically analyzed for statistical significance. The geographical data in the streams would also enable the system to identify the location of “hot spots” that might indicate possible outbreak points in the city.
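As a minimal sketch of this monitoring step, the code below flags a data stream when today’s count is several standard deviations above its archived mean. The stream names and counts are invented, and operational systems typically use more sophisticated detectors (e.g., CUSUM, EWMA, or scan statistics) rather than a plain z-score.

```python
import math

def anomaly_flags(today, history, z_threshold=3.0):
    """Flag streams whose count today deviates sharply from archived counts.
    `today` maps stream name -> today's count; `history` maps stream
    name -> list of past daily counts for that stream."""
    flags = {}
    for stream, past in history.items():
        mean = sum(past) / len(past)
        variance = sum((c - mean) ** 2 for c in past) / (len(past) - 1)
        sd = math.sqrt(variance) or 1.0  # guard against a zero-variance archive
        flags[stream] = (today[stream] - mean) / sd >= z_threshold
    return flags

# Hypothetical archive: ten days of counts per stream.
history = {
    "ED respiratory visits": [40, 38, 42, 41, 39, 40, 43, 38, 41, 40],
    "OTC cough-medicine sales": [200, 210, 195, 205, 198, 202, 207, 199, 203, 201],
}
today = {"ED respiratory visits": 75, "OTC cough-medicine sales": 204}
```

Here the spike in ED visits stands far outside its historical range and is flagged, while the OTC stream stays within normal variation.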
Box E.1 describes how a syndromic surveillance system might be used in practice.
Possible Privacy Impacts
From a privacy perspective, personal health data are among the most sensitive pieces of information. However, to generate initial indicators, only anonymized data are needed. Follow-up may be needed, as might be the case if interviews with patients or providers are necessary, and undertaking follow-up is impossible if anonymity rules. (In many published articles on syndromic surveillance, emergency department (ED) data are the most important and useful data stream for both detecting and ruling out disease outbreaks.)

1. For more information on syndromic surveillance generally, as well as more information about previous efforts, see http://www.cdc.gov/mmwr/PDF/wk/mm53SU01.pdf. Overviews of syndromic surveillance can be found at http://iier.isciii.es/mmwr/preview/mmwrhtml/su5301a3.htm; K.D. Mandl, J.M. Overhage, M.M. Wagner, W.B. Lober, P. Sebastiani, F. Mostashari, J.A. Pavlin, P.H. Gesteland, T. Treadwell, E. Koski, L. Hutwagner, D.L. Buckeridge, R.D. Aller, and S. Grannis, “Implementing syndromic surveillance: A practical guide informed by the early experience,” Journal of the American Medical Informatics Association 11(2):141-150, 2004; and J.W. Buehler, R.L. Berkelman, D.M. Hartley, and C.J. Peters, “Syndromic surveillance and bioterrorism-related epidemics,” Emerging Infectious Diseases 9(10):1197-1204, 2003.

BOX E.1
An Illustrative Operational Scenario for the Use of Syndromic Surveillance

On a winter afternoon, a GoodCity public health official conducting routine daily data analysis notes a spike in the number of hospital emergency department (ED) visits and pharmacy sales detected by GoodCity’s syndromic surveillance system, which is designed to detect early, indirect indicators of a possible bioterror attack. None of the other data streams indicate unusual patterns.

The health official, who has been specially trained to operate the statistical data mining software involved, analyzes the temporal and spatial distribution of ED visits using scan statistics and finds that two hospitals in the same zip code, and located within blocks of each other, accounted for most of the excess visits. A third hospital in the same area of the city experienced a normal volume of ED visits during the previous 24 hours. Further examination of available data reveals that respiratory illness was the chief complaint of a majority of the patients seen in the two EDs of interest. Further analysis shows that in the past 24 hours, both hospitals experienced higher rates of ED visits for “respiratory illness” than expected based on comparisons with hospital-specific rates gathered in previous years.

Meanwhile, the health officer’s examination shows that over-the-counter (OTC) medicine sales, in particular of medicines to treat cough and fever, are sharply higher than in the previous week and in the same week of the previous year. The system tracks sales by store and zip code, but no pattern is evident. Past analyses have shown that increased purchases of OTC medications do not consistently presage a higher volume of ED visits.

Concerned that the increased incidence of respiratory complaints in a geographically discrete neighborhood of the city, combined with city-wide increases in the purchase of cough and fever medicines, might indicate the leading edge of an aerosolized anthrax attack or some other disease outbreak of public health significance, the health official assigns a public health nurse to conduct a telephonic descriptive review of the ED cases seen in the affected hospitals. The nurse will also query staff from a sample of hospitals that are not part of the surveillance system, looking for unusual presentations or higher-than-usual volume.

After several hours of phone calls, the public health nurse discovers that many of the excess ED visits were indeed for cough and respiratory complaints, but most patients were not deemed seriously ill and were sent home with a diagnosis of “viral illness.” Early in her calls, the nurse heard of two young adult patients who had been extremely ill with apparent “pneumonia” and admitted to the intensive care unit. Since it is unusual for healthy young adults to require hospitalization for pneumonia, the nurse tracked down and interviewed the admitting physicians for both patients. In both cases, the patients involved had an underlying illness that explained their condition.

The hospital staff consulted reported that ED volume throughout the day was not abnormally high; today’s syndromic surveillance data documenting ED visits city-wide would not be available for another 12 hours. Public health officials decided on the basis of these investigations to do nothing more, but to continue to closely monitor hospital ED visits and OTC sales over the coming days.
The efficacy of a surveillance system could be significantly enhanced through the potential inferential power of multivariate information about specific individuals arriving through different data streams. For example, two data streams might be purchases of OTC medications for coughs and school attendance records. Rather than simply analyzing these data streams separately and noting temporal correlations in them, considerably more inferential power would be available if it were possible to associate a specific child absent from school on Tuesday with the purchase of cough syrup on Tuesday by his father.
However, linking attendance records to drugstore purchasing records in such a manner would require personal identifiers in each stream to enable such a match. Privacy interests would therefore be implicated as well. For example, while the Health Insurance Portability and Accountability Act allows the use of medical information for public health purposes, it is unclear how to interpret its privacy restrictions in the context of regular surveillance systems. Further, different laws govern access to or restrictions on data associated with educational systems and organizations, and grocery chains restrict access to proprietary information on customer purchases.
Applying the Framework
Since a number of syndromic surveillance systems are in operation, the committee has been able to draw on public information and research in applying the framework presented in Chapter 2, and it reports some of that information here. However, the illustration does not constitute an endorsement or disapproval of such systems by this committee. The implementation of syndromic surveillance systems was prompted largely by the federal government when the U.S. Department of Health and Human Services (HHS) made bioterrorism preparedness monies available to state health agencies in 2002. Many such systems, of varying type and scope, were created: some by city health departments, some by state health agencies in collaboration with universities, and others by private contractors who not only designed but also operated the systems and then reported analyzed results to government officials.
The framework asks for a clearly stated purpose. The purpose of most syndromic surveillance systems is to detect a covert bioterrorist attack
before large numbers of victims seek medical care, in order to improve response and save lives. HHS did not specify any operational standards or explicit goals for the systems it helped fund, although some standards have been evolving.2
When considering a rational basis, syndromic surveillance systems derive merit from the prospect that lives might be saved if a bioattack were recognized earlier rather than later, lengthening the time available to get countermeasures (medicines and vaccines) to those infected, to conduct investigations into where the attack occurred and who is at risk, and so on. However, the systems’ integration into current practices in the field should be taken into account. There is good evidence that syndromic surveillance systems can detect large disease outbreaks, but it is less clear how and whether such detection improves public health response. Health officials confronted with a spike in syndromic signals typically seek more definitive evidence of a true rise in illnesses among city residents before taking action.3 This is in part because there is a lot of noise in the systems: illness rates, OTC medicine purchases, and 911 reports vary widely even within a given season and location. Also, syndromic surveillance generates many false positives (discussed below), and the “signal” is not specific enough in most instances to guide action. Syndromic signals spur health officials to look harder but do not usually trigger a public health response.4 Whether syndromic surveillance would actually improve the rapidity of the response to a bioattack compared to clinical case finding is unproven and probably not testable.5 Recently, Buckeridge and colleagues attempted to compare clinical case finding and syndromic
surveillance for detection of inhalational anthrax due to a bioterror attack using a simulation study.6 These investigators found that syndromic systems could be designed such that detection of an anthrax attack would be improved by one day, but when systems were sensitive enough to detect a substantial portion of outbreaks before clinical case finding, frequent false positives were also produced, which could impose a considerable burden on public health resources.
There are limits to the experimental basis for syndromic surveillance systems, and any gains should be weighed against the costs of developing and operating such systems. Observable behaviors that might precede patients seeking medical care for an illness are not precisely known. Although in 1993 a run on OTC medicines in Milwaukee famously preceded public health detection of a large, waterborne cryptosporidiosis outbreak, the purchase of nonprescription, OTC medicines does not reliably precede outbreaks of illness in populations.7 Moreover, a retrospective analysis of 3 years of syndromic surveillance data gathered by the New York City Health Department concluded that “syndromic surveillance signals [for gastrointestinal disease outbreaks] occur frequently, [and] are difficult to investigate satisfactorily….”8
The New York City Department of Health operates one of the country’s most sophisticated syndromic surveillance systems, which has been in use since the late 1990s and has been continually upgraded. This system has been documented as detecting seasonal influenza a week before culture-positive samples of flu were found in New York City and has detected large sales of OTC antidiarrheal medicines which subsequent investigations associated with gastrointestinal illness and eating spoiled food after a city-wide blackout. The system failed, however, to detect either the unprecedented outbreak of West Nile Virus in 1999 or the anthrax cases of 2001.9
Syndromic surveillance systems should be developed from technical specifications, data flows, and types of signals that have been rigorously shown to be most reliable and productive. However, such development
has many challenges. Because bioterrorist attacks are rare events, most of the “positive” signals syndromic surveillance produces will be false positives. Setting the system to be very sensitive (i.e., increasing the types and size of data streams) will generate more false positives, which can, over time, erode confidence in the system. The complex and larger data streams are also likely to increase the complexity of the investigations that follow the detection of syndromic “signals,” which could further delay any response action.10
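The base-rate problem behind these false positives can be made concrete with Bayes’ rule. The sensitivity, specificity, and prior below are purely hypothetical numbers chosen for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prior):
    """P(true outbreak or attack | alarm), by Bayes' rule."""
    true_alarms = sensitivity * prior
    false_alarms = (1.0 - specificity) * (1.0 - prior)
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical detector: catches 95% of real events, raises a false alarm
# on only 1% of ordinary days, and a real event occurs on roughly one day
# per decade of monitoring.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.99, prior=1 / 3650)
```

Even this optimistic detector yields alarms that correspond to a real event only about 2.5 percent of the time; roughly 97 of every 100 alarms would be false, which is why syndromic signals prompt further investigation rather than immediate public health action.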
There are also great difficulties in doing real-time record linkage on multiple data streams. With static record linkage, all of the databases in question are available for analysis, which means that it is possible to perform cross-validation, error assessment, and careful blocking to reduce comparisons. With real-time linkage, only a limited data sample is applicable (i.e., those that relate to present cases), which means that the data available to revise parameter estimates and error rates are limited.
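The “blocking” mentioned above can be illustrated with a toy static linkage between two data streams. All record fields and names here are hypothetical, and the exact-match rule stands in for the probabilistic (e.g., Fellegi-Sunter-style) scoring with validated thresholds that real linkage systems use.

```python
from collections import defaultdict

def build_blocks(records, key):
    """Group records by a blocking key so that only records sharing the
    key are ever compared, avoiding the full cross-product of comparisons."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[key]].append(record)
    return blocks

def link_streams(stream_a, stream_b, block_key="zip"):
    """Naive static linkage: candidate pairs must share the blocking key
    and agree exactly on surname."""
    blocks_b = build_blocks(stream_b, block_key)
    return [(a, b)
            for a in stream_a
            for b in blocks_b.get(a[block_key], [])
            if a["surname"] == b["surname"]]

# Hypothetical records from two streams.
absences = [{"surname": "Rivera", "zip": "10001", "date": "Tue"}]
purchases = [{"surname": "Rivera", "zip": "10001", "item": "cough syrup"},
             {"surname": "Chen", "zip": "10001", "item": "antacid"}]
matches = link_streams(absences, purchases)
```

In real-time linkage, by contrast, records arrive one at a time, so the cross-validation and error-rate estimation that this static setting permits are much harder to perform.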
Finally, a key challenge in assessing the utility and efficacy of a syndromic surveillance system is to differentiate between the power of the particular algorithmic approach used in analyzing the data (which may be inadequate regardless of the quality of the data) and the quality of the data used in that particular approach (which may be too poor regardless of the power of the algorithm).
Assessing the scalability of such systems is also challenging. Consideration must be given to whether this approach is viable for localities of any size, and to whether some data streams are more important than others or must be of a certain minimal scope. The trade-offs between the size of the signal (the number and size of different data streams) and the sensitivity and specificity of the signal (i.e., the numbers of false positives and false negatives) must be taken into account.
A syndromic surveillance system should be designed to allow the enforcement of business processes; business processes define the ways in which the system is used, who the agents are and who is authorized to use it, and the steps taken in each individual task. Business processes can differ across syndromic surveillance systems. For example, an agency in one city might allow anyone above a certain pay grade to execute a report, but only with the concurrence of the chief epidemiologist, whereas the comparable agency in another city might allow only the chief epidemiologist and two other delegated individuals to do so. Business processes will help determine how syndromic surveillance systems can be integrated into routine public health practice and what additional resources are
required. When private contractors or university partners operate the syndromic surveillance systems, processes will have to be defined to ensure that health officials receive data and analyses in a timely manner with no uncertainties about the validity of analyses.
Syndromic surveillance systems have the potential to contain large amounts of data. Those operating such systems will have to consider how to guarantee appropriate and reliable data as well as appropriate data stewardship. Some questions to consider include: Is the system collecting only the data necessary to detect a threat? Can syndromic data be forwarded to health departments in a manner that protects patient privacy in routine uses but allows particular patients to be identified and interviewed in a crisis, in keeping with routine public health practice? Can the utility of the system be preserved if geographic aggregation or some other form of protection is applied to protect individual privacy? What is known about the accuracy of data submitted from different sources? How long do data streams need to be retained? Can records of illness patterns be retained without the individual data streams? If such data are retained for long periods, will clinical data about specific patients and their commercial records (e.g., drug purchases) be available in these systems? Who will have access to the data? What policies need to be established to protect against unlawful or unauthorized disclosure, manipulation, or destruction?
The framework asks one to consider whether an information-based program, such as syndromic surveillance, is consistent with U.S. law and values. The criteria for such consideration have been divided into three categories: data, programs, and administration and oversight. For effective syndromic surveillance systems, the need for personal medical data from emergency rooms is clear, and in most (though not all) current syndromic systems the data are anonymized before being sent to public health agencies. In many published articles on syndromic surveillance, the emergency room data constitute the most important and useful data stream for both detecting and ruling out disease outbreaks. Data from OTC purchases and attendance records seem useful to such a system. However, as personal data, they should be considered only if they are reasonably shown to improve the effectiveness of the system. Within currently operating systems, data on OTC medications are used but are more easily associated with particular stores than with individuals.11 Linking
such information to school absences and clinical information raises major privacy issues and has not been attempted as part of any biosurveillance program, to the committee’s knowledge.
Public health agencies do have legal authority to release personal medical data if such information is pertinent to public health. Frequency of false positives is a major concern with these systems, as the scenario in Box E.1 demonstrates. In large public health agencies where resources exist to maintain and staff syndromic surveillance systems appropriately and where digitized data streams are available, such systems may be cost-effective. A bioattack alarm may lead to revelations of the names and medical conditions of specific patients seen in emergency rooms associated with syndromic reporting. In such “emergencies” the violation of an individual’s privacy might be deemed acceptable given the public’s right to know what is going on. However, agencies should have procedures in place for dealing with consequences of false positives. They should also assess and identify the impact on individuals in non-alarm routine operations. The system itself should produce a tamper-resistant audit trail, and all personnel authorized to use the system and its outputs should receive training in appropriate use and the laws and policies applicable to its use. The agency should employ a privacy officer to ensure compliance with laws, policies, and procedures designed to protect individual privacy. These are but a few considerations toward assessing whether syndromic surveillance systems are consistent with U.S. laws and values.