5 Research Challenges

There are deep intellectual challenges where the disciplines of computer science and engineering, health/biomedical informatics, related social sciences, information technology (IT), and health care overlap. Indeed, interdisciplinary work will be necessary to go beyond incremental improvement of existing health care IT or the automation of traditional paper-based workflows. Systematic development of the health care IT-related research agenda is beyond the scope of this brief study, but the committee offers a framework for organizing such an agenda.

It is important to distinguish between a solution to a specific problem in the health care domain and the technology-related efforts needed to realize it. The committee conceptualized the necessary technology-related efforts with respect to two separate dimensions. The first lies along an axis describing the extent to which new, generally applicable research is needed. A second lies along an axis describing the extent to which new research specific to health care and biomedicine is needed. Technology-related efforts can thus be separated into four (2 × 2) quadrants, as illustrated in Box 5.1.[1]

[1] Conceptually, the segmentation of the domain into these four quadrants is quite similar to the division proposed in Donald Stokes, Pasteur's Quadrant: Basic Science and Technological Innovation, Brookings Institution Press, Washington, D.C., 1997.

Box 5.1 A Segmentation of Health-Care-Related Technology Efforts

                                   General applicability       Health care specific
Relatively clear path forward      Quadrant 1:                 Quadrant 2:
from existing technologies         General-applied efforts     Health care-applied efforts

Advanced research needed           Quadrant 3:                 Quadrant 4:
                                   General-advanced efforts    Health care-advanced efforts

From a research management standpoint, such a clustering is helpful for better understanding the parties needed to undertake any given technology-related research effort, the likelihood of its success, the timescale needed to achieve success, the appropriate funding mechanisms, and other such parameters. For example, efforts in quadrants 1 and 3 might be pursued by computer science researchers working in loose cooperation with the health and biomedical informatics communities, whereas efforts in quadrants 2 and 4 would require much tighter coordination and cooperation.

These two dimensions emerge from the observation that health care IT draws on classic computer science challenges such as providing high availability with low system management overhead [C4O18], high data integrity, and a very high degree of usability. Such goals are essential foundations of many IT systems but are especially challenging to achieve in the context of health care IT, given the scale and diversity of the health care establishment and, in some cases, the need to support a large, broad user base. In addition, many benefits of systems often accrue only when they are viewed by researchers and caregivers as sufficiently trustworthy to replace older solutions. At the same time, some problems related to health care IT involve solutions that are highly specific to health care (e.g., developing high-quality devices for human-computer interaction [C1O2] that do not inadvertently help to spread infection as care providers move from patient to patient).

As an illustration of how a solution to a major problem in health care might be decomposed into a technology-related research agenda, consider that most clinicians spend a significant amount of time in documenting the care provided to a patient.[2]
One challenge for health care IT would be the creation of a self-documenting environment in which the necessary documentation could be generated with little or no additional effort on the part of the clinicians [C5O19] (see Section 5.2.5). But making progress toward this goal calls for efforts in all four quadrants of the matrix shown in Box 5.1.

[2] The committee noted this point in its site visits, and the literature has important examples as well. For instance, a survey of more than 2500 clinical oncologists showed that the amount of time they spend filling out paperwork and documenting patient care has increased more than fourfold over the past 25 years. See S. Mayor, "U.S. Cancer Care Is Worse Due to More Paperwork," British Medical Journal 322(7296):1201, 2001.

COMPUTATIONAL TECHNOLOGY FOR EFFECTIVE HEALTH CARE

The existing technology and general applications of Quadrant 1 provide a clear path for indexing voice recordings. Speech-to-text transcription is a relatively mature technology for vocabularies of modest size, as indicated by the variety of commercial software packages available.[3] Speaker identification is routinely performed using voiceprints of the known participants, the patient typically being the remaining unknown speaker during a clinical encounter, and once a voice recording is transcribed to text, indexing within a known domain borders on the trivial. Full-text transcription today has relatively high error rates that make it unreliable as a basis for making clinical decisions, although as the technology further matures, error rates can be expected to drop.

[3] To be sure, claims regarding the impending maturity of speech recognition have been made for a long time, but as with user customization of interfaces (see Footnote 22), speech recognition is another example of an idea that was difficult to implement with the technology of 20 years ago but now is much more feasible with today's technology and just as important today to pursue.

Another general application is information extraction from discourse analysis: a computer listening to a dialog (or examining a transcript) between two people would be able to make inferences about the topics under discussion. Research in this area would build on work in computational linguistics that dates to the 1980s. For deep information extraction (e.g., linking the conversations to key terms in the medical literature), fundamental research in Quadrant 3 is needed, for example, to understand how to relate concepts embedded in the words themselves to the rich store of background knowledge about the world that informs everyday discourse.

As for health-care-specific applications, there is a fairly clear path using existing technology to develop systems that support patient-supplied documentation or documentation provided by the patient's support system (e.g., family). Such documentation would increase the continuity and richness of information available for the clinician, and it would help in dealing with expected future burdens on patients to manage their own care outside traditional health care organizations; this research agenda would fit into Quadrant 2. On the other hand, a system to provide a patient or caregivers with interactive explanations of a disease, particularized by the patient's culture, learning style, value system, education, and life experience, remains beyond the current state of today's science and would fit into Quadrant 4.

Other examples of technology-related research efforts in each of the four quadrants are provided below:

• Quadrant 1 (General-applied efforts). Adaptation of existing IT and process solutions from other domains and industries, e.g., process and data integration technologies, human-computer interaction technologies, ubiquitous networking technologies, security, search, blogging, and social networking.
• Quadrant 2 (Health care-applied efforts). Identification of the best examples of coupled health care improvement and health care IT that have been successfully deployed or prototyped, followed by wide deployment of those examples. Use of existing data and process standards to obtain low-hanging fruit, e.g., portals, electronic messaging, disease management dashboards, decision support and reminders, process automation, and so on.
• Quadrant 3 (General-advanced efforts). Invention of new information technologies that are needed in health care, such as ontology management, systems that help to explain why decisions are made, large-scale machine learning, voice technologies, natural language processing, privacy management for access and data mining, and so on.
• Quadrant 4 (Health care-advanced efforts). Specific advanced work on advanced ontologies and reasoning in the medical domain, modeling of the human body and the virtual patient, interpretation of medical information to different communities, approaches to learning and improving data quality, aggregation of patient health care information into a trustworthy database with explicit representation of uncertainty [C4O17, C5O23], and so on.
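The two-axis segmentation of Box 5.1 can be illustrated with a small data structure. The following is a minimal sketch, not part of the committee's framework: the names `Effort`, `Quadrant`, `classify`, and `needs_tight_coordination` are hypothetical, and the two boolean fields simply encode the column and row of the box. The coordination rule mirrors the text's remark that quadrants 2 and 4 would require much tighter cooperation with the health and biomedical informatics communities.

```python
from dataclasses import dataclass
from enum import Enum


class Quadrant(Enum):
    """The four quadrants of Box 5.1 (numbering follows the box)."""
    GENERAL_APPLIED = 1
    HEALTH_CARE_APPLIED = 2
    GENERAL_ADVANCED = 3
    HEALTH_CARE_ADVANCED = 4


@dataclass(frozen=True)
class Effort:
    """A technology-related effort, described along the two axes of Box 5.1."""
    name: str
    health_care_specific: bool     # column: general vs. health care specific
    needs_advanced_research: bool  # row: clear path forward vs. advanced research


def classify(effort: Effort) -> Quadrant:
    """Map an effort onto its quadrant from its two axis values."""
    if effort.needs_advanced_research:
        if effort.health_care_specific:
            return Quadrant.HEALTH_CARE_ADVANCED
        return Quadrant.GENERAL_ADVANCED
    if effort.health_care_specific:
        return Quadrant.HEALTH_CARE_APPLIED
    return Quadrant.GENERAL_APPLIED


def needs_tight_coordination(quadrant: Quadrant) -> bool:
    """Quadrants 2 and 4 are the health-care-specific ones."""
    return quadrant in (Quadrant.HEALTH_CARE_APPLIED, Quadrant.HEALTH_CARE_ADVANCED)


if __name__ == "__main__":
    # Examples taken from the self-documenting-environment discussion.
    efforts = [
        Effort("indexing of voice recordings", False, False),
        Effort("patient-supplied documentation", True, False),
        Effort("deep information extraction from discourse", False, True),
        Effort("interactive, personalized disease explanations", True, True),
    ]
    for effort in efforts:
        quadrant = classify(effort)
        print(effort.name, quadrant.name, needs_tight_coordination(quadrant))
```

A real research-management tool would of course treat both axes as matters of degree rather than as booleans; the binary encoding is used here only because the box itself is drawn as a 2 × 2 grid.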
5.1 AN OVERARCHING RESEARCH GRAND CHALLENGE: PATIENT-CENTERED COGNITIVE SUPPORT

Patient-centered cognitive support emerged as an overarching grand research challenge during the committee's discussions. This section discusses how a research agenda might be assembled, together with representative research challenges, to illustrate the magnitude of the opportunity.

Much of health care is transactional: admitting a patient, encountering a patient at the bedside or clinic, ordering a drug, interpreting a report, or handing off a patient. Yet transactions are only the operational expression of an understanding of the patient and a set of goals and plans for that patient. Clinicians have a "virtual patient" in mind, a conceptual model of the patient reflecting their understanding of interacting physiological, psychological, societal, and other dimensions. They use new findings (raw data) to refine their understanding of their virtual patient. Then, based on medical knowledge, medical logic, and mostly heuristic decision making, they formulate a plan, expressed as an order (transaction), to try to change the (real) patient for the better.

Today, clinicians spend a great deal of time and energy searching and sifting through raw data about patients and trying to integrate the data with their general medical knowledge to form mental abstractions and associations relevant to the patient's situation. As reported by Kushniruk,[4] decision making by health care professionals is often complicated by the need to integrate ill-structured, uncertain, and potentially conflicting information from various sources. These sources include but are not limited to myriad journal articles; memories from personal clinical experience; clinical guidelines; medical records from a host of providers (often working for different health care organizations); informal observations and thoughts from colleagues; and patient commentary and insights. Efforts to sift the data from this collection of sources force clinicians to devote precious cognitive resources to the details of data and make it more likely that they will overlook some important higher-order consideration.

The health care IT systems of today tend not to provide assistance with this sifting task. Rather, they squeeze all cognitive support for the clinician through the lens of health care transactions and the related raw data, without an underlying representation of a conceptual model for the patient showing how data fit together and which data are important or unimportant.
There is little or no cognitive support for clinicians to reason about their "virtual patient." So the health care IT systems force clinicians to a transactional view of the raw data. As a result, an understanding of the patient can be lost amidst all the data, all the tests, and all the monitoring equipment.

In the committee's vision of patient-centered cognitive support, the clinician interacts with models and abstractions of the patient that place the raw data into context and synthesize them with medical knowledge in ways that make clinical sense for that patient.[5] Raw data are still available, but they are not the direct focus of the clinician. These virtual patient models are the computational counterparts of the clinician's conceptual model of a patient. They depict and simulate the clinician's working theory about interactions going on in the patient and enable patient-specific parameterization and multicomponent alerts. They build on submodels of biological and physiological systems and also exploit epidemiological models that take into account the local prevalence of diseases. The availability of these models would free clinicians from having to scan raw data, and thus they would have a much easier time defining, testing, and exploring their own working theories. What links the raw data to the abstract models might be called medical logic; that is, computer-based tools examine raw data relevant to a specific patient and suggest their clinical implications given the context of the models and abstractions. Computers can then provide decision support, that is, tools that help clinicians decide on a course of action in response to an understanding of the patient's status. At any time, clinicians have the ability to access the raw data as needed if they wish to explore the presented interpretations and abstractions in greater depth.

[4] A. Kushniruk, "Analysis of Complex Decision-Making Processes in Health Care: Cognitive Approaches to Health Informatics," Journal of Biomedical Informatics 34(5):365-376, 2001.

[5] The notion of putting individual medical facts into an appropriate context is not new, having been described in the literature as early as 1969 (Lawrence L. Weed, Medical Records, Medical Education and Patient Care, Case Western Reserve University Press, 1969). Nevertheless, IT has progressed a long way since then, providing a more suitable medium in which to implement such a notion.

One possible framework for future health care IT is depicted in Figure 5.1. This framework, which emerged over the course of the committee's discussions and contrasts with the limited focus of today's health care IT, represents an all-encompassing view of components and interactions among components needed to support the Institute of Medicine's vision of 21st century health care. Future clinician and patient-facing systems would draw on the data, information, and knowledge obtained in both patient care and research to provide decision support sensitive to workflow and human factors.
The decision support systems would explicitly incorporate patient utilities, values, and resource constraints such as those mentioned above. They would support holistic plans and would allow users to simulate interventions on the virtual patient before doing them for real. To carry out orders, clinicians would use transactional systems like today's, but built into the decision support system rather than the other way around. In today's systems, decision support is commonly an add-on to systems designed primarily for transaction processing and does not benefit directly from results of data mining. Rather than having data entered by clinicians into computer systems, the content of clinical interactions would be captured in self-documenting environments with little or no additional effort on the part of the clinicians. (That is, an intelligent, sensor-rich environment would monitor clinical interactions and reduce sensor input to notes that document the medically significant content of those interactions.)

In addition to the research challenges related to modeling the virtual patient and biomedical knowledge are the challenges in modeling and supporting multiparty decision making (that is, medical decisions made by family, patient, primary care provider, specialist, payer, and so on). Techniques to interconnect the components are likely to be equally challenging (see, for example, the discussions in Sections 5.2.3 and 5.2.4 on data integration and data management).

[Figure 5.1 placeholder. Labels recoverable from the figure: PATIENT CARE (left side) and RESEARCH (right side); Virtual Patient; Medical knowledge; Medical logic; Decision support; Transactions; Clinical research transactions; Raw data; Raw research data; "Where clinicians want to stay"; "Where health IT chains clinicians"; and "Workflow modeling and support, usability, cognitive support, computer-supported cooperative work, and so on."]

FIGURE 5.1 The virtual patient: a component view of systems-supported, evidence-based practice.

The left side of the figure concerns patient care. Raw data about a patient (the electronic health record) constitute the foundational base. Next come the transactional systems that both produce and use raw data as health care is provided. These two components make up the majority of today's health care IT. Above them, the committee envisions a computational model of the virtual patient.

The right side of the figure represents biomedical science and research and its integral role in health care. Again, raw research data about biological and medical phenomena are at the base. Clinical research transactional systems add to and use raw data during the process of executing or running clinical research protocols. At the top are the models and abstractions that constitute biomedical knowledge.

The thread connecting the top three components is what might be called medical logic. Mapping from medical logic to cognitive decision support is the process of applying general knowledge to a care process and then to a specific patient and his or her medical condition(s). This mapping involves workflow modeling and support, usability, cognitive support, and computer-supported cooperative work and is influenced by many non-medical factors, such as resource constraints (cost-effectiveness analysis, value of information), patient values and preferences, cost, time, and so on.

The virtual patient poses the greatest research challenge but is only one component. Smooth integration with other components is the goal.

Box 5.2 describes some of the technical research challenges for patient-centered cognitive support organized by quadrant.

Box 5.2 Research Problems Categorized by Quadrant for Patient-Centered Cognitive Support

• Quadrant 1 (General-applied efforts). Data and process integration technologies, high-quality graphics and sensitive user interface design, coding and application of existing human/health models, application of human language translation technology in some regions
• Quadrant 2 (Health care-applied efforts). Careful use of existing data standards and models, codification of best practices
• Quadrant 3 (General-advanced efforts). Reasoning, machine learning, explanation (why the software reaches a particular conclusion), multimodal interfaces (see Section 5.2.5 below); a model of models that would support needed extensibility
• Quadrant 4 (Health care-advanced efforts). Creation of new advanced models of differential diagnosis; automated machine learning at large-population scale, based on outcomes; a model of models for this domain supporting requisite extensibility

On the non-technical side, a variety of questions arise as to how the use of clinically oriented systems such as those described above might fit into the actual workflow of a health care organization. How would such support fit into the work patterns of future clinicians? What would the impact be on their work efficiency? How and under what circumstances would clinicians trust the output of these systems? How would responsibility for clinical error be apportioned given the integrative functions of these systems? A failure to answer such questions adequately may well impede clinician acceptance of new approaches, even if the technical challenges can be overcome.

The committee's vision for patient-centered cognitive support is not wholly new. Indeed, development of IT-based tools that examine raw data relevant to a specific patient and suggest their clinical implications was the focus of a great deal of medical expert system work a number of decades ago.[6] Similarly, biomedical informaticians have worked for decades on the problem of how best to summarize and present data using visual methods, a point of special import in the setting of hospital intensive care units (ICUs), where multiple streams of real-time data can be overwhelming.[7] Much of that research also had to deal with issues of acceptance by ICU clinicians and with trust of the technology. And the importance of connecting biological knowledge to clinical applications has been given new emphasis by a recent focus on translational research by the National Institutes of Health.[8]

Nevertheless, the committee believes both that new challenges have indeed emerged and that many "old" problems have proven more difficult to address effectively than was first appreciated. Advances in IT such as the World Wide Web and ubiquitous computing challenge the health care IT community to think differently about how to exploit IT for health care purposes.

A final and significant benefit for the committee's vision of patient-centered cognitive support is that patients themselves should be able to make use of tools designed with such support in mind. That is, entirely apart from being useful for clinicians, tools and technologies for patient-centered cognitive support should also be able to provide value for patients who wish to understand their own medical conditions more completely and thoroughly. Obviously, different interfaces would be required (e.g., interfaces that translate medical jargon into lay language), but the underlying tools for medical data integration, modeling, and abstraction designed for patient-centered cognitive support are likely to be the same in any system for lay end users (i.e., patients).
[6] One of the primary lessons from this work was that although well-designed medical expert systems did have potential to improve clinical diagnoses and recommendations for treatment, many other issues needed to be addressed before they were ready for "prime-time" application. In addition, much of the early work on medical expert systems focused on relatively small problem domains, whereas the overarching medical context for improving health care involves the large problem domain of how all of the patient's data and problems fit together.

[7] See, for example, R.A. Fleming and N.T. Smith, "Density Modulation: A Technique for the Display of Three-Variable Data in Patient Monitoring," Anesthesiology 50(6):543-546, June 1979; M.M. Shabot, P.D. Carlton, S. Sadoff, and L. Nolan-Avila, "Graphical Reports and Displays for Complex ICU Data: A New, Flexible and Configurable Method," Computer Methods and Programs in Biomedicine 22(1):111-116, March 1986; I.A. Galer and B.L. Yap, "Ergonomics in Intensive Care: Applying Human Factors Data to the Design and Evaluation of Patient Monitoring Systems," Ergonomics 23(8):763-779, August 1980; Y. Shahar and C. Cheng, "Intelligent Visualization and Exploration of Time-Oriented Clinical Data," Topics in Health Information Management 20(2):15-31, November 1999.

[8] See, for example, Jocelyn Kaiser, "NIH Funds a Dozen 'Homes' for Translational Research," Science 314(5797):237, October 13, 2006, available at http://www.sciencemag.org/cgi/content/full/314/5797/237a.
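The relationship described in Section 5.1 among raw data, patient-specific abstractions, and multicomponent alerts can be made concrete with a toy example. The sketch below is an illustration only and rests on several assumptions: the class `VirtualPatient`, the function `multicomponent_alert`, the signal names, and every numeric baseline and tolerance are hypothetical placeholders with no clinical meaning. It shows one simple form that "medical logic" might take: recent raw readings are summarized relative to a baseline set for the individual patient, an alert fires only when several abstractions match a pattern together, and the raw data remain available for inspection.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class VirtualPatient:
    """Toy stand-in for a virtual patient: per-patient parameters plus raw data."""
    baselines: Dict[str, float]   # patient-specific baseline for each signal
    tolerances: Dict[str, float]  # allowed fractional deviation from the baseline
    raw: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, signal: str, value: float) -> None:
        """Store a raw reading; raw data stay accessible for later review."""
        self.raw.setdefault(signal, []).append(value)

    def abstraction(self, signal: str, window: int = 3) -> str:
        """Summarize the most recent readings as 'low', 'normal', or 'high'."""
        recent = self.raw.get(signal, [])[-window:]
        if not recent:
            return "unknown"
        level = mean(recent)
        base = self.baselines[signal]
        tol = self.tolerances[signal]
        if level > base * (1 + tol):
            return "high"
        if level < base * (1 - tol):
            return "low"
        return "normal"


def multicomponent_alert(patient: VirtualPatient, pattern: Dict[str, str]) -> bool:
    """Fire only when every signal in the pattern is in the named state."""
    return all(patient.abstraction(sig) == state for sig, state in pattern.items())


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the code.
    patient = VirtualPatient(
        baselines={"heart_rate": 70.0, "systolic_bp": 120.0},
        tolerances={"heart_rate": 0.20, "systolic_bp": 0.15},
    )
    for hr, bp in [(90.0, 100.0), (95.0, 95.0), (100.0, 90.0)]:
        patient.record("heart_rate", hr)
        patient.record("systolic_bp", bp)
    pattern = {"heart_rate": "high", "systolic_bp": "low"}
    print(multicomponent_alert(patient, pattern))
```

The point of the sketch is the layering rather than the rule itself: the clinician works with the abstraction and the combined alert, while `patient.raw` keeps the underlying readings for anyone who wants to explore an interpretation in greater depth.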
5.2 OTHER REPRESENTATIVE RESEARCH CHALLENGES

In addition to patient-centered cognitive support, there are many other interesting research challenges relevant to health care for the computer science community. Several examples are provided to illustrate this main point, but there are indeed many more that are not covered in this report.

5.2.1 Modeling

One aspect of the "virtual patient" in Section 5.1 involves modeling various subsystems within a real patient (e.g., different organs, digestive system, and so on) to show how they interact.[9] Such models might operate on different or variable timescales: a model focusing on the absorption of nutrients through the digestive system might operate on a timescale of hours, whereas a model focusing on skeletal health, calcium depletion, osteoporosis, or particular bones might operate over years. Similarly, some models might represent molecular interactions, and others might represent particular cells, organs, or organisms.

To first order, the physiological subsystems of all human beings are identical. Thus, a sensible approach to modeling subsystems in a specific patient is to appropriately parameterize a generic model of those subsystems. But finding appropriate parameterizations for any given model and coupling the different models and the data to drive them pose significant intellectual challenges. Some insight into model interoperability can be gained through the use of ad hoc techniques (e.g., XML-based "mash-ups" [Web applications that combine data from multiple sources] used in Web 2.0 applications) or through other existing component frameworks, but the overall problem of model interoperability for health care purposes is vastly more complex than applications that have been tackled before.

Progress is being made in understanding specific metabolic pathways.[10] The effects of a medication, as well as of some other treatments, are candidates for modeling.
Such models will still require many of the parameters used to manage and classify the data.[11] Genetic makeup, including the capability to produce pathway-controlling enzymes, is one of the most challenging aspects of making such simulations relevant.

[9] The notion of a computational virtual human being that would provide a high-fidelity computational model of a human being that would respond realistically to various stimuli is not new. See, for example, "The Virtual Human Project: An Idea Whose Time Has Come?," Oak Ridge National Laboratory Review 33(1), 2000.

[10] See, for example, www.HumanCyc.org.

[11] See, for example, PharmGKB, a project to curate information that establishes knowledge about the relationships among drugs, diseases, and genes, including their variations and gene products, available at http://www.pharmgkb.org/.

Coupling models will require a computational platform that can support multiple interacting components that can be combined into larger and more complex models. Such a platform must not only support parallel operation of the analytical processes but also allow assembly of hierarchical simulation and information structures, dynamically built, exploited, modified when possible on the basis of individual patient data and statistical aggregates thereof, and abandoned when no longer effective.

At the supporting levels, multiple processing alternatives will exist. Specific, detailed simulations will provide the most specific and current results. Cached results can greatly reduce the computational effort for repeated sub-analyses. Where no analytical methods exist, results from biological or clinical trials or clinician assessments can be provided. Search and interpretation can provide yet another set of inputs. Being able to operate with a variety of computational paradigms in one setting can greatly enhance collaboration among communities that have similar objectives but that now ignore each other. Yet another challenge in modeling is building multilevel models that can successfully couple highly detailed physiologic models to the much looser clinical "models" that typically are based more on phenomenological relationships than on true underlying causes.

Finally, keeping records of predictions and actual patient outcomes will allow incremental tuning of the approach. It will take much experience as well as careful approaches to do so in a way that converges on a stable and more optimal outcome. The actual determination of patient treatment will remain in the hands and minds of the clinician.
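Several of the platform ideas in this section (a generic model parameterized for an individual patient, cached results for repeated sub-analyses, supplied results where no analytical method exists, and a record of predictions against outcomes) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not a proposed design: first-order exponential decay is used only as a familiar stand-in for "a generic model," and the names `simulate_remaining_amount`, `SUPPLIED_RESULTS`, `estimate`, and `PredictionLog`, along with all values, are hypothetical.

```python
import math
from functools import lru_cache
from typing import Dict, List, Tuple


@lru_cache(maxsize=None)
def simulate_remaining_amount(dose: float, rate_per_hour: float, hours: float) -> float:
    """Generic first-order decay model; rate_per_hour is the patient-specific parameter.

    lru_cache stores results so that repeated sub-analyses are not recomputed.
    """
    return dose * math.exp(-rate_per_hour * hours)


# Hypothetical stand-in for results supplied by trials or clinician assessments
# where no analytical model exists. The key and value are placeholders.
SUPPLIED_RESULTS: Dict[str, float] = {"effect_without_model": 0.5}


def estimate(quantity: str, *params: float) -> Tuple[float, str]:
    """Return (value, source), preferring a simulation over supplied results."""
    if quantity == "remaining_amount":
        return simulate_remaining_amount(*params), "simulation"
    if quantity in SUPPLIED_RESULTS:
        return SUPPLIED_RESULTS[quantity], "supplied"
    raise KeyError(f"no model or supplied result for {quantity!r}")


class PredictionLog:
    """Keeps (predicted, observed) pairs so that a model can be tuned incrementally."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[float, float]] = []

    def record(self, predicted: float, observed: float) -> None:
        self.pairs.append((predicted, observed))

    def mean_error(self) -> float:
        """Average of observed minus predicted; positive means under-prediction."""
        return sum(obs - pred for pred, obs in self.pairs) / len(self.pairs)
```

Returning the source alongside the value is a deliberate choice in this sketch: a clinician or a downstream model can then see whether a number came from a detailed simulation or from supplied results, which bears on how much weight it should be given.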
But the feedback that can be provided by bringing data collections, metabolic models, and their processing to an interactive care setting is essential to extract value out of the many technology investments that are in process or being planned. Box 5.3 describes some of the technical research challenges for model- ing organized by quadrant. 5.2.2â Automation The technical definitions of automation allow for multiple forms, depending on the degree of intelligence and autonomy exhibited. Systems that are completely automatic and that can be trusted to work properly without any need for human oversight or attention have proven to be effective and valuable. Systems that require human oversight or control, which in actuality is almost any complex system, fall under the category
RESEARCH CHALLENGES 47 Box 5.3â Research Problems Categorized by Quadrant for Modeling â¢ Quadrant 1 (Generalâapplied efforts). Development of a framework for easy use of existing, piecemeal models, to gain experience and create a framework for evolutionary advance â¢ Quadrant 2 (Health careâapplied efforts). Coding and deployment of existing health care models â¢ Quadrant 3 (Generalâadvanced efforts). Development of models that self- adapt (or propose self-adaptation) on the basis of changing evidence â¢ Quadrant 4 (Health careâadvanced efforts). Integration of multiple models, and development of new models of human-automation interaction and require considerable care in their design and implementation.12 Automatic systems, especially in medicine, do not operate in a vacuum.13 They are part of a complex network, and the outputs and alarms of automatic systems have to be integrated with other components and often interpreted and, when necessary, overridden by human opera- tors. The intermix of different complex systems plus humans provides widespread opportunity for both good and harm. Historically, automated systems have often been developed and deployed quite independently of the others with which they must co-exist, leading to confusing and sometimes contradictory signaling, monitoring requirements, and safety concerns. The result is an ever-growing set of alarms (often indistinguishable from one another) and different operating requirements, meaning that new users may not know how to proceed, yet the proliferation of new systems makes it impossible for training to keep apace. The problem of alert fatigue is well known, as evidenced by the 12For more discussion of this point, see J.D. Lee, âHuman Factors and Ergonomics in Au- tomation Design,â in G. Salvendy (Ed.), Handbook of Human Factors and Ergonomics, 3rd ed., Wiley, New York, pp. 1570-1596, but especially see pp. 1580-1590, 2006; also, T.B. Sheridan and R. 
Parasuraman, âHuman-Automation Interaction,â in R.S. Nickerson (Ed.), Reviews of Human Factors and Ergonomics, Human Factors and Ergonomics Society, Santa Monica, Calif., 2006. 13See, for example, National Research Council, The Future of Air Traffic Control: Human Op- erators and Automation, National Academy Press, Washington, D.C., 1998; National Research Council, Flight to the Future: Human Factors in Air Traffic Control, National Academy Press, Washington, D.C., 1997; National Research Council, The Case for Human Factors in Industry and Government, National Academy Press, Washington, D.C., 1997.
large number of publications and symposia dedicated to this problem in all industries that are affected: aviation, process control, and medicine.14

The worst problem of automatic systems is an issue of trust. If personnel trust them, the trust is often over-generous, so that personnel are apt to believe erroneous indicators and operations for longer than is prudent, or they may neglect to attend to and monitor the system even though it is not fully reliable. Similarly, a lack of trust may also be inappropriate, leading people to add to their workload by continually checking on the operation of a system that is, in fact, quite capable of autonomous operation.

The problems of over- and underautomation have been well documented in other domains and industries, but the committee believes that they have not been appropriately appreciated within the medical community. Much can be gained in an industry by the introduction of more intelligent, more autonomous systems, but the lessons from other disciplines must also be acquired and followed.15 Automation has been implemented most successfully in aviation and process-control manufacturing. Automation is also used in warehousing and traditional manufacturing, as well as in many modern electronic-commerce back-end systems. Stock trading is another example of an activity in which automation can be used successfully. All these cases differ from medicine, however (although prescription filling and checking may come closest to matching order-filling systems), and the lessons they provide cannot be carried over directly into medicine. But drawing on such hard-earned experience as a point of departure for medicine makes good sense.

Finally, the introduction of automation is always a systems problem that intermixes equipment, administrative procedures, and real people. Accordingly, research on automation for medicine will require a multidisciplinary team approach, including technical, medical, and social science expertise. Good design cannot be added on afterward, and intensive cooperative efforts involving people from all disciplines affected by any IT-based system are necessary from the start.

Box 5.4 describes some of the technical research challenges for automation organized by quadrant.

Box 5.4 Research Problems Categorized by Quadrant for Automation
• Quadrant 1 (General–applied efforts). Application of automation systems that exist; more use of business process integration technology as it exists in information technology; application of simple rules that can make a big difference
• Quadrant 2 (Health care–applied efforts). Codification of low-hanging fruit; use of open-source and other community techniques to pool necessary information to produce better automation rules; application of simple things first, like electronic messaging, automated scheduling of various resources, and so on, and an emphasis on avoiding paralysis by analysis
• Quadrant 3 (General–advanced efforts). Explanation, self-testing of efficacy, advanced learning, and management of false-negative and false-positive conditions
• Quadrant 4 (Health care–advanced efforts). Extension of underlying data uses and modeling to improve model precision (e.g., more data feeding into drug interactions systems could be used to reduce false alarms); efforts to ensure that outcomes are known to the system so that it can self-report and learn

14 In the medical domain, see, for example, J. Edworthy and E.J. Hellier, "Fewer But Better Auditory Alarms Will Improve Patient Safety," Quality and Safety in Health Care 14:212–215, 2005; J. Edworthy and E.J. Hellier, "Alarms and Human Behaviour: Implications for Medical Alarms," British Journal of Anaesthesia 97(1):12-17, 2006; A. Otero, P. Felix, F. Palacios, C. Perez-Gandia, and C.O.S. Sorzano, "Intelligent Alarms for Patient Supervision," Proceedings of the IEEE International Symposium on Intelligent Signal Processing, WISP 2007, pp. 1-6, 2007.
15 See, for example, T.B. Sheridan, Humans and Automation: System Design and Research Issues, Human Factors and Ergonomics Society, Santa Monica, Calif. (Wiley Series in Systems Engineering and Management), 2002; D.A. Norman, "The 'Problem' of Automation: Inappropriate Feedback and Interaction, Not 'Over-Automation'," in D.E. Broadbent, A. Baddeley, and J.T. Reason (Eds.), Human Factors in Hazardous Situations, pp. 585-593, Oxford University Press, Oxford, 1990; C.E. Billings, Aviation Automation: The Search for a Human-Centered Approach, Lawrence Erlbaum Associates Publishers, Mahwah, N.J., 1997; D.A. Norman, The Design of Everyday Things, Doubleday, New York, 1990; B. Lussier, A. Lampe, R. Chatila, J. Guiochet, F. Ingrand, M.-O. Killijian, and D. Powell, "Fault Tolerance in Autonomous Systems: How and How Much?," in 4th IARP-IEEE/RAS-EURON Joint Workshop on Technical Challenges for Dependable Robots in Human Environments, Nagoya, Japan, 2005.

5.2.3 Data Sharing and Collaboration

The data relevant to health care are highly heterogeneous, and the types and quantity of data evolve rapidly.
In addition to patient-record information that exists in multiple forms, health care requires data about drugs and diagnoses, including data from signals captured by biomedical devices, voice recordings, and data captured as codes. Data are typically stored in multiple locations on multiple systems. Sometimes such data are stored in structured databases, and in other cases relevant data are found in legacy systems, structured files, and databases and text files behind Web forms. Data are increasingly multimedia and high-dimensional, including voice, imaging, and continuous biomedical signals. Data of various types have different degrees of reliability, ranging from test results (which may be quite conclusive) to patient-provided data (which could contain significant biases). Numerous health care IT challenges require the ability to share and integrate data across multiple systems and seamlessly move data from one system to another.

To exploit highly heterogeneous data effectively, users (such as caregivers, medical researchers, and patients) need the ability to pose queries that span multiple data sources without requiring the data to be standardized or requiring the user to query each database in isolation. That is, the user wants a single interface through which any query can be posed.

Today, the challenge for data integration, by which is meant systems that enable data owners to share data and collaborate in flexible ways without having to store all the data in a single repository or have them all conform to a common schema, is understood from the systems and logical perspectives. One approach is to aggregate patient health care information into a common data repository [C4O14]. Although aggregation is a basic building block of data integration, aggregating all relevant data into a single repository is likely to be infeasible. As a result of a significant amount of research, there are commercial systems today that are capable of answering queries that span multiple sources without loading all the data into a single warehouse with a uniform schema. The user of such a system accesses the data through an abstraction called a mediated schema, and queries are then reformulated from the mediated schema onto the relevant data sources using a set of semantic mappings. These systems perform adequately, and the small additional cost of accessing remote systems at query time is offset by the management benefits of having systems that can share locally owned and maintained components.
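The mediated-schema idea described above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the two in-memory "sources," their field names, the mediated attribute names, and the function `query_mediated` are hypothetical and do not describe any particular commercial system. The sketch only shows the general pattern of rewriting a query from mediated attribute names into each source's local names, evaluating it where the data live, and translating results back.

```python
from typing import Dict, List

# Two hypothetical sources that stay in place, each with its own local schema.
LAB_SYSTEM = [
    {"pt_id": "p1", "test_nm": "HbA1c", "val": 7.2},
    {"pt_id": "p2", "test_nm": "HbA1c", "val": 5.4},
]
CLINIC_SYSTEM = [
    {"patient": "p1", "lab_test": "LDL", "result": 130.0},
]
SOURCES = {"lab": LAB_SYSTEM, "clinic": CLINIC_SYSTEM}

# Semantic mappings (illustrative): mediated attribute -> local attribute.
MAPPINGS = {
    "lab": {"patient_id": "pt_id", "test": "test_nm", "value": "val"},
    "clinic": {"patient_id": "patient", "test": "lab_test", "value": "result"},
}


def query_mediated(filters: Dict[str, object]) -> List[Dict[str, object]]:
    """Answer an equality query posed against the mediated schema.

    The filter is rewritten into each source's local attribute names,
    evaluated against that source, and matching rows are translated
    back into mediated attribute names.
    """
    results = []
    for name, rows in SOURCES.items():
        mapping = MAPPINGS[name]
        # Reformulate the query: mediated names -> this source's names.
        local_filters = {mapping[attr]: v for attr, v in filters.items()}
        for row in rows:
            if all(row.get(k) == v for k, v in local_filters.items()):
                out = {med: row[loc] for med, loc in mapping.items()}
                out["source"] = name  # keep track of where the row came from
                results.append(out)
    return results
```

A call such as `query_mediated({"patient_id": "p1"})` returns rows from both hypothetical sources under one set of attribute names, without copying either source into a shared warehouse. The hand-written `MAPPINGS` table is exactly the part that, as the text goes on to note, is expensive to create and maintain in practice.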
The main shortcoming of current data integration systems is that they are too hard to use. Designing a mediated schema and creating the semantic mappings between the sources and the mediated schema entail a significant effort that requires considerable subject-matter expertise. This is especially true when the schema is large, complicated, and likely to be continually evolving, as in the case of health care data. As a consequence, integration projects often fail midway, since the costs of this design work are incurred up front before the benefits from that work are obtained.

The above challenge suggests three specific research directions:

• Data integration systems that are fundamentally easier to use. The system should be able to examine the data sources available and suggest to the designers a possible mediated schema and mappings from the data sources to semantically related entries in the mediated schema. The system should point to gaps in the coverage of the data sources so that additional sources can be discovered or enhanced. The system should present to the designer effective visualizations of the data and the schemata to further facilitate the process. Gaps in the system's coverage can be detected by analyzing queries (e.g., frequent queries asking for an attribute of a patient that is not represented in any of the data sources of which the system is aware).
• Data integration that can proceed incrementally. It should not be necessary to completely integrate data sources in order to get some benefit from the collection of sources. One approach to reducing the effort required in data integration is what might be called "pay-as-you-go" data integration. A design goal should be the construction of systems that offer access to multiple data sources with little or no human effort, and that improve over time as the users realize where integration is needed most. For example, a system could begin by guessing approximate (and possibly incorrect) semantic mappings; over time, semantic mappings would be improved, thereby enabling more comprehensive answers to queries over the collection of data sources. Some of the specific challenges to obtaining such systems are (1) leveraging user interactions with the system to understand the semantics of the data, (2) developing collaborative techniques for improving the semantic cohesion of a collection of data sources, and (3) maintaining compatibility of incremental integration efforts with previous versions.
• More flexible architectures for data sharing and integration. Currently, the common architecture for such systems envisages a single mediated schema and mappings to that schema.16 While this architecture has the advantage that the data can still remain in the sources and be managed there, the creation of the mediated schema is still a centralized effort. Systems are needed that enable data owners to share data in a more ad hoc fashion and extend the coverage of data sharing as they see fit.17 Peer-to-peer architectures are needed for sharing data whereby it is easy to (1) discover data sources, (2) join the network of available sources without significant effort, and (3) retain control over the data and its privacy as necessary.18 In addition, such a system should enable tracking different versions of the data as the data evolve over time, and highlight the changes when appropriate.

16 See, for example, a common architecture for enterprise information integration products from IBM (http://www-01.ibm.com/software/data/integration/) and BEA (now Oracle) (http://edocs.bea.com/liquiddata/docs81/index.html).
17 This embodies the philosophy underlying the Semantic Web approach.
18 See, for example, Gio Wiederhold, "Mediators in the Architecture of Future Information Systems," IEEE Computer 25(3):38-49, March 1992.

Box 5.5 Research Problems Categorized by Quadrant for Data Sharing and Collaboration
• Quadrant 1 (General–applied efforts). Application of known data integration technology, ontology management and analysis tools, and state-of-the-art search techniques (including user-machine learning and information retrieval technology to enable systems to self-tune)
• Quadrant 2 (Health care–applied efforts). Application of existing ontologies and knowledge sources in scalable, efficient systems
• Quadrant 3 (General–advanced efforts). Development of easier-to-use data integration and ontology management systems, to allow for incremental creation and annotation of semantic information; work toward understanding how to decide when and where semantics must be added, and when semantics can be induced based on raw information stored and usage models
• Quadrant 4 (Health care–advanced efforts). Advanced privacy management that supports needs for aggregative, epidemiological research

If these challenges can be met, it will be much easier to build and deploy data integration systems that require minimal set-up time and provide valuable services without specifying complete and accurate semantic mappings. For example, certain data regarded as critical might be made interoperable through explicitly designed semantic mappings. But all data might be made available (i.e., visible) subject to control for confidentiality even if no mappings had been created. A care provider needing data for which no mappings were available would have to work harder to query those data, but those data would at least be visible and usable for clinical purposes. If and when a need is recognized for making a particular class of data semantically consistent, mappings could be created, and the system's overall interoperability could be incrementally improved.

Box 5.5 describes some of the technical research challenges for data sharing and collaboration organized by quadrant.
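The "pay-as-you-go" idea discussed in this section (start with guessed, possibly wrong mappings and improve them as users interact with the system) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method proposed in the text: the mediated attribute names, the use of simple string similarity from Python's standard `difflib` module as the guessing heuristic, the 0.5 threshold, and the function names `guess_mappings` and `confirm` are all hypothetical choices.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical mediated-schema attributes, chosen for illustration.
MEDIATED_ATTRS = ["patient_id", "medication", "dose"]


def guess_mappings(source_attrs: List[str], threshold: float = 0.5) -> Dict[str, dict]:
    """Guess mediated-attribute -> source-attribute mappings by name similarity.

    Guesses are approximate and may be wrong; each carries a confidence
    score and is marked as unconfirmed. Attributes with no sufficiently
    similar source name are simply left unmapped for now.
    """
    guesses: Dict[str, dict] = {}
    for med in MEDIATED_ATTRS:
        best, best_score = None, 0.0
        for src in source_attrs:
            score = SequenceMatcher(None, med.lower(), src.lower()).ratio()
            if score > best_score:
                best, best_score = src, score
        if best is not None and best_score >= threshold:
            guesses[med] = {
                "source_attr": best,
                "confidence": best_score,
                "confirmed": False,
            }
    return guesses


def confirm(mappings: Dict[str, dict], mediated_attr: str, source_attr: str) -> None:
    """Record a user-supplied mapping; it overrides any earlier guess."""
    mappings[mediated_attr] = {
        "source_attr": source_attr,
        "confidence": 1.0,
        "confirmed": True,
    }
```

With a hypothetical source exposing `["patientid", "drug_name", "dose_mg"]`, the name-similarity heuristic maps `patient_id` and `dose` but finds nothing close enough for `medication`; a later call to `confirm(mappings, "medication", "drug_name")` fills that gap. The data are usable from the first step, and integration effort is spent only where users find it is needed.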
To illustrate the importance of data integration, consider its application to the personal health record. In its ideal future form (not that of today), a personal health record contains an individual's entire medical history, that is, from all interactions with all health care providers (and self-provided care as well), and is under the control of the patient.19 For information to be easily accessible to the patient, data supplied by different providers (likely each with their own local health care IT systems generating data in idiosyncratic formats and with different meanings) must be integrated in such a way that they appear to have common semantics. Data protection (a key element of personal health records, in that the patient is empowered to apply fine-grained control of the information contained therein) also requires that patient-specified security and privacy policies act on all data elements referring to the targets of those policies. This requirement presents yet another data integration task.

19 See, for example, Kenneth D. Mandl and Isaac S. Kohane, "Tectonic Shifts in the Health Information Economy," New England Journal of Medicine 358(16):1732-1737, April 17, 2008.

5.2.4 Data Management at Scale20

Presuming the existence of large integrated corpora of data (the focus of Section 5.2.3 on data integration), another major challenge is in managing those data. Some of the important dimensions of medical information management include:

• Annotation and metadata. Raw data almost never speak for themselves, and their interpretation inevitably relies on metadata, that is, annotations to the primary data that provide the necessary context. For example, the primary data for the human genome consist of a sequence of some 3 billion nucleotides. Metadata associated with the primary data help scientists to identify significant patterns within those data; a given sequence might be annotated as a gene or a regulatory element. Metadata could also be used to trace the provenance or lineage of data. For example, the value of certain data in an electronic health record could be enhanced if the data included information about the conditions under which they were obtained (e.g., physician observations of a patient's description of symptoms might be accompanied by video and audio recordings of the session with the patient). With metadata, a primary problem is the design and development of tools to facilitate machine-readable annotations in large databases.
• Information extraction from text. The volume of medically significant information rendered in text form (e.g., physician or nursing notes) is large, and may in various instances be as significant as, or more significant than, information rendered in other forms (e.g., lab instrument readings).
Extracting useful medical information from textual notes is therefore an important problem that calls for computer science expertise in text processing, natural language processing, and statistical text-mining techniques as well as medical expertise to understand the concepts and ideas to which the information refers. New techniques are needed for extracting information such as patient names, doctor names, medicine names, and disease names from textual notes, and for generating automatic linkages between different relevant entities. Such extraction would make it possible to piece together a larger picture automatically while pulling information from multiple heterogeneous data and information sources. Extraction of data from tables and figures in reports is another example of a useful information extraction capability.
• Linkage. Clinicians often rely on multiple types of data to render a diagnosis (e.g., blood tests, clinical observations, and imaging). Relationships between different types of data are best captured in ontologies,21 which are descriptions of concepts and relationships that exist among the concepts for a particular domain of knowledge. In addition to providing controlled, hierarchically structured vocabularies for medical terminology, they specify object classes, characteristics, and functions in ways that capture important concepts and relationships between those concepts (perhaps in a given area, such as internal medicine or cardiology or oncology). Ontologies containing such information facilitate the representation of working hypotheses and the evidence that supports and refutes them in machine-readable form, and can help clinicians reason their way through complex cases. Ontologies must also be revisable in the light of new research that may discover previously unknown relationships or develop new interpretations of existing concepts. An important research problem is thus the design of appropriate ontologies and automated approaches to populating and updating them through sources such as medical dictionaries, textbooks, and recent articles in the relevant literature, although it is an open question to what extent declarative approaches can capture and exploit all the relevant relationships. Falling back to programmed solutions provides an escape route and should remain possible, so that implementations can be put into practice, provide feedback, and thus enable progress.
• Privacy. Epidemiological research and phase IV drug testing (post-approval) both depend on the aggregation of select medical data from large numbers of individual records, even if individual identities need not be associated with these data. The electronic storage of these records facilitates such aggregation, but aggregation on a large scale also has many privacy implications. An important research problem is thus how to mine these data without unduly compromising individual privacy when individuals have not explicitly granted data access permission. Additionally, even outside the world of epidemiological research, the management of data in ways that permit data sharing among those with a need to know, while prohibiting other access, is a significant technical challenge.
• Scale and other systems issues. There are many challenges in creating and implementing the protocols and systems that will allow a variety of interlocking systems to provide a robust, high-performance information store that can be reliably and easily accessed by a variety of different classes of users, ranging from the patient and her designees to caregivers. For example, interlocking health care IT systems must enable and preserve the relationships among the different applications and workflows. In addition, the need to store data for a lifetime presents significant technical challenges, if only because the storage lifetime could exceed the lifetime of some organizations.
• User interface. While technically not data management per se, the data models, data federation technologies, and security and privacy approaches must all support the wide variety of usage that is expected. What an emergency room physician needs is very different from what is required by a physician reviewing the data with an eye toward wellness, a point understood by at least some in the biomedical informatics community since the 1980s.22 Visualization tools that help users integrate and manage data pulled from multiple sources might also be considered part of a sophisticated user interface, and coupled with analytic techniques may help to solve problems that cannot be solved using analytic techniques alone.

20 An extended discussion of the data management challenges in biomedical data can be found in National Research Council, Catalyzing Inquiry at the Interface of Computing and Biology, The National Academies Press, Washington, D.C., 2005.
21 The term "ontology" is a philosophical term referring to the subject of existence. The computer science community borrowed the term to refer to "specification of a conceptualization" for knowledge sharing in artificial intelligence. (See, for example, T.R. Gruber, "A Translation Approach to Portable Ontology Specification," Knowledge Acquisition 5(2):199-220, 1993.)

There are many more dimensions to the problem than those described above, which are intended to be illustrative rather than exhaustive. In addition, Box 5.6 describes some of the technical research challenges for data management at scale organized by quadrant.
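As a small illustration of the privacy dimension above, the following sketch aggregates individual records into group counts and withholds counts for groups that are too small to report safely. Small-cell suppression is a technique chosen here purely for illustration; the text does not prescribe it, and on its own it does not guarantee privacy. The record layout, the function name `aggregate_counts`, and the default minimum cell size of 5 are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional


def aggregate_counts(
    records: Iterable[Mapping[str, object]],
    group_key: str,
    min_cell_size: int = 5,
) -> Dict[object, Optional[int]]:
    """Count records per group without exposing individual identities.

    Only the grouping attribute is read, so direct identifiers never
    appear in the output. Groups with fewer than `min_cell_size` records
    are reported as None (suppressed) rather than with an exact count,
    because very small groups can make individuals re-identifiable.
    """
    counts = Counter(record[group_key] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }
```

For example, given six hypothetical records with diagnosis "asthma" and two with a rare diagnosis, `aggregate_counts(records, "diagnosis")` reports 6 for asthma and None for the rare condition. Deciding how much such suppression actually protects individuals, and how much analytic value it costs, is part of the open research problem described above.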
In summary, the problems addressed in Section 5.2.3 and in this section are core problems whose solution would lead to the creation of health care records with enormously diverse applications. These applications include providing the information that would, among other things, (1) power the virtual patient described in Section 5.1, (2) provide a strong foundation for epidemiological research, (3) improve communication throughout the caregiver ecosystem, and (4) offer information storage and retrieval that would enable patients and their family and friends to be more involved in their own health care.

22 See, for example, Eric Sherman and Edward Shortliffe, "A User-Adaptable Interface to Predict Users' Needs," pp. 285-315 in M. Schneider-Hufschmidt, T. Kuhme, and U. Mallinowski (Eds.), Adaptive User Interfaces, Elsevier, Amsterdam, 1993. User customization of an interface is an example of an idea that was difficult to implement with the technology of 20 years ago but is much more feasible with today's technology and just as important to pursue today.
Box 5.6 Research Problems Categorized by Quadrant for Data Management at Scale
• Quadrant 1 (General–applied efforts). Creation of systems that scale, using notions of "cloud" computing, coupled with local information to reduce management complexity
• Quadrant 2 (Health care–applied efforts). Compression, understanding of what to store and what not to store, prioritization of information; privacy of patient information
• Quadrant 3 (General–advanced efforts). Techniques for correcting or coding degrees of accuracy and precision in data; techniques for learning about and forming aggregate data sets; automated management techniques for large, highly valuable data sets that are often used across many organizations
• Quadrant 4 (Health care–advanced efforts). Applications for handling inaccurate data to improve input to health care data models, better coding techniques for information

5.2.5 Automated Full Capture of Physician-Patient Interactions

As noted above, care providers spend a great deal of time in documenting their interactions with patients. Automated capture of patient-provider interactions would release such time for more productive uses and help to ensure more complete and more timely patient records.

A comprehensive environment for capturing interactions would necessarily be multimodal, involving ways of capturing and interpreting visual images and conversations. Rather than one general-purpose environment, capture environments would likely be specialized to different settings, such as hospital room (e.g., nurse/patient), emergency room (e.g., ER physician/patient), routine consultation (e.g., primary care provider/patient), and specialist consultation (e.g., cardiologist or surgeon and patient). Some of the important dimensions in this problem domain include:

• Real-time transcription and interpretation of the dialog between patient and provider. Individual voices must be identified as being associated with the provider or the patient. The transcript must be parsed unambiguously, irrelevant information identified and ignored, and relevant information interpreted.
• Summarization of physical interactions between patient and provider based on the interpretation of images recorded by various cameras in the patient care room. In a hospital room, the system must be able to distinguish between the administration of an intravenous antibiotic and a tubal feeding. In an examination room, the system must be able to identify parts of the body to which the patient or provider is pointing and correlate such gestures with the dialog. In all settings, cameras should be able to identify documents presented to patients, and to capture written annotations made by patient or provider, subject to appropriate privacy safeguards. The goal would be a system able to produce a useful summary and/or the equivalent of a video transcript that describes what happened.
• Transcript visibility for patients, and patients' ability to correct and annotate the transcript.
• Correlation of the information contained in the audio and visual transcripts. Use of both types of information should increase the accuracy and utility of the resulting summaries.

Some pieces of this technology exist, but even when they do, integrating them and making the results available smoothly, with little latency, are challenges to today's computer science. Box 5.7 describes some of the technical research challenges for automated full capture of physician-patient interactions organized by quadrant.

Box 5.7 Research Problems Categorized by Quadrant for Automated Full Capture of Physician-Patient Interactions
• Quadrant 1 (General–applied efforts). Use of photographic technology, integration of sensor systems (perhaps, from the simple temperature sensor to imaging), use of speech dictation for transcription and/or indexing of audio files, natural language processing on existing textual records
• Quadrant 2 (Health care–applied efforts). Creation of high-quality workflows, customization of physical devices for the hospital environment (e.g., with due regard for infection control and to minimize physician/patient distance), creation and use of appropriate language models to maximize machine capabilities, workflows to make transcripts available to patients, use of software systems post-visit to provide information
• Quadrant 3 (General–advanced efforts). Ever-improved speech recognition, multimodal interface development, summarization and extraction of key information, sentiment analysis, automatic privacy management
• Quadrant 4 (Health care–advanced efforts). Development of new modes of caregiver-patient-computer interaction where the interaction is tri-partite and the computer is not "in the way"; advanced empirical, health care informatics work aimed at understanding how to efficiently acquire and provide information via computer systems
Lastly, a key non-technical issue to be faced by any full-capture system is patient acceptance. In some of today's interactions between clinician and patient, a patient may rely on a clinician's discretion to refrain from entering into the record certain sensitive information related by the patient. In the absence of believable assurances in full-capture clinical interactions that such sensitive information will not be recorded, patients may well be less forthcoming or complete in their accounting of their medical histories and circumstances. Such problems will have to be addressed before any such system will be widely acceptable.