Past and Current Evaluations of Telemedicine
Evaluation is a form of applied research. It seeks not just to build knowledge but also to provide information useful to decisionmakers. As discussed in Chapter 1, decisionmakers have sharply increased their demand for evidence of the effectiveness of medical technologies. Clinicians, manufacturers, health officials, and others interested in the potential benefits of telemedicine face the challenge of securing resources and devising strategies for well-designed studies to provide decisionmakers with credible information on the comparative benefits, risks, and costs of specific telemedicine applications. Evaluations that will help identify acceptable "low-cost, low-technology" strategies are particularly important in an environment dominated by cost concerns.
Most clinical applications of telemedicine—in common, it must be recognized, with many other clinical practices—have not been subject to rigorous comparative studies to assess their effects on the quality, accessibility, cost, or acceptability of health care. The literature on telemedicine reveals a general consensus on this evaluation deficit and the importance of correcting it. The following statements illustrate the point:
With the exception of image-oriented subspecialties such as teleradiology and telepathology, few clinical studies have documented the accuracy, reliability, or clinical utility of most applications of telemedicine as a primary
diagnostic or therapeutic modality. With few exceptions, clinical studies … are descriptive rather than analytic [Perednia and Allen, 1995, p. 486].
There is little research demonstrating the effects and effectiveness of telemedicine. … Some commentators assert that it is effective across the board, whereas others are less sanguine. … It is clear that lingering questions will be answered only by well-designed and carefully conducted research [Grigsby, 1995a, p. 20].
To date, the appeal of telemedicine remains largely intuitive and based mostly on logical speculation and anecdotal evidence [Bashshur et al., 1994, p. 9].
With telemedicine there's sometimes more furor than fact [Wood, quoted in Scott, 1994, p. 35].
The paucity of evaluation research and results for telemedicine applications and programs was the primary stimulus for this study. It also is a factor in the reluctance of third-party payers, including Medicare, to extend coverage and set payment rates for telemedicine technologies. Health plans and institutions that receive capitated or per-case payments likewise want evidence that investments in telemedicine make sense.
Evaluation Efforts in Telemedicine
Recognizing the limited evaluative base for telemedicine applications, a number of government and private organizations have moved to support improved evaluation strategies and frameworks, to fund evaluation research, or to require demonstration project grantees to conduct their own internal evaluations and participate in related activities. These organizations include the National Library of Medicine, the Health Care Financing Administration, the Agency for Health Care Policy and Research, and the Office of Rural Health Policy in the U.S. Department of Health and Human Services; the National Telecommunications and Information Administration and the National Institute of Standards and Technology in the Department of Commerce; the Department of Veterans Affairs; and various Department of Defense units under the lead of the Surgeon General of the Army. Many federal agencies are, however, facing budget cuts. The burden of supporting telemedicine research may, as a result, fall more heavily on the military and the veterans' health
systems, both of which have internal economic incentives to evaluate the utility of telemedicine.
In addition to federal agencies, some state governments, foundations, and health care delivery organizations have provided support for demonstration projects or other evaluations. Private industry has sponsored research to document for regulators the safety of medical devices used for telemedicine and has provided resources in-kind (e.g., hardware and software) for various evaluation projects.
Evaluation initiatives fall into three broad categories: frameworks for evaluation; actual evaluations; and training or technical assistance in evaluation. Those involved in these efforts are aware of the problems encountered by the demonstration projects funded in the 1970s, including the small numbers of patients, high cost per patient served, less than satisfactory equipment, and inattention to the organizational, behavioral, and financial conditions for sustaining programs beyond the grant period.
The rest of this chapter reviews these evaluation initiatives. Most involve "works in progress." Evaluation plans change as sponsors and researchers attempt to implement them. Thus, these descriptions, although they have been reviewed by those involved in the projects, are provisional.
The following discussion highlights both obstacles to telemedicine evaluations and attempts to overcome them. Although the problems are not unique to telemedicine, the committee concluded that evaluators in the field face particular problems in generating sufficient cases for analysis, securing appropriate comparison sites, projecting costs, and maintaining the evaluation of quality, access, and cost outcomes for a reasonable length of time.
The emphasis on evaluation frameworks reflects both a desire to encourage systematic evaluation and a concern with the particular challenges of evaluating telemedicine. These challenges lie not so much in evaluating particular pieces of equipment (e.g., digital cameras) as in evaluating applications and processes of care that combine complex technologies, people, and organizational systems in varied ways to fit different institutions, environments, and objectives.
Joint Working Group on Telemedicine
The federal Joint Working Group on Telemedicine (described in Chapter 4) has, among a number of other tasks, attempted to develop a broad evaluation framework for telemedicine (Puskin et al., 1995). The goal was not a single evaluation strategy for all agencies but rather a document and a discussion process that would strengthen evaluation designs and promote comparable evaluations to build a stronger base of knowledge about telemedicine.
As discussed with the committee, the working group identified three "operative goals" for evaluations under the joint framework. The first was to maximize the data collected by seeking sound ways to aggregate comparable data from both civilian and military projects. The second was to minimize the duplication of resources by developing and maintaining a centralized listing of data collection protocols. The third was to minimize the burden on individual data collection sites, when possible, by centralizing data collection for site information common to multiple studies.
The working group identified different kinds of evaluations that called for different evaluation strategies. They distinguished (1) early "proof of concept" studies that tested the basic feasibility and logic of the intervention; (2) assessment studies to further demonstrate operational feasibility and perceived value in the field; and (3) clinical trials that more rigorously collect and analyze data on the intervention's effect.
Finally, the group identified central questions to "measure the impact of telemedicine within the national health care delivery system" in six key areas or domains. The questions were
- Are acceptable clinical outcomes associated with the use of telemedicine?
- Is the system technically acceptable?
- How well is the system integrated into the overall health system?
- What are the costs and benefits in day-to-day operations? Is the system affordable?
- Will patients and providers accept and value telemedicine-enabled care?
- Will the use of telemedicine improve access to health care?
Department of Defense
Work initiated in the Department of Defense (DOD) has played an important role in the activities of the Joint Working Group.1 Within the DOD, the Assistant Secretary of Defense for Health Affairs established the Telemedicine Testbed to promote and manage digital telecommunications technologies within the Military Health Services System. The Army Surgeon General and the Commander of the Army Medical Research and Materiel Command at Fort Detrick, Maryland, were designated to lead these efforts with the Medical Advanced Technology Management Office, which includes personnel from the three services, coordinating activities at the program level.2
Testbed results are intended primarily to inform military decisionmakers, but an important additional objective is to collaborate with and contribute information relevant to civilian decisionmakers (Zajtchuk, 1995; Chestertown Roundtable, 1995). For example, the Army convened a two-week session at Tripler Army Medical Center that focused on the construction of clinical indicators for evaluation. Tripler is the hub of a developing telemedicine system that is to cover military operations around the Pacific Rim and to involve other governmental and private organizations in improving access to care in more remote parts of the region.
As a relatively self-contained system, the military offers a number of attractions for evaluators compared to the civilian sector. These include a large, defined population; an integrated health care delivery and financing system; salaried, full-time personnel; freedom from state regulation; multiple sites for comparing care alternatives and providing data; well-developed research and development
(R&D) resources; integrated medical records and better access to follow-up data on patients; and a command structure that can promote cooperation across diverse sites. As a major purchaser of information and communications technologies, the military also has the leverage to stimulate the development of better vendor data on the effectiveness and costs of relevant hardware and software.
The initial DOD work has focused on evaluation strategies and tools and early "proof of concept" and description evaluations. For example, during U.S. operations in Somalia in 1993, the Army tested several technical, clinical, and administrative components of a new telemedicine support system for troops deployed in military or peacekeeping missions. A model is being developed to define medical specialty requirements for deployed troops, but it should also be applicable during peacetime.
Department of Veterans Affairs
Like the military, the health care system operated by the Department of Veterans Affairs (VA) has characteristics that are attractive to those evaluating telemedicine. In addition, the VA has, over several years, developed a fairly comprehensive and flexible patient information system that can integrate text, test results, and images in a computer-based patient record (Dayhoff, 1996).
Taking advantage of the system's health services research capacity and its link with academic medical centers, clinicians at a number of medical centers are already engaged in or planning evaluations of various clinical applications of telemedicine. For example, this chapter cites activities involving the VA Medical Centers in Baltimore and Palo Alto.
The committee learned that a task force had proposed a coordinated telemedicine evaluation strategy for the Department of Veterans Affairs but that no decision had been made to adopt and implement it. The department has, however, moved to inventory the system's telemedicine activities. Results indicate that most VA Medical Centers (VAMCs) either have some kind of operational telemedicine program or are planning one (VA, 1996). Beyond ordinary telephone-based consultation and triage, the most common clinical application appears to be teleradiology. Interest in telemedicine applications is likely to grow as a result of recent policies to shift more care from inpatient to outpatient settings and to encourage
more regional coordination among medical centers (McAllister, 1996).
Health Care Financing Administration
As part of a continuing initiative to evaluate telemedicine and inform Medicare coverage policies, the Health Care Financing Administration (HCFA) has supported a number of projects intended to provide general guidance for telemedicine evaluations as well as data on the effects of particular applications of telemedicine. Two of these projects are described immediately below. A third (involving the University of Michigan and the Medical College of Georgia) is described later.
Center for Health Policy Research
In 1993 and 1994 with support from HCFA, Grigsby and his colleagues at the Center for Health Policy Research (CHPR, which is affiliated with the Center for Health Services Research and the University of Colorado Health Sciences Center) presented a series of four reports on telemedicine (Grigsby et al., 1993, 1994a,b,c). (Their specific research projects are described later in this chapter.) HCFA asked the CHPR to develop an evaluation framework and a general strategy for assessing the effects and effectiveness of telemedicine. The key components of the framework included
- a conceptual framework with three dimensions (technological adequacy, medical effectiveness, and appropriateness);
- a taxonomy and classification of telemedicine applications; and
- recommendations for telemedicine research on medical effectiveness, cost, access, utilization, acceptance, payment, and related issues.
The researchers noted that although telemedicine is more than simply hardware and software, "a crucial aspect of the conceptual framework … is the delineation of a method for establishing minimally acceptable system parameters and standards for hardware and software" (Grigsby et al., 1993, p. 3.2). Such evaluations can provide important information for those responsible for developing national
and international standards, as described in Chapter 3. Technological adequacy was not directly defined but informally described as whether a technology is "good enough for now" for the intended purposes and circumstances. The researchers argued that evaluators need better strategies for assessing the adequacy of the
- input data—including its quality (e.g., image resolution, sound quality), the speed of the equipment for encoding and delivering it to the main transmission medium, and the quality of any data compression and other pretransmission modification of the data;
- transmission of data—based on the bandwidth (information carrying capacity) of the communications medium, its cost, and practicality; and
- display of data received—including the quality of the images, sound, or other information, and the options for enhancing or otherwise manipulating the information (e.g., increasing or decreasing contrast).
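The adequacy of data transmission, in particular, turns on simple arithmetic relating image size, compression, and bandwidth. The sketch below illustrates the interaction; the image dimensions, link speeds, and compression ratio are invented for illustration and are not drawn from the CHPR reports.

```python
# Illustrative transmission-time arithmetic for a digitized radiograph.
# All figures are hypothetical examples, not report data.

def transmission_seconds(pixels_x, pixels_y, bits_per_pixel,
                         bandwidth_bps, compression_ratio=1.0):
    """Estimate the seconds needed to send one image over a link.

    compression_ratio: e.g., 10.0 means the compressed image is
    one-tenth its original size.
    """
    raw_bits = pixels_x * pixels_y * bits_per_pixel
    return raw_bits / compression_ratio / bandwidth_bps

# A 2048 x 2048, 12-bit image over a 56 kbps modem, uncompressed:
uncompressed = transmission_seconds(2048, 2048, 12, 56_000)

# The same image with 10:1 compression over a 1.544 Mbps T1 line:
compressed_t1 = transmission_seconds(2048, 2048, 12, 1_544_000, 10.0)

print(f"56 kbps, no compression: about {uncompressed / 60:.0f} minutes")
print(f"T1 line, 10:1 compression: about {compressed_t1:.1f} seconds")
```

The example shows why bandwidth, compression, and image quality trade off against one another: aggressive compression makes slow links practical but raises the question, central to the CHPR framework, of whether the displayed image remains diagnostically adequate.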
The CHPR discussion of medical effectiveness is consistent with the definition offered in Chapter 1 (results under normal conditions of use) and emphasizes the need for comparison with conventional services. The discussion focuses on practical strategies such as (a) narrowing the range of conditions and indicators of effectiveness to be studied; (b) establishing minimal levels of diagnostic accuracy for particular applications and conditions; and (c) assessing the appropriateness (a combination of effectiveness, cost-effectiveness, and acceptability to patients and physicians) of using a technology in specific health care environments (e.g., rural areas) and for specific clinical problems and types of patients (e.g., gynecological examinations).
The proposed taxonomy sorted telemedicine applications according to the level of evidence or consensus about their effectiveness, a key criterion for coverage. Applications or aspects of evaluations can be described as (a) effective; (b) probably effective; (c) not demonstrated as safe and effective; or (d) new and untested.
For purposes of HCFA coverage policy (as governed by statutes and regulations), even the first category—applications that are judged effective—may raise additional questions about implementation and economic impacts that warrant pilot tests designed to guide explicit
coverage decisions and monitoring strategies. Examples of such questions include how to structure supervision, consultations, and payments for nonphysician primary care providers in remote sites. The "probably effective" category generally comprises applications that have not been the subject of full-fledged evaluations describing the basic characteristics of their implementation and impact. Telepsychiatry falls into this category. The third category (not demonstrated as safe and effective) includes applications for which procedures or standards for safe and effective use have not been established or sufficiently refined to warrant routine use. For example, in radiology, some consider the safety and effectiveness of digital mammography inadequately documented, although the technology is already being employed on a limited basis. The final category of new and untested technologies includes those that are clearly experimental, such as remote surgery.
Grigsby more recently suggested that three coverage-relevant categories would be sufficient: (1) effective; (2) probably effective but with unknown effect on the health care system (e.g., increased costs); and (3) not demonstrated as effective (and with serious ramifications if ineffective) (personal communication, March 7, 1996). In addition, in a recent article, Grigsby and colleagues proposed three key questions for evaluation (Grigsby et al., 1995, pp. 126–127): (1) Are specific telemedicine applications medically effective means of delivering health care? (2) What are the costs involved in specific telemedicine applications, and are these applications cost-effective means of providing health care? (3) What processes of telemedicine are associated with optimal health outcomes? The group also proposed two key policy questions: (4) Can appropriate use be defined? and (5) How should payment for telemedicine services be handled?
Telemedicine Research Center
One of the problems in evaluating telemedicine applications is the small number of cases generated by most demonstration or pilot projects (Crump and Pfeil, 1995; Perednia, 1995). The Telemedicine Research Center, an independent, nonprofit organization located in Portland, Oregon, has created a Clinical Telemedicine Cooperative Group (CTCG) to promote the pooling of information from multiple telemedicine evaluations (Perednia, 1996). Taken as a whole, the center's work includes both elements of an evaluation framework
(e.g., generally applicable concepts and protocols) and actual evaluations (e.g., compilation and analysis of data).
Working from a model provided by cooperative oncology research networks and with funding from HCFA, the CTCG involves subscribers (e.g., research projects) who are permitted to use the research tools (e.g., questionnaires) developed by the group in exchange for a small subscription fee and an agreement to contribute their data for aggregation and analysis (Perednia, 1995). The subscribers' research projects should have some common components (e.g., certain questions asked of patients), although they might differ in other respects.
Such efforts need to assess when data (e.g., patient satisfaction, utilization rates) can be meaningfully aggregated for disparate applications or when the data are meaningful only if like uses are pooled. The latter category includes information on accuracy rates for specific diagnoses or outcomes for specific clinical applications. For a multisite cooperative evaluation effort, these different situations affect the choice of questions asked and the way responses are analyzed. Obviously, questions about characteristics of a skin lesion make no sense for telepsychiatry sites. On the other hand, questions about clinician acceptance of a technology might be analyzed in pooled form (i.e., for disparate programs) as well as for distinct applications (e.g., for teledermatology visits only or for just the store-and-forward teledermatology data). Researchers also should be sensitive to limits on pooling data created by other differences in research design and methods. These limits have been discussed in the context of growing use of formal meta-analyses (see, e.g., Eysenck, 1994; Greenland, 1994; Bailar, 1995).
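The distinction between data that can be pooled across disparate programs and data that are meaningful only within like applications can be sketched in simple analytic terms. The records, field names, and scores below are invented for illustration; they do not represent CTCG instruments or data.

```python
# Hypothetical sketch of the pooling logic: satisfaction scores can be
# aggregated across disparate applications, while diagnostic-accuracy
# figures are pooled only within a single application.

from statistics import mean

consults = [
    {"application": "teledermatology", "satisfaction": 4, "diagnosis_correct": True},
    {"application": "teledermatology", "satisfaction": 5, "diagnosis_correct": False},
    {"application": "telepsychiatry",  "satisfaction": 3, "diagnosis_correct": None},
    {"application": "teleradiology",   "satisfaction": 4, "diagnosis_correct": True},
]

# Patient satisfaction: meaningful when pooled across all programs.
pooled_satisfaction = mean(c["satisfaction"] for c in consults)

# Diagnostic accuracy: meaningful only within a like application.
derm = [c for c in consults if c["application"] == "teledermatology"]
derm_accuracy = mean(1 if c["diagnosis_correct"] else 0 for c in derm)

print(f"pooled satisfaction: {pooled_satisfaction:.2f}")
print(f"teledermatology accuracy: {derm_accuracy:.0%}")
```

A cooperative group must decide, question by question, which analyses belong in the pooled column and which in the application-specific one, subject to the further limits on pooling noted above.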
The Telemedicine Research Center also sponsors an on-line computer information service, collaborates with other researchers, and develops tools to support telemedicine evaluations.3

3 The Telemedicine Information Exchange (TIE) can be found on the Internet at http://tie.telemed.org/. It provides links to many other telemedicine information sites sponsored by governmental, university-based, commercial, and other organizations. In early 1996, it had about 300 Internet sites linked to it.

One of these tools is the Evaluation Question Hierarchy and associated software, which are designed to generate specific questions tailored to particular research problems, to streamline the process of questionnaire construction so that researchers do not have to start anew on each project, and to encourage efforts to pool data from multiple projects. The software links four kinds of factual questions to several policy questions (e.g., was a program cost-effective?). The questions attempt to identify
- what happened in association with the intervention or control situation;
- what the financial impact was;
- what the clinical impact was; and
- how participants reacted.
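The basic idea of such a hierarchy, decomposing a policy question into reusable factual questions so that projects can share instruments and pool answers, might be sketched as follows. The structure and wording here are hypothetical and do not represent the center's actual software.

```python
# A minimal, hypothetical sketch of a question hierarchy: each policy
# question maps to the four kinds of factual questions listed above.
# The wording is invented for illustration.

QUESTION_HIERARCHY = {
    "Was the program cost-effective?": {
        "events":    "What happened in the intervention and control situations?",
        "financial": "What was the financial impact?",
        "clinical":  "What was the clinical impact?",
        "reactions": "How did participants react?",
    },
}

def factual_questions(policy_question):
    """Return the factual questions a project must answer for a policy question."""
    return list(QUESTION_HIERARCHY[policy_question].values())

for question in factual_questions("Was the program cost-effective?"):
    print("-", question)
```

Because every project answering the same policy question draws on the same factual questions, responses gathered at different sites can later be aggregated rather than reconciled after the fact.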
Examples of Individual Research Strategies
This chapter began by noting the dearth of rigorous evaluations of clinical telemedicine. Although this study was intended to develop an evaluation framework rather than to draw conclusions about the effectiveness of telemedicine, the committee did search for research overviews and published reports that might provide models or lessons for its work. Most of the reports it reviewed focused on technical quality or feasibility (what were termed above "proof of concept" studies), with few addressing effects on health outcomes, process of care, access, or costs.
The literature reviews consulted by the committee include extensive reviews conducted by the Center for Health Policy Research under its contract with the Health Care Financing Administration (Grigsby et al., 1993, 1994a,b,c; Grigsby et al., 1995) to determine whether the literature "supported the use of telemedicine as a safe, medically effective set of procedures" (Grigsby et al., 1995, p. 116). The investigators found relatively few peer-reviewed studies, only a limited amount of work in progress, and a highly varied mix of research approaches and targets with no replications or cross-validating studies (Grigsby et al., 1995, p. 117). Another review of the literature by Sanders and Bashshur yielded similar conclusions: "Much of the appeal [of telemedicine] remains intuitive and based on fragmentary rather than systematic empirical research" (Sanders and Bashshur, 1994, p. 7).
The committee's review of the literature was designed primarily to identify different research strategies and useful research tools. Unfortunately, this review experienced the same difficulties found by
the substantive literature reviews, that is, a modest research base, limited documentation of methods, and research designs changing during implementation. Nonetheless, the committee reviewed a number of planned or completed projects and selected several as illustrative of the more rigorous approaches taken by some investigators. These projects are briefly described below with an emphasis on their purposes, designs, and difficulties.
Studies to Compare Digital versus Conventional Images
Perhaps the largest quantity of systematic comparative telemedicine research has dealt with medical images viewed on electronic workstations compared to conventionally viewed images (e.g., radiology films, glass pathology slides) or direct patient examination, for example, of skin lesions. Researchers in radiology, in particular, have accumulated considerable experience in evaluating the quality of digital imaging and image transmission compared to the "gold standard" of conventional film images (Grigsby, 1995a).
Early studies in the 1970s and 1980s generally found that images produced via teleradiology were not of acceptable quality compared to film images (Gitlin, 1986; Grigsby et al., 1993). More recent research employing improved equipment is producing good results for a variety of uses, and researchers are continuing to explore the strengths and limitations of teleradiology for specific clinical problems, settings, and purposes (see, e.g., Decorato et al., 1995; Mun et al., 1995; Roponen et al., 1995; Wilson and Hodge, 1995).
Work at Johns Hopkins University illustrates the shift in research results. In an initial effort to assess the acceptability of digital images for primary interpretation by emergency department physicians, researchers selected images from their radiology library based on their clinical importance and difficulty and also selected a comparison group of less challenging images (Scott et al., 1995). Based on comparisons involving four different groups of readers (staff radiologists, emergency physicians, radiology residents, and emergency medicine residents) using the relatively low resolution monitors then available, they concluded that the teleradiology images were not satisfactory for primary interpretation. Later work at the same institution using the same general strategy described above but employing more advanced equipment for digitally producing, transmitting, and displaying images (in particular, higher resolution monitors) has
shown agreement for primary diagnosis between film-based interpretations and those done using electronic workstations (Gitlin, 1996).
The contrast in results for the earlier and later studies of teleradiology underscores the difficulties of conducting research—and making technical, clinical, and financial decisions about equipment and software purchases—when technologies are changing and improving rapidly. The committee understands that the major debates about the quality of digital images in radiology now involve mammography, subtle skeletal problems, and some pulmonary applications (Mun et al., 1995; Wilson and Hodge, 1995). According to one expert, "what now needs major assessment is the effect of teleradiology on patient management and outcome," including the timeliness of care and cost-effectiveness (Franken, 1996).
Some work is under way to assess the impact of digital radiology systems on productivity. At the Baltimore Veterans Affairs Medical Center, investigators collected baseline data for three months before implementation of their institution-wide digital imaging system and then collected data again after the system had been in place for a year (Siegel, 1996). The results indicated that although it took a radiologist about 40 percent more time to use a computer workstation rather than a conventional viewing system set up by technicians, overall productivity increased by about 25 percent. The investigators attributed this increase to several factors, including better workload sharing, home access to images, fewer interruptions, quicker and more organized access to previous images and reports, and elimination of time spent waiting for film to be developed. The time between taking an image and its interpretation also dropped. Survey information suggested that other clinicians were consulting less with radiologists because they had "bedside access" to images, and this led to steps to encourage such consultations.
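The seemingly paradoxical combination of slower per-study interpretation and higher overall productivity follows from simple arithmetic once interpretation time is separated from non-reading overhead. The figures in the sketch below are invented for illustration and are not the Baltimore study's data.

```python
# Illustrative arithmetic (all numbers invented) showing how per-study
# interpretation time can rise about 40 percent while daily throughput
# still increases about 25 percent, once overhead (waiting for film,
# interruptions, hunting for prior images) shrinks.

HOURS_PER_DAY = 8.0

def studies_per_day(read_minutes, overhead_minutes):
    """Studies completed in a workday, given per-study reading and overhead time."""
    return HOURS_PER_DAY * 60 / (read_minutes + overhead_minutes)

film = studies_per_day(read_minutes=5.0, overhead_minutes=5.0)     # conventional film
digital = studies_per_day(read_minutes=7.0, overhead_minutes=1.0)  # digital workstation

print(f"change in per-study reading time: {7.0 / 5.0 - 1:+.0%}")
print(f"change in daily throughput:       {digital / film - 1:+.0%}")
```

The point of the sketch is that throughput depends on total time per study, so a large reduction in overhead can more than offset a substantial increase in reading time.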
Dermatologists and pathologists have also been active in imaging research (see, e.g., Krupinski et al., 1993; Perednia and Brown, 1995; Barnard and Middleton, 1995; Menn and Kvedar, 1995; Seykora, 1995). This work has involved not only the comparison of digital and conventional images but also the comparison of video and still images and the comparison of images with direct physical examination.
Dermatologists use color and texture information in diagnosis, and the requirements for diagnostic quality color images are still
being explored.4 The committee's site visits turned up two image evaluation studies in dermatology. A project at the Oregon Health Sciences University (with assistance from the Telemedicine Research Center and funding from the National Library of Medicine) is intended (a) to verify that electronic images can be used to make accurate diagnoses and (b) to determine the minimum technical specifications required for the capture of diagnostically relevant information (Perednia and Brown, 1995). Clinical photographs with proven diagnoses will serve as the "gold standard" against digital images acquired, transmitted, or stored under varying conditions (e.g., different video formats, different color resolution). In another project, Stanford researchers will compare image quality for three groups of patients: one involving real-time video consults; a second using store-and-forward video technology using technicians to acquire the images; and the third relying on face-to-face examinations (Barnard, 1995).
Evaluations of Automated, Telephone-Based Services
As noted in Chapter 2, nonvideo applications of telemedicine are common in the form of telephone calls initiated by patients or clinicians. A variety of telephone-based computer-assisted services have also been developed over more than two decades (see, e.g., Greenlick et al., 1973; Muller et al., 1977; Alemi and Stephens, in press). These include automated systems that call to remind people of scheduled appointments, programs that provide recorded health information, and others that monitor patient status and record voice answers or touch-tone telephone responses.
Alemi and colleagues at Cleveland State University have reported on a number of attempts to use quasi-experimental research designs to evaluate the effectiveness of these kinds of services. One study
examined a telephone-based health risk assessment program designed to inform students about their risk levels for blood pressure, seat belt use, and other factors. Evaluators first randomly selected and interviewed control subjects prior to the introduction of the program and then randomly selected experimental subjects to participate in the program (Alemi and Higley, 1995). The sequence was designed to avoid contamination of the control subjects because once the experimental program was available it could be shared with those who were not part of the formal test group. Those in the test group who used the program (71%), when interviewed later, reported higher satisfaction with the experimental system than the control group reported for their current sources of health information. The experimental group, however, reported that the risk information was redundant in that they were already aware of their status on most risk factors.
Another study of an automated monitoring system did not employ randomization but rather used a single-group time-series design that provided for weekly computerized telephone interviews over a nearly five-month period and also for mail surveys during the 4th, 10th, 14th, and 18th weeks (Alemi et al., 1994). Response rates for the telephone interviews were higher than for the mailed surveys.
In a third study, investigators randomly assigned pregnant patients with a history of drug use to participate in experimental and control groups (Alemi and Stephens, in press). Both groups received certain services including case management and obstetrical care. The experimental group also received a variety of telephone-based computer-assisted services including information and support services using automated reminder and other calls, conference calls, and voice mail arrangements. A forthcoming issue of the journal Medical Care is being devoted to reports on this third set of studies.
Describing Deployment Telemedicine
As discussed earlier in this chapter, the Department of Defense has been working to develop a coherent evaluation strategy for telemedicine. It has already accumulated considerable practical experience in telemedicine consultations involving a number of its major medical centers. The experience with deployed troops was recently described in an article by Walters (1996). She retrospectively
analyzed all 171 telemedicine consultations received from deployments in Somalia, Macedonia, Croatia, and Haiti between February 1993 and March 1995.
In this study, a third of the records were excluded for lack of key data (e.g., the consultant's report), although this problem diminished over time to the point that all records were complete for Macedonia. Follow-up information on patients was not available, nor were comparisons possible with patients seen at the deployed hospital. In addition, there were no data on patient or provider satisfaction, response time, or costs. The majority of consults were for acute problems that were not emergencies, and the most frequent questions involved recommendations for further treatment. Dermatology was the specialty most often consulted (suggesting that perhaps dermatologists ought to be routinely deployed). Reviews of records by experts suggested that the consultation significantly changed the diagnosis in 30 percent of the cases and significantly changed the treatment in 32 percent of the cases. Change was more likely for seriously ill patients. The expert reviewers concluded that the consult was essential or prevented evacuation in about 10 percent of the cases and was not needed in an equal percentage; it was helpful or significantly helpful in the majority of the other cases. Consults dropped off after deployed physicians familiar with the program were rotated out.
This study highlighted issues identified in other studies. One was the problem of sustaining telemedicine consultations over time because the initial group of trained participants left, because the technology was awkward, or because participants at distant sites learned enough during initial consultations to handle subsequent patients. It also demonstrated the difficulty of conducting prospective studies and, even with retrospective studies, of tracking patient outcomes. The "difference in diagnosis" variable and expert judgments were used because data on differences in health outcomes were unavailable or too costly to collect.
Research Under Way on Teledermatology Services for Rural Areas
As mentioned above, dermatology has proven to be a major generator of telemedicine consultations. One project at the Oregon Health Sciences University (OHSU) was described above. Researchers
there are also testing dermatology consultation services in rural sites over a two-year period. The project will test "whether this technology will improve the process of health care delivery by increasing information flow and reducing isolation; improving the provision of dermatologic care; and increasing the primary care provider's knowledge of dermatology" (Perednia and Brown, 1995, p. 46). The project is also designed to develop an application with the potential to sustain itself once federal funding ceases. It relies on ordinary phone service and off-the-shelf equipment that, although not necessarily shareable for other telemedicine applications, is inexpensive to operate.
The experiences of the project investigators illustrate the difficulties of conducting research in distance medicine. For example, attracting and maintaining multiple remote research sites has been difficult. Involvement may depend on personal links (e.g., between a rural physician and the university from which he or she graduated) that may disappear (or at least be interrupted) with retirements or similar events. Phone companies have not been eager to extend improved telecommunications technologies to rural areas, where even basic phone service is sometimes hard to obtain.5 Competition and other financial pressures are leading health care providers to reduce funding of continuing medical education for staff and withdraw from participation in the telecommunications network that was also to be used for telemedicine.
Three Research Initiatives on Effectiveness and Cost-Effectiveness
The committee discovered several research projects that were intended to apply more rigorous methods to the evaluation of telemedicine and to extend the focus beyond description and feasibility assessments to effectiveness and cost-effectiveness. The three described below illustrate different strategies.
In addition to its other contracts with CHPR, HCFA has also
contracted with the center to evaluate the medical effectiveness and cost-effectiveness of telemedicine for routine consultative services, medical-surgical follow-up, and management of chronic illness. The study will involve all HCFA telemedicine demonstration sites and, as needed, other sites that are able to participate (up to a total of 15 programs).6 Patients who receive telemedicine services will be compared with those receiving conventional consultations in a set of comparable control facilities. The goal is to accumulate a total of 2,400 cases (half telemedicine patients and half a comparison group matched insofar as possible for clinical and demographic characteristics).
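The matched-comparison design described above can be sketched as a simple nearest-neighbor match of telemedicine patients to conventional-consultation controls. The covariates, weights, and greedy matching rule below are illustrative assumptions, not part of the HCFA study protocol:

```python
# Illustrative nearest-neighbor matching of telemedicine cases to
# conventional-consultation controls on two covariates.
# Field names and weights are hypothetical, not from the HCFA protocol.

def distance(a, b):
    # Weighted absolute difference on age and a severity score
    # (the 10x weight on severity is an assumption for illustration).
    return abs(a["age"] - b["age"]) + 10 * abs(a["severity"] - b["severity"])

def match_controls(cases, controls):
    """Greedily pair each case with the closest not-yet-used control."""
    unused = list(controls)
    pairs = []
    for case in cases:
        best = min(unused, key=lambda c: distance(case, c))
        unused.remove(best)
        pairs.append((case, best))
    return pairs

cases = [{"age": 70, "severity": 2}, {"age": 81, "severity": 3}]
controls = [{"age": 69, "severity": 2}, {"age": 80, "severity": 3},
            {"age": 75, "severity": 1}]
pairs = match_controls(cases, controls)
```

A greedy match of this kind is only a first approximation; a full study would match on the complete set of clinical and demographic characteristics the protocol specifies.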
In conjunction with the Clinical Telemedicine Cooperative Group (CTCG, above), CHPR will develop computerized data collection instruments focused on episodes of care over a nine-month period. The plan is to collect data on (a) fixed and variable program costs; (b) use of services by participating patients; (c) patient demographic characteristics and clinical history; (d) presenting symptoms and complaints; (e) health status; (f) symptom distress; (g) functional capacity; (h) symptom resolution; and (i) characteristics of the consultation. Information collection will involve abstraction of information from patient records, telephone interviews with patients, Medicare records, and other sources.
In a second, experimental phase of the study, CHPR will randomly assign patients to one of four interventions: telephone consultation only; still images with audio or text; interactive video; and face-to-face consultation. The objective is to compare the effectiveness of the alternatives and to identify the marginal effects and costs of each of the additions of information (e.g., shifting from audio only to audio plus still images).
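The marginal comparison the study plans is commonly summarized as an incremental cost-effectiveness ratio: the added cost per unit of added effect when moving from one modality to the next richer one. A minimal sketch, with entirely invented costs and effectiveness values:

```python
# Incremental cost-effectiveness of each added level of information.
# Arm ordering mirrors the study's four interventions; the cost and
# effectiveness numbers are invented for illustration only.

arms = [  # ordered from least to most information-rich
    ("telephone only",       {"cost": 40.0,  "effect": 0.60}),
    ("still images + audio", {"cost": 90.0,  "effect": 0.75}),
    ("interactive video",    {"cost": 180.0, "effect": 0.80}),
    ("face-to-face",         {"cost": 220.0, "effect": 0.82}),
]

def incremental_ratios(arms):
    """Added cost per unit of added effect for each one-step upgrade."""
    ratios = {}
    for (name_a, a), (name_b, b) in zip(arms, arms[1:]):
        delta_cost = b["cost"] - a["cost"]
        delta_effect = b["effect"] - a["effect"]
        ratios[f"{name_a} -> {name_b}"] = delta_cost / delta_effect
    return ratios

icers = incremental_ratios(arms)
```

With these hypothetical numbers, each step up the information ladder buys progressively less effectiveness per dollar, which is exactly the kind of diminishing return the study's marginal analysis is designed to detect.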
In another HCFA-funded project, researchers at the University of Michigan have been collaborating with researchers at the Medical College of Georgia on a project that is intended both to develop a model research methodology and to implement it using sites in Georgia and West Virginia (Sanders and Bashshur, 1995). The components of the model (which is a kind of evaluation framework) include the
research question, the research design, the data collection instruments, and the data analysis plan. The proposal emphasizes the need to consider not just individual applications or technologies but the system of care in which they are embedded. The research hypothesis stated that telemedicine would improve access, enhance the quality of care, and contain costs. To investigate this hypothesis, the project devised a matrix that included both client and provider perspectives on each of these outcome areas.
The design also draws from educational evaluations the concepts of formative and summative evaluations (Bashshur et al., 1975; Bashshur, 1995; see also Weiss, 1972; and Rutman, 1980). Formative evaluations are primarily descriptive, focus on immediate or short-term outcomes, and attempt to identify operational problems, including departures from the program as originally designed. They often emphasize what some call the proof or test of concept (referred to above), that is, the basic operational feasibility of an application with which users have little relevant experience. In contrast, summative evaluations tend to focus on programs or applications that are better established. They attempt to discern longer-term effects (including unanticipated or unwanted effects) and provide an overall assessment of whether the program achieved its objectives.
Although these concepts usually are employed to describe different stages in the evolution of research on a topic, this design incorporates both formative and summative aspects. Thus, one phase of research emphasizes the importance of descriptive information on the program's structure (hardware, software, staffing, support systems), the problems encountered, and efforts to resolve them. The other phase originally provided for a quasi-experimental study of clients and providers in two experimental and two control sites and an additional case-control study of episodes of care with and without telemedicine.
As the project has developed further, the methodology has shifted to reflect practical difficulties in implementing the project and in response to requests from HCFA, the primary funder. The emphasis in West Virginia is on Medicare and financing issues. In addition to an empirical analysis of cost, quality, and access for Medicare inpatients, the project will use a dynamic simulation model to estimate effects in more detail using existing data, expert clinical opinions, and theoretical assumptions. The primary theoretical component
for the financial analysis draws on "real options" analysis and operations theory. Real options analysis is designed for situations in which the size and timing of future cash flows are highly uncertain (as is often true for telemedicine) and conventional net-present-value analysis is less applicable (Trigeorgis, 1995).
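The contrast can be illustrated with a toy calculation: a static net-present-value rule values an investment on its expected cash flows alone, while a real-options view credits the flexibility to defer and invest only if conditions turn out favorable. All cash flows, probabilities, and rates below are invented, and the single-rate discounting of actual probabilities is a deliberate simplification of real-options pricing:

```python
# Toy comparison of static net present value vs. a one-period
# deferral option. All figures are hypothetical; discounting actual
# probabilities at one rate simplifies true real-options valuation.

def npv(cash_flows, rate):
    """Discounted sum of cash flows; index 0 is today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Invest 100 now; receive 150 (good case) or 30 (bad case) in one
# year, each with probability 0.5; discount rate 10 percent.
rate = 0.10
expected_payoff = 0.5 * 150 + 0.5 * 30            # 90
static_npv = -100 + expected_payoff / (1 + rate)  # negative: reject

# Option to wait one year and invest only if the good case occurs:
# the bad-case branch is abandoned at zero cost.
option_value = 0.5 * max(-100 + 150 / (1 + rate), 0) / (1 + rate)
```

Under these assumed numbers the static rule rejects the project, while the deferral option has positive value, which is why a rigid net-present-value screen can understate the worth of uncertain telemedicine investments.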
In a third study funded primarily by the Office of Rural Health Policy, the University of Washington (as part of its involvement in the multistate WAMI—Washington, Alaska, Montana, and Idaho—consortium) has developed a demonstration project involving diverse rural sites and a university-based specialty consultation network. It employs a multisite pretest, posttest research design to assess the feasibility, acceptability, and cost-effectiveness of a telemedicine network (WAMI, unpublished project description, 1995).
As this report was being drafted, the project was in the pretest data collection stage in four quite different sites (ranging from a 22-person physician group to a site staffed by two physicians and a physician assistant). Researchers were preparing an inventory of existing telemedicine links and on-site specialty consultations (by out-of-area practitioners). They were also developing comparative cost information for on-site consultations. The included specialties are radiology, cardiology, dermatology, mental health, obstetrics/perinatology, orthopedics, pediatrics, emergency/trauma care, and neurosurgery. Baseline provider and administrator survey data have been collected.
This project illustrates some of the practical tools that may be used to determine whether the project was implemented as planned and to identify problems that arose during implementation. Remote site participants were to keep detailed logs to capture mostly qualitative data about the various steps involved in putting the telemedicine system into place. In addition, encounter forms were to be generated for every telemedicine contact to track information about the patient, provider, clinical problem, process of care, costs (including grant costs, patient or provider expenses, and in-kind contributions) and difficulties experienced with the equipment or other aspects of the consultation. The project researchers would periodically reinventory telemedicine linkages to track changes, survey patient and provider satisfaction, and collect general comments about user experience. Researchers stated that they would try to develop data collection instruments identical or comparable to those of the Clinical
Telemedicine Cooperative Group (see the earlier section on the Telemedicine Research Center).
For the early demonstration projects funded in the 1970s, awkward equipment, a focus on feasibility, small numbers of patients, and high costs per patient served discouraged a sustained program of systematic development and research in telemedicine and apparently contributed to the disappearance of most of these projects. In the late 1980s and 1990s, as the technological base advanced and became more practical to use and as support for outcomes research and clinical evaluations gathered momentum, demonstration projects blossomed once again. The committee's discussions with those now involved in telemedicine evaluations suggest that they continue to face problems of small numbers of cases. In addition, securing relevant and comparable evaluation sites can be difficult given special data collection requirements, differing organizational and professional priorities, and reimbursement limits.
The committee was encouraged by the increased attention to evaluation by government agencies, health care organizations, and researchers and by efforts to develop creative strategies for overcoming or compensating for difficulties in undertaking sound evaluations. This work provides an important starting point. Much, however, remains to be done to build evaluation into telemedicine programs and to see more well-designed and well-executed studies of specific applications carried to conclusion. The next chapter presents the committee's framework for such studies.