A Framework for Planning and Improving Evaluations of Telemedicine
In some respects, telemedicine is still a frontier. Rigorous evaluative discipline can be difficult to apply amidst the effort and enthusiasm that come with developing projects, coping with immature technologies, gaining financial or political support, or building new markets. Systematic evaluations require time to plan, fund, and implement, and the evaluation projects inspired by the recent resurgence of interest in telemedicine generally have yet to be completed and reported. As a result, the models and information available to the committee were limited, although the committee learned much from the work that has been done.
Continued improvement in the field will depend on agreement by those interested in telemedicine that it is important to invest in systematic evaluation of telemedicine's effects on the quality, accessibility, cost, and acceptability of health care. The evaluation framework presented in this chapter attempts to relate broadly accepted strategies of health services research and evaluation research in general to some of the challenges and problems in evaluating telemedicine that have been described in preceding chapters.
Starting with the general principles set forth in Chapter 1, the committee devised several principles more specific to the task of developing the evaluation framework for clinical applications of telemedicine. First, evaluation should be viewed as an integral part
of program design, implementation, and redesign. Second, evaluation should be understood as a cumulative and forward-looking process for building useful knowledge and as guidance for program or policy improvement rather than as an isolated exercise in project assessment. Third, the benefits and costs of specific telemedicine applications should be compared with those of current practice or reasonable alternatives to current practice. Careful comparison is the core of evaluation.
Fourth, the potential benefits and costs of telemedicine should be broadly construed to promote the identification and measurement of unexpected and possibly unwanted effects and to encourage an assessment of overall effects on all significant parties. Fifth, in considering evaluation options and strategies, the accent should be on identifying the least costly and most practical ways of achieving desired results rather than investigating the most exciting or advanced telemedicine options. Sixth, by focusing on the clinical, financial, institutional, and social objectives and needs of those who may benefit or suffer from telemedicine, evaluations can avoid excessive preoccupation with the characteristics and demands of individual technologies.
The committee recognizes that actual evaluations face a variety of methodological, financial, political, and organizational constraints. Nonetheless, based on its review of current applications and evaluations, the committee believes that considerable improvement can be achieved in the quality and rigor of telemedicine evaluations and, thereby, in the utility of the information and guidance they provide to decisionmakers.
Planning for Evaluation
Before presenting the evaluation framework, the committee thought it was important to underscore the significance of systematic planning for evaluation. Evaluation is too often an afterthought, considered after the seemingly more important issues of putting a program together are settled. This approach jeopardizes the potential for the evaluation plan, the program plan, and program implementation to operate together to answer questions about the program's benefits and costs. For example, an effort to assess whether a telemedicine application is likely to be sustainable after a demonstration period will be more useful if the conditions for sustained
operation are considered in planning the personnel, procedures, organizational linkages, outcomes and financial data, and other aspects of the test application. Although evaluation strategies must necessarily be tailored to fit the policy or management concerns and the characteristics of different fields (e.g., education, public safety, health care), certain questions, concepts, and steps are common to the planning of successful evaluations. They include
- establishing evaluation objectives;
- setting priorities for the selection of specific applications to be evaluated;
- assessing the probable feasibility of an evaluation, including the availability of adequate funding and the likelihood of adequate cooperation from relevant parties;
- identifying the particular intervention to be evaluated, the alternatives to which it will be compared, the outcomes of interest, and the level and timing of evaluation;
- specifying the expected relationships between interventions and outcomes and the other factors that might affect these relationships; and
- developing an evaluation strategy that includes a credible and feasible research design and analysis plan.
This list reflects several decades' worth of work in many disciplines to create scientifically respectable evaluation strategies that are also useful to decisionmakers and feasible to implement (see, e.g., Suchman, 1967; Weiss, 1972; NAS, 1978; Cook and Campbell, 1979; Sechrest, 1979; OTA, 1980a; Rutman, 1980; Wortman, 1981; Tufte, 1983, 1990; Mohr, 1988; Rossi and Freeman, 1989; Flagle, 1990; Wholey et al., 1994). Although this report was not intended to be a how-to-do-it manual, or to duplicate existing texts, the discussion below briefly reviews the above steps. Readers should, however, consult the references cited above—as well as those cited below and in the preceding chapter—for more detailed guidance.
Establishing Evaluation Objectives
Ideally, evaluation needs will be considered in the early stages of planning for pilot programs. This implies the identification of clear
objectives for the program, the stipulation of results that would indicate whether the program has met its objectives, and the specification of steps to collect relevant data about the program's operations and effects.
Several important questions will ordinarily be considered in establishing the objectives for a particular evaluation. They include: What kinds of decisions may be affected by the results? Who will be the primary users of evaluation results? Who is sponsoring the evaluation and why? Who else has a major stake in the evaluation results?
Determining the objectives—and, thus, the important questions to be answered or concerns to be addressed—for a particular evaluation may not be completely straightforward. In some cases, programs and activities evolve incrementally without much attention to well-argued rationales. Moreover, stated rationales may not always capture program goals, perhaps because the goals have not been carefully thought through or perhaps because underlying motivations are somewhat different from those that are declared. The latter situation may require that study designs be sensitive to political currents. In any case, investigators should seek to determine either what their target program was originally intended to accomplish or what objectives it may serve in the current environment (regardless of the past) or both.
Even if program objectives are relatively clear, other considerations such as evaluation feasibility and anticipated concerns of possible future funders may influence the choice of specific evaluation questions. Government agencies, private foundations, and vendors will usually have interests related to public policies or market strategies that go beyond those of specific demonstration sites. Although project objectives can sometimes be stated in some order of priority, how they will be balanced and what trade-offs will have to be considered may be difficult to specify precisely in advance.
The varying interests of project and evaluation sponsors are reflected in the expectations for the telemedicine projects supported by different federal agencies. For example, the Office of Rural Health Policy focuses on quality, accessibility, and cost of health care in rural areas. Although the Health Care Financing Administration is also interested in quality and access, its sponsored projects are intended primarily to provide information that will help the agency
formulate payment policies for Medicare. These differences in interest notwithstanding, federal agencies have been working together (as described in the preceding chapter) to formulate an umbrella framework for project evaluation that is intended to make it easier to aggregate conclusions from individual evaluations.
As is true for any activity, resources for evaluating telemedicine applications are limited, and funding for an evaluation may compete with funding for the services to be evaluated. Making the case for research to distinguish what works from what does not is easier in theory than in practice, for example, when decisions have to be made between funding patient care at higher levels and funding program evaluation.
Those sponsoring or conducting evaluations generally have to consider priorities for the use of limited resources in making two kinds of decisions: selection of topics and selection of evaluation strategies or methods. Topic selection is often handled quite informally, but a more formal or explicit process of setting priorities may help decisionmakers focus limited resources more rationally. Several core questions are generally relevant to any priority-setting exercise (IOM, 1992b, 1995b). These questions, which are framed below in terms of possible clinical applications of telemedicine, include
- How common is the telemedicine application now? How common is it likely to be?
- How significant is the problem addressed by the application?
  - prevalence of the problem
  - burden of illness (e.g., mortality, quality of life)
  - cost of managing the problem
  - variability across regions or population subgroups
- What is the likelihood that evaluation results will affect decisions about adoption of the application, its integration into routine operations, and other missions of the venture?
- Will the study wastefully duplicate or constructively supplement conclusions from other evaluations?
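To make these priority-setting questions concrete, they can be operationalized as a simple weighted score for ranking candidate applications. The sketch below is purely illustrative: the criteria names, weights, and ratings are hypothetical assumptions, not values endorsed by the committee or drawn from this report.

```python
# Illustrative priority-scoring sketch for candidate telemedicine
# applications. Criteria, weights, and 0-5 ratings are hypothetical.

CRITERIA_WEIGHTS = {
    "prevalence": 0.25,         # how common the application is or may become
    "burden_of_illness": 0.25,  # mortality, quality of life
    "cost_of_problem": 0.20,    # cost of managing the problem
    "decision_impact": 0.20,    # likelihood results will affect decisions
    "novel_contribution": 0.10, # supplements rather than duplicates other work
}

def priority_score(ratings):
    """Weighted sum of 0-5 ratings, one rating per criterion."""
    return sum(CRITERIA_WEIGHTS[c] * r for c, r in ratings.items())

candidates = {
    "teledermatology consults": {"prevalence": 4, "burden_of_illness": 2,
                                 "cost_of_problem": 3, "decision_impact": 4,
                                 "novel_contribution": 3},
    "home cardiac monitoring": {"prevalence": 3, "burden_of_illness": 5,
                                "cost_of_problem": 4, "decision_impact": 3,
                                "novel_contribution": 4},
}

# Rank candidates from highest to lowest weighted score.
ranked = sorted(candidates, key=lambda c: priority_score(candidates[c]),
                reverse=True)
```

Such a score is only an aid to deliberation; as the text notes, topic selection is often handled informally, and an explicit scheme mainly serves to make trade-offs visible.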
Most of these considerations assume a societal or policy-level perspective. They are most likely to be raised by organizations such
as the Department of Defense, the National Library of Medicine, and the Office of Rural Health Policy that fund a variety of telemedicine projects and have a formal commitment to program evaluation. Nonetheless, health plans, health care delivery organizations, and vendors of communications and information technologies may also consider similar questions in determining where they will direct resources for systematic analysis and evaluation.
The resource issues in selecting a research design revolve around three basic questions. First, what are the costs associated with different research strategies? Second, what are the costs of a strategy relative to its potential to provide answers to the evaluation questions? Third, is the cost of the evaluation strategy reasonable in relation to the potential costs and benefits of the application or program to be evaluated?
In practice, evaluations often follow targets of opportunity. That is, they are designed to take advantage of the programs or capacities of an established institution or the political appeal of certain topics. For example, if a medical center has an energetic and determined specialist willing and able to design an application and secure funds, that person's project may take priority over applications with (theoretically) more organizational relevance. Likewise, if demonstration funds are confined to projects involving rural areas, urban applications with more potential benefit may be neglected.
Determining the Feasibility of Evaluation
In addition to costs, a number of other factors may affect the feasibility of an evaluation. Some factors have behavioral or political aspects. These include whether those responsible for the application or program in question will cooperate and whether the intended beneficiaries of a program will agree to provide the information needed from them. A different but possibly relevant question is whether the intended audience for the evaluation will be receptive to results that may run counter to their preferences or self-interest.
Other considerations are quite practical. Will the needed information be available on a timely basis? If not, what steps would need to be taken to provide it, and how long would it take to implement those steps? Are the time demands for information collection excessive for program staff or beneficiaries?
Another practical issue involves the timing of an evaluation. Because most evaluations look for effects within a relatively short period of a few months or perhaps two to three years, timing can be a problem if the key results emerge over a longer term and if short-term outcomes are not good proxies for these long-term results. Moreover, evaluating a program before start-up problems are resolved may produce misleading results. Evaluating a program too late may also lead to problems if, for example, users are so comfortable with an intervention that they will not agree to be part of a control group not subject to the intervention.
Feasibility assessments are also relevant to decisions about the alternatives to which the telemedicine application will be compared. If the preferred comparison sites will not or cannot participate, the comparison group may have to be the experimental group before the telemedicine application is initiated or after it has been concluded. This kind of before-and-after single group design is a relatively weak evaluation strategy, although measures taken at multiple points before, during, or after the telemedicine test will strengthen the design (see, e.g., Cook and Campbell, 1979).
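The value of taking measures at multiple points can be illustrated with a small sketch of a single-group, multiple-measurement design. The data below are fabricated monthly referral counts; the point is that several pre-intervention measures let an evaluator check for a pre-existing trend instead of attributing any before-after change to the telemedicine test.

```python
# Sketch of a before-and-after single-group comparison strengthened by
# multiple measurement points. All counts are hypothetical.

def mean(xs):
    return sum(xs) / len(xs)

def slope(ys):
    """Least-squares slope of ys against time points 0, 1, 2, ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

pre  = [22, 24, 23, 25, 24, 26]   # six monthly measures before the link
post = [31, 30, 33, 32, 34, 33]   # six monthly measures after

level_change = mean(post) - mean(pre)  # apparent effect of the intervention
pre_trend = slope(pre)                 # was the outcome already rising?
```

If `pre_trend` is substantial, part of `level_change` may simply reflect the continuation of an existing trend rather than an effect of telemedicine, which is exactly the ambiguity that multiple pre- and post-test measures help expose (Cook and Campbell, 1979).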
The choice of appropriate comparisons will depend, in part, on whether the application is in the earlier or later stages of development. For example, when image quality is yet to be established, an evaluation may compare diagnoses based on digital images with diagnoses based on conventional film-based images or direct patient examination. The next stage would extend the evaluative focus to consider other issues of quality, access, cost, patient and clinician acceptance, and feasibility in real practice settings. For example, in a project described in Chapter 5, physicians in one set of rural practices would be able to consult on dermatology problems via telemedicine while physicians in another set of practices would continue their traditional referral patterns. In some cases, the alternative might be doing nothing, but only if that is what would be expected in the absence of a program.
Although general methodological and statistical principles exist to guide a multiplicity of evaluation tasks, no "one size fits all" evaluation plan exists. For example, if an evaluation is an early "test of concept" to determine the basic technical and procedural feasibility of a telemedicine application (e.g., home health monitoring), the research design and measures will likely differ from a later project
intended to help decisionmakers decide whether the application should be adopted as a regular service of a health care organization.
Elements of an Evaluation
The committee identified several basic elements that should be considered in planning and reporting an evaluation, whether that evaluation is very tightly focused or broader in scope. These elements include
- Project description and research question(s)
- Strategic objectives
- Clinical objectives
- Business plan or project management plan
- Level and perspective of evaluation
- Research design and analysis plan
  - characteristics of experimental and comparison groups
  - technical, clinical, and administrative processes
  - measurable outcomes
  - sensitivity analysis
- Documentation of methods and results
Although these elements are necessarily described individually and sequentially below, the development of an evaluation plan involves the continuing interplay and rethinking of elements as their conceptual and practical implications are assessed and reassessed. Moreover, during implementation, evaluators often find they need to revise the evaluation plan. In sum, the process of planning and implementing an evaluation flows logically but not always in a strictly linear fashion.
Project Description and Research Questions
The project description identifies the application that is being evaluated and the alternative(s) to which it is being compared. For example, the application might be described concisely as a dermatology consultation program using a one-way video and two-way audio link between a consulting center and two rural primary care sites. Two other rural sites would maintain their existing consulting practices. A thorough program description would more precisely and
completely identify the characteristics of the telemedicine and comparison services including relevant hardware and software employed, restrictions on the clinical problems or patients to be studied, the length of the project, and the project personnel.
Specifying the basic research question or questions—the hypothesized link between the program intervention and desired outcomes—is a critical evaluation step. It encourages systematic thinking about how program interventions are expected to affect the outcomes of interest; what other factors may influence that link; and which different research designs and measurement strategies best fit the problem.
By identifying the expected intermediate changes that an intervention must set in motion if the desired outcome is to occur, evaluators will be in a better position to give decisionmakers useful information on what contributed to a program's success or failure. For example, research on programs designed to change personal health habits or physician practice patterns has made it clear that not only must a service or decision guide be available, it must also be accepted and adopted (Avorn and Soumerai, 1983; Eisenberg, 1986; Soumerai and Avorn, 1990; Green, 1991; IOM, 1992a; Kaluzny et al., 1995). This research implies that potential clinician users of telemedicine, for instance, must (a) know an option is available; (b) understand the minimum details necessary to use it; (c) accept it, that is, conclude that its potential advantages (e.g., better clinical information or better patient access to care) outweigh its apparent disadvantages (e.g., inconvenient scheduling); and (d) act on the basis of their knowledge and conclusions. If one or more of these intermediate events fail to occur for all or most of the clinicians involved, then an application is likely to fail.
Strategic and Clinical Objectives
The strategic objectives in an evaluation plan state how the telemedicine project is intended to affect the organization's or sponsor's goals and how the evaluation strategy relates to those objectives. These goals might include improving health services in rural areas, keeping deployed soldiers in the field, reducing expenses for government-funded medical care, or strengthening an organization's competitive position. Competitive position is broadly construed to extend beyond the marketplace to encompass the need of public
organizations to demonstrate their value to the policymakers who determine which programs will survive in an era of government retrenchment and health care cost containment. For instance, the early strategic objectives for a telemedicine program at an academic medical center might be to add to the telemedicine knowledge base (and thereby serve the institution's research mission) and to establish or strengthen the center's research reputation in the field (and thereby lay the base for future funding). Depending on the results, later strategic objectives might relate more to the patient care mission or to reinforcing the institution's position in local, regional, and broader health care markets.
The clinical objectives state how the telemedicine project is intended to affect individual or population health by changing the quality, accessibility, or cost of care. For example, a project might be intended to allow more frequent, economical, and convenient monitoring of homebound patients than is provided by existing home and office visit arrangements or it might be designed to improve access to appropriate specialty services for a rural population.
To the extent possible, evaluators should identify in advance what constitutes favorable or unfavorable outcomes in a particular context. For example, does a clinical application of telemedicine need to show performance better than, equivalent to, or almost as good as the alternative(s) to which it is being compared? Depending on the outcome at issue, the goals of the project sponsor, and other factors such as severe cost constraints, the answer may vary. Thus, if an application was expected to (and did) substantially reduce costs and if costs were thought to be the dominant issue for the organization's customers, then an organization might consider a slight decrease in patient satisfaction to be tolerable. Although the judgment of the outcome or the way different outcomes are balanced may vary depending on the perspective, the definition, measurement, or calculation of the outcome should not differ.
Level and Perspective of Evaluation
Once the research questions and objectives have been established, the appropriate level and perspective of an evaluation will usually become apparent. Although they may overlap to some degree, at least three broad levels can be distinguished: clinical, institutional,
and societal. Somewhat different evaluation strategies may be appropriate for various levels of decisionmaking.
At the clinical level, the evaluative focus is on the benefits, risks, and costs of alternative approaches to a health problem. For example, does digital teleradiology provide clinically acceptable images for breast cancer screening? What are the benefits and harms of telepsychiatry compared to the alternatives? Clinical evaluations provide critical guidance for decisions about individual patient care. An institutional decision to adopt a technology will, however, ordinarily require additional evidence of its feasibility and value.
At the institutional level, the focus includes not only the application but also its organizational context including administrative structures and practices, clients or customers, clinical and other personnel, and clinical protocols. An institution-level evaluation might ask the following kinds of questions: Has a teleradiology link between a rural hospital and an urban radiology center affected referrals or revenues for each institution? Does a telemedicine link for troops in remote locations reduce medical evacuations? Are clinicians and patients at each site satisfied with a teledermatology link between a university medical center and a capitated medical group? How do the costs compare to the alternatives (e.g., physically referring patients, adding another dermatologist to the group)? What factors (e.g., equipment location or ease of use) appear to underlie the results (positive or negative)? Positive results at this stage of evaluation may encourage diffusion of a technology on an institution-by-institution basis.
At the system or societal level, the focus expands to incorporate broader health care delivery and financing issues, particularly those involving the allocation of public resources. For example, does telemedicine have a role to play in state policies to support rural medical services? Or, more specifically, how do particular telemedicine applications compare to other policy options, such as area health education centers, direct subsidies for rural hospitals, and educational loan programs linked to practice in underserved areas? If the evaluation results look positive at this level, decisionmakers may support broad adoption and diffusion of the technology.
In developing an evaluative framework and related criteria, this committee has attempted to keep in mind evaluation issues at each of these levels. The distinctions are particularly relevant in the areas of
quality and cost because conclusions about the merits of a particular application of telemedicine may differ depending on whether one considers individual, institutional, or societal interests. Moreover, the committee recognized that, depending on the sponsor and audience, program-level and system-level questions may be both intertwined and overlooked. For example, telemedicine may save patients money by eliminating transportation and accommodations expenses for travel to a distant consultant. Evaluations driven by purchaser (e.g., insurer) or supplier (e.g., hospital) concerns may or may not consider such savings.
Business or Project Management Plan
The committee concluded that a significant weakness of many demonstration projects and their evaluations has been the lack of a business plan setting forth how the implementation and evaluation of the project will provide the information decisionmakers need to judge whether the test application is financially sustainable as an ongoing program. It is likely that the demise of many telemedicine programs can be attributed to an incomplete understanding of the business case for establishing and maintaining a telemedicine program and an inadequate appreciation of the costs involved.
In some cases, the business plan may be little more than the project management plan for an early exploration (test of concept) of a telemedicine application. That is, it will outline the project's leadership and management structures, its work plan and schedule, and its budget. In other cases, the business plan will be much more extensive, incorporating a detailed financial analysis and an appraisal of the program's fit with the organization's strategic plan.
On the financial side, a formal business plan typically would include start-up and operating budgets for the project, a break-even analysis, income projections (a profit and loss statement), and cash flow projections. Although details will vary depending on the type of project, its sponsor, its tax status (e.g., not-for-profit), and other factors, a start-up budget should allow for the following expenses prior to the time the project becomes operational: personnel costs prior to opening; consultant fees; travel; equipment and supplies; salaries and wages; insurance; utilities; and any overhead or other charges that may be required by the parent organization. An operating
budget should include money to cover expenses for the first three to six months of operation and would, for most evaluations, include many of the same kinds of expenses (e.g., salaries, supplies) included in the start-up budget.
If it is clear that the project is being evaluated as a possible component of its parent organization's overall business plan, then the project business plan usually would include a multiyear summary of the income statement and cash flow projections, with more detailed monthly projections for the first year and quarterly projections for later years. Each should be backed by documentation of assumptions, for example, about revenue sources.
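A break-even analysis of the kind mentioned above can be sketched in a few lines. The figures here are hypothetical assumptions chosen only to illustrate the arithmetic, not estimates for any actual telemedicine service.

```python
# Hypothetical break-even sketch for a telemedicine consultation service.
# All dollar figures are illustrative assumptions.
import math

fixed_costs_per_month = 9000.0    # line charges, equipment amortization, salaries
revenue_per_consult = 120.0       # assumed reimbursement per teleconsultation
variable_cost_per_consult = 45.0  # consultant time, supplies, transmission

# Contribution margin: what each consultation adds toward fixed costs.
contribution = revenue_per_consult - variable_cost_per_consult

# Consultations per month needed before the service covers its fixed costs.
break_even_volume = math.ceil(fixed_costs_per_month / contribution)
```

Even a rough calculation of this sort forces planners to state their assumptions about reimbursement and volume explicitly, which is much of the value of the business plan the committee recommends.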
Research Design and Analysis Plan
The research design describes the strategy and steps for developing valid comparative information, including the sources and techniques for collecting data. It specifies whether the strategy is experimental, quasi-experimental, or nonexperimental and presents the rationale and the limitations of the approach. The analysis plan outlines the methods for analyzing and interpreting the resulting information. Depending on the nature of the information collected and the research design, these methods may range from relatively simple tabular comparisons to sophisticated multivariate regression analyses.
Initiatives to evaluate education, welfare, criminal justice, public health, and other nonclinical programs have generated a large literature on evaluation research designs (see, e.g., Campbell and Stanley, 1963; Suchman, 1967; Weiss, 1972; Cook and Campbell, 1979; Sechrest, 1979; Rossi et al., 1983; Fink, 1993; Wholey et al., 1994). This literature provides systematic assessments of the strengths and limitations of different research designs (see the addendum to this chapter for further discussion). It also describes and encourages creative attempts to minimize or correct some of the limitations of the weaker but more feasible designs.
Much of the program evaluation literature suggests, to paraphrase an old saying, that "it is better to be roughly right than to be precisely ignorant" (Wholey et al., 1994, p. 1). This should not be taken as an excuse for a sloppy evaluation, but the rigor of the research design may reasonably depend on how much experience has accumulated with the intervention or program being evaluated and
the uses that will be made of findings. Overall, the basic challenge in research design is to balance the need for confidence in the findings of research with the demand for relevance, feasibility, and affordability. Trust in the findings of research hinges primarily on judgments about internal and external validity (see addendum) and about an evaluator's freedom from serious bias or conflict of interest.
Characteristics of Experimental and Comparison Groups
The research design specifies the experimental group or groups that will be provided telemedicine services and the comparison (or control) group or groups that will be provided alternative services. Except perhaps in the early "test of concept" stage, when the assessment focuses on whether an application can even be implemented, comparison is central to evaluation. Unfortunately, as suggested earlier, telemedicine evaluators may find it very difficult to recruit appropriate comparison groups, especially when there is no organizational or financial incentive for participation.
Typically, evaluators will want to describe carefully a number of characteristics of experimental and control groups that might affect outcomes and complicate conclusions about the effect of the experimental intervention. The starting point for identifying such characteristics is the basic research question for the project, which will suggest a series of additional questions—drawn from past research, judgment, and experience—about other independent factors or variables that may intensify, block, or confound the relationship between the experimental and dependent variables. These factors usually include but are not limited to
- patient characteristics (e.g., age, sex, race, severity of illness);
- provider characteristics and relationships (e.g., nurse practitioners, salaried primary care physicians);
- organizational characteristics and linkages (e.g., independent primary care practice, unit of an integrated health system);
- financial and legal environment (e.g., sources of revenues, regulatory restrictions); and
- geographic setting (e.g., urban or rural).
To identify the effect of the telemedicine application on the dependent variables or outcomes, these other factors should be "controlled"
through the research design or statistical methods. As briefly described in the addendum to this chapter, random assignment of patients to experimental and control groups is a classic method (actually, a variety of methods) to control for differences in patient characteristics. Often, however, researchers must rely on statistical or other techniques to control for differences. For example, to control for (rather than to determine) the effect of different provider payment methods, an evaluation might be restricted to either capitated or fee-for-service sites; alternatively, payment method might be used as a control variable in a multivariate statistical analysis.
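The use of payment method as a control variable can be illustrated with a small linear model. The sketch below fits an outcome (say, days from referral to specialist consultation) on a telemedicine indicator plus a dummy variable for capitated payment, so the estimated telemedicine coefficient is not confounded by payment differences across sites. The data are fabricated for illustration.

```python
# Illustrative sketch: controlling for payment method (fee-for-service vs.
# capitated) with a dummy variable in a multivariate linear model.
# All observations are fabricated.
import numpy as np

# Design matrix columns: intercept, telemedicine (1 = yes), capitated (1 = yes)
X = np.array([
    [1, 0, 0], [1, 0, 0], [1, 0, 1], [1, 0, 1],
    [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1],
], dtype=float)

# Outcome, e.g., days from referral to specialist consultation.
y = np.array([30., 28., 24., 22., 14., 12., 8., 6.])

# Ordinary least squares via numpy's least-squares solver.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, telemedicine_effect, payment_effect = coef
```

Here `telemedicine_effect` estimates the change in the outcome attributable to the telemedicine link after adjusting for payment method; in a real evaluation the model would also include the patient, provider, and organizational variables listed above.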
Technical, Clinical, and Administrative Processes
In defining the application and comparison services to be evaluated and identifying the objectives of the evaluation, many elements of the project's clinical, technical, and administrative processes will become evident. The technical infrastructure includes not only the immediate hardware and software requirements of the application but also the larger information and communications systems available to support them (as described in Chapter 3). For example, if a project links an urban medical center and a rural clinic, what personnel are available to assist each site with technical problems? If the system depends on a satellite link, what scheduling and other restrictions apply? Will information about patients be available from a computer-based patient record or will the information have to be specially entered and collected for the project?
Clinical processes are the ways in which medical services are provided as part of the telemedicine project. Often, they are precisely set forth in a clinical protocol that identifies specific activities, their order and timing, responsible personnel, circumstances that trigger different protocols, and appropriate clinical documentation. Like technical processes, these processes are supported by a larger clinical care system that includes, for example, procedures for maintaining medical equipment, distributing medications, scheduling work flow, and monitoring clinical performance.
Administrative processes span a wide array of financial, legal, personnel, security, and facilities management activities. The most immediately relevant of these (e.g., procedures for establishing new staff positions, hiring personnel, purchasing equipment and services, receiving
funds, paying bills, and referring patients) will ordinarily be identified as part of program and evaluation planning.
In addition to describing technical, clinical, and administrative processes as they are expected to operate and establishing steps to implement these processes, evaluators need to track processes as they actually occur to identify shortfalls and unanticipated problems or complications. If, for example, a homebound patient is to demonstrate range of motion in front of a camera, an evaluation should document whether patients follow the instructions well enough for the distant clinician to make an assessment. To cite another case, if military clinicians try to use telemedicine services but find that the clinical protocols are irritating, the equipment does not work, or the consultants are not scheduled appropriately, an evaluation needs to document this and, if possible, suggest how the problem could be resolved. Event or problem logs kept by project personnel may be used to record (for later analysis) departures from planned processes as well as unexpected events and problems.
Without efforts to implement interventions as planned and to monitor the extent to which this happens, evaluators will find it difficult to distinguish between a failure of the telemedicine application and a failure to implement the application as intended. Such distinctions are critically important to those making decisions about whether to adopt, substantially redesign, or discontinue telemedicine programs.
Measurable outcomes identify the variables and the data to be collected to determine whether the project is meeting its clinical and strategic objectives. This committee was asked to focus on issues in evaluating quality, access, and costs for clinical applications of telemedicine. It also concluded that the acceptability of telemedicine to patients and clinicians warranted separate attention, although patient satisfaction frequently figures in assessments of quality of care, access, and cost-effectiveness. Depending on its objectives, an evaluation may consider a range of other outcomes related to an organization's competitive position, its relationships with other institutions, the demand for different kinds of health care personnel, the economic health of a community, or other effects.
In addition to outcomes desired from the project, decisionmakers
will also benefit from evaluations that attempt to identify and measure possible unwanted and unexpected outcomes. A case in point is the "training effect" that appears to operate in some telemedicine programs such that the distant clinicians who participate in telemedicine consultations learn enough about diagnosis and patient management that they no longer need telemedicine consultations when they encounter certain patient problems. The benefit of such clinician education, however, may create a dilemma if demand for telemedicine consultations drops too low to justify continuation of a program. How such results might factor into decisions about the future of an application is not clear, but they would undoubtedly affect the interpretation of utilization statistics.
The specification of outcomes to be measured should describe the time frame for the measurements, for example, rehospitalization within six months of discharge or patient satisfaction with telemedicine at the time of service. One of the most frequent limitations of clinical and program evaluations is their focus on relatively short-term outcomes. This focus is borne of time and budget constraints and data collection difficulties. These difficulties are especially acute for longer-term health and cost outcomes. Depending on the objectives, circumstances, and resources, an evaluation may involve a range of immediate, intermediate, and long-term outcome measures, as discussed further in Chapter 7.
Because the committee believed that the fast pace of change and other uncertainties surrounding telemedicine applications were particular challenges, it highlighted one element of an analysis plan—sensitivity analyses—as a distinct item in the evaluation framework. Sensitivity analyses explore the extent to which conclusions may change if values of key variables or assumptions change. For example, financial projections may show the impact of different assumptions about costs for purchasing and maintaining telecommunications and other equipment. As noted above, a particular problem for telemedicine evaluations is the instability of the technology and its environment. With data capture, transmission, and display technologies improving in quality and declining in cost, evaluators may need to consider (a) how sensitive their conclusions may be to technological change and (b) how analyses might be constructed to estimate the
impact of certain kinds of changes. For example, an analysis of cost-effectiveness could include a sensitivity analysis that incorporates different assumptions about the timing and cost of key hardware or software upgrades or replacement (Briggs et al., 1994; Hamby, 1995).
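A sensitivity analysis of the kind just described can be sketched in a few lines. In this hypothetical example—every cost figure, scenario name, and parameter is an assumption, not drawn from the report—the cost per teleconsultation is recomputed under different assumptions about equipment price and replacement cycle:

```python
# Hypothetical sensitivity analysis for cost per teleconsultation.
# All figures below are illustrative assumptions.

def cost_per_consult(equipment_cost, lifetime_years, consults_per_year,
                     telecom_per_consult, staff_per_consult):
    """Straight-line amortization of equipment plus per-consult costs."""
    annualized_equipment = equipment_cost / lifetime_years
    return (annualized_equipment / consults_per_year
            + telecom_per_consult + staff_per_consult)

scenarios = {
    "base_case": dict(equipment_cost=60_000, lifetime_years=5),
    "rapid_obsolescence": dict(equipment_cost=60_000, lifetime_years=3),
    "cheaper_hardware": dict(equipment_cost=30_000, lifetime_years=5),
}

results = {
    name: cost_per_consult(consults_per_year=400, telecom_per_consult=25,
                           staff_per_consult=40, **assumptions)
    for name, assumptions in scenarios.items()
}
print(results)  # how the per-consult cost shifts under each assumption
```

If the evaluation's conclusion (say, that telemedicine is cost-saving) holds across all plausible scenarios, it is robust to the equipment-cost assumption; if it reverses under one scenario, that assumption deserves close scrutiny.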
Documentation of Methods and Results
In reviewing evaluations of telemedicine applications, the committee was often frustrated by the incomplete or casual documentation of the methods employed and the specific findings. One result was to diminish the utility and credibility of the reports. Efforts to identify weaknesses and improve documentation in research reports have been undertaken by a number of medical and health services research journals, including the Journal of the American Medical Association, Annals of Internal Medicine, Health Services Research, and Medical Care. They have developed guidelines and procedures to improve the clarity and specificity of abstracts, the processes of peer review, and the reporting of methods (including randomization procedures, sample sizes, and statistical power), data analysis and reporting, and sponsorship. (See, for example, DerSimonian et al., 1982; Pocock et al., 1987; Haynes et al., 1990; Altman and Goodman, 1994; Moher et al., 1994; Schulz et al., 1994; Sweitzer and Cullen, 1994; Taddio et al., 1994; Rennie, 1995; and Schulz, 1995.) At least one telemedicine publication, Telemedicine Journal, is attempting to follow this guidance. Although these suggestions have been aimed at journal editors, they have the important additional benefit of reinforcing basic principles of sound research and statistical analysis.
Evaluation and Continuous Improvement
As noted at the beginning of this chapter, one objective of evaluation and applied research generally is to provide decisionmakers with information that will help them redesign and improve programs. This is particularly true for evaluations conducted in the context of a continuous quality improvement process. The tenets of continuous quality improvement, which were derived in considerable measure from industrial applications, are described in detail elsewhere (see, e.g., Deming, 1986; Batalden and Buchanan, 1989;
Berwick, 1989; Berwick et al., 1990; IOM, 1990c, 1992a; Roberts, 1991; Williamson, 1991; Horn and Hopkins, 1994). Consistent with the evaluation framework set forth here are the principles calling for (a) planning, control, assessment, and improvement activities grounded in statistical and scientific precepts and techniques and (b) standardization of processes to reduce the opportunity for error and to link specific care processes to health outcomes.
Another key principle emphasizes close relationships between customers and suppliers, for example, patients and providers or providers and suppliers of equipment or services. The application of this principle to the design and evaluation of telemedicine applications would address one of the human factor problems identified in Chapter 3: inadequate assessment of and attention to user needs.
The very process of implementing a program and its evaluation components may make evaluators aware of program deficiencies or environmental obstacles to program success. For example, potential participants may balk at using equipment that is inconveniently located or difficult to apply. In addition, the evaluation frameworks and plans reviewed by the committee suggested a number of other means for securing information for program improvement. These included logs kept by clinical or technical personnel and individual or group "debriefing" interviews with participants. These strategies may identify poorly designed or located equipment, "user-unfriendly" software, inadequate training of personnel, bureaucratic burdens, or deficient patient record systems.
Unfortunately, depending on the problems identified, the path to program redesign or improvement may or may not lie within the feasible reach of program administrators or sponsors. For example, some equipment deficiencies may be corrected by switching hardware but others may be resolved only if manufacturers are willing or technically able to fix them.
In general, evaluations based on continuous improvement principles will expect that mistakes or poor outcomes are more often the result of system defects (e.g., poor scheduling systems) than of individual deficiencies. In an environment governed by this outlook, program evaluations may provoke less apprehension and win more cooperation from those whose activities are being studied.
Based on its review of current applications and evaluations, the committee concluded that significant improvements are possible in the quality and rigor of telemedicine evaluations. This chapter has emphasized the importance of considering evaluation objectives and strategies during the early stages of program planning. Likewise, it has stressed the value of developing a business plan that explicitly states how the evaluation will provide information to help decisionmakers determine whether a telemedicine application is useful, consistent with their strategic plan, and sustainable beyond the initial evaluation stage.
The fast pace of change and other uncertainties surrounding telemedicine applications argue strongly for sensitivity analyses to explore how conclusions may change if values of key variables or assumptions change. They also argue for thinking broadly about potential benefits and costs, carefully documenting how the technical infrastructure and the clinical processes of care were intended to operate, and tracking what actually does occur. This latter step is crucial if evaluators who find negative results are to determine, for example, whether the hypothesis linking independent and dependent variables is untenable or whether the hypothesis was not actually tested because the application was not implemented as intended. By tracking what actually happened, evaluators also may achieve a fuller understanding of critical success factors or the factors that, if changed, might improve results.
The evaluation framework presented in this chapter is, in the lexicon of information technologies, a basic evaluation platform that incorporates general evaluation principles, principles adapted to the health care field, and elements of strategies proposed by those encouraging and conducting evaluations of clinical telemedicine. The framework is intended to promote improvements in individual evaluations, but the committee also encourages the coordination of evaluation strategies across projects and organizations, when possible.
Addendum: Experimental, Quasi-Experimental, and Nonexperimental Designs
As noted in the text of Chapter 6, a large literature on evaluation research designs exists to guide those planning evaluations of telemedicine
and other activities (see, e.g., Campbell and Stanley, 1963; Suchman, 1967; Weiss, 1972; Cook and Campbell, 1979; Sechrest, 1979; Rossi et al., 1983; Fink, 1993; Wholey et al., 1994). One value of this work is that much of it is not just theoretical but highly practical in its attempts to develop and encourage creative but respectable ways of handling difficult evaluation problems. These efforts revolve around concerns with internal and external validity.
In a 1963 discussion that has become a classic source for evaluation research, Campbell and Stanley set forth an analysis of validity and threats to validity and provided a systematic assessment of the strengths and limitations of various common research designs. Internal validity focuses on the fundamental question: "Did in fact the experimental treatments make a difference in this specific experimental instance?" (Campbell and Stanley, 1963, p. 5). External validity focuses on the extent to which the procedures and results of a particular experiment can be generalized to other populations, settings, and circumstances.
Box 6.1 lists the common threats to internal validity as identified by Campbell and Stanley. It also provides hypothetical illustrations of how they may appear in evaluations of telemedicine applications.
Threats to external validity involve a variety of differences between the groups studied and the groups to which the results might be generalized. For example, generalizing to urban settings from projects in rural areas may be risky. A project that used physicians knowledgeable and enthusiastic about computer-assisted medicine might not produce results applicable to physicians without such knowledge and enthusiasm. A project undertaken in a fee-for-service environment might be less relevant in managed care markets.
In general, research designs can be categorized as experimental, quasi-experimental, or nonexperimental. A true experimental design has two special characteristics. The first is that the design includes at least one group that is subjected to a carefully specified intervention or treatment and another that is subjected to a different intervention. The second characteristic is the random assignment of the subjects (e.g., patients) to the experimental and control groups. Ideally, experimental designs are also "double blinded" in that neither the investigators nor the patients know which group is receiving which treatment.
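Random assignment of the kind just described can be sketched simply. The patient identifiers, arm names, and seed below are hypothetical:

```python
import random

def randomize(patient_ids, seed=2024):
    """Unstratified random assignment of patients to two study arms.

    Real trials typically use blocked or stratified randomization to
    balance arm sizes and key covariates; this is only a minimal sketch
    of the basic idea.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"telemedicine": ids[:half], "usual_care": ids[half:]}

arms = randomize(range(1, 21))  # 20 hypothetical patient IDs
```

Because assignment depends only on chance, patient characteristics—measured and unmeasured—are expected to be distributed similarly across the two arms, which is the source of the design's internal validity.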
SOURCE: Quoted material excerpted from Campbell and Stanley, 1963, pp. 5-6.
The most highly structured randomized clinical trials (RCTs) have generally aimed to establish efficacy (effects under tightly controlled conditions) rather than effectiveness (results under actual conditions of practice). The strength of RCTs rests on the protection of internal validity through randomization, restrictive patient selection criteria, masking from researchers and patients which patients are receiving which treatments, and strict control of treatment protocols.
A well-designed RCT may still have problems with external validity or generalizability to less controlled practice settings. For example, a recent retrospective analysis of data from two large HMOs on patients who discontinued antihyperlipidemic drugs (drugs to treat high cholesterol) because of adverse effects and therapeutic ineffectiveness suggested that "rates reported in randomized clinical trials may not give an accurate reflection of the tolerability or effectiveness of therapy in the general population" under ordinary conditions (Andrade et al., 1995).
From a practical perspective, traditional, tightly controlled RCTs suffer several handicaps: they tend to be expensive, time-consuming, complex to plan and administer, and ethically or practically unsuitable for some research questions.* Thus, researchers have sought to develop adaptations and alternatives.
One adaptation of the RCT includes "large simple trials" (Zelen, 1993). Large simple trials are simple primarily in that they ask fewer questions than many traditional RCTs. They would still require random assignment but would also rely more on statistical than physical controls of the research setting. Data collection is streamlined. Patients and clinicians anywhere in the United States or elsewhere could participate in a clinical trial if they met defined eligibility criteria and agreed to follow (and document that they followed) specific treatment protocols. Depending on the complexity of the research and treatment protocols, this openness may demand sophisticated and generally expensive programs of training, monitoring, operating assistance, and auditing. In one of its last reports, the Office of Technology Assessment urged those involved with effectiveness research to explore innovative ways to conduct randomized
clinical trials and incorporate them into ordinary practice (OTA, 1994).
Another option, the clinical practice study or effectiveness trial, generally involves a relatively rigorous form of quasi-experimental research (Horn and Hopkins, 1994; McDonald and Overhage, 1994; Stiell et al., 1994). Quasi-experimental designs cover a variety of strategies that may or may not include a control group or random assignment. Although they are weaker on internal validity, a strength of clinical practice studies or effectiveness trials is that they better represent actual conditions of practice and may be somewhat less expensive and time consuming. They do not insist on homogeneous patient populations that exclude those with comorbidities or complications that may confound analysis of the link between the experimental intervention and patient outcomes. Instead, they measure relevant patient characteristics using severity assessment tools and statistically adjust for differences in experimental and comparison groups. Further, they accommodate departures from rigid treatment protocols by carefully monitoring and measuring actual treatments and then incorporating these data in the statistical analysis. Because this approach does not disqualify large numbers of patients, it is easier to generate the numbers of cases needed for comparisons. Using regression or other statistical techniques, researchers test which process steps are associated with desirable quality, access, or cost outcomes for different kinds of patients.
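The statistical adjustment described above can be illustrated with a small least-squares sketch. The synthetic data below are an assumption, constructed noise-free so that the true treatment effect (1.5) is known and exactly recoverable; in a real effectiveness trial the outcome would contain noise and the model would require far more care:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic, noise-free data with a known treatment effect of 1.5.
severity = rng.uniform(0.0, 10.0, n)           # severity-of-illness score
treated = (rng.random(n) < 0.5).astype(float)  # telemedicine vs. comparison
outcome = 2.0 + 1.5 * treated + 0.8 * severity

# Including severity as a covariate statistically "controls" for case-mix
# differences between the experimental and comparison groups.
X = np.column_stack([np.ones(n), treated, severity])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted_effect = coef[1]  # recovers the true treatment effect
```

The unadjusted difference in group means would mix the treatment effect with any chance imbalance in severity; the regression coefficient isolates the treatment effect conditional on severity, which is the logic behind the severity-adjustment approach described in the text.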
Although clinical practice studies tend to focus on shorter- rather than longer-term outcomes, the outcomes include effects that are noticeable and important to patients rather than only those that are physiologically measurable through laboratory or other tests. Such studies are often designed to be replicated easily so that they can be undertaken at multiple sites. Sophisticated computer-based patient information systems make it more acceptable to rely—as a "second best" strategy and with appropriate caution—on statistical control techniques rather than randomization and physical control of "confounding" variables.
The objective of such alternatives is not to devalue or replace the RCT but to develop additional sources of systematic information on outcomes that will improve on the anecdotal and informal knowledge base that characterizes much of clinical practice (IOM, 1992a; Horn and Hopkins, 1994; OTA, 1994). Some of the telemedicine
research projects discussed in Chapter 5 attempt experimental and quasi-experimental research strategies. Even with less demanding designs, tension will exist between the principles of design and the pressures of real-world evaluation.
Another stream of work on alternatives or supplements to the RCT has emphasized nonexperimental research based on the retrospective analysis of large databases that have often been compiled for other purposes (Roos et al., 1982; Moses, 1990; Hannan et al., 1992; NAHDO, 1993). Until telemedicine applications become much more common and routine and are assigned codes to identify them, large databases are unlikely to be useful sources of data on telemedicine applications.
Nonetheless, those looking ahead to more widespread use of telemedicine should consider how routine collection of data about telemedicine may be useful and what would be required to incorporate such data in large data systems. The appeal of these data sources lies in their relative convenience, large numbers of cases, and ease of statistical analysis. Questions or criticisms related to use of large databases for health services research, performance monitoring, and other purposes involve their completeness, accuracy, relevance, and security from unauthorized access (IOM, 1994b; Maklan et al., 1994; Kuller, 1995). A variety of initiatives have focused on means to reduce the amount of missing data, validate and improve coding of clinical and other information, add information (e.g., death records), and develop methods to adjust comparisons for differences in severity of patient conditions (IOM, 1994b; Roos et al., 1995). Even with improvements, data collected for one purpose (e.g., claims administration) may remain questionable for other purposes (e.g., outcomes research) if they lack reliable information about patient medical status, processes of care, and other variables. The OTA, for example, warned that "focusing on this research method as a relatively simple, inexpensive first-line tool for answering comparative questions [about the effectiveness of treatment alternatives] is unwarranted" (OTA, 1994, p. 74).