Methodological Issues and Approaches
The scientific literature relevant to the problem of work-related musculoskeletal disorders represents a wide variety of research designs, assessment instruments, and methods of analysis. Therefore, the panel's representation of the science base covers a wide range of theoretical and empirical approaches. For example, there are highly controlled studies of soft tissue responses to specific exposures that are based on work with cadavers, animal models, and human biomechanics. There are also surveys and other observational epidemiologic studies that examine the associations between musculoskeletal disorders and physical work and organizational, social, and individual factors. In addition, there are experimental and quasi-experimental studies of human populations that are designed to examine the effects of various interventions.
Each of these approaches contributes a different perspective to the overall topic of musculoskeletal disorders in the workplace; together they provide a more complete, cross-validated understanding of how different workplace exposures may contribute to the occurrence of musculoskeletal disorders. Each approach has important strengths and limitations when viewed alone. When information from the three approaches is viewed together, as in this report, however, the perspective on musculoskeletal disorders in the workplace is enriched. Illustrations of these approaches are provided below, using the example of the examination of the relationship between repetitive lifting and back disorders. Similar illustrations could be provided for upper extremity disorders.
In other reports, such as the recent National Institute for Occupational Safety and Health report on musculoskeletal disorders in the workplace (Bernard, 1997b), the emphasis has been on the preponderance of evidence within one area of literature (e.g., the report focused on observa-
tional epidemiology). Here, however, we review all three approaches: basic science, observational epidemiology, and intervention studies. Rather than review each approach with the aim of examining a preponderance of evidence, this report considers the pattern of evidence across the different areas of scientific study. The pattern of evidence analysis, described in detail by Cordray (1986) and discussed below, has been used in an earlier National Research Council report (1995); it is particularly useful when considering causal inferences across different fields of study.
From the perspective of basic sciences, studies are designed and performed to isolate discrete events that are carefully engineered to deliver a set of exposures characterized by replicable frequency, dose, and duration. These exposures are applied to isolated anatomical and physiological systems (e.g., muscles, nerves) that are then measured for anatomical damage or adverse biochemical changes. For example, the question of the extent to which repetitive lifting is related to back disorder can be examined with the assistance of an apparatus that applies a repeatable frequency, dose, and duration of a load to a cadaver or relevant animal models; then biological measures, such as tissue biopsy for measurement of biochemical changes consistent with damage, can be obtained. The results from this type of study provide data on basic mechanisms to show, for example, whether repetitive compression similar to that involved during lifting is associated with tissue damage, and the extent to which damage can be identified as following these discrete events. In the laboratory context, the goal is to isolate events of exposure and outcome to the greatest degree possible, by precise and refined measurement and by controlling extraneous environmental conditions (e.g., temperature, humidity).
The results provide confidence in drawing inferences on whether tissue damage follows application of exposure, but these inferences are tempered by several factors. First, isolation of human tissue for study (such as a particular muscle or group of muscles) may demonstrate damage, but it may remain unclear whether the load applied in the experiment is similar to that experienced by humans. There are studies that report precise physiological abnormalities but no correlation with symptoms or function of the person being studied. These fine measurements may be trivial, or they may represent an early disease process that will become manifest only later. Second, the complexity of the human biological system includes compensatory mechanisms that are excluded in studies that focus on the isolation of mechanisms. For example, some combination of muscle
groups and positioning of the whole person could provide a pattern of motion that may partially offset the insulting exposure in the real work environment. Therefore, the degree to which the results from these more focused studies can be generalized to workers in the context of their ongoing tasks must be carefully interpreted. Third, the degree to which an animal model is analogous to the human must be considered. How experiments that involve loading of mouse tails are relevant to human musculoskeletal disorders is a meaningful question that scientists need to clarify for others. While seemingly distant from human experience, this animal model is relevant from the perspective of comparative anatomy and provides a quantifiable aspect to measurement of the impact of physical stresses. Ultimately, the basic science studies contribute important information about the mechanisms by which injury can occur following a prescribed set of exposures, but the application of the findings to the whole human depends on the extent to which the results are congruent with data from human studies.
From the perspective of observational epidemiology, studies of human populations are designed that measure both exposures (e.g., repetitive motion) and outcomes (e.g., back disorders). A key feature of measurement in observational epidemiologic studies is that the results are generated from populations of humans. There are limits to the measurements that persons (as opposed to cells or tissues) will allow or tolerate. Because the comparisons involve groups, there are levels of detail that remain desirable but are not feasible to measure (e.g., the cost may be prohibitive). However, surveys provide information, such as symptoms, that cannot be obtained in the same way from certain types of laboratory study; for example, back pain may be disabling even in the absence of objective diagnostic test findings, and the frequency of this condition is important to ascertain.
The presence of symptoms in the absence of objective findings needs careful attention. Symptoms are often the first presentation of an illness and may represent changes that are due to physical damage (due, in this case, to repetitive lifting) that has not progressed enough to be measured by physiological tests or physical exam. Self-reports of symptoms may also reflect other factors, such as psychological stress. These trade-offs (e.g., in detail versus feasibility of measurement) are inherent in observational epidemiologic studies.
The basic strategy in an epidemiologic study is to identify the extent to which the outcome (e.g., back disorder) occurs more frequently in the exposed group than the unexposed group. The strength of this type of
study over basic science investigations is that the results are generated in a real-world context. The assumption is that the persons being compared are similar in all respects (e.g., age, weight for height) other than the exposure (e.g., repetitive lifting) being considered in the investigation. In practice, observational epidemiologic studies use techniques from biostatistics to evaluate the degree of difference in extraneous variables between the two groups and to perform statistical adjustment that makes these other factors comparable. The goal of these statistical techniques is to control for confounding. Their success, however, depends on a variety of factors, including whether the "appropriate" variables are collected and entered into the calculations and whether the number of people enrolled in the study is sufficient to permit these comparisons after accounting for other extraneous variables. No amount of application of statistical procedures can redeem a study if important variables are missing from the data collection or the study size is too small.
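The basic comparison at the heart of this strategy, outcome frequency in the exposed group versus the unexposed group, can be sketched with a small worked example. The counts below are hypothetical, chosen only to show how the two risks are contrasted to form a relative risk:

```python
# Hypothetical 2x2 table: repetitive lifting (exposure) vs. back disorder (outcome).
# All counts are invented for illustration.
exposed_cases, exposed_total = 60, 400       # workers who lift repetitively
unexposed_cases, unexposed_total = 30, 600   # workers who do not

risk_exposed = exposed_cases / exposed_total          # 0.15
risk_unexposed = unexposed_cases / unexposed_total    # 0.05

# Relative risk: how much more frequently the outcome occurs among the exposed.
relative_risk = risk_exposed / risk_unexposed

print(f"Risk (exposed):   {risk_exposed:.3f}")
print(f"Risk (unexposed): {risk_unexposed:.3f}")
print(f"Relative risk:    {relative_risk:.1f}")       # 3.0
```

A relative risk of 3 would mean the outcome occurs three times as frequently in the exposed group; whether that crude contrast reflects causation depends on the considerations discussed in the rest of this chapter.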
Another aspect of observational epidemiologic studies is the variety of designs that can be used to build inferences about causal associations. To illustrate this, we consider one of the most common types of studies, the cross-sectional survey. This survey involves collection of data from a sample of people, for whom measures of exposure are made, usually by interview. It can also involve observation of work tasks and measurement of the environment (e.g., average load weight, frequency, duration of lifting). The outcomes can also be measured by interview (e.g., incidence and duration of back pain), physical examinations, or application of standardized diagnostic tests. Applied to the example used here, a survey examining the relationship between repetitive lifting and back disorder, the design involves collecting information from individuals about their current and past lifting experience (exposure) and their current and past episodes of back pain or diagnoses of back disorder (outcome).
This survey design is efficient in that the scientists conducting the study can assemble a group of people, ask questions (and check medical records, etc.), and tabulate the results, all within a reasonable time frame. However, the approach has limitations. Because data on exposures and outcomes are collected simultaneously, it is possible that persons who had back disorders left job categories involving repetitive lifting. In that case, the survey may show, for those who remained to be questioned, that back disorders were not related to repetitive lifting; the result, however, may reflect the fact that the injured persons were no longer at these work tasks, and it may even suggest that lifting is "protective" against back disorders. Because a variety of such considerations complicate interpretation of a survey, a number of other study designs have been developed that make it possible to establish a relationship with greater confidence. These designs, described later in the chapter, are, however, more time- and resource-intensive.
From the perspective of intervention studies, investigations of human populations are designed, on the basis of data from the basic sciences and observational epidemiology, to formally test whether reduction (or enhancement) of an exposure results in a lower incidence of a disorder (or an elevation of a state of well-being). The grounding of intervention studies in basic science and epidemiology reflects, in part, the fact that rigorous study is resource intensive and that scientists need a basis of understanding before embarking on a program that could have unintended and untoward consequences. The ideal of the intervention study is to achieve, to the extent possible, the features of laboratory studies that involve control of the ambient environment.
After a discrete intervention is identified (for example, introducing job redesign to address the relationship between repetitive lifting and back disorders), a population of workers who are exposed (e.g., engage in repetitive lifting) is randomly assigned to receive either the intervention (i.e., job redesign) or the usual activity (if considered an ethical alternative). Random assignment of intervention treatments is done in an attempt to equalize the effect of extraneous variables across those who receive the intervention and those who do not. The design is prospective, in that people undergo the intervention or comparison conditions (modified exposure) and are then followed over time to examine the incidence of the outcome (rates of back disorders in this case). The incidence of the outcome in the two groups is then compared, and if a difference is observed, it is attributed to the intervention. The key to this inference is the recognition that if the two groups are comparable on all factors measured, except the intervention itself, an assumption can be made that the groups are probably comparable on unmeasured factors. Therefore, the factor that most likely explains the difference in rates of disease or other outcomes is the intervention.
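Why random assignment tends to equalize extraneous variables, including unmeasured ones, can be illustrated with a small simulation. The workforce, the measured "age," and the unmeasured "fitness" score below are all hypothetical:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical workforce: each worker has a measured age and an
# unmeasured "fitness" score that randomization never sees.
workers = [{"age": random.randint(20, 60), "fitness": random.random()}
           for _ in range(1000)]

# Random assignment to intervention (e.g., job redesign) or usual activity.
random.shuffle(workers)
intervention, control = workers[:500], workers[500:]

def mean(group, key):
    return sum(w[key] for w in group) / len(group)

# With enough workers, both measured and unmeasured factors balance closely.
print(f"Mean age:     {mean(intervention, 'age'):.1f} vs {mean(control, 'age'):.1f}")
print(f"Mean fitness: {mean(intervention, 'fitness'):.2f} vs {mean(control, 'fitness'):.2f}")
```

The fitness score was never used in the assignment, yet its group means end up nearly identical; this is the basis for assuming comparability on unmeasured factors.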
This feature of randomized allocation distinguishes intervention from observational epidemiologic studies. In prospective epidemiologic studies, the variable of the intervention (such as exercise) is measured on those who self-select to perform the intervention (or it is intentionally selected based on some preconceived preferences of those responsible for the selection process) rather than on those who are selected on a random basis (e.g., to undergo exercise activities in the trial). While the techniques for comparison are similar in the two study designs, self-selection in the
epidemiologic study means that there may be other, unmeasured, factors that contribute to the selection of who performs the activity under study. These other factors may be critical for contributing to and explaining the outcome of interest (e.g., back disorder).
While the randomized controlled design is powerful for testing the effectiveness of interventions, it is not appropriate for all situations. For example, if the data from basic and epidemiologic studies and clinical experience suggest that the proposed intervention is highly likely to be effective, then questions can arise about how ethical it would be to withhold an intervention from a group for the sake of a formal comparison. The converse question to consider is whether absence of a study would be ethical. Probably more important for workplace studies is the consideration that there is constant change at work, independent of any planned intervention, that makes the laborious process of planning and implementing a randomized controlled trial challenging, if not impractical.
In these situations, another design, variously termed "historical control study," "before/after design," and "time-series analysis," has been used. For this design, an intervention is prescribed (or simply happens) in a population that has been followed and for whom the preintervention disease or outcome incidence is already known; after the intervention is introduced to all in the population, incidence of disease or outcome is ascertained over time. The basis of comparison is the incidence of the outcome, since an effective intervention should result in a lower rate of disease or other outcome in the population "after" compared with the "before" interval being studied. While ethically less complicated to institute than randomized trials, this design is limited by the possibility that any number of other unmeasured factors may have occurred during the course of the study, and these other factors may have contributed to the outcomes observed. For example, while job redesign is being instituted in factories, new medications for low back pain could be introduced. The extent to which this other factor accounts for the results that are attributed to the intervention would need to be considered. However, the before/after study is an efficient design that takes advantage of natural changes that occur in the workplace and are applied uniformly to all people (so selection issues are minimized). Therefore, findings from these studies can be accumulated more quickly to suggest directions for program implementation and, if appropriate and necessary, additional studies using randomized controlled trials.
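The before/after comparison itself is a simple rate contrast. The incidence figures below are hypothetical, intended only to show the arithmetic:

```python
# Hypothetical before/after comparison of back-disorder incidence
# around a job-redesign intervention; all counts are invented.
before_cases, before_person_years = 48, 1200   # preintervention interval
after_cases, after_person_years = 30, 1250     # postintervention interval

rate_before = before_cases / before_person_years   # cases per person-year
rate_after = after_cases / after_person_years

rate_ratio = rate_after / rate_before

print(f"Incidence before: {rate_before:.3f} per person-year")
print(f"Incidence after:  {rate_after:.3f} per person-year")
print(f"Rate ratio (after/before): {rate_ratio:.2f}")
# A ratio below 1 is consistent with an effective intervention, but other
# concurrent changes (e.g., new medications) could also explain the drop.
```

The design's weakness is visible in the final comment: the arithmetic cannot distinguish the intervention from anything else that changed between the two intervals.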
Randomized trials can be limited in other ways besides costs and ethical concerns about withholding interventions. Concerns include whether the intervention was applied as designed (fidelity to the plan), contamination of effect (whereby persons randomized to the control condition have contact with persons who received the intervention and adopt
some elements of the intervention indirectly), and inadequate randomization procedures. Randomized trials are difficult to conduct in the workplace because work practices change frequently, workers are reassigned frequently, and it is difficult to mask participants in this setting. Scientists involved with randomized controlled trials plan carefully to minimize the concerns whenever possible.
Prevention represents the ultimate goal of public health science, the objectives of which are to build on the basic sciences and observational epidemiology and to test practices designed to reduce the incidence of disease and facilitate the well-being of the population. Prevention is best tested through intervention studies. Intervention studies not only serve as formal tests of practices that should be implemented in different settings, such as the workplace, but also provide another layer of support in developing confident inferences about which health factors are related. For example, basic science and epidemiologic studies may provide information about the association of repetitive lifting and back disorder. This information provides the basis for developing interventions, including job redesign, that are aimed at reducing the frequency or some other characteristic of repetitive lifting. Successful intervention studies with job redesign that show a reduction in repetitive lifting and a resultant reduction in the incidence of back disorders provide evidence both for instituting such activities in practice and for confirming that repetitive lifting and back disorder were truly related.
In summary, the three types of scientific inquiry described briefly here—basic science, observational epidemiology, and intervention studies—provide different perspectives, but they contribute to each other in generating support, as well as checks and balances, in building scientific certainty. This chapter reviews some of the methodological approaches used by the various fields that address the questions on musculoskeletal disorders in the workplace. While the methods are in many ways powerful, it is the variety of observations from different perspectives that continues to provide an evolving picture of the causes of and interventions for preventing musculoskeletal disorders in the workplace. To move toward prediction, a view is needed of how causal inferences are made.
DETERMINING CAUSALITY WITHIN STUDIES
Making sense of the diverse literature requires approaches to show how causality is established. How is it that a claim can be made that repetitive lifting “causes” back disorder? How is it that repetitive motion contributes to upper extremity disorders? Before extensive resources are committed to modify the workplace or the ways workers approach their jobs, one should examine the certainty of the statement that repetitive
lifting is responsible for back disorder. The basis for understanding, whether it is from a laboratory study, a field study, or an intervention trial, rests on key concepts. The first is that there is an exposure and the second is that there is an outcome of interest. The exposure can be one of any number of events of a biological, physical, chemical, or psychological nature. In the example used in this chapter, the exposure has been repetitive lifting, but one could study trunk bending, trunk twisting, temperature, and so on. In public health literature, the outcome is usually a disease condition for which risk factors are being sought in order to find strategies to prevent the disease. In this chapter, we are using the example of back disorder. In more recent public health literature, there has been a greater emphasis on health and well-being, and certainly this can be operationalized into an outcome for study. Here we use the terms “exposure” and “outcome,” while in other settings, the analogous terms would be “stimulus” and “response” or “cause” and “effect,” respectively.
The third concept is the association between the exposure and the outcome. While the association of the exposure and outcome can be made based on an individual at a single point in time, such an inference will be more speculative than causal, because it is based on limited information (e.g., a nurse reports back pain and her job involves repetitive lifting). Clinicians do this all the time, but the single observation for the clinician is actually cumulative, because the findings from one patient are placed in the context of medical knowledge learned and catalogued to date. The circumstances in science are different; the focus is not on categorizing patients into established categories, but on establishing a novel association. A hallmark for scientific inference is that there are repeated observations. This is in part to ensure replication (and why scientific activity is called “re-search”). Also, given the biological diversity of the human species, the observations of many studies help to ensure that associations between outcomes and exposures are found across different layers of human characteristics, showing that they are not simply a by-product of some other characteristic that is coincidental in the population.
For associations to be causal, they must be based on multiple observations; there are also necessary characteristics of the association between exposure and outcome. Some refer to these characteristics as "criteria"; others use the terms "conditions" or "conventions" for causation. The five characteristics listed by Campbell and Stanley (1966) and updated by Cook and Campbell (1979) and Cordray (1986) include: (1) temporal ordering, (2) that exposure and outcome vary together ("covary"), (3) the absence of other plausible explanations, (4) temporal contiguity, and (5) congruity between exposure and outcome. These characteristics serve as criteria when reviewing individual studies for their
likelihood of generating causal inferences. Later in this chapter, characteristics that are considered across studies are presented.
Temporal ordering refers to the importance of having the exposure precede the outcome in time; for example, repetitive lifting is more likely to be considered a cause of back disorder if it precedes it in time. In laboratory studies, temporal ordering can be established by the type of control that an investigator has over the timing and delivery of an exposure. In epidemiologic studies, this involves a recording of events in the real world. Prospective studies involve measuring exposures and then following people for the development of an outcome. Temporal ordering may be difficult to establish in cross-sectional surveys because information about exposure and outcome are obtained during the same interview. The survey approach, though operationally efficient, is less powerful for generating causal inferences than a prospective study (e.g., that follows individuals who vary in terms of performing repetitive lifting and who are then followed systematically for development of back disorder).
The second characteristic, exposure and outcome covary, refers to the observation that if the exposure is present, an outcome will occur; a reduction of exposure will also result in a reduction of outcome. For example, when no compressive force is applied to a spinal disk, it becomes thickened, but when loaded, it thins. While this suggests a one-to-one correspondence, it represents more of an ideal in human population studies, in which a number of factors may be contributing and offsetting each other in establishing the outcome. Therefore, the epidemiologic equivalent is that if the exposure is present, it is more likely that an outcome will occur.
The third characteristic is the absence of other plausible explanations. This includes the concept of confounding. Confounding is the circumstance in which the basic association of interest is in fact due (at least in part) to another factor. The definition of a confounding variable is that it is associated with both the exposure and the outcome, and that after accounting for this third variable, the relationship between the exposure and outcome is reduced, sometimes to the point at which an association can be said to be no longer meaningful. Suppose, for example, that studies find a strong association of repetitive lifting and back disorder, and an inference is evolving that the former is causing the latter. Then suppose that another set of studies is performed that includes a measure of recreational activities outside the workplace. From these studies, suppose it is found that these activities are associated with repetitive lifting (presumably because people in these jobs are likely to engage in similar levels of activities outside the workplace compared with other workers) and back disorder. The question can then be raised as to whether repetitive lifting itself is the
culprit for back disorders at work, or whether the back disorders are in fact due to recreational activities that are characteristic of workers who happen to select (or be selected for) jobs involving repetitive lifting. In this case, recreational activity serves as a potential confounder and represents a possible alternative explanation for the association between repetitive lifting and back disorder.
Fortunately, in field studies of workers, there are biostatistical methods to assist with disentangling the effects of putative confounders. The techniques include stratification (i.e., examining whether the association of repetitive lifting and back disorder persists across groups of persons stratified by levels of recreational activity) and adjustment (i.e., statistical procedures to examine and average associations between exposure and outcome in the presence of other variables). The selection of variables for examination as confounders is based on those found to be plausible from the literature as well as those identified empirically in the population being studied.
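Stratification can be illustrated with hypothetical counts constructed so that recreational activity fully accounts for the crude lifting-back disorder association. Within each activity stratum the association vanishes, which is the signature of complete confounding:

```python
# Hypothetical counts in which recreational activity confounds the crude
# association between repetitive lifting and back disorder.
# Each tuple: (cases, total) within the stratum.
strata = {
    "high recreational activity": {"lifters": (80, 400), "non_lifters": (20, 100)},
    "low recreational activity":  {"lifters": (5, 100),  "non_lifters": (20, 400)},
}

def risk(cases, total):
    return cases / total

# Crude (unstratified) relative risk: pool the strata and compare groups.
lift_cases = sum(s["lifters"][0] for s in strata.values())
lift_total = sum(s["lifters"][1] for s in strata.values())
no_cases = sum(s["non_lifters"][0] for s in strata.values())
no_total = sum(s["non_lifters"][1] for s in strata.values())
crude_rr = risk(lift_cases, lift_total) / risk(no_cases, no_total)
print(f"Crude relative risk: {crude_rr:.2f}")      # > 1: lifting looks harmful

# Stratum-specific relative risks: hold recreational activity constant.
for name, s in strata.items():
    rr = risk(*s["lifters"]) / risk(*s["non_lifters"])
    print(f"  RR within {name}: {rr:.2f}")         # 1.0: association disappears
```

In real data the stratum-specific estimates rarely collapse to exactly 1; more often they are merely smaller than the crude estimate, the multifactorial situation described next.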
Exposures that remain associated with the outcome of interest are then termed "risk factors." Because of biological diversity, it is infrequent that an outcome is associated with a single risk factor. Instead, there may be a combination of risk factors implicated in an outcome of interest; this is referred to as the multifactorial nature of causation. Thus, it is often observed that examination of a third (putatively confounding) variable does not eliminate the fundamental association of the exposure and outcome; rather, it reduces the magnitude of the primary association while leaving it intact. In this circumstance, the investigation could then identify a combination of risk factors that contribute to the outcome, each of which is important both alone and additively. Recognizing that the etiologic nature of many diseases, including musculoskeletal disorders, is likely to be multifactorial, scientists search for a "web of causation" (Susser, 1973).
The fourth consideration for establishing causality is temporal contiguity, which addresses the time interval between exposure and outcome. An association may be more compelling if the exposure immediately precedes the outcome (as is the case with an acute injury) than if the exposure occurred in the remote past. The assumption is that the more temporally remote an exposure is, the more likely it is that some other (possibly unmeasured) factor is the true explanation for the etiology of the outcome; the observed association could then be spurious. However, there may also be a "chain of causation" within a multifactorial model, whereby a series of different conditions must be met in order for the outcome to be observed. In this reasoning, there are factors that are proximal to (immediately preceding) the outcome and factors that are more distal from (further removed in time preceding) the outcome. In occupational epidemiologic studies, there is frequently a consideration of cumulative proximal and distal exposures, so the distinction made here is at times less relevant to generating causal inferences than it is in some other fields. Temporal contiguity is therefore a satisfactory condition only in analyses of proximal risk factors. Slavish reliance on a requirement of temporal contiguity can limit the development of a full understanding of the relationships (such as cumulative exposures) that lead to an outcome of interest.
Congruity of exposure and outcome involves the finding that if the exposure is increased, then the outcome is expressed more frequently. This could be expressed as a dose-response effect, whereby an increase in exposure should lead invariably to an increase in outcome (or an increase in stimulus leads to an increase in response). This assumes a continuous and linear relationship of the stimulus and response. However, there are circumstances in which the relationship may be expressed more as a threshold (no response is observed until the stimulus rises to a minimum level, after which there is an increase until some maximum threshold is achieved and response no longer increases or, with fatigue, actually decreases). When the dose-response relationship is not linear, investigators search for the circumstances that can account for these findings. This deviation from a linear relationship could be due to operational characteristics of a study (e.g., delays in implementation of job redesign). There may also be factors other than the exposure and outcome that modify the basic exposure outcome association; an interaction occurs when the joint effect of two exposures exceeds (or offsets) the independent effect of each variable alone.
Put another way, the scientist looks for interaction. This refers to the relationship of exposure and outcome in the presence of a third variable, whereby the primary association differs significantly across different levels of the third variable. While the association of the exposure and outcome may be impressive when viewed alone or when summarized across the levels of the third variable, closer examination reveals that the primary association is vastly larger at the first than at the second level of the third variable. This definition distinguishes confounding from interaction, in that confounding represents the effect of a third variable that is related to both primary variables (exposure and outcome) that accounts for the primary relationship. By contrast, interaction represents the effect of the third variable in synergizing (or offsetting) the exposure's effect on the outcome. A classic example is the relationship between alcohol consumption and esophageal cancer. While this relationship is strong, it is also true that cigarette smoking is associated with both alcohol use and cancer. Therefore, cigarette smoking could be a confounder in this association. However, when examined more closely, the association of alcohol
and esophageal cancer is many times greater among smokers than non-smokers, showing that there is a combination, or joint effect, of alcohol and cigarettes that is greater than the significant effects of each factor alone. Identifying interactions is important for targeting public health interventions, and the extent to which such interactions can be deciphered for musculoskeletal disorders in the workplace is discussed in later chapters.
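The distinction between confounding and interaction can be made concrete by comparing the exposure-outcome association across levels of the third variable. The risk figures below are invented for illustration and are not taken from the alcohol-cancer literature:

```python
# Hypothetical risks of disease per 10,000 person-years, illustrating
# interaction; all numbers are invented for illustration only.
risk = {
    ("no alcohol", "non-smoker"): 1.0,
    ("alcohol",    "non-smoker"): 2.0,   # alcohol alone doubles risk
    ("no alcohol", "smoker"):     3.0,   # smoking alone triples risk
    ("alcohol",    "smoker"):     18.0,  # joint risk exceeds what the
                                         # separate effects would predict
}

# Stratum-specific relative risks for alcohol, by smoking status.
rr_alcohol_in_nonsmokers = risk[("alcohol", "non-smoker")] / risk[("no alcohol", "non-smoker")]
rr_alcohol_in_smokers = risk[("alcohol", "smoker")] / risk[("no alcohol", "smoker")]

print(f"RR for alcohol among non-smokers: {rr_alcohol_in_nonsmokers:.1f}")  # 2.0
print(f"RR for alcohol among smokers:     {rr_alcohol_in_smokers:.1f}")     # 6.0
# The alcohol association differs markedly across levels of smoking:
# this heterogeneity of stratum-specific estimates is interaction.
```

Had smoking been a pure confounder, the two stratum-specific relative risks would have been similar to each other (and smaller than the crude estimate); their divergence is what marks a joint, synergistic effect.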
Thus far, the discussion has covered basic characteristics of an association between a putative exposure and outcome in a study in order to consider it a causal relationship. There are other important methodological issues as well. In generating causal inferences, there must be considerable attention to errors in measurement. Random error can occur through imprecise measurement that allows a broader array of responses than the true values warrant, whether through questionnaires or with an apparatus that captures information only within a range of the true values. An example is the measurement of blood pressure, which depends on the skill and experience of the reader as well as the setting of the sphygmomanometer and other circumstances. If important variables are not measured, this also contributes to random error. Random error results in attenuation of an observed association (when one existed in reality) and does not undermine confidence in a study that yields positive findings. In fact, random error can increase confidence in a positive study, since the strength of the effect was sufficient to permit its observation despite the attenuation toward the null created by random error. Efforts can be made to consider research questions comprehensively and to sharpen measurements for greater precision, but while random error can be reduced, it can never be eliminated. Despite residual random error, studies can still contribute meaningfully to the assessment of causality.
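The attenuating effect of random measurement error can be demonstrated with a small simulation. The exposure, outcome, and noise model below are all hypothetical; the point is only the direction of the bias:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# True exposure and an outcome that genuinely depends on it.
true_exposure = [random.gauss(0, 1) for _ in range(5000)]
outcome = [x + random.gauss(0, 1) for x in true_exposure]

# The same exposure measured with random error (an imprecise instrument).
noisy_exposure = [x + random.gauss(0, 1) for x in true_exposure]

print(f"Correlation, exact measurement: {correlation(true_exposure, outcome):.2f}")
print(f"Correlation, noisy measurement: {correlation(noisy_exposure, outcome):.2f}")
# Random error pulls the observed association toward zero; it does not
# manufacture an association that is absent in reality.
```

This is why a positive finding obtained despite imprecise measurement can, as noted above, actually strengthen confidence that the underlying association is real.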
Another general area of concern is systematic error, also called bias, which can arise in the sampling of people for a study, in the collection of the information used to generate associations, and in the analyses. An example involves case-control studies in which cases are selected from clinical practice and controls are selected from the general population. If the cases identified in a hospital do not represent all cases in the community, this can lead to bias (especially if the hospital cases are more severe). The bias can be exaggerated further if the controls are selected not from the community but from hospital services that systematically exclude musculoskeletal diseases or that happen to exclude the possibility of work (e.g., a chronic care psychiatric ward). The result would be an artificial association between work-related activities and musculoskeletal disorders. Another example of bias involves the type of information collected. In a case-control study, musculoskeletal cases, especially those identified in a hospital, might be
questioned more thoroughly about their exposures, and ruminate more about possible risk factors, than controls (free of disorders) questioned in a community setting. Clearly, it is important to design studies in which the same information is obtained in the same way from comparable individuals. Rather than review here the extensive array of potential biases, the reader is referred to a more detailed discussion elsewhere (Sackett, 1979).
CRITERIA FOR CAUSALITY ACROSS STUDIES
Thus far the discussion has considered characteristics of causal inference that can be examined within individual studies. These serve as a means of sorting through individual studies and identifying those that can contribute to causal inference. The emphasis in this discussion has been not on study design but rather on generic considerations that apply equally to basic science, observational epidemiologic, and intervention studies. There is, however, an approach for considering a body of literature in order to generate causal inferences across a variety of studies. The approach used in epidemiology has been attributed to Sir A.B. Hill (Lilienfeld and Lilienfeld, 1980). The Bradford Hill criteria for causality address strength, temporality, consistency, and specificity of association; dose-response association; and biological plausibility.
The strength of association refers to the magnitude of the measure of association; the larger the summary measure, the more confident one can be that the putative association is causal. Typical measures include the relative risk and the odds ratio. In our example, the relative risk would compare the rate of back disorders among persons engaged in repetitive lifting with the rate among those not engaged in this activity. The larger the ratio of incidence rates (especially when consistent across studies), the greater the confidence that the observed association is meaningful. There are no hard and fast rules for the minimum size of the association, although Lilienfeld and Lilienfeld (1980) have suggested that associations (e.g., relative risks) greater than 3 are probably less likely to be due to selection bias.
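For readers unfamiliar with these summary measures, the following sketch shows how a relative risk and an odds ratio are computed from a two-by-two table of exposure by outcome. All counts are hypothetical.

```python
# Hypothetical 2x2 table for an exposure (e.g., repetitive lifting)
# and an outcome (e.g., back disorder). Counts are invented.
#
#                 disorder   no disorder
#   exposed          a=40        b=60
#   unexposed        c=10        d=90

a, b, c, d = 40, 60, 10, 90

# Relative risk: ratio of outcome rates between exposure groups,
# (a/(a+b)) / (c/(c+d)), cross-multiplied to keep the arithmetic exact.
relative_risk = (a * (c + d)) / ((a + b) * c)   # 0.40 / 0.10 = 4.0

# Odds ratio: ratio of the odds of the outcome between exposure groups.
odds_ratio = (a * d) / (b * c)                  # 3600 / 600 = 6.0

print(relative_risk, odds_ratio)
```

Note that when the outcome is common, as in this invented table, the odds ratio exceeds the relative risk; when the outcome is rare, the two measures are close, which is why case-control studies report odds ratios as approximations of relative risk.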
Temporality of association refers to the need, in establishing causality, that the exposure must precede the outcome in time. This point has been addressed earlier.
Consistency of association refers to similarity of findings within subgroups of a study or similarity of findings in other populations studied at different times, even by different study designs (e.g., retrospective versus prospective studies). The greater the degree of consistency across subgroups or across studies, the more confidence the reviewer can have that the association under study is likely to be considered causal. Failure to find consistency of association across studies is not necessarily evidence
for the lack of association; rather, it is possible that factors associated with the outcome of interest could respond differently in the presence of other factors. This circumstance is referred to as “interaction” and should be considered before discarding associations as inconsistent.
Specificity of association refers to the concept that if a factor is associated with one outcome but not with others, a causal inference is more likely to be entertained. Because some exposures (e.g., tobacco) are in fact related to numerous disease outcomes, many epidemiologists have concluded that specificity of association, as a criterion for causality, should be applied with caution.
The dose-response relationship is a direct association between levels of the exposure and levels of the outcome, for example, when a reduction in exposure levels, through intervention, is accompanied by a reduction in outcome levels (e.g., rates of disease within subgroups). In epidemiologic studies in which exposure is difficult to measure at the individual level, individuals can be classified by ambient exposures; in other settings, special population groups characterized by extremes of the putative exposure (i.e., none versus considerable) might be used to help clarify the dose-response relationship.
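A dose-response pattern can be checked with a simple calculation across ordered exposure categories. In this hypothetical sketch, the disease rate rises monotonically with exposure level, which is the pattern this criterion looks for.

```python
# Hypothetical counts: (label, cases, people at risk) for ordered
# exposure levels, from lowest to highest dose.
exposure_groups = [
    ("none", 4, 400),
    ("moderate", 12, 400),
    ("high", 30, 400),
]

rates = [(label, cases / at_risk) for label, cases, at_risk in exposure_groups]

# Monotonically increasing rates across ordered doses are consistent
# with (though do not by themselves prove) a dose-response relationship.
values = [rate for _, rate in rates]
is_monotone = all(lo < hi for lo, hi in zip(values, values[1:]))
print(rates, is_monotone)
```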
Biological plausibility refers to the likelihood that an association is compatible with existing knowledge of biological mechanisms. This criterion is an explicit statement of the concordance between the basic science and the epidemiologic literature. However, the absence of an established mechanism does not necessarily negate an observed association; rather, if the other criteria of causality are satisfied, the association can be considered for hypothesis generation. Indeed, there are numerous examples in the history of epidemiology in which associations observed in the field led to important changes in public health practice and policy before the underlying biological mechanism was established. Snow's (1936) work on the London cholera epidemic and the work of Goldberger and colleagues (Goldberger, Waring, and Tanner, 1923) on pellagra are classic examples in which epidemiologists made important observations and mounted interventions before the relevant basic science was developed.
These criteria have been used in numerous reports over the past 40 years and have provided important guidance for drawing etiologic causal inferences from observational epidemiologic studies of human disease.
As noted above, the Bradford Hill criteria have been used primarily with observational epidemiologic studies. The principal designs used in such studies include surveillance, the cross-sectional survey, the case-control design, the prospective study, the randomized controlled trial, and the community trial. Surveillance involves the systematic collection of data on cases of disease (or on exposures of interest, as in the National Occupational Exposure Survey of the National Institute for Occupational Safety and Health). Data collection is usually passive, as when doctors complete forms to report diseases or conditions to the government or to insurance companies, but it can be active, with trained surveillance technicians conducting systematic surveys in selected settings using established protocols. In either case, the purpose of surveillance is to monitor a population for departures from the typical number of cases observed over time or across jurisdictions. Surveillance data can be analyzed for trends, and they have been used both to generate and to test hypotheses. However, the level of information obtained in surveillance projects is typically limited, as is the sampling scheme, in order to provide a cost-efficient means of monitoring a population and identifying periods when more focused studies are warranted.
The cross-sectional study is typically a single survey in which ascertainment of the exposures and the outcome of interest is conducted at the same time. The survey determines whether the outcome is present at the time of the survey and whether the exposure has been present at some point. Although the cross-sectional study is more efficient than the designs described below, its general limitation is that such surveys obtain information on the prevalence of conditions (and exposures), so the temporal association between exposure and outcome may be more difficult to document. Often, however, the temporal ordering in cross-sectional studies is sufficiently clear that an association can be inferred (e.g., in addressing such questions as whether workers with carpal tunnel syndrome have jobs requiring forceful and repetitive use of the hand more often than workers without carpal tunnel syndrome).
To address the difficulty of establishing temporal ordering, some cross-sectional studies can provide proper temporal information by taking a careful history of exposure and disease onset or by using historical data. However, depending on the accrual of the sample (and the departure of those disabled before the study begins), the association determined from a cross-sectional study might reflect only those who have survived up to that point. As noted earlier in this chapter, the effect of selective survival on the observed association between exposure and outcome is an important limitation of cross-sectional studies. Estimates of risk may be
erroneously low as a result of such sample distortion. The cohort study is not prone to this effect. Thus, the cross-sectional survey needs to be reviewed to determine whether the correlates identified represent suggestions for risk factors for the disease outcome, or represent correlates for survival in the population up to the point at which the study is done.
The prospective study is a longitudinal design that starts by measuring exposure and then follows individuals over time to identify the incidence of disease (or another outcome). The design can be concurrent (measuring exposure now and following individuals forward in time) or nonconcurrent (using records of exposures measured in the past and identifying the incidence of outcomes subsequent to those measurements). One of the greatest strengths of the prospective study is that it provides direct information on temporal ordering. Such studies can be used to examine the incidence of disease given exposure, the spectrum of disease, the incubation period (given discrete dates of exposure and of onset), prognostic indicators for disease given exposure, and survival. They also serve as the basis for nested case-control studies and for the evaluation of interventions in practice settings. Evaluation of interventions is limited, however, by the fact that individuals' exposure is based on some selection process that is usually nonrandom; differences attributed to the intervention could therefore also be due to factors related to selection. As noted earlier, although prospective studies are difficult to carry out, their value bears emphasis.
The randomized controlled trial is the design used to formally test an intervention. Essentially, it is a prospective study with the added feature of random allocation of the exposure of interest. The advantages of the randomized controlled trial and the prospective study are similar; both can demonstrate temporal ordering between exposure and outcome. The trial's key feature is random allocation of treatment, whereby the investigator controls assignment of the intervention to a portion of the participants. Because assignment is determined by chance rather than by the preferences of investigators or participants, bias in assignment is less likely. Similarly, because the assignment process is random, the groups assigned to treatment and control conditions are, in theory, similar on average. This similarity does not always occur in fact, because random assignment can, by chance, produce unequal groups. If the groups are unequal, the investigators make statistical adjustments. If the groups are equivalent (on characteristics the investigators were able to measure), the investigators can extrapolate that the groups are likely to be comparable on unmeasured factors as well. This assumption may or may not be true. Nevertheless, this equivalence of groups is the unique feature that can make randomized controlled trials so powerful in generating inferences. As noted
earlier, the strengths of the randomized controlled trial must be weighed against a number of considerations (feasibility, practicality, ethics), so this design involves trade-offs that frequently limit its use.
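The balancing property of random allocation can be illustrated with a small simulation. In this hypothetical sketch, participants carrying a binary background characteristic are repeatedly split at random into two arms; on average the arms carry similar proportions of the characteristic, although any single randomization can be imbalanced by chance.

```python
import random

random.seed(7)

def randomize(n_participants, covariate_prevalence):
    """Randomly allocate participants to two equal arms; return the
    proportion carrying a binary covariate in each arm."""
    participants = [1 if random.random() < covariate_prevalence else 0
                    for _ in range(n_participants)]
    random.shuffle(participants)
    half = n_participants // 2
    treatment, control = participants[:half], participants[half:]
    return sum(treatment) / half, sum(control) / half

# Repeat the randomization many times and look at the typical imbalance
# between arms in the covariate's prevalence.
diffs = [abs(t - c) for t, c in (randomize(100, 0.5) for _ in range(1000))]
mean_imbalance = sum(diffs) / len(diffs)
worst_imbalance = max(diffs)

# On average the arms are close; occasionally chance produces a large gap,
# which is why investigators still check and adjust for baseline differences.
print(round(mean_imbalance, 3), round(worst_imbalance, 3))
```

The worst-case runs in the simulation correspond to the trials, mentioned above, in which investigators must fall back on statistical adjustment.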
PATTERN OF EVIDENCE COMPARISONS
Results across the variety of datasets are also summarized in this report using the “pattern of evidence” approach. This is an alternative to the traditional “preponderance of evidence” approach, which by strict definition is reserved for summarizing a larger body of studies with more uniform research designs. Rather than summarizing studies to determine whether the direction and magnitude of an association are similar across them, the pattern of evidence approach examines the extent to which results from one class of studies compensate for the limitations of another class. The goal is to establish a pattern of evidence that can be discerned from multiple data sources based on different sampling frames and methods.
This approach considers interrelated conditions, such as intermediate outcomes (e.g., whether there is a reduction in back disorders after the implementation of job redesign), and can be used to decide whether other pieces of evidence are available to rule in its plausibility (Cordray, 1986). Such evidence could include: epidemiologic evidence showing an association between repetitive lifting and back disorders across different occupational groups, independent of organizational, psychological, and recreational factors; biomechanical literature on load location, load moment, spinal load, three-dimensional trunk position, frequency, and kinematics that points to a well-defined pathway from exposure to loading of spinal structures; and basic biological studies showing that greater spinal loading can explain the deterioration of spinal tissue and can cause damage. The more supportive the pattern of evidence, the more plausible the perceived effect. The strength of this method is that, taken alone, self-reports of work practices in epidemiologic studies might be important but could be questioned as subject to socially desirable responding; data from other sources help to strengthen the inference that the behavior actually occurred. Similarly, a reduction in back disorders after implementing a job redesign could be due to any number of factors, but data showing a decrease in specified biomechanical actions at the level of the individual bolster confidence that the observed results are due to the intervention. Of course, there can be inconsistencies, such as no change in the frequency of back disorders in a workplace where a program was implemented. Further investigation might show, however, that the program, while well intended, had elements that did not target behaviors properly. The pattern of evidence, drawing on different studies with different forms of measurement, may then suggest that while the program was utilized, the intended effect was not observed. Ideally, such results can be obtained early enough to focus data collection on the factors that might explain the finding.
In other words, if certain conditions are met, it is possible to probe the plausibility that the intervention was responsible, at least in part, for the observed outcomes. As empirical evidence accumulates and assessments are repeated, the plausibility increases. Through multiple assessments involving a logical network of evidence, it may be possible to derive a portrait of plausibility. Although the pool of studies varies in quality, the pattern of evidence approach requires that at least some higher-quality studies (i.e., prospectively collected data) be available within the total pool in order to assess whether the evidence from lower-quality studies is meaningful. This approach is particularly appropriate as an inferential strategy when the number of studies of any single type (e.g., behavioral surveys, incidence data, surveillance data) is limited, precluding a preponderance of evidence approach. The pattern of evidence approach is not novel; it was described and used in a recent report investigating the role of sterile syringes and bleach disinfection in HIV prevention (National Research Council, 1995).
This report reviews the literature on musculoskeletal disorders in the workplace, drawing on studies from basic sciences, epidemiology, and intervention research. Each of these types of research is reviewed separately in different chapters. However, there is also cross-referencing between chapters, and an integration chapter, to assemble inferences that are based on patterns of evidence about the associations that lead to the conclusions and recommendations in this report.