4
Standards for Synthesizing the Body of Evidence
Abstract: This chapter addresses the qualitative and quantitative
synthesis (meta-analysis) of the body of evidence. The committee
recommends four related standards. The systematic review (SR)
should use prespecified methods; include a qualitative synthesis
based on essential characteristics of study quality (risk of bias,
consistency, precision, directness, reporting bias, and for observa-
tional studies, dose–response association, plausible confounding
that would change an observed effect, and strength of association);
and make an explicit judgment of whether a meta-analysis is
appropriate. If conducting meta-analyses, expert methodologists
should develop, execute, and peer review the meta-analyses. The
meta-analyses should address heterogeneity among study effects,
accompany all estimates with measures of statistical uncertainty,
and assess the sensitivity of conclusions to changes in the protocol,
assumptions, and study selection (sensitivity analysis). An SR
that uses rigorous and transparent methods will enable patients,
clinicians, and other decision makers to discern what is known
and not known about an intervention’s effectiveness and how
the evidence applies to particular population groups and clinical
situations.
More than a century ago, Nobel Prize-winning physicist J. W.
Strutt, Lord Rayleigh, observed that “the work which deserves . . .
156 FINDING WHAT WORKS IN HEALTH CARE
the most credit is that in which discovery and explanation go hand
in hand, in which not only are new facts presented, but their rela-
tion to old ones is pointed out” (Rayleigh, 1884). In other words, the
contribution of any singular piece of research draws not only from
its own unique discoveries, but also from its relationship to previ-
ous research (Glasziou et al., 2004; Mulrow and Lohr, 2001). Thus,
the synthesis and assessment of a body of evidence is at the heart
of a systematic review (SR) of comparative effectiveness research
(CER).
The previous chapter described the considerable challenges
involved in assembling all the individual studies that comprise cur-
rent knowledge on the effectiveness of a healthcare intervention: the
“body of evidence.” This chapter begins with the assumption that the
body of evidence was identified in an optimal manner and that the
risk of bias in each individual study was assessed appropriately—
both according to the committee’s standards. This chapter addresses
the synthesis and assessment of the collected evidence, focusing on
those aspects that are most salient to setting standards. The science
of SR is rapidly evolving; much has yet to be learned. The purpose
of standards for evidence synthesis and assessment—as in other
SR methods—is to set performance expectations and to promote
accountability for meeting those expectations without stifling
innovation in methods. Thus, the emphasis is not on specifying preferred
technical methods, but rather the building blocks that help ensure
objectivity, transparency, and scientific rigor.
As it did elsewhere in this report, the committee developed this
chapter’s standards and elements of performance based on avail-
able evidence and expert guidance from the Agency for Healthcare
Research and Quality (AHRQ) Effective Health Care Program, the
Centre for Reviews and Dissemination (CRD, part of University of
York, UK), and the Cochrane Collaboration (Chou et al., 2010; CRD,
2009; Deeks et al., 2008; Fu et al., 2010; Lefebvre et al., 2008; Owens et
al., 2010). Guidance on assessing quality of evidence from the
Grading of Recommendations Assessment, Development, and Evaluation
(GRADE) Working Group was another key source of information
(Guyatt et al., 2010; Schünemann et al., 2009). See Appendix F for a
detailed summary of AHRQ, CRD, and Cochrane guidance for the
assessment and synthesis of a body of evidence.
The committee had several opportunities for learning the per-
spectives of stakeholders on issues related to this chapter. SR experts
and representatives from medical specialty associations, payers, and
consumer groups provided both written responses to the committee’s
questions and oral testimony in a public workshop (see Appendix C).
In addition, staff conducted informal, structured interviews
with other key stakeholders.
The committee recommends four standards for the assessment
and qualitative and quantitative synthesis of an SR’s body of evi-
dence. Each standard consists of two parts: first, a brief statement
describing the related SR step and, second, one or more elements of
performance that are fundamental to carrying out the step. Box 4-1
lists all of the chapter’s recommended standards. This chapter
provides the background and rationale for the recommended standards
and elements of performance, first outlining the key considerations
in assessing a body of evidence and then discussing the fundamental
components of qualitative and quantitative synthesis. The
order of the chapter’s standards and the presentation of the discus-
sion do not necessarily indicate the sequence in which the various
steps should be conducted. Although an SR synthesis should always
include a qualitative component, the feasibility of a quantitative
synthesis (meta-analysis) depends on the available data. If a meta-
analysis is conducted, its interpretation should be included in the
qualitative synthesis. Moreover, the overall assessment of the body
of evidence cannot be done until the syntheses are complete.
In the context of CER, SRs are produced to help consumers,
clinicians, developers of clinical practice guidelines, purchasers, and
policy makers to make informed healthcare decisions (Federal Coor-
dinating Council for Comparative Effectiveness Research, 2009; IOM,
2009). Thus, the assessment and synthesis of a body of evidence in
the SR should be approached with the decision makers in mind. An
SR using rigorous and transparent methods allows decision makers
to discern what is known and not known about an intervention’s
effectiveness and how the evidence applies to particular population
groups and clinical situations (Helfand, 2005). Making evidence-
based decisions—such as when a guideline developer recommends
what should and should not be done in specific clinical circum-
stances—is a distinct and separate process from the SR and is outside
the scope of this report. It is the focus of a companion IOM study on
developing standards for trustworthy clinical practice guidelines.1
1 The IOM report, Clinical Practice Guidelines We Can Trust, is available at the
National Academies Press website: http://www.nap.edu/.
BOX 4-1
Recommended Standards for Synthesizing
the Body of Evidence
Standard 4.1 Use a prespecified method to evaluate the body of
evidence
Required elements:
4.1.1 For each outcome, systematically assess the following
characteristics of the body of evidence:
• Risk of bias
• Consistency
• Precision
• Directness
• Reporting bias
4.1.2 For bodies of evidence that include observational research,
also systematically assess the following characteristics for
each outcome:
• Dose–response association
• Plausible confounding that would change the observed
effect
• Strength of association
4.1.3 For each outcome specified in the protocol, use consistent
language to characterize the level of confidence in the
estimates of the effect of an intervention
Standard 4.2 Conduct a qualitative synthesis
Required elements:
4.2.1 Describe the clinical and methodological characteristics of
the included studies, including their size, inclusion or
exclusion of important subgroups, timeliness, and other relevant
factors
4.2.2 Describe the strengths and limitations of individual studies
and patterns across studies
4.2.3 Describe, in plain terms, how flaws in the design or
execution of the study (or groups of studies) could bias the results,
explaining the reasoning behind these judgments
4.2.4 Describe the relationships between the characteristics of the
individual studies and their reported findings and patterns
across studies
4.2.5 Discuss the relevance of individual studies to the
populations, comparisons, cointerventions, settings, and outcomes
or measures of interest
Standard 4.3 Decide if, in addition to a qualitative analysis, the
systematic review will include a quantitative analysis (meta-analysis)
Required element:
4.3.1 Explain why a pooled estimate might be useful to decision
makers
Standard 4.4 If conducting a meta-analysis, then do the following:
Required elements:
4.4.1 Use expert methodologists to develop, execute, and peer
review the meta-analyses
4.4.2 Address the heterogeneity among study effects
4.4.3 Accompany all estimates with measures of statistical
uncertainty
4.4.4 Assess the sensitivity of conclusions to changes in the
protocol, assumptions, and study selection (sensitivity analysis)
NOTE: The order of the standards does not indicate the sequence in which they are
carried out.
A NOTE ON TERMINOLOGY
The SR field lacks an agreed-on lexicon for some of its most
fundamental terms and concepts, including what actually constitutes
the quality of a body of evidence. This leads to considerable
confusion. Because this report focuses on SRs for the purposes of CER and
clinical decision making, the committee uses the term “quality of the
body of evidence” to describe the extent to which one can be
confident that the estimate of an intervention’s effectiveness is correct.
This terminology is designed to support clinical decision making
and is similar to that used by GRADE and adopted by the Cochrane
Collaboration and other organizations for the same purpose (Guyatt
et al., 2010; Schünemann et al., 2008, 2009).
Quality encompasses summary assessments of a number of
characteristics of a body of evidence, such as within-study bias
(methodological quality), consistency, precision, directness or
applicability of the evidence, and others (Schünemann et al., 2009).
Synthesis is the collation, combination, and summary of the findings of
a body of evidence (CRD, 2009). In an SR, the synthesis of the body
of evidence should always include a qualitative component and, if
the data permit, a quantitative synthesis (meta-analysis).
The following section presents the background and rationale for
the committee’s recommended standard and performance elements
for prespecifying the assessment methods.
A Need for Clarity and Consistency
Neither empirical evidence nor agreement among experts is
available to support the committee’s endorsement of a specific
approach for assessing and describing the quality of a body of
evidence. Medical specialty societies, U.S. and other national
government agencies, private research groups, and others have created a
multitude of systems for assessing and characterizing the quality
of a body of evidence (AAN, 2004; ACCF/AHA, 2009; ACCP, 2009;
CEBM, 2009; Chalmers et al., 1990; Ebell et al., 2004; Faraday et
al., 2009; Guirguis-Blake et al., 2007; Guyatt et al., 2004; ICSI, 2003;
NCCN, 2008; NZGG, 2007; Owens et al., 2010; Schünemann et al.,
2009; SIGN, 2009; USPSTF, 2008). The various systems share common
features, but employ conflicting evidence hierarchies; emphasize
different factors in assessing the quality of research; and use a con-
fusing array of letters, codes, and symbols to convey investigators’
conclusions about the overall quality of a body of evidence (Atkins
et al., 2004a, 2004b; Schünemann et al., 2003; West et al., 2002). The
reader cannot make sense of the differences (Table 4-1). Through
public testimony and interviews, the committee heard that numer-
ous producers and users of SRs were frustrated by the number,
variation, complexity, and lack of transparency in existing systems.
One comprehensive review documented 40 different systems
for grading the strength of a body of evidence (West et al., 2002).
Another review, conducted several years later, found that more than
50 evidence-grading systems and 230 quality assessment instru-
ments were in use (COMPUS, 2005).
Early systems for evaluating the quality of a body of evidence
used simple hierarchies of study design to judge the internal valid-
ity (risk of bias) of a body of evidence (Guyatt et al., 1995). For
example, a body of evidence that included two or more randomized
controlled trials (RCTs) was assumed to be “high-quality,” “level
1,” or “grade A” evidence whether or not the trials met scientific
standards. Quasi-experimental research, observational studies, case
series, and other qualitative research designs were automatically
considered lower quality evidence. As research documented the
variable quality of trials and widespread reporting bias in the pub-
lication of trial findings, it became clear that such hierarchies are too
simplistic because they do not assess the extent to which the design
and implementation of RCTs (or other study designs) avoid biases
that may reduce confidence in the measures of effectiveness (Atkins
et al., 2004b; Coleman et al., 2009; Harris et al., 2001).
The early hierarchies produced conflicting conclusions about
effectiveness. A study by Ferreira and colleagues analyzed the effect
of applying different “levels of evidence” systems to the conclusions
of six Cochrane SRs of interventions for low back pain (Ferreira
et al., 2002). They found that the conclusions of the reviews were
highly dependent on the system used to evaluate the evidence,
primarily because of differences in the number and quality of trials
required for a particular level of evidence. In many cases, the
differences in the conclusions were so substantial that they could lead
to contradictory clinical advice. For example, for one intervention,
“back school,”2 the conclusions ranged from “strong evidence that
back schools are effective” to “no evidence” on the effectiveness of
back schools.
TABLE 4-1 Examples of Approaches to Assessing the Body of Evidence for Therapeutic Interventions*
System: System for Assessing the Body of Evidence
Agency for Healthcare Research and Quality
  High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  Insufficient: Evidence either is unavailable or does not permit a conclusion.
American College of Chest Physicians
  High: Randomized controlled trials (RCTs) without important limitations or overwhelming evidence from observational studies.
  Moderate: RCTs with important limitations (inconsistent results, methodological flaws, indirect, or imprecise) or exceptionally strong evidence from observational studies.
  Low: Observational studies or case series.
American Heart Association/American College of Cardiology
  A: Multiple RCTs or meta-analyses.
  B: Single RCT, or nonrandomized studies.
  C: Consensus opinion of experts, case studies, or standard of care.
Grading of Recommendations Assessment, Development and Evaluation (GRADE)
  Starting points for evaluating quality level:
  • RCTs start high.
  • Observational studies start low.
  Factors that may decrease or increase the quality level of a body of evidence:
  • Decrease: Study limitations, inconsistency of results, indirectness of evidence, imprecision of results, and high risk of publication bias.
  • Increase: Large magnitude of effect, dose–response gradient, all plausible biases would reduce the observed effect.
  High: Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
  Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
  Very low: Any estimate of effect is very uncertain.
National Comprehensive Cancer Network
  High: High-powered RCTs or meta-analysis.
  Lower: Ranges from Phase II Trials to large cohort studies to case series to individual practitioner experience.
Oxford Centre for Evidence-Based Medicine
  Varies with type of question. Level may be graded down on the basis of study quality, imprecision, indirectness, inconsistency between studies, or because the absolute effect size is very small. Level may be graded up if there is a large or very large effect size.
  Level 1: Systematic review (SR) of randomized trials or n-of-1 trial. For rare harms: SR of case-control studies, or studies revealing dramatic effects.
  Level 2: SR of nested case-control or dramatic effect. For rare harms: Randomized trial or (exceptionally) observational study with dramatic effect.
  Level 3: Nonrandomized controlled cohort/follow-up study.
  Level 4: Case-control studies, historically controlled studies.
  Level 5: Opinion without explicit critical appraisal, based on limited/undocumented experience, or based on mechanisms.
Scottish Intercollegiate Guidelines Network
  1++: High-quality meta-analyses, SRs of RCTs, or RCTs with a very low risk of bias.
  1+: Well-conducted meta-analyses, SRs, or RCTs with a low risk of bias.
  1–: Meta-analyses, SRs, or RCTs with a high risk of bias.
  2++: High-quality SRs of case control or cohort studies. High-quality case control or cohort studies with a very low risk of confounding or bias and a high probability that the relationship is causal.
  2–: Case control or cohort studies with a high risk of confounding or bias and a significant risk that the relationship is not causal.
  3: Nonanalytic studies, e.g., case reports, case series.
  4: Expert opinion.
* Some systems use different grading schemes depending on the type of intervention (e.g., preventive service, diagnostic tests, and therapies). This table includes systems for therapeutic interventions.
SOURCES: ACCF/AHA (2009); ACCP (2009); CEBM (2009); NCCN (2008); Owens et al. (2010); Schünemann et al. (2009); SIGN (2009).
One reason for these discrepancies was failure to distinguish
between the quality of the evidence and the magnitude of net
benefit. For example, an SR and meta-analysis might highlight a
dramatic effect size regardless of the risk of bias in the body of
evidence. Conversely, use of a rigid hierarchy gave the impression
that any effect based on randomized trial evidence was clinically
important, regardless of the size of the effect. In 2001, the U.S.
Preventive Services Task Force broke new ground when it updated
its review methods, separating its assessment of the quality of
evidence from its assessment of the magnitude of effect (Harris et
al., 2001).
What Are the Characteristics of Quality
for a Body of Evidence?
Experts in SR methodology agree on the conceptual underpin-
nings for the systematic assessment of a body of evidence. The
committee identified eight basic characteristics of quality, described
below, that are integral to assessing and characterizing the quality of
a body of evidence. These characteristics—risk of bias, consistency,
precision, directness, and reporting bias, and for observational
studies, dose–response association, plausible confounding that would
change an observed effect, and strength of association—are used
by GRADE; the Cochrane Collaboration, which has adopted the
GRADE approach; and the AHRQ Effective Health Care Program,
which adopted a modified version of the GRADE approach (Owens
et al., 2010; Balshem et al., 2011; Falck-Ytter et al., 2010; Schünemann
et al., 2008). Although their terminology varies somewhat, Falck-
Ytter and his GRADE colleagues describe any differences between
the GRADE and AHRQ quality characteristics as essentially
semantic (Falck-Ytter et al., 2010). Owens and his AHRQ colleagues appear
to agree (Owens et al., 2010). As Boxes 4-2 and 4-3 indicate, the two
approaches are quite similar.3
2 Back schools are educational programs designed to teach patients how to manage
chronic low back pain to prevent future episodes. The curriculums typically include
the natural history, anatomy, and physiology of back pain as well as a home exercise
program (Hsieh et al., 2002).
BOX 4-2
Key Concepts Used in the GRADE Approach to
Assessing the Quality of a Body of Evidence
The Grading of Recommendations Assessment, Development, and
Evaluation (GRADE) Working Group uses a point system to upgrade or
downgrade the ratings for each quality characteristic. A grade of high,
moderate, low, or very low is assigned to the body of evidence for each
outcome. Eight characteristics of the quality of evidence are assessed for
each outcome.
Five characteristics can lower the quality rating for the body of evidence:
• Limitations in study design and conduct
• Inconsistent results across studies
• Indirectness of evidence with respect to the study design,
populations, interventions, comparisons, or outcomes
• Imprecision of the estimates of effect
• Publication bias
Three factors can increase the quality rating for the body of evidence
because they raise confidence in the certainty of estimates (particularly for
observational studies):
• Large magnitude of effect
• Plausible confounding that would reduce the demonstrated effect
• Dose–response gradient
SOURCES: Atkins et al. (2004a); Balshem et al. (2011); Falck-Ytter et al. (2010);
Schünemann et al. (2009).
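The upgrade and downgrade logic summarized in Box 4-2 can be illustrated with a minimal sketch. It assumes, as Table 4-1 states, that RCT evidence starts at high and observational evidence starts at low. It also assumes, as a simplification not stated in the source, that each factor moves the rating by exactly one level. The function name grade_rating and the factor labels are hypothetical, and actual GRADE ratings rest on reviewer judgment rather than mechanical counting.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Factor labels paraphrase Box 4-2; the one-level-per-factor rule is an assumption.
DOWNGRADE_FACTORS = {
    "study limitations", "inconsistency", "indirectness",
    "imprecision", "publication bias",
}
UPGRADE_FACTORS = {
    "large effect", "plausible confounding", "dose-response gradient",
}


def grade_rating(design, concerns=(), strengths=()):
    """Return an illustrative quality rating for one outcome's body of evidence."""
    unknown = (set(concerns) - DOWNGRADE_FACTORS) | (set(strengths) - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unrecognized factors: {sorted(unknown)}")
    # RCTs start high; every other design is treated as observational (starts low).
    start = LEVELS.index("high") if design == "rct" else LEVELS.index("low")
    score = start - len(set(concerns)) + len(set(strengths))
    # Clamp to the four available levels.
    return LEVELS[max(0, min(score, len(LEVELS) - 1))]


print(grade_rating("rct", concerns=["imprecision"]))
print(grade_rating("observational", strengths=["large effect"]))
```

Keeping the factors as named labels rather than bare counts makes the reasoning behind each rating visible, which is in the spirit of the transparency the standards call for.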
Risk of Bias
In the context of a body of evidence, risk of bias refers to the
extent to which flaws in the design and execution of a collection of
studies could bias the estimate of effect for each outcome under study.
3 For detailed descriptions of the AHRQ and GRADE methods, see the GRADE
Handbook for Grading Quality of Evidence and Strength of Recommendations (Schünemann
et al., 2009) and “Grading the Strength of a Body of Evidence When Comparing Medi-
cal Interventions—AHRQ and the Effective Health Care Program” (Owens et al., 2010).
Statistical Uncertainty
In meta-analyses, the amount of within- and between-study
variation determines how precisely study and aggregate treatment
effects are estimated. Estimates of effects without accompanying
measures of their uncertainty, such as confidence intervals, cannot
be correctly interpreted. A forest plot can provide a succinct
representation of the size and precision of individual study effects
and aggregated effects. When effects are heterogeneous, more than
one summary effect may be necessary to fully describe the data.
Measures of uncertainty should also be presented for estimates of
heterogeneity and for statistics that quantify relationships between
treatment effects and sources of heterogeneity.
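As a minimal sketch of what accompanying an estimate with a measure of uncertainty involves, the following computes an inverse-variance (fixed-effect) pooled estimate and its 95 percent confidence interval. The chapter does not prescribe this or any other model; the function name pooled_fixed_effect and the input values are hypothetical and chosen only for illustration.

```python
import math


def pooled_fixed_effect(effects, std_errors, z=1.96):
    """Inverse-variance pooled estimate with a normal-approximation interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return estimate, (estimate - z * se_pooled, estimate + z * se_pooled)


# Hypothetical log odds ratios and standard errors from three studies.
log_or = [-0.40, -0.10, -0.25]
se = [0.20, 0.10, 0.20]

estimate, (lower, upper) = pooled_fixed_effect(log_or, se)
print(f"pooled log OR = {estimate:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

The same pairs of study estimates and intervals, plus the pooled row, are the quantities a forest plot displays.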
Between-study heterogeneity is common in meta-analysis
because studies differ in their protocols, target populations, settings,
and ages of included subjects. This type of heterogeneity provides
evidence about potential variability in treatment effects. Therefore,
heterogeneity is not a nuisance or an undesirable feature, but rather
an important source of information to be carefully analyzed (Lau et
al., 1998). Instead of eliminating heterogeneity by restricting study
inclusion criteria or scope, which can limit the utility of the review,
heterogeneity of effect sizes can be quantified, and related to aspects
of study populations or design features through statistical techniques
such as meta-regression, which associates the size of treatment effects
with effect modifiers. Meta-regression is most useful in explaining
variation that arises from factors that are fixed within each study
but differ among studies (e.g., use of randomization or dose
employed). Except in rare cases, meta-regression analyses are exploratory,
motivated by the need to explain heterogeneity, and not by
prespecification in the protocol. Meta-regression is observational in
nature, and if the results of meta-regression are to be considered valid,
they should be clinically plausible and supported by other external
evidence. Because the number of studies in a meta-regression is often
small, the technique has low power. The technique is subject to spu-
rious findings because many potential covariates may be available,
and adjustments to levels of significance may be necessary (Higgins
and Thompson, 2004). Users should also be careful of relationships
driven by anomalies in one or two studies. Such influential data do
not provide solid evidence of strong relationships.
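The two steps described above, quantifying heterogeneity and relating it to a study-level feature, can be sketched as follows. The sketch uses Cochran's Q and the I-squared statistic for quantification and a weighted least-squares regression on a single covariate with fixed-effect weights for the meta-regression. These particular statistics are assumptions made for illustration, since the chapter names no specific ones, and the function names and data are hypothetical.

```python
import math


def heterogeneity(effects, std_errors):
    """Cochran's Q and I^2 (percent) under inverse-variance weights."""
    w = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def meta_regression(effects, std_errors, covariate):
    """Weighted least-squares fit of effect size on one study-level covariate."""
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # valid when study variances are treated as known
    return intercept, slope, se_slope


# Hypothetical data: four studies whose effects rise with the dose employed.
effects = [0.10, 0.30, 0.50, 0.70]
std_errors = [0.10, 0.10, 0.10, 0.10]
dose = [1.0, 2.0, 3.0, 4.0]

q, i2 = heterogeneity(effects, std_errors)
intercept, slope, se_slope = meta_regression(effects, std_errors, dose)
print(f"Q = {q:.1f}, I^2 = {i2:.0f}%")
print(f"slope per unit dose = {slope:.2f} (SE {se_slope:.3f})")
```

With only four studies, a fit like this illustrates the cautions in the text: power is low, and one or two unusual studies could produce the apparent relationship.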
Research Trends in Meta-Analysis
As mentioned previously, a detailed discussion of meta-analysis
methodology is beyond the scope of this report. There are many
unresolved questions regarding meta-analysis methods. Fortunately,
meta-analysis methodological research is vibrant and ongoing.
Box 4-4 describes some of the research trends in meta-analysis and
provides relevant references for the interested reader.
Sensitivity of Conclusions
Meta-analysis entails combining information from different
studies; thus, the data may come from very different study designs.
A small number of studies in conjunction with a variety of study
designs contribute to heterogeneity in results. Consequently,
verifying that conclusions are robust to small changes in the data and to
changes in modeling assumptions solidifies the belief that they are
robust to new information that could appear. Without a sensitivity
analysis, the credibility of the meta-analysis is reduced.
Results are considered robust if small changes in the meta-
analytic protocol, in modeling assumptions, and in study selection
do not affect the conclusions. Robust estimates increase confidence
in the SR’s findings. Sensitivity analyses subject conclusions to such
tests by perturbing these characteristics in various ways.
The sensitivity analysis could, for example, assess whether the
results change when the meta-analysis is rerun leaving one study
out at a time. One statistical test for stability is to check that the pre-
dictive distribution of a new study from a meta-analysis with one of
the studies omitted would include the results of the omitted study
(Deeks et al., 2008). Failure to meet this criterion implies that the
result of the omitted study is unexpected given the remaining stud-
ies. Another common criterion is to determine whether the estimated
average treatment effect changes substantially upon omission of one
of the studies. A common definition of substantial involves change
in the determination of statistical significance of the summary effect,
although this definition is problematic because a significance thresh-
old may be crossed with an unimportant change in the magnitude or
precision of the effect (i.e., loss of statistical significance may result
from omission of a large study that reduces the precision, but not
the magnitude, of the effect).
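A leave-one-out check of the kind described above can be sketched as follows. The sketch assumes a simple inverse-variance (fixed-effect) pooled estimate; the function names and the three hypothetical studies are illustrative only.

```python
import math


def pool(effects, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    w = [1.0 / se ** 2 for se in std_errors]
    estimate = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    return estimate, math.sqrt(1.0 / sum(w))


def leave_one_out(effects, std_errors, z=1.96):
    """Re-pool the data with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_errors = std_errors[:i] + std_errors[i + 1:]
        est, se = pool(rest_effects, rest_errors)
        results.append((i, est, est - z * se, est + z * se))
    return results


# Hypothetical log odds ratios and standard errors.
effects = [-0.40, -0.10, -0.25]
std_errors = [0.20, 0.10, 0.20]

full_est, full_se = pool(effects, std_errors)
print(f"all studies: {full_est:.3f}")
for omitted, est, lower, upper in leave_one_out(effects, std_errors):
    print(f"omit study {omitted}: {est:.3f} ({lower:.3f}, {upper:.3f})")
```

With these made-up numbers, omitting the first study moves the pooled estimate only from about -0.18 to -0.13, yet the interval then crosses zero. This is the situation the text warns about when "substantial" is defined by statistical significance alone.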
In addition to checking sensitivity to inclusion of single stud-
ies, it is important to evaluate the effect of changes in the protocol
that may alter the composition of the studies in the meta-analysis.
Changes to the inclusion and exclusion criteria—such as the
inclusion of non-English literature or the exclusion of studies that enroll
some participants not in the target population or the focus on stud-
ies with low risk of bias—may all modify results sufficiently to ques-
tion robustness of inferences.
BOX 4-4
Research Trends in Meta-Analysis
Meta-analytic research is a dynamic and rapidly changing field. The
following describes key areas of research with recommended citations for
additional reading:
Prospective meta-analysis—In this approach, studies are identi-
fied and evaluated prior to the results of any individual studies being
known. Prospective meta-analysis (PMA) allows selection criteria and
hypotheses to be defined a priori, before the trials are concluded. PMA
can implement standardization across studies so that heterogene-
ity is decreased. In addition, small studies that lack statistical power
individually can be conducted if large studies are not feasible. See
for example: Berlin and Ghersi, 2004, 2005; Ghersi et al., 2008; The
Cochrane Collaboration, 2010.
Meta-regression—In this method, potential sources of heterogeneity
are represented as predictors in a regression model, thereby enabling
estimation of their relationship with treatment effects. Such analyses
are exploratory in the majority of cases, motivated by the need to ex-
plain heterogeneity. See for example: Schmid et al., 2004; Smith et al.,
1997; Sterne et al., 2002; Thompson and Higgins, 2002.
Bayesian methods in meta-analysis—In these approaches, as in
Bayesian approaches in other settings, both the data and parameters
in the meta-analytic model are considered random variables. This ap-
proach allows the incorporation of prior information into subsequent
analyses, and may be more flexible in complex situations than stan-
dard methodologies. See for example: Berry et al., 2010; O’Rourke and
Altman, 2005; Schmid, 2001; Smith et al., 1995; Sutton and Abrams,
2001; Warn et al., 2002.
Meta-analysis of multiple treatments—In this setting, direct treat-
ment comparisons are not available, but an indirect comparison
through a common comparator is. Multiple treatment models, also
called mixed comparison models or network meta-analysis, may be
used to more efficiently model treatment comparisons of interest. See
for example: Cooper et al., 2009; Dias et al., 2010; Salanti et al., 2009.
Individual participant data meta-analysis—In some cases, study
data may include outcomes, treatments, and characteristics of individual
participants. Meta-analysis with such individual participant data
(IPD) offers many advantages over meta-analysis of aggregate study-
level data. See for example: Berlin et al., 2002; Simmonds et al., 2005;
Smith et al., 1997; Sterne et al., 2002; Stewart, 1995; Thompson and
Higgins, 2002; Tierney et al., 2000.
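The indirect comparison through a common comparator described in Box 4-4 can be sketched in its simplest form as follows. This is a minimal illustration, not the mixed comparison or network models cited in the box. It assumes two independent sets of trials (A versus C and B versus C), each summarized on an additive scale such as the log odds ratio, and it uses the adjusted indirect comparison commonly attributed to Bucher and colleagues, a technique this chapter does not describe. The function name and all numbers are hypothetical.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Adjusted indirect comparison of A vs. B through common comparator C.

    d_ac, d_bc: pooled effects of A vs. C and B vs. C on an additive scale
                (e.g., log odds ratios); assumed to come from independent trials.
    se_ac, se_bc: their standard errors.
    Returns the indirect A vs. B estimate, its standard error, and a CI.
    """
    d_ab = d_ac - d_bc                          # effects subtract
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)  # variances add
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)


# Hypothetical inputs for illustration only.
d_ab, se_ab, ci = indirect_comparison(d_ac=-0.50, se_ac=0.20,
                                      d_bc=-0.20, se_bc=0.25)
print(f"Indirect log OR, A vs. B: {d_ab:.2f} "
      f"(95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Because the two variances add, the indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason such comparisons call for cautious interpretation.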
Another good practice is to evaluate sensitivity to choices about
outcome metrics and statistical models. While one metric and one
model may in the end be chosen as best for scientific reasons, results
that are highly model dependent require more trust in the modeler and
may be more prone to being overturned with new data. In any case,
support for the metrics and models chosen should be provided.
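One way to examine sensitivity to the choice of statistical model is to pool the same study results under two models and compare the answers. The sketch below is illustrative only. It assumes each study contributes an effect estimate on an additive scale (for example, a log odds ratio) with a known within-study variance, and it contrasts an inverse-variance fixed-effect model with a DerSimonian-Laird random-effects model, two common choices that this chapter does not prescribe. Function names and data are hypothetical.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def pool_random(effects, variances):
    """DerSimonian-Laird random-effects estimate, standard error, and tau^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q measures the spread of study effects around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return estimate, math.sqrt(1.0 / re_total), tau2


# Hypothetical log odds ratios and within-study variances.
effects = [-0.60, -0.10, -0.45, 0.20, -0.30]
variances = [0.04, 0.02, 0.09, 0.03, 0.06]

fe, fe_se = pool_fixed(effects, variances)
re, re_se, tau2 = pool_random(effects, variances)
for label, est, se in (("fixed", fe, fe_se), ("random", re, re_se)):
    print(f"{label:>6}: {est:+.3f} "
          f"(95% CI {est - 1.96 * se:+.3f} to {est + 1.96 * se:+.3f})")
```

When the two models give materially different estimates or intervals, the conclusion is model dependent and the choice of model needs explicit justification.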
Meta-analyses are also frequently sensitive to assumptions
about missing data. In meta-analysis, missing data include not only
missing outcomes or predictors, but also missing variances and
correlations needed when constructing weights based on study
precision. As with any statistical analysis, missing data pose two threats:
reduced power and bias. Because the number of studies is often
small, loss of even a single study’s data can seriously affect the
ability to draw conclusive inferences from a meta-analysis. Bias poses
an even more dangerous problem. Seemingly conclusive analyses
may give the wrong answer if studies that were excluded—because
of missing data—differ from the studies that supplied the data.
The conclusion that the treatment improved one outcome, but not
another, may result solely from the different studies used.
Interpreting such results requires care and caution.
RECOMMENDED STANDARDS FOR META-ANALYSIS
The committee recommends the following standards and elements
of performance for conducting the quantitative synthesis.
Standard 4.3—Decide if, in addition to a qualitative analy-
sis, the systematic review will include a quantitative analysis
(meta-analysis)
Required element:
4.3.1 Explain why a pooled estimate might be useful to
decision makers
Standard 4.4—If conducting a meta-analysis, then do the
following:
Required elements:
4.4.1 Use expert methodologists to develop, execute,
and peer review the meta-analyses
4.4.2 Address heterogeneity among study effects
4.4.3 Accompany all estimates with measures of statistical
uncertainty
4.4.4 Assess the sensitivity of conclusions to changes
in the protocol, assumptions, and study selection
(sensitivity analysis)
Rationale
A meta-analysis is usually desirable in an SR because it provides
reproducible summaries of the individual study results and
has potential to offer valuable insights into the patterns of results
across studies. However, many published analyses have important
methodological shortcomings and lack scientific rigor (Bailar, 1997;
Gerber et al., 2007; Mullen and Ramirez, 2006). One must always
look beyond the simple fact that an SR contains a meta-analysis to
examine the details of how it was planned and conducted. A strong
meta-analysis emanates from a well-conducted SR, clearly describes
its subjective components, scrutinizes the individual studies for
sources of heterogeneity, and tests the sensitivity of the findings
to changes in the assumptions and set of studies
(Greenland, 1994; Walker et al., 2008).
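Required elements 4.4.2 through 4.4.4 can be illustrated with a small sketch. It is a simplified example under stated assumptions, not the committee's prescribed method: study effects are assumed to be on an additive scale with known variances, pooling uses a fixed-effect inverse-variance model, heterogeneity is summarized with Cochran's Q and the I² statistic (Higgins et al., 2003), uncertainty is reported as a 95 percent confidence interval, and sensitivity to study selection is probed by omitting one study at a time. Function names and data are hypothetical.

```python
import math


def summarize(effects, variances, z=1.96):
    """Fixed-effect pooled estimate with CI, Cochran's Q, and I^2 (percent)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled,
            "ci": (pooled - z * se, pooled + z * se),
            "Q": q,
            "I2": i2}


def leave_one_out(effects, variances):
    """Repeat the summary with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(summarize(kept_effects, kept_variances))
    return results


# Hypothetical log odds ratios and within-study variances.
effects = [-0.60, -0.10, -0.45, 0.20, -0.30]
variances = [0.04, 0.02, 0.09, 0.03, 0.06]

overall = summarize(effects, variances)
print(f"All studies: {overall['pooled']:+.3f} "
      f"(95% CI {overall['ci'][0]:+.3f} to {overall['ci'][1]:+.3f}), "
      f"I2 = {overall['I2']:.0f}%")
for i, res in enumerate(leave_one_out(effects, variances), start=1):
    print(f"Omitting study {i}: {res['pooled']:+.3f}, I2 = {res['I2']:.0f}%")
```

If omitting a single study changes the direction of the pooled estimate or moves its interval across the null, the finding depends heavily on that study and should be reported as such.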
REFERENCES
AAN (American Academy of Neurology). 2004. Clinical practice guidelines process
manual. http://www.aan.com/globals/axon/assets/3749.pdf (accessed Febru-
ary 1, 2011).
ACCF/AHA. 2009. Methodology manual for ACCF/AHA guideline writing committees.
http://www.americanheart.org/downloadable/heart/12378388766452009Met
hodologyManualACCF_AHAGuidelineWritingCommittees.pdf (accessed July
29, 2009).
ACCP (American College of Chest Physicians). 2009. The ACCP grading system for
guideline recommendations. http://www.chestnet.org/education/hsp/grading
System.php (accessed February 1, 2011).
Ammerman, A., M. Pignone, L. Fernandez, K. Lohr, A. D. Jacobs, C. Nester, T. Orleans,
N. Pender, S. Woolf, S. F. Sutton, L. J. Lux, and L. Whitener. 2002. Counseling to
promote a healthy diet. http://www.ahrq.gov/downloads/pub/prevent/pdfser/
dietser.pdf (accessed September 26, 2010).
Anello, C., and J. L. Fleiss. 1995. Exploratory or analytic meta-analysis: Should we
distinguish between them? Journal of Clinical Epidemiology 48(1):109–116.
Anzures-Cabrera, J., and J. P. T. Higgins. 2010. Graphical displays for meta-analysis:
An overview with suggestions for practice. Research Synthesis Methods 1(1):66–
89.
Atkins, D. 2007. Creating and synthesizing evidence with decision makers in mind:
Integrating evidence from clinical trials and other study designs. Medical Care
45(10 Suppl 2):S16–S22.
Atkins, D., D. Best, P. A. Briss, M. Eccles, Y. Falck-Ytter, S. Flottorp, and GRADE
Working Group. 2004a. Grading quality of evidence and strength of
recommendations. BMJ 328(7454):1490–1497.
Atkins, D., M. Eccles, S. Flottorp, G. Guyatt, D. Henry, S. Hill, A. Liberati, D. O’Connell,
A. D. Oxman, B. Phillips, H. Schünemann, T. T. Edejer, G. Vist, J. Williams, and
the GRADE Working Group. 2004b. Systems for grading the quality of evidence
and the strength of recommendations I: Critical appraisal of existing approaches.
BMC Health Services Research 4(1):38.
Bailar, J. C., III. 1997. The promise and problems of meta-analysis. New England Journal
of Medicine 337(8):559–561.
Balshem, H., M. Helfand, H. J. Schünemann, A. D. Oxman, R. Kunz, J. Brozek, G. E.
Vist, Y. Falck-Ytter, J. Meerpohl, S. Norris, and G. H. Guyatt. 2011. GRADE
guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology (In
press).
Berlin, J. A., J. Santanna, C. H. Schmid, L. A. Szczech, H. I. Feldman, and the
Anti-lymphocyte Antibody Induction Therapy Study Group. 2002. Individual
patient versus group-level data meta-regressions for the investigation of
treatment effect modifiers: Ecological bias rears its ugly head. Statistics in
Medicine 21(3):371–387.
Berlin, J., and D. Ghersi. 2004. Prospective meta-analysis in dentistry. The Journal of
Evidence-Based Dental Practice 4(1):59–64.
———. 2005. Preventing publication bias: Registries and prospective meta-analysis.
Publication bias in meta-analysis: Prevention, assessment and adjustments, edited by
H. R. Rothstein, A. J. Sutton, and M. Borenstein, pp. 35–48.
Berry, S., K. Ishak, B. Luce, and D. Berry. 2010. Bayesian meta-analyses for
comparative effectiveness and informing coverage decisions. Medical Care 48(6):S137.
Borenstein, M. 2009. Introduction to meta-analysis. West Sussex, U.K.: John Wiley &
Sons.
Brozek, J. L., E. A. Akl, P. Alonso-Coello, D. Lang, R. Jaeschke, J. W. Williams, B.
Phillips, M. Lelgemann, A. Lethaby, J. Bousquet, G. Guyatt, H. J. Schünemann,
and the GRADE Working Group. 2009. Grading quality of evidence and strength
of recommendations in clinical practice guidelines: Part 1 of 3. An overview of
the GRADE approach and grading quality of evidence about interventions. Al-
lergy 64(5):669–677.
CEBM (Centre for Evidence-based Medicine). 2009. Oxford Centre for Evidence-
based Medicine—Levels of evidence (March 2009). http://www.cebm.net/index.
aspx?o=1025 (accessed February 1, 2011).
Chalmers, I., M. Adams, K. Dickersin, J. Hetherington, W. Tarnow-Mordi, C. Meinert,
S. Tonascia, and T. C. Chalmers. 1990. A cohort study of summary reports of
controlled trials. JAMA 263(10):1401–1405.
Chou, R., N. Aronson, D. Atkins, A. S. Ismaila, P. Santaguida, D. H. Smith, E. Whitlock,
T. J. Wilt, and D. Moher. 2010. AHRQ series paper 4: Assessing harms when com-
paring medical interventions: AHRQ and the Effective Health Care Program.
Journal of Clinical Epidemiology 63(5):502–512.
Cochrane Collaboration. 2010. Cochrane prospective meta-analysis methods group. http://
pma.cochrane.org/ (accessed January 27, 2011).
Coleman, C. I., R. Talati, and C. M. White. 2009. A clinician’s perspective on rating the
strength of evidence in a systematic review. Pharmacotherapy 29(9):1017–1029.
COMPUS (Canadian Optimal Medication Prescribing and Utilization Service). 2005.
Evaluation tools for Canadian Optimal Medication Prescribing and Utilization Service.
http://www.cadth.ca/media/compus/pdf/COMPUS_Evaluation_Methodol-
ogy_final_e.pdf (accessed September 6, 2010).
Cooper, H. M., L. V. Hedges, and J. C. Valentine. 2009. The handbook of research synthesis
and meta-analysis, 2nd ed. New York: Russell Sage Foundation.
Cooper, N., A. Sutton, D. Morris, A. Ades, and N. Welton. 2009. Addressing
between-study heterogeneity and inconsistency in mixed treatment comparisons:
Application to stroke prevention treatments in individuals with non-rheumatic
atrial fibrillation. Statistics in Medicine 28(14):1861–1881.
CRD (Centre for Reviews and Dissemination). 2009. Systematic reviews: CRD’s guidance
for undertaking reviews in health care. York, U.K.: York Publishing Services, Ltd.
Cummings, P. 2004. Meta-analysis based on standardized effects is unreliable. Ar-
chives of Pediatrics & Adolescent Medicine 158(6):595–597.
Deeks, J., J. Higgins, and D. Altman, eds. 2008. Chapter 9: Analysing data and
undertaking meta-analyses. In Cochrane handbook for systematic reviews of interventions,
edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons.
Devereaux, P. J., D. Heels-Ansdell, C. Lacchetti, T. Haines, K. E. Burns, D. J. Cook,
N. Ravindran, S. D. Walter, H. McDonald, S. B. Stone, R. Patel, M. Bhandari,
H. J. Schünemann, P. T. Choi, A. M. Bayoumi, J. N. Lavis, T. Sullivan, G. Stod-
dart, and G. H. Guyatt. 2004. Payments for care at private for-profit and private
not-for-profit hospitals: A systematic review and meta-analysis. Canadian Medical
Association Journal 170(12):1817–1824.
Dias, S., N. Welton, D. Caldwell, and A. Ades. 2010. Checking consistency in mixed
treatment comparison meta-analysis. Statistics in Medicine 29(7-8):932–944.
Dickersin, K. 1990. The existence of publication bias and risk factors for its occurrence.
JAMA 263(10):1385–1389.
Dwan, K., D. G. Altman, J. A. Arnaiz, J. Bloom, A.-W. Chan, E. Cronin, E. Decullier,
P. J. Easterbrook, E. Von Elm, C. Gamble, D. Ghersi, J. P. A. Ioannidis, J. Simes,
and P. R. Williamson. 2008. Systematic review of the empirical evidence of study
publication bias and outcome reporting bias. PLoS ONE 3(8):e3081.
Ebell, M. H., J. Siwek, B. D. Weiss, S. H. Woolf, J. Susman, B. Ewigman, and M.
Bowman. 2004. Strength of recommendation taxonomy (SORT): A patient-centered
approach to grading evidence in medical literature. American Family Physician
69(3):548–556.
Editors. 2005. Reviews: Making sense of an often tangled skein of evidence. Annals of
Internal Medicine 142(12 Pt 1):1019–1020.
Egger, M., G. D. Smith, and D. G. Altman. 2001. Systematic reviews in health care: Meta-
analysis in context. London, U.K.: BMJ Publishing Group.
Falck-Ytter, Y., H. Schünemann, and G. Guyatt. 2010. AHRQ series commentary 1:
Rating the evidence in comparative effectiveness reviews. Journal of Clinical
Epidemiology 63(5):474–475.
Faraday, M., H. Hubbard, B. Kosiak, and R. Dmochowski. 2009. Staying at the cutting
edge: A review and analysis of evidence reporting and grading; The recom-
mendations of the American Urological Association. BJU International 104(3):
294–297.
Federal Coordinating Council for Comparative Effectiveness Research. 2009. Report
to the President and the Congress. Available from http://www.hhs.gov/recovery/
programs/cer/cerannualrpt.pdf.
Ferreira, P. H., M. L. Ferreira, C. G. Maher, K. Refshauge, R. D. Herbert, and J. Latimer.
2002. Effect of applying different “levels of evidence” criteria on conclusions of
Cochrane reviews of interventions for low back pain. Journal of Clinical Epidemiol-
ogy 55(11):1126–1129.
Fu, R., G. Gartlehner, M. Grant, T. Shamliyan, A. Sedrakyan, T. J. Wilt, L. Griffith,
M. Oremus, P. Raina, A. Ismaila, P. Santaguida, J. Lau, and T. A. Trikalinos.
2010. Conducting quantitative synthesis when comparing medical interventions:
AHRQ and the Effective Health Care Program. In Methods guide for compara-
tive effectiveness reviews, edited by Agency for Healthcare Research and Qual-
ity. http://www.effectivehealthcare.ahrq.gov/index.cfm/search-for-guides-
reviews-and-reports/?pageaction=displayProduct&productID=554 (accessed
January 19, 2011).
Gerber, S., D. Tallon, S. Trelle, M. Schneider, P. Jüni, and M. Egger. 2007. Bibliographic
study showed improving methodology of meta-analyses published in leading
journals: 1993–2002. Journal of Clinical Epidemiology 60(8):773–780.
Ghersi, D., J. Berlin, and L. Askie, eds. 2008. Chapter 19: Prospective meta-analysis.
Edited by J. Higgins and S. Green, Cochrane handbook for systematic reviews of
interventions. Chichester, UK: John Wiley & Sons.
Glasziou, P., J. Vandenbroucke, and I. Chalmers. 2004. Assessing the quality of
research. BMJ 328(7430):39–41.
Gluud, L. L. 2006. Bias in clinical intervention research. American Journal of Epidemiol-
ogy 163(6):493–501.
GRADE Working Group. 2010. Organizations that have endorsed or that are using
GRADE. http://www.gradeworkinggroup.org/society/index.htm (accessed
September 20, 2010).
Greenland, S. 1994. Invited commentary: A critical look at some popular meta-
analytic methods. American Journal of Epidemiology 140(3):290–296.
Guirguis-Blake, J., N. Calonge, T. Miller, A. Siu, S. Teutsch, E. Whitlock, and for the
U.S. Preventive Services Task Force. 2007. Current processes of the U.S.
Preventive Services Task Force: Refining evidence-based recommendation
development. Annals of Internal Medicine 147:117–122.
Guyatt, G. H., D. L. Sackett, J. C. Sinclair, R. Hayward, D. J. Cook, and R. J. Cook.
1995. Users’ guides to the medical literature: A method for grading health care
recommendations. JAMA 274(22):1800–1804.
Guyatt, G., H. J. Schünemann, D. Cook, R. Jaeschke, and S. Pauker. 2004. Applying
the grades of recommendation for antithrombotic and thrombolytic therapy: The
seventh ACCP conference on antithrombotic and thrombolytic therapy. Chest
126(3 Suppl):179S–187S.
Guyatt, G., A. D. Oxman, E.A. Akl, R. Kunz, G. Vist, J. Brozek, S. Norris, Y. Falck-Ytter,
P. Glasziou, H. deBeer, R. Jaeschke, D. Rind, J. Meerpohl, P. Dahm, and H. J.
Schünemann. 2010. GRADE guidelines 1. Introduction—GRADE evidence pro-
files and summary of findings tables. Journal of Clinical Epidemiology (In press).
Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, D.
Atkins, and the Methods Work Group Third U. S. Preventive Services Task Force.
2001. Current methods of the U.S. Preventive Services Task Force: A review of
the process. American Journal of Preventive Medicine 20(3 Suppl):21–35.
Helfand, M. 2005. Using evidence reports: Progress and challenges in evidence-based
decision making. Health Affairs 24(1):123–127.
HHS (U.S. Department of Health and Human Services). 2010. The Sentinel Initiative: A
national strategy for monitoring medical product safety. Available from http://www.
fda.gov/Safety/FDAsSentinelInitiative/ucm089474.htm.
Higgins, J. P. T., and S. G. Thompson. 2004. Controlling the risk of spurious findings
from meta-regression. Statistics in Medicine 23(11):1663–1682.
Higgins, J. P. T., S. G. Thompson, J. J. Deeks, and D. G. Altman. 2003. Measuring
inconsistency in meta-analyses. BMJ 327(7414):557–560.
Hopewell, S., K. Loudon, M. J. Clarke, A. D. Oxman, and K. Dickersin. 2009.
Publication bias in clinical trials due to statistical significance or direction of trial results
(Review). Cochrane Database of Systematic Reviews 1:MR000006.
Hopewell, S., M. J. Clarke, L. Stewart, and J. Tierney. 2008. Time to publication
for results of clinical trials (Review). Cochrane Database of Systematic Reviews (2).
Hsieh, C., A. H. Adams, J. Tobis, C. Hong, C. Danielson, K. Platt, F. Hoehler, S.
Reinsch, and A. Rubel. 2002. Effectiveness of four conservative treatments for
subacute low back pain: A randomized clinical trial. Spine 27(11):1142–1148.
ICSI (Institute for Clinical Systems Improvement). 2003. Evidence grading system.
http://www.icsi.org/evidence_grading_system_6/evidence_grading_system_
pdf_.html (accessed September 8, 2009).
IOM (Institute of Medicine). 2009. Initial national priorities for comparative effectiveness
research. Washington, DC: The National Academies Press.
Kirkham, J. J., K. M. Dwan, D. G. Altman, C. Gamble, S. Dodd, R. Smyth, and P. R.
Williamson. 2010. The impact of outcome reporting bias in randomised
controlled trials on a cohort of systematic reviews. BMJ 340:c365.
Lau, J., J. P. A. Ioannidis, and C. H. Schmid. 1998. Summing up evidence: One answer
is not always enough. Lancet 351(9096):123–127.
Lefebvre, C., E. Manheimer, and J. Glanville. 2008. Chapter 6: Searching for studies. In
Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins
and S. Green. Chichester, UK: John Wiley & Sons.
Lu, G., and A. E. Ades. 2004. Combination of direct and indirect evidence in mixed
treatment comparisons. Statistics in Medicine 23(20):3105–3124.
Mullen, P. D., and G. Ramirez. 2006. The promise and pitfalls of systematic reviews.
Annual Review of Public Health 27:81–102.
Mulrow, C. D., and K. N. Lohr. 2001. Proof and policy from medical research
evidence. Journal of Health Politics Policy and Law 26(2):249–266.
Mulrow, C., P. Langhorne, and J. Grimshaw. 1997. Integrating heterogeneous pieces of
evidence in systematic reviews. Annals of Internal Medicine 127(11):989–995.
NCCN (National Comprehensive Cancer Network). 2008. About the NCCN clinical
practice guidelines in oncology. http://www.nccn.org/professionals/physician_
gls/about.asp (accessed September 8, 2009).
Norris, S., D. Atkins, W. Bruening, S. Fox, E. Johnson, R. Kane, S. C. Morton, M.
Oremus, M. Ospina, G. Randhawa, K. Schoelles, P. Shekelle, and M. Viswanathan.
2010. Selecting observational studies for comparing medical interventions. In
Methods guide for comparative effectiveness reviews, edited by Agency for Health-
care Research and Quality. http://www.effectivehealthcare.ahrq.gov/index.
cfm/search-for-guides-reviews-and-reports/?pageaction=displayProduct&pro
ductID=454 (accessed January 19, 2011).
NZGG (New Zealand Guidelines Group). 2007. Handbook for the preparation of explicit
evidence-based clinical practice guidelines. http://www.nzgg.org.nz/download/
files/nzgg_guideline_handbook.pdf (accessed February 1, 2011).
O’Rourke, K., and D. Altman. 2005. Bayesian random effects meta-analysis of trials
with binary outcomes: Methods for the absolute risk difference and relative risk
scales. Statistics in Medicine 24(17):2733–2742.
Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang,
and M. Helfand. 2010. Grading the strength of a body of evidence when compar-
ing medical interventions: AHRQ and the Effective Health Care Program. Journal
of Clinical Epidemiology 63(5):513–523.
Pham, H. H., D. Schrag, A. S. O’Malley, B. Wu, and P. B. Bach. 2007. Care patterns in
Medicare and their implications for pay for performance. New England Journal
of Medicine 356(11):1130–1139.
Platt, R. 2010. FDA’s Mini-Sentinel program. http://www.brookings.edu/~/media/
Files/events/2010/0111_sentinel_workshop/06%20Sentinel%20Initiative%20
Platt%20Brookings%2020100111%20v05%20distribution.pdf (accessed October
25, 2010).
Platt, R., M. Wilson, K. A. Chan, J. S. Benner, J. Marchibroda, and M. McClellan. 2009.
The new Sentinel Network: Improving the evidence of medical-product safety.
New England Journal of Medicine 361(7):645–647.
Rayleigh, J. W. 1884. Address by the Rt. Hon. Lord Rayleigh. In Report of the fifty-fourth
meeting of the British Association for the Advancement of Science, edited by Murray
J. Montreal.
Riley, R. D., and E. W. Steyerberg. 2010. Meta-analysis of a binary outcome using indi-
vidual participant data and aggregate data. Research Synthesis Methods 1(1):2–19.
Rothstein, H. R., A. J. Sutton, and M. Borenstein, editors. 2005. Publication bias in meta-
analysis: Prevention, assessment and adjustments. Chichester, U.K.: Wiley.
Salanti, G., V. Marinho, and J. Higgins. 2009. A case study of multiple-treatments
meta-analysis demonstrates that covariates should be considered. Journal of
Clinical Epidemiology 62(8):857–864.
Salanti, G., S. Dias, N. J. Welton, A. Ades, V. Golfinopoulos, M. Kyrgiou, D. Mauri,
and J. P. A. Ioannidis. 2010. Evaluating novel agent effects in multiple-treatments
meta-regression. Statistics in Medicine 29(23):2369–2383.
Salpeter, S., E. Greyber, G. Pasternak, and E. Salpeter. 2004. Risk of fatal and nonfatal
lactic acidosis with metformin use in type 2 diabetes mellitus. Cochrane
Database of Systematic Reviews 4:CD002967.
Schmid, C. 2001. Using Bayesian inference to perform meta-analysis. Evaluation & the
Health Professions 24(2):165.
Schmid, C. H., P. C. Stark, J. A. Berlin, P. Landais, and J. Lau. 2004. Meta-regression
detected associations between heterogeneous treatment effects and study-level,
but not patient-level, factors. Journal of Clinical Epidemiology 57(7):683–697.
Schriger, D. L., D. G. Altman, J. A. Vetter, T. Heafner, and D. Moher. 2010. Forest plots
in reports of systematic reviews: A cross-sectional study reviewing current prac-
tice. International Journal of Epidemiology 39(2):421–429.
Schünemann, H., D. Best, G. Vist, and A. D. Oxman. 2003. Letters, numbers, symbols
and words: How to communicate grades of evidence and recommendations.
Canadian Medical Association Journal 169(7):677–680.
Schünemann, H., A. D. Oxman, G. Vist, J. Higgins, J. Deeks, P. Glasziou, and G.
Guyatt. 2008. Chapter 12: Interpreting results and drawing conclusions. In Co-
chrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins
and S. Green. Chichester, UK: John Wiley & Sons.
Schünemann, H. J., J. Brożek, and A. D. Oxman. 2009. GRADE handbook for grading
quality of evidence and strength of recommendations. Version 3.2 [updated March
2009]. http://www.cc-ims.net/gradepro (accessed November 10, 2010).
SIGN (Scottish Intercollegiate Guidelines Network). 2009. SIGN 50: A guideline de-
veloper’s handbook. http://www.sign.ac.uk/guidelines/fulltext/50/index.html
(accessed February 1, 2011).
Silagy, C. A., P. Middleton, and S. Hopewell. 2002. Publishing protocols of systematic
reviews: Comparing what was done to what was planned. JAMA 287:2831–
2834.
Simmonds, M., J. Higgins, L. Stewart, J. Tierney, M. Clarke, and S. Thompson.
2005. Meta-analysis of individual patient data from randomized trials: A review
of methods used in practice. Clinical Trials 2(3):209.
Slone Survey. 2006. Patterns of medication use in the United States, 2006: A report
from the Slone Survey. http://www.bu.edu/slone/SloneSurvey/AnnualRpt/
SloneSurveyWebReport2006.pdf (accessed February 1, 2011).
Smith, G., M. Egger, and A. Phillips. 1997. Meta-analysis: Beyond the grand mean?
BMJ 315(7122):1610.
Smith, T., D. Spiegelhalter, and A. Thomas. 1995. Bayesian approaches to random-
effects meta-analysis: A comparative study. Statistics in Medicine 14(24):2685–
2699.
Song, F., S. Parekh-Bhurke, L. Hooper, Y. Loke, J. Ryder, A.J. Sutton, C.B. Hing, and
I. Harvey. 2009. Extent of publication bias in different categories of research
cohorts: a meta-analysis of empirical studies. BMC Medical Research Methodol-
ogy 9:79.
Song, F., S. Parekh, L. Hooper, Y. K. Loke, J. Ryder, A. J. Sutton, C. Hing, C. S. Kwok,
C. Pang, and I. Harvey. 2010. Dissemination and publication of research findings:
an updated review of related biases. Health Technology Assessment 14(8).
Sterne, J., P. Jüni, K. Schulz, D. Altman, C. Bartlett, and M. Egger. 2002. Statistical
methods for assessing the influence of study characteristics on treatment effects
in ‘meta-epidemiological’ research. Statistics in Medicine 21(11):1513–1524.
Stewart, L. 1995. Practical methodology of meta-analyses (overviews) using updated
individual patient data. Statistics in Medicine 14(19):2057–2079.
Sutton, A., and K. Abrams. 2001. Bayesian methods in meta-analysis and evidence
synthesis. Statistical Methods in Medical Research 10(4):277.
Sutton, A. J., and J. P. Higgins. 2008. Recent developments in meta-analysis. Statistics
in Medicine 27(5):625–650.
Sutton, A. J., K. R. Abrams, D. R. Jones, T. A. Sheldon, and F. Song. 2000.
Methods for meta-analysis in medical research, Wiley series in probability and statistics.
Chichester, U.K.: John Wiley & Sons.
Thompson, S., and J. Higgins. 2002. How should meta-regression analyses be under-
taken and interpreted? Statistics in Medicine 21(11):1559–1573.
Tierney, J., M. Clarke, and L. Stewart. 2000. Is there bias in the publication of
individual patient data meta-analyses? International Journal of Technology Assessment
in Health Care 16(02):657–667.
Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008.
Selective publication of antidepressant trials and its influence on apparent efficacy.
New England Journal of Medicine 358(3):252–260.
USPSTF (U.S. Preventive Services Task Force). 2008. Grade definitions. http://www.
ahrq.gov/clinic/uspstf/grades.htm (accessed January 6, 2010).
Vogeli, C., A. Shields, T. Lee, T. Gibson, W. Marder, K. Weiss, and D. Blumenthal. 2007.
Multiple chronic conditions: Prevalence, health consequences, and implications
for quality, care management, and costs. Journal of General Internal Medicine
22(Suppl. 3):391–395.
Walker, E., A. V. Hernandez, and M. W. Kattan. 2008. Meta-analysis: Its strengths and
limitations. Cleveland Clinic Journal of Medicine 75(6):431–439.
Warn, D., S. Thompson, and D. Spiegelhalter. 2002. Bayesian random effects meta-
analysis of trials with binary outcomes: Methods for the absolute risk difference
and relative risk scales. Statistics in Medicine 21(11):1601–1623.
West, S., V. King, T. S. Carey, K. N. Lohr, N. McKoy, S. F. Sutton, and L. Lux. 2002.
Systems to rate the strength of scientific evidence. Evidence Report/Technology
Assessment No. 47 (prepared by the Research Triangle Institute–University of
North Carolina Evidence-based Practice Center under Contract No. 290-97-
0011). AHRQ Publication No. 02-E016:64–88.
West, S. L., G. Gartlehner, A. J. Mansfield, C. Poole, E. Tant, N. Lenfestey, L. J. Lux,
J. Amoozegar, S. C. Morton, T. C. Carey, M. Viswanathan, and K. N. Lohr.
2010. Comparative effectiveness review methods: Clinical heterogeneity. http://www.
effectivehealthcare.ahrq.gov/ehc/products/93/533/Clinical_Heteogeneity_
Revised_Report_FINAL%209-24-10.pdf (accessed September 28, 2010).