Inherent in the charge to the committee (refer to Box 1-1 in Chapter 1) was the question of the state of the knowledge base for systems of care for childbirth and how that knowledge base can be strengthened in ways that optimally bridge knowledge and decisions about policies and practices. This chapter addresses that question—an important one because such decisions will be most sound when they draw on a robust and comprehensive body of relevant information.
To that end, the first section of this chapter provides an overview of the strengths and limitations of data used to study clinical outcomes with respect to birth settings, including vital statistics and birth registry data. Next, the chapter examines methodological issues relevant for birth settings research; the types of questions that are salient in making decisions about policy and practice; and how, ideally, the different types and sources of high-quality evidence could best be used in concert to answer those questions. These questions require methods that make it possible to measure outcomes that move beyond traditional clinical variables and include perceptions of racism, disrespect and unequal treatment, women’s experiences of care, human-centered design, and patient-reported outcomes. Finally, we conclude with a discussion of two tools for grading the quality of evidence typical in this field.
In this section, we discuss the strengths and limitations of the data sources commonly used when conducting research on birth settings. These data come primarily from the National Vital Statistics report, managed by the Centers for Disease Control and Prevention (CDC), and birth registries, which use an intention-to-treat model (i.e., models indicating intended birth setting) to track outcomes for women and infants. We discuss each of these data sources in turn below.
Of course, additional data sources could be used for research in this area. Such sources include linked birth certificate and hospital data, which add more valid information on many morbidities, as well as insurer data. The committee does not discuss these data sources in detail, however, because of the need to focus on those data sources that are most valuable for comparing outcomes across birth settings. Hospital discharge data, by definition, include only information on hospital births, and do not include information on birth center or home births. Thus, linked birth certificate and hospital discharge data are not a good data source for understanding variations in birth characteristics or outcomes by birth setting.
Vital Statistics Data
Vital statistics data from birth certificates provide information on each of the approximately 3.9 million births occurring in the United States each year, by place of birth (birth setting) and sociodemographic and limited medical characteristics (Centers for Disease Control and Prevention, 2017b; Martin et al., 2018a). These data have the benefit of being population based, reducing selection bias. Two types of data are potentially available from vital statistics: (1) information on the number and characteristics of births, by birth setting (hospital, home, birth center); and (2) information on birth outcomes, including infant mortality, preterm births, and low birthweight, by birthplace. The strength of these data lies in providing a complete count of births by important characteristics, including birth setting. However, these data also have significant limitations regarding their use for birth settings research (Centers for Disease Control and Prevention, 2017b).
The question on the birth certificate on place of birth reports exclusively on the birth setting at the time of birth (Centers for Disease Control and Prevention, 2017b), and not on the intended setting at the onset of labor, potentially leading to misclassification bias. As a result, women who planned a home or birth center birth but were transferred to a hospital during labor and delivery are reported as hospital births in vital statistics data (MacDorman and Declercq, 2019), and identification of individuals who intended a home or birth center birth has been challenging. Most researchers agree that analysis by intended birth setting (based on birth registry data, discussed below) is the most methodologically robust approach for analyzing data by birth setting (Scarf et al., 2018). That is, as elaborated below, it is essential that studies evaluate outcomes based on intended place of delivery and not actual place of birth.
Studies show that in the United States, approximately 11 percent of planned home births result in intrapartum transfers (Cheyney et al., 2014a) and that 16 percent of birth center births require movement to a higher level of care (Stapleton et al., 2013). It is important to note that complications in planned home or birth center births that occur prior to or during transfer to a hospital can lead to bias in estimates of outcomes. Consequently, vital statistics data undercount planned home and birth center births by the number of intrapartum transfers. The example of cesarean section makes this clear: when studies examine outcomes by actual rather than intended place of birth, the cesarean rates for planned home and birth center births appear to be zero, as cesareans occur only after transfer to a hospital. As a result, all cesareans are misattributed to a hospital sample when in reality they may have started as planned home or birth center births (MacDorman and Declercq, 2019). In addition, outcomes for births involving transfer may differ from those of births that occurred in their planned setting. Although most transfers are nonemergent and occur for such reasons as failure to progress and need for anesthesia (Vedam et al., 2014a), some are emergencies, and not being able to attribute these outcomes back to the intended place of birth could significantly bias estimates of outcomes by birth setting against hospitals.
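The cesarean example can be illustrated with a minimal sketch. The counts below are hypothetical: the 11 percent transfer share echoes the figure cited above, but the cesarean counts and the `cesarean_rate_by` helper are assumptions made only for illustration and are not drawn from any study.

```python
from collections import Counter

# Hypothetical labor records: (intended_setting, actual_setting, cesarean).
# All counts are illustrative assumptions, not study data.
records = (
    [("home", "home", False)] * 89
    + [("home", "hospital", False)] * 8   # intrapartum transfers, vaginal birth
    + [("home", "hospital", True)] * 3    # intrapartum transfers, cesarean
    + [("hospital", "hospital", False)] * 75
    + [("hospital", "hospital", True)] * 25
)

def cesarean_rate_by(records, key_index):
    """Cesarean rate per setting, grouped by intended (0) or actual (1) setting."""
    births = Counter()
    cesareans = Counter()
    for record in records:
        setting = record[key_index]
        births[setting] += 1
        if record[2]:
            cesareans[setting] += 1
    return {setting: cesareans[setting] / births[setting] for setting in births}

by_actual = cesarean_rate_by(records, 1)    # what place-of-birth data would show
by_intended = cesarean_rate_by(records, 0)  # what an intended-setting analysis shows

print("By actual place of birth:", by_actual)
print("By intended place of birth:", by_intended)
```

Grouped by actual place of birth, the home cesarean rate in this toy dataset is zero and the three post-transfer cesareans are counted in the hospital group; grouped by intended place of birth, they return to the planned home birth group.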
For births that occur at home, moreover, information is available on the planning status of the birth (i.e., planned or unplanned home birth, or unknown whether planned) for all states except California, which accounts for one out of every eight U.S. births (Centers for Disease Control and Prevention, 2017b; MacDorman and Declercq, 2019). This means that planned and unplanned home births in California are combined in the overall home birth category. As a result, significant bias can occur when analyzing birth outcomes, since outcomes for unplanned home births are often less favorable given that the setting is unprepared for the birth (MacDorman and Declercq, 2019). Even in Oregon, the only state where planning status and intended location are both recorded on the birth certificate, it is impossible to distinguish between birth center or planned home births and planned unassisted births (or births without a trained attendant) (Snowden et al., 2015), which are known to have significantly higher rates of mortality than midwife-attended home births (Vogel, 2011).
On the other hand, there are also several biases in these studies favoring home births. First, some of the “unintended” home births may in fact be misclassified as noncertified midwife births in states where their practice is unregulated. In addition, home to hospital or birth center to hospital transfers and their outcomes accrue to the hospital. Moreover, as hospitals take care of all risk categories of women, the expected outcomes for hospital birth should be worse than the outcomes for low-risk women in home and birth center settings. Some studies mitigate this bias by choosing low-risk populations giving birth in hospitals and hospital births attended by midwives as the reference group.
For some biases, the direction is unclear. Studies on the effects of different types of midwives on outcomes, for example, are hampered by lack of data regarding who assumes care of the mother–infant dyad after transfer to the hospital. Data on death with congenital anomalies are difficult to interpret given that studies often lack granularity on the type of anomaly and its lethality. Furthermore, fetal death, if divergent across groups, might contribute to selection bias of a healthier birth cohort for home births. These potential misclassification biases were discussed extensively by the committee; however, the committee was unable to reach consensus on the direction of bias given limitations in the availability of variables that would allow quantification of the number of planned unassisted births. Additional research and data are needed to better understand this question.
Additional limitations of birth certificate data for epidemiologic analysis have been widely discussed in the literature, and include the inability of birth certificates to provide information on clinical intentions, as well as concerns about the completeness, validity, and reliability of the reporting of specific data items (DiGuiseppe et al., 2002; Ananth, 2005; Cahill and Macones, 2006; Schoendorf and Branum, 2006; Cheyney et al., 2014a).
Data quality measures are generally good for most sociodemographic items reported on the birth certificate, as well as for place of birth, source of payment for the delivery, and basic medical variables such as birthweight, period of gestation, and mortality (DiGuiseppe et al., 2002; Zollinger et al., 2006; Vinikoor et al., 2010; Martin et al., 2013; Dietz et al., 2015). Although more recent studies are needed, studies based on older data suggest that the 5-minute Apgar score is reasonably well reported in vital statistics data (DiGuiseppe et al., 2002; Zollinger et al., 2006). In contrast, other studies have found that some items on birth certificates (such as attempted labor induction) are undercounted (DiGuiseppe et al., 2002; Zollinger et al., 2006; Vinikoor et al., 2010; Martin et al., 2013; Dietz et al., 2015; Li et al., 2017).
Other, more detailed medical variables, particularly those based on a checkbox on the birth certificate, may be less well reported. Data on hypertension and diabetes are of moderate to fair quality, with a tendency toward being underreported, although the quality of reporting of these data varies significantly by state (Ananth, 2005; Martin et al., 2013; Dietz et al., 2015). Regarding neonatal seizures and serious neurologic dysfunction, a recent study by Li and colleagues (2017) compared South Carolina birth certificate data from the 2003 birth certificate revision with hospital discharge and Medicaid data. The authors found sensitivity rates for birth certificate reporting of neonatal seizures or serious neurologic dysfunction of 7 percent and 0 percent for hospital and planned home births, respectively. Thus, despite improvements across revisions, U.S. birth certificates underreport or falsely report seizures, especially among home births.1 The authors conclude that “birth certificates alone should not be used to measure neonatal seizures or serious neurologic dysfunction.” Multiple sources of data, such as discharge summaries and Medicaid claims data, are needed to supplement birth certificate data to obtain an accurate understanding of seizure prevalence in all three U.S. birth settings. Despite these concerns, this variable has been used in studies on birth setting, given the concern over hypoxic ischemic encephalopathy in the out-of-hospital setting, where immediate access to surgical delivery may not be available (see, e.g., Cheng et al., 2013; Tilden et al., 2017).
The reliability of Apgar score = 0 for birth settings research has also been widely questioned. For example, Watterberg (2013) found large differences in reporting of Apgar score = 0 between physicians and home birth midwives and suggested that Apgar score <4 is a more robust measure for birth settings research.
In addition, some data items may be reported differently depending on the birth setting. For example, midwives in home and birth center settings may file birth certificates 10 days or more postpartum, while birth certificates for hospital births are typically filed 1–3 days following birth, depending on the mode of delivery (Zollinger et al., 2006; Li et al., 2017). This means that out-of-hospital midwives reporting on complications in the early postpartum period may report conditions over a longer period of time relative to hospital clerks. In addition, an analysis of vital statistics data conducted by Grünebaum and colleagues (2015a) found that midwives attending planned births in out-of-hospital settings assigned a significantly higher proportion of Apgar scores of 10 compared with midwives or physicians in the hospital setting, suggesting a bias toward higher Apgar scores outside of hospitals.
Taken together, these limitations mean that an analysis of birth outcomes by birth setting based on U.S. vital statistics data alone cannot be recommended. Yet these types of analyses are common in the literature (see Chapter 5).
1 Sensitivity, positive predictive values, false positive rates, and kappa values for neonatal seizure recording were, respectively, 7 percent, 66 percent, 34 percent, and 0.12 for the 2003 revision of birth certificates (547,177 hospital births from 2004 to 2013), and 5 percent, 33 percent, 67 percent, and 0.09 for the 1998 revision (396,776 hospital births from 1996 to 2003). Among 660 planned home births between 2004 and 2013 and 920 home births between 1996 and 2003, values were 0, 0, 100 percent, and −0.002, respectively (Li et al., 2017, p. 1047).
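The validation measures reported in the footnote can all be derived from a two-by-two comparison of the birth certificate against a reference record. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes that the footnote's "false positive rate" is the share of certificate-positive records that the reference source does not confirm, which is consistent with each reported rate equaling 100 percent minus the corresponding positive predictive value.

```python
def validation_metrics(tp, fp, fn, tn):
    """Agreement between a birth certificate item and a reference record.

    tp: certificate positive, reference positive
    fp: certificate positive, reference negative
    fn: certificate negative, reference positive
    tn: certificate negative, reference negative
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)             # share of true cases the certificate captures
    ppv = tp / (tp + fp)                     # share of certificate positives that are real
    false_positive_share = fp / (tp + fp)    # assumed definition: 1 - PPV
    observed = (tp + tn) / n                 # raw agreement
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "false_positive_share": false_positive_share,
        "kappa": kappa,
    }

# Hypothetical counts, chosen only to show a rare, underreported condition.
metrics = validation_metrics(tp=7, fp=3, fn=93, tn=9897)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare condition, raw agreement is high simply because most records are negative in both sources; kappa corrects for this chance agreement, which is why low sensitivity and low kappa can coexist with near-total raw agreement. The sketch does not guard against a certificate with no positive entries, in which case the positive predictive value is undefined.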
Birth Registry Data
Another data source for studies on outcomes by birth setting is birth registries that, by design, collect data indicating intended birth setting. These data also can be used to attribute outcome to provider type and level of care over the course of pregnancy, birth, and ideally early childhood (Stapleton, 2011; Cheyney et al., 2014b; Caughey and Cheyney, 2019).
In the United States, there are two birth registries: one curated by the Midwives Alliance of North America’s (MANA’s) Division of Research, called the MANA Statistics Project (MANA Stats) (Cheyney et al., 2014b); and another curated by the American Association of Birth Centers (AABC), called the Perinatal Data Registry (PDR; formerly called the Uniform Data Set, or UDS) (Stapleton, 2011). These datasets were validated by sampling a percentage of courses of care and comparing midwives’ entries into the registry with the medical record. The MANA Stats validation study found that variables were accurately entered by participants, as evidenced by the perfect or near perfect agreement among pre- and postreview variables (kappas ranging from 0.98 to 1.00; see Cheyney et al. 2014b), suggesting that any errors in this dataset are primarily random and not systematic for the outcomes assessed. Similarly, the validation study for the PDR (formerly UDS) found 97.1 percent agreement between the medical record and data entered into the online system (see Stapleton et al., 2013).
The MANA Stats and PDR datasets are the largest databases on midwife-led births occurring primarily in home or birth center settings in the United States. Both datasets are open to all practitioners attending births in all settings and include records from certified nurse midwives (CNMs), certified midwives (CMs), certified professional midwives (CPMs), and licensed midwives (LMs), as well as doctors of osteopathy (DOs), naturopathic doctors, and doctors of medicine (MDs). Records are submitted online voluntarily and capture perinatal and birth data, with nearly 200 variables collected across the prenatal and postnatal care periods (Cheyney et al., 2014b). Similarly, the PDR registry is a voluntary, online, comprehensive registry that contains perinatal data for use in AABC member centers (Stapleton, 2011).
In addition to MANA Stats and the PDR, some state or multistate networks, such as Perinatal Quality Collaboratives (PQCs), collect data on women and infants in order to improve perinatal care. PQCs are currently active in 32 states, 13 of which are actively working with the CDC (Henderson et al., 2018; see also Chapter 7). In addition, the American College of Obstetricians and Gynecologists (ACOG) is currently piloting a birth registry for maternal care that would allow providers and institutions to measure outcomes for and the quality of care they provide to women and infants (American College of Obstetricians and Gynecologists, n.d.).
A limitation of registry-based studies is that, unlike vital statistics, they are not population based, limiting their generalizability. In addition, participation in these registries is voluntary. The MANA Stats registry includes records from an estimated 20–30 percent of actively practicing CPMs (who attend most of the home births in the United States), with a substantially lower proportion of CNMs (who attend more hospital births) contributing. It captures about 20 percent of home births that occur in the United States annually (Cheyney et al., 2014b). As for birth centers, only 41 percent belong to the AABC, and fewer than 80 percent of members participate in the PDR online registry (Stapleton et al., 2013). These registries capture data primarily on home and birth center births and thus lack the ability to capture outcomes across concurrently collected hospital data. Finally, while many states have a PQC available, those states that are not part of the National Network of Perinatal Quality Collaboratives (NNPQC) created by the CDC are not obligated to share their information with other states, and participation of institutions within the state PQC is often voluntary (Henderson et al., 2018). Taken together, these limitations mean that an analysis of birth outcomes by birth setting based on U.S. registry data alone is not recommended, though these studies may be useful for describing within-group variation. For example, registry data may be useful in studying within-group maternal risk factors and birth outcomes (e.g., see Bovbjerg et al., 2017) or for the analysis of physiologic processes (such as length of third stage) that are less sensitive to selection bias. These limitations are similar to those inherent in vital statistics data. In summary, multiple data sources may have complementary value in understanding the quality of care and outcomes associated with various birth settings.
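The two birth center participation figures can be combined into a rough ceiling on PDR coverage. This is a back-of-the-envelope sketch, not a published estimate: it assumes that only AABC member centers contribute to the PDR and treats the 80 percent participation figure as an upper bound. The variable names are illustrative.

```python
# Figures cited above (Stapleton et al., 2013); combining them is an assumption.
aabc_membership_share = 0.41        # share of birth centers belonging to the AABC
pdr_participation_ceiling = 0.80    # "fewer than 80 percent" of members participate

# Upper bound on the share of all birth centers contributing records to the PDR.
pdr_coverage_ceiling = aabc_membership_share * pdr_participation_ceiling

print(f"At most about {pdr_coverage_ceiling:.0%} of birth centers contribute to the PDR")
```

Under these assumptions, no more than roughly one-third of birth centers are represented, which helps explain why registry findings may not generalize to all birth center births.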
In addition to the sources of bias associated with vital statistics and birth registry data, studies on birth settings are subject to potential confounding. As discussed in Chapters 3 and 4, differences exist in the demographic, cultural, social, and clinical characteristics among women who choose home or freestanding birth center births versus hospital births, and these differences have not been systematically measured (Caughey and Cheyney, 2019). Women choosing home or birth center birth may also differ in their desire to have an unmedicated vaginal birth or may be healthier overall than hospital patients, such that their need for medical interventions is reduced. Furthermore, many studies lack the statistical power to reliably evaluate rare outcomes (such as perinatal or maternal mortality), although most studies are adequately powered for common outcomes such as cesarean birth and neonatal intensive care unit (NICU) admissions (Caughey and Cheyney, 2019).
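One standard way to see how such differences between groups can distort a comparison is to contrast a crude risk ratio with a stratified one. The sketch below uses the Mantel-Haenszel risk ratio, a common textbook adjustment and not a method attributed to the committee. All counts, stratum labels, and function names are hypothetical.

```python
# Hypothetical counts: stratum -> setting -> (events, births).
# Within each risk stratum the outcome risk is identical across settings,
# but the hospital group contains far more higher-risk women.
strata = {
    "lower risk": {"hospital": (10, 500), "home": (18, 900)},
    "higher risk": {"hospital": (50, 500), "home": (10, 100)},
}

def crude_risk_ratio(strata):
    """Hospital-versus-home risk ratio ignoring the risk stratum."""
    hosp_events = sum(s["hospital"][0] for s in strata.values())
    hosp_births = sum(s["hospital"][1] for s in strata.values())
    home_events = sum(s["home"][0] for s in strata.values())
    home_births = sum(s["home"][1] for s in strata.values())
    return (hosp_events / hosp_births) / (home_events / home_births)

def mantel_haenszel_risk_ratio(strata):
    """Hospital-versus-home risk ratio pooled across risk strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        hosp_events, hosp_births = s["hospital"]
        home_events, home_births = s["home"]
        total = hosp_births + home_births
        numerator += hosp_events * home_births / total
        denominator += home_events * hosp_births / total
    return numerator / denominator

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"Crude risk ratio: {crude:.2f}")
print(f"Stratified (Mantel-Haenszel) risk ratio: {adjusted:.2f}")
```

In this toy example the crude comparison suggests that hospital birth roughly doubles the risk of the outcome, while the stratified estimate shows no difference; the apparent effect arises entirely from the different mix of risk levels in the two groups. Stratification of this kind works only for characteristics that have actually been measured.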
As a result, adequate data are not currently available with which to make comparisons across birth settings for the rarest of events, such as maternal mortality. Such comparisons would require controlling for maternal risk factors, provider type, and planned place of birth, which is impossible given the very small number of cases reported in the literature (e.g., the literature has reported just one maternal death following a home birth). In addition, the overall low number of home and birth center births in the United States (less than 2 percent of all births), the fragmentation of datasets (for example, the PDR and MANA Stats each separately collect data on a percentage of home and birth center births, and these datasets have yet to be merged), and the infrequency of mortality mean that samples are commonly combined (intrapartum with neonatal mortality, or home with birth center births, for instance). This improves power, but reduces the ability to attribute specific outcomes to specific birth settings and distinct provider types while controlling for confounders. Even with the collapsing of categories, confidence intervals are still often quite wide and estimates unstable. These are significant considerations that limited the committee’s ability to draw nuanced comparisons of outcomes by birth setting in many instances. These challenges are discussed in more detail in the following chapter.
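The effect of sample size on the precision of a rare-outcome estimate can be sketched with a confidence interval calculation. The example uses the Wilson score interval, one common choice for proportions near zero, and is not a method taken from the studies discussed here. The event counts and sample sizes are hypothetical.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Approximate 95 percent Wilson score interval for a proportion."""
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same underlying rate (1 per 1,000) observed in a small and a large sample.
# Both samples are hypothetical.
small_low, small_high = wilson_interval(events=2, n=2_000)
large_low, large_high = wilson_interval(events=200, n=200_000)

print(f"Small sample: {small_low * 1000:.2f} to {small_high * 1000:.2f} per 1,000")
print(f"Large sample: {large_low * 1000:.2f} to {large_high * 1000:.2f} per 1,000")
```

Although both samples have the same observed rate, the interval from the small sample is many times wider, so two settings with quite different true rates could produce statistically indistinguishable estimates.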
Vital statistics data are informative, but the inability to track intended place of birth or the planning status of home births in California’s records limits their utility in birth settings research. And while birth registry data allow indication of intended birth setting, the reporting of these data is generally voluntary, and the data are not collected nationally or across all birth settings.
Finding 5-1: Vital statistics and birth registry data each have limitations for evaluating birth outcomes by setting, provider types, and intentionality.
The utility of evidence is maximized when it is appropriately matched to the needs of those who will be using it, which can vary depending on the context and the primary purpose of the user. High-quality information is needed by communities, community leaders, maternity care providers, funders, and policy makers at the local, state, and federal levels. Decision makers and researchers can be called upon to generate and use evidence to inform decisions that apply at different levels of a system and even across different systems. In maternity care, the system at issue may be the health care or public health or social service system, and the level may be national, regional, or local. Decision makers typically have a range of questions when attempting to address an important issue. In addition to information that describes the scope of the issue and research or evaluation evidence about the effectiveness of actions they might consider to address it, they frequently seek information with direct relevance to their local context. Such information may be related to implementation; costs; sustainability; specific characteristics of the community; and other factors that could impede or facilitate the success of an intervention, program, or policy (Institute of Medicine, 2010). Decision makers also need to contend with the interrelated nature of factors that affect the issue they hope to address. Evidence needs may vary accordingly, and an issue is typically best addressed when examined from multiple perspectives and with multiple forms of evidence.
Evidence that is useful to inform decisions about policies and practices related to maternity care may come from a variety of disciplines and sources of evidence using a diversity of methods. The following is one way of categorizing study designs and sources of evidence that can be useful for addressing questions relevant to maternity care research and decision making (Institute of Medicine, 2010):
- observational studies,
- qualitative research and analysis,
- mixed-methods studies,
- evidence synthesis methods,
- experimental studies, and
- quasi-experimental studies.
Across these categories, it is important to emphasize that each of these types of evidence has its own inherent quality standards, as detailed below. The value of any source as the “best” evidence is relative, depending on the decision-making context and the type of issue being addressed. It is counterproductive to expect any one type of evidence to be the best fit for all uses (Flay et al., 2005). To understand how an intervention works, for example, qualitative designs can be the most valuable and appropriate (MacKinnon, 2008). And to assess practitioner implementation or organizational adoption of a new practice, it can be useful to carry out longitudinal studies of quality improvement and of how organizational policies are implemented and enforced. The quality of any type of evidence synthesis hinges on consistency in identifying and appraising the quality of the research evidence included and in the care taken in secondary interpretation of findings across diverse studies.
Trade-offs are involved in considering the utility of any single type of evidence available to answer questions about such complex, multilevel public health issues as maternal and infant health. Typically, no one study or type of evidence is sufficient to support decisions, and therefore the use of multiple types of evidence is often the best approach (Mercer et al., 2007). For situations in which the evidence is inadequate, incomplete, and/or inconsistent, the best available evidence can be interpreted through processes that bring tacit knowledge and the experience of professionals and other stakeholders to bear (Institute of Medicine, 2010). Table 5-1 provides a summary of research designs, types, and strengths and weaknesses of five common study designs.
TABLE 5-1 Typology of Research Designs
|Evidence synthesis|Syntheses encompass meta-analyses that use statistical methods to pool results from a sample of existing experimental and/or quasi-experimental studies, and systematic reviews that provide organized summaries of a body of research addressing a focused question, whether using the same or multiple methodologies.|
|Experimental||
|Types|Randomized controlled trials (RCTs)|
|Strengths and Weaknesses|RCT studies have the strengths of being able to measure the effectiveness of interventions and minimize bias. However, feasibility issues, ethical constraints, and variables that cannot be controlled make this design inapplicable for many studies (Institute of Medicine, 2008). RCTs are less well suited to studying effects of large-scale social programs and policies (West et al., 2008).|
|Quasi-experimental|A design similar to experimental except the researcher cannot draw a causal conclusion because there is less than complete control over the variables in the study.|
|Types|Nonrandomized experimental studies (uncontrolled before-and-after studies, time-series designs, controlled before-and-after studies) (Grimshaw et al., 2000)|
|Strengths and Weaknesses|Quasi-experimental designs can be used when randomization is not feasible or ethical or when a sample size is too small for randomization. They have weaknesses similar to those of experimental designs, including regression effects (when the results overestimate or underestimate the effect of a treatment) and confounding^a (Goodwin, 2005; Harris et al., 2006).|
|Observational|The researcher assesses variables and relationships among them, but does not manipulate those variables or introduce any intervention. A large proportion of studies on birth settings have used study designs that fall into this category.|
|Types|Cross-sectional or longitudinal survey research, ethnographic studies, secondary analysis of existing databases, trend analysis, cohort and case-control studies, predictive studies, archival studies, census studies, monitoring and surveillance studies, ecological studies, implementation tracking, policy analysis|
|Strengths and Weaknesses|Observational studies have the benefit of allowing researchers to examine multiple outcomes. Weaknesses of observational studies include missing data, selection bias, confounding bias, and information bias (Boyko, 2013).|
|Qualitative|Qualitative researchers collect detailed information from individuals or groups and have the ability to use these responses to formulate grounded theory about a topic (Johnson and Onwuegbuzie, 2004).|
|Types|Ethnographic studies, focus group or key informant interviews, direct observation, content or documentary analysis, case studies, logic modeling or program theory analysis, process and implementation monitoring|
|Strengths and Weaknesses|Qualitative research designs allow researchers to assess the experiences and perceptions of respondents at a much richer level than is possible with quantitative methods. Questions and data-gathering techniques can be appropriately expanded or modified in response to what is learned as data are collected. A weakness of qualitative methods is that subjectivity can occur not only in the design and in the interpretation of the findings, but also in the role of the researcher as an instrument of the research process (Institute of Medicine, 2010).|
|Mixed methods|Mixed-methods approaches are suitable for interpreting information across the boundaries of quantitative assessments, such as the prevalence of a condition, the statistical significance of an effect, or cost considerations, and assessments of the nature, process, or meaning of a condition, intervention, or outcome (O’Cathain, 2009; Wisdom et al., 2012).|
|Types|Complementary methodologies, typically including both qualitative and quantitative approaches to data gathering and analysis|
|Strengths and Weaknesses|Mixed-methods approaches allow researchers to conduct more in-depth analyses because some of the weaknesses present in both qualitative and quantitative research can be compensated for. Weaknesses of mixed-methods approaches include added workload and the fact that using both methods will not completely erase the already known weaknesses found in qualitative and quantitative designs independently (Johnson and Onwuegbuzie, 2004).|
^a Confounding is defined as “distortion of the association between an exposure and an outcome due to the influence of another variable that is also associated with both” (Snowden et al., 2018, p. 724).
Observational studies are the most commonly used design in birth settings research because they allow researchers to examine multiple outcomes (Boyko, 2013). They include case-control, cohort, and cross-sectional studies, each with its own strengths and weaknesses. Case-control and retrospective cohort studies allow researchers to study events and characteristics that were present in the past; prospective cohort studies follow participants forward in time to study events as they occur; and cross-sectional studies measure exposures and outcomes at a single point in time. Cohort studies have the advantage of examining multiple outcomes and can be used to calculate rates of exposure, but they require a large sample size, which can be difficult when studying rare outcomes, and they are susceptible to selection bias. Existing records can be used to measure multiple outcomes for case-control studies, which makes them easier to conduct. However, they are subject to recall bias, are difficult to validate, and do not allow control of extraneous variables (Song and Chung, 2010). Finally, cross-sectional study designs can be used to compare outcomes between groups, but they cannot establish cause and effect and do not require groups to be equivalent (Goodwin, 2005; Song and Chung, 2010). For more in-depth information on birthing experiences, researchers utilize mixed-methods designs or qualitative research. Mixed methods allow researchers to obtain evidence after the implementation of policies or programs to measure the intended or unintended effects.
Randomization and experimental controls are rarely used when studying birth settings and maternal and neonatal outcomes because it would be difficult to find women who would agree to be randomized to one or another birth setting (Hendrix et al., 2009). In addition, many randomized controlled trials (RCTs) used to study effects of intrapartum practices are limited to assessing outcomes that can be measured during the intrapartum phase of care and provide little meaningful information about additional outcomes of interest after this period (Institute of Medicine, 2008). This point is made clear by outcomes measured in the larger, more impactful RCTs included in a Cochrane systematic review of studies of the effects of intrapartum care. Just 16 percent of those RCTs made any measure of the infant after hospital discharge (Teune et al., 2013). Randomization has the potential to reduce selection bias, but this bias cannot be eliminated completely (Jadad and Enkin, 2008). When a study is designed, it is important to look at the characteristics of the participants and also the characteristics of those who do not wish to participate or are not eligible. Those who choose to participate may not be representative of the target population and thus may have an impact on the validity of the findings (Carmichael and Snowden, 2019).
Moreover, recent discourse on RCTs (and most prospective research), particularly in the health field, has noted the historic underrepresentation of people of color and people of low socioeconomic status in health care trials. Although there is an imperative to increase enrollment of women from these groups, it is also important to acknowledge that medical mistrust, mistrust of research, and access to engagement in care remain barriers that disproportionately impede participation of low-income people and people of color in prospective research (Geller et al., 2011).
Decision makers are often faced with an abundance of data and information that they must sift through to obtain the knowledge they seek. This process requires consistency in how the quality of available evidence is assessed. Although it can appear simplest to use a hierarchy among the types of evidence, in reality no single gold standard of evidence can be used to answer all types of questions. Rather, what constitutes the “best” evidence depends on the question being asked, on how well the evidence aligns with the user’s needs and interests, and on how relevant it is to the question at hand and its context. The quality of the evidence also needs to be judged according to established criteria and standards that are appropriate to the type of evidence. Each of the different sources of evidence described above can be linked to appropriate criteria for judging the quality of evidence, which can be found in the literature, although in all cases, high-quality evidence avoids bias, confounding, measurement error, and other threats to validity whenever possible (Institute of Medicine, 2010; Mercer et al., 2007). This section describes two common tools used for grading the quality of evidence in the birth settings literature: the Birth Place Research Quality (ResQu) Index and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach. To facilitate the discussion of health outcomes by birth setting that appears in Chapter 6, the committee identified relevant literature and, using the ResQu Index and the GRADE approach, assessed the quality rating of each identified article. Chapter 6 summarizes the most recent literature available on outcomes by birth settings.
The committee chose to use both GRADE and the ResQu Index because, while GRADE is ubiquitous and thus a highly accessible way of evaluating study quality, we also felt it was advisable to apply a scoring system that had been specifically designed for birth settings research, given the nature of our statement of task. Interestingly, this approach enabled us to see the impact on quality ratings when models indicating intended birth setting are weighted more heavily (ResQu Index) than RCT design (GRADE). Taken together, these tools enabled us to see the relative strengths and weaknesses of each study in a more nuanced way.
Recognizing varying standards for assessing the quality of study designs used to investigate outcomes by birth setting, the limitations of existing datasets for examining these outcomes, and the challenges of comparing outcomes across states or regions, an international panel of experts developed and validated the ResQu Index (Vedam et al., 2017). The ResQu Index allows researchers to evaluate the strength of studies related to birth settings and accounts for items that are critical to birth settings research but less relevant to other epidemiological studies, such as use of models indicating intended birth setting. This tool is used to assess study rigor based on 27 criteria specific to evaluating the effects of birth setting on maternal, fetal, and neonatal outcomes. These criteria are grouped into five categories: (1) quality of design, (2) definition of sample, (3) measurement of outcomes, (4) comparability of cohorts, and (5) accuracy of interpretation and reporting.² Each criterion is scored, with a higher value indicating higher quality, and a summary score is then calculated to rate the quality of a study as high (scores of 75% and above), moderate (65–74%), or weak/low (less than 65%). Strengths of the ResQu Index include permitting researchers to exclude criteria rarely applicable to birth settings research, such as randomization and blinding, and facilitating collaboration among evaluators. In addition, comparisons of the interrater consistency and consensus of scores from different scorers showed considerable similarity in rating.
² Each category contains subcategories that are scored with a rubric based on the information provided within the article. The criteria for quality of design include the following: provides a clear statement of the research question/objective; defines and describes each birth setting clearly (e.g., provider, facilities, location); indicates type of study design; defines key terms (e.g., low risk, outcome, mortality, morbidity, postpartum hemorrhage [PPH]) consistently, transparently, and appropriately (e.g., National Institute for Health and Care Excellence [NICE], American College of Obstetricians and Gynecologists [ACOG], or country-specific guidelines); and indicates ethics approval. The criteria for definition of sample (if relevant) include the following: distinguishes between planned home births with skilled attendants and free or unplanned births, includes sample size calculation, and uses reliable method of sampling and recruitment for each cohort. The criteria for measurement of outcomes include the following: gathers outcome data from reliable sources (e.g., medical records, registration data); identifies planned birth setting at time in pregnancy that is appropriate to selected outcome measures; provider type (for birth) is indicated, measured, and adjusted for in analysis; uses cohort size with appropriate power for selected outcomes being measured; uses reliable method to indicate changes of birth setting; indicates timing of transfer between birth settings in labor or postpartum period; applies reliable and appropriate statistical methods to compare outcomes between cohorts; and reports and minimizes missing data. The criteria for comparability of cohorts include the following: uses cohorts with comparable obstetric and sociodemographic characteristics; retains women in original birth setting cohort for analysis (intention to treat); provides consistent inclusion criteria; and controls for confounders, including sociodemographic and health profile. The criteria for accuracy of interpretation and reporting include the following: presents results of statistical comparisons clearly and effectively; bases discussion and conclusions on reported data; addresses impact of size of cohorts for each outcome measured; addresses impact of incomplete data; addresses impact of retrospective data; addresses effect of level of service integration between home, birth center, and hospital; and addresses impact of local/regional standards, policies, and protocols (Vedam et al., 2017).
Development of quality assessment instruments inevitably involves subjective assessment, particularly in the selection and weighting of features. While careful attention was paid to multidisciplinary input and attempts were made to maximize content validity and consistency for this index, the relative emphasis on various study qualities is debatable; differential weighting would produce different scores. Application of the ResQu Index is, as with all other scales, subject to some individual interpretation. In addition, the ResQu Index was not designed to evaluate qualitative studies. The nature of its design in fact means that no qualitative study could be rated highly, even though the developers themselves acknowledge that questions about safety and maternal experience by birth setting cannot and should not be answered by quantitative or statistical approaches alone (Vedam et al., 2017).
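The way a ResQu summary score maps to a quality rating can be illustrated with a short sketch. The thresholds (75% and above, 65–74%, less than 65%) come from the description of the index in this section; everything else is an assumption made for illustration, including the function name, the criterion labels, the point values, the calculation of the summary score as points earned over points possible among applicable criteria, and the treatment of fractional percentages at the boundaries. The published rubric (Vedam et al., 2017) should be consulted for actual scoring.

```python
def resqu_rating(criterion_scores):
    """Return (percent, rating) for a study.

    criterion_scores maps a criterion label to (points_awarded, max_points),
    or to None when the criterion is excluded as not applicable.
    The point scheme here is hypothetical, not the published rubric.
    """
    applicable = [s for s in criterion_scores.values() if s is not None]
    if not applicable:
        raise ValueError("at least one applicable criterion is required")
    earned = sum(points for points, _ in applicable)
    possible = sum(maximum for _, maximum in applicable)
    percent = 100 * earned / possible
    # Thresholds as described in the text; boundary handling is an assumption.
    if percent >= 75:
        rating = "high"
    elif percent >= 65:
        rating = "moderate"
    else:
        rating = "weak/low"
    return percent, rating


# Hypothetical study scored on a handful of illustrative criteria.
example = {
    "clear research question": (2, 2),
    "intended birth setting identified": (3, 4),
    "controls for confounders": (2, 4),
    "randomization": None,  # excluded as rarely applicable
}
print(resqu_rating(example))
```

Because excluded criteria drop out of both the numerator and the denominator in this sketch, a study is not penalized for lacking randomization or blinding. Changing the hypothetical maximum points for any criterion changes the percentage, which is the sensitivity to weighting noted above.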
GRADE provides guidelines for grading the quality of evidence in health care literature. The grading process entails assessing the design, limitations, inconsistency, indirectness, imprecision, and publication bias of a study. The reviewer then summarizes the quality rating for each outcome and the estimate of the effect (e.g., risk ratio, odds ratio, or hazard ratio). The confidence in an estimate of the effect is used when the reviewer decides if the recommendations from the study can be supported based on the provided evidence (Guyatt et al., 2008; Meader et al., 2014). An important limitation to using the GRADE approach when reviewing birth settings research is that observational studies start as low-quality evidence because they are deemed to fail to develop and apply appropriate eligibility criteria, to have flawed measurement of both exposure and outcome, to fail to adequately control confounding, and to have incomplete follow-up (Schünemann, 2013).
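The consequence of GRADE's starting point for observational evidence can be seen in a simplified sketch of the rating logic. It assumes the standard four GRADE levels (high, moderate, low, very low), with RCTs starting at high and observational studies at low, and downgrades of one or two levels in each of the five domains named above. The function and variable names are hypothetical, and the sketch omits the circumstances in which GRADE allows observational evidence to be rated up, so it should not be read as a full implementation of the approach.

```python
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = {"limitations", "inconsistency", "indirectness",
           "imprecision", "publication bias"}
START = {"rct": 3, "observational": 1}  # index into LEVELS


def grade_certainty(design, downgrades):
    """Return the GRADE level for one outcome.

    design     -- "rct" or "observational"
    downgrades -- dict mapping a domain name to 0, 1, or 2 levels
    """
    if design not in START:
        raise ValueError(f"unknown design: {design}")
    for domain, levels in downgrades.items():
        if domain not in DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid downgrade: {domain}={levels}")
    index = START[design] - sum(downgrades.values())
    return LEVELS[max(index, 0)]  # cannot fall below "very low"


# Identical concerns lead to different ratings depending on design.
concerns = {"imprecision": 1}
print(grade_certainty("rct", concerns))            # starts at high
print(grade_certainty("observational", concerns))  # starts at low
```

In this sketch, an observational study with no identified concerns is still rated low, while an RCT with one serious concern is rated moderate, which illustrates why nearly all birth settings research begins at a disadvantage under GRADE.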
Data and methodological limitations make the study of birth settings challenging. Decision makers need research and evidence that appropriately match their needs, but these needs vary greatly by context, and not all evidence is fit-for-purpose. Vital statistics and birth registry data each have limitations for evaluating birth outcomes by setting, provider types, and intentionality (Finding 5-1).
Modifications to birth certificate records, such as those adopted by Oregon, to include intended birth setting are important for improving the usefulness of these data for birth settings research. The quality of birth settings research could be greatly improved if all states were to fully adopt this revised and updated version without modification. Additional improvements would include designation of planned attended and planned unassisted home births, and identification of the type of physician or midwife attending the birth.
CONCLUSION 5-1: Modifications to the birth certificate that allow inquiry into birth settings based on models indicating intended birth setting, including planned attended and planned unassisted home births in the United States and intended birth attendants, and development of best practices for use of these expanded data in birth settings research are needed to better understand and assess outcomes by birth settings.
Such changes would allow states to track the rates and associated outcomes of unassisted home births, which have been estimated to be on the rise, particularly in areas where vaginal birth after cesarean, vaginal breech, and vaginal twin births are not available in hospital settings (Holten and de Miranda, 2016). While the committee acknowledges that implementation of changes to birth certificate records is often slow, as evidenced by the prolonged time period required to see general use of the 2003 modification of the birth certificate, these changes are imperative to allow the conduct of accurate analyses by birthplace. Adoption of these changes at the state level may speed implementation; however, a piecemeal state-by-state approach is unlikely to yield needed, nationally comparable data.
Recognizing the strengths and limitations of the data and methodologies for birth settings research, we turn in the next chapter to examining what the current evidence base can reveal about health outcomes for women and infants by birth setting.