This chapter describes and critiques the proposed study design for the National Children’s Study (NCS), including the conceptual framework and its evolution from a hypothesis-driven and disease-specific approach to a data collection platform with a focus on health and development. It also describes overarching issues that drive the sample design, including two of the key topics the panel was asked to consider: the national probability sample’s overall sample size and design and the relative size of the prenatal and birth strata in the probability sample.
The chapter also provides the panel’s analysis of a third key topic the panel was asked to consider: the proposed uses of supplemental convenience samples to enroll nulliparous women for preconception data collection and to enroll additional populations to address targeted research questions. In this case, we considered the potential value of studying these populations against the overall cost, size, and scope of the probability sample.
Finally, this and subsequent chapters discuss a fourth key topic: strategies for the NCS to address health disparities in children effectively, a charge in the Children’s Health Act of 2000.
Although nationally representative birth cohort studies have a more than 50-year history, any study like the NCS that aspires to gather exposure and health data over a 21-year period faces difficult design decisions. Many of those decisions have to account for the rapidly changing nature of the relevant sciences. New environmental dangers are discovered, as are new methods for assaying biological samples and characterizing childhood health conditions. Some of the hypotheses thought to be of greatest interest at the beginning of a study may bear little resemblance to hypotheses that emerge over the course of the study’s span of more than two decades. Furthermore, given an ambitious sample size and the likely billions of dollars of total cost, possible budget constraints are also an important concern for the NCS.
Architects of the current study plan for the NCS (Guttmacher et al., 2013) envision it as an ongoing data platform that will support a broad range of scientific discoveries related to the determinants of child health, growth, and development. Other scientific endeavors have provided such data platforms, with recent examples in genetics (the Human Genome Project), astronomy (the Hubble Space Telescope), and particle physics (the Large Hadron Collider). In the social and behavioral sciences, the content and open data policies of long-running national health studies, such as the National Longitudinal Study of Adolescent Health1 and the Health and Retirement Study,2 have facilitated discoveries by many social and behavioral scientists and clinicians.
A series of publicly available national-level birth cohort studies, beginning with the 1958 National Child Development Study in Britain, has also supported hundreds of research studies (e.g., Lawlor et al., 2009; Vrijheid et al., 2012). Design lessons for the NCS are also provided by birth cohort studies in the United States, such as the Fragile Families and Child Wellbeing Study3 and the Department of Education’s Early Childhood Longitudinal Study birth cohort sample,4 as well as studies based on “convenience” rather than representative samples of birth cohorts and longitudinal studies in other countries (see Golding, 2008).
In considering its assigned topics regarding the NCS study design, the panel found it useful to draw from many of these ongoing studies to delineate design principles and guidelines that would optimize the scientific value of a longitudinal birth cohort study of child health and development such as the NCS (e.g., Golding, 2008; Olsen, 2012). Many of these principles mirror key elements of the current NCS study design (detailed in Guttmacher et al., 2013). This chapter lists these design principles and discusses their implications for the NCS study design. It then discusses design features, supplemental samples, and health disparities.
3For a description, see http://www.fragilefamilies.princeton.edu/about.asp [March 2014].
We begin our discussion with the principles that bear directly on key elements of the design, sample, and content of the NCS:
- A scientific framework that encompasses current and anticipates future domains of high-priority scientific inquiry is needed to guide key study design elements, such as the target population, the sampling strategy, and the schedule and content of data collection.
- Scientifically robust exemplar hypotheses are needed to guide sample design and early-wave data collection, while decisions about data to be collected in later waves should leave room to take into account new hypotheses that emerge over the course of the study.
- A probability sample ensures that results generalize to the population from which the sample is drawn.
- A large stratified national sample in which all children have an approximately equal chance of selection supports multiple goals. For the NCS, these include estimating relationships between exposures and health outcomes, analyzing health disparities, and attaining representation of children in key demographic and geographic subgroups roughly in proportion to their representation in the population.
- As large a sample size as possible within budget constraints is needed to provide statistical power for current and future scientific discoveries.
- Scientific quality is enhanced by using the most valid and standardized data collection methods that are feasible, while maintaining sufficient flexibility to assess emerging domains of scientific inquiry.
- The study design needs to be as cost effective and as efficient for its key purposes as possible.
- Scientific discovery is enhanced when the potential for future innovations in measurement is incorporated into the study. In the case of the NCS, this argues for collecting and storing biological and environmental samples in ways that make them available for future investigations.
- Discoveries related to health conditions are facilitated by a dynamic conception of health and disease, which calls for measuring health status, disease conditions, symptoms, and behaviors rather than just existing disease categories.
- Discovery is facilitated if data are released as early and as completely as possible, with due regard for the protection of confidentiality.
- Transdisciplinary discovery and statistical sophistication are enhanced when all relevant scientific expertise is integrated into the project management structure.
Key design features can be derived from the principles above. This section covers the first nine above; the last two, on data release and project management, are covered in Chapters 4 and 6, respectively.
Current understanding of the determinants of children’s health and development (e.g., National Research Council and Institute of Medicine, 2000, 2004) and an informed consideration of the likely future trajectory of scientific discovery need to underlie the study. A relatively strong consensus has developed during the past decade about key scientific issues in the field (e.g., Cohen Hubal et al., 2013; Landrigan and Miodovnik, 2011). First, child health, growth, and development is a product of biological factors and a diverse set of environmental influences, including intrauterine and social influences, which implies that high-quality measures of multiple dimensions of both sets of influences need to be taken during appropriate developmental periods.
Second, emerging research on the early origins of future health points to the importance of early postnatal, prenatal, and even preconception conditions, which implies concentrating data collection in the early years relative to the later years of the study. These paradigms of developmental biology and life-course epidemiology, coupled with insights from a number of social and behavioral sciences, should guide development of the NCS study design.
Public health goals are an essential first step in framing hypotheses for a major study. Although the NCS does not state its public health goals explicitly, the documentation received by the panel shows that the study is informed by a public health perspective. First, the approach to identifying the specific domains of health and development (NICHD, 2013b, pp. 29-31, 2013d, pp. 26-29) identifies a large number of highly relevant developmental stages, symptoms, and conditions that would include most problems of public health significance. Second, there is a long list of environmental exposures, chemical and socioeconomic, that, if associated with child health and developmental problems, would clearly signal potential for prevention. Third, public health significance is the first criterion by which data collection will be prioritized (NICHD, 2013d, pp. 33, 46-47). Finally, as the NCS notes, the phenotype/health profile approach, if realized, would facilitate examining conditions over time even as coding and diagnostic practices change. For example, with the shift from the DSM-IV to the DSM-5, asking whether a child has autism next year would give a different prevalence than it did 3 years ago; with a recorded set of symptoms and treatments, however, one could define the condition according to whatever diagnostic rubric was in use.
Another important context for the scientific framework is recognition of the stark nature of persistent differences in health between disadvantaged and more advantaged groups. This issue was acknowledged by Congress, which mandated in the 2000 Children’s Health Act that the NCS be designed to “consider health disparities among children.” This issue is addressed in more detail below and in Chapter 3.
RECOMMENDATION 2-1: The scientific framework for the National Children’s Study should be based on current understanding of the determinants of children’s health and development and an informed consideration of the likely future trajectory of scientific discovery. The paradigms of developmental biology and life-course epidemiology, coupled with findings from other social and behavioral sciences research on the prenatal and early life periods, should guide development of the design for the Main Study.
The process of using a scientific framework and context to guide decisions regarding the NCS study design, sampling frame, and data collection protocols can be facilitated by identification of scientifically robust exemplar hypotheses. In the case of the NCS, this means hypotheses that encompass current and anticipate future scientific inquiry concerning high-priority environmental factors and child health and development outcomes, while accounting for potential confounding and effect modification. It also means a need to assess nonpersistent environmental exposures that could occur during periods of developmental plasticity and vulnerability with effects manifested at later stages of development. The NCS Program Office provided exemplar hypotheses in some documents (e.g., NICHD, 2013d, pp. 45-46).
Although some in the scientific community have argued that hypotheses must be specified in advance of undertaking a research project to ensure scientific integrity (e.g., Paneth, 2013), the panel recognizes that it is not possible to anticipate all possible scientific hypotheses that could be addressed over the time span of a long-term study. Yet exemplar hypotheses are critical for guiding sample design and early waves of data collection, while later waves of data collection can also be guided by hypotheses that emerge over the course of the study. Using exemplar hypotheses to guide study design development rather than attempting to develop an exhaustive list of detailed hypotheses is also consistent with the concept of the NCS as a study platform insofar as the study should not be designed only to address specific hypotheses of current interest. Although the number of exemplar hypotheses does not have to be extensive, each must be scientifically robust in order to guide development of the study design. (These issues are addressed in more detail in Chapter 4.)
National Equal Probability of Selection Design and a Stratified Sample
Optimal sample designs are a product of a host of potentially conflicting priorities (Michael and O’Muircheartaigh, 2008). The objective of studying universal in-the-body biological processes may place minimal constraints on sample selection because the mechanisms may be essentially common to all humans. In contrast, estimating associations between children’s health and their cumulative exposures to various physical or social conditions requires samples that provide considerable variation in and covariation among the outcomes and exposures of interest. Furthermore, evaluating health disparities across various population subgroups and describing the prevalence of exposures and health conditions in the population argue for a probability sample that can be statistically weighted to represent the population from which it is drawn. This goal led the NCS Program Office to opt for a national probability sample in the current design, an element that received strong endorsement from the previous study review (National Research Council and Institute of Medicine, 2008) and which we also endorse.
A key design dimension of probability samples is whether all individuals in the population are sampled at similar rates, as they are in the current NCS design. A large stratified equal probability sample can help ensure that important population subgroups will be represented in the sample in roughly the same proportion as they are in the general population. This holds for any major subgroup treated as a stratum in the sampling design. An advantage of this design is that it produces nearly optimal precision for estimating means or proportions for the population as a whole and adequate precision for important population subgroups, at least at the start of data collection. In addition, because the premise of the currently proposed design is that the population subgroups of interest for the NCS are unknown and cannot be predicted, the fallback position is to avoid oversampling any one subgroup, since doing so would necessarily reduce the sample size for some other subgroup. The analysis of new exposures and subpopulations of interest that arise over the course of the study requires a design that provides variation in both known and not-yet-discovered exposures, which is most likely with the current design’s proposed equal probability of selection national probability sample.
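The trade-off between equal and unequal selection rates can be illustrated with a short calculation. The following is a minimal sketch and not part of the NCS design documents: the function name, the three subgroup labels, and their population shares are hypothetical, chosen only to show how equal selection rates yield proportional representation and how oversampling one subgroup within a fixed total necessarily shrinks the others.

```python
def expected_subgroup_sizes(total_sample, population_shares, relative_rates=None):
    """Expected sample count per subgroup for a fixed total sample size.

    population_shares: subgroup -> share of the target population (sums to 1).
    relative_rates: subgroup -> relative sampling rate; None means equal rates.
    """
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    if relative_rates is None:
        relative_rates = {group: 1.0 for group in population_shares}
    # A subgroup's expected share of the sample is proportional to
    # (population share) x (relative sampling rate).
    weights = {g: population_shares[g] * relative_rates[g] for g in population_shares}
    scale = total_sample / sum(weights.values())
    return {g: weights[g] * scale for g in weights}


# Hypothetical shares for three illustrative subgroups (not NCS figures).
shares = {"A": 0.60, "B": 0.30, "C": 0.10}

# Equal probability of selection: sample composition mirrors the population.
equal = expected_subgroup_sizes(90_000, shares)

# Doubling the sampling rate for subgroup C within the same total sample.
oversampled = expected_subgroup_sizes(
    90_000, shares, {"A": 1.0, "B": 1.0, "C": 2.0}
)

for group in shares:
    print(group, round(equal[group]), round(oversampled[group]))
```

Under these assumed shares, the oversampled design enlarges subgroup C only by drawing fewer children from A and B, which is the harm the equal probability design is intended to avoid when the subgroups of future interest are not yet known.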
A possible rationale for unequal selection probabilities arises from the NCS’s charge to address health disparities, which by definition involve disadvantaged population subgroups. But, as discussed below and in Chapter 3, we believe that the large size of the main NCS sample, along with careful stratification, will provide sufficient statistical power to investigate health disparities across the major domains and categories of interest (such as race and ethnicity and socioeconomic status).5
Size of the Sample
Determining the optimal sample size is one of the most critical decisions for any large scale epidemiological or population study. As noted above, this decision should be made in the context of the scientific framework and facilitated by the use of exemplar hypotheses that can be used to develop estimates of minimum detectable effect sizes and statistical power. The current and anticipated future scientific issues related to children’s health and development to be addressed by the NCS involve complex etiologies with covariation and interaction between multiple factors that can vary over time. Some exposures and health outcomes of great importance may be relatively rare, while others may be common but complex. In order to address the range and complexity of issues that potentially should be addressed by the NCS, the study sample size should be as large as possible to provide statistical power for future scientific discoveries.
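The link between sample size and minimum detectable effect size can be sketched numerically. The example below is an illustration under stated assumptions, not an NCS calculation: it uses a standard normal approximation for comparing an outcome's prevalence between exposed and unexposed children, and the baseline prevalence, exposed share, and design effect are hypothetical inputs chosen only to show how the detectable difference shrinks as the sample grows.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(n_total, exposed_share, baseline_rate,
                                  alpha=0.05, power=0.80, design_effect=1.0):
    """Approximate smallest absolute difference in outcome prevalence between
    exposed and unexposed groups detectable with a two-sided test.

    Uses the normal approximation with the baseline rate as the variance
    term in both groups; design_effect deflates the nominal sample size to
    account for clustering and unequal weights.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n_effective = n_total / design_effect
    n_exposed = n_effective * exposed_share
    n_unexposed = n_effective * (1 - exposed_share)
    std_error = sqrt(baseline_rate * (1 - baseline_rate)
                     * (1 / n_exposed + 1 / n_unexposed))
    return (z_alpha + z_power) * std_error


# Hypothetical inputs: 1% baseline prevalence, 5% of children exposed.
mdd_full = minimum_detectable_difference(90_000, 0.05, 0.01)
mdd_half = minimum_detectable_difference(45_000, 0.05, 0.01)
mdd_clustered = minimum_detectable_difference(90_000, 0.05, 0.01,
                                              design_effect=2.0)

print(f"n=90,000: {mdd_full:.4f}")
print(f"n=45,000: {mdd_half:.4f}")
print(f"n=90,000, design effect 2: {mdd_clustered:.4f}")
```

In this approximation the detectable difference scales with the inverse square root of the effective sample size, so halving the sample (or doubling the design effect) inflates it by a factor of about 1.41. For rare exposures or outcomes, that scaling is the reason a larger sample translates directly into the ability to detect smaller effects.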
Relative Value of Preconception, Prenatal, and Postnatal Data
A substantial literature, documented in the justification submitted by the NCS Program Office to the earlier review (National Research Council and Institute of Medicine, 2008), identifies prenatal infection, psychosocial factors, and environmental exposures as major contributors to child health that are in need of further study. The state of the art in longitudinal birth cohort studies is to begin data collection in early pregnancy: starting later risks bias from retrospective recall and an inability to measure transient, nonpersistent prenatal exposures from environmental or biological samples collected after birth. It also limits the ability to evaluate the role of the intrauterine environment as determined by various obstetrical conditions, such as hypertension, diabetes, or fetal growth restriction. The current NCS Program Office design is to enroll roughly half of its sample prenatally and half at birth.
Several major national birth cohort studies have enrolled women during the prenatal period and collected biological specimens at multiple points during pregnancy, including Generation R, a large population-based Dutch cohort (Jaddoe et al., 2012); the Norwegian Mother and Child Cohort Study, which enrolled 108,500 children (Magnus et al., 2006); the Danish National Birth Cohort, which enrolled 100,000 families (Olsen et al., 2001); and the Japan Environment and Children’s Study.6 Multiple reviews of these studies have emphasized the value of prenatal specimen and data collection (e.g., Landrigan et al., 2006). The original NCS strategy of attempting to enroll families during the prenatal and possibly preconception periods was considered to be a strength by the earlier panel (National Research Council and Institute of Medicine, 2008).
5Adjustment of sampling weights to reflect nonresponse and a host of more technical sampling issues will mean that after 21 years, sample members will not have equal weights. (See Chapter 3 for a more detailed discussion.) The most noteworthy example comes from the NCS’s plan to include siblings in the probability sample, which roughly doubles the selection probabilities of second and later children born to the same woman during the birth window.
Despite scientific consensus on the importance of beginning data collection during the prenatal period, the NCS Program Office did not provide a scientific rationale or a convincing financial argument to support the change in design to enroll fully one-half of the probability sample at birth. Furthermore, the panel found problematic the proposed approach of collecting environmental information through retrospective recall in interviews and assuming that samples collected during the immediate postnatal period can be used to characterize the pregnancy environment. (These issues are discussed in Chapter 4.)
The NCS Program Office based its decision to split the cohort enrollment into prenatal and birth strata on the experiences of the Vanguard sites and the large incremental costs associated with prenatal recruitment, enrollment, and data collection. Documents provided by the NCS Program Office stated that each prenatal enrollment and associated data collection would cost an additional $10,000 per recruited woman relative to a birth enrollment (NICHD, 2013b, p. 25). When asked to provide more information on this estimate, however, the NCS Program Office responded that the estimate was incorrect, but it did not provide a correction. Nor, in response to the panel’s request, did the NCS Program Office provide sufficient justification for selecting this particular cost-saving strategy (splitting into prenatal and birth cohorts) over other possible strategies. (Chapter 5 discusses the fielding cost implications of several alternative designs that would increase the size of the prenatal sample without significantly affecting the total budget.)
Considering the scientific framework and goal of treating the study as a platform for future research, the NCS should attempt to enroll as many participants as feasible during the prenatal period and to collect prenatal data as well as biological and environmental specimens from them. Enrollment at the time of birth should be limited to women who do not receive prenatal care or otherwise do not have a chance of selection through prenatal providers.
RECOMMENDATION 2-2: In order to facilitate scientific discovery during and after National Children’s Study data are gathered, the Main Study should use a national probability sample with the largest feasible sample size and an approximately equal probability of selection design, and it should recruit nearly all of the cohort as early in pregnancy as possible.
Quality, Standardization, and Flexibility in Data Collection Methods
A very large birth cohort study such as the NCS provides a unique opportunity to validate or confirm findings reported from smaller and more focused epidemiological studies, as well as to pool data with other large epidemiological studies to identify determinants of very rare conditions (such as specific childhood cancers). The NCS Program Office has engaged with the World Health Organization and investigators leading other national birth cohort studies in an effort to harmonize or coordinate the use of data collection methods to facilitate future pooling of data. To realize the potential of these opportunities, the NCS must use high-quality, well-validated, and standardized study methods and instruments. At the same time, the study will need to incorporate strategies to develop and validate new data collection methods to be able to address future domains of scientific inquiry. It may also be necessary to revise and shorten standardized instruments to reduce overall respondent burden while measuring many domains. Finally, the NCS needs sophisticated and well-developed computerized systems for data collection and management.
Quality control is key to the success of the NCS, and adherence to protocols is critical. In the early Vanguard Study, the NCS used a contractor to assure conformity in the training of interviewers, the instruments, and other aspects of data collection. The NCS stated that there is also a plan to engage an independent quality control contractor as part of the Main Study (NICHD, 2013d, p. 55), and expanded briefly on that in NICHD (2013f, p. 5), saying that its experience is that “an independent assessment can improve quality, even with extensive quality control built into the process.” In NICHD (2013d, p. 40), the NCS also acknowledges the importance of high-quality training for field staff.
Cost effectiveness and efficiency are also key metrics of a high-quality study. Little or no cost information relevant to the Vanguard Study was provided to the panel. However, the cost information for the multiple components of the Vanguard Study may not be directly relevant to evaluating future costs because the Vanguard Study spent significant time and resources to conduct large-scale pilot testing of multiple recruitment strategies and data collection protocols. The Vanguard Study has yielded or should yield relevant information for designing the Main Study, although some additional pilot testing may be needed to address gaps in information. The panel concluded generally that the NCS should not have to undertake complete pilot testing of the recruitment strategies and data collection protocol on the scale of the prior recruitment pilots. Any additional pilot testing needed to address gaps should be designed to obtain the specific information required using the most cost effective approaches.
In its commendable plans to coordinate data collected in the NCS and other birth cohort studies, the NCS Program Office will benefit from guidance from its advisory committees, other governmental groups, and the scientific community on reaching the optimal balance between widely used standardized instruments and new or revised instruments. The NCS will also use the Vanguard Study to develop and validate data collection methods and instruments prior to their use in the Main Study.
An important strategy to maintain flexibility to address future scientific issues and to anticipate future innovations in measurement is for the NCS to collect and archive biological and environmental samples in ways that make them available for future investigations. The methods used to collect, process, and store samples should maximize the potential future use of analytical approaches (e.g., proteomics, metabolomics, genomics, transcriptomics), particularly when considering that some of these samples may be stored for decades. Archiving samples is also essential to reduce overall study costs, since future analyses can use nested case-control or case-cohort designs to limit the total number of samples that would have to be analyzed. Current NCS plans are compatible with this important design feature.
Essential in any data collection strategy is a robust system for tracking the information gathered, rapid coding of assessments to facilitate dissemination, and appropriate mechanisms for preserving and archiving biological and environmental samples for future use. To the extent feasible, the NCS should build on robust data management systems that have been developed for other large data collection efforts.
RECOMMENDATION 2-3: In order to facilitate scientific discovery during and after National Children’s Study (NCS) data are gathered, the Main Study should use valid and standardized data collection measures and methods, while maintaining flexibility to revise or develop new instruments. The NCS should also use state-of-the-art procedures to collect, archive, and provide access to biological and environmental specimens for future analyses.
Dynamic Conceptual Framework
The earlier report (National Research Council and Institute of Medicine, 2008) expressed concern that there was no apparent overarching conceptual framework for health and development to tie the study together. In response, the current plan describes a detailed conceptualization of health and development. The breadth of the conceptualization would encompass most of the issues affecting child health and development and provide many dimensions that could be linked to environmental exposures, which should facilitate scientific discovery. This issue is discussed in more detail in Chapter 4.
A key component of the new conceptualization is that rather than measuring specific diagnoses or syndromes, the Program Office plans to collect more detailed data on health status to allow future researchers more flexibility in defining health and disease phenotypes. The panel agrees that the flexibility to use data to generate a variety of phenotypes, rather than focusing on specific diagnoses, seems promising. However, as described in Chapter 4, the panel was not able to judge the overall merits of these new approaches because important details on the operationalization and effectiveness of these new approaches were not provided by the NCS Program Office.
RECOMMENDATION 2-4: The proposed strategy for the National Children’s Study Main Study to collect detailed data on children’s health status, conditions, symptoms, and behaviors should be followed to the extent possible, taking into account constraints of costs, operational feasibility, and the need to not overburden respondents.
Given the scientific value of the largest possible national probability sample, the panel carefully considered NCS’s current plan to allocate a portion of its total sample to supplemental or “convenience” samples.7 Specifically, the NCS Program Office proposes a probability sample of 90,000 births and supplemental samples of 10,000 births and seeks advice about the optimal composition of the 10,000. The plans as of October 25, 2013, were as follows (NICHD, 2013d, pp. 64-65):
The only certain use of the [supplemental/convenience] sample is to enroll a cohort of preconception women enriched for those who are nulliparous to perform a preconception data collection visit and with the intent of scheduling a data collection visit as early in pregnancy as feasible during the first trimester. In the current proposal about 5,000 of a projected 10,000 births would be reserved for the preconception cohort. The use of the remaining 5,000 would not be defined until the Primary Sampling Units in the national probability sample are identified and characterized. Some part of the sample could be used for specific exposures that are of high scientific interest and public health value that were not included in the national probability sample. For example if none of the locations were located in an area that had fracking and there was sufficient interest and a scientific need based on a survey of other research efforts to collect data on possible exposures that occur near fracking sites, a location near a hospital with birthing services could be identified as a supplemental recruitment center.
7We prefer the term “supplemental” to “convenience” since some of the proposed samples could be drawn using probability sampling methods.
Other uses of the remaining 5,000 mentioned in the NCS briefing documents include subpopulations likely to experience disparities in health outcomes and not adequately represented in the 90,000-birth main sample; populations exposed to natural disasters, such as hurricanes or industrial accidents; and siblings born to mothers of enrolled children whose birth date occurs after the study’s 4-year recruitment period. We consider each of these possible uses of the supplemental sample in turn.
Preconception Sample for First Births
Given the emerging scientific importance of prenatal and even preconception conditions for later health and development, the potential value of NCS information on preconception exposures could be quite high. The NCS Program Office proposes that the main probability sample include both its targeted births and roughly 8,000 siblings of the targeted children born later during the 4-year birth window (see details in Chapter 3). Exposure information gathered before and after the birth of the target child provides preconception exposure data on the subsequent sibling birth. The panel strongly endorses this proposed sibling component of the main sample, in part because of the value of the preconception exposure data it will provide.
A preconception sample for first births is potentially valuable since preconception exposure information on first births cannot be gathered in the subsequent sibling portion of the main sample. According to the NCS Program Office, women at risk of becoming pregnant for the first time would likely be enrolled in the NCS supplemental sample through the recruitment of health care providers that offer health care services to nulliparous women. Working through these providers, the NCS would draw a convenience sample of the women most likely to become pregnant. These women would have an initial screening and home visit, during which environmental samples would be taken. Women would be followed at 3-month intervals by telephone:8 if a woman becomes pregnant, she would be followed, using the prenatal and postnatal protocols in the Main Study. The Program Office believes that about 20,000 women would need to be recruited in this way to generate 5,000 first births.
In addition to the fact that preconception exposure information will be gathered for an estimated 8,000 siblings in the main sample who are born after initially enrolled subjects, the potential scientific value of the current plan for an additional preconception sample of first births would not be high for two reasons: the proposed sample would likely not be representative of all first births, and it would incur high costs, including the costs of in-home interviews with four times as many women as are expected to eventually become pregnant.9 Given this mixture of benefits and drawbacks, the panel believes it is important to begin its analysis with an evaluation of the scientific case for including the first-birth preconception sample as part of the NCS sample.
8A woman would be followed for as long as a pregnancy might lead to a birth during the 4-year birth window associated with the probability sample.
Although fetal environments of first versus subsequent births may differ, the effects of preconception exposures to persistent environmental agents (i.e., exposures that persist between the preconception and prenatal periods) can be analyzed using the prenatal environmental information gathered for first births that occur in the main sample. The effects of important preconception exposures, whether persistent or not, that have similar effects on first and subsequent births can be analyzed using preconception data gathered from the sibling sample.
Consequently, a first-birth preconception sample provides uniquely valuable data only in the case of transitory preconception exposures that affect first births differently than subsequent births. None of the materials the Program Office provided to the panel referred to research showing such possible interactions. Thus, while preconception exposure information on first births may have the potential to add scientific value to the NCS, prior research provides no examples of such value, and many of the possible links between preconception exposures and child outcomes can be investigated with the data to be gathered in the probability sample.
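The reasoning in the two preceding paragraphs can be laid out as a small decision table. The sketch below is illustrative only: the function name and the two boolean flags are hypothetical labels for the distinctions drawn in the text (persistent versus transitory exposures, and effects that do or do not differ between first and subsequent births), not NCS terminology.

```python
from itertools import product

def probability_sample_sources(persistent, differs_by_birth_order):
    """List the probability-sample data that could support analysis of a
    preconception exposure, following the panel's argument (hypothetical helper)."""
    sources = []
    if persistent:
        # Persistent exposures carry over into pregnancy, so prenatal
        # measurements on first births in the main sample capture them.
        sources.append("prenatal data on first births in the main sample")
    if not differs_by_birth_order:
        # Exposures with similar effects on first and later births can be
        # studied with preconception data on siblings.
        sources.append("preconception data from the sibling sample")
    return sources

# Exposure types for which the probability sample offers no data source,
# i.e., the only cases where a first-birth preconception sample is unique.
uniquely_served = [
    (persistent, differs)
    for persistent, differs in product([True, False], repeat=2)
    if not probability_sample_sources(persistent, differs)
]
print(uniquely_served)
```

Of the four combinations, only the transitory exposure whose effect differs by birth order is left without a data source in the probability sample, which is the single case the panel identifies.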
The panel also has a number of concerns about the design of the preconception sample and data collection. We note first that no details about provider and participant selection were provided to the panel, rendering a careful analysis of NCS plans impossible. For example, there was no mention of using probability sampling methods to select the nulliparous women. Even if the health care providers cannot be selected at random from the set of all providers, it would still be important to use probability sampling to select women receiving care from these providers. Given the need to develop and possibly pilot test[10] entirely separate recruitment and data collection protocols for the proposed preconception cohort and the hoped-for mid-2015 starting date for the study, the panel does not believe it is feasible to prepare the preconception sample for inclusion in the Main Study.
Second, the panel worries that insufficient steps would be taken to recruit nulliparous women who do not seek routine health care. Such women are important to include because they are most likely to be members of disadvantaged groups. Moreover, these women are most likely to be exposed to unhealthy environmental conditions of greatest concern for the NCS.
[9] See Appendix B for details on the likely field costs associated with a preconception sample.
[10] Pilot testing of recruitment of nonpregnant women through provider offices was done in the provider-based recruitment component of the Vanguard Study alternative recruitment pilot (see Chapter 1), but it was mostly limited to prenatal care provider offices and targeted nonpregnant women at high likelihood of becoming pregnant.
Third, since little is known about the types of factors or time periods prior to pregnancy that might be the most important, a study might need repeated and fairly extensive data collections to address this issue. Indeed, the original NCS strategy was to enroll nonpregnant women at high or moderate “risk of pregnancy” (National Research Council and Institute of Medicine, 2008) and collect preconception data multiple times for the women identified as high risk. The merits of the current NCS plan (one in-person data collection visit per woman; see NICHD, 2013d, p. 65) are not clear to the panel, and the NCS Program Office did not provide a scientific justification for this design decision.
Fourth, the preconception data could be biased if, for example, environmental factors that potentially affect child health outcomes also affect fecundity or time to pregnancy. Alternatively, fecundity could be an intermediate factor between an environmental factor and child health outcomes. There could be multiple complex causal pathways between the preconception environment and child health outcomes. It is doubtful that one preconception data collection of 5,000 women at varying times prior to pregnancy would be adequate to analyze these complex pathways.
Finally, the cost of the preconception sample of births is much higher than the cost of births in the probability sample. As detailed in Appendix B, the main reason for higher costs is that about 20,000 nulliparous women must be recruited and interviewed in their homes to yield 5,000 first births. Our cost analysis illustrates the opportunity costs of the preconception sample by showing that eliminating the preconception first birth sample would enable almost complete prenatal, rather than the currently planned half prenatal and half postnatal, recruitment of women and children in the main sample.
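The recruitment arithmetic behind this cost concern can be made explicit. The short sketch below uses only the two figures quoted in the text (20,000 women recruited, 5,000 expected first births); it does not reproduce the cost estimates in Appendix B, and the function names are hypothetical.

```python
# Figures quoted in the text; no dollar amounts from Appendix B are assumed.
WOMEN_RECRUITED = 20_000        # nulliparous women recruited and visited at home
EXPECTED_FIRST_BIRTHS = 5_000   # first births the sample is expected to yield

def recruitment_yield(recruited, births):
    """Share of recruited women expected to contribute a first birth."""
    return births / recruited

def home_visits_per_birth(recruited, births):
    """In-home interviews needed, on average, for each enrolled first birth."""
    return recruited / births

yield_rate = recruitment_yield(WOMEN_RECRUITED, EXPECTED_FIRST_BIRTHS)
visits = home_visits_per_birth(WOMEN_RECRUITED, EXPECTED_FIRST_BIRTHS)
print(f"yield: {yield_rate:.0%}, home visits per first birth: {visits:.0f}")
```

Under these figures, one in four recruited women contributes a birth, so each enrolled first birth carries the field cost of four in-home interviews before any prenatal or postnatal follow-up begins.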
RECOMMENDATION 2-5: While the panel appreciates the potential scientific value of gathering preconception exposure information on 5,000 first-birth children as part of the National Children’s Study Main Study, this supplemental sample should be dropped because of high costs, the lack of any evidence of the value of such a sample, the lack of detailed plans for both selection and analysis, and potential limitations in the proposed data collection schedule.
Supplemental Samples to Address Targeted Research Questions
The panel did not find any value in using the supplemental samples for the NCS’s other stated purposes, namely, for populations living in geographic areas with possible exposures from conditions such as fracking, populations exposed to natural disasters such as hurricanes, younger siblings of enrolled children born outside the birth window, and augmented numbers of minority groups of interest for health disparities research. In part, this assessment reflects the lack of detail from the NCS Program Office about the rationale for such supplemental groups and their operational details.
In the case of the geographic exposure sample, the Program Office provided the panel with only the most general description of its plans and none of the details needed to evaluate them. This is problematic for many reasons. For example, the panel sees no reason why the locations for the geographic exposure sample cannot be identified in advance and included as strata in the Main Study probability sample. In addition, the needed coordination of sample design and study staffing between the probability sample and geographic exposure samples dictates that sample selection and recruitment for the geographic sample should begin at roughly the same time as sample selection and recruitment for the Main Study. Given the study’s expected mid-2015 start date and that the Program Office has not yet identified the specific geographic exposures of interest that it would target, the panel fails to see how data collection for the geographic exposure sample could coincide with the data collection for the probability sample.
Moreover, the panel did not find any justification for devoting a portion of the NCS’s sample to enrolling women in areas affected by meteorological, industrial, or other events of interest. First, the NCS would not be able to enroll women with births in the same geographic areas prior to the event, precluding scientifically strong pre- to post-event comparisons. Second, given the time it takes to set up sampling and interviewing mechanisms, the time between the event and actual data collection may be long. Third, for many events, a substantial proportion of affected women may have moved away from the affected area. And fourth, if post-event-only studies are to be conducted, the public availability of NCS instrumentation and other study protocols will make it possible for special studies to be mounted that focus more specifically on gauging the likely aftermath of the event on children.
A third possible use for convenience samples is to enroll younger siblings born outside of the 4-year birth window. We estimate that an expansion of the birth window from 4 to 7 years for these younger siblings only in each primary sampling unit would roughly double the number of siblings enrolled in the study (after accounting for an expected 20 percent attrition) to 18,000. However, given the plans to recruit about 8,000 siblings as part of the Main Study, the panel judges that the likely advantages of additional siblings do not outweigh the opportunity costs that expansion would entail, since resources needed to capture the additional 10,000 births could instead be used to accomplish other study goals, such as expanding the prenatal sample.
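The sibling counts in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the two totals stated in the text; the variable names are illustrative.

```python
# Totals quoted in the text (the 18,000 figure already reflects the
# expected 20 percent attrition).
main_study_siblings = 8_000        # siblings born within the 4-year window
expanded_window_siblings = 18_000  # siblings if the window ran to 7 years

# Additional sibling births the expansion would bring into the study.
additional_siblings = expanded_window_siblings - main_study_siblings

# Size of the expanded sibling sample relative to the planned one.
expansion_ratio = expanded_window_siblings / main_study_siblings

print(additional_siblings, expansion_ratio)
```

The difference matches the 10,000 additional births cited as the opportunity cost, and the ratio of 2.25 is slightly above the "roughly double" described in the text.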
A final proposed use of the supplemental samples is to facilitate the investigation of subpopulations of interest for research on health disparities. The NCS’s large probability sample should be sufficient to generate substantial numbers of children in the largest demographic groups commonly used in such research, as well as for subgroups (such as socioeconomic categories within race and ethnic groups) necessary to properly investigate the extent of health disparities (see detailed discussion in Chapter 3).
RECOMMENDATION 2-6: The supplemental convenience samples proposed for the National Children’s Study Main Study should be dropped from the design, including samples of children exposed to natural disasters or geographically defined environmental exposures, samples of additional members of disadvantaged groups, and samples of siblings born outside the 4-year birth window. The potential added value of the supplemental sample cases is less than the value of the additional cases in the probability sample they would replace, specifically, the value of the additional prenatal cases in the probability sample.
Health disparities, defined as “systematic, plausibly avoidable health differences adversely affecting socially disadvantaged groups” (Braveman et al., 2011), exist for numerous health conditions and across people’s lifespans. There are numerous examples of stark disparities for children: African American infant mortality rates are 2.5 times higher than white infant mortality rates (Hamilton et al., 2013); asthma prevalence for children is 2.4 times higher for Puerto Ricans, 1.6 times higher for African Americans, and 1.3 times higher for American Indians and Alaska Natives than for whites (Akinbami et al., 2009); and poor children are almost twice as likely as nonpoor children to have a serious health limitation (Seith and Isakson, 2011). The reduction and elimination of health disparities has been identified as an important goal by government agencies (e.g., in both Healthy People 2010 and Healthy People 2020, produced by the Centers for Disease Control and Prevention), nonprofit groups, and community representatives.
Despite repeated documentation of child health disparities for many conditions, important questions remain about their fundamental causes. In response, the 2000 Children’s Health Act directed that the NCS be designed to consider health disparities. The earlier report (National Research Council and Institute of Medicine, 2008, pp. 37-39) identified a number of deficiencies in the NCS’s original approach to health disparities, including: (1) the decision to use equal probability sampling, which may lead to insufficient sample sizes for some racial, ethnic, and language minorities for some analyses; (2) low response rates in areas, such as inner cities, that are traditionally hard to survey and that will reduce effective sample sizes for disadvantaged groups relative to other groups; (3) lack of attention to generating data on how individuals from different groups may interact with health systems, a factor whose importance has been suggested in many previous studies; and (4) the absence of virtually any hypotheses about racial and ethnic disparities. The earlier study summarized its concern as follows: “[w]hile the study will gather a great deal of information that is relevant to understanding such disparities, the research design was not informed by a concern with understanding their basis” (p. 5).
The NCS Program Office outlined several responses to these critiques (see NICHD, 2013b, 2013d). First, with regard to equal probability sampling, the Program Office noted that 90,000 of the Main Study’s 100,000 cohort children would be drawn from a representative sample of hospitals and birthing centers, which collectively cover about 99 percent of U.S. births. By implication, subgroups of interest should be enrolled in the probability sample. Even in the absence of oversampling, the main NCS sample will contain thousands of children who belong to subgroups that constitute only a few percent of the overall U.S. population of births (see details in Chapter 3). Second, with regard to attrition, the Program Office noted (NICHD, 2013b, p. 10) that “The early Vanguard Study data indicate that a provider-based recruitment model demonstrates better response rates and retention rates than alternate models. In addition, NCS continues to invest resources in a comprehensive retention plan as called for in the 2008 IOM report.” Third and fourth, in keeping with the NCS’s belief that its scope “should be limited only by scientific creativity and not by current consensus priorities” (quoted in Guttmacher et al., 2013, p. 1873), the current study plan does not include any specific health disparity questions or hypotheses and does not address the concern about how different groups interact with health systems.
In summary, the NCS’s approach to health disparities consists of four prongs: (1) ensure that populations of interest for health disparities research are adequately represented in the sample by including them in the probability sample and possibly using a portion of the planned special sample of 10,000 for supplemental coverage of those populations; (2) ensure that information about the demographic and other characteristics that define these populations is gathered in the core NCS questionnaire; (3) ensure that exposures important for understanding health disparities are measured; and (4) devote resources to retain as many participants as possible in the Main Study.
Although the panel agrees that the large sample size and the comprehensive assessment of health determinants and health outcomes that is planned in the NCS will allow researchers to investigate many important health disparity questions, the relevance of health disparities to children and society, as well as the high importance of this topic to the NCS, requires that the NCS take special steps to ensure that the sample is adequate for addressing these questions. The panel’s detailed analysis of these issues is provided in Chapter 3.