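Cells and Surveys: Should Biological Measures be Included in Social Science Research?
Copyright © National Academy of Sciences. All rights reserved.

5
The Value of Sibling and Other “Relational” Data for Biodemography and Genetic Epidemiology

George P. Vogler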

A fundamental characteristic of many traits in human populations is variability. This is true both for qualitative traits, such as the presence or absence of disease, and for quantitative traits, which show a continuous range of variation in the population. From the perspective of evolutionary theory, it is desirable to maintain the broadest range of capacities to respond to varying environmental circumstances, which optimizes the likelihood of survival in the face of potentially rapid environmental fluctuations. From the perspective of public health policy, individual differences that arise from genetic sources of variability make it desirable to move towards a more individual-based perspective on recommendations or intervention strategies, rather than focusing only on the potential effect on the population mean. While an intervention may shift the population at large in a desirable direction, individuals may vary nonrandomly in their response: some respond in a positive direction, others are not influenced by the intervention, and still others are actually harmed. Such individual differences in response are expected across a wide variety of environmental agents, including social or pharmacological interventions, nutritional factors, and exposure to toxic substances. Research that advances this perspective is best approached using study designs that are a hybrid between population-based survey methods and designs that are informative from a genetic perspective.

Individual variability results, at a minimum, from environmental factors (the focus of epidemiologists) and genetic factors (the focus of geneticists). An important step in understanding the impact of risk factors on health-relevant traits is to identify the individual environmental and genetic risk factors. The approaches epidemiologists take to detect the effects of exposure to environmental risk factors and the approaches geneticists take to detect the effects of genetic loci are very different. Yet an integrated approach is required to examine the impact of multiple genetic and environmental effects in a comprehensive manner. Genetic epidemiology incorporates the perspectives of several disciplines. However, much research that is considered genetic epidemiology is geared towards characterizing genetic effects in the context of a complex system rather than fully integrating information about the effects of both genetic and environmental influences on health-related outcomes. While a number of definitions of the field of genetic epidemiology have been advanced (summarized by Khoury et al., 1993), this emphasis on genetics is evident in the statement on the overleaf of each issue of the journal Genetic Epidemiology, the official publication of the International Genetic Epidemiology Society:

“A peer-reviewed record and forum for discussion of research on the distribution and determinants of human disease, with emphasis on possible familial and hereditary factors as revealed by genetic, molecular, and epidemiological investigations.”

More complete integration of genetically informative study designs with designs that address issues of interest in epidemiology and demography can be useful from several perspectives. To address questions about specific risk factors, an integrated design can permit some degree of control over other sources of variability that exist in a population-based sample.
Certain sampling-related biases can be addressed using within-family controls. Large-scale survey data can be used to select an optimally powerful subsample for detailed molecular studies that focus on highly specific research issues (such as a particular disease or trait). Perhaps most significantly, a truly integrated approach would make it possible to model genetic and environmental influences as they co-act and interact to produce the trait as a complete phenotype, rather than focusing on isolated subsystems.

Because the traits of interest in aging research are observed in late life, options for incorporating genetically informative extensions into survey research are limited. Family study designs that use parent-offspring information are impractical because parents of index cases are likely to be deceased and children of index cases may not yet be old enough to express
age-related traits of interest. Extended pedigrees and adoption studies that rely on the contrast between adoptees and their biological and/or adoptive parents have a similar limitation. Twin studies are useful when a study is designed at the outset as a twin study, but are less useful when the base sample is not specifically composed of twins. Study designs that incorporate measures on siblings have emerged as a very useful tool, particularly in genetic designs for detecting the influence of specific genes. This is encouraging for hybrid designs that incorporate genetic information into survey research on aging, since siblings are the most practical group to recruit into extensions of population-based surveys. The following sections consider sources of variability in terms of genetic and environmental influences and explore how informative sibling data are for addressing questions about that variability.

SOURCES OF VARIABILITY FOR QUANTITATIVE TRAITS

The theoretical basis of genetic and environmental influences on a trait was developed from a number of perspectives, with Fisher (1918) arguably among the most influential. The basic model is that the trait, or phenotype (P), is a function of genetic (G) and environmental (E) influences. These may consist of major factors: major genes that follow a Mendelian pattern of inheritance in the case of genetic factors, or major qualitative exposures to risk factors in the case of environment. Alternatively, a quantitative model can be considered in which there are multiple genetic and environmental influences, each with a vanishingly small impact on the phenotype; together, these produce a quantitative distribution of the trait. Models of major factors are generally most applicable to diseases that occur at relatively low frequency in the population.
The quantitative model is generally more applicable to complex traits, such as relatively common chronic diseases or variability within the normal range of a quantitative trait. The theory underlying the quantitative model assumes an infinitely large number of individual factors affecting the trait. In reality, a quantitative distribution is approached relatively quickly with a finite number of factors of modest impact (even as few as seven, for example). Consequently, much work in study design now focuses on identifying specific factors, such as individual genetic loci, that affect a trait but account for only a relatively small proportion of the total variance. Identifying specific measurable genetic and environmental factors opens new opportunities to model the interactions among these multiple influences.
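The claim that a handful of loci already yields a near-continuous distribution can be checked with a short calculation. The sketch below is illustrative only and assumes seven unlinked biallelic loci with equal, purely additive effects, an allele frequency of 0.5 at each locus, Hardy-Weinberg proportions, and no environmental variance; the function names are hypothetical. Under these assumptions the number of trait-increasing alleles an individual carries is binomial with 14 trials, so the phenotype takes 15 possible values.

```python
from math import comb


def polygenic_distribution(n_loci: int, p: float = 0.5) -> dict[int, float]:
    """Exact distribution of the number of trait-increasing alleles.

    Assumes n_loci unlinked biallelic loci with equal, purely additive effects,
    allele frequency p at every locus, Hardy-Weinberg proportions, and no
    environmental variance. Two alleles per locus, so the count is
    Binomial(2 * n_loci, p).
    """
    n = 2 * n_loci
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def moments(dist: dict[int, float]) -> tuple[float, float, float, float]:
    """Mean, variance, skewness, and excess kurtosis of a discrete distribution."""
    mean = sum(k * w for k, w in dist.items())
    var = sum((k - mean) ** 2 * w for k, w in dist.items())
    skew = sum((k - mean) ** 3 * w for k, w in dist.items()) / var**1.5
    ex_kurt = sum((k - mean) ** 4 * w for k, w in dist.items()) / var**2 - 3
    return mean, var, skew, ex_kurt


dist = polygenic_distribution(7)
mean, var, skew, ex_kurt = moments(dist)
print(f"{len(dist)} phenotypic classes; mean={mean:.2f}, variance={var:.2f}, "
      f"skewness={skew:.2f}, excess kurtosis={ex_kurt:.3f}")
for k, w in dist.items():
    print(f"{k:2d} {'#' * round(w * 200)}")  # crude text histogram
```

A normal distribution has skewness 0 and excess kurtosis 0; under these assumptions the seven-locus distribution is symmetric (skewness 0) with excess kurtosis of -1/7, about -0.14. Adding environmental variance would smooth the remaining discreteness further.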

Genetic Variance

Major Effects

Mendelian inheritance of major gene effects has played a significant role in medical genetics. As of January 24, 2000, there were 11,123 entries catalogued in Online Mendelian Inheritance in Man (OMIM, 2000). In general, study designs that are appropriate for survey research are not optimal for investigating major gene effects, because such traits tend to be very rare in the population. As a result, major gene effects can be investigated more efficiently with sampling strategies that ascertain probands and their relatives in studies designed to explore specific traits. Some sampling strategies that are designed to optimize the ability to identify major genes are not designed to address population parameters of interest to demographers and epidemiologists. In general, traits relevant to aging research are likely either to show genetic influences more complex than single major genes or to be too rare to be a major focus of population-based survey research.

Polygenic Variance

Extensive theoretical development has occurred in quantitative genetics over nearly a century. Summaries of this work are very accessible (e.g., Falconer and Mackay, 1996, and earlier editions; Lynch and Walsh, 1998). The basis of the quantitative genetic model is that variability in traits arises from the cumulative action of an infinitely large number of influences of vanishingly small effect. A wide assortment of sophisticated models has been developed to decompose a population’s phenotypic variance into its components, permitting a distinction between genetic and environmental components.
Classical approaches in animal studies use patterns of means and variances in inbred strains and derived generations primarily to investigate polygenic influences (additive, dominant, and epistatic effects). Classical approaches in human studies use models of covariance among relatives (twins, families, extended pedigrees, and adoptions) to investigate both genetic effects and, in some designs, cultural transmission effects (Cloninger et al., 1979a, 1979b; Rice et al., 1978). It is important to note that these models consider latent, unobserved components of variance that are not directly measurable, although a recent trend is to merge these approaches with gene mapping studies to incorporate the effect of individual quantitative trait loci (QTLs). There is no theoretical reason why such approaches cannot also be extended to incorporate the effect of individual measured environmental factors.
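For reference, the decomposition described above can be written in the standard textbook notation used by Falconer and Mackay (1996). This is a simplified sketch that assumes genetic and environmental effects combine additively with no genotype-environment interaction; the covariance term vanishes when genotype and environment are uncorrelated. Here V_A, V_D, and V_I are the additive, dominance, and epistatic (interaction) genetic variances, h² is narrow-sense heritability, and the last line gives the additive variance contributed by a single biallelic QTL with allele frequencies p and q = 1 − p and additive effect a, assuming no dominance at that locus.

```latex
\begin{aligned}
P   &= G + E \\
V_P &= V_G + V_E + 2\,\mathrm{Cov}(G, E) \\
V_G &= V_A + V_D + V_I \\
h^2 &= V_A / V_P \\
V_{A,\mathrm{QTL}} &= 2pq\,a^2, \qquad q = 1 - p
\end{aligned}
```

Dividing a single locus's contribution by V_P gives the proportion of phenotypic variance that the locus explains, the quantity behind statements such as a locus accounting for 10 percent of the variance.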

A Compromise Approach

For any given trait, the true genetic architecture may consist of a finite number of genes, each accounting for a modest percentage of the variation in the trait (modest might be as low as a few percent or as high as perhaps 20 percent). The recognition that this is likely the true state of affairs for many complex phenotypes drives a current emphasis in human genetics on developing new sampling procedures and statistical techniques and on initiating large-scale data collection. These efforts are designed to permit detection of loci with relatively small individual effects on complex traits, especially chronic disease traits.

Latent Effects

Variance decomposition models conceptualize genetic effects as latent, unobservable factors that are assessed as components of variation within a population. This approach is useful for addressing the initial question of whether a phenotype shows genetic influences worth pursuing at a more molecular level. Even in many of the more refined models designed to detect the influence of QTLs, the gene itself is not necessarily measured directly; rather, it is detected as a factor linked to a DNA marker that is not likely to be part of a functional gene. Variance decomposition models still have an important role in genetic investigations. However, such approaches have limited power to capture effects such as epistatic interactions, gene-environment correlation or interaction, loci of small effect, developmental effects, and numerous other effects that might operate alongside genetic influences in a complex system.

Measurable Effects

The idea of incorporating measured genotypic effects into models of complex traits is not new (Boerwinkle et al., 1986, 1987).
However, the explosion of information on molecular markers and specific genes, together with the accelerating flow of information on specific loci from the Human Genome Project, has intensified interest in this area. Interest has increased rapidly in general models of complex traits that explicitly incorporate information on individual loci (Almasy and Blangero, 1998; Amos, 1994; Fulker and Cherny, 1996; Vogler et al., 1997). We are beginning to see such approaches applied in areas as diverse as Alzheimer’s disease (reviewed by Lovestone, 1999), gene-environment interactions in asthma (Cookson,
1999), and risk for colorectal cancer (Le Marchand, 1999). Further research along these lines will enhance the ability to incorporate measurable influences into a complex-system model that includes more detailed models of specific genetic effects interacting with other genetic and environmental effects.

Shared Genetic Effects

One feature of quantitative genetic theory is that it provides detailed expectations for the genetic resemblance between various pairs of relatives. This theory provides the basis for using patterns of covariation among relatives to infer the existence of genetic influences. Features of common study designs that exploit this approach are outlined in a later section.

Environmental Variance

It is beyond the scope of this chapter to provide a detailed summary of environmental sources of variance. This section provides a synopsis of environmental effects as they are conceptualized and incorporated into quantitative genetic models.

Major Effects

Major environmental influences can be characterized as qualitative variables, such as exposure or nonexposure to an environmental risk factor. This kind of major environmental factor can be readily incorporated into quantitative genetic models either as a covariate or as a variable that distinguishes among multiple subgroups that can be compared for differences in covariance structure (e.g., Sörbom, 1974). For this approach to work, it is important to ensure that stratification by a major environmental factor is not confounded with stratification on genetic factors. The approach can be used to address questions of genotype-environment interaction (see Kang and Gauch, 1995, for an overview).
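The stratified approach to genotype-environment interaction described above can be illustrated with simulated data. The sketch below is a hypothetical example, not a reproduction of any cited model: it assumes a measured genotype coded as 0, 1, or 2 copies of a risk allele, a binary exposure drawn independently of genotype (mirroring the requirement that stratification on exposure not be confounded with genetic factors), and a phenotype generated with a known interaction effect. It estimates the genotype slope separately in the exposed and unexposed subgroups; the difference between the two slopes estimates the interaction. A simple least-squares slope stands in here for the multigroup covariance-structure comparisons cited in the text, and all parameter values are arbitrary.

```python
import random


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20_000, p=0.3, p_exposed=0.4,
             beta_g=0.2, beta_e=0.5, beta_gxe=0.4, seed=1):
    """Hypothetical (genotype, exposure, phenotype) records.

    genotype: 0/1/2 copies of a risk allele with frequency p (HWE);
    exposure: 0/1, drawn independently of genotype;
    phenotype: genotype and exposure main effects, a genotype-by-exposure
    term, and unit-variance noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g = int(rng.random() < p) + int(rng.random() < p)
        e = int(rng.random() < p_exposed)
        y = beta_g * g + beta_e * e + beta_gxe * g * e + rng.gauss(0, 1)
        rows.append((g, e, y))
    return rows


rows = simulate()
slope_by_exposure = {}
for e in (0, 1):
    stratum = [r for r in rows if r[1] == e]
    slope_by_exposure[e] = ols_slope([r[0] for r in stratum], [r[2] for r in stratum])

gxe_estimate = slope_by_exposure[1] - slope_by_exposure[0]
print(f"genotype slope, unexposed: {slope_by_exposure[0]:.2f}")
print(f"genotype slope, exposed:   {slope_by_exposure[1]:.2f}")
print(f"difference (G x E estimate): {gxe_estimate:.2f}")
```

With 20,000 simulated individuals the two slopes should come out close to the simulated values of 0.2 and 0.6, so their difference recovers an interaction of about 0.4.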

Polyenvironmental Variance

The term polyenvironmental influence is not common but is used here to draw attention to the analogy between polygenic variance and environmental variance that arises from numerous small environmental effects. In this model, environmental influences can produce continuous quantitative variability in phenotypic expression. In quantitative genetic models, the “environment” includes systematic environmental effects that are shared by family members to varying degrees; true environmental influences that are unique to the individual; and random effects, including true random variability, measurement error, and other sources of error (Falconer and Mackay, 1996; Fisher, 1918).

A Compromise Approach

It is difficult to propose a general model of environmental influences. Unlike genetic influences, environmental influences lack a generally accepted comprehensive theoretical model, and any pattern of environmental effects is likely to be highly phenotype-specific. While any general model needs to include random variability in a measurement model, true environmental effects could consist of any combination of random effects, nonrandom quantitative effects, and major effects.

Latent Effects

As with genetic effects, the quantitative genetic model incorporates environmental influences as latent effects that can be either unique to the individual or shared by family members. As with latent genetic effects, there is limited power to consider models of complex phenomena beyond a simple additive model.

Measurable Effects

The incorporation of measurable environmental effects into quantitative genetic models is not particularly difficult. Collaboration between quantitative geneticists and environmentalists is key to developing such integrated models (Bronfenbrenner and Ceci, 1993; Horowitz, 1993; Wachs, 1993). Rao et al. (1982) developed models that incorporate an index of multiple environmental influences to describe variability in risk factors for cardiovascular disease. When incorporating measures of the environment, it is important to select measures that are truly environmental, free of genetic influence, and at the same time relevant to the phenotype of interest. In practice, identifying such environmental measures is a challenge.
Even factors that at first glance seem to be obvious environmental factors often, upon further examination, have complex relationships with genetically influenced factors. For example, smoking might be considered to be an obvious environmental influence on health, yet there is extensive evidence of a significant heritable component to smoking behavior (e.g., Heath et al., 1995; Pomerleau, 1995). Indicators of socioeconomic status are intertwined in a complex manner with genetic influences (Behrman et al., 1980; Vogler and Fulker,
1983). Even accidents may be related to genetically influenced personality attributes that contribute to variability in risk-taking behavior. Collaboration among geneticists, epidemiologists, demographers, and sociologists is essential to overcome this challenge.

Shared Environmental Effects

Study designs that incorporate groups of relatives will include environmental influences that are shared by relatives. Some study designs are more effective than others at distinguishing shared genetic effects from shared environmental effects, and all study designs rely on a fairly strong set of assumptions regarding environmental influences. In contrast to quantitative genetics, there is no comprehensive theoretical model of environmental sources of familial resemblance that is generally applicable to any phenotype. Familial environmental effects can arise through a variety of mechanisms, including common within-family cultural effects, sibling shared environment, twin shared environment, parent-child cultural transmission, interaction effects among family members, environmental mechanisms of assortative mating, and prenatal effects. Some work has been done on developing models of cultural transmission (Karlin, 1979a–d; Cloninger et al., 1979a, 1979b; Rice et al., 1978; Vogler and Fulker, 1983), but the lack of a unified comprehensive model of environmental influence remains an issue.

SOURCES OF RESEMBLANCE AMONG FAMILY MEMBERS

The most practical extension to population-based survey methods in aging research is likely to involve the incorporation of siblings. However, it is important to briefly summarize alternative study designs to put sibling studies into the appropriate context.

Twin Studies

Classical twin studies remain a very useful design for decomposing phenotypic variance into components (see Christensen in this volume).
Twins make it possible to distinguish among additive genetic influences, either shared environmental influences or genetic dominance, and unshared environmental influences. Twin studies are moderately easy to conduct, provide a regular data structure with analytic advantages, and represent a relatively powerful design in terms of information content for statistical inference. A critical assumption that is difficult to test is that monozygotic and dizygotic twins share environmental influences to an equal extent. If this assumption is violated, then estimates of
genetic influences will be biased. A second key assumption is that assortative mating is absent. Traditional twin studies have no option but to ignore assortative mating, since the data contain no information about this phenomenon. In the past, zygosity misclassification was a source of bias in twin studies, but molecular genetic methods now allow zygosity to be determined essentially without error, eliminating this problem. However, questions can be raised about the generalizability or representativeness of twin studies.

Nuclear Family Studies

Nuclear family studies consist of the traditional constellation of parents and their biological offspring. This is an effective design for demonstrating familial resemblance, but it is not possible to separate additive genetic effects from shared environmental effects based solely on phenotypic resemblance among family members (Rice et al., 1978). The design assumes that the phenotype measured in the parental generation is the same as the phenotype measured in the offspring, a questionable assumption for any trait that shows either substantial developmental change or cohort effects. Data are relatively easy to obtain, and there are few questions regarding the generalizability or representativeness of samples of nuclear families, although the traditional nuclear family structure has become less representative of population family structure. The spouse correlation can be assessed and explicitly included in a model of familial resemblance, although it may be difficult or impossible to distinguish among competing models of assortative mating without further specialized data that focus specifically on spouse resemblance (e.g., Cloninger, 1980; Heath and Eaves, 1985; Carey, 1986).
A model of cultural transmission may be incorporated, but it is a challenge to distinguish among competing models of cultural transmission in the absence of a well-developed theoretical expectation (Cloninger et al., 1979a). It is not possible to distinguish between shared genetic influences and shared family environmental influences in nuclear families without measuring at least one of these influences. The irregular data structure presents surmountable challenges for data analysis (Lange et al., 1976). Nonpaternity is a potential source of error.

Adoption Studies

Adoption studies feature a genetic relationship between biological parents and the adoptee and an environmental relationship between adoptive parents and the adoptee. The partial adoption design includes the adoptee and adoptive parents; the full adoption design includes the adoptee, adoptive parents, and biological parents. The assumption of no selective placement is critical and a potential problem (Hardy-Brown et al., 1980; Ho et al., 1979). The representativeness of adoptive relationships is also open to question. The key advantage of adoption studies is that they provide a direct and powerful test of the distinction between genetic influences and shared environmental influences. The full adoption design can also provide potentially powerful information about prenatal effects. The key disadvantage of adoption designs is the increasing difficulty of obtaining a sample. Sample representativeness is also a serious problem, since adoptive parents tend to be older and of higher socioeconomic status, whereas biological parents tend to be younger and of lower socioeconomic status.

Other Designs

Twins reared apart combine features of twin studies and adoption studies. Twins and their parents combine features of nuclear family and twin studies (Jencks et al., 1972; Eaves et al., 1978; Fulker, 1982). Twins and their offspring combine features of nuclear families, twin studies, and extended pedigrees (Nance and Corey, 1976). Adoption studies that include biological offspring of the adoptive parents provide greater resolution of the information contained in both adoption and nuclear family studies (Plomin and DeFries, 1983).

Siblings

For quantitative genetic models of variance decomposition, sibling studies represent one of the weakest designs available. Siblings share both genetic and environmental influences, which are completely confounded as latent variables. While there is little convincing evidence of strong sibling-shared environmental effects (Turkheimer and Waldron, 2000), particularly in the context of aging research, it remains unacceptable simply to assume that all sibling resemblance results from genetic effects.
Why, then, have sibling samples become one of the predominant designs in human genetics? The key is a fundamental shift in the question being addressed by human genetic studies. A convincing demonstration of genetically based variability is no longer the central issue; instead, a primary issue is now the identification of specific genes. In this context, sibling covariation of nongenetic origin becomes irrelevant to the primary purpose of gene identification. Thus, a major role for sibling studies is in gene mapping investigations.
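The contrast between twin and sibling designs can be made concrete with the standard Falconer-style calculation. The sketch below assumes the simplest ACE model (additive genetic, shared environment, unique environment) with no dominance, no assortative mating, and equal shared environment for MZ and DZ pairs, the assumptions discussed under Twin Studies. Under those assumptions r_MZ = a² + c² and r_DZ = a²/2 + c², and full siblings have the same expectation as DZ twins. The function names and example correlations are hypothetical.

```python
def falconer(r_mz: float, r_dz: float) -> dict[str, float]:
    """ACE estimates from twin correlations (Falconer-style).

    Assumes r_mz = a2 + c2 and r_dz = a2 / 2 + c2, i.e. additive genes only,
    equal shared environment for MZ and DZ pairs, and no assortative mating.
    """
    a2 = 2 * (r_mz - r_dz)   # additive genetic share
    c2 = 2 * r_dz - r_mz     # shared-environment share
    e2 = 1 - r_mz            # unique environment (plus measurement error)
    return {"a2": a2, "c2": c2, "e2": e2}


def expected_sib_r(a2: float, c2: float) -> float:
    """Expected full-sibling correlation under the same ACE assumptions."""
    return a2 / 2 + c2


estimates = falconer(r_mz=0.8, r_dz=0.5)
print({k: round(v, 2) for k, v in estimates.items()})

# Siblings alone: one observed correlation, two unknowns.
for a2, c2 in [(0.6, 0.0), (0.2, 0.2), (0.0, 0.3)]:
    print(f"a2={a2:.1f}  c2={c2:.1f}  ->  r_sib={expected_sib_r(a2, c2):.2f}")
```

Two twin correlations give two equations in two unknowns, so a² and c² can be separated. A single sibling correlation gives one equation in two unknowns: the three parameter sets in the loop, ranging from purely genetic to purely environmental resemblance, all imply a sibling correlation of 0.30. This is the confounding described above, and it is why the text locates the main value of sibling data in gene mapping rather than in variance decomposition.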

GENE MAPPING

The opportunity to merge advances in molecular genetic technology with advances in statistical techniques expanded in earnest with the development of DNA markers such as restriction fragment length polymorphisms (Lander and Botstein, 1989). Research exploded in the past decade as continued refinement of molecular technology yielded a variety of DNA markers, e.g., short tandem repeats (STRs) or microsatellites, variable number of tandem repeats (VNTRs), and single nucleotide polymorphisms (SNPs), as well as gene expression microarrays or gene chips. A genetic marker is a measurable polymorphic sequence of DNA whose chromosomal location is known. Markers often have no known functional significance but are used as pointers to a particular chromosomal location. The logic of gene mapping is simple: determine whether there is a relationship between variability in a phenotype and variability in an anonymous DNA marker of known chromosomal location. If there is such a relationship, it is taken as evidence that a gene influencing the trait lies at or near the marker.

Simply looking for an association between a marker and a phenotype in a sample drawn from the population requires that the marker and trait gene be in linkage disequilibrium, that is, that a trait allele occur preferentially with a specific marker allele at the population level. Furthermore, the association should not result from other phenomena unrelated to genetic causality, such as population stratification (discussed in detail under “Association Studies”). A more general case is that a marker and a trait locus are close together on a chromosome but are not necessarily in linkage disequilibrium at the population level. To detect linkage without assuming population linkage disequilibrium between marker and trait loci, family data are used.
Since there is only a single meiosis between one generation and the next, linkage disequilibrium is expected within families for a linked marker and trait locus even if the population is in equilibrium.

Linkage Analysis of Pedigree Data

Classical LOD score linkage analysis of pedigree data (Ott, 1991) has been the method of choice for mapping Mendelian major disease loci. Linkage analysis is a parametric approach that generally requires specifying the mode of inheritance of the major gene. Since a major gene model is not appropriate for many of the complex traits relevant to aging, linkage analysis of pedigree data is not an appropriate strategy for them, even though this approach is highly sophisticated and has been extensively refined over the years. Furthermore, it is difficult to

is that very large samples are required to localize genes of small effect in a population. Page et al. (1998) estimate that 2,000–20,000 sib pairs are required to map loci that explain 10 percent of the total phenotypic variance. One method that has been advanced as an efficient strategy for boosting the power to detect loci of small effect is selective sampling. Selecting sib pairs who are discordant from the extremes of the population phenotypic distribution is a highly effective way to boost power relative to the number of subjects who need to be genotyped (Eaves and Meyer, 1994; Gu et al., 1996; Risch and Zhang, 1996). Other approaches that can be employed to boost power are the use of multivariate phenotypes (repeated measures or correlated phenotypes) and the use of sibships larger than two (Schork, 1993). Each of these approaches can be readily implemented in a general model for the analysis of multiple phenotypes in an arbitrary sibship structure (Vogler et al., 1997).

Sib extensions of large-scale survey research projects can be highly effective for gene mapping studies of complex traits. Large samples of individuals with phenotypic information provide an ideal sample from which selected sib pairs can be drawn from the extremes of the distribution for further genetic investigation. This strategy works only for highly focused genetic studies: for exploratory studies of multiple independent phenotypes, the ability to genotype selected subsamples is lost, since a progressively larger portion of the population must be sampled as the number of phenotypes increases. Rich survey data sets can provide superb opportunities for analysis of multivariate phenotypes. Finally, a subject who is willing to provide contact information on siblings is likely to provide information on multiple sibs if they are available for study.
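The selective-sampling idea can be sketched by simulation. The example below is hypothetical: it assumes standardized, bivariate-normal sibling phenotypes with a sibling correlation of 0.3, a survey of 10,000 sib pairs, and a 10 percent tail threshold for defining "extreme" (one sib above the 90th percentile and the other below the 10th). None of these values come from the studies cited above. The code counts how many surveyed pairs would qualify as extreme discordant pairs for genotyping.

```python
import random
from statistics import NormalDist


def simulate_sib_pairs(n_pairs: int, r: float, seed: int = 0) -> list[tuple[float, float]]:
    """Hypothetical standardized phenotypes for sib pairs with correlation r."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        x = rng.gauss(0, 1)
        # Second sib: correlated with the first through r, unit variance overall.
        y = r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs


def extreme_discordant(pairs, tail: float = 0.10):
    """Keep pairs with one sib in the upper tail and the other in the lower tail."""
    hi = NormalDist().inv_cdf(1 - tail)  # about 1.28 for a 10% tail
    lo = -hi
    return [(a, b) for a, b in pairs
            if (a > hi and b < lo) or (a < lo and b > hi)]


pairs = simulate_sib_pairs(10_000, r=0.3)
selected = extreme_discordant(pairs)
print(f"{len(selected)} of {len(pairs)} surveyed pairs selected for genotyping "
      f"({100 * len(selected) / len(pairs):.1f}%)")
```

Only a small fraction of the surveyed pairs qualifies, which is the source of the genotyping savings. Because the selection is made on a single phenotype, a second, independent phenotype would select a largely different set of pairs, which is why the strategy suits highly focused studies.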
ASSOCIATION STUDIES
In theory, complete knowledge of the human genome generated by the Human Genome Project should make it possible simply to measure the genes predicted to affect a phenotype in a sample of unrelated individuals, making methods based on relative pairs obsolete. All of the relevant genotypes (and environmental influences) could then be included in a large model, such as a regression analysis or a neural network model, fitted to individuals drawn from the population. This approach, demonstrating association between a particular genotype and a trait, is likely to predominate in the near future. However, the most effective data structure will not be merely a sample of unrelated individuals.
Generic Association Using Candidate Genes or Tightly Linked Markers
One strategy for association studies is to test for association between a trait and a candidate gene, chosen on the basis of an understanding of the gene's mechanism of action and a reasonable hypothesis that a gene of known function is likely to affect the target phenotype. An association can occur if the gene directly affects the phenotype, or if it has no direct effect on the trait but is in linkage disequilibrium with a nearby gene that does. Linkage disequilibrium can arise from recent mutations, migrations, population bottlenecks, or other sources (Lander and Schork, 1994). Because recombination breaks up the association between a marker and a trait locus, only loci that lie very close together, with limited opportunity for recombination, will be cotransmitted from generation to generation over long periods and so retain linkage disequilibrium at the level of the population. Consequently, in addition to functional candidate genes, anonymous DNA markers that are tightly linked to functional genes can be used in association studies. An association can be detected in a population sample using a simple case-control design for qualitative traits or a test of mean differences among genotypes for a quantitative trait. While the association approach may be the most appropriate for complex traits (Risch and Merikangas, 1996), statistical association can also arise for reasons unrelated to physical linkage between a marker locus and a disease locus (Ewens and Spielman, 1995; Lander and Schork, 1994).
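For a qualitative trait, the simple case-control comparison mentioned above is often run as an allele-count test. The following is a minimal Python sketch, assuming a biallelic marker and allele counts tabulated separately for cases and controls; the function name and example counts are hypothetical. It computes a Pearson chi-square on the 2 × 2 table of allele counts and its 1-df p-value.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

    case_alleles, control_alleles: (count of allele A, count of allele a).
    Returns (chi_square, p_value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a 1-df chi-square: P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 50 cases and 50 controls give 100 alleles per group.
    chi2, p = allelic_chi_square((60, 40), (40, 60))
    print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # chi-square = 8.00, p = 0.0047
```

Counting alleles rather than genotypes assumes Hardy-Weinberg equilibrium; for a quantitative trait the analogous step would compare trait means across genotype groups, for example by regressing the trait on the number of copies of an allele. Neither version protects against the stratification problem discussed next.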
Founder effects (the high frequency of an allele in a population founded by a small ancestral group in which one or more founders carried the allele by chance) or genetic drift (random fluctuations in allele frequencies due to finite population size) can cause allele frequencies at marker loci to vary among subdivisions of a population (Slatkin, 1991). If a trait-locus allele is more frequent in a subgroup of the population that also has a higher frequency of a particular allele at a marker locus, a spurious association will be present even if the marker and trait locus are completely unlinked. The potential for spurious associations is particularly acute in population-based surveys with large samples, because such samples include different population subgroups (such as ethnic groups) that may differ in both disease frequency and marker allele frequency. A population formed by recent admixture of two populations that differ in marker and trait-locus allele frequencies will also exhibit spurious associations (Ewens and Spielman, 1995) until enough generations of random mating and recombination have moved it to a new equilibrium. Population-based survey research on a population in which substantial migration has produced admixture within the past few generations is therefore likely to yield spurious associations arising from this phenomenon. If a case-control study focused on selected candidate loci is conducted in a stratified population, it is possible to test for stratification by typing a panel of additional markers that are unlinked to the candidate loci (Pritchard and Rosenberg, 1999). However, this approach detects stratification but does not control for it. It is applicable when a small number of candidate loci are investigated but is difficult to apply in more exploratory studies.
Transmission/Disequilibrium Tests
An approach that has received considerable attention and recent methodological development uses within-family tests of association to control for population stratification (Falk and Rubenstein, 1987; Knapp et al., 1993; Ott, 1989; Schaid, 1996; Spielman et al., 1993; Spielman and Ewens, 1996; Thomson, 1995). This transmission/disequilibrium test (TDT), initially developed for traits such as insulin-dependent diabetes mellitus (Spielman et al., 1993), uses parents who provide genotypic data and are heterozygous at the marker, together with at least one affected child who provides genotypic and phenotypic data. The logic of the TDT is that spurious associations due to population structure are not present in the transmission of marker alleles from parents to offspring within individual families. Marker genotypes are obtained for parents and affected offspring, and a test determines whether the frequency with which heterozygous parents transmit a particular marker allele to affected children deviates significantly from the one-half expected in the absence of linkage.
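The TDT comparison can be written in a few lines. The Python sketch below assumes that, for each parent of an affected child, the allele transmitted and the allele not transmitted have already been determined; the function name and example data are hypothetical. Only parents heterozygous for the test allele are informative; with b such parents transmitting it and c not transmitting it, the McNemar-type statistic (b - c)^2/(b + c) is referred to a chi-square distribution with 1 df.

```python
import math


def tdt(parental_transmissions, allele):
    """Transmission/disequilibrium test for one marker allele.

    parental_transmissions: iterable of (transmitted, untransmitted) allele
    pairs, one per parent of an affected child. Only parents heterozygous
    for `allele` versus some other allele are informative.
    Returns (b, c, chi_square, p_value): b counts transmissions of `allele`
    and c counts non-transmissions of it by heterozygous parents.
    """
    b = c = 0
    for transmitted, untransmitted in parental_transmissions:
        if transmitted == allele and untransmitted != allele:
            b += 1
        elif untransmitted == allele and transmitted != allele:
            c += 1
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)  # McNemar-type statistic, 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return b, c, chi2, p_value


if __name__ == "__main__":
    data = ([("M1", "M2")] * 60     # heterozygous parent transmits M1
            + [("M2", "M1")] * 40   # heterozygous parent transmits M2
            + [("M1", "M1")] * 25)  # homozygous parents: uninformative
    print(tdt(data, "M1"))
```

With the example data, 60 informative parents transmit M1 and 40 do not, giving a chi-square of 4.0 (p ≈ 0.046); the 25 homozygous parents contribute nothing, which is why only heterozygous parents are needed.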
Although the TDT was originally developed for qualitative disease traits, it has recently been extended to continuously distributed quantitative traits (Allison, 1997; Rabinowitz, 1997). George et al. (1999) propose a regression-based test for transmission/disequilibrium that permits arbitrary pedigree structures and accommodates nonindependent observations by allowing for a residual correlation structure among them. This is an important step in developing techniques that can be applied to complex traits.
Sibling Transmission/Disequilibrium Tests
Family-based control techniques that require data on parents and offspring are not well suited to research on aging, because parental data generally cannot be obtained. A number of procedures have been developed that apply the within-family logic of the TDT to sibling data, without requiring parental genotypes. Curtis (1997) and Boehnke and Langefeld (1998) describe tests of association based on a discordant-sib-pair design, in which marker alleles are counted in the affected and unaffected members of discordant sib pairs and contrasted with the expectation, under no marker-disease association, of equal allele frequencies in affected and unaffected sibs. Spielman and Ewens (1998) describe a similar approach and present a method for combining families that can be analyzed only with the sib TDT and families with parental genotypes that can be analyzed with the original TDT. As with the parent-offspring TDT, the sibling-based tests have been extended to quantitative traits. Allison et al. (1999) propose joint tests of linkage and association for quantitative traits using sibling data. Fulker et al. (1999) incorporate a sib-pair-based test of association into a variance-components procedure for mapping QTLs in sib pairs. Abecasis et al. (2000) generalize this approach further to quantitative traits in nuclear families of arbitrary size, optionally incorporating information on parental genotypes. In summary, methods that use siblings to map the effects of specific loci with linkage-based or association-based models represent the cutting edge of quantitative tools for the analysis of complex traits, and they are highly relevant to aging research.
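A simplified version of the discordant-sib-pair idea can be sketched as follows. This is not the exact statistic of Curtis (1997) or Boehnke and Langefeld (1998); it is a normal approximation to a sign-flip randomization test, assuming one discordant pair per family and counts (0, 1, or 2) of a single test allele in each sib. Under no marker-disease association, the difference between the affected and unaffected sib's allele counts is symmetric about zero, so positive and negative differences should balance. The function name is hypothetical.

```python
import math


def discordant_sib_allele_test(pairs):
    """Simplified within-pair allele-count test for discordant sib pairs.

    pairs: iterable of (copies_in_affected, copies_in_unaffected), each the
    number of copies of the test allele carried by that sib.
    Under the null, each difference d is equally likely to be +d or -d, so
    sum(d) has mean 0 and variance sum(d**2) given the |d| values.
    Returns (z, two_sided_p).
    """
    diffs = [aff - unaff for aff, unaff in pairs]
    sum_sq = sum(d * d for d in diffs)
    if sum_sq == 0:
        raise ValueError("no pairs differ in allele count")
    z = sum(diffs) / math.sqrt(sum_sq)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


if __name__ == "__main__":
    example = [(2, 1), (1, 0), (2, 0), (1, 1), (0, 1), (2, 1)]
    print(discordant_sib_allele_test(example))
```

Pairs whose sibs carry the same number of copies (such as the (1, 1) pair in the example) drop out automatically, mirroring the way homozygous parents are uninformative in the parent-offspring TDT.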
One of the great potential uses of these models is to incorporate environmental assessment in the same way that genetic marker information has been incorporated into their genetic components.
ENVIRONMENT “MAPPING”
The analogy between mapping specific genetic effects and mapping specific environmental effects is not perfect. There is no environmental counterpart of the genetic map, with markers pointing to the positions of tightly linked factors, but candidate environmental agents can be specified just as candidate genes are. One might stretch the analogy of linked loci to shared family environmental influences, analyzed with some environmental analog of linkage analysis, but such an approach would apply only to a very small number of special situations of marginal interest. However, certain features of genetic models of QTL influences could be developed more extensively for the environmental side of an integrated model of genetic and environmental effects.
The issue of spurious association due to population stratification or admixture is just as relevant for measured environmental influences as for genetic influences (e.g., Lilienfeld and Stolley, 1994). Stratified sampling (e.g., Thompson, 1992) is commonly applied in epidemiology to control for such confounding. If population strata that differ in the frequency (or mean value) of a phenotype also differ in environmental factors that are causally irrelevant to it, those factors will show spurious associations with the phenotype. Within-family control can correct for the effects of stratification on environmental associations, just as it corrects for the effects of stratification on genetic associations. Using within-family data such as siblings essentially redefines the subgroup unit as the family, and so adjusts for between-family variability in any factor, observed or unobserved. In studies of aging, sib-based designs can control for factors that may have operated decades earlier. For example, families may have differed in exposure to epidemics, exposure to environmental toxins, access to health care, childhood nutrition, and countless other early-life factors whose influence in late life would otherwise be uncontrolled in samples of individuals. If some of these factors are measurable and vary within families as well as between them, their within-family effects can be estimated independently of stratification. Clearly the use of sibling data is far more developed for genetic influences than for environmental influences. A systematic effort to strengthen the environmental side using large population-based sibling samples would improve our ability to understand the full constellation of genetic and environmental influences on complex traits.
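The way sibling data redefine the subgroup unit as the family can be shown with a small simulation. The Python sketch below assumes two hypothetical population strata that differ in both average exposure and average phenotype, while the exposure itself has no effect on the phenotype. A pooled regression across individuals picks up a spurious slope (about 1/3 in expectation under these settings), whereas regressing each sib's deviation from the family mean phenotype on the corresponding exposure deviation removes the between-family confounding. All parameter values and function names are illustrative assumptions; a similar between/within decomposition is applied to genotype scores in the sib-based association model of Fulker et al. (1999).

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_stratified_sibships(n_families=5000, sib_size=2, seed=7):
    """Two hypothetical strata differ in mean exposure and mean phenotype,
    but exposure has NO causal effect on the phenotype."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        shift = 1.0 if rng.random() < 0.5 else 0.0  # stratum membership
        family_exposure = rng.gauss(shift, 0.5)  # shared family-level exposure
        sibs = []
        for _ in range(sib_size):
            exposure = family_exposure + rng.gauss(0.0, 0.5)  # varies within family
            phenotype = shift + rng.gauss(0.0, 1.0)  # depends on stratum only
            sibs.append((exposure, phenotype))
        families.append(sibs)
    return families


def pooled_and_within_slopes(families):
    """Compare a pooled regression with a within-family (sib-deviation) one."""
    xs = [x for fam in families for x, _ in fam]
    ys = [y for fam in families for _, y in fam]
    wx, wy = [], []
    for fam in families:
        fx = sum(x for x, _ in fam) / len(fam)
        fy = sum(y for _, y in fam) / len(fam)
        for x, y in fam:
            wx.append(x - fx)  # deviation from the family mean exposure
            wy.append(y - fy)  # deviation from the family mean phenotype
    return ols_slope(xs, ys), ols_slope(wx, wy)


if __name__ == "__main__":
    pooled, within = pooled_and_within_slopes(simulate_stratified_sibships())
    print(f"pooled slope = {pooled:.3f}, within-family slope = {within:.3f}")
```

Because every family-level influence cancels in the deviations, the within-family slope stays near zero even though the stratum itself was never recorded, which is the sense in which sib designs adjust for unobserved between-family factors.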
OPPORTUNITIES FOR INTEGRATED MODELS
Taking advantage of advances in molecular genetics, powerful sibling-based TDT methods have recently been developed to detect individual loci that contribute a relatively modest amount to the total phenotypic variance. For example, Abecasis et al. (2000) present extensive power simulations. A locus that accounts for 10 percent of the phenotypic variance can be detected with 80 percent power at a genome-screen alpha level of 5 × 10⁻⁸ using about 1,000 individuals in sib-pair configurations or 800 individuals in sib trios. A locus that accounts for 5 percent of the phenotypic variance requires about 2,000 individuals in sib pairs or 1,600 individuals in trios. The prospect of measurable genetic influences on complex multifactorial traits is particularly exciting because it opens the way to exploring these influences at a far more detailed level than the overly simple models that limit latent variable approaches allow. These new models are much more likely than earlier ones to describe the true mechanism of action of multiple influences on a complex system. Equally exciting is the prospect of detecting measurable effects of specific environmental influences and incorporating them into a comprehensive model. In recent years geneticists have, by and large, focused on advances in gene mapping and identification. However, nongenetic effects on complex traits cannot reasonably be ignored. Genetic influences are likely to be complex and dynamic, with genes turning on and off in response to conditions of the system that are influenced by other genes and by the environment. As a result, true understanding of the influences that lead to a phenotype is likely to come only from a comprehensive model of interacting genetic and environmental influences. Gene expression technology is likely to offer further dramatic opportunities for model refinement by providing a means to quantify effects such as tissue specificity, developmental effects, and genetic responses to environmental agents. The success of this integrated approach will depend on effective communication between researchers who are expert in exploiting genetic information and those who are expert in environmental assessment. It clearly will not be adequate simply to take a moderate sample, measure everything, and conduct a huge multiple-regression analysis. Collaborative, large-scale data collection efforts will be essential to provide adequate power to detect small effects and especially to characterize interactions among small effects. Certain steps toward comprehensive integrated models can be taken rather rapidly by extending available technology. Other issues, however, will require innovative ideas from a multidisciplinary perspective.
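The sample sizes quoted for integrated models come from the simulations of Abecasis et al. (2000). The generic arithmetic behind why a genome-screen alpha of 5 × 10⁻⁸ demands such large samples can be sketched with a normal approximation: the noncentrality a 1-df test must reach for 80 percent power rises from about 7.85 at alpha = 0.05 to roughly 40 at 5 × 10⁻⁸. The Python sketch below also assumes, as a rough approximation, that a regression test in n unrelated individuals of a locus explaining a proportion q of the variance has noncentrality n·q/(1 - q); the function names are hypothetical. The resulting figures (a few hundred unrelated individuals for q = 0.10) are smaller than the family-based figures quoted in the text, as expected, because within-family tests use only the genotypic variation within sibships; the sketch does not attempt to reproduce the published simulations.

```python
from statistics import NormalDist


def required_noncentrality(alpha, power):
    """Noncentrality a 1-df chi-square (two-sided z) test needs to reach
    `power` at significance level `alpha` (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) ** 2


def unrelated_sample_size(q, alpha=5e-8, power=0.80):
    """Approximate number of unrelated individuals needed for a regression
    test of a locus explaining proportion q of the phenotypic variance,
    assuming noncentrality ~= n * q / (1 - q)."""
    return required_noncentrality(alpha, power) * (1.0 - q) / q


if __name__ == "__main__":
    print(f"NCP needed at alpha = 0.05: {required_noncentrality(0.05, 0.80):.2f}")
    print(f"NCP needed at alpha = 5e-8: {required_noncentrality(5e-8, 0.80):.2f}")
    for q in (0.10, 0.05):
        print(f"q = {q:.2f}: about {unrelated_sample_size(q):.0f} unrelated individuals")
```

Halving the proportion of variance explained roughly doubles the required sample, the same pattern seen in the sib-pair and trio figures above.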
REFERENCES
Abecasis, G.R., L.R.Cardon, and W.O.C.Cookson 2000 A general test of association for quantitative traits in nuclear families. American Journal of Human Genetics 66:279–292.
Allison, D.B. 1997 Transmission-disequilibrium tests for quantitative traits. American Journal of Human Genetics 60:676–690.
Allison, D.B., M.Heo, N.Kaplan, and E.R.Martin 1999 Sibling-based tests of linkage and association for quantitative traits. American Journal of Human Genetics 64:1754–1764.
Almasy, L., and J.Blangero 1998 Multipoint quantitative-trait linkage analysis in general pedigrees. American Journal of Human Genetics 62:1198–1211.
Amos, C.I. 1994 Robust variance-components approach for assessing genetic linkage in pedigrees. American Journal of Human Genetics 54:535–543.
Behrman, J.R., Z.Hrubec, P.Taubman, and T.J.Wales 1980 Socioeconomic Success: A Study of the Effects of Genetic Endowments, Family Environment, and Schooling. Amsterdam: North-Holland.
Boehnke, M., and C.D.Langefeld 1998 Genetic association mapping based on discordant sib pairs: The discordant-alleles test. American Journal of Human Genetics 62:950–961.
Boerwinkle, E., R.Chakraborty, and C.F.Sing 1986 The use of measured genotypic information in the analysis of quantitative phenotypes in man. Annals of Human Genetics 50:181–194.
Boerwinkle, E., S.Visvikis, D.Welsh, J.Steinmetz, S.M.Hanash, and C.F.Sing 1987 The use of measured genotypic information in the analysis of quantitative phenotypes in man. II. The role of the apolipoprotein E polymorphisms in determining levels, variability, and covariability of cholesterol, betalipoprotein, and triglycerides in a sample of unrelated individuals. American Journal of Medical Genetics 27:567–582.
Bronfenbrenner, U., and S.J.Ceci 1993 Heredity, environment, and the question “How?”: A first approximation. Pp. 313–324 in Nature, Nurture and Psychology, R.Plomin and G.E.McClearn, eds. Washington, DC: American Psychological Association.
Carey, G. 1986 A general multivariate approach to linear modeling in human genetics. American Journal of Human Genetics 39:775–786.
Cloninger, C.R. 1980 Interpretation of intrinsic and extrinsic structural relations by path analysis: Theory and applications to assortative mating. Genetical Research 36:133–145.
Cloninger, C.R., J.Rice, and T.Reich 1979a Multifactorial inheritance with cultural transmission and assortative mating. II. A general model of combined polygenic and cultural inheritance. American Journal of Human Genetics 31:176–198.
1979b Multifactorial inheritance with cultural transmission and assortative mating. III. Family structure and the analysis of separation experiments. American Journal of Human Genetics 31:366–388.
Cookson, W. 1999 The alliance of genes and environment in asthma and allergy. Nature 402(6760 Suppl):B5–11.
Curtis, D. 1997 Use of siblings as controls in case-control association studies. Annals of Human Genetics 61:319–333.
Eaves, L.J., K.A.Last, P.A.Young, and N.G.Martin 1978 Model-fitting approaches to the analysis of human behavior. Heredity 41:249–320.
Eaves, L.J., and J.Meyer 1994 Locating human quantitative trait loci: Guidelines for the selection of sibling pairs for genotyping. Behavior Genetics 24:443–455.
Ewens, W.J., and R.S.Spielman 1995 The transmission/disequilibrium test: History, subdivision, and admixture. American Journal of Human Genetics 57:455–464.
Falconer, D.S., and T.F.C.Mackay 1996 Introduction to Quantitative Genetics, 4th Edition. Harlow, UK: Longman.
Falk, C.T., and P.Rubenstein 1987 Haplotype relative risks: An easy reliable way to construct a proper control sample for risk calculations. Annals of Human Genetics 51:227–233.
Fisher, R.A. 1918 The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh 52:399–433.
Fishman, P.M., B.Suarez, S.E.Hodge, and T.Reich 1978 A robust method for the detection of linkage in familial diseases. American Journal of Human Genetics 30:308–321.
Fulker, D.W. 1982 Extensions of the classical twin method. Pp. 395–406 in Human Genetics, Part A: The Unfolding Genome, B.Bonne-Tamir, ed. New York: Alan R.Liss.
Fulker, D.W., and S.S.Cherny 1996 An improved multipoint sib-pair analysis of quantitative traits. Behavior Genetics 26:527–532.
Fulker, D.W., S.S.Cherny, P.C.Sham, and J.K.Hewitt 1999 Combined linkage and association sib-pair analysis for quantitative traits. American Journal of Human Genetics 64:259–267.
George, V., H.K.Tiwari, X.Zhu, and R.C.Elston 1999 A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. American Journal of Human Genetics 65:236–245.
Gu, C., A.Todorov, and D.C.Rao 1996 Combining extremely concordant sibpairs with extremely discordant sibpairs provides a cost effective way to linkage analysis of quantitative trait loci. Genetic Epidemiology 13:513–533.
Hardy-Brown, K., R.Plomin, J.Greenhalgh, and K.Jax 1980 Selective placement of adopted children: Prevalence and effects. Journal of Child Psychology and Psychiatry 21:143–152.
Haseman, J.K., and R.C.Elston 1972 The investigation of linkage between a quantitative trait and a marker locus. Behavior Genetics 2:3–19.
Heath, A.C., and L.J.Eaves 1985 Resolving the effects of phenotypic and social background on mate selection. Behavior Genetics 15:15–30.
Heath, A.C., P.A.F.Madden, W.S.Slutske, and N.G.Martin 1995 Personality and the inheritance of smoking behavior: A genetic perspective. Behavior Genetics 25:103–117.
Ho, H., R.Plomin, and J.C.DeFries 1979 Selective placement in adoption. Social Biology 26:1–6.
Horowitz, F.D. 1993 The need for a comprehensive new environmentalism. Pp. 341–353 in Nature, Nurture and Psychology, R.Plomin and G.E.McClearn, eds. Washington, DC: American Psychological Association.
Jencks, C., M.Smith, H.Ackland, M.J.Bane, D.Cohen, H.Gintis, B.Heyns, and S.Michelson 1972 Inequality: A Reassessment of the Effect of Family and Schooling in America. New York: Basic Books.
Kang, M.S., and H.G.Gauch, Jr., eds. 1995 Genotype-by-Environment Interaction. Boca Raton, FL: CRC Press.
Karlin, S. 1979a Models of multifactorial inheritance: I, Multivariate formulations and basic convergence results. Theoretical Population Biology 15:308–355.
1979b Models of multifactorial inheritance: II, The covariance structure for a scalar phenotype under selective assortative mating and sex-dependent symmetric parental-transmission. Theoretical Population Biology 15:356–393.
1979c Models of multifactorial inheritance: III, Calculation of covariance of relatives under selective assortative mating. Theoretical Population Biology 15:394–423.
1979d Models of multifactorial inheritance: IV, Asymmetric transmission for a scalar phenotype. Theoretical Population Biology 15:424–438.
Khoury, M.J., T.H.Beaty, and B.H.Cohen 1993 Fundamentals of Genetic Epidemiology. New York: Oxford University Press.
Knapp, M., S.A.Seuchter, and M.P.Bauer 1993 The haplotype-relative-risk (HRR) method for analysis of association in nuclear families. American Journal of Human Genetics 52:1085–1093.
Lander, E.S., and D.Botstein 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199.
Lander, E.S., and N.J.Schork 1994 Genetic dissection of complex traits. Science 265:2037–2048.
Lange, K., J.Westlake, and M.A.Spence 1976 Extensions to pedigree analysis. III. Variance components by the scoring method. Annals of Human Genetics 4:171–189.
Le Marchand, L. 1999 Combined influence of genetic and dietary factors on colorectal cancer incidence in Japanese Americans. Monographs: Journal of the National Cancer Institute 26:101–105.
Lilienfeld, D.E., and P.D.Stolley 1994 Foundations of Epidemiology (third edition). New York: Oxford University Press.
Lovestone, S. 1999 Early diagnosis and the clinical genetics of Alzheimer’s disease. Journal of Neurology 246:69–72.
Lynch, M., and B.Walsh 1998 Genetics and Analysis of Quantitative Traits. Sunderland, MA: Sinauer.
Nance, W.E., and L.A.Corey 1976 Genetic models for the analysis of data from the families of identical twins. Genetics 83:811–826.
Online Mendelian Inheritance in Man, OMIM (TM) 2000 McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD). World Wide Web URL: http://www.ncbi.nlm.nih.gov/omim/
Ott, J. 1989 Statistical properties of the haplotype relative risk. Genetic Epidemiology 6:127–130.
1991 Analysis of Human Genetic Linkage, Revised Edition. Baltimore: Johns Hopkins University Press.
Page, G.P., C.I.Amos, and E.Boerwinkle 1998 The quantitative LOD score: Test statistic and sample size for exclusion and linkage of quantitative traits in human sibships. American Journal of Human Genetics 62:962–968.
Plomin, R., and J.C.DeFries 1983 The Colorado Adoption Project. Child Development 54:276–289.
Pomerleau, O.F. 1995 Individual differences in sensitivity to nicotine: Implications for genetic research on nicotine dependence. Behavior Genetics 25:161–177.
Pritchard, J.K., and N.A.Rosenberg 1999 Use of unlinked genetic markers to detect population stratification in association studies. American Journal of Human Genetics 65:220–228.
Rabinowitz, D. 1997 A transmission disequilibrium test for quantitative trait loci. Human Heredity 47:342–350.
Rao, D.C., P.M.Laskarzewski, J.A.Morrison, P.Khoury, K.Kelly, R.Wette, J.Rusell, and C.J.Glueck 1982 The Cincinnati Lipid Research Clinic Family Study: Cultural and biological determinants of lipids and lipoprotein concentrations. American Journal of Human Genetics 34:888–903.
Rice, J., C.R.Cloninger, and T.Reich 1978 Multifactorial inheritance with cultural transmission and assortative mating. I. Description and basic properties of the unitary models. American Journal of Human Genetics 30:618–643.
Risch, N.J., and K.Merikangas 1996 The future of genetic studies of complex human diseases. Science 273:1516–1517.
Risch, N.J., and H.Zhang 1996 Mapping quantitative trait loci with extreme discordant sib pairs: Sampling considerations. American Journal of Human Genetics 58:836–843.
Schaid, D.J. 1996 General score tests for associations of genetic markers with disease using cases and their parents. Genetic Epidemiology 13:423–449.
Schork, N.J. 1993 Extended multipoint identity-by-descent analysis of human quantitative traits: Efficiency, power, and modeling considerations. American Journal of Human Genetics 53:1306–1319.
Slatkin, M. 1991 Inbreeding coefficients and coalescence times. Genetical Research 58:167–175.
Sörbom, D. 1974 A general method for studying differences in factor means and factor structures between groups. British Journal of Mathematical and Statistical Psychology 27:229–239.
Spielman, R.S., and W.J.Ewens 1996 The TDT and other family-based tests for linkage disequilibrium and association. American Journal of Human Genetics 59:983–989.
1998 A sibship test for linkage in the presence of association: The Sib Transmission/Disequilibrium Test. American Journal of Human Genetics 62:450–458.
Spielman, R.S., R.E.McGinnis, and W.J.Ewens 1993 Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics 52:506–516.
Thompson, S.K. 1992 Sampling. New York: John Wiley and Sons.
Thomson, G. 1995 Mapping disease genes: Family-based association studies. American Journal of Human Genetics 57:487–498.
Turkheimer, E., and M.Waldron 2000 Nonshared environment: A theoretical, methodological, and quantitative review. Psychological Bulletin 126:78–108.
Vogler, G.P., and D.W.Fulker 1983 Familial resemblance for educational attainment. Behavior Genetics 13:341–354.
Vogler, G.P., W.Tang, T.L.Nelson, S.M.Hofer, J.D.Grant, L.M.Tarantino, and J.R.Fernandez 1997 A multivariate model for the analysis of sibship covariance using marker information and multiple quantitative traits. Genetic Epidemiology 14:921–926.
Wachs, T.D. 1993 The nature-nurture gap: What we have here is a failure to collaborate. Pp. 375–391 in Nature, Nurture and Psychology, R.Plomin and G.E.McClearn, eds. Washington, DC: American Psychological Association.
Weeks, D.E., and K.Lange 1988 The affected-pedigree-member method of linkage analysis. American Journal of Human Genetics 42:315–326.
Xu, S., and W.R.Atchley 1995 A random model approach to interval mapping of quantitative trait loci. Genetics 141:1189–1197.