5 Guidance for Selection and Use of Population Descriptors in Genomics Research
Pages 115-148

From page 115...
... In other situations, when descent-associated population descriptors are advisable or needed for methodological reasons, this chapter gives guidance on which approaches to consider and why. In formulating these recommendations, the committee recognizes that there exists a large amount of legacy data in which study participants have already been classified on the basis of population descriptors (Khan et al., 2022; Wallace et al., 2020)
From page 116...
... to the labels used in an analysis. The primary focus of this chapter is on the first two, namely the conceptual approaches and specific language that enable appropriate and accurate use of population descriptors in genomics research.
From page 117...
... For instance, some researchers merge genomic data sets from different sources and assign individuals to clusters on the basis of genetic similarity to each other or to reference panels. Then they assign labels to individuals based on a characteristic that is frequent in the cluster or by using the labels from the reference panels.
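The workflow described above (cluster individuals by genetic similarity, then label each cluster with a characteristic that is frequent among its members) can be sketched in Python. This is a minimal illustration under our own assumptions, not a procedure from the report. Pairwise similarities are assumed to be precomputed. Single-linkage clustering at a fixed threshold stands in for whatever clustering method a study actually uses. The function names, similarity values, and attribute labels are hypothetical.

```python
from collections import Counter


def cluster_by_similarity(sim, threshold):
    """Single-linkage clustering: i and j share a cluster if a chain of pairs,
    each with similarity >= threshold, connects them. `sim` is a symmetric
    n x n list of lists. Returns one cluster ID per individual."""
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters
    return [find(i) for i in range(n)]


def label_clusters(cluster_ids, attributes):
    """Give every member of a cluster the attribute value most frequent in it."""
    by_cluster = {}
    for cid, attr in zip(cluster_ids, attributes):
        by_cluster.setdefault(cid, []).append(attr)
    majority = {cid: Counter(vals).most_common(1)[0][0] for cid, vals in by_cluster.items()}
    return [majority[cid] for cid in cluster_ids]


# Hypothetical similarities for five participants.
sim = [
    [1.00, 0.90, 0.85, 0.10, 0.20],
    [0.90, 1.00, 0.80, 0.20, 0.10],
    [0.85, 0.80, 1.00, 0.15, 0.10],
    [0.10, 0.20, 0.15, 1.00, 0.90],
    [0.20, 0.10, 0.10, 0.90, 1.00],
]
clusters = cluster_by_similarity(sim, threshold=0.8)
labels = label_clusters(clusters, ["X", "X", "Y", "Z", "Z"])
print(labels)  # ['X', 'X', 'X', 'Z', 'Z']
```

Participant 2 reports attribute "Y" but inherits the cluster's majority label "X". A label assigned this way therefore describes the cluster, not the individual.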
From page 118...
... Therefore, the committee recommends that researchers relying on such measures explicitly refer to genetic similarity when describing their results, rather than the shorthand of genetic ancestry. An exception is for human evolutionary genetics studies explicitly aiming to learn about genetic ancestries over time or space.
From page 119...
... of genetic similarity. Once a set is designated as a genetic ancestry group, its members are often assigned a geographic, ethnic, or other nongenetic label that is common among its members. Genetic similarity: a quantitative measure of the genetic resemblance between individuals that reflects the extent of shared genetic ancestry.
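One simple quantitative measure of genetic similarity is identity-by-state (IBS) allele sharing between two individuals' genotypes. The sketch below assumes the following:

- Genotypes are coded as 0, 1, or 2 copies of a designated allele.
- Both individuals are genotyped at the same ordered set of biallelic sites.
- There are no missing calls.

IBS is chosen only for illustration. Other measures are also in common use, for example kinship estimators or distances in principal-component space.

```python
def ibs_similarity(g1, g2):
    """Identity-by-state allele-sharing similarity in [0, 1].

    1.0 means identical genotypes at every site; 0.0 means opposite
    homozygotes (0 vs 2) at every site.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # |a - b| counts allele copies that differ at a site (0, 1, or 2 of 2 copies).
    mismatched = sum(abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - mismatched / (2 * len(g1))


print(ibs_similarity([0, 1, 2, 2], [0, 2, 2, 0]))  # 0.625
```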
From page 120...
... For example, when matching the background allele frequencies of cases to controls, researchers need to identify a set of genetically similar individuals but need not rely on inferences about their genetic ancestry. Likewise, identifying individuals who are genetically similar to each other or to a reference panel is usually sufficient to delimit a subset of participants for genome-wide association studies (GWAS). Although the distinction between genetic ancestry and genetic similarity may be subtle, it is nonetheless important because it helps researchers move beyond fundamental misconceptions about population descriptors, particularly race and typological thinking.
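Matching cases to controls on genetic similarity alone, without consulting any ancestry label, might look like the greedy nearest-neighbor sketch below. Greedy matching without replacement is our simplifying assumption. Its results depend on the order in which cases are processed, and an optimal assignment method would avoid that. The similarity function, sample IDs, and genotypes are hypothetical.

```python
def neg_manhattan(g1, g2):
    """Hypothetical similarity: negative count of differing allele copies."""
    return -sum(abs(a - b) for a, b in zip(g1, g2))


def match_controls(cases, controls, similarity):
    """Greedily pair each case with the most similar control not yet used.

    cases, controls: dicts mapping sample ID -> genotype vector (0/1/2).
    similarity: function(g1, g2) -> float, larger means more similar.
    Returns {case_id: control_id}. No ancestry label is consulted.
    """
    available = dict(controls)
    matches = {}
    for case_id, g_case in cases.items():
        if not available:
            break  # more cases than controls; remaining cases go unmatched
        best = max(available, key=lambda cid: similarity(g_case, available[cid]))
        matches[case_id] = best
        del available[best]  # match without replacement
    return matches


cases = {"case1": [0, 1, 2], "case2": [2, 2, 0]}
controls = {"ctrlA": [2, 1, 0], "ctrlB": [0, 1, 1], "ctrlC": [2, 2, 2]}
print(match_controls(cases, controls, neg_manhattan))
# {'case1': 'ctrlB', 'case2': 'ctrlA'}
```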
From page 121...
... For example, a project may incorporate both geography and ethnicity simultaneously to distinguish, say, Kurds in Iraq from Kurds in Turkey. In some contexts, descent-associated population descriptors are used not as indicators of shared genetic ancestry but as proxies for shared environmental exposures (see "The Importance of Environmental Factors in Genetics and Genomics Research" in Chapter 2)
From page 122...
... Although the text does not cover every possible variation of the genetics study types, the intent is for the discussion and examples to allow researchers to understand why certain population descriptors are recommended or discouraged depending on the type of study and the goals of the research.
From page 123...
... [Table: recommended population descriptors by genomics study type. Columns: Race, Genetic Ancestry, Genetic Similarity, Ethnicity/Indigeneity, Geography, Notes. First row: Study Type 1, Gene Discovery for Mendelian Traits, with the note "Similarity suffices as a genetic ..." Key: E = descriptors could be used if appropriate proxies for environmental, not genetic, effects. See the Chapter 5 text and the decision tree in Appendix D.]
From page 124...
... Best Practice 1: To enable identification of additional cases, rather than using genetic ancestry or ethnicity, researchers should use categories based on kinship (e.g., recent genealogical ancestors), identity-by-descent information, or fine-scaled geographical or genetic similarity data.
From page 125...
... Here again, genetic similarity is more appropriate than reference to genetic ancestry. For some traits considered Mendelian (e.g., Huntington's disease)
From page 126...
... Study Type 3: Gene Discovery for Complex and Polygenic Traits
For researchers who are mapping variants that influence complex trait values, common practice is to describe their study participants as members of genetic ancestry groups that are labeled with geographic, ethnic, or racial terms.
From page 127...
... Best Practice 6: When mapping variants that contribute to complex traits, the goal is to conduct the study in a set of individuals that are genetically more similar, rather than to infer ancestry per se. Therefore, researchers should characterize their study participants in terms of their genetic similarity to one another or to a reference panel, with a specified similarity measure (Coop, 2022)
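Best Practice 6 calls for a specified similarity measure. As one illustrative choice (ours, not the committee's), the sketch below scores a participant against each reference panel. The score is the mean per-site genotype log-likelihood under that panel's allele frequencies. It assumes Hardy-Weinberg proportions and independent sites, so it ignores linkage disequilibrium. The panel names and frequencies are hypothetical. A real analysis would also record the panel version and site list so that the measure can be reproduced.

```python
import math


def panel_log_likelihood(genotypes, panel_freqs, eps=1e-3):
    """Mean per-site log-likelihood of a 0/1/2 genotype vector under a panel's
    allele frequencies (frequency of the counted allele), assuming
    Hardy-Weinberg proportions and independent sites. Frequencies are
    clipped to [eps, 1 - eps] so alleles unseen in the panel avoid log(0)."""
    if len(genotypes) != len(panel_freqs) or not genotypes:
        raise ValueError("genotypes and frequencies must be non-empty and equal length")
    total = 0.0
    for g, p in zip(genotypes, panel_freqs):
        p = min(max(p, eps), 1 - eps)
        hwe = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
        total += math.log(hwe[g])
    return total / len(genotypes)


def similarity_to_panels(genotypes, panels):
    """Score one participant against every panel; higher means more similar."""
    return {name: panel_log_likelihood(genotypes, freqs) for name, freqs in panels.items()}


panels = {"PanelA": [0.9, 0.9, 0.1], "PanelB": [0.1, 0.1, 0.9]}  # hypothetical
scores = similarity_to_panels([2, 2, 0], panels)
print(max(scores, key=scores.get))  # PanelA
```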
From page 128...
... For related reasons, the practice of performing genetic prediction after stratifying by a population descriptor can increase predictive power because it implicitly captures both genetic similarity and shared environmental exposures.
From page 129...
... Considerations Common to Gene Discovery and Prediction for Complex and Polygenic Traits
The committee recognizes that after delimiting study participants based on genetic similarity to a reference panel, researchers may want to refer to the set of study participants with a label based on ethnicity (e.g., Yoruba)
From page 130...
... that might enable genetic similarity to be assessed more precisely than is possible with group labels. Where no reference panel is available, researchers often use a group label based on an attribute that is common to the study participants, such as a subset of people who self-identify as "white British" in the UK Biobank.
From page 131...
... as a tool to better understand underlying mechanisms. When specific candidate loci or salient environmental factors are unknown, a common approach has been to use population descriptors, and in particular ancestry group labels, as a proxy for differences in allele frequencies across the genome and potentially environmental exposures.
From page 132...
... If, instead, the genetic variants are unknown, and researchers are interested in delimiting a set of individuals with similar allele frequencies, they should rely on genetic similarity rather than such descriptors as ethnicity or geography. Best Practice 12: Where the goal is to study the effect of unknown environmental exposures or possible gene–environment interactions, researchers should aim to replace or supplement population descriptors with direct information about potentially salient environmental factors.
From page 133...
... • Health Disparities Study Type 1: The sole goal is to study the role of one or more genetic variants in observed or possible health disparities between groups. Best Practice 13: In this type of study, what is needed is to consider the effects of the focal variant of interest among individuals with similar allele frequencies, so genetic similarity is the relevant descriptor to use, and racial and ethnic labels should not be used.
From page 134...
... Study Type 7: Studies of Human Evolutionary History
Population genetics studies of human history and prehistory aim to use genetics to make inferences about the genetic evolution of humans and integrate such inferences with data from archeology, history, paleontology, and other disciplines (e.g., Nielsen et al., 2017)
From page 135...
... In genetics studies of human evolutionary history, social or geographic population descriptors are often used to describe genetic ancestry groups inferred based on genetic similarity (e.g., labels may be based on shared characteristics of participants such as language spoken, self-identified ethnicity, or location sampled) in order to shed light on population history.
From page 136...
... Decision Tree for the Use of Population Descriptors
To aid a researcher contemplating a specific genetics or genomics study, the committee believes that a decision tree to systematically decide which descent-associated population descriptors to consider using and which to
From page 137...
... Harmonization of population descriptors, specifically, would allow greater interoperability among data sets in human genomics research.
From page 138...
... More specifically, for novel data collection, data should be collected per individual along multiple nongenetic dimensions and population descriptor types that may facilitate other studies. In addition, clear instructions should be provided on how downstream users can respect consent and any collaborative agreements with study participants
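One way to keep descriptor dimensions separate at collection time is a per-participant record. It has a distinct field for each dimension, plus the permitted uses that downstream users must honor. The field names below are hypothetical and do not follow any published schema. The consent check is deliberately simplistic.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParticipantRecord:
    """Hypothetical record: each population descriptor type is stored in its
    own field rather than collapsed into a single combined label."""
    participant_id: str
    self_identified_race: Optional[str] = None
    self_identified_ethnicity: Optional[str] = None
    indigenous_identity: Optional[str] = None
    birthplace: Optional[str] = None
    current_residence: Optional[str] = None
    # Reference-panel code -> similarity score; document the measure separately.
    genetic_similarity: Dict[str, float] = field(default_factory=dict)
    # Uses allowed under the participant's consent or collaborative agreements.
    permitted_uses: List[str] = field(default_factory=list)

    def permits(self, use: str) -> bool:
        """Return True only if this use is explicitly listed as permitted."""
        return use in self.permitted_uses


record = ParticipantRecord(
    participant_id="P001",
    self_identified_ethnicity="Kurdish",
    current_residence="Iraq",
    permitted_uses=["health research"],
)
print(record.permits("health research"))    # True
print(record.permits("commercial use"))     # False
print(asdict(record)["current_residence"])  # Iraq
```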
From page 139...
... In the context of genetics studies, genetic similarity to specific reference sets could have advantages for promoting harmonization. While a broader sampling of human genetic diversity is needed, current candidates for specific reference sets include, for example, data from the 1000 Genomes Project, the Human Genome Diversity Project, and the Simons Genome Diversity Project (1000 Genomes Project Consortium et al., 2015; Bergström et al., 2020; Cann et al., 2002; Mallick et al., 2016)
From page 140...
... For admixed individuals themselves, a harmonized approach using the language of genetic similarity would be to refer to the best approximating reference group. For example, "1KG-PEL-like" and "1KG-PUR-like" are two among many possible genetic similarity descriptors of Latino populations, with PEL = Peruvian in Lima, Peru, and PUR = Puerto Rican in Puerto Rico. Although such labels may be difficult for novices to read, using abbreviations for precision and conciseness is in fact a key aspect of scientific language in many fields (e.g., chemistry and the abbreviations for the elements, though the committee notes the analogy is not exact as there are no fundamental elements with regard to genetic ancestry)
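An "abbreviation plus -like" descriptor could be assigned from per-group similarity scores as sketched below. The sketch makes these assumptions:

- The scores come from a documented similarity measure computed against 1000 Genomes (1KG) population groups.
- The score values are made up for illustration.
- When no group clearly fits best, the participant is reported as "unassigned". This minimum-margin rule is our hypothetical addition, not part of the committee's proposal.

```python
def similarity_label(scores, collection="1KG", min_margin=0.0):
    """Return "<collection>-<CODE>-like" for the best-scoring reference group.

    scores: dict mapping group code (e.g., "PEL", "PUR") -> similarity score,
    larger meaning more similar. If the best score does not exceed the
    runner-up by more than min_margin (a hypothetical rule), return
    "unassigned" instead of forcing a label.
    """
    if not scores:
        raise ValueError("no reference-group scores supplied")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_code, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] <= min_margin:
        return "unassigned"
    return f"{collection}-{best_code}-like"


print(similarity_label({"PEL": -0.4, "PUR": -0.9}))                  # 1KG-PEL-like
print(similarity_label({"PEL": -0.4, "PUR": -0.9}, min_margin=0.6))  # unassigned
```

Reporting "unassigned" avoids attaching a group label to someone whom no reference group approximates well. That situation may be common for admixed individuals when only coarse reference groups are available.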
From page 141...
... Standardizing such genetic similarity procedures may be feasible and would be worth developing, especially as ongoing studies sample a fuller representation of human genetic variation. Nonetheless, the abbreviation plus "-like" approach would be less vague than the current widespread use of such terms as European genetic ancestry and African genetic ancestry, where both the reference populations and the methods for ascribing an affiliation to European or African sources are unclear and make implicit assumptions about the time frame of interest.
From page 142...
... 2023. Genetic similarity versus genetic ancestry groups as sample descriptors in human genetics.
From page 143...
... Giannakopoulou, O., K
From page 144...
... 2007. Race, skin color and genetic ancestry: Implications for biomedical research on health disparities.
From page 145...
... 2019. Clinical use of current polygenic risk scores may exacerbate health disparities: A systematic literature review.
From page 146...
... 2004. Implications of correlations between skin color and genetic ancestry for biomedical research.
From page 147...
... 2020. Genetic ancestry, skin color and social attainment: The four cities study.
From page 148...
... 2022. Challenges and opportunities for developing more generalizable polygenic risk scores.

