The Promise of Single-Cell and Single-Molecule Analysis Tools to Advance Environmental Health Research
Proceedings of a Workshop—in Brief
Over the past decade, single-cell and single-molecule technologies have rapidly advanced. These new tools enable scientists to isolate individual cells; assay each cell’s DNA, RNA, proteins, and metabolites; and map cell contents and their molecular interactions. Most traditional analytical tools study cells and their molecular contents in bulk, providing information about the average cell and molecular complexes. These traditional approaches miss important differences among individual cells and molecules. The use of single-cell and single-molecule technologies promises to generate new insights about the differences in function between individual cells and molecules, the organization and timing of responses to stimuli, cellular interactions as components of a complex system, and how these interactions may change with age, disease, and exposure to environmental stressors.
On March 7–8, 2019, the National Academies of Sciences, Engineering, and Medicine’s Standing Committee on the Use of Emerging Science for Environmental Health Decisions held a 2-day workshop to explore new single-cell and single-molecule analysis technologies. Workshop participants explored the state of this rapidly evolving field of study, reviewed preliminary uses of single-cell and single-molecule analysis tools in environmental health studies, and discussed the resources needed to make the data generated most useful to the biomedical and public health fields and to regulatory decision makers. In his introductory remarks, Kim Boekelheide from Brown University noted that single-cell and single-molecule analysis tools are widely applicable to many areas of science, but have not yet made significant inroads into the environmental health sciences. “We want to expose our community to the opportunities opened by these techniques and tools,” he said.
The workshop was sponsored by the National Institute of Environmental Health Sciences. This Proceedings of a Workshop—in Brief summarizes the discussions that took place at the workshop, with emphasis on the comments from invited speakers.
SINGLE-CELL AND SINGLE-MOLECULE ANALYSES 101
Single-cell and single-molecule analyses are not new concepts, said Norbert Kaminski from Michigan State University. He emphasized that what is novel about the approaches is “the tremendous advances” of the past few years—the result of a “convergence of life sciences, the physical sciences, mathematics, and engineering, to address complex problems in the biological sciences.” Both single-cell and single-molecule analyses are expected to lead to a better understanding and diagnosis of disease, he said.
What exactly is single-cell analysis? Norbert Kaminski defined single-cell analysis as the study of genomics, transcriptomics, proteomics, and metabolomics at the single-cell level. It is an area of study that goes back to the 1960s with the invention of the first fluorescence-based flow cytometer. New insights gained with these tools, he said, have challenged some basic assumptions in biology. For example, cells that are morphologically and genetically identical when viewed as a collection can be dramatically heterogeneous when studied individually. More importantly, this heterogeneity can change the behavior of entire populations of cells, he added.
The cell is the basic unit of life, began Aviv Regev of the Broad Institute. Being able to know cells is important for many reasons, from “basic scientific curiosity about the world” to “understanding the manifestation of disease.” “The problem is that we do not really know our cells,” she said. Traditional bulk genomics is like a fruit smoothie, said Regev. The analysis of cells and their molecular components is based on a blend of multiple cell types. Single-cell analysis is like a fruit platter. Scientists can “see all the components—the big distinctions and fine features” of individual cells. Regev explained that two major lines of technological advances within the past few years are single-cell genomics, particularly single-cell RNA sequencing, and spatial genomics. Rather than classify cells on the basis of their location or their shape, single-cell gene expression profiling allows cells to be defined as a point in the 20,000-plus dimensional gene expression space. Regev described spatial genomics as a fruit tart that enables scientists to study how cells are organized “in very precise positions in space” with respect to one another. Spatial genomics is an emerging toolbox in which the tools range from high resolution with less genomic information to low resolution with more genomic information. “These techniques are really part of the biological world now,” Regev emphasized. Single-cell analytic tools are being used to discover cell types that scientists did not know existed, to understand the order of temporal processes in biology, and to learn more about dynamic processes like the development of cells from a fertilized egg into a multicellular organism or cellular responses to environmental stimuli. “There has been dramatic growth in the scale of the data, in the ability to apply them across many systems in biology, and in biological insight,” Regev said.
Norbert Kaminski described single-molecule technologies as analytical tools that provide direct information about the behavior of individual molecules. Single-molecule imaging got its start in the 1990s with studies of ATP hydrolysis. Single-molecule analysis provides the ability to study molecules in intact living cells (how molecules move and are distributed in cells) and how molecules interact with one another in a cell, said Norbert Kaminski.
How can scientists detect single molecules? M. Selim Ünlü of Boston University described how digital detection, or counting single molecules, is “a new frontier in biomarker analysis.” An advantage of single-molecule counting technologies over analog technologies, said Ünlü, is that researchers receive a high-certainty, binary yes–no decision as to whether the molecule being measured is present. Analog outputs require interpretation to determine a precise quantity of the bulk amount of biomolecule being measured. As a result, measurement fluctuations limit the accuracy of analog measurements, he explained. By way of analogy, Ünlü pointed to the improved audio quality of digital music recordings in comparison to their analog predecessors (e.g., vinyl records). In addition, digital technologies are likely to be smaller and less expensive—and therefore more widely adopted—than corresponding analog measurement technologies, he added.
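Ünlü's digital-versus-analog distinction can be illustrated with a toy simulation (a minimal stdlib Python sketch with invented numbers, not a model of any real instrument): an analog readout reports one noisy aggregate intensity, while a digital readout thresholds each candidate spot into a yes-or-no count.

```python
import random
import statistics

random.seed(0)

def analog_readout(n_molecules, noise_sd=5.0):
    """Bulk intensity: total signal plus detector noise (arbitrary units)."""
    return n_molecules * 1.0 + random.gauss(0, noise_sd)

def digital_readout(n_molecules, spot_noise_sd=0.2, threshold=0.5):
    """Count spots whose per-spot signal clears a threshold (yes/no per molecule)."""
    return sum(
        1 for _ in range(n_molecules)
        if 1.0 + random.gauss(0, spot_noise_sd) > threshold
    )

true_count = 10
analog = [analog_readout(true_count) for _ in range(1000)]
digital = [digital_readout(true_count) for _ in range(1000)]

print(statistics.stdev(analog))   # large spread: noise swamps a small signal
print(statistics.stdev(digital))  # near zero: each molecule is a clear yes/no
```

Because each molecule either clears the threshold or does not, the digital estimate's spread stays small even when the same detector noise would swamp an aggregate signal of only a few molecules.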
Noting the success of fluorescence-based DNA microarrays as analytical tools, Ünlü pointed out that their sensitivity and dynamic range are currently limited by the size of the microarray spots and the number of molecules needed in a given spot for detection. One commercial technology designed to overcome these limitations uses zero-mode waveguides to perform real-time DNA sequencing. It has been used to determine the origin of a cholera outbreak in Haiti and an E. coli outbreak in Germany. Other approaches have used droplet microfluidics, sliding glass plates, or microwell-based immunoassays to produce what are essentially microscopic reaction vessels that enable single-molecule detection.
Researchers have also used optical and mechanical micro-resonators to detect individual molecules, but this approach has mass transport limitations tied to the low probability of a molecule landing on the small sensor element at a low analyte concentration. Ünlü and his collaborators have achieved single-molecule detection on a large sensor surface by using gold nanoparticles to tag molecules, which are then detected by light reflectance. Non-fluorescent, light-scattering metal nanoparticles are particularly useful compared to fluorescent tags because the former do not bleach or saturate, and the detection equipment is relatively simple and inexpensive. He noted that other groups have also developed single-molecule detection technologies using metal nanoparticles, many of which allow for dynamic counting of individual molecules. Ünlü’s group has used gold nanorods to detect rare mutations in the presence of 100-fold more wild-type DNA.
Counting single biomolecules is the most accurate method for measuring the concentration of those biomolecules in solution, with optical methods emerging as powerful tools for multiplexed single-molecule counting. Single-molecule detection will complement single-cell studies by allowing researchers to tally the small absolute number of molecules from a single cell, said Ünlü. New technologies with more than 1,000-fold improvements in sensitivity over fluorescence-based techniques will allow investigators to study a much larger variety of protein and nucleic acid markers, such as microRNAs. Ünlü expects these technologies will enable earlier detection of cancer and infectious diseases, identification of a host of new biomarkers with utility for both diagnostic screening and environmental monitoring, and development of liquid biopsies and other precision diagnostics.
SINGLE-CELL AND SINGLE-MOLECULE ANALYSES IN ENVIRONMENTAL HEALTH RESEARCH
Ramnik Xavier from the Broad Institute provided an overview of the use of single-cell and single-molecule analyses in environmental health research. Genetic and environmental factors contribute to health and disease, stated Xavier. Single-cell and single-molecule analyses offer the potential to understand how these environmental and genetic factors interact, he said. There are strong associations between environmental pollution and many different health outcomes, such as lung diseases like asthma and chronic obstructive pulmonary disease, metabolic liver diseases, and gastrointestinal diseases. Xavier explained that in order to study the influence of environmental exposures on health outcomes, researchers typically examine effects in peripheral blood in the hope that some of those effects might reflect what is going on in the lung, states of toxicity in the liver, or how tumor cells respond to treatment or develop resistance. But, “the peripheral blood is a poor readout of what might be going on in organ physiology and pathology and disease,” Xavier said. He argued that investigators “might need to think about the specific cell types that maintain homeostasis in these various organ systems,” and that single-cell and single-molecule resolution may enable researchers to identify how a disease is initiated, how diseases progress, or how treatments may cause additional toxicity. Xavier highlighted multiple recent examples of discoveries about gene–environment interactions enabled by single-cell analysis. For example:
- Single-cell mapping of the liver has identified some 20 distinct populations of cells and led to a basic framework of how to measure the effect of environmental insults on the development of liver disease.
- Single-cell analysis of the cells lining the small intestines has identified a host of new cell types capable of interacting with the environment and the body’s response to infectious organisms.
- Single-cell studies in leukemia have demonstrated that there are seven different states in which leukemic cells can exist and that treatment response depends on how many of these measurable states are present.
- Single-cell mapping of lung epithelial cells led to the discovery of one specific cell type that expresses the mutant cystic fibrosis transmembrane conductance regulator that causes this disease.
In studies of gene-based diseases, genetics provides a starting point, while single-cell analysis can create roadmaps that researchers can follow to discover the routes by which genes, cell types, and environmental factors interact to contribute to the development of disease, Xavier added.
One promising use of single-cell and single-molecule detection technologies, said Vasilis Vasiliou from Yale University, is for studying the metabolome as an indicator of the interactions between individuals’ genes and their exposure to substances in the environment. Untargeted or global metabolomics, he explained, aims to measure the broadest range of both endogenous and environmental low molecular weight metabolites in an extracted sample without prior knowledge of the metabolome. Ultra-high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry (UPLC-QTOF-MS) enables Vasiliou’s team to simultaneously analyze 3,000 to 5,000 metabolites covering a range of chemical classes and metabolic pathways.
Quantitative, hypothesis-driven, targeted metabolomics focuses on the molecules identified by untargeted metabolomics as being important for a particular study. In a targeted approach, explained Vasiliou, the analysis is optimized to increase the sensitivity and resolution for each of the smaller number of metabolites. It can also be used to validate untargeted metabolomic data and further expand metabolic pathway analysis.
Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry can image metabolites within tissue samples and reveal tissue metabolite heterogeneity that can be overlaid with pathological features of the tissue. Vasiliou explained that tissue-imaging mass spectrometry can yield predictive, prognostic, diagnostic, and surrogate markers and provide information on the underlying molecular mechanisms of disease. For example, imaging mass spectrometry can reveal biomarkers for drug response phenotypes and define a metabolomic profile for a specific genotype. He said that it can also be used to describe the molecular landscape in human performance applications and in extreme environments, and it can quantify the localization of drug distribution in a targeted tissue, such as a tumor or the brain.
While Vasiliou’s group has been working to improve the resolution and sensitivity of tissue-imaging mass spectrometry, it has used it in its current form to study how alcohol causes alcoholic liver disease and how the disease progresses to fibrosis and cirrhosis. His team also studied metabolomics related to bacterial biofilm formation in colon cancer and found that biofilm-associated tissues had increased levels of polyamine metabolites localized to the colon’s mucosal layer. Tissues with biofilm had increased inflammation and cellular proliferation and a pro-carcinogenic microbiome that produces polyamine metabolites.
One limitation of this approach is that it presents a snapshot of multiple cell types that can be influenced by the proportion of specific cell types in the tissue. Metabolic analysis of a population of individual cells would provide a more precise understanding of cellular biochemical status and how it might affect a cell’s phenotype, said Vasiliou. Toward that end, his group and others are working to improve the resolution of this technology to enable imaging at the level of organelles. He said at least one group has reported being able to classify cell types based on single-cell metabolomic analysis.
Other investigators have used single-cell mass spectrometry to characterize energy charge, redox state, and metabolite turnover in single human liver cells and to examine the effects of pesticide exposure on cellular metabolomics. Vasiliou’s team is using single-cell mass spectrometry to identify the mechanisms by which genomic instability triggered by substances in the environment can promote inflammation of the placenta and damage developing female embryos. It is also using this technology to trace the distribution of metal nanoparticles in individual human T cells. The ultimate goal of this work, he said, is to use metabolomics as a means of understanding how exposure to substances in the environment produces certain phenotypes.
There are challenges associated with single-cell metabolomics, said Vasiliou, including the low abundance of the molecules of interest, the rapid turnover rates of metabolites in a cell, artifacts caused by the MALDI matrix, and the limited availability of software that can integrate mass spectrometry data with microscopy images and enable linking specific metabolites to cellular pathology. Single-cell metabolomic analysis may be in its infancy, said Vasiliou, but he anticipates it will provide unique insights into the molecular mechanisms governing cell proliferation and differentiation, environment-induced changes in cellular function, and intracellular differences in susceptibility to adverse effects. He also expects single-cell metabolomics to generate a better understanding of the contributions of different cell populations to health and disease.
SINGLE-CELL AND SINGLE-MOLECULE FLUORESCENCE MICROSCOPY
Julie Biteen from the University of Michigan has been using single-molecule fluorescence microscopy to obtain high-resolution, dynamic information at a scale below a few microns. In contrast to Ünlü’s earlier statement that fluorescence-based methods have limited sensitivity, Biteen said that fluorescence is one of the most direct and amenable techniques for high-sensitivity detection. In fact, she said, fluorescence can detect single molecules with nanometer-scale localization precision using emission images obtained with a standard benchtop microscope. Moreover, this approach can image the motion of single molecules inside a cell, and if dyes of different colors are used as labels, single-molecule fluorescence imaging can track several different molecules simultaneously.
Using single-molecule fluorescence microscopy, Biteen’s team has been studying the role that a protein known as MutS plays in DNA replication and repair in the model bacterium Bacillus subtilis. Using live cells, the investigators demonstrated that MutS, which binds to mismatched base pairs that can arise in DNA during replication and recruits other proteins to repair the mismatch, explores the entire bacterial cell over about 1 second, but dwells at the site of replication for 188 milliseconds before continuing to diffuse throughout the cell. This result showed for the first time that MutS monitors the position of replication and then spots a mistake rather than responding directly to mistakes as they occur. Additional work showed that all of the proteins involved in DNA replication are dynamic and move around continuously inside the cell.
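The dwell-time measurement Biteen described can be sketched in miniature: given a single-molecule track, frames with small displacements are classified as "bound," and the lengths of consecutive bound runs become apparent dwell times. The simulation below is a hypothetical two-state toy (all rates, step sizes, and the frame time are invented for illustration), not the actual MutS analysis.

```python
import random

random.seed(1)
FRAME_MS = 10  # hypothetical camera frame time, in milliseconds

def simulate_track(n_frames, p_bind=0.02, p_release=0.05):
    """Two-state trajectory: freely diffusing (large steps) or bound (small steps)."""
    x, bound, steps = 0.0, False, []
    for _ in range(n_frames):
        if bound:
            if random.random() < p_release:
                bound = False
        elif random.random() < p_bind:
            bound = True
        step = random.gauss(0, 0.01 if bound else 0.2)  # microns per frame
        x += step
        steps.append(abs(step))
    return steps

def dwell_times(step_sizes, threshold=0.05):
    """Lengths (in ms) of consecutive runs of sub-threshold displacements."""
    runs, current = [], 0
    for s in step_sizes:
        if s < threshold:
            current += 1
        elif current:
            runs.append(current * FRAME_MS)
            current = 0
    if current:
        runs.append(current * FRAME_MS)
    return runs

track = simulate_track(5000)
dwells = dwell_times(track)
mean_dwell = sum(dwells) / len(dwells)
print(f"mean apparent dwell time: {mean_dwell:.0f} ms")
```

A real analysis must also correct for diffusing molecules that happen to take a small step (which this naive threshold counts as brief spurious "dwells") and for binding events cut off by the ends of the track.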
Michael Mancini from the Baylor College of Medicine discussed his group’s use of multiplexed, high-throughput fluorescence microscopy to study steroid receptor functions in single cells as a means of examining how various chemical mixtures can disrupt endocrine function. Their work includes studying chemical mixtures found in floodwaters that inundated thousands of homes in the aftermath of Hurricane Harvey in 2017. Mancini used an imaging system developed earlier for quantitative visualization of estrogen-regulated transcription at the single-cell level. Mancini and his collaborators showed that most analogs of the chemical bisphenol A (BPA), an additive to plastics that disrupts estrogen receptor function, can bind to one of two estrogen receptors that either accelerate or retard growth, even though these analogs have been promoted as safe replacements for BPA. Further work identified two BPA analogs that do not bind to either estrogen receptor.
For the post-Harvey floodwaters project, Mancini and his collaborators have begun working with reference sets of more than 100 chemicals provided by the Environmental Protection Agency and the Agency for Toxic Substances and Disease Registry. As a first step, they put each of the individual chemicals into their high-throughput, estrogen-amenable receptor assay platform to create imaging-based signatures for each of the chemicals. They are now starting on mixtures of chemicals and have found that some mixtures have no effect on estrogen receptor function, while others can have a large effect.
Mancini’s group also used mRNA fluorescence in situ hybridization (RNA-FISH) to count the number of mRNA molecules produced by a model estrogen receptor target gene in response to estrogen stimulation. They identified heterogeneous responses at both the individual cell and the allele levels that are dependent on hormone concentration and the length of exposure time. This surprising discovery points to some epigenetic processes that his group will continue studying using a novel, high-throughput microscope. This microscope provides large fields of view (20× objective), but at the resolution of a 100× oil lens, facilitating the quantitation of thousands of cells per condition to study individual cellular and allelic responses after brief exposures to different endocrine disruptors, he explained. With these new tools and approaches, Mancini believes throughput and analyses will be increased further, including for whole genome analysis at the single-cell level. On a final note, Mancini said that this work is not possible without high-quality antibody probes. Unfortunately, too many of the commercially available probes are poor quality, he stated. As a result, Mancini’s group is using its high-throughput imaging analysis platforms to produce and screen monoclonal antibodies under the conditions in which they will be used in the final assays.
BIOINFORMATICS AND SINGLE-CELL ANALYSIS
“Data analysis should not be the bottleneck to adopting single-cell technology,” stated Lana Garmire from the University of Michigan. She added that data analysis is not something that domain experts should let dampen their enthusiasm to adopt these technologies in their laboratories. To address this concern, Garmire and her collaborators developed Granatum, a graphical, Web-based, single-cell, whole transcriptome, shotgun sequencing (scRNA-Seq) analysis pipeline. She noted that the bioinformatics community as a whole is rapidly developing a host of new tools to help annotate single-cell data using different clustering methods, differential expression, and various visualization techniques to assemble protein networks.
Currently, Garmire’s group is developing DeepImpute, an open-source bioinformatics approach that uses a deep-learning neural network model to impute the dropout data points present in almost all scRNA-Seq data. She said DeepImpute produces more accurate results that correlate more closely with RNA-FISH data, and it uses computing resources more efficiently than other available software packages. In addition, DeepImpute is scalable—its deep-learning neural network can be trained with only a small amount of data and then used to process data from a large number of cells.
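DeepImpute itself is a deep neural network, but the underlying idea (predicting a dropped-out measurement from genes that co-vary with it) can be shown with a deliberately simple stand-in: a one-predictor least-squares fit on the non-zero cells. All expression values below are made up, and this toy is not the DeepImpute algorithm.

```python
import statistics

# Toy expression matrix: rows = cells, columns = genes (hypothetical values).
# Gene 1 tracks gene 0; zeros in gene 1 stand in for dropout events.
cells = [
    [5.0, 4.8], [3.0, 3.1], [8.0, 0.0],  # third cell: gene 1 dropped out
    [6.0, 5.9], [2.0, 2.2], [7.0, 0.0],
]

def impute_from_partner(matrix, target=1, predictor=0):
    """Fit target ~ predictor on non-zero cells, then fill in the zeros."""
    observed = [(row[predictor], row[target]) for row in matrix if row[target] > 0]
    xs, ys = zip(*observed)
    # least-squares slope and intercept by hand (stdlib only)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in observed) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [
        (row[:target] + [slope * row[predictor] + intercept] + row[target + 1:])
        if row[target] == 0
        else row
        for row in matrix
    ]

imputed = impute_from_partner(cells)
print(imputed[2])  # dropout replaced by a value predicted from gene 0
```

DeepImpute generalizes this across thousands of genes at once, learning non-linear predictors from subsets of correlated genes rather than a single linear fit.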
Garmire’s group is using single-nucleotide variations in scRNA-Seq data to identify tumor subpopulations and links between genotype and phenotype. This work has demonstrated that expressed and effective single-nucleotide variations (eeSNVs) are a better feature to use for identifying tumor subpopulations than gene expression features alone. To visualize tumor subpopulation data, her team developed what it calls bipartite graph visualization, which treats individual cells as nodes within a tumor and then generates networks using genes that correlate with eeSNVs. This analysis identifies which pathways are active in which subpopulations of cells within a tumor and provides insights into the different evolutionary processes that generated each subpopulation. Garmire believes that eeSNVs are an alternative, yet more robust feature than gene expression to demonstrate single-cell heterogeneity and that a linear-regression model can link eeSNVs with gene expression and prioritize them for further investigation.
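The bipartite-graph idea can be sketched with a toy adjacency structure (all cell and gene names below are hypothetical): cells form one node set, eeSNV-correlated genes the other, and connected components of the graph correspond to candidate subpopulations. This illustrates only the graph construction, not Garmire's actual pipeline.

```python
from collections import defaultdict

# Hypothetical presence calls: which eeSNV-correlated genes are active in which cells.
cell_genes = {
    "cell_1": {"GENE_A", "GENE_B"},
    "cell_2": {"GENE_A", "GENE_B"},
    "cell_3": {"GENE_C"},
    "cell_4": {"GENE_C", "GENE_D"},
}

def bipartite_edges(cell_genes):
    """Edges of the cell-gene bipartite graph."""
    return [(cell, gene) for cell, genes in cell_genes.items() for gene in sorted(genes)]

def subpopulations(cell_genes):
    """Group cells that share genes (connected components of the bipartite graph)."""
    gene_to_cells = defaultdict(set)
    for cell, genes in cell_genes.items():
        for g in genes:
            gene_to_cells[g].add(cell)
    seen, groups = set(), []
    for start in cell_genes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            c = stack.pop()
            if c in comp:
                continue
            comp.add(c)
            for g in cell_genes[c]:
                stack.extend(gene_to_cells[g] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups

print(subpopulations(cell_genes))  # two groups: cell_1/cell_2 and cell_3/cell_4
```

In the real analysis the edges carry correlation weights between eeSNVs and gene expression, so subpopulations separate by pathway activity rather than by simple presence calls.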
CONVERGENCE OF SINGLE-CELL ASSAYS, BIOINFORMATICS, AND SIMULATION
To Rajanikanth Vadigepalli from Thomas Jefferson University, single-cell technologies change the way atlas building1 occurs thanks to the unprecedented resolution these technologies provide. Atlases alone, however, are not that useful for producing insights into a biological mechanism, he stated. To better examine biological mechanisms, the convergence of three distinct research perspectives—in vivo data, bioinformatic analyses, and simulations—comes into play, Vadigepalli said. According to Vadigepalli’s conceptual framework, the three perspectives seek and offer answers to distinct questions: what is (in vivo data), what may be (bioinformatics), and what if (simulation); the convergence of these is essential to answer the “how does” question about specific biological mechanisms.
1 Creating maps of the human body, specific organs, or tissues that identify all of the different cell types, where the cells are located, and the genes the different cells express.
As an example, Vadigepalli discussed a project in which his team analyzed transcriptional phenotypes of single neurons in the brainstem to produce a network model of gene regulation dynamics in those neurons. They then developed a computational model to infer the duration of signaling pathway activity in single neurons and learn about the temporal heterogeneity of input (i.e., variation in stimulus history of these cells). Vadigepalli’s group took a similar approach to redefine the neural circuits involved in driving the central circadian clock.
His team has also been trying to identify the unresolved mechanisms involved in liver repair and regeneration using data from network modeling studies, single-cell gene expression, in vivo manipulation of microRNA, transcriptomics and microRNA-omics, and genome-wide transcription factor binding. He said one challenge is to understand how different cell types and functional states in those cell types are involved in the repair and regeneration process over time. Vadigepalli’s group developed a model of the multiscale control of liver repair that integrates molecular regulation, cell phenotypes, and physiological response to better understand the functional state transitions that cell phenotypes undergo to trigger and orchestrate the repair process.
One finding from this work was that shifting the dynamics of the transition that occurs in hepatic stellate cells in response to liver injury controls overall mass recovery in the liver. It appears that there are four molecular states in which hepatic stellate cells reside, with each state characterized by correlated modules of gene expression that can be measured. When the liver is injured, molecular characterization reveals shifts in the molecular states of individual hepatic stellate cells, said Vadigepalli. He concluded with a call to build such a convergence framework for research in toxicology in order to realize the promise of single-cell technologies to transform environmental health research.
LESSONS FROM THE PULMONARY FIBROSIS ATLAS
Idiopathic pulmonary fibrosis (IPF), a chronic, progressive, and highly lethal form of fibrotic interstitial lung disease, affects some 200,000 patients in the United States and approximately 6 million worldwide. There are two approved drugs that slow the disease progression, said Naftali Kaminski from the Yale School of Medicine, but neither has an obvious effect on survival or quality of life. IPF is an unusual inflammatory disease in that it is driven by recurring microinjuries that are likely caused by subclinical environmental exposures. He explained that these microinjuries to airway epithelial cells can trigger epithelium and fibroblast activation, producing an interstitial pneumonia lesion in lung tissue. There is a strong genetic predisposition to this disease, he added.
Years ago, Naftali Kaminski’s group pioneered the application of high-throughput transcriptomics to lung tissue, which led to significant insights about developmental pathways in the lung and the identification of lung-specific enzymes and microRNAs. Analyzing bulk RNA-Seq data from differentially affected regions in the same lung revealed a number of primary and secondary pathways from minimally affected regions to regions characteristic of end-stage disease. This work, however, was unable to distinguish between changes in gene expression in native cells versus infiltrating cells. To address that limitation, Naftali Kaminski and his collaborators developed two levels of an IPF cell atlas. The first level comprised scRNA-Seq profiling of dissociated lung tissue, bronchoalveolar lavage, peripheral blood mononuclear cells, and airway brushings, while a second level focused on disease microenvironments using spatial transcriptomics, nuclear RNA-Seq data from tissues characterized by microcomputed tomography, and specific protein validations. He noted that over the course of 1 year, his group will have analyzed approximately 1 million cells.
One lesson learned so far has been that dissociation of lung tissue favors macrophages and lymphocytes, which skews the resulting data. Moreover, different dissociation methods are biased toward different cell types. The solution to this problem was to process enough tissue using different methods to isolate rare, disease-associated cells in addition to the more common cell types like macrophages and lymphocytes. Another lesson from this work was that outliers matter: data from a sample that looks unusual should not be discarded automatically, because it can indicate the presence of an unusual manifestation of disease.
EXPLORING BARRIERS, CHALLENGES, AND LIMITATIONS
Integration of Single-Cell Datasets
Rahul Satija from New York University noted that there are other single-cell analytical technologies beyond scRNA-Seq that can be extremely powerful, including:
- single-cell assay for transposase accessible chromatin sequencing (scATAC-Seq) that examines chromatin accessibility,
- cellular indexing of transcriptomes and epitopes by sequencing (CITE-Seq) that enables measuring proteins and RNA simultaneously and allows for immunophenotyping of cells, and
- spatially resolved transcript amplicon readout mapping (STARmap) that provides three-dimensional positional information.
What is exciting about these technologies, said Satija, is that each one provides information about a different aspect of cellular identity, which in turn can yield a more detailed picture of the biology happening in a cell and tissue. The challenge is to integrate these data into a single analysis so that these experiments inform one another.
To illustrate an approach to address this challenge, Satija discussed a project he undertook to integrate data published in 2016 from four research groups that analyzed human pancreatic islets using four different scRNA-Seq technologies. While each of these experiments identified the same cell types, Satija wondered if a comparative analysis of cells grouped by their shared biological state might allow for asking questions about how these cells differ from one another. What makes this grouping possible is that all four datasets shared sources of variation that could be used to integrate the datasets.
In summarizing the key steps for data integration, Satija explained that the first step is to perform a joint dimensional reduction of different datasets by identifying shared correlation structures across datasets and shared cell states across technologies. Next, so-called anchor cells—those that share a biological state—are identified along with mutual near neighbors that also share the same biological state. This step increases the robustness to non-overlapping populations. Finally, these anchor cells are prioritized in analysis based on the robustness of the correlations among them. This last step is essential because the use of incorrect anchor cells is an infrequent but inevitable occurrence.
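The anchor-finding step above can be sketched as a mutual-nearest-neighbors search. In this minimal stdlib Python illustration, two hypothetical datasets are assumed to already sit in a shared reduced space (the output of the joint dimensional reduction); a pair of cells becomes an anchor only when each is the other's nearest neighbor across datasets. The anchor-scoring step is omitted.

```python
import math

# Two hypothetical datasets already projected into a shared 2-D reduced space.
dataset_a = {"a1": (0.0, 0.0), "a2": (5.0, 5.0), "a3": (9.0, 1.0)}
dataset_b = {"b1": (0.2, 0.1), "b2": (5.1, 4.9), "b3": (2.0, 8.0)}

def nearest(point, candidates):
    """Name of the candidate cell closest to the given point."""
    return min(candidates, key=lambda name: math.dist(point, candidates[name]))

def mutual_nearest_neighbors(ds_a, ds_b):
    """Anchor pairs: a is b's nearest neighbor AND b is a's nearest neighbor."""
    anchors = []
    for a_name, a_pt in ds_a.items():
        b_name = nearest(a_pt, ds_b)
        if nearest(ds_b[b_name], ds_a) == a_name:
            anchors.append((a_name, b_name))
    return anchors

# a3 and b3 have no mutual partner, so only two anchor pairs emerge.
print(mutual_nearest_neighbors(dataset_a, dataset_b))
```

The mutuality requirement is what provides the robustness to non-overlapping populations that Satija described: a cell type present in only one dataset (like a3 or b3 here) simply fails to form an anchor rather than being forced onto an unrelated partner.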
Satija and his collaborators went to the literature to find every pancreatic islet dataset available, which turned out to come from eight research groups using six different technologies. His group integrated all eight datasets, covering 25,000 pancreatic islet cells, from both mice and humans and identified species differences and similarities. Satija’s group then used the same approach to integrate datasets from 100,000 cells harvested from 21 different mouse tissues and analyzed with two different technologies to produce an organism-scale atlas. He noted that this approach allows for the identification of rare cell types that would otherwise be difficult to find from a single dataset. They were also able to integrate data for 270,000 bone marrow cells obtained from 8 human donors, which then would enable asking questions such as how gene expression changes as hematopoietic stem cells differentiate into other lineages, how these stem cells differ across donors, how they change with age, and if there are gender differences.
Satija noted that this approach also works with multi-modal data—for example, protein and mRNA data or sequencing and imaging data—to provide new biological insights, including spatial expression of genes in three dimensions. His team demonstrated that its anchoring technique can harmonize in situ gene expression data and scRNA-Seq data to produce transcriptome-wide prediction of spatial gene expression patterns. He predicted that as datasets grow in size, it should be possible to ask how tissues are structured and how cell–cell interactions affect gene expression.
A NON-LINEAR MODEL FOR ANALYZING SINGLE-CELL RNA SEQUENCING DATA
Single-cell analyses are challenging, said Barbara Engelhardt from Princeton University, because of poorly defined cell types and cell states; rare, continuous, and unseen cell types; batch effects, dropouts, and doublets; and a heterogeneous latent dimension resulting from the complexity of expression patterns obtained from certain tissues. To address these challenges, Engelhardt and her colleagues developed a low-dimensional model of scRNA-Seq to help understand variation in gene expression across a population of cells.
After describing the mathematics of the model she and her colleagues developed, Engelhardt showed how this approach does a better job of identifying and separating different cell types and visualizing variation among cells than other computational methods. She then reviewed how her team applied this analytical approach to a number of published datasets, including those from human cerebral cortex cells; human embryonic stem cells labeled by cell cycle stage; and CD34+ cells. In one analysis, this approach was able to illustrate the developmental trajectories of CD4+ T cells responding to Plasmodium infection over the course of 7 days.
Engelhardt said this non-linear, robust, latent variable model can produce a low-dimensional representation of single-cell RNA-Seq data. It can also be extended to a semi-supervised form to enable batch correction of data from disparate sources. While there are many latent manifolds to consider for scRNA-Seq data, the advantages of the approach she described are that it can propagate uncertainty to downstream tasks and that it is robust to outliers.
CONTROLLING FOR UNWANTED TECHNICAL AND BIOLOGICAL VARIATION IN SINGLE-CELL DATA
Yoav Gilad from The University of Chicago noted that practically all datasets collected today come from studies in which each individual’s cells are processed in a single batch, which means that batch and individual genotype are entirely confounded. One solution is to generate multiple replicates from each individual and spike them with External RNA Controls Consortium (ERCC) control materials and unique molecular identifiers, the latter of which he believes should be included in every single-cell analysis to reduce variation. Gilad also mentioned the importance of developing quality control metrics independently for each single-cell analysis.
Gilad and his colleagues have tackled the challenge of identifying cell types and cell cycle phases in the context of the continuum of cell states across the cell cycle. They did this by taking advantage of a system that allowed them to obtain independent information about the cell cycle: using fluorescence to measure the expression of genes whose activities are known to be cyclical while performing RNA sequencing on the same cells. The team was then able to determine the likelihood that a particular gene is expressed at a specific point in the cell cycle, allowing gene expression itself to serve as a predictor of cell cycle phase. The surprising finding from this work, said Gilad, was that a mere five genes—the top five in terms of their cyclic expression—could serve as a predictor, and that adding more genes did little to improve prediction accuracy.
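The logic of predicting cell cycle phase from cyclical gene expression can be illustrated with a toy model (a simplified stand-in, not Gilad's actual predictor): fit a sinusoid to each gene's expression as a function of phase, then assign a new cell the phase that best explains its expression.

```python
import numpy as np

def fit_cyclic_genes(phases, expr):
    """Fit a * cos(theta) + b * sin(theta) + c to each gene's
    expression as a function of cell cycle phase (radians).

    phases: (n_cells,) training phases (e.g., from fluorescence);
    expr: (n_cells, n_genes) matched expression measurements."""
    X = np.column_stack([np.cos(phases), np.sin(phases),
                         np.ones_like(phases)])
    coefs, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coefs  # shape (3, n_genes)

def predict_phase(expr_cell, coefs, grid=360):
    """Predict a new cell's phase as the angle whose fitted
    expression profile best matches the cell (least squares)."""
    thetas = np.linspace(0, 2 * np.pi, grid, endpoint=False)
    X = np.column_stack([np.cos(thetas), np.sin(thetas),
                         np.ones_like(thetas)])
    errs = ((X @ coefs - expr_cell) ** 2).sum(axis=1)
    return thetas[np.argmin(errs)]
```

With only a handful of strongly cyclic genes, the fitted curves already pin down the phase, echoing Gilad's finding that five genes sufficed.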
Gilad noted that researchers need to give more consideration to the design of their studies to better separate batch effects from individual effects. He also said that there is a need to evolve standard concepts that go beyond discrete classifications of cell state and to conduct more research into the sources of variation in single-cell analyses, especially in the context of single-cell studies for personalized medicine applications.
THE CHALLENGES OF TRANSLATING SINGLE-CELL GENOMICS TO THE CLINIC
Alex Shalek’s laboratory at the Massachusetts Institute of Technology focuses on understanding what constitutes immune system homeostasis and how to return the system to balance when homeostasis is disturbed. Toward that end, his group developed tools to use genomics for studying immune responses in the face of tremendous heterogeneity that exists in immune system cells, even those originating from a common cell type grown in the same environment. This heterogeneity might represent structure that is not yet appreciated, he explained.
Shalek’s initial work on gene expression in immune cells found that housekeeping and ribosomal genes are among the least variable across a particular set of activated immune cells, while immune response elements are among the most variable. His group then showed that key single-cell genomic results obtained using scRNA-Seq could be validated using orthogonal methods such as RNA-FISH. What made this variation so exciting, he said, was that many of the differences between cells were structured rather than random. “We can uncover cell states and circuits, as well as their markers and drivers, from the structure in gene expression co-variation,” said Shalek.
Profiling how each cell’s identity, characteristics, environment, and interactions with other cells integrate to drive immune responses requires tools for deeply and controllably assaying the relevant properties of cells. Shalek’s group has developed micro- and nano-scale tools for profiling interacting cellular ensembles, such as tissues, including a massively parallel microfluidic system that entraps individual cells in oil droplets and barcodes their mRNA. This system is capable of processing 10,000 to 100,000 cells at a cost of $0.06 per cell. However, it is not appropriate for use in a global health context because it was not designed with the necessary performance characteristics; neither are the two similar systems that other groups developed. Shalek’s solution, called Seq-Well, uses an 86,000-well system that works with low-input clinical samples as well as cell lines and other cell mixtures. Seq-Well has now been used in more than 100 labs on 6 continents to explore a wide variety of diseases.
One study his group conducted looked at chronic rhinosinusitis (CRS), a chronic allergic inflammation that occurs in about 12 percent of the population. Single-cell expression studies using the Seq-Well system revealed that CRS results from shifts in expression in epithelial cells in the sinuses that can lead to chronic inflammation and sometimes polyp formation, depending on which genes those cells are expressing. Additional studies showed there are differences in gene expression in different sinus regions and there is a lack of diversity in the epithelial ecosystem of nasal polyps due to an impaired differentiation trajectory in polyp epithelium. Shalek noted that Seq-Well allowed for unprecedented insights into an enigmatic cell state and the observation that epithelial cell diversity is reduced in the nasal polyp ecosystem. This work also showed that immune effector cytokines act directly on human tissue stem cells and that these stem cells can remember inflammatory challenges in their epigenome. It is possible, he said, that
therapies targeting the intrinsic memories formed in stem cells may lead to new approaches for treating CRS and other inflammatory diseases.
THE HUMAN CELL ATLAS: A CHALLENGE AND AN OPPORTUNITY
With emerging tools of high-resolution spatial genomics, the opportunity exists to create a new spatially and genomically defined cell atlas for all human cells, said Regev. Early insights gained from this type of cell atlas include discovering previously unknown cell types, determining the order in which cells develop, and identifying the programs cells use during their lifetime. For example, Regev and her colleagues developed an algorithm that allows them to use gene expression data to work backward in a cell’s development to identify a cell’s origin. They also compared gene expression patterns in individuals with melanoma who responded to or did not respond to immunotherapy to identify an expression signature that predicts immunotherapy resistance. This process can now be used to identify drugs that reverse the resistance signature.
Regev said that studies such as these led to the call for creating a periodic table of cells—the Human Cell Atlas (HCA)—that at a minimum identifies all of the cells in the human body and provides their characteristics for use as a reference map. Currently, HCA is proceeding along two parallel tracks: one analyzing individual cells, the other analyzing cells in the spatial context of their tissues. Work is also ongoing to optimize the pipelines for obtaining and processing tissues from both live patients and deceased transplant donors and to optimize protocols for success across different tissues. The project’s goal is a model that explains how every cell comes to be and what will happen over time when a cell is perturbed in any one of a number of ways. This information would then provide a basis for understanding, diagnosing, and monitoring human health and disease. All told, 62 countries, 848 institutes, and some 13,000 researchers have joined this effort, which is currently focusing on 12 systems in the human body.
So far, HCA released a first draft of cell atlases of the kidney, skin, lung, gut, and immune system, as well as a developmental cell atlas and a tumor cell atlas. The consortium is also working to develop a coordination platform that will include data from model organisms.
Orit Rozenblatt-Rosen from the Broad Institute discussed her work using single-cell analysis to create cell atlases and using them as roadmaps to understand how tumors develop. She noted that tumors are highly heterogeneous and comprise a complex cellular ecosystem with multiple cell types. Single-cell RNA sequencing techniques that enable characterizing tens of thousands of cells in a single experiment provide the means to identify the molecular status of the many cell types in a tumor, characterize where they are located within it, and determine the roles these different cell types play in tumor growth and resistance to treatment.
In melanoma, for example, single-cell analysis revealed two different molecular signatures: regions of a tumor that exclude T cells displayed one expression profile, while tumor regions with high levels of infiltrating T cells displayed a different expression profile. These RNA expression profiles predict disease progression, clinical response to immunotherapy, and the duration of the response. These results led to the testable hypothesis that reversing the so-called resistance program could improve therapeutic response. In fact, mice treated with a combination of drugs to reverse this resistance program did produce a positive clinical outcome. Rozenblatt-Rosen and her colleagues are now planning a clinical trial of this approach in human melanoma patients.
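Scoring cells against an expression signature of this kind is conceptually straightforward. The sketch below uses hypothetical gene names and a plain z-score average, a simplification of the scoring schemes used in practice:

```python
import numpy as np

def signature_score(expr, genes, sig_genes):
    """Score each cell for an expression program as the mean z-scored
    expression of the program's genes; cells strongly expressing the
    signature (e.g., a resistance program) score high.

    expr: (n_cells, n_genes) matrix; genes: list of gene names."""
    idx = [genes.index(g) for g in sig_genes]
    # Standardize each gene across cells (epsilon avoids divide-by-zero).
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-9)
    # Average the standardized expression of the signature genes.
    return z[:, idx].mean(axis=1)
```

In practice, published methods typically also subtract a matched control-gene background to correct for per-cell sequencing depth, a refinement omitted here.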
One challenge going forward is to develop systematic pipelines for creating tumor cell atlases that can be used to profile tumors, Rozenblatt-Rosen noted. Doing so will require experimental methods to deal with the different sources of patient tumor material in real time, tools to analyze the individual cells in those tumor samples, and computational tools to assemble spatial maps from the single-cell data. Her team has now analyzed more than 20 different tumor types and developed a method for RNA analysis from single nuclei, enabling it to use frozen tissue samples rather than having to process freshly harvested samples. They have also built an RNA sequencing toolbox that allows them to quickly identify the optimal conditions for analyzing a specific type of tissue sample, as well as other sets of tools for creating special maps, studying chromatin, analyzing proteins at a single-cell level, and multiplexing these different types of analyses.
Rozenblatt-Rosen noted that when working with clinical samples, it is important to develop tools that fit into the clinical workflow, starting with patient consent, biospecimen acquisition, and parsing the biospecimen for various diagnostic procedures. Her team worked with colleagues at the Dana-Farber Cancer Institute and Massachusetts General Hospital to craft a workflow built on a dedicated communication system that enables close coordination between the clinical and research staff. A cloud-based computation pipeline generates sharable and reproducible analytical results on close to 1 million cells within 2 hours, as opposed to the days required for conventional tumor diagnostic procedures.
In closing, Rozenblatt-Rosen said her group created a team to train clinical staff at other institutions as a means of building a critical mass of investigators to generate a human tumor atlas. “In the end, we hope we will be able to build tumor atlases that will help us generate better drugs for precision medicine,” she said.
CONSIDERATIONS FOR APPLICATION IN ENVIRONMENTAL HEALTH
Mel Andersen from ScitoVation discussed new ways to model dose–response relationships of exposure to environmental chemicals at the single-cell and single-molecule level. Chemical perturbations frequently take place within single cells, stated Andersen. Cells can have “all-or-none” responses to a chemical exposure, with more cells responding as chemical doses increase. However, this cellular response cannot arise simply from the direct coordination of multiple genes within the cell, Andersen said. When a chemical binds to a receptor and triggers a cellular response, that response reflects interactions at many promoters, sometimes hundreds or thousands, and interactions across such a large group of promoter regions would tend to be inhibitory rather than cooperative, he explained. Thus, all-or-none responses likely require activation of post-translational processes that then coordinate the transcriptional responses of multiple genes, stated Andersen.
One model to explain how cells transition from no response to complete response is non-linear positive feedback loops linked together to cause a cell to switch from one stable state to a second stable state. In some cases, this switch can be permanent—the new cellular state persists even with the removal of the activating compound. Andersen explained that researchers identified several mechanisms that can drive this type of switch, all of which depend on positive feedback loops. He emphasized that mechanistic cellular toxicity studies need to include greater consideration of the underlying network motifs regulating cellular-level responses. “Understanding the quantitative aspects of network motifs relevant in toxicity pathway perturbations will be an integral component for these cell-based dose–response assessments and for training future toxicologists,” said Andersen.
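The bistable, all-or-none behavior Andersen described can be reproduced with a minimal model of a single positive feedback loop, in which a gene product activates its own production through a sigmoidal (Hill) term. The parameters below are illustrative and not drawn from any specific pathway:

```python
def simulate_switch(x0, stimulus, n_steps=20000, dt=0.01,
                    basal=0.1, vmax=4.0, K=2.0, n=4, deg=1.0):
    """Euler-integrate
        dx/dt = basal + stimulus + vmax * x**n / (K**n + x**n) - deg * x,
    a single positive feedback loop with sigmoidal self-activation.
    With these parameters the system has two stable states."""
    x = x0
    for _ in range(n_steps):
        dx = basal + stimulus + vmax * x**n / (K**n + x**n) - deg * x
        x += dt * dx
    return x

# A transient stimulus flips the cell to the high state...
high = simulate_switch(0.0, stimulus=2.0)
# ...which persists even after the stimulus is removed,
persistent = simulate_switch(high, stimulus=0.0)
# while an unstimulated cell remains in the low state.
low = simulate_switch(0.0, stimulus=0.0)
```

Between the two stable states sits an unstable threshold: molecular components change incrementally until the threshold is crossed, after which the cell jumps to the other state, matching the incremental-then-sudden picture Andersen described.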
Andersen noted that the idea that cells respond in an all-or-none fashion points to the importance of subtyping cells to identify those that are responding to a chemical perturbation, rather than measuring responses across an entire population of cells. This, he said, is how single-cell analysis applies to environmental health research: the approach could enable the mode of action of a given chemical perturbation to be tied only to the responding cells, and studying just those cells will improve understanding of mode of action, he concluded.
Andersen noted that the challenge in toxicology and risk assessment is to identify a level of chemical exposure that is safe, which currently is done mostly by extrapolating toxicology data from studies of intact animals down to low doses. This extrapolation approach, however, does not take into consideration new tools for examining individual cellular responses to perturbations. Thus, Andersen raised the question: What direction might risk assessment take as investigators produce data on single-cell and single-molecule response patterns?
Andersen emphasized that a focus on the responses of cells and key molecules has a good chance of improving chemical risk assessment. But, he pointed out, chemical risk assessment has made little use of more mechanistic, cell-based, dose–response models, instead relying more on case law and safety factors. Andersen reiterated that cellular responses to perturbations are more likely to be all-or-none responses rather than graded responses and noted that the networks that control these sudden transitions contain molecular components that change incrementally until they reach a point where that transition occurs, producing larger phenotypic changes at the cellular level.
Andersen concluded his comments with a set of challenging questions, the answers to which could influence how to leverage findings from single-cell and single-molecule analyses in environmental health decisions:
- How do scientists place the responses of single cells into context with those occurring in a tissue or an organism?
- Would scientists consider the post-translational control processes that precede all-or-none responses to be adaptive, or would variability in these processes become a required consideration in defining population variability in a risk assessment context?
- Can scientists develop improved methods to look at cellular trajectories (i.e., moving from a basal cellular state to the fully activated cellular phenotype) that would complement more traditional approaches of measuring responses to exposure in populations of cells?
- Can the field of toxicology establish a mindset to pay as much attention to the network and signaling motifs associated with responses as it does to the responses themselves?
In his concluding remarks, Norbert Kaminski noted that researchers are using these new single-cell and single-molecule technologies to take measurements at unprecedented resolution and precision. In large part, the people developing these techniques are not biologists but rather mathematicians, engineers, and physical scientists, which points to the importance of convergence across disciplines for applications in environmental health science. He also pointed to the apparent heterogeneity among cells, even those of the same type, in the same tissue, and in the same environment, and how important HCA will be for truly understanding the biology underlying disease processes. Identifying the specific cells and specific molecules involved in a disease will open the door to identifying the changes in gene expression that are central to the disease process, which in turn will create opportunities for developing new biomarkers and therapies.
On a technical note, Norbert Kaminski stressed the importance of workflow and standard operating procedures to address issues of reproducibility of the data generated from single cells. It is encouraging, he said, that investigators are developing computational approaches to dealing with confounders that can introduce bias into these measurements.
Regarding environmental health science, it will be important to understand and characterize the “normal” state of cells in order to have a validated baseline against which to understand how chemical perturbations affect single cells, said Norbert Kaminski. Having said that, he also noted that while he can see using these new technologies to understand biology, he is not yet sure what role they can play to inform risk assessment and science policy decisions.
Disclaimer: This Proceedings of a Workshop—in Brief was prepared by Joe Alper and Keegan Sawyer as a factual summary of what occurred at the workshop, with assistance from Susan Martel. The workshop committee’s role was limited to planning the event. The statements made are those of the rapporteurs or individual workshop participants and do not necessarily represent the views of all workshop participants, the workshop committee, or the National Academies of Sciences, Engineering, and Medicine.
Workshop Committee on the Promise of Single-Cell and Single-Molecule Analysis Tools to Advance Environmental Health Research: Norbert Kaminski (Chair), Michigan State University; Lesa Aylward, Summit Toxicology; Sudin Bhattacharya, Michigan State University; Kim Boekelheide, Brown University; M. Selim Ünlü, Boston University; Ramnik Xavier, Broad Institute.
Sponsor: This workshop was supported by the National Institute of Environmental Health Sciences.
About the Standing Committee on the Use of Emerging Science for Environmental Health Decisions: The Standing Committee on the Use of Emerging Science for Environmental Health Decisions convenes public workshops to explore the potential use of new science, technologies, and research methodologies to inform personal, public health, and regulatory decisions. These workshops provide a public venue for multiple sectors—academia, industry, government, and nongovernmental organizations, among others—to exchange knowledge and discuss new ideas about advances in science, and the ways in which these advances could be used in the identification, quantification, and control of environmental impacts on human health. More information about the standing committee and this workshop can be found online at https://bit.ly/2MMwBef.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2019. The Promise of Single-Cell and Single-Molecule Analysis Tools to Advance Environmental Health Research: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/25492.
Division on Earth and Life Studies
Copyright 2019 by the National Academy of Sciences. All rights reserved.