The collection and analysis of big data are expected to transform the field of cancer research and improve cancer care. Innovations in analytic methods, coupled with large-scale efforts to collect and extract meaning from big data, are advancing progress in precision medicine initiatives and facilitating research on factors that influence cancer incidence and outcomes.
However, there is a risk that not all individuals and communities will benefit equally from these advances. There are concerns about whether applications of big data research will help to reduce existing health disparities in oncology, or whether they might inadvertently exacerbate these disparities. Analyses of big data have the potential to elucidate ways in which socioeconomic status, race and ethnicity, and other social determinants of health (SDOH) contribute to cancer incidence and outcomes, and may also identify promising avenues for intervention. At the same time, both the underrepresentation of minority and vulnerable populations in big datasets and inappropriate data analysis have the potential to generate biased and inaccurate conclusions that threaten equitable progress in cancer research and care.
1 The planning committee’s role was limited to planning the workshop, and the Proceedings of a Workshop was prepared by the workshop rapporteurs as a factual summary of what occurred at the workshop. Statements, recommendations, and opinions expressed are those of the individual presenters and participants, and are not necessarily endorsed or verified by the National Academies of Sciences, Engineering, and Medicine, and they should not be construed as reflecting any group consensus.
The National Cancer Policy Forum, in collaboration with the Committee on Applied and Theoretical Statistics, held the workshop Applying Big Data to Address the Social Determinants of Health in Oncology on October 28–29, 2019, in Washington, DC. This workshop examined SDOH in the context of cancer, and considered opportunities to effectively leverage big data to improve health equity and reduce disparities. The workshop featured presentations and discussion by experts in technology, oncology, and SDOH, as well as representatives from government, industry, academia, and health care systems. Workshop speakers examined topics that included the following:
- the impact of SDOH on cancer incidence and outcomes;
- opportunities to capture and analyze precise and meaningful data on SDOH in oncology;
- cultural, technical, and legal challenges in applying big data to address SDOH in cancer;
- ethical considerations in big data research and opportunities to improve representation and reduce bias in this research; and
- policies, practices, and research priorities to advance progress using big data to improve health and reduce health disparities in oncology.
This Proceedings of a Workshop summarizes the presentations and highlights suggestions from individual participants to apply big data to assess and address SDOH in oncology. These suggestions are discussed throughout the proceedings and are summarized in Box 1. Appendix A includes the Statement of Task for the workshop. The workshop agenda is provided in Appendix B. Speaker presentations and the webcast have been archived online.2
2 See https://www.nationalacademies.org/our-work/health-literacy-and-communicationstrategies-in-oncology-a-workshop (accessed April 9, 2020).
Several speakers discussed SDOH and their increasing importance in cancer care and research. John Ayanian, Alice Hamilton Distinguished University Professor of Medicine and Healthcare Policy and director of the Institute for Healthcare Policy and Innovation at the University of Michigan, reported that SDOH are defined by the World Health Organization (WHO) as “conditions in which people are born, grow, live, work, and age” (CSDH, 2008). WHO notes that SDOH are “shaped by the distribution of money, power, and resources at global, national, and local levels,” and that they are major drivers of health inequities, including differences in health status within and among countries. Examples of SDOH include characteristics such as income, education, occupation and employment status, access to food and housing, and neighborhood and environmental factors.3
3 For more information on SDOH, see https://catalyst.nejm.org/doi/full/10.1056/CAT.17.0312 (accessed April 9, 2020).
Several participants described the importance of SDOH for health. Scarlett Lin Gomez, professor of epidemiology and biostatistics at the University of California, San Francisco, Helen Diller Family Comprehensive Cancer Center, and director of the Greater Bay Area Cancer Registry, reported on a meta-analysis finding that social factors account for more than one-third of total deaths in the United States (Galea et al., 2011). Eliseo Pérez-Stable, director of the National Institute on Minority Health and Health Disparities, said that SDOH are powerful influences on health outcomes: “Your zip code is more important than your genetic code for your life expectancy.”
Several workshop speakers described ways in which SDOH can increase an individual’s risk of developing cancer or experiencing worse outcomes. For example, Ayanian noted that people with lower levels of education are more likely to smoke cigarettes, increasing their risk for lung cancer (NCHS, 2017). Ayanian added that access to high-quality health care is another mechanism through which SDOH affect cancer outcomes, and the quality of care that patients receive is often influenced by patient, community, and health system factors (Ayanian, 2008) (see Figure 1). Ayanian noted that SDOH affect patient outcomes across the continuum of cancer care, through prevention, early detection, treatment, and survivorship care (Ward et al., 2008). “We need to think not just about the biological factors that influence cancer outcomes and cancer risk, but the social, economic, and cultural factors, which are very much driven by politics and policies,” Ayanian said.
Robert A. Winn, director of the University of Illinois Cancer Center,4 agreed and stressed that researchers and clinicians need to broadly consider the environment and context of disease. Using the term “communityomics,” Winn encouraged examination of how factors such as obesity, violence, poverty, and stress in communities, as well as the built environment,5 lack of exercise opportunities, and environmental pollutants, affect cancer incidence and mortality. Winn added that these factors are embedded within our communities, but they tend to be fairly invisible. Sean Khozin, associate director of the Food and Drug Administration’s (FDA’s) Oncology Center of Excellence,6 added that the exposome, or the totality of nongenetic environmental exposures from conception onward (Vrijheid, 2014), is “highly influential in understanding the underlying mechanisms of oncogenesis and susceptibility to developing disease” (see Figure 2).
4 As of December 2019, Robert Winn is director of the Virginia Commonwealth University Massey Cancer Center.
5 The built environment “includes all of the physical parts of where we live and work (e.g., homes, buildings, streets, open spaces, and infrastructure).” See https://www.cdc.gov/nceh/publications/factsheets/impactofthebuiltenvironmentonhealth.pdf (accessed March 26, 2020).
6 As of 2020, Sean Khozin is global head of data strategy for Janssen Research & Development at Johnson & Johnson.
Several speakers pointed out that adverse social and economic characteristics often coexist and may exacerbate the risk for poor health outcomes. Ayanian referred to this phenomenon as “intersectionality,” or the presence of multiple interdependent and dynamic social and economic statuses, all of which can affect cancer risk, diagnosis, treatment, and patient outcomes (Williams et al., 2012). Nicole Stern, physician with the Sansum Clinic and past president of the Association of American Indian Physicians, noted that many American Indian and Alaska Native (AIAN) populations have overlapping social and economic disadvantages, including living in rural regions with limited access to health care, having low levels of educational attainment, and experiencing poverty. She said that these interdependent factors help account for the high rates of colorectal cancer mortality among AIAN populations. AIAN individuals are more likely to be diagnosed with advanced colorectal cancer and have worse prognoses compared to white populations, and colorectal cancer mortality is 39 percent higher among AIAN populations compared to white populations (Cueto et al., 2011; Jemal et al., 2004; Perdue et al., 2014).
Caroline Fichtenberg, managing director of the Social Interventions Research & Evaluation Network (SIREN) at the University of California, San Francisco, agreed that it is important to understand interconnectedness among SDOH, particularly those related to economic insecurity. “Thinking about them as isolated needs is probably not very accurate in terms of people’s lived experience,” she said. Rebecca Miksad, senior medical director of Flatiron Health, agreed and provided an example of a patient who was diagnosed with late-stage cancer. This patient held off on seeking care earlier in the disease course due to a lack of access to care, and also because she did not fully appreciate that her symptoms were indicative of serious illness. While she was receiving cancer treatment, she also had challenges with making her appointments because she could not afford to take time away from work.
Katherine Hempstead, senior policy advisor for the Robert Wood Johnson Foundation, stressed that it is important to understand that SDOH are more fundamental than social needs,7 and it is important to not exclusively focus on an individual need—such as food insecurity—when health outcomes are influenced by multiple factors. She noted that food insecurity is often driven by low income, which not only limits access to healthy foods but also affects access to adequate housing and the availability of transportation for medical appointments. She stressed that the intersectionality of SDOH requires adopting a broad perspective for both research and patient care. Gwen Darien, executive vice president of advocacy and engagement at the National Patient Advocate Foundation, agreed, but stressed that it is important to address individuals’ social needs while also understanding social and political context.
Cara James, director of the Office of Minority Health at the Centers for Medicare & Medicaid Services (CMS), suggested that there should be common terminology to distinguish social needs from SDOH, and called for collective accountability and coordination to address patients’ social needs. She said that there are several governmental agencies (including those focused on education, transportation, and housing and urban development) whose missions directly align with addressing individuals’ social needs. “Why are we looking to the health care system to fix these things when we have in practice an infrastructure that is supposed to be in place to address these larger needs?” she asked. Beth Virnig, senior associate dean of academic affairs and research and professor of health policy and management at the University of Minnesota School of Public Health, added that this also raises questions about which social factors should be included in a patient’s electronic health record (EHR) and for what purposes.
Several workshop participants discussed the importance of health equity in the context of SDOH. Gomez referenced Paula Braveman’s definition of health equity as “social justice in health,” and said that “health equity will be achieved when no one is denied the possibility to be healthy by belonging to a group that has historically been economically or socially disadvantaged” (Braveman, 2014). Gomez said that policies that influence the distribution of resources and access to health care services have been responsible for perpetuating social and economic inequities, which in turn contribute to health disparities8 (Warnecke et al., 2008). Blythe Adamson, senior quantitative scientist at Flatiron Health, quoted Yousuf Zafar from Duke University in saying, “Cancer injustice is not a scientific problem. This is a policy problem and we need to level the playing field” (Goodman, 2019). Monica Webb Hooper, associate director for cancer disparities research and director of the Office of Cancer Disparities Research at Case Comprehensive Cancer Center, added, “This is an issue of social justice: [it is a matter] of providing equal access and opportunity for all and doing our best to remove these individual and systemic barriers to fair health treatment.”
8 Health disparities are “preventable differences in the burden of disease, injury, violence, or opportunities to achieve optimal health that are experienced by socially disadvantaged populations” (CDC, 2008).
Fichtenberg noted that several health systems and payers are investing in novel approaches to improve health by addressing patients’ social needs and/or SDOH. John Steiner, senior investigator for the Institute for Health Research at Kaiser Permanente Colorado, agreed and noted, “Kaiser is [operating under the assumption] that the next big thing to improve quality and reduce costs in health care is to assess social determinants and use that information to improve care in various ways.” Fichtenberg added that a National Academies consensus study report on integrating social care into the delivery of health care concluded that “taking the social conditions in which an individual lives, works, and plays into account is critical to improving both primary prevention and the treatment of acute and chronic illness because social contexts influence the delivery and outcomes of health care” (NASEM, 2019, p. 19).
Several workshop participants stressed that accurate collection and analysis of large and diverse datasets will be critical to assessing and addressing SDOH in oncology. Nicholas Horton, Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College, noted that new data and analytic techniques are enabling researchers to answer important questions about health that were previously inaccessible. Tony Kerlavage, director of the Center for Biomedical Informatics & Information Technology (CBIIT) at the National Cancer Institute (NCI), added that there are unprecedented amounts of data produced in basic and clinical research and care delivery, and that data are increasingly generated from new sources, such as wearable devices and digital survey tools. Paul Tang, vice president and chief health transformation officer at IBM Watson Health, agreed and noted that analyzing big data has the potential to elucidate previously unknown connections between SDOH and health.
With the widespread adoption of EHRs in cancer care, analysis of real-world data can provide important insights that supplement data collected from clinical trials. Victoria Seewaldt, Ruth Ziegler Professor and Chair of Population Studies and associate director of population sciences research at City of Hope, noted that participants in clinical trials do not reflect the diversity of the patient population, and that diverse racial and ethnic populations are often underrepresented in research (Oh et al., 2015).
Khozin noted that traditional clinical trials are exclusionary. A meta-analysis of immuno-oncology trials found that among 16,000 clinical trial participants, more than three-quarters were white and there were more male than female participants (Khozin et al., manuscript in preparation). The meta-analysis also found a lack of geographic representation, with most participants enrolled from Europe and North America; among those enrolled in the United States, participants were disproportionately from the East Coast. Augusto Ochoa, director of the Stanley S. Scott Cancer Center and professor of pediatrics at the Louisiana State University Health Sciences Center New Orleans, stressed that clinical trials often underrepresent patients who receive their cancer care in community oncology practices, where the vast majority of people with cancer are treated. Clifford Hudis, chief executive officer of the American Society of Clinical Oncology (ASCO) and the Conquer Cancer Foundation, agreed and added that clinical trial participants are also more likely to be of higher socioeconomic status and from urban and suburban areas. Several speakers expressed concerns about the generalizability of clinical trial results to the broader populations of patients with cancer. “I question the representativeness of our analyses and bias from the systematic missingness of data,” Adamson said.
Several participants noted that the data collected from EHRs, administrative claims databases, and other sources are more likely to be representative of diverse populations, and are more likely to provide insights on how SDOH affect health outcomes. However, several participants noted that such insights will be valid only if the data on which they are based are broadly inclusive. Gomez said this is especially true for combined datasets that have sufficient subpopulations for analysis. For example, she referenced how her group’s current study, which combined two large databases, found that the incidence of lung cancer among women of specific Asian American, Native Hawaiian, and Pacific Islander ethnicities who did not smoke was approximately twice as high as that for white women who did not smoke. Ayanian noted that by mining health care data on multiple levels—including patient, clinician, practice, and health system characteristics—researchers can more fully assess the quality of care that people receive. Adamson noted that big data research can also help investigators evaluate health care policies and their effects on health equity.
Many speakers discussed which SDOH are of high priority in cancer research and care, as well as the data sources that could be leveraged to collect this information.
Chanita Hughes-Halbert, associate dean for assessment, evaluation, and quality improvement; professor of psychiatry and behavioral sciences; and distinguished AT&T Endowed Chair for Cancer Equity at the Medical University of South Carolina, noted that an Institute of Medicine report recommended that several social and behavioral risk factors (including social isolation, financial resource strain, intimate partner violence, tobacco or alcohol use, physical activity, stress, and depression) be assessed as part of a clinical encounter (IOM, 2014). Fichtenberg added that social risk factors collected in current research often include food security, housing security, housing quality, transportation, utility needs, legal needs, child care needs, safety concerns, and social support. In addition to capturing data on social risk, Ayanian stressed that it is important to develop and implement measures that capture information on resilience and other factors that could enable individuals to achieve good health outcomes despite exposure to social risk. Stanton L. Gerson, director of the Case Comprehensive Cancer Center and professor of hematological oncology at Case Western Reserve University, added that it is also important to collect data on where people live, because this can help researchers capture neighborhood-level SDOH. J. Leonard Lichtenfeld, deputy chief medical officer of the American Cancer Society, recommended collecting longitudinal SDOH data along with data on the performance of the economy over time. “If you don’t parallel the social determinants over time along with other considerations, you lose an opportunity to understand what happens to people [as a result of changing economic conditions],” he said.
Workshop participants described a variety of data sources that are available to assess and address SDOH in oncology (see Box 2). Fichtenberg noted that data may either be collected from individuals (individual-level data) or geographic areas (area/geographic-level data) (see Figure 3).
Marta Jankowska, assistant research scientist at the University of California, San Diego, said that satellite imaging is an important source of geographic-level SDOH data. She noted that satellite images have been available for decades, but recent improvements in both image quality and analytic techniques have improved their utility for health research. Jankowska said that a major advantage of satellite imaging is that these data are often free, ubiquitous, and captured repetitively on a frequent basis. Furthermore, satellite images are flexible enough to identify both large-scale patterns (e.g., weather) and small-scale features of the built environment.
Loretta Erhunmwunsee, assistant professor of thoracic surgery at City of Hope, noted that area-level data (such as neighborhood-level data) are often easier to collect than individual-level data, but there is often a strong correlation between the two. She suggested that future research should combine sources of neighborhood-level data and individual-level data to produce a more comprehensive perspective of SDOH. Jaime Hart, assistant professor of medicine at Brigham and Women’s Hospital and Harvard Medical School, also encouraged researchers to think at the population level when studying SDOH, and not just at the individual level. She noted that some cancer-causing factors, such as air pollution, affect groups of individuals and cannot be modified on an individual level.
Although geographic-level data can help illustrate connections between SDOH and health when combined with other data sources, several workshop participants stressed that geographic data alone may be insufficient for certain purposes. Gomez suggested that it is important to consider interactions between geographic-level and individual-level data, because environmental exposures may have differential effects depending on individual characteristics. Fichtenberg noted that while geographic data describe the environments in which individuals reside, they do not directly measure the circumstances of their lives. “Individuals may live in neighborhoods where there are a lot of challenges, but may not themselves face those challenges,” she said. Virnig noted that the larger the geographic area, the greater the chance that the mean values collected will fail to represent the full spectrum of individual experiences.
Several workshop participants described sources of individual-level data for SDOH, including EHRs and administrative claims databases. International Classification of Diseases codes in these databases can be used to identify patients with cancer, and SDOH data such as race, ethnicity, insurance status, and language fluency are often collected as well. However, several workshop participants noted that SDOH data in EHRs are often captured in unstructured data fields and require either a natural language processing9 tool or a human data processor to make them usable for research.
Workshop participants also described the use of technology to collect individual-level data that may inform research on SDOH. For example, global positioning system (GPS) data can be linked with other data sources to assess environmental exposures. Jankowska noted that this might provide more accurate information than residential data would with respect to environmental exposure, given that most people spend a substantial portion of their day away from their home. “GPS data can give us a really accurate sense of the environmental exposures and social environments that someone is actually experiencing on a day-to-day basis,” she said. Khozin added that several new digital sensors are also enabling researchers to directly capture patients’ experiences.
Several workshop participants described strategies for combining individual-level and geographic-level data to provide a more complete picture of SDOH. Erhunmwunsee described how she combined publicly available national data, including from the U.S. Census, with data from EHRs, institutional tumor registries, and national cancer databases to study the links among neighborhood, SDOH, and patient outcomes in cancer care (Erhunmwunsee et al., 2012). Hart reported that the Center for Research on Environmental and Social Stressors in Housing Across the Life Course (CRESSH)10 recently combined its administrative, satellite, meteorology, and geographic information system (GIS) databases for all of Massachusetts, enabling researchers to access area-level data on SDOH such as air pollution, economic vulnerability, racial and economic segregation, food insecurity, and housing insecurity. She said that researchers are starting to link this information to mortality data, and are also exploring ways to link to cancer registries to assess cancer outcomes in relation to SDOH (Yitshak-Sade et al., 2019).
Adamson said that big data endeavors often require collaboration across multiple disciplines, including software engineering, data science, epidemiology, and oncology. In addition, Karen Basen-Engquist, director of the Center for Energy Balance in Cancer Prevention and Survivorship at The University of Texas MD Anderson Cancer Center, said that combining datasets may also require cooperation across institutions that are reluctant to partner. Scott Ramsey, director of the Hutchinson Institute for Cancer Outcomes Research at the Fred Hutchinson Cancer Research Center, agreed that collaboration among disparate organizations can be difficult, and suggested that a unifying call to action can help spur progress. He said that an important incentive for the creation of a statewide cancer consortium was the recognition that there were significant disparities in cancer care in Washington State and the need to improve outcomes for all patients. John Steiner noted that another incentive to collaborate is to improve generalizability of findings, especially among smaller subpopulations that are underrepresented within individual datasets. “The incentive for researchers to share data across organizations is to address this generalizability issue … and to support good science,” he said.
Lynne Penberthy, associate director of the NCI Surveillance Research Program, added that an incentive for pathology laboratories and pharmacies to share data with cancer registries is to better understand how tests and therapies are being used. She described a partnership among Walgreens, the Georgia Comprehensive Cancer Registry, and a statewide hospital discharge database to identify potential pharmacy interventions that might prevent unnecessary hospitalizations and emergency room visits.
“We partner out of necessity. Big population science has always been a big team sport but now the players are much larger…. We’re going to need to collaborate on a global scale,” said John Michael Gaziano, principal investigator of the Department of Veterans Affairs’ Million Veteran Program.
Many speakers discussed the potential for artificial intelligence (AI) and new statistical techniques to improve the knowledge base on how SDOH affect cancer risk and outcomes.
Advances in AI technologies underlie much of the enthusiasm for mining big data for insights related to SDOH and cancer outcomes. Many of those advances involve greater computing power and new machine learning algorithms that enable a computer program to learn patterns and classify novel data. As Seewaldt noted, oncologists tend to base their treatment decisions on their review of clinical practice guidelines, the medical literature, and their experiences caring for thousands of patients with cancer. However, an AI program can base decisions on analyses of vastly larger datasets—such as EHRs from millions of patients—that can be combined with a multitude of other large data sources (e.g., radiologic images and reports, pathology reports, scientific and medical research databases, and clinical trial results).
AI also has the potential to improve treatment decision making for smaller population subgroups, if the data that are used to train algorithms are more diverse and contain more minority subgroups than the data that a clinician uses to inform his or her decisions. Seewaldt said that the hope for AI is to “discover the diversity that exists within each of us and the diversity in the interventions that we should each receive. Let’s harness AI as a way of embracing diversity.”
Adamson discussed natural language processing algorithms, which can be used to extract clinical information from both structured and unstructured data fields. Workshop participants also discussed machine learning, a subtype of AI in which a computer program attempts to identify patterns within a dataset without human intervention based on analysis of training datasets. I. Glenn Cohen, James A. Attwood and Leslie Williams Professor of Law and faculty director of The Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School, explained that machine learning algorithms are classified into supervised and unsupervised forms. In supervised learning, the program is presented with example inputs (e.g., pictures of skin moles) along with the desired outputs (e.g., diagnoses of benign or non-benign), and the computer program works to derive a general rule that links those inputs and outputs. In contrast, in unsupervised machine learning, no desired outputs are given, and the program is left to identify patterns independently, Cohen noted.
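The distinction Cohen drew between supervised and unsupervised learning can be illustrated with a minimal sketch. The one-feature dataset, the function names, and the choice of a nearest-centroid classifier and a two-cluster k-means routine are all assumptions made for illustration; none of them was presented at the workshop, and real diagnostic models are far more elaborate.

```python
from statistics import mean

# Hypothetical one-feature dataset (e.g., a lesion measurement in mm).
# These values are invented for illustration and are not clinical data.
features = [2.0, 2.5, 3.0, 7.0, 7.5, 8.0]
labels = ["benign", "benign", "benign", "non-benign", "non-benign", "non-benign"]


def fit_supervised(xs, ys):
    """Supervised: learn one centroid per label from labeled examples."""
    return {
        label: mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict(centroids, x):
    """Assign a new input to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def fit_unsupervised(xs, iterations=10):
    """Unsupervised: two-cluster k-means in one dimension; no labels are used."""
    centers = [min(xs), max(xs)]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


centroids = fit_supervised(features, labels)
prediction = predict(centroids, 6.5)          # uses the learned input-output rule
cluster_centers = fit_unsupervised(features)  # finds groups without any labels
```

In the supervised case the program is told which examples are benign; in the unsupervised case it only discovers that the measurements fall into two groups, and a person would still have to interpret what those groups mean.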
David Steiner, senior clinical research scientist at Google Health, said that the availability of large datasets for training and validation has enabled machine learning approaches to generate accurate and actionable information. He stressed that, with less data, machine learning is often less effective.
Cohen pointed out that AI can be used to augment clinician expertise, rather than replace clinician decision making. David Steiner agreed, and added, “AI will arguably never be able to understand context—this amazing clinical acumen for both diagnostic interpretation and patient care, which is complex. So we need to distill the information [with AI] and then allow humans to interpret it.” Tang added, “The true working of any technology should be the intersection of the technology with humanity. That produces a better human professional that helps others.”
David Steiner noted that some research has found that the combination of a pathologist and a machine learning algorithm is more accurate and efficient at diagnosing metastatic breast cancer from images of sentinel lymph node specimens than either approach alone (Liu et al., 2019; Steiner et al., 2019). Additional studies have found that AI programs are as effective as pathologists at grading prostate cancer (Nagpal et al., 2019) and can identify lung cancers missed by human experts (Ardila et al., 2019). “AI offers the opportunity to translate code into better care,” David Steiner emphasized. Cohen and Steiner noted that integrating effective AI approaches into care workflows has the potential to expand capacity for cancer diagnosis, particularly in low-resource settings.
Cohen noted that most research on AI in oncology has focused on imaging, as well as on diagnostic and prognostic applications. He said that future applications will likely include increased emphasis on treatment decision making, such as choosing between different chemotherapies or different modalities of treatment. To build these AI applications, Tang said, researchers are compiling data from EHRs and other data sources to compare an individual patient’s information to that of millions of other patients to make tailored treatment recommendations optimized to an individual patient’s characteristics.
David Madigan, dean emeritus of the Faculty of Arts and Sciences and professor of statistics at Columbia University, noted that biases in observational research can lead to inaccurate and unreliable conclusions. These errors include selection bias, measurement error, and unmeasured confounding variables. “We basically say ‘let’s assume that all of these sources of bias don’t exist’ and we proceed. But that is just pure hubris, and that’s dangerous,” he said. He stressed that it is critical that researchers properly account for uncertainty in data analyses.
Several speakers described techniques for correcting biases and improving the validity of statistical inference to facilitate the use of big data for SDOH research in oncology. Madigan described an algorithm-driven approach that estimates causal effects from observational data. This algorithm uses negative controls (interventions known to have no causal link to an outcome) as points of comparison for an intervention to derive more reliable confidence intervals for estimates of effect. “Generally these intervals are wider than the nominal confidence intervals because they account for other sources of error,” Madigan said (Schuemie et al., 2020).
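The idea of widening an interval with negative controls can be illustrated with a deliberately simplified sketch. It is not the algorithm Madigan described or the published method; it assumes that the true effect of every negative control is zero, that systematic error is roughly normal and does not depend on effect size, and it uses a crude method-of-moments fit. All function names and numbers are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # normal quantile for a two-sided 95 percent interval


def nominal_ci(log_effect, se):
    """Conventional interval that reflects sampling error only."""
    return (log_effect - Z_95 * se, log_effect + Z_95 * se)


def fit_systematic_error(nc_log_effects, nc_ses):
    """Estimate the mean and variance of systematic error from negative controls.

    Assumption: each negative control has a true log effect of 0, so any
    spread beyond sampling error is attributed to bias.
    """
    mu = statistics.fmean(nc_log_effects)
    observed_var = statistics.variance(nc_log_effects)
    sampling_var = statistics.fmean(se ** 2 for se in nc_ses)
    tau2 = max(0.0, observed_var - sampling_var)
    return mu, tau2


def calibrated_ci(log_effect, se, mu, tau2):
    """Shift by the estimated mean bias and add the bias variance to the SE."""
    center = log_effect - mu
    total_se = math.sqrt(se ** 2 + tau2)
    return (center - Z_95 * total_se, center + Z_95 * total_se)


# Hypothetical log effect estimates for six negative controls.
nc_log_effects = [0.10, 0.35, -0.05, 0.25, 0.40, 0.15]
nc_ses = [0.10] * 6

mu, tau2 = fit_systematic_error(nc_log_effects, nc_ses)
nominal = nominal_ci(0.50, 0.10)
calibrated = calibrated_ci(0.50, 0.10, mu, tau2)
print("nominal:", nominal)
print("calibrated:", calibrated)
```

Under these assumptions the calibrated interval is wider than the nominal one whenever the negative controls vary more than sampling error alone would predict, which matches the behavior Madigan described.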
Adamson described two other analytic techniques, difference-in-difference and interrupted time series analysis,11 and noted that they may also be used to improve trust in the results obtained from observational data. She noted that interrupted time series analysis was used by FDA researchers to assess the impact of a new label restriction for immunotherapies by comparing actual treatment trends to expected trends to identify the effect of the policy change (Parikh et al., 2019).
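As a rough illustration of the difference-in-difference technique mentioned above, the sketch below computes the simplest two-group, two-period estimate. It is an assumed toy setup, not the analysis from any study cited here: the function name and all values are hypothetical, and the estimate is only meaningful if the two groups would have followed parallel trends without the intervention.

```python
from statistics import fmean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group.

    Subtracting the comparison group's change removes trends that would
    have occurred even without the intervention (parallel-trends assumption).
    """
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    return treated_change - control_change


# Hypothetical share of patients receiving a treatment, before and after a policy change.
treated_pre = [0.30, 0.32]    # group affected by the policy, before
treated_post = [0.20, 0.22]   # group affected by the policy, after
control_pre = [0.28, 0.30]    # unaffected comparison group, before
control_post = [0.27, 0.29]   # unaffected comparison group, after

effect = difference_in_differences(treated_pre, treated_post, control_pre, control_post)
print(f"estimated effect of the policy: {effect:.3f}")
```

An interrupted time series analysis follows a related logic, but compares the observed post-intervention trend in a single series with the trend projected from its pre-intervention data.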
Osonde Osoba, information scientist at the RAND Corporation and professor at the Pardee RAND Graduate School, reported on various techniques to ensure patient privacy while using large digital datasets for research.12 He said that deidentified13 EHR data are often used in research, but it may be possible to reidentify individuals from these datasets using variables such as age and partial zip code. To ensure that data remain deidentified, Osoba reported on a technique called k-anonymity, in which identifying characteristics are reported in a way that they no longer represent a particular individual. For example, someone with an age of 21 could be listed as having an age range of 21 to 28 to help prevent reidentification. Osoba noted, however, that current analytic techniques to protect patient privacy are still imperfect. “They provide barriers but … there is always background information in other datasets, so even if you think your data are deidentified, by linking enough of those background databases, it may be possible to reidentify many of your participants.” Given the potential for reidentification, several speakers discussed strategies to protect patient privacy (see the “Implementing Data Policies and Standards” section).
11 Difference-in-difference and interrupted time series analyses are quasi-experimental statistical techniques to improve causal inference from observational data. For more information, see https://www.mailman.columbia.edu/research/population-health-methods/difference-difference-estimation (accessed April 10, 2020) and https://www.bmj.com/content/350/bmj.h2750 (accessed April 10, 2020).
12 Additional discussion of patient privacy concerns is included in the “Challenges” section of this Proceedings of a Workshop.
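The k-anonymity technique Osoba described can be sketched as follows. This is a minimal illustration under stated assumptions, not a production deidentification tool: the records, field names, decade-wide age bins, and three-digit zip prefixes are all hypothetical choices, and the check covers only these two quasi-identifiers.

```python
from collections import Counter


def generalize(record, age_bin=10, zip_digits=3):
    """Coarsen quasi-identifiers: age becomes a range, zip code keeps a prefix."""
    age_low = (record["age"] // age_bin) * age_bin
    age_range = f"{age_low}-{age_low + age_bin - 1}"
    zip_code = record["zip"]
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, zip_prefix)


def smallest_group(records, **options):
    """Size of the smallest set of records sharing the same generalized values."""
    counts = Counter(generalize(r, **options) for r in records)
    return min(counts.values())


def is_k_anonymous(records, k, **options):
    """True if every generalized combination is shared by at least k records."""
    return smallest_group(records, **options) >= k


# Hypothetical records; no real patient data.
records = [
    {"age": 21, "zip": "20001"},
    {"age": 24, "zip": "20002"},
    {"age": 28, "zip": "20009"},
    {"age": 27, "zip": "20005"},
    {"age": 35, "zip": "10011"},
    {"age": 38, "zip": "10012"},
    {"age": 31, "zip": "10019"},
]

print(is_k_anonymous(records, k=3))                      # coarsened values
print(smallest_group(records, age_bin=1, zip_digits=5))  # exact values
```

With exact ages and full zip codes each hypothetical record is unique, whereas the coarsened values place every record in a group of at least three. As Osoba cautioned, this kind of check does not rule out reidentification through linkage with other datasets.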
Several workshop participants discussed promising research directions for SDOH in oncology and strategies to advance progress in the field. Ayanian noted there is insufficient research linking many SDOH to health outcomes, in part due to a lack of real-world data on social determinants. He said that government data, as well as data from health providers and health plans, include only a small number of variables (such as country of origin and rural versus urban residence) that may be relevant to SDOH (Buntin and Ayanian, 2017; NASEM, 2017) (see Figure 4).
Virnig agreed that SDOH should be better documented in EHRs and claims data. She suggested that health care systems should commit to documenting social needs in medical records, because social needs affect health outcomes. To build trust within communities and facilitate expanded data collection efforts, Deborah Schrag, chief of the Division of Population Sciences at the Dana-Farber Cancer Institute and professor of medicine at the Harvard Medical School, suggested conducting research on optimal strategies for engaging patient communities and conveying the importance of collecting SDOH information.
13 According to the Health Insurance Portability and Accountability Act, under the Safe Harbor standard, data are considered deidentified when they have been stripped of 18 elements that could potentially be used to identify individuals, including names, geographic subdivisions smaller than a state, dates related to individuals (e.g., date of birth or death), telephone numbers, mailing addresses, and Social Security numbers. For more information, see 45 CFR 164.514 at https://www.ecfr.gov/cgi-bin/text-idx?SID=e2d01c39ca31d7134098caedf4c4e1e3&mc=true&node=se45.2.164_1514&rgn=div8 (accessed April 16, 2020).
Pérez-Stable noted that additional research is also needed on recently identified SDOH, including food access, policing activity, safety, and community cohesion. “We need more research on how these factors interact with our clinical reality for individuals and how we are able to evaluate that,” he said. Gomez noted that there is a need to develop better analytic techniques to understand the dynamic interactions among different SDOH to better assess the impact of intersectionality on health outcomes. Lisa Richardson, director of the Division of Cancer Prevention and Control at the Centers for Disease Control and Prevention, said that it is critical to conduct SDOH research with the purpose of improving health outcomes for individual patients.
Several speakers said that additional research is needed to identify best practices for collecting and interpreting SDOH data over time. “Social determinants of health can change over the life course so we need to think carefully about whether we are getting the right measure so that we are understanding the right association,” Virnig said. Hart agreed, noting “You are not who you are only at the time you are talking to your clinician. All your lived experiences come along with you and frame your health status at any given time, so thinking about this time dimension is key.”
Ayanian suggested adopting life-course and multi-level approaches (i.e., those targeted at both individual and neighborhood levels) to analyzing social risk factors, as well as investigating biosocial and gene–environment interactions and resilience factors. “When do people do well even though they face a heavy burden of social risk?” he asked. Gomez also stressed the importance of research to elucidate biological mechanisms, such as stress response, through which SDOH can affect health outcomes. She described a multi-level research project called Research on Prostate Cancer in Men of African Ancestry (RESPOND), which aims to identify both biological and social drivers of prostate cancer in African American men (see Box 3). Hughes-Halbert discussed the importance of understanding how psychosocial stress is converted to biological stress that contributes to the initiation and progression of disease. Osoba strongly encouraged interdisciplinary research to investigate these questions. He suggested involving experts from diverse fields, including economists, social scientists, computer scientists, and clinicians, in the study of SDOH.
Virnig suggested that researchers should conduct more policy intervention research, stressing that these endeavors are important to achieving improvements in health outcomes. She added that it is unfortunate that the results of many small-scale policy interventions on SDOH are never published, because these results should be widely disseminated to inform future research and policy. To better evaluate and share the impact of policy innovations, she recommended that researchers leverage tools such as the Learning Health System Model from the Agency for Healthcare Research and Quality (AHRQ, 2019). John Steiner agreed, and stressed that social needs interventions should be evaluated with the same rigor as clinical research. He added that there are several systematic reviews of interventions to address basic needs, but the data to guide policy changes are often limited. Fichtenberg noted that SIREN14—a national initiative housed at the University of California, San Francisco—focuses on building and disseminating evidence about health care sector strategies to improve social conditions. In addition to initiating and conducting high-quality research, SIREN also collects and disseminates research findings through its Web-based Evidence and Resource Library.
Multiple speakers noted that there is limited information about the needs of undocumented immigrant populations, and that conducting SDOH research to better understand and support their needs is challenging. George Weiner, C.E. Block Chair of Cancer Research and director of the University of Iowa Holden Comprehensive Cancer Center, asked whether it would be possible to identify and address these patients’ care needs without putting them at risk of deportation. Pérez-Stable responded that this is a key concern, and suggested that federally qualified health centers (FQHCs)15 may be best suited to obtaining more information on the needs of these individuals. He added that undocumented immigrants are not eligible for Medicaid or other government resources, and often depend on charity care. Robin Yabroff, senior scientific director of health services research at the American Cancer Society, added that it is important to identify research strategies that enable patients to safely access health care and contribute to SDOH research.
Many workshop participants discussed challenges in leveraging big data to assess and address SDOH in oncology. These challenges involve cultural, technical, and legal factors.
Several speakers said that cultural challenges—including a lack of trust in clinicians and researchers among vulnerable populations, implicit bias among clinicians and researchers, and inadequate collaboration among investigators—are impeding research on SDOH in oncology. Hudis stressed that achieving a shared vision and purpose is critical to addressing these cultural challenges: “Constructing a world where we have access to the social determinants of health data and we do something about them, will never succeed unless we have societal buy-in … [but] it’s a surmountable problem,” he said.
15 FQHCs are “community-based health care providers that receive funds from the Health Resources and Services Administration (HRSA) Health Center Program to provide primary care services in underserved areas.” They “meet a stringent set of requirements, including providing care on a sliding fee scale based on ability to pay and operating under a governing board that includes patients” (HRSA, 2019).
Darien noted that patients may be less likely to provide information on SDOH for research or health care purposes if they do not trust the researchers or clinicians. Stern added that many Native Americans mistrust the research community, given that prior interactions with researchers have often been disrespectful of the Native American communities’ culture and social norms. “Trust is an underappreciated social determinant of health,” said Webb Hooper, and cited research showing that African Americans report greater mistrust of researchers than non-Hispanic whites (Braunstein et al., 2008; Kennedy et al., 2007). Webb Hooper said that this mistrust is partly driven by prior mistreatment by medical and research institutions, as well as concerns about the misuse of their data in research. Osoba agreed, saying, “Often there is such a huge power imbalance [between researchers and community members] that it is hard for consumers to trust you to use their data responsibly.” Cohen noted that this mistrust contributes to underrepresentation of racial and ethnic minority patients in clinical research, and when this research is used to train machine learning algorithms, this underrepresentation may further perpetuate disparities in health outcomes (see the “Ensuring Equity and Quality in AI Technologies” section).
A multi-site study examining patient perspectives on the collection of social risk data in clinical settings found that 80 percent of patients found it acceptable for clinicians to ask about social needs, but only about 68 percent of patients were comfortable with social needs information being recorded in EHRs (De Marchis et al., 2019). “We think that’s because once it’s in the EHR, patients know it can spread more easily and won’t know what it’s going to be used for,” Fichtenberg said.
Arlene Ash, professor of population and quantitative health sciences and division chief of biostatistics and health services research at the University of Massachusetts Medical School, also described the difficulties of SDOH data collection. She said it is important to build patients’ understanding of the value of SDOH data as well as patients’ confidence that sensitive data will be handled appropriately. Pérez-Stable added that the collection of extensive data in clinical encounters can also interfere with the delivery of care, because these data collection efforts may require clinicians to spend more time interfacing with EHRs during patient visits. He suggested that health care systems and researchers consider strategies for data collection that do not hamper patient–clinician communication during clinical visits.
Several workshop participants described how implicit bias among clinicians and researchers can affect research prioritization and study design, as well as the collection and analysis of data. For example, Erhunmwunsee noted that clinicians often make negative assumptions about patients with greater social needs, and treat patients from minority racial and ethnic backgrounds differently (FitzGerald and Hurst, 2017; Hall et al., 2015). “When there’s an interaction between patient and clinician or researcher, we have got to be very cognizant that we ourselves come in with our assumptions,” she said. She noted that many researchers may incorrectly assume that African American patients would not want to participate in clinical trials due to mistrust, and this assumption could dissuade clinicians from offering the option of a clinical trial. To ensure that diverse participants are included in research, Virnig suggested that researchers should seek to recruit patients from varied clinical settings, and not only from academic medical centers.
Virnig noted that bias can also affect which research areas receive attention and funding. “We need to challenge our basic scientists to see the bias in what they’re choosing to [study] and what they are not.”
Several workshop participants pointed to inadequate data sharing and collaboration among researchers as additional challenges. Kerlavage noted that there are several cultural, financial, and legal barriers that create disincentives for data sharing and aggregation. Ash noted that even databases generated within states are often not shared. In her work to build models that predict health care costs and outcomes in Massachusetts, she said that it has been difficult to acquire data on nutrition and incarceration status. “We could have data that captures so much more about what’s really important for people’s health, but we don’t because we have siloed organizations,” she said. Schrag added that even when these datasets are available to researchers, acquiring the data is often time consuming and expensive. She advocated for more collaboration and database integration among federal agencies. John Steiner agreed, noting that Kaiser Permanente has taken measures to enable data sharing among its health care partners.
Patient-reported outcomes (PROs) are an important component of SDOH data in oncology, but are often difficult to collect from vulnerable populations, said Schrag. For example, it may be difficult to collect PROs using digital tools in rural areas, due to limited cellular or broadband coverage. “If we have data solutions that purely depend on technology, we will continue to have a gap and leave [some] people behind,” she said. Language differences can also complicate the collection of accurate PROs, said Abby King, professor of epidemiology and population health and professor of medicine at Stanford University. She added that when she provides the same questions to Spanish-speaking and English-speaking participants, the words have slightly different meanings in different languages, even when they have been carefully translated.
Several participants discussed the education and training needs to leverage big data analytic methods in SDOH research in oncology. “We need to develop the analytic skills and the technical infrastructure and human capacity to be able to handle these new messy and huge data sources, and to teach ourselves how to create reliable predictions based on those data,” said Ash.
Hempstead suggested educating researchers and the public on these methods so “we can create a little bit more trust and clarity around some of the methodology people use.” Jankowska also suggested training researchers to analyze large, “messy” datasets that better represent real-world context. Gomez added, “We need to be mindful of how we’re educating and training the folks who are using the data to be responsible data stewards, and also, unfortunately, to think about those people who might be using it for nefarious purposes and how to protect against that.” She also pointed out the need to educate clinicians about the importance of collecting SDOH data, because many do not see it as part of their purview.
Ahmed Hassoon, research associate in the Division of Cardiovascular and Clinical Epidemiology at Johns Hopkins University, stressed the importance of investments in training the next generation of leaders and scientists, noting that his institution has no clear educational or career path for the interdisciplinary research required for applying big data to address SDOH in oncology. Yabroff and Hudis responded that many organizations, including the National Institutes of Health, have begun to offer training grants to address this issue. Lichtenfeld added that the American Cancer Society devotes a substantial portion of its extramural grant funding to supporting young investigators in an effort to train the next generation.
John Steiner noted that Kaiser Permanente’s medical school focuses on population health and community-linked care as a key pillar of its curriculum. The school ensures that medical students receive practicum experience in conducting community health assessments and quality improvement programs that rely heavily on Kaiser’s extensive data collection efforts. “This could be a genuinely different kind of medical school training experience,” he said, noting that Kaiser’s program is inspired by the Mayo Clinic and other institutions that have incorporated population-based care into the structure of their medical schools.
Many speakers described a variety of technical challenges that are preventing researchers and clinicians from assessing and addressing SDOH in oncology, including a lack of interoperability, inadequate data standardization, and lack of appropriate validation of AI technologies.
Many speakers stressed that a primary challenge in conducting research is a lack of data standardization and interoperability among datasets, which can impede data integration and analysis. Interoperability challenges arise both in research and in clinical care. Virnig noted that it is often difficult to transfer data across EHRs, and patient data are often lost or difficult to piece together in the transition between health systems. She added that the lack of EHR interoperability also complicates the extraction of SDOH data for research purposes. Madigan noted that one initiative to address the lack of standardization in real-world data is the Observational Health Data Sciences and Informatics collaboration (see Box 4).
Several workshop participants explained how data standardization can help facilitate data linkages. Kerlavage noted that combining data from different sources requires a minimal set of demographic or clinical data that can be used to identify individuals. A lack of standardization complicates the use of these data to join datasets. Kerlavage also noted that the ability to link cancer registry data with EHRs, administrative claims data, the National Death Index, residential history, and environmental exposure data could be invaluable for research on SDOH, but these linkages are difficult to achieve due to the lack of standardization. Adopting structured coding systems for medical terminology can help facilitate data linkages, said Fichtenberg. However, she said codes for SDOH data and related activities are not comprehensively available (Arons et al., 2018), a challenge that The Gravity Project is attempting to solve (see Box 5).
Kerlavage noted that efforts to integrate diverse datasets will require more than data standardization. “Studies generate different data types, and use different analysis tools and IT infrastructures, which are not interoperable, and this leads to data collections being siloed from each other,” he said. He added that there are often no standard data collection practices, platforms, or policies. Some of these issues are starting to be addressed with cloud-based commons, which amass large amounts of data and bring analysis tools to the data, rather than the reverse. While repositories of data analysis tools are being created, Kerlavage stressed that validation of these tools is lacking. He added that sharing statistical, AI, and machine learning models is critical for validation, but there is variable willingness to share such models, and no agreed-upon approach to facilitate such sharing. To promote transparency and better science, Kerlavage suggested that researchers who share their data and tools should receive credit for doing so (e.g., in grant funding and promotion and tenure decisions).
Several speakers highlighted the challenge of ensuring that AI technologies used in cancer research and care are high quality and achieve the goal of health equity. “If there is inequity built into the existing health care delivery system, it is going to be in the data you analyze, and you have to think about what to do about it,” said Ash.
Webb Hooper noted that there is evidence of racial bias in AI algorithms used in health care. She referenced a recent study that evaluated a decision-making algorithm used widely in hospitals that systematically discriminated against African American patients (Ledford, 2019). The algorithm, designed to identify high-risk patients who needed additional support for follow-up care, based its assessment on care costs for patients over the prior year. Because African American patients tended to have lower health care costs than white patients in the prior year, the algorithm did not identify as many Black patients as candidates for the support program. However, several speakers stressed that the lower health care costs were related to a lack of access to care, and not level of need. Webb Hooper said that developing robust and unbiased algorithms will require collecting more complete and accurate data from patients. Gomez noted that developing unbiased algorithms will also require community and scientific content expertise to interpret AI results.
David Steiner agreed, and stressed that the quality of machine learning algorithms is dependent on the data used to train and validate them. Underrepresentation of subpopulation groups in these training sets can create algorithms that perpetuate biases. Steiner suggested that developers need to be aware of these risks and select datasets that are representative of the populations in which the algorithms will be used. Cohen added that developers should make AI models more transparent to enable external validation, but he noted that algorithms developed by for-profit companies are often proprietary, and the companies are unlikely to share them for validation purposes.
Cohen also noted that in current practice, patients are not routinely informed when AI technologies are used in their care, and he questioned whether this should be disclosed and whether patients should consent to their use. Cohen added that the regulation of AI technologies is an evolving area. He noted that FDA currently regulates some software associated with medical devices.16 As machine learning programs are increasingly adopted in health care, their decision-making capabilities will also evolve, which he said will further complicate their regulation.
Many speakers described legal challenges that may arise when conducting big data research on SDOH. Kristen Rosati, partner at Coppersmith Brockelman, PLC, provided an overview of the complex set of rules and regulations that aim to protect patient privacy and data security (see Box 6). She noted that some of these provisions are overlapping or may be contradictory, which can make compliance challenging for researchers and institutions. Rosati noted a recent trend toward implementation of rigorous state-based consumer privacy protection laws, beginning with the California Consumer Privacy Act.17 In addition, she said there is bipartisan support for federal legislation to strengthen consumer privacy protections.18 She added that the fragmented nature of state, federal, and international laws protecting patient privacy and their varying interpretations often create barriers to effective collaboration and data sharing.
16 For more information, see https://www.fda.gov/medical-devices/digital-health/software-medical-device-samd (accessed April 10, 2020).
17 For more information on the California Consumer Privacy Act, see https://www.csoonline.com/article/3292578/california-consumer-privacy-act-what-you-need-to-know-to-be-compliant.html (accessed March 5, 2020).
18 For more information, see https://www.congress.gov/bill/115th-congress/house-bill/4081 (accessed April 10, 2020).
Virnig said that the Privacy Rule promulgated under the Health Insurance Portability and Accountability Act (HIPAA)19 is well intentioned, but its standards for deidentification of data encumber research that aims to understand and address SDOH (IOM, 2009). Cohen added that the HIPAA Privacy Rule is designed to protect electronic health data at hospitals, medical practices, and health insurers, which are considered HIPAA-covered entities. However, data generated by HIPAA-covered entities represent only a small portion of sensitive health data (see Figure 5). He noted that data generated from other sources, such as smartphones or wearable devices, may also need patient privacy protections, but these sources are not covered by current federal regulations (Price and Cohen, 2019).
Several workshop participants also raised the challenge of protecting patient privacy after identifying information has been removed from health data (i.e., the data have been deidentified).20 Osoba said that technological barriers are usually effective at deterring unmotivated adversaries, but that privacy protections and technologies cannot guarantee complete anonymity; he added that there is nonetheless an obligation to try to protect patient privacy. Rosati agreed, and cited a study that found it is possible to identify nearly all Americans from almost any dataset with as few as 15 attributes (Rocher et al., 2019). “There is a lot of risk right now to deidentified datasets and very little protection for individuals” against reidentification from deidentified data, Rosati stressed. Given these limited protections, Rosati noted that some members of the privacy advocacy community are pushing for policies that require patient consent for any data usage, including deidentified data. But Rosati said these policies would create substantial challenges for researchers who use big data, particularly in the case of real-world data that were not initially collected for research purposes. Instead, she suggested, a better policy outcome would be to enact prohibitions against reidentification of individuals from deidentified datasets.
Jankowska noted the inherent tension between making consent forms easily understandable for patients and including the information required by Institutional Review Boards. Osoba suggested that many patients have difficulty processing complex patient consent and disclosure forms, and expressed skepticism about the possibility of creating plain-language consent forms that still convey necessary information. However, he added that he was “optimistic that there are other ways to engender [patient] trust.”
Cohen also expressed skepticism that patient consent for future data use can truly be informed: “Are we really expecting patients to understand the potential risks and benefits of sharing their data?” He suggested looking to systems-level strategies for protecting patients’ interests in health research. For example, he said that people in Denmark are willing to share sensitive data—such as genetic information in government databases—because they are confident that the country’s medical and social systems would protect them from adverse consequences, such as discrimination, if a privacy breach were to occur. Osoba pointed out that privacy expectations are often contextual, and many patients may be more willing to sacrifice some measure of privacy for causes they support.
20 HIPAA-protected data are considered deidentified when they have been stripped of all identifying information or when an expert statistician has reviewed the dataset and certified that the data are deidentified (for more information, see https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#standard [accessed March 5, 2020]).
Several workshop participants raised the question of whether the risks of privacy loss are outweighed by the potential benefits of being able to analyze patient data. Osoba pointed out that disclosure risks can disproportionately burden minority populations. “If we want to make the models fair, we need representation from minority classes, but that representation opens them up to a higher risk of [adverse effects of] disclosure,” he said. Rosati added that it is important to balance potential risks and benefits for each patient population. Webb Hooper agreed, stressing that it is incumbent on scientists to ensure that research associated with the risk of a potential loss of privacy is conducted with the explicit purpose of improving the health and well-being of the individuals who have contributed their data.
Many workshop speakers discussed policies and practices to advance progress in the application of big data to assess and address SDOH in oncology. These policies and practices included the following:
- embracing diversity and equity,
- promoting stakeholder engagement,
- translating research into action,
- implementing data policies and standards,
- revising payment policies, and
- supporting SDOH research.
Webb Hooper emphasized that “equity in research should be the fundamental component of the studies you are conducting, from conceptualization to study design, inclusion and exclusion criteria, recruitment, retention, analyses, interpretation, and dissemination.” She suggested that research teams should receive training to understand and conduct health equity work. Such teams include not just the recruiters, coordinators, and community health workers who interface the most with participants, but also the principal investigators and coinvestigators. “Purely academic or arms-length knowledge of communities is insufficient. Experiences with all populations will enhance our level of cultural competence,” she stressed.
Several workshop participants highlighted the importance of promoting diversity in the research workforce, both in demographic composition and in disciplinary expertise. Winn suggested hiring staff for cancer registries and cancer research from the neighborhoods in which this research is conducted, so that workforce diversity reflects participant diversity. Jankowska noted that ensuring greater racial and ethnic diversity among research team members can lessen the risks for developing biased algorithms. Adamson agreed, and said that greater transparency of code and algorithms can also avert biased algorithms, because peer reviewers can deeply interrogate the methods. Richardson said that an increased use of cross-disciplinary research teams would bring a diversity of perspectives and training backgrounds to SDOH research, and she called for changing the culture of academia to place greater value on collaboration. Webb Hooper added, “It is important to have a team of stakeholders at the table who have an equal voice … so that it is more of a truly collaborative process.”
Workshop participants discussed opportunities to promote diversity in clinical trials. Webb Hooper noted that discussions of underrepresentation of minority populations in clinical trials often focus on patient-level barriers, but she suggested that the onus for achieving diverse patient participation in clinical research should fall on research teams and clinicians. We need to look “inward to understand what we can do to overcome these major barriers, specifically interpersonal, institutional, and policy-level concerns,” Webb Hooper said. Hudis agreed, and suggested that federal regulators could hold drug companies responsible for ensuring diversity of participants in clinical trials for their products. Broadening clinical trial eligibility requirements could also facilitate participation from underserved communities, he said.
Ochoa described the work of the Gulf South Clinical Trials Network (see Box 7). This network employs recent college graduates from local communities in its cancer screening and prevention programs, such as NCI’s Tomosynthesis Mammographic Imaging Screening Trial. Their involvement has helped “create the trust that rapidly evolved into a large number of accruals” in clinical trials, Ochoa said. He also reiterated the importance of training personnel who interact with research participants so that they are communicating with participants as equal partners, and not in a condescending manner.
Several speakers discussed the potential of decentralized clinical trials21 to promote diversity by involving participants where they live and reducing burdens associated with travel. Investigators could leverage a variety of methods for collecting data in a decentralized manner, including sending research personnel to participants’ homes and having participants visit local laboratories for the collection of biospecimens. Khozin noted that FDA is currently working on creating guidance for decentralized clinical trials, and he suggested that digital health technologies, such as remote monitoring devices, could also be leveraged for data collection (Khozin and Coravos, 2019). He noted that digital technologies can passively collect data and reduce participant burden. For example, smart watches could be used as an alternative to asking participants to answer questionnaires on physical activity and functional status. Khozin also suggested that FQHCs could serve as key partners in decentralized clinical trials by reaching underserved communities.
21 Decentralized clinical trials enroll patients and collect data beyond the boundaries of academic medical centers by using nontraditional clinical sites and/or telehealth technologies. For more information, see https://www.ctti-clinicaltrials.org/projects/decentralized-clinical-trials (accessed March 5, 2020).
Workshop participants suggested several strategies for promoting patient and community engagement and collaboration, including building trust and incorporating patients’ voices in research design and implementation. Darien stressed that patient engagement requires trust. Building trust requires considering the patient’s perspective and following a fundamental advocacy adage, “nothing about me without me.” Ochoa stressed that effective communication with communities is also essential for building and maintaining trust.
Several workshop participants discussed community-based participatory research (CBPR)22 and other strategies to partner with communities in SDOH research. Winn noted that it is important to engage community members in health research to ensure that community voices inform research design and implementation. John Steiner agreed that it is important to fully integrate and honor patients’ perspectives in the research process. Gomez agreed that it is essential to work with communities to ensure “we’re collecting the right data and that we’re disseminating it in a meaningful and responsible way.” Hughes-Halbert suggested that CBPR and community engagement should be the overarching framework for research on SDOH. Darien agreed, noting that for all research, it is important to ask patients about their concerns, fears, and hopes, and how they think the study can be beneficial for themselves or their communities. King described a project called “Our Voice,” which encourages citizens to collect environmental data from their own communities (see Box 8). Erhunmwunsee noted that engagement and excitement from communities provide momentum and political will to support research and policy change.
22 CBPR is “a collaborative approach to research that equitably involves all partners in the research process and recognizes the unique strengths that each brings. CBPR begins with a research topic of importance to the community and has the aim of combining knowledge with action and achieving social change to improve health outcomes and eliminate health disparities” (Community Health Scholars Program, 2002, p. 2).
Gomez and Hughes-Halbert suggested prioritizing research to better understand reasons why vulnerable populations may be reluctant to participate in research. “Academics often skip the step of talking to people and understanding what they saw and thought,” Gomez said.
Seewaldt noted that both communities and clinicians will be more likely to engage in research if investigators clearly communicate how they will benefit from the research being conducted. Seewaldt added that many underserved communities are unaware of databases that can provide valuable health-related data to community health advocates. “There are a lot of data out there but I think it needs to be better advertised to the people who need those data,” Seewaldt said. Katherine Tossas-Milligan, research assistant professor of epidemiology at the University of Illinois Cancer Center, added that researchers should brief community leaders about research results. “Those individuals have collective power in our communities and we don’t currently have a national strategy for informing and disseminating data to politicians so they can understand and use them to benefit [their] community,” she said.
Several participants discussed strategies to ensure that research findings are leveraged to improve policy and the delivery of health care. Ayanian said the Metropolitan Chicago Breast Cancer Task Force23 is an example of effective translation of research into practice. After data identified a widening gap in breast cancer mortality between African American and white women in Chicago in the early 2000s, the task force convened community, health system, and political leaders to reduce disparities in access to high-quality screening services, reduce wait times for treatment, and improve treatment quality. Ayanian said recent data suggest that this disparity in breast cancer mortality has begun to narrow (Sighoko et al., 2017). He noted that a comparable program in New York City effectively translated data and community engagement into policy actions to eliminate racial and ethnic disparities in colorectal cancer screening (Itzkowitz et al., 2016).
Ramsey reported on a Washington State collaboration of clinicians, patients, and health insurers aimed at improving the quality of cancer care by promoting transparency through public reporting of clinical outcomes and cost data. By linking data from diverse sources, including EHRs, administrative claim databases, and cancer registries, this group has created a comprehensive and continuously updated database of approximately 160,000 people with cancer. The collaborative produces periodic reports on clinic-level quality measures and costs for oncology patients in the state. Ramsey said a recent report showed that disparities in cancer survival are associated with insurance status, income, education, race, neighborhood, and rurality (Hutchinson Institute for Cancer Outcomes Research, 2019). These findings galvanized the Washington State Health Care Authority and other policy makers to adopt the report’s cancer metrics and implement policies to reduce disparities. He said the report has also motivated clinicians and payers to develop quality improvement initiatives.
Several participants suggested strategies for improving the translation of research findings into actions that address SDOH in cancer care. Pérez-Stable suggested that health systems should be held accountable for health equity, in the same way they are accountable for the quality of health care. James noted that the movement toward value-based care has helped health care systems recognize that addressing patients’ social needs is an important strategy to improve care and reduce costs. John Steiner agreed, and added that Kaiser Permanente has prioritized opportunities to meet the social needs of its members (see Box 9). Other participants suggested empowering clinicians to modify treatment plans based on patients’ social needs, and making greater use of nurses, patient navigators, and social workers to address SDOH. Khozin noted that many clinicians are well-versed in collecting SDOH data but have found it hard to intervene on SDOH because they have not traditionally been within the purview of health care. He also said that a contributor to clinician burnout is a sense of powerlessness in helping patients address social factors that are affecting their health.
Many speakers discussed strategies for data policies and standards, including best practices for data analysis and privacy protection.
Participants offered numerous suggestions for collection and analysis of data to improve data completeness and promote interoperability. As reviewed in the section on technical challenges, several speakers stressed the need for data standardization. For example, Schrag said that there are at least 10 different ways to measure food insecurity. She suggested that SDOH could be included in NCI’s list of common data elements for clinical trials. Ayanian said that the National Academies report on accounting for social risk factors in Medicare payment provided recommendations for collecting SDOH data in health care (NASEM, 2017). He added that Massachusetts developed standards for collecting data on race, ethnicity, and preferred language from patients, which has improved data quality and could serve as a model for other states (Jorgensen et al., 2010). Richard Schilsky, chief medical officer and executive vice president of ASCO, described the Minimal Common Oncology Data Elements (mCODE) initiative, led by ASCO, the American Society for Radiation Oncology, FDA, MITRE Corporation, and the Alliance for Clinical Trials in Oncology Foundation, to identify a core set of structured data elements for oncology EHRs (HL7 FHIR, 2019; mCODE, 2019). He said mCODE provides both a common data language and an open-source, nonproprietary data model for interconnectivity across systems to facilitate clinical care and research. Schilsky noted that approximately 70 data elements have been released, but few capture SDOH. He encouraged researchers to use the mCODE website24 to submit additional SDOH data elements for consideration.
Kerlavage noted that despite efforts by NCI and other organizations, there is “little consistency in data elements, terminology, or data models across repositories, basic research studies, cohort studies, or clinical trials.” Kerlavage called for creating a virtual national registry with automated linkages that connect cohort studies and clinical trials to cancer registries in all 50 states. He said this would enable synthesis of a person’s entire health record and social determinants data, and the ability to follow the person’s health longitudinally. James also suggested creating longitudinal databases that could be used to assess the associations between SDOH and cancer incidence and outcomes.
Ash suggested creating federated data models25 to facilitate the use of data from disparate datasets. Penberthy added that federated analytics are needed to collaborate on a global scale. Schrag suggested that data ecosystems should be designed to promote flexibility. “We have to design systems not just based on what is important today, but for the new things that are going to be important tomorrow that we haven’t even conceived of yet. We can’t preconceive the future, but we can create systems that are able to accommodate change,” Schrag said.
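The federated approach that Ash and Penberthy described can be illustrated with a minimal sketch. It assumes each participating site shares only aggregate statistics (a count and a sum) rather than patient-level rows; the class and function names, and the example values, are hypothetical and do not describe any system discussed at the workshop.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics a site agrees to share (no patient-level rows)."""
    n: int
    total: float


def summarize_locally(values: Iterable[float]) -> SiteSummary:
    """Runs inside each site; only the count and the sum leave the site."""
    vals = list(values)
    return SiteSummary(n=len(vals), total=float(sum(vals)))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    """Runs at the coordinator; combines site aggregates into one mean."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Hypothetical values: days from diagnosis to first treatment at two sites.
site_a = summarize_locally([30, 45, 60])
site_b = summarize_locally([20, 40])
print(pooled_mean([site_a, site_b]))  # 39.0
```

The coordinator obtains the same mean it would compute from the merged data, but never receives individual records. Aggregates from very small groups can still be disclosive, so a real deployment would likely need additional safeguards such as minimum cell sizes.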
Hughes-Halbert, Gomez, and Pérez-Stable suggested greater inclusion of SDOH in EHRs and a more structured and standardized way of collecting them. Gomez said that such standardization will enable researchers to combine datasets for research purposes. This may be accomplished through cross-stakeholder efforts like mCODE. Gomez also suggested accreditation requirements or professional guidelines for hospitals and clinical practices, which could further incentivize data standardization and specify which SDOH data should be collected.
Adamson recommended standardization of policies and procedures for data curation. Hempstead stressed that standardization for big data is lacking, especially for data being generated in the private sector. Tang suggested standards for data transparency, which would help ensure data accuracy and the validity of how researchers are using and interpreting the data. Virnig said there should also be standards for data quality, and that these should depend on how the data will be used (e.g., whether they will be used for population studies versus studies of more narrow cohorts).
25 Federated data systems allow data from disparate datasets to be accessed through a unified interface without requiring the merger of data. For more information, see https://www.sciencedirect.com/topics/computer-science/data-federation (accessed March 5, 2020).
Gomez suggested that researchers should aim to collect data that are as granular as possible, with linkages across data sources and studies to improve statistical power to make inferences about small population subgroups (NASEM, 2018). For example, Gomez suggested using finer distinctions among heterogeneous Asian and Hispanic populations. Williams added that it is important for research to distinguish between African Americans and African immigrants living in the United States. Stern pointed out that American Indians are often categorized as Latino because they are immigrants from Latin countries, and that many urban American Indians are misclassified. Gomez added that sexual and gender minority populations also need to be better distinguished. Ayanian stressed that classification standards should be based on how patients view themselves. Darien added that although concerns are often expressed about whether self-reported patient data are valid, the accuracy of reporting by clinicians should also be considered.
Several speakers stressed the importance of protecting patient privacy and preventing data misuse, especially with SDOH data. For example, Rosati said that health care institutions should prepare data use agreements to control downstream use of deidentified datasets, even though data use agreements are not legally required. Weiner said that there should be penalties for misuse (IOM, 2009). Osoba agreed, and suggested that part of protecting privacy might involve more proactive ways of compensating people when privacy breaches do occur. Rosati stressed that new privacy legislation is needed to protect individuals from having their data reidentified, because current federal law does not provide this protection.
Rosati also suggested the creation of new federal data protections that supersede individual state laws so there is consistency across the country. She noted that a law is needed that recognizes the importance of the use of big data for research, but better protects individuals than current law. For example, although the Genetic Information Nondiscrimination Act26 is aimed at countering discrimination by health insurers, Cohen and Rosati pointed out that it does not prevent discrimination by life, disability, or long-term care insurers. Fichtenberg also suggested guarding against discriminatory uses of SDOH data when determining services offered or when making population health management decisions.
26 For more information, see https://www.federalregister.gov/documents/2016/05/17/2016-11557/genetic-information-nondiscrimination-act (accessed April 10, 2020).
Virnig suggested making the HIPAA Privacy Rule’s list of 18 “identifiers” that must be removed for deidentification more flexible to allow for trade-offs such as keeping zip code information under certain circumstances. “The rules around confidentiality and deidentification are unnecessarily restrictive in ways that aren’t protecting privacy, but are preventing us from answering [important] questions,” said Virnig. For example, she said that the requirement that the date of a health event be removed makes it nearly impossible to study adverse events associated with the subsequent illness or treatment.
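The trade-off Virnig described can be made concrete with a simplified sketch of identifier generalization. It is an illustration only, not a compliance implementation: the field names, the three-item identifier set, and the `low_population_prefixes` input are assumptions made for the example. The sketch keeps only the first three digits of the zip code (masking prefixes flagged as low population) and reduces event dates to the year.

```python
from datetime import date
from typing import Any, Dict, FrozenSet

# Illustrative subset only; the actual rule lists 18 categories of identifiers.
DIRECT_IDENTIFIERS = frozenset({"name", "phone", "email"})


def generalize_record(
    record: Dict[str, Any],
    low_population_prefixes: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Drop direct identifiers and coarsen zip code and event date."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        prefix = str(out.pop("zip"))[:3]
        # Prefixes covering small populations are masked entirely.
        out["zip3"] = "000" if prefix in low_population_prefixes else prefix
    if "event_date" in out:
        # Only the year survives, so the order of same-year events is lost.
        out["event_year"] = out.pop("event_date").year
    return out


# Hypothetical records: a treatment and an adverse event six weeks apart.
treatment = {"name": "A. Patient", "zip": "98109",
             "event_date": date(2019, 3, 14), "event": "treatment"}
adverse = {"name": "A. Patient", "zip": "98109",
           "event_date": date(2019, 4, 25), "event": "adverse event"}

print(generalize_record(treatment))
print(generalize_record(adverse))
```

After generalization both records carry only `event_year == 2019`, so an analyst can no longer tell whether the adverse event followed the treatment, which is the kind of analytic loss the paragraph above describes.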
Several workshop participants discussed ethical and privacy considerations in the collection and use of secondary data, which are data collected incidentally in addition to the primary data of interest in a study. For example, health research using pharmacy data may also capture information about nonmedical purchases. These secondary data may be used productively in research to promote the public good, but also present the potential for misuse. Osoba suggested that access to secondary data should be allowed for research that adheres to the normative goals of fairness and privacy. However, Rosati noted that it is often difficult to assess what a normative goal would be, given the decentralized health care system and the variable nature of how datasets are regulated. Cohen suggested that decisions about the appropriateness of using secondary data should focus on whether the most vulnerable populations would benefit or be harmed by this use. “That is something you can look at systematically that could be normative criteria you might put in place,” he said.
Fichtenberg suggested considering how to give patients the ability to access and to monitor their EHRs and provide more confidence that those records will not be used inappropriately. Cohen also suggested considering the use of patient collectives, in which patients place their data into a legal trust with a trustee to oversee its use. “Why are we putting so much pressure on individual patients rather than thinking about patients as a collective?” he asked.
Osoba suggested creating a federated learning system for AI. Instead of one central person who created the machine learning system having access to all of the health data used to fine-tune it, the creator would see only the updates to the system that improve its quality without knowing the specific health data on which those updates are based.
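A minimal sketch of the idea Osoba described is shown below, assuming a toy one-parameter model (y ≈ w·x) trained by sample-weighted averaging of site updates. The function names, learning rate, and data are hypothetical; this is not a description of any system presented at the workshop.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (x, y) pair held privately at one site


def local_update(w: float, data: Sequence[Record], lr: float = 0.1) -> Tuple[float, int]:
    """Runs at a site: one gradient step on squared error for y ~ w * x.

    Returns only the change in w and the sample count, never the records.
    """
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return -lr * grad, len(data)


def aggregate(w: float, updates: List[Tuple[float, int]]) -> float:
    """Runs at the coordinator: sample-weighted average of site updates."""
    total = sum(n for _, n in updates)
    return w + sum(delta * n for delta, n in updates) / total


def train(sites: List[Sequence[Record]], rounds: int = 200) -> float:
    """Alternate local updates and central aggregation for a fixed number of rounds."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(w, data) for data in sites]
        w = aggregate(w, updates)
    return w


# Hypothetical toy data following y = 2x, split across two sites.
sites = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = train(sites)
print(round(w, 6))
```

The coordinator sees only each site's weight change and sample count. Shared updates can still leak some information about the underlying data, so this pattern is usually treated as one safeguard among several rather than a complete privacy guarantee.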
Several workshop participants discussed how payment policies may hamper SDOH research. Hudis noted that Medicaid is the only insurance program in the United States not required to provide coverage for clinical trial participation. He suggested that this policy may contribute to clinical trial disparities, noting that approximately 20 percent of Medicaid beneficiaries are African American (KFF, 2013) but African Americans constitute less than 4 percent of clinical trial participants (Nazha et al., 2019). “This is a fixable disparity,” Hudis stressed, noting that there is currently a draft bill, the Clinical Treatment Act, aimed at rectifying this.27
27 For more information, see https://www.congress.gov/bill/116th-congress/house-bill/913 (accessed April 29, 2020).
Schrag added that often the most affluent health care systems doing the most cutting-edge clinical oncology trials do not accept Medicaid. She suggested making acceptance of Medicaid insurance mandatory for all health care systems to help alleviate disparities in care and outcomes. James agreed, adding that “we have an inherently discriminatory health care system” due to factors such as statutory requirements that determine CMS reimbursement based on where clinicians practice. Ash responded, “A policy could disallow enormous differences in what clinicians are paid to provide the same service depending upon whether they operate in a poor community health center or if they operate in the fanciest academic medical center.” She noted that some states have taken action to disallow such discrepancies in reimbursement.
Schrag suggested that as precision medicine becomes more important in oncology, insurance policies should cover the cost of genetic sequencing and validated “omics” testing for everyone. “We really need information representative of everyone, and we have a lot of work to do to get there,” she stressed. Schrag also suggested policies to support the use of telemedicine, including reimbursement for virtual consultations. “We need to deliver precision medicine at scale without expecting that every patient is going to be able to make it to one of the major cancer centers where there is expertise. Our telehealth policies are really outmoded and have to catch up, which involves payment reform and incentives, to accomplish that goal,” Schrag stressed.
Fichtenberg said that financial incentives are often not well-aligned to address patients’ social care needs. For example, she said that the Massachusetts Medicaid program provides a payment supplement for health systems that provide care to patients who are experiencing homelessness or who live in economically deprived neighborhoods. Although health systems are compensated at a higher rate for providing care to these individuals, there are no financial incentives in the current system to reduce homelessness. Schrag said mechanisms are needed to hold health care systems accountable for population health. “We don’t want to set up perverse incentives, but we do need some incentives to make health systems feel responsible for the health of the underlying population, without cherry-picking [patients who have lower social risks] for whom it’s easy to deliver high-quality cancer care,” she said. Hughes-Halbert agreed that health care systems need to develop strategies to address social risk factors among their patient populations. Ayanian stressed that actionable policies to address SDOH should be targeted at every level: nationally, within states, within communities, and within health care systems.
Many workshop participants called for additional research support for SDOH in oncology. “For those of us who have the capacity to fund this research, we have to start putting our money where our mouth is,” James said. Yabroff recommended that funders should create training grants to prepare the next generation of researchers to conduct SDOH research using big data. Hudis agreed, and noted that ASCO has begun providing young investigator awards for big data research projects. Lichtenfeld added that the American Cancer Society could also be a source of such funding.
Lichtenfeld noted that fully engaging communities in research requires additional time and resources compared to more traditional research methods. He said this additional effort is often not well supported by academic institutions or funders, which puts such research at a disadvantage for receiving funding and the researchers who conduct it at a disadvantage for receiving academic promotion. Lichtenfeld and others suggested developing new metrics to weigh the value of such research. Workshop participants also suggested that funders prioritize support for the creation and curation of databases that can be used for SDOH research, noting that these time- and resource-intensive efforts are critical for advancing research progress but are often undervalued by funders and academic institutions. Yabroff also encouraged the development of targeted program announcements from different funding organizations that focus on training investigators in data science and SDOH.
Winn reiterated that SDOH are interrelated with, but distinct from, social needs and recommended precision in terminology in research and clinical care. He said that a broad, systems-based approach is needed to examine SDOH, akin to the systems-level approach in biological research.
He stressed that SDOH researchers should maximize community involvement to ensure that research efforts are well aligned with patient and community priorities. He said that conducting effective community-based research requires building trust with community partners, identifying champions, and promoting transparency and dissemination of research results within the community. Winn encouraged workshop participants to consider opportunities to partner with community organizations and FQHCs. He said that these collaborations could transform FQHCs not just to deliver primary care but also to provide broad and equitable patient access to oncology research and the advances in patient care that stem from this research.
Winn emphasized that it is critical to apply the principle of intersectionality of SDOH when prioritizing research on interventions to promote health equity. He said that researchers will need to leverage new analytic capabilities and integrate a broad array of data sources to appropriately capture and evaluate the complexity of exposures, context, and patient outcomes across time. Winn added that these complex analyses will require investments in workforce development and data infrastructure and that data standardization and aggregation will also be critical to advancing progress in SDOH research in oncology.
Winn noted that new regulatory paradigms and laws may be required to better protect patient privacy without unduly hampering SDOH research, particularly in the current era of big data. He also agreed with suggestions to hold health care organizations and insurers accountable for population health. He noted that experimentation with payment and care delivery models could help expand access to high-quality care for vulnerable populations and promote health equity.
Winn emphasized the need to standardize language used to discuss SDOH and to consistently incorporate SDOH data into EHRs. He noted that it is also important to ensure that big data research on SDOH leads to actionable interventions that are appropriately scaled and implemented. Winn stressed that collaboration requires compromise, and that “inreach” with communities may be more important than outreach in refining research and addressing SDOH.
Winn concluded by noting that research on SDOH in oncology is advancing rapidly, especially due to innovations in analytic methods and novel data sources. He said that political will is needed to translate this critical research into actionable interventions to improve patient health outcomes and equity in cancer care.
AHRQ (Agency for Healthcare Research and Quality). 2019. Supporting learning health systems. https://www.ahrq.gov/learning-health-systems/supporting.html (accessed June 29, 2020).
American Cancer Society. 2020. National Cancer Database. https://www.facs.org/quality-programs/cancer/ncdb (accessed June 29, 2020).
Ardila, D., A. P. Kiraly, S. Bharadwaj, B. Choi, J. J. Reicher, L. Peng, D. Tse, M. Etemadi, W. Ye, G. Corrado, D. P. Naidich, and S. Shetty. 2019. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine 25(6):954–961. doi: 10.1038/s41591-019-0447-x.
Arons, A., S. DeSilvey, C. Fichtenberg, and L. Gottlieb. 2018. Documenting social determinants of health-related clinical activities using standardized medical vocabularies. JAMIA Open 2(1):81–88. doi: 10.1093/jamiaopen/ooy051.
ASCO (American Society of Clinical Oncology). 2020. CancerLinQ—About us. https://www.cancerlinq.org/about (accessed June 29, 2020).
Ayanian, J. Z. 2008. Determinants of racial and ethnic disparities in surgical care. World Journal of Surgery 32(4):509–515. doi: 10.1007/s00268-007-9344-4.
Braunstein, J. B., N. S. Sherber, S. P. Schulman, E. L. Ding, and N. R. Powe. 2008. Race, medical researcher distrust, perceived harm, and willingness to participate in cardiovascular prevention trials. Medicine 87(1):1–9.
Braveman, P. 2014. What are health disparities and health equity? We need to be clear. Public Health Reports 129:5–8. doi: 10.1177/00333549141291S203.
Buntin, M. B., and J. Z. Ayanian. 2017. Social risk factors and equity in Medicare payment. New England Journal of Medicine 376(6):507–510. doi: 10.1056/NEJMp1700081.
CDC (Centers for Disease Control and Prevention). 2008. Community health and program services (CHAPS): Health disparities among racial/ethnic populations. Atlanta, GA: Department of Health and Human Services.
CDC. 2019. Behavioral risk factor surveillance system. https://www.cdc.gov/brfss/index.html (accessed June 29, 2020).
CODE (Center for Open Data Enterprise). 2019. Our story. https://www.opendataenterprise.org/about#our_story (accessed June 29, 2020).
Community Health Scholars Program. 2002. Stories of impact. http://www.kellogghealthscholars.org/about/ctrack_impact_scholars_book.pdf (accessed June 29, 2020).
CSDH (Commission on Social Determinants of Health). 2008. Closing the gap in a generation: Health equity through action on the social determinants of health. Final Report of the Commission on Social Determinants of Health. Geneva, Switzerland: World Health Organization. https://www.who.int/social_determinants/final_report/csdh_finalreport_2008.pdf (accessed June 29, 2020).
Cueto, C. V., S. Szeja, B. C. Wertheim, E. S. Ong, and V. L. Tsikitis. 2011. Disparities in treatment and survival of white and Native American patients with colorectal cancer: A SEER analysis. Journal of the American College of Surgeons 213(4):469–474. doi: 10.1016/j.jamcollsurg.2011.05.026.
De Marchis, E. H., D. Hessler, C. Fichtenberg, N. Adler, E. Byhoff, A. J. Cohen, K. M. Doran, S. Ettinger de Cuba, E. W. Fleegler, C. C. Lewis, S. T. Lindau, E. L. Tung, A. G. Huebschmann, A. A. Prather, M. Raven, N. Gavin, S. Jepson, W. Johnson, E. Ochoa, Jr., A. L. Olson, M. Sandel, R. S. Sheward, and L. M. Gottlieb. 2019. Part I: A quantitative study of social risk screening acceptability in patients and caregivers. American Journal of Preventive Medicine 57(6S1):S25–S37.
EPA (Environmental Protection Agency). 2019. EJSCREEN: Environmental justice screening and mapping tool. https://www.epa.gov/ejscreen/what-ejscreen (accessed June 29, 2020).
Erhunmwunsee, L., M.-B. M. Joshi, D. H. Conlon, and D. H. Harpole, Jr. 2012. Neighborhood-level socioeconomic determinants impact outcomes in nonsmall cell lung cancer patients in the Southeastern United States. Cancer 118(20):5117–5123. doi: 10.1002/cncr.26185.
FitzGerald, C., and S. Hurst. 2017. Implicit bias in healthcare professionals: A systematic review. BMC Medical Ethics 18(1):19–37.
Galea, S., M. Tracy, K. J. Hoggatt, C. DiMaggio, and A. Karpati. 2011. Estimated deaths attributable to social factors in the United States. American Journal of Public Health 101(8):1456–1465. doi: 10.2105/AJPH.2010.300086.
Goodman, A. 2019. Expert point of view: Yousuf Zafar, MD. The ASCO Post, June 10, 2019. https://www.ascopost.com/issues/june-10-2019/epov-yousuf-zafar (accessed June 29, 2020).
Hall, W. J., M. V. Chapman, K. M. Lee, Y. M. Merino, T. W. Thomas, B. K. Payne, E. Eng, S. H. Day, and T. Coyne-Beasley. 2015. Implicit racial/ethnic bias among health care professionals and its influence on health care outcomes: A systematic review. American Journal of Public Health 105(12):e60–e76.
HL7 FHIR (HL7 Fast Healthcare Interoperability Resources). 2019. HL7 FHIR Implementation guide: Minimal common oncology data elements (mCODE), Release 1. http://hl7.org/fhir/us/mcode (accessed June 29, 2020).
HRSA (Health Resources and Services Administration). 2019. Federally qualified health centers: Eligibility. https://www.hrsa.gov/opa/eligibility-and-registration/health-centers/fqhc/index.html (accessed June 29, 2020).
Hutchinson Institute for Cancer Outcomes Research. 2019. Community cancer care in Washington State: Quality and cost report 2019. Seattle, WA: Fred Hutchinson Cancer Research Center.
IOM (Institute of Medicine). 2009. Beyond the HIPAA Privacy Rule: Enhancing privacy, improving health through research. Washington, DC: The National Academies Press.
IOM. 2014. Capturing social and behavioral domains and measures in electronic health records: Phase 2. Washington, DC: The National Academies Press.
Itzkowitz, S. H., S. J. Winawer, M. Krauskopf, M. Carlesimo, F. H. Schnoll-Sussman, K. Huang, T. K. Weber, and L. Jandorf. 2016. New York Citywide Colon Cancer Control Coalition: A public health effort to increase colon cancer screening and address health disparities. Cancer 122(2):269–277.
Jemal, A., L. X. Clegg, E. Ward, L. A. Ries, X. Wu, P. M. Jamison, P. A. Wingo, H. L. Howe, R. N. Anderson, and B. K. Edwards. 2004. Annual report to the nation on the status of cancer, 1975–2001, with a special feature regarding survival. Cancer 101(1):3–27. doi: 10.1002/cncr.20288.
Jorgensen, S., R. Thorlby, R. M. Weinick, and J. Z. Ayanian. 2010. Responses of Massachusetts hospitals to a state mandate to collect race, ethnicity and language data from patients: A qualitative study. BMC Health Services Research 10(1):352–360.
Kennedy, B. R., C. C. Mathis, and A. K. Woods. 2007. African Americans and their distrust of the health care system: Healthcare for diverse populations. Journal of Cultural Diversity 14(2):56–60.
KFF (Kaiser Family Foundation). 2013. Medicaid enrollment by race/ethnicity. https://www.kff.org/medicaid/state-indicator/medicaid-enrollment-by-raceethnicity (accessed April 29, 2020).
Khozin, S., and A. Coravos. 2019. Decentralized trials in the age of real-world evidence and inclusivity in clinical investigations. Clinical Pharmacology & Therapeutics 106(1):25–27. doi: 10.1002/cpt.1441.
King, A. C., S. J. Winter, B. W. Chrisinger, J. Hua, and A. W. Banchoff. 2019. Maximizing the promise of citizen science to advance health and prevent disease. Preventive Medicine 119:44–47. doi: 10.1016/j.ypmed.2018.12.016.
Ledford, H. 2019. Millions of black people affected by racial bias in health-care algorithms. Nature 574:608–609. https://www.nature.com/articles/d41586-019-03228-6 (accessed June 29, 2020).
Liu, Y., T. Kohlberger, M. Norouzi, G. E. Dahl, J. L. Smith, A. Mohtashamian, N. Olson, L. H. Peng, J. D. Hipp, and M. C. Stumpe. 2019. Artificial intelligence-based breast cancer nodal metastasis detection: Insights into the black box for pathologists. Archives of Pathology & Laboratory Medicine 143(7):859–868. doi: 10.5858/arpa.2018-0147-OA.
mCODE (Minimal Common Data Elements). 2019. mCODE™: Minimal common oncology data elements. https://mcodeinitiative.org (accessed June 29, 2020).
Nagpal, K., D. Foote, Y. Liu, P.-H. C. Chen, E. Wulczyn, F. Tan, N. Olson, J. L. Smith, A. Mohtashamian, J. H. Wren, G. S. Corrado, R. MacDonald, L. H. Peng, M. B. Amin, A. J. Evans, A. R. Sangoi, C. H. Mermel, J. D. Hipp, and M. C. Stumpe. 2019. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digital Medicine 2:48. doi: 10.1038/s41746-019-0112-2.
NAS (National Academy of Sciences). 1994. Voice communication between humans and machines. Washington, DC: National Academy Press.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2017. Accounting for social risk factors in Medicare payment. Washington, DC: The National Academies Press.
NASEM. 2018. Improving health research on small populations: Proceedings of a workshop. Washington, DC: The National Academies Press.
NASEM. 2019. Integrating social care into the delivery of health care: Moving upstream to improve the nation’s health. Washington, DC: The National Academies Press.
Nazha, B., M. Mishra, R. Pentz, and T. K. Owonikoko. 2019. Enrollment of racial minorities in clinical trials: Old problem assumes new urgency in the age of immunotherapy. American Society of Clinical Oncology Educational Book 39:3–10.
NCHS (National Center for Health Statistics). 2017. Health, United States, 2016: With chartbook on long-term trends in health. Atlanta, GA: Centers for Disease Control and Prevention. https://www.cdc.gov/nchs/data/hus/hus16.pdf (accessed June 29, 2020).
NCI (National Cancer Institute). 2019. Surveillance, Epidemiology, and End Results program: Overview of the SEER program. https://seer.cancer.gov/about/overview.html (accessed June 29, 2020).
Oh, S. S., J. Galanter, N. Thakur, M. Pino-Yanes, N. E. Barcelo, M. J. White, D. M. de Bruin, R. M. Greenblatt, K. Bibbins-Domingo, A. H. Wu, and L. N. Borrell. 2015. Diversity in clinical and biomedical research: A promise yet to be fulfilled. PLoS Medicine 12(12):e1001918. doi: 10.1371/journal.pmed.1001918.
OHDSI (Observational Health Data Sciences and Informatics). https://ohdsi.org (accessed April 10, 2020).
Parikh, R. B., B. J. S. Adamson, S. Khozin, M. D. Galsky, S. S. Baxi, A. Cohen, and R. Mamtani. 2019. Association between FDA label restriction and immunotherapy and chemotherapy use in bladder cancer. JAMA 322(12):1209–1211. doi: 10.1001/jama.2019.10650.
PDS (Project Data Sphere, LLC). 2020. About us. https://projectdatasphere.org/projectdatasphere/html/about (accessed June 29, 2020).
Perdue, D. G., D. Haverkamp, C. Perkins, C. M. Daley, and E. Provost. 2014. Geographic variation in colorectal cancer incidence and mortality, age of onset, and stage at diagnosis among American Indian and Alaska Native people, 1990–2009. American Journal of Public Health 104(S3):S404–S414. doi: 10.2105/AJPH.2013.301654.
Price, W. N., 2nd, and I. G. Cohen. 2019. Privacy in the age of medical big data. Nature Medicine 25(1):37–43. doi: 10.1038/s41591-018-0272-7.
RESPOND (Research on Prostate Cancer in Men of African Ancestry) Study. RESPOND African American prostate cancer study. http://respondstudy.org (accessed April 10, 2020).
Rocher, L., J. M. Hendrickx, and Y.-A. de Montjoye. 2019. Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications 10(1):3069. doi: 10.1038/s41467-019-10933-3.
Schuemie, M. J., M. S. Cepeda, M. A. Suchard, J. Yang, Y. Tian, A. Schuler, P. B. Ryan, D. Madigan, and G. Hripcsak. 2020. How confident are we about observational findings in healthcare: A benchmark study. Harvard Data Science Review 2(1). doi: 10.1162/99608f92.147cc28e.
Sighoko, D., A. M. Murphy, B. Irizarry, G. Rauscher, C. Ferrans, and D. Ansell. 2017. Changes in the racial disparity in breast cancer mortality in the ten US cities with the largest African American populations from 1999 to 2013: The reduction in breast cancer mortality disparity in Chicago. Cancer Causes & Control 28(6):563–568.
Steiner, D. F., R. MacDonald, Y. Liu, P. Truszowski, J. D. Hipp, C. Gammage, F. Thng, L. Peng, and M. C. Stumpe. 2019. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. American Journal of Surgical Pathology 42(12):1636–1646. doi: 10.1097/PAS.0000000000001151.
The Gravity Project. 2020. The Gravity Project. https://confluence.hl7.org/display/GRAV/The+Gravity+Project (accessed April 10, 2020).
U.S. Census Bureau. 2020. American Community Survey. https://www.census.gov/programs-surveys/acs (accessed April 29, 2020).
USDA (U.S. Department of Agriculture). 2019. USDA ERS—Food access research atlas. https://www.ers.usda.gov/data-products/food-access-research-atlas (accessed June 29, 2020).
Vrijheid, M. 2014. The exposome: A new paradigm to study the impact of environment on health. Thorax 69(9):876–878. doi: 10.1136/thoraxjnl-2013-204949.
Ward, E., M. Halpern, N. Schrag, V. Cokkinides, C. DeSantis, P. Bandi, R. Siegel, A. Stewart, and A. Jemal. 2008. Association of insurance with cancer care utilization and outcomes. CA: A Cancer Journal for Clinicians 58(1):9–31. doi: 10.3322/CA.2007.0011.
Warnecke, R. B., A. Oh, N. Breen, S. Gehlert, E. Paskett, K. L. Tucker, N. Lurie, T. Rebbeck, J. Goodwin, J. Flack, S. Srinivasan, J. Kerner, S. Heurtin-Roberts, R. Abeles, F. L. Tyson, G. Patmios, and R. A. Hiatt. 2008. Approaching health disparities from a population perspective: The National Institutes of Health Centers for Population Health and Health Disparities. American Journal of Public Health 98(9):1608–1615. doi: 10.2105/AJPH.2006.102525.
Williams, D. R., E. Z. Kontos, K. Viswanath, J. S. Haas, C. S. Lathan, L. E. MacConaill, J. Chen, and J. Z. Ayanian. 2012. Integrating multiple social statuses in health disparities research: The case of lung cancer. Health Services Research 47(3 Pt 2):1255–1277. doi: 10.1111/j.1475-6773.2012.01404.x.
Yitshak-Sade, M., P. James, I. Kloog, J. E. Hart, J. D. Schwartz, F. Laden, K. J. Lane, M. P. Fabian, K. C. Fong, and A. Zanobetti. 2019. Neighborhood greenness attenuates the adverse effect of PM2.5 on cardiovascular mortality in neighborhoods of lower socioeconomic status. International Journal of Environmental Research and Public Health 16(5):814–824.