Data science programs have the potential to attract broad participation, drawing members from a range of disciplines (including the humanities, social sciences, and the arts) and from populations that are underrepresented in other science, technology, engineering, and mathematics (STEM) fields (see Box 4.1). Part of this potential comes from the many compelling application areas of data science, such as digital humanities, computational social science, and public policy. In addition, the skill sets currently captured under the data scientist label are numerous and span multiple training and education levels.
There are many current and recent efforts aimed at increasing diversity and inclusion and at broadening participation in fields related to data science. These efforts can inform emerging data science programs so that they encourage broad participation by design. The following list highlights just a few of them:
- NSF INCLUDES (Inclusion across the Nation of Communities of Learners of Underrepresented Discoverers in Engineering and Science)1 is a National Science Foundation (NSF) initiative designed to enhance U.S. leadership in STEM discoveries and innovations while supporting efforts to develop talent from all sectors of society to build the STEM workforce. The initiative aims to improve the preparation, increase the participation, and ensure the contributions of individuals from groups that have traditionally been underrepresented and underserved in the STEM enterprise, including women, members of racial and ethnic groups, persons with disabilities, and persons with low socioeconomic status. Significant advancement of these groups would result in a new generation of promising STEM talent and leadership to secure the nation’s future in science and technology.
- InGenIOus (Investing in the Next Generation through Innovative and Outstanding Strategies)2 is a collaboration among mathematics and statistics professional societies and NSF that culminated in a July 2013 workshop devoted to identifying and envisioning programs and strategies for increasing the flow of mathematical sciences students into the workforce pipeline.
- CS for All3 is a program that aims to provide all U.S. students the opportunity to participate in computer science and computational thinking education in their schools at the K–12 levels. Funded by NSF, this program focuses on researcher–practitioner partnerships that foster the research and development needed to bring computer science and computational thinking to all schools. Specifically, the program aims to provide high school teachers with the preparation, professional development, and ongoing support that they need to teach rigorous computer science courses, while providing K–8 teachers with the instructional materials and preparation they need to integrate computer science and computational thinking into their teaching.
- StatFest4 is a conference hosted by the American Statistical Association’s Committee on Minorities in Statistics. The goal of the program is to provide an opportunity for undergraduate students historically underrepresented in statistics to explore potential career options in the field and learn from industry and academic leaders. Each speaker session focuses on a different category of career trajectory, such as government, industry and consulting, academia, and graduate programs (ASA, 2017).
- Math Alliance5 is an organization focused on assisting mathematics undergraduate students from historically underrepresented backgrounds in pursuing a doctoral degree in the mathematical sciences. Based at Purdue University, the program strives to improve diversity and inclusion in mathematics doctoral programs (including pure and applied mathematics, mathematical and applied statistics, and biostatistics) while encouraging research collaborations and community building within the broader mathematical community (National Alliance for Doctoral Studies in the Mathematical Sciences, 2013).
1 See the NSF INCLUDES website at https://www.nsf.gov/pubs/2016/nsf16544/nsf16544.htm, accessed August 21, 2017.
2 Mathematical Association of America, “InGenIOus,” http://www.maa.org/programs/faculty-and-departments/ingenious, accessed August 21, 2017.
3 National Science Foundation, “CS for All,” https://www.nsf.gov/pubs/2017/nsf17525/nsf17525.htm, accessed August 21, 2017.
There are also a variety of programs and projects that aim to foster interaction among individuals with varying levels of science background. For instance, a Music Data Science Hackathon hosted by Data Science London and EMI Music brought together data scientists and music specialists in a competition to develop a program that would predict the next hit in music. Other programs are housed at the institution level, such as the Mobile News App Design Class at the University of Texas at Austin, which brings computer science and journalism students together and tasks them with designing a novel mobile application for news. These and many other efforts in this area can inform emerging data science education programs and attract students from non-science disciplines to pursue a path in data science. This chapter discusses some recruitment and retention strategies, institutional partnerships and K–12 outreach, and the role of evaluation and assessment.
The focus on recruitment and retention extends beyond the obvious choices for data science majors and minors; other academically diverse student populations would benefit from the addition of data science to the curriculum. For example, it may be necessary to do targeted outreach to recruit students who are interested in enrolling in data science courses but may not find course titles immediately relevant or appealing. It may also encourage such students to know that a lack of preparation in data science does not equate to a lack of ability; developing multiple pathways to incorporate data science concepts into varied curricula via specialized connector courses or other “on ramps” could address these students’ concerns about knowledge gaps and allow them to gain the level of expertise appropriate for their interests and career goals. Recruiting students from diverse disciplines to data science courses could also improve retention in such courses due to the increased interest in and value added to the courses. Retention may also improve if the content of and the faculty for introductory data science courses are selected with the diverse backgrounds and interests of the student population in mind. Throughout this process, it is important to consider strategies to retain faculty members as well. Data science skills and expertise are highly sought-after in industry, and educators in these areas are strong candidates for industry employment (Kaminski and Geisler, 2012).
Challenges also persist in both recruiting and retaining underrepresented minorities and women in the sciences more broadly. It is useful to consider whether data science’s diversity and inclusion issues are unique compared with those of other disciplines, as well as what can be learned from other STEM programs that address diversity and inclusion well. Hiring a more diverse faculty may also help to attract a more diverse student population.
Recruitment and retention continue to be challenging in the workplace. In a professional research environment, employers want to hire people with literacy in computing, data science, and a domain science, but it can be difficult to find individuals who fit this description (Agarwal, 2016). Instead, employers often hire data science-literate people with domain expertise and provide more in-depth training in specific technical or professional skills. To increase diversity, inclusion, and data science literacy, employers could increase cross-disciplinary collaboration opportunities, create supportive team environments, counteract bias, and build mentor cohorts. Dedicated recruiting, inclusive recognition, and active support would also help.
Finding 4.1: Data science has the potential to draw in a diverse set of students and build in broad participation from the onset, rather than trying to broaden participation later. However, strategies are needed to recruit and retain these students.
Community colleges are well qualified to be highly effective providers of data science education while also serving as important partners for 4-year institutions that are considering the emerging role of data science education. Community college programs can serve to (1) be an entry point to inspire and attract diverse student populations to data science; (2) permit existing members of the workforce to retrain or obtain specific new skill sets to complement their education and experience; (3) create mechanisms by which students can certify specific or general skill sets with certificates or associate’s degrees; (4) build foundational, translational, ethical, and professional skills to support matriculation into 4-year college data science programs; and (5) provide opportunities for advanced high school students to begin data science training early. The majority of these purposes support undergraduate education objectives, while also targeting the specific needs of industry. Institutional, industry, and government partnerships are all important to the development of data science education that meets these objectives for community colleges.
Funding agencies can help support formal partnerships with interested community colleges and 4-year institutions by providing funding mechanisms that allow for the development of new curricula as well as professional learning opportunities for faculty.
Finding 4.2: Partnerships between 2- and 4-year institutions provide a valuable opportunity to develop innovative curricula, reach more diverse student populations, and expand the reach of data science education.
Elementary, middle, and high schools play an important role in developing data science education and preparing students to thrive in a modern workforce. With changes in federal legislation that call for students to be prepared to succeed in college and careers, states are looking to national content standards to provide a vision for K–12 education. These standards of practice include content areas that are relevant to data science education, such as science and engineering (Next Generation Science Standards6) and mathematics and statistics (Common Core State Standards7). Some of the practices called for in these standards include analyzing and interpreting data; using mathematics and computational thinking; and obtaining, evaluating, and communicating information. Embedded in these practices are such skills as being able to (1) identify significant features and patterns in data through tabulation, graphical interpretation, visualization, and statistical analysis; (2) make and test predictions through constructing simulations and recognizing, expressing, and applying quantitative relationships; and (3) communicate orally or in writing using tables, diagrams, graphs, and equations (NRC, 2012, pp. 49–53). Through the adoption of these national standards, data scientists may be positioned to play a role in curriculum development by working with curriculum designers to ensure alignment between the practices highlighted above and the requisite skills that are needed upon entry into data science programs.
In addition to efforts that could be achieved in formal educational spaces, there are outreach efforts to students in more informal spaces, including year-long afterschool programs, summer camps, high school internship programs, competitions, and websites designed to foster motivation and interest in data science and other STEM fields. Focused data science summer programs can be effective in attracting high school students to postsecondary STEM fields. For example, the Michigan Institute for Data Science (MIDAS) at the University of Michigan has been running a data science summer camp for the past 2 years that uses art and sports activities to gently introduce teenagers to mathematics, computing, signal processing, and statistics. Such use of familiar activities to introduce data science makes it fun and sustains student interest in STEM. Such camps can also be used to reach a more diverse group of students.
These programs may also be offered by organizations, such as museums, that not only collaborate with the K–12 education system, but also seek to engage the broader community. For example, the Exploratorium8 has a repository of online materials and activities as well as community-based programs designed to foster the development of the skills highlighted above. Other online resources, such as Data.gov, provide parents and teachers access to materials to help advance children’s understanding of concepts associated with data science (Data.gov, 2013).
Good data science practices can inform the evaluation of programs targeted toward broad participation. It is useful for program evaluation to follow established best practices: adopting an appropriate model for including participation metrics in the overall goals of the program, clearly articulating these metrics to all participants from the beginning of the program, establishing procedures for assessing them on a regular basis, and specifying how the program will be adapted and modified in response to these formative and summative assessments.
In establishing approaches for measuring success, the tools of experimental design and analysis can be incorporated when appropriate (for example, comparison of treatment and control, randomized trials, nationally normed instruments, exploitation of natural experiments, and appropriate descriptive analyses of observational data that account for confounding factors). It may be necessary to consider an overall data plan at the start of the program as part of the evaluation plan, which would account for Institutional Review Board requirements if the data might be used for research rather than just within institutional planning.
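As a rough illustration of a treatment-versus-control comparison, the following minimal sketch applies a permutation test to retention outcomes from a hypothetical randomized intervention. The function names, the group sizes, and the outcome values are all assumptions invented for this example; they are not data from any program described here, and a real evaluation would choose its design and test with a statistician.

```python
import random


def retention_rate(outcomes):
    """Fraction of students retained (1 = retained, 0 = not retained)."""
    return sum(outcomes) / len(outcomes)


def permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in retention rates.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = retention_rate(treatment) - retention_rate(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign students to groups at random, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = retention_rate(pooled[:n_treat]) - retention_rate(pooled[n_treat:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical outcomes, invented for illustration only.
treatment = [1] * 42 + [0] * 8   # 50 students offered the intervention
control = [1] * 33 + [0] * 17    # 50 students not offered it

diff, p_value = permutation_test(treatment, control)
print(f"Difference in retention rate: {diff:.2f}")
print(f"Permutation p-value: {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party libraries; it presumes that assignment to the two groups was randomized. With observational data, the confounding factors mentioned above would have to be addressed before any such comparison is meaningful.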
Data sources useful for measuring diverse participation might include transcripts that reveal who takes data science courses and who completes a data science degree. Such data can be used to make both programmatic and cross-institutional comparisons. Institutional constraints may encourage a return-on-investment perspective as part of the evaluation, incorporating the impacts and costs of various targeted educational interventions to broaden participation.
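The transcript-based measures described above could be computed along the following lines. This is a minimal sketch under stated assumptions: the record layout, the group labels, the function names, and every number are hypothetical and invented for illustration, and the cost calculation is a naive before-and-after figure that does not by itself establish that an intervention caused the change.

```python
from collections import defaultdict


def participation_by_group(records):
    """Summarize course-taking and degree completion per student group.

    Each record is (student_id, group, took_course, completed_degree).
    """
    totals = defaultdict(lambda: {"students": 0, "took_course": 0, "completed": 0})
    for _student_id, group, took_course, completed_degree in records:
        totals[group]["students"] += 1
        totals[group]["took_course"] += int(took_course)
        totals[group]["completed"] += int(completed_degree)

    summary = {}
    for group, t in totals.items():
        summary[group] = {
            "students": t["students"],
            "course_rate": t["took_course"] / t["students"],
            # Completion is measured among course takers; None if nobody enrolled.
            "completion_rate": (
                t["completed"] / t["took_course"] if t["took_course"] else None
            ),
        }
    return summary


def cost_per_additional_student(program_cost, baseline_count, new_count):
    """Naive cost per additional course taker; None if the count did not rise."""
    gained = new_count - baseline_count
    return program_cost / gained if gained > 0 else None


# Hypothetical transcript records, invented for illustration only.
records = [
    ("s1", "group_a", True, True),
    ("s2", "group_a", True, False),
    ("s3", "group_a", False, False),
    ("s4", "group_b", True, True),
    ("s5", "group_b", False, False),
    ("s6", "group_b", False, False),
    ("s7", "group_b", False, False),
]

summary = participation_by_group(records)
for group, stats in sorted(summary.items()):
    print(group, stats)

print(cost_per_additional_student(program_cost=20_000, baseline_count=40, new_count=50))
```

Keeping the per-group summary separate from the cost calculation lets the same participation figures feed both programmatic and cross-institutional comparisons, with the return-on-investment view layered on only where an institution needs it.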
Finding 4.3: Data science programs would benefit from ongoing curricular evaluation, especially with respect to how well curricular objectives are being met and the degree of curricular integration. Taking a cue from their own domain, programs could use the resulting data to inform data science instruction and curriculum.