National Academies Press: OpenBook
Suggested Citation:"4 Broad Participation in Data Science." National Academies of Sciences, Engineering, and Medicine. 2018. Envisioning the Data Science Discipline: The Undergraduate Perspective: Interim Report. Washington, DC: The National Academies Press. doi: 10.17226/24886.

4

Broad Participation in Data Science

Data science programs have the potential to attract broad participation, including diverse members from different disciplines (including the humanities, social sciences, and the arts) and from populations that are underrepresented in other similar science, technology, engineering, and mathematics (STEM) fields (see Box 4.1). Part of this potential comes from the various compelling application areas of data science, including digital humanities, computational social science, public policy, and many others. In addition, the skill sets currently grouped under the "data scientist" label are numerous and span multiple training and education levels.

Many current and recent efforts aim to increase diversity and inclusion and to broaden participation in fields related to data science. These approaches can inform emerging data science programs so that they encourage broad participation by design. The following highlights just a few of these efforts:

  • NSF INCLUDES (Inclusion across the Nation of Communities of Learners of Underrepresented Discoverers in Engineering and Science)1 is a National Science Foundation (NSF) initiative designed to enhance U.S. leadership in STEM discoveries and innovations while supporting efforts to develop talent from all sectors of society to build the STEM workforce. The initiative aims to improve the preparation, increase the participation, and ensure the contributions of individuals from groups that have traditionally been underrepresented and underserved in the STEM enterprise, including women, members of racial and ethnic groups, persons with disabilities, and persons with low socioeconomic status. Significant advancement of these groups would result in a new generation of promising STEM talent and leadership to secure the nation’s future in science and technology.
  • InGenIOus (Investing in the Next Generation through Innovative and Outstanding Strategies)2 is a collaboration among mathematics and statistics professional societies and NSF that culminated in a July 2013 workshop devoted to identifying and envisioning programs and strategies for increasing the flow of mathematical sciences students into the workforce pipeline.
  • CS for All3 is a program that aims to provide all U.S. students the opportunity to participate in computer science and computational thinking education in their schools at the K–12 levels. Funded by NSF, this program focuses on researcher–practitioner partnerships that foster the research and development needed to bring computer science and computational thinking to all schools. Specifically, the program aims to provide high school teachers with the preparation, professional development, and ongoing support that they need to teach rigorous computer science courses, while providing K–8 teachers with the instructional materials and preparation they need to integrate computer science and computational thinking into their teaching.
  • StatFest4 is a conference hosted by the American Statistical Association’s Committee on Minorities in Statistics. The goal of the program is to provide an opportunity for undergraduate students historically underrepresented in statistics to explore potential career options in the field and learn from industry and academic leaders. Each speaker session focuses on a different category of career trajectory, such as government, industry and consulting, academia, and graduate programs (ASA, 2017).
  • Math Alliance5 is an organization focused on assisting mathematics undergraduate students from historically underrepresented backgrounds in pursuing a doctoral degree in the mathematical sciences. Based at Purdue University, the program strives to improve diversity and inclusion in mathematics doctoral programs (including pure and applied mathematics, mathematical and applied statistics, and biostatistics) while encouraging research collaborations and community within the broader mathematical community (National Alliance for Doctoral Studies in the Mathematical Sciences, 2013).

___________________

1 See the NSF INCLUDES website at https://www.nsf.gov/pubs/2016/nsf16544/nsf16544.htm, accessed August 21, 2017.

2 Mathematical Association of America, “InGenIOus,” http://www.maa.org/programs/faculty-and-departments/ingenious, accessed August 21, 2017.

3 National Science Foundation, “CS for All,” https://www.nsf.gov/pubs/2017/nsf17525/nsf17525.htm, accessed August 21, 2017.

4 American Statistical Association, Committee on Minorities in Statistics, “StatFest 2017,” http://community.amstat.org/cmis/events/statfest, accessed August 21, 2017.

5 See the Math Alliance website at https://mathalliance.org/welcome/, accessed August 21, 2017.


A variety of programs and projects also aim to foster interaction among individuals with varying levels of science background. For instance, a Music Data Science Hackathon hosted by Data Science London and EMI Music brought together data scientists and music specialists in a competition to develop a program that would predict the next hit in music. Other programs are housed at the institution level, such as the Mobile News App Design Class at the University of Texas at Austin, which brings computer science and journalism students together to design a novel mobile application for news. These and many other efforts in this area can inform emerging data science education programs and attract students from non-science disciplines to potentially pursue a path in data science. This chapter discusses some recruitment and retention strategies, institutional partnerships and K–12 outreach, and the role of evaluation and assessment.

RECRUITMENT AND RETENTION STRATEGIES

The focus on recruitment and retention extends beyond the obvious choices for data science majors and minors; other academically diverse student populations would benefit from the addition of data science to the curriculum. For example, it may be necessary to do targeted outreach to recruit students who are interested in enrolling in data science courses but may not find course titles immediately relevant or appealing. It may also encourage such students to know that a lack of preparation in data science does not equate to a lack of ability; developing multiple pathways to incorporate data science concepts into varied curricula via specialized connector courses or other “on ramps” could address these students’ concerns about knowledge gaps and allow them to gain the level of expertise appropriate for their interests and career goals. Recruiting students from diverse disciplines to data science courses could also improve retention in such courses due to the increased interest in and value added to the courses. Retention may also improve if the content of and the faculty for introductory data science courses are selected with the diverse backgrounds and interests of the student population in mind. Throughout this process, it is important to consider strategies to retain faculty members as well. Data science skills and expertise are highly sought-after in industry, and educators in these areas are strong candidates for industry employment (Kaminski and Geisler, 2012).

Challenges also persist in both recruiting and retaining underrepresented minorities and women in the sciences more broadly. It is useful to consider whether data science’s diversity and inclusion issues are unique as compared to those of other disciplines as well as what can be learned from other STEM programs that address diversity and inclusion well. Hiring a more diverse faculty may also help to attract a more diverse student population.

Recruitment and retention continue to be challenging in the workplace. In a professional research environment, employers want to hire people with literacy in computing, data science, and a domain science, but it can be difficult to find individuals who fit this description (Agarwal, 2016). Instead, employers often hire data science-literate people with domain expertise and provide more in-depth training in specific technical or professional skills. To increase diversity, inclusion, and data science literacy, employers could increase cross-disciplinary collaboration opportunities, create supportive team environments, counteract bias, and build mentor cohorts. Dedicated recruiting, inclusive recognition, and active support would also help.

Finding 4.1: Data science has the potential to draw in a diverse set of students and build in broad participation from the onset, rather than trying to broaden participation later. However, strategies are needed to recruit and retain these students.


INSTITUTIONAL PARTNERSHIPS

Community colleges are well qualified to be highly effective providers of data science education while also serving as important partners for 4-year institutions that are considering the emerging role of data science education. Community college programs can (1) serve as an entry point to inspire and attract diverse student populations to data science; (2) permit existing members of the workforce to retrain or obtain specific new skill sets to complement their education and experience; (3) create mechanisms by which students can certify specific or general skill sets with certificates or associate’s degrees; (4) build foundational, translational, ethical, and professional skills to support matriculation into 4-year college data science programs; and (5) provide opportunities for advanced high school students to begin data science training early. The majority of these purposes support undergraduate education objectives, while also targeting the specific needs of industry. Institutional, industry, and government partnerships are all important to the development of data science education that meets these objectives for community colleges.

Funding agencies can help support formal partnerships with interested community colleges and 4-year institutions by providing funding mechanisms that allow for the development of new curricula as well as professional learning opportunities for faculty.

Finding 4.2: Partnerships between 2- and 4-year institutions provide a valuable opportunity to develop innovative curricula, reach more diverse student populations, and expand the reach of data science education.

K–12 OBJECTIVES

Elementary, middle, and high schools play an important role in developing data science education and preparing students to thrive in a modern workforce. With changes in federal legislation that call for students to be prepared to succeed in college and careers, states are looking to national content standards to provide a vision for K–12 education. These standards of practice include content areas that are relevant to data science education, such as science and engineering (Next Generation Science Standards6) and mathematics and statistics (Common Core State Standards7). Some of the practices called for in these standards include analyzing and interpreting data; using mathematics and computational thinking; and obtaining, evaluating, and communicating information. Embedded in these practices are such skills as being able to (1) identify significant features and patterns in data through tabulation, graphical interpretation, visualization, and statistical analysis; (2) make and test predictions through constructing simulations and recognizing, expressing, and applying quantitative relationships; and (3) communicate orally or in writing using tables, diagrams, graphs, and equations (NRC, 2012, pp. 49–53). Through the adoption of these national standards, data scientists may be positioned to play a role in curriculum development by working with curriculum designers to ensure alignment between the practices highlighted above and the requisite skills that are needed upon entry into data science programs.

___________________

6 See the Next Generation Science Standards website at https://www.nextgenscience.org/, accessed August 21, 2017.

7 See the Common Core State Standards website at http://www.corestandards.org/, accessed August 21, 2017.

PUBLIC OUTREACH

In addition to efforts that could be achieved in formal educational spaces, there are outreach efforts to students in more informal spaces, including year-long afterschool programs, summer camps, high school internship programs, competitions, and websites designed to foster motivation and interest in data science and other STEM fields. Focused data science summer programs can be effective in attracting high school students to postsecondary STEM fields. For example, the Michigan Institute for Data Science (MIDAS) at the University of Michigan has been running a data science summer camp for the past 2 years that uses art and sports activities to gently introduce teenagers to mathematics, computing, signal processing, and statistics. Such use of familiar activities to introduce data science makes it fun and sustains student interest in STEM. Such camps can also be used to reach a more diverse group of students.

These programs may also be offered by organizations, such as museums, that not only collaborate with the K–12 education system, but also seek to engage the broader community. For example, the Exploratorium8 has a repository of online materials and activities as well as community-based programs designed to foster the development of the skills highlighted above. Other online resources, such as Data.gov, provide parents and teachers access to materials to help advance children’s understanding of concepts associated with data science (Data.gov, 2013).

EVALUATION AND ASSESSMENT

Good data science practices can inform the evaluation of programs targeted toward broad participation. It is useful for program evaluation to follow established best practices: adopting an appropriate model for including participation metrics in the overall goals of the program, clearly articulating them to all participants from the beginning of the program, establishing procedures for assessing these metrics on a regular basis, and specifying procedures for adapting and modifying the program based on these formative and summative assessments.

In establishing approaches for measuring success, the tools of experimental design and analysis can be incorporated when appropriate (using, for example, comparison of treatment and control, randomized trials, nationally normed instruments, exploitation of natural experiments, appropriate descriptive analyses of observational data accounting for confounding factors, etc.). It may be necessary to consider an overall data plan at the start of the program as part of the evaluation plan, which would account for Institutional Review Board requirements if the data might be used for research rather than just within institutional planning.
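To illustrate the comparison of treatment and control named above, the following is a minimal Python sketch of a two-proportion z-test on retention counts. The function name, the counts, and the choice of a z-test are illustrative assumptions rather than methods prescribed by this report; an actual evaluation would also need to address sample size, randomization, and confounding factors.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two proportions under the null hypothesis that they are equal.

    Returns (difference in proportions, z statistic, two-sided p-value).
    The name and interface are hypothetical, chosen for this illustration.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_a - p_b) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_a - p_b, z, p_value


# Hypothetical counts: students retained after one term in a course section
# that received an intervention (treatment) and one that did not (control).
diff, z, p_value = two_proportion_z_test(45, 60, 33, 60)
print(f"difference={diff:.2f}, z={z:.2f}, p={p_value:.3f}")
```

A small p-value here would only suggest that the retention rates differ; it would not by itself establish that the intervention caused the difference unless students were assigned to the two groups at random.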

Data sources useful for measuring diverse participation might include transcripts that reveal who takes data science courses and who completes a data science degree. Such data can be used to make both programmatic and cross-institutional comparisons. Institutional constraints may encourage a return-on-investment perspective as part of the evaluation, incorporating the impacts and costs of various targeted educational interventions to broaden participation.
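As a rough illustration of how transcript data might be summarized, the Python sketch below computes enrollment and completion rates by student group. The record fields (`group`, `enrolled`, `completed`), the group labels, and the function name are hypothetical; real transcript data would have a different structure and would be subject to privacy and Institutional Review Board requirements.

```python
from collections import defaultdict


def participation_summary(records):
    """Summarize data science enrollment and completion by student group.

    `records` is an iterable of dicts with the hypothetical keys
    'group' (str), 'enrolled' (bool), and 'completed' (bool).
    """
    counts = defaultdict(lambda: {"students": 0, "enrolled": 0, "completed": 0})
    for record in records:
        c = counts[record["group"]]
        c["students"] += 1
        if record["enrolled"]:
            c["enrolled"] += 1
            # Completion is only counted for students who enrolled.
            if record["completed"]:
                c["completed"] += 1

    summary = {}
    for group, c in counts.items():
        summary[group] = {
            "enrollment_rate": c["enrolled"] / c["students"],
            # Completion rate is undefined when no one in the group enrolled.
            "completion_rate": (
                c["completed"] / c["enrolled"] if c["enrolled"] else None
            ),
        }
    return summary


# Hypothetical records for two illustrative groups.
sample = [
    {"group": "A", "enrolled": True, "completed": True},
    {"group": "A", "enrolled": True, "completed": False},
    {"group": "A", "enrolled": False, "completed": False},
    {"group": "B", "enrolled": True, "completed": True},
    {"group": "B", "enrolled": False, "completed": False},
]
for group, rates in sorted(participation_summary(sample).items()):
    print(group, rates)
```

The same summary could be computed per program or per institution to support the programmatic and cross-institutional comparisons described above.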

Finding 4.3: Data science programs would benefit from ongoing curricular evaluation, especially with respect to how well curricular objectives are being met and the degree of curricular integration. Taking a cue from its own domain, these data could be used to inform data science instruction and curriculum.

___________________

8 See the Exploratorium website at https://www.exploratorium.edu/education, accessed August 21, 2017.

The need to manage, analyze, and extract knowledge from data is pervasive across industry, government, and academia. Scientists, engineers, and executives routinely encounter enormous volumes of data, and new techniques and tools are emerging to create knowledge out of these data, some of them capable of working with real-time streams of data. The nation’s ability to make use of these data depends on the availability of an educated workforce with necessary expertise. With these new capabilities have come novel ethical challenges regarding the effectiveness and appropriateness of broad applications of data analyses.

The field of data science has emerged to address the proliferation of data and the need to manage and understand it. Data science is a hybrid of multiple disciplines and skill sets, draws on diverse fields (including computer science, statistics, and mathematics), encompasses topics in ethics and privacy, and depends on specifics of the domains to which it is applied. Fueled by the explosion of data, jobs that involve data science have proliferated and an array of data science programs at the undergraduate and graduate levels have been established. Nevertheless, data science is still in its infancy, which suggests the importance of envisioning what the field might look like in the future and what key steps can be taken now to move data science education in that direction.

This study will set forth a vision for the emerging discipline of data science at the undergraduate level. This interim report lays out some of the information and comments that the committee has gathered and heard during the first half of its study, offers perspectives on the current state of data science education, and poses some questions that may shape the way data science education evolves in the future. The study will conclude in early 2018 with a final report that lays out a vision for future data science education.
