The need to manage, analyze, and extract knowledge from data is pervasive across industry, government, and academia. Scientists, engineers, and executives routinely encounter enormous volumes of data, and new techniques and tools are emerging to create knowledge out of these data, some of them capable of working with real-time streams of data. The nation’s ability to make use of these data depends on the availability of an educated workforce with necessary expertise. With these new capabilities have come novel ethical challenges regarding the effectiveness and appropriateness of broad applications of data analyses.
The future of data science education is shaped by the continuing evolution of computing technology, analytical approaches, and tools; by employers’ corresponding demand for new knowledge and skills; and by new models for delivering education. Educational institutions may need to revise the content of their curricula and embrace multiple models of educational delivery (e.g., online, self-paced, and team teaching both in and out of the classroom) to appeal to a broader population of students and to better prepare them to enter the workforce.
The field of data science has emerged to address the proliferation of data and the need to manage and understand it. Data science is a hybrid of multiple disciplines and skill sets: it draws on diverse fields (including computer science, statistics, and mathematics), encompasses topics in ethics and privacy, and depends on the specifics of the domains to which it is applied. Fueled by the explosion of data, jobs that involve data science have proliferated, and an array of undergraduate and graduate data science programs has been established. Nevertheless, data science is still in its infancy, which makes it important to envision what the field might look like in the future and what key steps can be taken now to move data science education in that direction. Future data science programs will need to cultivate a variety of skills. Strong analytic skills are needed to work with large, complex data sets. Oral and written communication skills are also necessary to engage diverse audiences about real-world problems, to work in teams, and to contribute to effective problem solving for both the technical and the ethical dilemmas that arise in applying data science.
The committee has also identified several apparent hallmarks of effective data science education. Using real data will expose students to the messiness they will confront when solving real-world problems. Selecting applications with broad impact will make instruction more compelling, helping to attract and retain students. Teaching commonly used current methods will prepare students for the workplace, as will experience working in teams. Critical curricular topics include mathematical foundations, computational thinking, statistical thinking, principles of effective data management, techniques for data description and curation, data modeling approaches, effective communication skills, reproducibility challenges and current best practices, exposure to ethical dilemmas and problem-solving skills, and a range of domain-specific topics. Maturity in these and other areas results in what this committee defines as “data acumen,” which enables data scientists to make good judgments and decisions with data. Starting students down the path toward data acumen is a chief objective of data science education.
Because data science is inherently concerned with understanding and addressing real-world problems and challenges, this new and expanding field may appeal to a wider variety of students. Encompassing multiple disciplines and varied skill sets, it has the potential to attract students with diverse academic backgrounds and interests. Compared with other related fields of study, data science has a notable advantage: the opportunity to build in broad participation, diversity, and inclusion from the outset. The field presents new opportunities to attract and engage underrepresented student populations, and these opportunities can be realized through innovative cross-disciplinary pedagogical approaches led by highly trained and flexible faculty. To further increase participation, 4-year institutions can partner with 2-year institutions that have flexible programs, offering more entry points into data science for advanced high school students, current members of the workforce, and future transfer students. Assessment and evaluation are especially valuable when building these new programs, in part because they encourage consideration of how well curricular objectives are being met. The very tools of experimental design and analysis common in data science will likely prove valuable in evaluating the success of data science programs.
This interim report from the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective begins to address the statement of task presented in Box S.1. Specifically, this report lays out some of the information and comments that the committee has gathered and heard during the first half of its study, offers perspectives on the current state of data science education, and poses some questions that may shape the way data science education evolves in the future. This National Academies of Sciences, Engineering, and Medicine study, sponsored by the National Science Foundation, will conclude in early 2018 with a final report that lays out a vision for future data science education. What follows in this interim report are initial observations and findings concerning the state of data science education, a discussion of forward-looking opportunities, and key questions on which the committee seeks broad input.
The committee’s preliminary findings are described throughout this interim report and recapped in Chapter 5, where they are accompanied by a set of open questions the committee has developed about the future of data science education (also summarized in Box S.2).
Public input is sought on the following topics:
- Additional content for this study, including but not limited to case studies from institutions providing data science education, innovative ways to bring researchers together, best practices for program evaluation, and ideas for future topical webinars;