Starting a new data science program is challenging, to say the least. As with any new academic program, a curriculum needs to be determined, resources and faculty need to be found, and some means of assessment needs to be implemented. However, data science programs pose particular challenges owing to their interdisciplinary nature, the broad set of topics they encompass, and the acquisition of data and large-scale computational infrastructure they require.
Thus, launching a new undergraduate program in data science may be a significant undertaking in many institutions. Administrators and program developers will face myriad decisions. Should a new department be created to support this program? Or should existing departments take on the challenge solely or in collaboration with other departments? What content/level of high school knowledge will be useful or required of students entering the data science program? How will data science be integrated into the curriculum? Should it be included at the very beginning of a student’s coursework, after some prerequisite coursework, or as a capstone? Will it be a major, a minor, a general education requirement, or all of the above? Which mathematics, statistics, and computer science courses should be required of data science students? Should these be taught as separate courses, or should the content be integrated? How might institutions appropriately utilize the collections of online resources available and “downscale” these to appropriate levels if they are focused on more advanced training? Institutions developing undergraduate programs will also need to consider how ethics and communication will be included in the curriculum, as well as how to ensure that the program is accessible to students from varied backgrounds.
New data science programs require resources, broad discussions with faculty and leadership across the institution, and perhaps approval through formal bodies. The backing of the administration as well as broad support from multiple departments is typically necessary, and attention to the costs and funding model from the outset can greatly increase the chance of success.
As institutions examine how best to provide data science education to their students, one solution may be to reconstitute, combine, or reenvision already existing curricula. Much of the research on how best to teach science, technology, engineering, and mathematics (STEM) concepts will be readily applicable. (See the discussion of data acumen attributes in Chapter 2 of this report for examples of introductory and advanced concepts.) While some coursework could be immediately swapped into a data science program, more forethought and planning will likely be needed to appropriately consider the learning outcomes and content knowledge that data science students need to have. In a number of programs (see Chapter 3), the first official data science offering is a brand-new class, meant to serve as a rich introduction to what it means to practice data science. In institutions with less funding or expertise for course development, the need to get a program up and running may push toward more borrowing of content if not whole courses. However, a strong data science program will likely need, eventually, to move beyond “patching together” a curriculum or class.
In this section, the committee describes the key challenges that academic institutions will face as they set up a program. But this section begins with an opportunity: an important element of program design is ensuring that the program is welcoming and inclusive to all students, regardless of their identity-related characteristics or educational background and attainment.
According to the South Big Data Innovation Hub’s Keeping Data Science Broad, “the variety of perspectives such diversity [in terms of race, gender, religious affiliation, socioeconomic status, ethnicity, and first-generation status] provides is as essential as that provided by the transdisciplinary nature of data science for innovation and growth of the field” (Rawlings-Goss, 2018, p. 29). The report explains that the first step in creating a more inclusive environment is to ensure that students and faculty alike, at all types of educational institutions, have equitable access to resources (e.g., high-quality data, tools, technology, adaptable and appropriate curriculum, and advisors). Also crucial to retaining broad participation in data science are a “culturally relevant curriculum,” a more diverse faculty, and collaborations between majority-serving and minority-serving institutions (Rawlings-Goss, 2018, p. 31).
Thus, it is the responsibility of academic institutions to ensure inclusion and broad participation and engagement in data science programs. Master (2017) suggests that, to broaden participation and engagement, data science programs at higher education institutions increase exposure to data science fields, broaden beliefs about who belongs in these fields, challenge students’ beliefs about fixed abilities, and show that data science can make a difference in society. Williams (2017) suggests that faculty adjust curriculum to be more inclusive, create opportunities for students to engage with community data, affirm student ability, and create diverse teams of students. The efforts highlighted by Master and Williams not only lead to increased engagement but also stand to sustain participation of underrepresented populations in data science. If data science is to avoid a decrease in participation like the one that occurred among female students in computer science in the 1980s, it is imperative that underrepresented students be supported both academically and through mentorship, recognizing the opportunities that the field of data science presents and the value they can add to it.
Some of the introductory data science courses described in this report have made inclusion and broad participation a central goal, shaping pedagogy, technical infrastructure, and staffing. Some notable steps include the following:
- Designing the material to avoid the need for mathematics, statistics, or programming prerequisites beyond that required for entry to the academic institution, thereby avoiding demographic skews that such prerequisites might induce.
- Using a computing infrastructure that does not rely on personal laptops or access to computer labs; possibly hosting the infrastructure entirely in the cloud so that it can be accessed through a web browser.
- Providing teams of laboratory assistants and tutors to give additional support for students needing assistance.
- Choosing project topics carefully to be of broadest interest and to raise awareness of social issues.
- Operating a cohort-based “data scholars” program in concert with the instructional program to address issues of underrepresentation.
Additionally, the huge opportunity of data science to be a gateway to STEM careers should be emphasized. The wide range of applications of data science to multiple fields, including humanities, social sciences, and the arts, expands the reach of STEM into society. Couching data science in terms of a life skill and a cultural pursuit can help reshape the image of science and increase the number of students interested in STEM fields. Therefore, use cases should be drawn not just from other STEM or scientific disciplines; they should also be drawn heavily from the arts, humanities, social sciences, and popular culture to attract new entrants to the field.
As many data science programs are being freshly created, ample opportunity exists to build broad participation from the beginning. Although no single measure is a panacea, there are several actions that data science programs can take to broaden participation. The Joint Working Group on Improving Underrepresented Minority Persistence in STEM offered the following recommendations for broadening participation of underrepresented minorities in STEM programs (Estrada et al., 2016):
- Track and increase awareness of institutional progress toward diversifying STEM;
- Create strategic partnerships with programs that create lift;
- Unleash the power of the curriculum and active learning;
- Address student resource disparities; and
- Stimulate students’ creativity.
The Joint Working Group points out that there are many programs that have been successful in attracting and retaining underrepresented students in STEM disciplines. For example, in 2017, over half of the computer science graduates from Harvey Mudd College were women (Williams, 2017). Harvey Mudd has succeeded in attracting and retaining underrepresented students in part owing to its commitment to fostering a “growth mindset” in these students (Dweck, 2006). Harvey Mudd faculty teach problem solving using real-world examples, offer four unique styles of one introductory computer science course based on student knowledge and interest, require students to work together in completing homework assignments, and actively encourage students to enroll in a subsequent computer science course (Williams, 2017). To engage potential future students in STEM, Harvey Mudd also hosts a conference at which African American and Hispanic middle and high school girls have the opportunity to build partnerships with professional women practicing in STEM fields. Data science programs may benefit from looking to these and other successful STEM initiatives as models for attracting and retaining underrepresented students.
The Joint Working Group also suggests that such programs are most effective when coupled with assessment and evaluation data that “clearly show the amount of progress or disparity that exists at the institutional level” (Estrada et al., 2016, p. 8). Another way to increase participation is to avoid filter or gate-keeping courses (especially early in the program) and replace them with courses that entice student participation through heightening the excitement and applicability of data science. It may also behoove data science programs to consider which faculty are teaching first-year or introductory data science courses, ensuring that these faculty members can connect with and engage students.
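As an illustration of the kind of institution-level evidence described above, the following Python sketch computes retention rates by student group and the largest gap between groups. It is a minimal sketch only: the records, the group labels, and the function name `retention_by_group` are hypothetical and are not drawn from Estrada et al. (2016) or from any institution's data.

```python
from collections import defaultdict

def retention_by_group(records):
    """Return {group: fraction retained} for (group, retained) records."""
    enrolled = defaultdict(int)
    retained = defaultdict(int)
    for group, stayed in records:
        enrolled[group] += 1
        if stayed:
            retained[group] += 1
    return {g: retained[g] / enrolled[g] for g in enrolled}

# Made-up records: (group label, still enrolled after the introductory course?)
records = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

rates = retention_by_group(records)
gap = max(rates.values()) - min(rates.values())
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} retained")
print(f"largest gap between groups: {gap:.0%}")
```

A real evaluation would draw on actual enrollment records, follow cohorts over several terms, and account for small group sizes; the point here is only that a simple, regularly reported summary can make progress or disparity visible.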
Data science programs also need to embrace multiple entrance points into the discipline: think of the metaphor of a “watershed” in which students from a variety of educational backgrounds and fields can enter, rather than a “pipeline” from one or more particular fields into data science. Varma (2006) agrees that the notion of the pipeline does not extend far enough, as underrepresented minority students face heightened entry and retention barriers. In combination with the recommendations from the Joint Working Group, increasing teaching assistant training, awareness among advising staff, communication between students and faculty, and partnerships with high school teachers could help postsecondary data science programs retain a more diverse student body (Varma, 2006). Curricular options such as minors or data science add-ons to substantive disciplines are two possible ways to open up data science enrollment. In addition to focusing on programs in science fields and broadening participation there, programs in popular areas such as music or communications could be targeted for outreach.
Finding 4.1: The nature of data science is such that it offers multiple pathways for students of different backgrounds to engage at levels ranging from basic to expert.
Finding 4.2: Data science would particularly benefit from broad participation by underrepresented minorities because of the many applications to problems of interest to diverse populations.
Recommendation 4.1: As data science programs develop, they should focus on attracting students with varied backgrounds and degrees of preparation and preparing them for success in a variety of careers.
The popularity of data science courses and programs will affect academic infrastructure in several ways—notably, in terms of who will “own” the program and how it will be delivered. Faculty and administrators will need to examine how the goals of data science education align with the institution’s current infrastructure. What departments and colleges should be involved? Because data science intersects with mathematics, statistics, computer science, and other domains, institutions need to consider whether data science needs to become a stand-alone department or be integrated with other departments. Administrators will need to consider ways to motivate departments to work with one another across disciplines, and department chairs will need to consider ways to motivate their faculty to participate in the implementation of innovative new curricula, whether or not they are in the “home” department. This holistic approach toward data science education is crucial, particularly given the interdisciplinary nature of the field of data science.
Furthermore, given its interdisciplinary nature, a new data science program at the undergraduate level needs to involve the collaboration of several disciplines and programs. However, few instructors are likely to be available who are equally able to teach classes in the full complement of fields. Initially, at least, creative ways of involving faculty from multiple departments are likely to be necessary, so that they can learn from each other and so that students get the broad view of data science that the committee envisions.
However, cross-departmental or institutional collaboration to develop data science programs may prove easier in theory than in practice. In some colleges and universities, academic tribalism and the increased importance of tuition generation might impede these programs from being truly interdisciplinary. Thus, the flexibility to hire or train faculty in the multiple aspects of data science will be necessary to ensure that all programs still achieve their educational goals.
As one example, consider Virginia Tech’s solution to the organizational model. Virginia Tech offers a major in computational modeling and data analytics. The departments that host the major (i.e., computer science, statistics, and mathematics) span two colleges (i.e., the College of Engineering and the College of Science), making interdisciplinary communication and cooperation extremely important. To foster productive collaboration among and within its five interdisciplinary programs, the College of Science set up the Academy of Integrated Science, which is a department-level organizational structure that helps interdisciplinary programs by managing budgets, undergraduate advising, student recruitment, and assessment. Having such a body in place allows faculty to focus solely on developing and delivering curriculum to the students. The Academy of Integrated Science also develops a memorandum of understanding for new faculty hires that establishes their roles in both their home departments and in the interdisciplinary programs (Embree, 2017).
Such cross-departmental collaboration requires new mechanisms for both funding and encouragement. Opportunities for a wide variety of faculty to participate in data science programs will need to be created, as will incentives and rewards for those faculty teaching data science. Reward systems more generally may need to adjust to place greater value on teaching more students, especially when that means there will be greater diversity in their level of preparation. As data science begins to enter conversations in many disciplines, educators and administrators will have to consider the roles of the humanities, social sciences, and arts programs. There are also opportunities for developing programs for students in non-STEM fields, although there are risks that these become “data science-lite” programs that add limited marketable or intellectual value to students.
Several specific hurdles to launching and sustaining data science programs have been encountered and to some extent overcome at various academic institutions. Some of these challenges are associated with growing pains of starting up any new program that is in high demand:
- Overcoming initial resistance. One of the first challenges prospective programs have to overcome is initial resistance by established departments and programs to launching a new program that is in intellectual proximity and competes for tuition dollars and other resources. This is especially challenging for data science, as it has a large footprint across many professional, scientific, and engineering disciplines.
- Recruiting and retaining faculty. Another important challenge is recruiting faculty to create and teach integrative introductory courses in data science and to serve as advisors and mentors for data science students. Departmentally centered tenure and promotion criteria may lead junior faculty to be reluctant to devote much time to launching new programs. An additional challenge has been retention of data science faculty in an economic environment where faculty are increasingly lured away by industry.
- Developing curricula. It is often challenging to develop a consensus on a core curriculum that best serves the various interests and backgrounds of data science students. In this era where many of the existing data science-related courses are oversubscribed, other departments can be reluctant to enroll data science students in their popular courses (e.g., machine learning, data mining, natural language, applied statistics) because doing so may take seats away from their own students.
- Providing physical space. To be most effective, data science programs need flexible physical space to create the collaborative environment in which their students thrive. Such well-situated space is often scarce.
- Facilitating interactive experiences. There is a lack of sustainable and scalable models for capstone programs and similar experiential integrative experiences that have been shown by the Association of American Colleges and Universities (2013) and others to be high-impact educational practices.
- Encouraging industry partnerships. With high turnover in the industry workforce, colleges are facing the challenge of building lasting industry relationships to keep education and training well matched to the needs of the rapidly evolving data science workforce.
Additional infrastructure considerations include enrollment budgets, strategies to build a data science major curriculum (i.e., prerequisites, introductory, advanced, applied, capstone), and ways to align general education requirements with data science. Institutions will need to consider how to provide and share resources for their varied data science experiences (e.g., textbooks, teaching materials, open access, clearinghouse). Advising will also be important for the success of data science undergraduate programs. Formal evaluation methods will need to be implemented to gauge the success of these programs and improve them. The Moore-Sloan Data Science Environments (2018) have put forth some suggestions on creating institutional change in data science, including establishing a neutral space for students and faculty to gather, providing access to professional data scientists and research software engineers who can assist and serve as role models, developing a data science consulting capability, considering the scalability of data science educational initiatives, encouraging software and data openness and reuse, and involving a wide range of people in data-intensive discovery.
Another challenge will be that many fields involved with data science are themselves experiencing rapid change and evolution. As a consequence, data science curricula will also likely evolve rapidly, and programs need to be ready and willing to adapt. This will undoubtedly lead to the same types of questions that have been explored in computer science and other rapidly evolving fields in past years. Last, it behooves institutions to consider the alternative pathways students might take into data science by removing obstacles and barriers for students who want to change their concentration to data science during the course of their studies or making it easier to add a data science minor. Overcoming these challenges will require institutions to broker between competing interests, to recruit new faculty and staff in data science, and to make strategic long-term investments to sustain the activity.
Finding 4.3: Institutional flexibility will involve the development of curricula that take advantage of current course availability and will potentially be constrained by the availability of teaching expertise. Whatever organizational or infrastructure model is adopted, incentives are needed to encourage faculty participation and to overcome barriers.
A major driver of data science education has been the evolution of data and of the infrastructure for accessing and analyzing it. Hands-on experience with the entire data science life cycle is an essential part of the training and education of data science students, regardless of the educational modality. In particular, students need to be taught how to handle large amounts of data and how to run scalable but sophisticated analysis software on the data, often requiring distributed data storage, multicore processing, and parallel computation. However, maintaining large complex data sets and high-performance computing systems on college campuses strains the resources of educational institutions. While several of the larger research universities retain high-performance computing and large server facilities, most universities and colleges are in the process of transitioning their computing and storage to cloud service providers that provide students reliable access to their data and the computational resources to run algorithms against the data. Thus, the cloud has played and will continue to play an important role in transforming data science education. A logical next step might be for colleges to band together to federate these cloud resources under an “academic cloud.” Such a federated academic cloud could provide common platforms for students across the nation, facilitating data integration and analysis, reducing costs to educational institutions, and balancing inequities in access to instructional resources.
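To make the idea of scalable, parallel analysis more concrete, the following Python sketch shows the split, summarize, and combine pattern that underlies many distributed computations. It is a sketch under stated assumptions rather than a recommended classroom implementation: the function names (`chunked`, `partial_stats`, `parallel_mean`), the chunk size, and the data are all hypothetical, and a thread pool from the standard library merely stands in for the separate workers a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def partial_stats(chunk):
    """Summarize one chunk as (count, sum) so results can be combined."""
    return len(chunk), sum(chunk)

def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by summarizing chunks concurrently, then combining.

    Assumes `values` is non-empty; an empty input would divide by zero.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunked(values, chunk_size)))
    total_count = sum(count for count, _ in parts)
    total_sum = sum(subtotal for _, subtotal in parts)
    return total_sum / total_count

# Stand-in for a data set that would normally be spread across workers.
data = list(range(1, 10001))
result = parallel_mean(data)
print(result)
```

Because each chunk is reduced to a small summary before the summaries are merged, the same shape carries over when chunks live on different machines. In standard CPython, threads do not speed up CPU-bound work of this kind, so the example shows only the structure of the computation, not a performance gain.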
Finding 4.4: The economics of developing programs has recently changed with the shift to cloud-based approaches and platforms.
As discussed in Chapter 2 of this report, there is a progression of topics and skill sets that will guide students to develop data acumen. Key concepts required to develop data acumen include mathematical foundations, computational foundations, statistical foundations, data management and curation, data description and visualization, data modeling and assessment, workflow and reproducibility, communication, domain-specific considerations, and ethical problem solving. These skills then become transferable into a range of data science positions in the workplace.
Each undergraduate modality discussed in Chapter 3 offers a unique pathway to various data science careers. The degree to which each concept or skill is emphasized in each modality depends upon the respective career trajectories. While a 4-year data science degree may be most appropriate for some data scientists, a 2-year associate’s degree may be better suited for others. And while a boot camp may help prepare a business professional to incorporate data analytics in the workplace, a data science minor may offer valuable training for data-driven decision makers in a variety of fields. It is important to note that, as the field of data science continues to evolve at a rapid pace, it will often be necessary to reevaluate the types of careers utilizing data science as well as the data science skill sets necessary to achieve success in those careers.
Mirroring the variety of pathways for data science education discussed in Chapter 3, there are a number of ways in which data science courses may be taught. Some data science courses, owing to their interdisciplinary nature, are taught either by a team of faculty or by two faculty with the appropriate areas of expertise to cover multiple perspectives. Though this approach offers the most well-rounded experience for students, it can be difficult to find the administrative support and additional resources needed. It remains challenging to recruit appropriate new faculty to teach both introductory data science courses and courses for the data science major or minor. Faculty need to have multiple experiences with data science projects to develop the perspective to guide their students. These faculty need to be diverse and have the ability to serve as role models for future data scientists, while also meeting competencies in the practical data acumen areas discussed in the previous section and in Chapter 2 of this report. As the field of data science expands, faculty are likely to be needed in an even broader range of competencies.
Considerations for current faculty are also necessary, as many will need to be retrained in new data science methods and tools, both of which will continue to evolve rapidly in the coming years. Faculty will also benefit from professional development in new teaching approaches to best meet the needs, learning styles, and knowledge levels of future undergraduate students. Such training will be especially useful for faculty teaching introductory classes composed of students with various academic backgrounds and career interests. Funded by the National Science Foundation, Training a New Generation of Statistics Educators [3] is an example of a program that creates professional learning communities whose members participate in workshops, mentorship programs, and national conferences, all in an effort to increase their statistical content knowledge and improve their teaching. The current program includes over 70 instructors from 2-year institutions across the United States (Posner, 2017). Some academic institutions have developed their own focused data science education programs for their faculty; for example, the University of California, Berkeley, offers summer workshops on the pedagogy and practice of data science to engage faculty across the university [4].
Perhaps most challenging is retaining faculty in data science programs. Given their skill sets, professors of data science domains are highly sought after throughout industry (Kaminski and Geisler, 2012). While academic institutions are unlikely to have the resources to offer comparable salaries, they need to consider alternative incentives (e.g., opportunities for transdisciplinary collaboration, for pursuing open-ended research topics, for developing curricula, and for professional stability [e.g., tenure]) that will appeal to, entice, and retain data science faculty.
In order for data science programs to flourish, the progress of data science students needs to be assessed early and often. Jordan (2017) asserts that assessment cannot be conducted without a clear understanding of the core skills for data science—collection, analysis, visualization, and sharing—as well as a clear definition of student learning outcomes (perhaps inspired by Bloom’s Taxonomy). She suggests the following eight steps to create and assess the data science education classroom:
- Understand the audience to create an impactful and positive learning environment,
- Know what motivates the students,
- Develop a code of conduct,
- Create challenge questions and exercises,
- Incorporate qualitative mechanisms to improve teaching and quantitative mechanisms to gauge student learning,
- Design interventions around learning outcomes as well as around students’ needs,
- Conduct long-term follow-up, and
- Continue to build the data science community.
[3] The website for this National Science Foundation-supported project is https://www.nsf.gov/awardsearch/showAward?AWD_ID=1432251, accessed February 6, 2018.
[4] An overview of a collection of pedagogy workshops can be found at https://data.berkeley.edu/news/2018-data-science-education-opportunities, accessed April 22, 2018.
Association of American Colleges and Universities. 2013. Capstones and integrated learning. Peer Review 15(4).
Boland, R. 2014. NSF invests millions in academic cloud computing testbeds. Signal, August 21. https://www.afcea.org/content/nsf-invests-millions-academic-cloud-computing-testbeds. Accessed February 22, 2018.
Dweck, C. 2006. Mindset: The New Psychology of Success. New York: Ballantine Books.
Embree, M. 2017. “Forging Virginia Tech’s CMDA Major Across Departments.” Webinar Presentation to the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective, October 10. http://www.nas.edu/envisioningDS. Accessed February 14, 2018.
Estrada, M., M. Burnett, A.G. Campbell, P.B. Campbell, W.F. Denetclaw, C. Gutiérrez, S. Hurtado, et al. 2016. Improving underrepresented minority student persistence in STEM. CBE Life Sciences Education 15(3):es5.
Jordan, K. 2017. “Assessing Data Science Learning Outcomes.” Webinar Presentation to the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective, October 24. http://www.nas.edu/envisioningDS. Accessed February 14, 2018.
Kaminski, D., and C. Geisler. 2012. Survival analysis of faculty retention in science and engineering by gender. Science 335(6070):864-866.
Master, A. 2017. “Diversity, Inclusion, and Increasing Participation in Data Science.” Webinar Presentation to the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective, November 7. http://www.nas.edu/envisioningDS. Accessed February 14, 2018.
Moore-Sloan Data Science Environments. 2018. “Creating Institutional Change in Data Science.” White paper. http://msdse.org/files/Creating_Institutional_Change.pdf.
Posner, M. 2017. “Go to the People: Impactful Faculty Training in Data Science.” Webinar Presentation to the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective, September 26. http://www.nas.edu/envisioningDS. Accessed February 14, 2018.
Rawlings-Goss, R. 2018. Keeping Data Science Broad: Negotiating the Digital and Data Divide Among Higher Education Institutions. South Big Data Innovation Hub. http://bit.ly/KeepingDataScienceBroad_Report. Accessed March 28, 2018.
Varma, R. 2006. Making computer science minority-friendly: Computer science programs neglect diverse student needs. Communications of the ACM 49(2):129-134.
Williams, T. 2017. “Diversity and Inclusion in Data Science: Using Data-Informed Decisions to Drive Student Success.” Webinar Presentation to the Committee on Envisioning the Data Science Discipline: The Undergraduate Perspective, November 7. http://www.nas.edu/envisioningDS. Accessed February 14, 2018.