What implications does the emergence of data science have for colleges and universities? This fundamental question leads to many other related questions, such as the following: How are tools and methods used in data science blended with other disciplines? How can students benefit from data science being delivered in unique ways? How does data science span siloed disciplines?
Before making curricular changes, an academic institution needs to take into account its infrastructure, its budget, and its business model, as well as the potential collateral benefits for the rest of the institution. Some universities start with curricular changes at the master’s level because those programs are generally easier to develop than undergraduate programs; however, because professionals with undergraduate degrees will be using different skill sets to fill different workforce roles than those with graduate degrees, essential data science skills training needs to be included at all levels of postsecondary education.
This chapter describes alternatives and models for innovative curriculum development, provides suggestions for institutions, and features examples of both innovative data science curricular approaches (see Box 3.1) and innovative approaches to evaluation.
New and modified courses may be necessary to teach in the emerging data science field. Curriculum development would benefit from drawing on the experiences of both new faculty with firsthand knowledge of emerging areas and more seasoned faculty with experience developing other curriculum initiatives. Co-curricular activities (activities that are connected to or mirror the academic curriculum) and interdisciplinary approaches also enhance educational experiences. Both established good practices and careful evaluations can help inform decisions among different approaches. There are a number of educational models that can be implemented in a data science curriculum depending on the curriculum goals that are identified. For example, a variety of educational pathways may each have their advantages in preparing students for their respective careers, whether these careers are vocational or professional. These pathways may include majors (and minors) centered on data science broadly, distinct data science concentrations in various fields (such as business, computer science, and statistics), or 1-year courses and certificate programs. Although full degree-granting programs in data science may not be available yet in many settings outside top-tier academic institutions, graduates will still need to come from community colleges, minority-serving institutions, and smaller colleges and universities in order to fill the pipeline of data talent. Professional master’s programs provide another pathway, because they have been helpful in some other interdisciplinary fields in providing a flexible environment to explore new educational programs and preparing students for the kind of workforce positions that data science is already providing.
The process by which these goals are achieved can vary. In terms of delivering content, flipped courses, hybrid courses, independent studies, experiential learning, modular courses, hackathons,1 data dives,2 and just-in-time learning are all viable options for students, although it is important that course design align with course objectives and that courses be taught by faculty with the right instructional skills. In terms of creating content, it can be beneficial to teach some courses with a disciplinary context so that students appreciate that data science is not an abstract set of approaches.
One of the limitations of course deployment is the ability of faculty to teach data science. Co-teaching allows faculty with diverse areas of expertise to collaborate and offer a more well-rounded course to students; however, additional administrative support is often necessary because co-taught courses typically require more resources. Faculty may need to participate in re-training and faculty development to keep pace with changing technologies. Portable courses, where a course developed for one institution is replicated at another, help fill gaps in faculty knowledge.
The differences among various programs, and the types of employees they create, are often unclear to both hiring organizations and human resource departments. Having industry involved in developing and/or retooling data science courses can help ensure that programs meet workplace needs and that students going through these data science programs have employment opportunities upon completion. Improved collaboration can also help shape and enhance career paths in industry with positions that can both utilize data science skill sets and provide interesting opportunities for growth. Better integration of industry teachers in academia could help foster this collaboration.
A first step in establishing a new curriculum is to consider relevant experiences from other disciplines (e.g., the digital humanities) that have recently emerged from a period of reorganization and innovation. However, it is important to note that no single curriculum will be appropriate for all institutions. It may be necessary for educators to consider possible new disciplines and alternative degree structures and student pathways. Curricular development for quantitative material typically involves prioritizing concepts and skills, using educational research to guide pedagogy, borrowing or adapting existing materials, sequencing coverage of concepts, enhancing high-priority concepts and skills through repetition, and assessing the results. Diverse pedagogical approaches are valuable; institutions could begin by encouraging collaboration between departments and existing programs in considering new curricular initiatives.
For data science curricula specifically, course assignments and exercises can help motivate and ground exploration if data sets, case studies, and examples are chosen thoughtfully. Current research problems may be a good source of compelling topics. Heterogeneity of examples from diverse disciplines is desirable for better, richer team experiences, and data science education will evolve as data science itself evolves.
The evaluation and assessment of different approaches benefit from being grounded in the same theory of change3 that informs curriculum development, including specifying curriculum goals, identifying the right comparison group, and stating clearly the curricular intervention. It is helpful to disseminate the analytical results broadly so that the field can learn from both successes and failures.
1 A “hackathon” is a competition in which programmers and analysts work together to complete a task, usually for a prescribed and limited time period.
2 A “data dive” is an event in which organizations, often nonprofits, present a data-driven problem to a group with data science expertise to solve in a limited amount of time.
3 The Theory of Change is “a comprehensive description and illustration of how and why a desired change is expected to happen in a particular context. It is focused in particular on mapping out or ‘filling in’ what has been described as the ‘missing middle’ between what a program or change initiative does (its activities or interventions) and how these lead to desired goals being achieved. It does this by first identifying the desired long-term goals and then works back from these to identify all the conditions (outcomes) that must be in place (and how these related to one another causally) for the goals to occur” (Center for Theory of Change, 2016).
Finding 3.1: Data science curricula are enhanced by bringing together faculty from different disciplines, utilizing diverse pedagogical approaches, and building upon existing educational programs.
Evaluation, Assessment, and Accreditation
Assessment of student skill and conceptual development within formal courses benefits from being informed by the overall objective of the program and its associated curriculum. It is helpful for departments and institutions to consider criteria and methods for evaluating the entire program, particularly for interdisciplinary programs such as data science. Program evaluation approaches would be best implemented throughout the process of program/curriculum development from the time when the objectives of the program are first considered. Such an evaluation process can inform the formative and summative assessment methods used within the program and may suggest particular metrics for success. Indeed, the methods of data science could readily be applied to ascertain the data to be collected, the analysis methods to be used, and the metrics through which any analysis can specify program success or suggest efforts to adapt or modify the program to meet the success metrics. Having an evaluation process in place could assist institutions in preparation for any formal accreditation that is already being utilized or that might arise as the field of data science grows. It could also enable program designers to characterize the outcomes of students trained in data science relative to comparable students trained in other fields.
Fortunately, there is substantial literature available on ways to build evaluation and assessment into program design. In a very influential series of papers, Handelsman et al. (2004) argue that new types of science need to adopt active learning techniques. By this, they mean changing teaching from a lecture-based format to one that has both inquiry-based and modular-learning components and that treats students as scientists who “develop hypotheses, design and conduct experiments, collect and interpret data, and write about their results” (Handelsman et al., 2004). Handelsman et al. (2007) note that the line between active learning and assessment is difficult to define, but teaching that promotes students’ active learning (e.g., asking students to perform an action or task) can then help assess understanding. Hoey (2008) suggests that the key measures include the following:
- Knowledge of concepts in the discipline;
- Ability to conduct independent research;
- Ability to use appropriate technologies;
- Ability to work with others, especially in teams; and
- Ability to teach others.
Of course, new educational techniques have been scientifically evaluated in other contexts. The Department of Education’s What Works Clearinghouse4 provides a compendium of the results of different interventions at different grade levels. Universities are partnering with government agencies to link their data to administrative records to trace the earnings and employment outcomes of students well beyond their graduation date. For example, a pilot partnership among a number of universities (the University of California system, the University of Michigan, the University of Wisconsin, and the University of Texas), the Institute for Research on Innovation and Science,5 and the U.S. Census Bureau is linking individual-level transcript data to Longitudinal Employer-Household Dynamics6 program data. The plan is to scale the pilot nationally, if successful. The advantage of linking transcript data to administrative records is not only that it provides longitudinal information on the educational experience of students, but also that it permits the construction of comparison groups: outcomes of groups of students who took data science classes can be compared with those of groups who did not take data science classes.
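The linkage and comparison-group idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names (`student_id`, `took_data_science`, `annual_earnings`), the helper functions, and the records themselves are hypothetical and are not drawn from the pilot partnership. Actual linkage of transcript and administrative data would take place in a secure environment using protected identifiers, and a raw difference in group averages is only a starting point, because the two groups may differ in other ways.

```python
from statistics import mean

# Hypothetical, made-up records for illustration only.
transcripts = [
    {"student_id": "s1", "took_data_science": True},
    {"student_id": "s2", "took_data_science": False},
    {"student_id": "s3", "took_data_science": True},
    {"student_id": "s4", "took_data_science": False},
    {"student_id": "s5", "took_data_science": True},  # no earnings record
]
earnings = [
    {"student_id": "s1", "annual_earnings": 60000},
    {"student_id": "s2", "annual_earnings": 50000},
    {"student_id": "s3", "annual_earnings": 70000},
    {"student_id": "s4", "annual_earnings": 54000},
]


def link_records(transcripts, earnings):
    """Join transcript rows to earnings rows on student_id.

    Students without a matching earnings record are dropped.
    """
    earnings_by_id = {row["student_id"]: row for row in earnings}
    linked = []
    for transcript in transcripts:
        match = earnings_by_id.get(transcript["student_id"])
        if match is not None:
            linked.append({**transcript, **match})
    return linked


def mean_outcome_by_group(linked, group_key, outcome_key):
    """Average an outcome separately for each value of group_key."""
    groups = {}
    for row in linked:
        groups.setdefault(row[group_key], []).append(row[outcome_key])
    return {group: mean(values) for group, values in groups.items()}


linked = link_records(transcripts, earnings)
averages = mean_outcome_by_group(linked, "took_data_science", "annual_earnings")
print(averages)
```

In this sketch the student with no matching earnings record is simply excluded. A real evaluation would need to examine who goes unmatched and why, since non-random gaps in the linked data can bias the comparison.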
There is much to be learned from the experience in other fields in moving from a curriculum based on providing content to one that is both interdisciplinary and concept-driven. In the biological sciences, Gutlerner and Van Vactor (2013) argue forcefully for the development of modular classes—what they call “nanocourses.” Their approach brings together students from multiple backgrounds, engages faculty from a variety of disciplines, and creates “small discussion group activities that allow students to practice framing experiments into larger scientific contexts and disciplines” (Gutlerner and Van Vactor, 2013).
Faculty who are already busy teaching, engaging in faculty service, and conducting research, and who may not have deep expertise in data science topics, can find it hard to make time for curriculum development. The structure and content of faculty training and incentives are crucial, as are time and funding to support curriculum development. Networks built among faculty from multiple disciplines and professional development offered on a regular basis can enhance interdisciplinary innovation. If these do not occur, faculty may be less equipped to teach their students, and institutions might not benefit from the cross-department educational collaborations that characterize successful data science programs. When data science programs or curricula are being established, it is important for academic institutions to consider the balance of talent throughout institutes and within departments.
Finding 3.2: Structured faculty training, meaningful incentives, and available time and funding to support curriculum development are all crucial to preparing faculty for data science education.
Structures of Academic Institutions
An institution’s infrastructure and organizational structure can shape, and perhaps limit, the possibilities for future data science programs. Economic structures, especially tuition-driven models, often form silos within departments and between disciplines. For example, if there are no mechanisms in place to adjust tuition payments, faculty pay rates, faculty course-load distributions, and general education requirements to accommodate cross-department and cross-disciplinary course offerings, data science course options could be more limited in scope and reach a smaller audience of students. Institutional considerations about whether to admit students to a particular program or a general program or to deliver only in-person courses may also affect a data science program’s ability to reach a more diverse group of students. Institutions offering flexible options for students who would like to enter the workforce with a wider range of skills may be more successful than those that offer only very restrictive degree programs. Academic institutions could benefit from the creation of an office devoted to campus data initiatives, such as data science education programs. Such an office could include support for teaching and a data facility that would provide access to data, including a secure environment for confidential data as well as information about data standards and reusable code (such as Jupyter Notebooks). Another possible consideration would be a “quantitative sciences education center” that is similar in structure and objectives to writing, computing, and statistics centers at most institutions but focused on development of broad student appreciation for quantitative sciences, including data science.
Finding 3.3: Data science programs often adapt to the existing infrastructure and organizational structure of an academic institution, but infrastructure innovations by the institution (e.g., in data provision, data and code access, and data documentation) can help data science programs be more collaborative and multidisciplinary.
Importance of Flexibility
The rapid flux in data science concepts, tools, and applications makes flexibility and adaptability key to developing, achieving, and maintaining successful data science programs. Although students are generally becoming more comfortable with some computing and data science contexts (e.g., more accustomed to connectedness in every facet of society, more accepting of new technologies like artificial intelligence, and more adept with software and devices), it is important to identify and address potential gaps in knowledge to ensure data science programs are accessible to all future students, regardless of past experiences. Approaches to attract students who are interested in data science but have less quantitative backgrounds are important. Employers’ needs will also likely evolve as new skill sets are required. Educational institutions are also likely to experience change in their options for delivery, program types, and course content. Faculty flexibility in developing and modifying appropriate and timely course content to keep pace with the changes in the surrounding world may be invaluable. Instead of creating one-size-fits-all approaches to teaching data science concepts, institutions will have to remain open-minded and flexible to best meet the needs of students and the workplace.
Finding 3.4: To keep up with the quickly evolving field of data science and recruit students with more diverse backgrounds, educational approaches in data science need to be flexible in terms of what concepts, skills, tools, and methods are taught; how students are recruited; and how departments and programs collaborate to provide a full data science experience to students.