Data science educational opportunities for students are rapidly growing. Universities, colleges, community colleges, and other organizations are starting to offer a range of programs for students with different interests and backgrounds. This chapter provides an overview of the current landscape of classes and programs and identifies some of the key challenges facing those who aim to develop a data science program.
Undergraduate data science education is currently offered in many forms, and this variability is expected to continue in the near future. Common modalities include the following:
- Introductory exposure to data science, through a single inspirational course that could satisfy a general education requirement;
- Major in data science, including advanced skills, as the primary field of study;
- Minor or track in data science, where intermediate skills are connected to the major field of study;
- Two-year degrees and certificates;
- Other certificates;
- Massive open online courses (MOOCs), which can engage large numbers of students at a variety of levels; and
- Summer programs and boot camps, which can serve to supplement academic or on-the-job training.
As academic institutions add courses and programs in undergraduate data science, they will need to decide which modalities are institutionally appropriate, considering factors such as student demand, faculty and institutional strengths and resources, and curricular fit. These choices may also be influenced by the existence of graduate programs in data science at the institution. Each of these modalities, with its strengths, limitations, and possible areas for improvement, is discussed in more detail in the following sections.
Introductory Exposure to Data Science
Several 4-year and, increasingly, 2-year academic institutions offer some type of introductory data science course or sequence, and many more will likely develop such courses in the years to come, especially as they build degree programs in data science. These stand-alone courses and sequences introduce interested students to data science, attract students to data science majors or minors (where these exist), and broadly prepare students for an increasingly data-driven world.
Some institutions are implementing a general education offering or requirement in data science, which may be taught by multiple departments and delivered via semester-long courses, modules, virtual sessions, self-guided instruction, or MOOCs. This flexibility allows data science to be integrated into other programs of study. Other institutions are providing introductory courses to meet the demands of interested students and to better prepare them to live and work in a world in which it is essential to engage with data critically and carefully. Transferable data science skills include descriptive statistics, visualization, sampling, programming, database management, inference, and business analytics. Students from various disciplines seem eager to enroll in data science courses, in part because the content may add value to their degrees.
The following are a few examples of institutions that provide an introductory data science experience:
- The University of California, Berkeley, a public research university, offers an introductory data science course, Data 8: Foundations of Data Science.1 This course is open to all students, regardless of educational background or major, and there are no prerequisites beyond those required for entry to the university. The course is cross-listed in the Department of Computer Science, the Department of Statistics, and the School of Information and is taught by an interdisciplinary team of faculty. Reflecting the level of student interest, it has grown from a pilot of fewer than 100 students in fall 2015 to over 1,100 in spring 2018, drawing from over 70 different majors (UC Berkeley, 2018). Each term, about a dozen wide-ranging “connector courses” are offered concurrently with the introductory course to connect it with areas of student interest, such as legal studies, cognitive neuroscience, geography, history, civil engineering, immunology, demography, psychology, business, and others.2 To further increase accessibility, the entire course is delivered through Jupyter notebooks in the cloud, so students need only a web browser to participate fully.
- Amherst College, a private liberal arts college in Massachusetts, also offers an introductory data science course through its Mathematics and Statistics Department, STAT-231: Data Science.3 The course “provides a practical foundation for students to think with data by participating in the entire data analysis cycle. Students will generate statistical questions and then address them through data acquisition, cleaning, transforming, modeling, and interpretation . . . [and will use and apply] tools for data management and wrangling that are common in data science . . . to real-world applications” (Amherst College, 2017). The course requires some prior background in statistics and computer science.
- Carnegie Mellon University, a private research university in Pittsburgh, Pennsylvania, offers Reasoning with Data, a first course in statistics and data science focusing on concepts, interpretation, and communication. It is part of the required general education curriculum for all students in the Dietrich College of Humanities and Social Sciences. Students use an interactive platform that allows for analysis without using a specific programming language. The coursework includes several student-driven data analysis projects.4
3 The website for STAT-231 is https://www.amherst.edu/academiclife/departments/courses/1718F/STAT/STAT-231-1718F, accessed January 25, 2018.
4 To view a sample syllabus from Reasoning with Data, see http://www.stat.cmu.edu/~rnugent/PUBLIC/teaching/200syllabus.pdf.
- The University of Washington, a public research university in Seattle, offers Introduction to Data Science, which is cross-listed in the Department of Statistics, the Department of Computer Science and Engineering, and the Information School.5 It uses the interactive textbook Computational and Inferential Thinking: The Foundations of Data Science (Adhikari and DeNero, 2018) and the Jupyter-based notebook environment of the University of California, Berkeley, course Data 8: Foundations of Data Science. Introduction to Data Science requires only precalculus as a prerequisite, and it addresses data collection and management, summary and visualization of data, basic statistical inference, and machine learning.
- Winona State University, a public university in Minnesota, offers an introductory data science course, DSCI 210: Data Science, that allows students to explore methods and techniques commonly used by data scientists. Participants have an opportunity to learn about data management, preparation, analysis, visualization, and modeling as well as to complete a data science project. Students are required to complete an introductory computing course as a prerequisite to enrolling in this class. DSCI 210 serves as an introduction to data science for nonmajors, but it also counts toward the requirements for Winona State’s B.S. in data science.6
- The computer science department at St. Olaf College, a private liberal arts college in Northfield, Minnesota, hosts the introductory course CSCI 125: Computer Science for Scientists and Mathematicians.7 In this course, students discuss how to handle, visualize, find patterns in, and communicate about data. Students have the opportunity to learn how to use common data science tools (e.g., Python and R) while working collaboratively on real-world problems. To enroll in the course, students need to have previous coursework in calculus.
- Montgomery College, a public 2-year college with campuses across Maryland, offers DATA 101: Introduction to Data Science.8 Students must have completed one of four approved statistics courses before enrolling in this course. Throughout the course, students explore methods to collect, organize, manage, examine, prepare, analyze, and visualize data. This introductory course can also be used to satisfy one of the requirements for Montgomery College’s Data Science Certificate, which adds courses on writing and communicating about data and using statistical methods, as well as a capstone experience.
5 To view a sample syllabus from Introduction to Data Science, see https://wstuetzle.github.io/IDS-syllabus-11-14-2017.html, accessed February 20, 2018.
6 The website for DSCI 210 and the B.S. in data science at Winona State is https://catalog.winona.edu/preview_program.php?catoid=10&poid=2474&returnto=958, accessed February 1, 2018.
7 The website for St. Olaf College’s computer science department is http://catalog.stolaf.edu/academic-programs/computer-science/, accessed February 1, 2018.
8 The website for DATA 101 is http://catalog.montgomerycollege.edu/preview_course_nopop.php?catoid=8&coid=11413, accessed February 1, 2018.
Introductory data science experiences can also attract students to other data science programs offered by the institution, such as majors, minors, tracks, and certificates. These include students from other disciplines, such as the humanities, social sciences, and the arts, as well as students from populations underrepresented in science, technology, engineering, and mathematics fields. Introductory data science courses offer a low-stakes opportunity (i.e., one without barriers related to previous training or expertise) to investigate the field of data science and to gain exposure to skills and experiences applicable to a wide range of future careers. Introductory courses can also serve as a gateway to degrees or specializations in data science, as they help students see the valuable role that data science expertise will play in the future workforce.
Developing, implementing, and delivering an introductory data science course is not without its challenges, however. For example, while offering introductory courses without prerequisites is attractive for broadening participation in data science, unrestricted enrollment must be reconciled with classroom capacity and instructor availability. Content is also likely to vary by course, owing to differences in instructor expertise and interest and in student knowledge; this variability may leave students inconsistently prepared for future courses. These issues also confront developers of upper-level elective courses or of courses in a data science major, since it can be difficult to prepare content and develop learning outcomes for students with such varied levels of knowledge and experience.
Introductory data science courses will continue to evolve and improve over time with innovative curriculum development that emphasizes real-world experiences and connections, ongoing faculty development opportunities, and cross-disciplinary collaboration among a wider spectrum of disciplines.
Major in Data Science
Data science majors are emerging across academic institutions and will continue to do so in the years to come. As with introductory data science experiences, these majors vary significantly in program structure, goals, and content. Some data science majors are emerging as independent programs that interface with specific domain areas, while others are emerging as specializations within a given domain area.
The most common features of current data science majors are required courses in mathematics, statistics, and computer science. Within these areas, requirements often include the following:
- Mathematics courses on linear algebra, calculus, and discrete structures;
- Statistics courses on introductory statistics, probability, and various kinds of applied statistics; and
- Computer science courses on database systems, programming, data structures, algorithms, and machine learning.
Some majors have courses listed as “data science,” including cross-listed courses with statistics and computer science, while other data science majors draw entirely upon courses from connected departments. Common topics taught under the “data science” listing include advanced data analytics, big data, data mining, simulation modeling, and computational thinking. Many data science majors include required or elective courses from outside the core departments—commonly economics, business, psychology, biology, and geography or geosciences. Many data science majors also require a hands-on practicum or capstone course to help reinforce skills.
Currently, many 4-year majors fall into one of three categories: (1) data science majors housed within a college or school of business (i.e., programs in business analytics, which usually involve more marketing and finance classes and fewer computational and mathematics courses); (2) data science/analytics majors housed in a mathematics or statistics department (i.e., above-average mathematics or statistics requirements with fewer “core” computational courses); and (3) data science programs housed in a computer science department, either as a stand-alone major or as a concentration within information technology (i.e., more computational courses but potentially fewer “core” mathematics courses). Courses offered and required vary notably across similarly labeled majors at different institutions. A few 4-year undergraduate data science majors are hybrids of these three models, administered jointly by multiple departments. The following list includes a variety of approaches to data science majors:
- The University of Michigan, a public research university in Ann Arbor, launched a major in data science in fall 2015. This major is a joint program offered by two departments in two different colleges: computer science in the College of Engineering and statistics in the College of Literature, Science, and the Arts (LSA).9 The major requirements consist of a core of five courses: discrete mathematics, programming, data structures, probability and statistics, and applied regression. In addition to satisfying the core requirements, students select at least one course from each of three areas: machine learning, data management, and data science applications. Engineering students majoring in data science also take a computer professionalism course. Both LSA and engineering data science majors participate in a capstone experience, typically during their senior year.
- Smith College, a private liberal arts institution in Northampton, Massachusetts, began offering a major in statistical and data sciences in fall 2017.10 The program is not hosted by one campus department; instead, it draws on faculty and disciplines from across the college. The major requires 10 courses, including courses in statistics, computer science, data science, communication, and a domain area.
- Virginia Tech, a public research and land-grant university in Blacksburg, offers a major in computational modeling and data analytics in its College of Science.11 The major includes courses in mathematics, statistics, computer science, and data science, as well as a capstone experience.
- The University of California, San Diego, a public research university, launched a B.S. in data science in fall 2017 through its departments of cognitive science, computer science and engineering, and mathematics.12 The major consists of a technical lower division comprising mathematics, computer science and engineering, natural sciences, and five specific data science courses. The upper division has core mathematics, computer science, and data science courses; elective computer science and mathematics courses; and a senior project.13 On completion of the major, students are expected to be “versed in predictive modeling, data analysis and computational techniques . . . [and have developed] undergraduate-level expertise in a specific subject area outside of data science” (UC San Diego, 2017).
9 The websites for the joint program at the University of Michigan are https://www.eecs.umich.edu/eecs/undergraduate/data-science/ and https://lsa.umich.edu/stats/undergraduatestudents/undergraduate-programs/majordatascience.html, accessed February 12, 2018.
- At the University of Rochester, a private university in New York state, the Goergen Institute for Data Science offers a major in data science in the form of either a B.A. or a B.S.14 While this major includes studies in computer science and statistics, as well as a capstone experience, it also allows students to take upper-level coursework in a focused domain area such as business, environmental science, biology, or political science.
- The Massachusetts Institute of Technology, a private research university in Cambridge, introduced a B.S. in computer science, economics, and data science. It seeks to equip students with a foundational knowledge of economic analysis, computing, optimization, and data science, as well as hands-on experience with empirical analysis of economic data to identify, analyze, and solve real-world challenges in real and virtual settings.15 Required courses are drawn from lists of preexisting courses in mathematics, electrical engineering and computer science, and economics covering linear algebra, discrete mathematics, probability and statistics, computation and algorithms, data science, intermediate economics, elective subjects from data science and economics theory, and communications practice.
- The University of California, Irvine, a public research university, houses a B.S. in data science in its Statistics Department, within the Bren School of Information and Computer Sciences. It consists of 16 lower division quarter courses, including nine computer science courses, four mathematics courses, and three statistics courses. Two of the three statistics courses are specifically related to data science. The upper division requirements include seven statistics courses, three computer science (including machine learning) courses, one writing course, one visualization course, three approved elective courses from mathematics/computer science/information, and a two-quarter capstone course.16
- The New York University School of Professional Studies offers a B.S. in applied data analytics and visualization. Its core curriculum consists of general education courses (e.g., writing, critical thinking, quantitative reasoning, scientific issues, and history, art, and culture) and liberal arts electives, with the following major requirements: mathematics (i.e., precalculus, calculus, and linear algebra), statistics, computer science (i.e., fundamentals, database design, networking, and systems analysis), eight specific data science and visualization courses, three electives in applied analytics and visualization, and a graduation project.17
14 The website for the Goergen Institute for Data Science is http://www.sas.rochester.edu/dsc/undergraduate/index.html, accessed January 25, 2018.
15 The website for the MIT B.S. in computer science, economics, and data science is https://www.eecs.mit.edu/academics-admissions/undergraduate-programs/6-14-computer-science-economics-and-data-science, accessed February 20, 2018.
16 The website for the University of California, Irvine, data science degree is http://www.ics.uci.edu/ugrad/degrees/degree_datascience.php, accessed February 20, 2018.
Challenges in forming a major program include constructing gateways from established lower division courses to the advanced courses in the major, forming a set of upper division courses that covers the essentials of data science without in effect creating a double or triple major, incorporating a domain of application, addressing ethical issues and social implications, and considering a hands-on practicum or capstone integrative experience. Preexisting courses from computer science, statistics, applied mathematics, operations, and information management typically cover essential material but do so in the context of those majors. Forming new courses that integrate segments of several such courses may streamline the data science major but may overlap to some extent with existing course offerings. Thus, when forming such amalgams, it is particularly important to bring out the distinctive character of data science.
Several institutions have developed courses that address these challenges in various ways. For example, Carnegie Mellon University’s statistics program builds on its introductory Reasoning with Data course with Methods for Statistics and Data Science, which focuses on regression and nonparametric methods and requires students to use R Markdown and GitHub with structured coding templates. Students design data analysis projects and write reports. Carnegie Mellon University’s computer science program offers an independent course, Practical Data Science,18 focused on data collection and processing, data visualization and presentation, statistical model building using machine learning, and big data techniques for scaling these methods.
17 The curriculum for the New York University B.S. in applied data analytics and visualization can be found at http://www.sps.nyu.edu/academics/departments/mcghee/undergraduate/bachelors/bs-applied-data-analytics-and-visualization/core-major-curriculum.html, accessed February 12, 2018.
Another example is from the University of California, Berkeley, which introduced the junior-level Principles and Techniques of Data Science.19 This course seeks to provide a second integrated experience of computational and inferential thinking that builds on the introductory foundational courses and adds mathematics background. The course is oriented around the data science life cycle. Rather than relying on the tool- and environment-agnostic pedagogy at the introductory level, this more advanced course provides direct exposure to current data science technology. It also seeks to provide background necessary for students to take advanced computer science and statistics courses that are particularly relevant to data science, without all of the conventional prerequisites of those majors. Topics include languages for transforming, querying, and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
Minor or Track in Data Science
For those undergraduate students who are not interested in or do not have the time to complete a 4-year major in data science, a minor or track in data science offers a useful alternative for introductory skill development. This may be a good approach for students in other domain-specific programs who want to gain data science expertise while remaining firmly in their chosen domain. In such programs, students are exposed to data science concepts, tools, and techniques that are related to their chosen discipline. A number of academic institutions offer minors or tracks in data science; the following is only a small selection of these programs:
- The Georgia Institute of Technology, a public research university in Atlanta, offers a minor in computational data analysis through its College of Computing.20 Required courses include Introduction to Computing for Data Analysis, Data and Visual Analytics, an approved introductory probability and statistics course, an approved computational methods course, and an approved computational data analysis elective.
20 The website for the Georgia Tech minor in computational data analysis is https://www.cc.gatech.edu/content/minor-computational-data-analysis, accessed January 25, 2018.
- The University of Massachusetts Amherst, a public research university, offers a B.A. track in informatics in data science through its College of Information and Computer Sciences.21 Required courses include Mathematical Foundations of Informatics, an approved introductory statistics course, Problem Solving with Computers, Networked World, Human Computer Interaction, Web Programming, Databases, Informatics, Using Data Structures, Data Science, and approved electives in statistics, data science, social science, and/or business.
- The University of Washington offers a Data Science Option22 within several undergraduate programs, where each department may specialize its option within a common template. An option includes one course each in programming, machine learning, and societal implications of data science, as well as a course in two of three areas: data management, data visualization and communication, and advanced statistics and probability. Optionally, it may require Introduction to Data Science.
- Westminster College, a private liberal arts institution in Salt Lake City, Utah, offers a data science minor out of its Data Science Department.23 Required courses include Linear Algebra, Introduction to Statistics, either Introduction to Computer Science or Scientific Computing, Explorations in Data Science, two approved electives (both from mathematics, computer science, or statistics), and a capstone.
- Miami University of Ohio, a public research university in Hamilton, offers a business analytics minor hosted by its business department.24 Required courses include Business Statistics, Applied Regression and Analysis in Business, IT and the Intelligent Enterprise, Database Systems and Data Warehousing, Business Intelligence and Data Visualization, and two approved electives from data science, social science, and/or statistics.
- The Rose-Hulman Institute of Technology, a private college in Terre Haute, Indiana, offers a multidisciplinary minor in data science.25 Required courses include an approved introductory statistics course, Introduction to Software Development, Object-Oriented Software Development, and four electives from an approved list of statistics, computer science, and data science courses.
22 The website for the Data Science Option is https://www.cs.washington.edu/academics/ugrad/courses/data-science, accessed February 20, 2018.
23 The website for the Westminster College data science minor is https://catalog.westminstercollege.edu/undergraduate2016-2017/data-science-data/, accessed January 25, 2018.
24 The website for the Miami University of Ohio business analytics minor is https://miamioh.edu/fsb/academics/isa/academics/minors/business-analytics/index.html, accessed January 25, 2018.
25 The website for the Rose-Hulman multidisciplinary data science minor is https://www.rose-hulman.edu/academics/course-catalog/current/special-programs.html#multi-disc-minor-data-science, accessed January 25, 2018.
- Louisiana Tech University, a public research university in Ruston, offers a B.S. track in computer science, cloud computing, and big data through its College of Engineering and Science.26 Required courses include Calculus, Discrete Mathematics, Statistical Methods, Science of Computing, Systems Programming, Digital Design, Operating Systems, Theory of Computing, Data Structures, Advanced Data Structures and Algorithms, Programming Languages, Computer Architecture, Database Management Systems, Software Design and Engineering, Distributed and Cloud Computing, Computer Networks or Artificial Intelligence, Data Mining and Knowledge Discovery, and Advanced Data Mining, Fusion, and Application, plus approved electives in computer science and a capstone course.
Minors or tracks in data science give students a wider knowledge base, making them more marketable candidates for the future workforce: not only do these students have expertise in a particular domain area, but they also have valuable data science skills and experience applicable to a variety of emerging career paths. Although a minor or track in data science is an attractive option for certain populations of students, the limited time and curricular space available to such programs may restrict their ability to help students develop data science foundations and skills in sufficient depth.
26 The website for the Louisiana Tech track in computer science, cloud computing, and big data is http://catalog.latech.edu/preview_program.php?catoid=3&poid=913, accessed January 25, 2018.
Two-Year Degrees and Certificates
Many 2-year institutions are starting data science programs using a wide variety of educational approaches, ranging from a few courses leading to a certificate to a full associate’s degree requiring 2 years of study. The blend of statistics, computer science, and applications varies widely by program. These programs often serve several purposes simultaneously: (1) providing an entry point that inspires and attracts a wide variety of student populations to data science; (2) allowing existing members of the workforce to retrain or obtain specific new skill sets that complement their education and experience; (3) creating mechanisms by which students can certify specific or general skill sets with certificates or associate’s degrees; (4) building foundational, translational, ethical, and professional skills to support matriculation into 4-year data science programs; and (5) providing opportunities for advanced high school students to begin data science training early. Most of these purposes support undergraduate education objectives while also targeting the specific needs of industry. For example, a data management program may focus largely on data systems, collection, and curation; a data analytics program may focus more on algorithms, statistics, and machine learning; and a business analytics program may focus on business and supply chain issues. Currently, most programs offer courses on data visualization, but relatively few offer courses on writing about data (communication in data science). A few of these programs are online only, while most have some online component. More than half of the 2-year institutions studied by the committee have their own data science unit offering the associate’s degree or certificate, while programs hosted by other departments are typically in mathematics, business, or computer information systems. Some programs focus on training students in technical skills for industry and research through data science certificates (e.g., Montgomery College,27 Normandale Community College,28 Washtenaw Community College29). Others offer 2-year associate’s degree programs in data science that prepare students to transfer into 4-year programs. Two examples of the latter are the following:
- The Community College of Allegheny County, with multiple Pennsylvania campuses, offers an associate’s degree in data analytics technology30 that prepares students to transfer to a 4-year institution with a data analytics bachelor’s degree program or to pursue employment in a variety of data science roles. Upon completion of their coursework, students are expected to be able to “identify and define data challenges in a variety of industries, collect and organize information from many sources, discover patterns and relationships in large data sets applying statistical tools, resolve business questions and make recommendations through data mining techniques, and communicate tactical and strategic objectives utilizing data visualization techniques” (CCAC, 2018).
28 The website for the Normandale Community College certificate in data analysis is http://www.normandale.edu/continuing-education/data-analysis, accessed February 12, 2018.
29 The website for the Washtenaw Community College certificate in applied data science is http://www.wccnet.edu/academics/programs/view/program/CTADS/, accessed February 12, 2018.
30 The website for the Community College of Allegheny County data analytics technology program is https://www.ccac.edu/Data_Analytics_Technology.aspx, accessed February 1, 2018.
- Nashua Community College, located in New Hampshire, offers an associate’s degree in foundations of data analytics.31 This degree program is also designed for students who want to pursue a bachelor’s degree in data analytics at a 4-year institution. Upon completion of this degree, students are expected to be able to “demonstrate technical proficiency and effective problem-solving ability in completing mathematical processes; identify the benefits of quality, timeliness, and continuous improvement in regards to software development; apply critical-thinking skills to identify, analyze, and solve problems; apply mathematical concepts to other disciplines including business, economics, social sciences, and the natural sciences; demonstrate the ability to follow a systematic progression of software development and refinement when designing and developing software for a project; participate effectively as a member of a team; articulate an understanding of the need for lifelong learning; demonstrate an understanding of diversity through interaction with project teammates; and develop software programs that reflect the application of up-to-date tools and techniques of the discipline” (Nashua Community College, 2018).
Two-year institutions throughout the United States have also created certificate programs in data science. For instance, Maryland’s Montgomery College (2018) has recently built a 2-year certificate program around courses in mathematics and statistics; fundamentals of coding, writing, and communication in data science; and a capstone project that applies the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to create “original yet reproducible analyses in a variety of formats for the general public and for members of the data science community.” Program outcomes include the student’s ability to do the following:
- Assess different analysis and data management techniques and justify the selection of a particular model or technique for a given task,
- Execute analysis of large and disparate data sets and construct models necessary for these analyses,
- Summarize findings of complex analyses in a concise way for a target audience using both graphics and statistical measures, and
- Demonstrate competency with programming languages and environments for data analysis (Montgomery College, 2018).
31 The website for the Nashua Community College foundations of data analytics program is http://www.nashuacc.edu/academics/associate-degrees/stem-and-advancedmanufacturing/398-foundations-in-data-analytics, accessed February 1, 2018.
Although 2-year institutions confront many of the same obstacles in curricular and programmatic development as 4-year institutions, they also face many unique challenges. Administrators and faculty need to consider how best to develop content that will satisfy the needs of both students who may be interested in enrolling in only one course and those who plan to embark on a certificate program or an entire degree program. Furthermore, 2-year curricula are expected to engage students who seek an associate’s degree and specific workforce training as well as students who need to develop a broader skill set and knowledge base in order to transfer to a 4-year bachelor’s degree program. Two-year college faculty and administrators also have to be continuously aware of and responsive to the demands of the local workforce to be prepared to upskill both their transient students and established professionals as efficiently and as appropriately as possible.
Other certificates in data science are proliferating across academic institutions and industry, including at the undergraduate level (e.g., the University of Georgia Applied Data Science Certificate Program;32 the Temple University Certificate in Data Science: Computational Analytics;33 the University of Missouri, St. Louis, Certificate in Data Science;34 the Johnson County Community College Data Analytics Certificate;35 and the Great Bay Community College Certificate in Practical Data Science36); the graduate level (e.g., Georgetown University’s certificate in data science,37 the University of Michigan certificate in data science,38 and others39); and the executive level (e.g., Columbia University40 and Kellogg Executive Education at Northwestern University41). Online certificate options are also becoming common (e.g., edX MOOC42 and Coursera43) and offer an accessible alternative to in-person learning. (MOOCs are discussed further in the next section.)
32 The website for the University of Georgia Applied Data Science Certificate Program is https://csci.franklin.uga.edu/certificate-applied-data-science, accessed February 6, 2018.
33 The website for the Temple University Certificate in Data Science: Computational Analytics is http://bulletin.temple.edu/undergraduate/science-technology/computer-information-science/data-science-computational-analytics-certificate/, accessed February 6, 2018.
34 The website for the University of Missouri, St. Louis, Certificate in Data Science is https://www.umsl.edu/mathcs/undergraduate-studies/certificatedatascience.html, accessed February 6, 2018.
35 The website for the Johnson County Community College Data Analytics Certificate is http://catalog.jccc.edu/degreecertificates/computerinformationsystems/dataanalyticscert/, accessed February 6, 2018.
36 The website for the Great Bay Community College Certificate in Practical Data Science is http://greatbay.edu/courses/certificate-programs/data-practical-data-science, accessed February 12, 2018.
37 The website for the Georgetown University certificate in data science is https://scs.georgetown.edu/programs/375/certificate-in-data-science/?utm_source=Google&utm_medium=Search&utm_campaign=FY18_Search_Professional_Certificates&gclid=CjwKCAiAksvTBRBFEiwADSBZfI934rB_0ubuN0Zzmjr4xGtGmDLbei9zB5L8HN4MiLIzWEZN1BiKXBoC4ZIQAvD_BwE, accessed February 12, 2018.
While certificate programs can be a valuable resource in upskilling current members of the workforce and supplementing other existing educational modalities, the program offerings are varied and often difficult to compare. Some certificates are granted after the successful completion of a single course, while others require the completion of a multicourse series. The lack of standardization can make it challenging for employers and prospective students to assess what the certificates represent and how they may fit within their data science education and training goals.
Massive Open Online Courses
A growing number of data science courses and programs are offered online, many of them as MOOCs. These offerings come in many forms, including stand-alone courses and series of linked courses. Some MOOCs award certificates to demonstrate completion of a course or series.
39 For an online directory of graduate-level certificates in data science, see http://www.mastersindatascience.org/schools/certificates/, accessed February 12, 2018.
40 The website for the Columbia University Data Science Institute is https://industry.datascience.columbia.edu/executive-training, accessed February 12, 2018.
41 The website for the Kellogg Executive Education programs is http://www.kellogg.northwestern.edu/executive-education/individual-programs/executive-programs/bigdata.aspx, accessed February 12, 2018.
42 The website for the edX Data Science for Executives program is https://www.edx.org/professional-certificate/data-science-executives, accessed February 12, 2018.
43 The website for the Coursera Executive Data Science Specialization is https://www.coursera.org/specializations/executive-data-science, accessed February 12, 2018.
44 The website for the Coursera data science offerings is https://www.coursera.org/browse/data-science?languages=en, accessed February 20, 2018.
45 The website for the edX data science offerings is https://www.edx.org/course/subject/data-analysis-statistics/data-science-courses, accessed February 20, 2018.
Current edX offerings include the following:
- A Microsoft Professional Program in Data Science consists of nine courses, starting with Excel, expanding to R and Python, and culminating in a project.46
- A MicroMasters in data science by the University of California, San Diego, consists of four courses: Python for Data Science, Statistics and Probability for Data Science using Python, Machine Learning Fundamentals, and Big Data Analytics using Spark.47
- A MicroMasters in big data by the University of Adelaide consists of five courses focusing on programming, computational thinking, big data fundamentals and analytics, and a capstone project.48
- A version of the University of California, Berkeley, introductory course Data 8: Foundations of Data Science is offered as three 5-week courses forming a professional certificate: Computational Thinking with Python, Inferential Thinking by Resampling, and Prediction and Machine Learning.49
Current Coursera offerings include the following:
- A data science specialization, created by Johns Hopkins University with industry partnership from SwiftKey and Yelp, is made up of 10 courses designed to expose students to the full data science life cycle and concludes with a capstone project.50
- A specialization in probabilistic graphical models, created by Stanford University, requires a total of three 5-week courses in representation, inference, and learning.51
- A specialization in business statistics and analysis, created by Rice University with industry partner BaseMetrics, provides an overview of the tools and techniques used by data analysts in the business world via four courses and a capstone experience.52
- A specialization in data analysis and interpretation, created by Wesleyan University in partnership with DRIVENDATA and The Connection, teaches the fundamentals of data science through four project-based courses, using either SAS or Python, and a capstone experience.53
46 The website for the Microsoft Professional Program in Data Science is https://www.edx.org/microsoft-professional-program-data-science, accessed February 20, 2018.
49 The website for the Data 8: Foundations of Data Science MOOC is https://www.edx.org/professional-certificate/berkeleyx-foundations-of-data-science, accessed February 20, 2018.
50 The website for the Coursera data science specialization is https://www.coursera.org/specializations/jhu-data-science#about, accessed February 20, 2018.
51 The website for the Coursera specialization in probabilistic graphical models is https://www.coursera.org/specializations/probabilistic-graphical-models, accessed February 14, 2018.
52 The website for the Coursera specialization in business statistics and analysis is https://www.coursera.org/specializations/business-statistics-analysis, accessed February 14, 2018.
MOOCs can be valuable resources for fast on-ramping, supplementing curricula, and educating citizen scientists in data science, albeit with low completion rates (Chuang and Ho, 2016). Their content can also be used in traditional classroom settings. However, limitations exist. To succeed, MOOCs need to be organized, cohesive, thorough, and interactive. MOOC developers, like any curriculum developers, also confront the challenge of keeping pace with rapidly evolving data science topics.
Summer Programs and Boot Camps
Summer programs and boot camps provide students the opportunity for focused data science experiences, which can supplement their formal education and on-the-job training.
Currently, there are relatively few data science-focused summer programs targeted toward undergraduate students—far fewer than those aimed at graduate students—but this is likely to change in the future as more undergraduate students are exposed to data science topics. The majority of these programs are organized by academic institutions and are designed to fast-track undergraduate students into data science by balancing experiential learning, projects with real data, and team building with some course instruction. Current programs tend to be small (from 8 to 40 students), which allows many to fully fund students through fellowships and scholarships. The programs are of varying durations, ranging from 1 week to 12 weeks.
Summer programs are often driven by a theme and target specific types of students: research-bound students in engineering, mathematics, or computer science (e.g., the Iowa State University Midwest Big Data Summer School54), underrepresented students from a broad spectrum of majors (e.g., the Microsoft Research Data Science Summer School in New York City55), students interested in societal impact (e.g., the University of Chicago Data Science for Social Good56), or students interested in health applications (e.g., the University of Michigan Big Data Summer Institute57). Testimonials from students in the longer summer schools indicate that they benefit from the long-term immersive environment, where they have a chance to build community and develop communication and teamwork skills. These immersive programs can be life changing, altering students’ academic and career pathways and often motivating them to go to graduate school.
53 The website for the Coursera specialization in data analysis and interpretation is https://www.coursera.org/specializations/data-analysis, accessed February 14, 2018.
Summer programs are an effective mechanism for on-ramping students and can energize them to develop a passion for data science. However, they currently reach only the relatively small number of students who are prepared for an intensive, fast-paced, and sometimes stressful program. By their very nature, these programs work best when small groups of students are together long enough to benefit from the immersive summer school experience, actively collaborating on learning, solving problems, and forming bonds of friendship. Scaling summer programs to reach more students faces several challenges. Developing a summer school demands considerable effort from faculty, requiring careful orchestration of a challenging, quickly delivered curriculum accompanied by hands-on projects and other activities. Grant funding for developing the unique, multidisciplinary curriculum of data science is scarce, and few funding sources exist for scholarships for students who need financial aid. Furthermore, keeping the student-to-teacher ratio small requires additional resources for faculty and graduate teaching assistants that academic institutions may not have. There is evidence that programs can be more effective when they include a diverse mix of highly motivated students, including students from underrepresented minority groups or economically disadvantaged backgrounds (Fine and Handelsman, 2010). An additional challenge is maintaining a healthy learning environment as students struggle to understand concepts in these fast-paced programs. Established academic and industry partnerships on data science summer programs are rare but would be worthwhile.
Many data science boot camps offered today are for-profit ventures that aim for professional development, corporate training, and continuing education (i.e., training and preparation of the data science workforce). They aim to teach large numbers of students. To do this, they frequently blend online and on-site learning, with the principal instructors online and local instructors on-site to reinforce training. The programs last from 1 weekend to 12 weeks and are organized into modules tightly coupled to workforce skills gleaned directly from industry. Many cover advanced topics like deep learning and recommender systems. These boot camps often have tracks that distinguish among subspecialties of data science, such as data engineering, data analytics, or data science for business.
Boot camps are often run by instructors who are practicing data scientists in industry. They can teach students to use a wider range of tools than students may encounter in the classroom, and course content can evolve quickly as new tools emerge. Tools currently taught include Pandas, BeautifulSoup, Seaborn, D3.js, R, and Python; however, data management, visualization, and analysis tools are continuously evolving. Boot camp participants may also have the opportunity to explore the data science life cycle in greater depth through student-driven projects: students may pose their own questions and collect their own data rather than starting projects after data have already been collected. Through their deeper connections with industry, boot camp instructors can offer direct links to the broader data science community. Many boot camps also offer career advising, job placement, and networking opportunities.
Founded in 2013, Metis58 is a boot camp with locations in New York, California, Illinois, and Washington. Metis offers 12-week boot camps that aim to bridge the gap between industry and academia and serve as a complement to other learning mechanisms. Participants learn a combination of theoretical concepts and applications including how to ask a solvable question, scope projects, collaborate and communicate with multidisciplinary groups, and use emerging tools and technologies.
Another example of a data science boot camp is the one offered by General Assembly, with locations throughout the United States and abroad. Its 12-week immersive program in data science,59 described as a “career accelerator,” includes units on Git, UNIX, and relational databases; data analysis and Python; machine learning, modeling techniques, and big data; critical thinking and synthesis; and visualization, presentation, and reporting.
58 The website for Metis is https://www.thisismetis.com/data-science-bootcamps, accessed January 25, 2018.
59 The website for General Assembly’s immersive experiences is https://generalassemb.ly/education/data-science-immersive, accessed February 1, 2018.
DataCamp60 offers a distinctive boot camp experience in that all of its coursework is delivered in an online learning environment. Twenty different “tracks” are offered, depending on whether the participant wants to develop particular skills or train for a new career, or wants to develop expertise in a specific language such as R or Python. A number of “project” courses have been created to help participants refine and integrate earlier knowledge.
While these boot camp experiences can be positive for students, they also tend to be very expensive and therefore out of reach for many undergraduate students. Many boot camps are specifically designed for recent graduates or professionals seeking a career change and thus may not be appropriate for undergraduate students. The immersive experience that a boot camp offers may also not be the right model for certain people, depending on their schedules, their career or educational goals, their learning styles, and their previous training (Feldon et al., 2017).
However, the unique position of these boot camps at the interface between education and industry could offer lessons for undergraduate-focused data science boot camps and on-ramping activities. Their ability to adjust in real time to industry demands, their intent to deliver sustainable, fundamental data science skills, and their emphasis on project-based experiences would be applicable and beneficial in other settings, and boot camps will continue to complement online courses, advanced degrees, and hackathons.
An alternative to the mostly for-profit boot camps discussed above is the workshop model developed extensively by the nonprofit Carpentries61 and delivered to thousands of participants worldwide. This organization has motivated a large number of students and researchers to receive training, initially focused on effective, high-quality software development through Software Carpentry and more recently on fundamental data skills through Data Carpentry. Its workshops typically last a few days and are hosted by institutions and partners seeking a domain-specific introduction to basic data science for graduate students, faculty, and local researchers. The Carpentries keep costs low by relying on volunteer instructors who complete an extensive training program. Although much of this work has focused on more advanced students with explicit domain knowledge but little prior computational experience, the approach could be expanded to undergraduates.
Finding 3.1: Undergraduate education in data science can be experienced in many forms. These include the following:
- Integrated introductory courses that can satisfy a general education requirement;
- A major in data science, including advanced skills, as the primary field of study;
- A minor or track in data science, where intermediate skills are connected to the major field of study;
- Two-year degrees and certificates;
- Other certificates, often requiring fewer courses than a major but more than a minor;
- Massive open online courses, which can engage large numbers of students at a variety of levels; and
- Summer programs and boot camps, which can serve to supplement academic or on-the-job training.
Recommendation 3.1: Four-year and two-year institutions should establish a forum for dialogue across institutions on all aspects of data science education, training, and workforce development.
There is considerable potential to infuse data science education into middle school and high school curricula, particularly in laying the groundwork for many of the aspects of data acumen discussed in the previous chapter (Finzer, 2013). College-level courses have an opportunity to drive data science content down into middle and high school curricula. For example, the curriculum from Jevin West and Carl Bergstrom’s course at the University of Washington, Calling B.S.: Data Reasoning in a Digital World, is being adapted for and adopted in high school classrooms across the country (UW, 2017). High school teachers have found that the curriculum offers an innovative way to teach students how to analyze information responsibly. Such experiences at the high school level better prepare students both for postsecondary curricula and for the data-driven workforce that awaits them. Informal learning settings can also contribute: the New York Hall of Science62, for example, provides an opportunity for children, young adults, and their families to increase their understanding of data science through interactive museum exhibits, data fests, and mobile city science programs.
Such programs appeal to many students and community members as they offer an engaging alternative to traditional classroom learning.
However, infusing data science into the middle and high school curriculum is not a simple task, especially given the changes in mathematics education resulting from the Common Core State Standards.63 Although, as of early 2018, 42 states and the District of Columbia had aligned their academic standards with the Common Core (some with modifications), other states had not. As a result, states still vary in the progression and coherence of the mathematics learning opportunities presented to students. Teachers, particularly middle school educators, have reported feeling overburdened by the standards because the development of curricular materials has lagged and guidance on classroom implementation is still needed (Bay-Williams, Duffett, and Griffith, 2016). Although this could be viewed as complicating a simple infusion of data science into the middle school classroom, it could also be seen as an opportunity: because new materials still need to be generated, data science could be built into them rather than requiring existing materials to be overhauled.
Course sequencing issues that are prevalent at the middle school level also exist at the high school level, where they can be heightened by limitations in the organizational structure of high school mathematics curricula. Specifically, some courses become “gatekeepers” to more rigorous mathematics courses, and students who are “tracked” through a particular sequence of courses may not be presented with the opportunity to develop strong postsecondary mathematical skills (Gamoran, 2009; Lucas, 1999; Lucas and Berends, 2002; Oakes, 2005). These problems can be compounded by inequities in access (Cha, 2015; Dondero and Muller, 2012; Lleras, 2008), which could limit students’ access to any data science instruction in middle and high school. Therefore, careful consideration is needed to ensure that the infusion or placement of data science into the mathematics curriculum for middle and high school allows for equitable access and opportunities for a broad spectrum of students.
Interested educators at both the middle and high school levels would benefit from access to more resources for integrating data science concepts into their classroom teaching, especially while curriculum materials are still being developed.64 The data science community, perhaps through a future professional society (discussed in Chapter 5 of this report), has an opportunity to better engage middle and high school educators through national conferences and online information-sharing mechanisms. Special consideration of and outreach to schools whose students come predominantly from underrepresented backgrounds could broaden access to data science concepts.
63 The website for the Common Core State Standards Initiative is http://www.corestandards.org/standards-in-your-state/, accessed February 23, 2018.
64 Dozens of high schools in California are already offering data science classes for eleventh and twelfth graders that combine statistics and programming instruction through hands-on activities. For more information, see Jones (2018).
Adhikari, A., and J. DeNero. 2018. Computational and Inferential Thinking: The Foundations of Data Science. https://www.inferentialthinking.com/. Accessed April 17, 2018.
Amherst College. 2017. “Data Science.” https://www.amherst.edu/academiclife/departments/courses/1718F/STAT/STAT-231-1718F. Accessed January 25, 2018.
Bay-Williams, J., A. Duffett, and D. Griffith. 2016. “Common Core Math in the K-8 Classroom: Results from a National Teacher Survey.” https://eric.ed.gov/?id=ED570138. Accessed March 29, 2018.
CCAC (Community College of Allegheny County). 2018. “Data Analytics Technology (788): Associate of Science.” https://www.ccac.edu/Data_Analytics_Technology.aspx. Accessed March 29, 2018.
Cha, S.-H. 2015. Exploring disparities in taking high level math courses in public high schools. KEDI Journal of Educational Policy 12(1):3-17.
Chuang, I., and A. Ho. 2016. “HarvardX and MITx: Four Years of Open Online Courses—Fall 2012-Summer 2016.” http://dx.doi.org/10.2139/ssrn.2889436. Accessed April 1, 2018.
Dondero, M., and C. Muller. 2012. School stratification in new and established Latino destinations. Social Forces 91(2):477-502.
Feldon, D.F., S. Jeong, J. Peugh, J. Roksa, C. Maahs-Fladung, A. Shenoy, and M. Oliva. 2017. Null effects of boot camps and short-format training for PhD students in life sciences. Proceedings of the National Academy of Sciences 114(37):9854-9858.
Fine, E., and J. Handelsman. 2010. “Benefits and Challenges of Diversity in Academic Settings.” Brochure prepared for the Women in Science and Engineering Leadership Institute. http://wiseli.engr.wisc.edu/docs/Benefits_Challenges.pdf.
Finzer, W. 2013. The data science education dilemma. Technology Innovations in Statistics Education 7(2):1-9.
Gamoran, A. 2009. Tracking and inequality: New directions for research and practice. Pp. 213-228 in The Routledge International Handbook of the Sociology of Education, eds. M.W. Apple, S.J. Ball, and L.A. Gandin. New York: Routledge.
Jones, C. 2018. “Big data” classes a big hit in California high schools. EdSource, February 19. https://edsource.org/2018/big-data-classes-a-big-hit-in-california-high-schools/593838. Accessed March 22, 2018.
Lleras, C. 2008. Race, racial concentration, and the dynamics of educational inequality across urban and suburban schools. American Educational Research Journal 45(4):223-233.
Lucas, S.R. 1999. Tracking Inequality: Stratification and Mobility in American High Schools. New York: Teachers College Press.
Lucas, S.R., and M. Berends. 2002. Race and track location in U.S. public schools. Research in Social Stratification and Mobility 25:169-187.
Montgomery College. 2018. “Data Science Certificate: 256.” http://catalog.montgomerycollege.edu/preview_program.php?catoid=8&poid=1877&returnto=1322. Accessed January 25, 2018.
Nashua Community College. 2018. “Why Choose Foundations in Data Analytics?” http://www.nashuacc.edu/academics/associate-degrees/stem-and-advancedmanufacturing/398-foundations-in-data-analytics. Accessed March 29, 2018.
Oakes, J. 2005. Keeping Track: How Schools Structure Inequality. New Haven, Conn.: Yale University Press.
UC Berkeley (University of California, Berkeley). 2018. “Data 8: Foundations of Data Science.” http://data8.org. Accessed January 25, 2018.
UC San Diego (University of California, San Diego). 2017. “Data Science Undergraduate Program.” http://dsc.ucsd.edu. Accessed January 25, 2018.
UW (University of Washington). 2017. “Calling Bullshit” makes an impact at schools across the country. https://ischool.uw.edu/news/2017/10/calling-bullshit-makes-impact-schools-across-country. Accessed February 22, 2018.