1 Introduction

In the past decade, the world has been transformed by the rapidly evolving field of data science. This new science, which is already revolutionizing business, science, and society, builds on an array of technological developments, including the widespread use of smartphones and rapid technological progress in computing and communications. Massive investments have gone into building out wireless infrastructure and data centers (the cloud) and into leveraging such facilities. New methods have been developed to connect and understand the data being generated.[1]

In this new landscape, all individuals constantly generate data about their whereabouts, habits, and preferences. All parts of commerce (browsing, ordering, shipping, inventory, manufacturing, advertising) have gained a digital footprint. Social network sites illuminated relationships among billions of individuals, and tweets and posts made global-scale communication patterns instantly visible. Governmental bodies digitized and gave public access to vast corpora of data and documents. Most of recorded history and literature became digitized and accessible for algorithmic analysis. Electronic health records allowed medical analyses across populations and time, while genomic sequencing brought individualized treatment to the cellular level. Design and synthesis of pharmaceuticals, materials, and chemicals became computational. The volume of data being collected automatically, and the processing of such data, soared. New data-driven services arose (e.g., navigation apps, ride-hailing apps, and voice-driven assistants), exploited this new data-driven environment, and convinced the public of the power and elegance of the data-driven paradigm. Several of the highest market capitalization companies were heavily involved in digital transformation, displacing oil and car companies that had been market leaders for decades. These emblematic advances signal more extensive and widespread transformations to come.
The smartphone, mobility, genomics, and cloud "revolutions" are in fact only at their inception as technologists find ways to leverage them ever further. The increased use of Internet-connected home thermostats and fitness wristbands marked the beginning of the Internet-of-Things era, in which people are surrounded by an environment that is instrumented, communicative, and responsive. Meanwhile, rapid advances in machine learning are enabling new applications. This year's entering undergraduates, who may be in the workforce until roughly 2075, will face an employment landscape transformed by these developments. The data-driven era will spawn many new occupational niches based on the massive opportunities presented by new kinds and volumes of data even as it supplants traditional occupational categories.

Today, the term "data scientist" typically describes a knowledge worker who uses the complex and massive data resources characteristic of this new era. However, data science is a broader concept involving principles for data collection, storage, integration, analysis, inference, communication, and ethics appropriate for this new data-driven era. Several industries and academic disciplines have perceived that a new field of data science is emerging out of several established fields, including information technology, computer science, statistics, mathematics, operations management, and business analytics. However, core data science concepts involving the aforementioned principles are not being conveyed by mainstream training in any one field because data science is not reducible to any of the preexisting fields. Data scientists of the future will need to be educated in the full scope of data science principles.

There are many reports that industry finds itself constrained by today's relatively small supply of well-trained data science talent, and data scientist hiring demand has begun to increase rapidly; some projections forecast that approximately 2.7 million new data science positions will be available by 2020 (Columbus, 2017). Not only is the lack of data science talent an issue, but so too is students' lack of understanding about what a data scientist is and what types of tasks such an individual might perform.

[1] Science and engineering provide many notable examples of digital transformations in the previous decade foreshadowing the large public transformation now taking place. Examples from the 1995-2005 time frame include virtual observatories (see, for example, Szalay and Gray, 2001; NSF, 2018) and advanced computational methods (see, for example, Berman, Fox, and Hey, 2003).

PREPUBLICATION COPY: SUBJECT TO FURTHER EDITORIAL CORRECTION
It is imperative that educators, administrators, and students begin today to consider how to best prepare for and keep pace with this data-driven era of tomorrow. Undergraduate teaching, in particular, provides a critical link in offering more data science exposure to students and expanding the supply of data science talent. Many distinct data science roles will exist in the future workplace; both specialists and broad users with different levels of knowledge and different skill sets will be in high demand.

Understandings of and applications for data science vary among professionals, within academic institutions, and throughout the broader world. One common observation is that data science is now essential in many academic fields (Hey et al., 2009) and can be both pervasive in and yet distinct from other disciplines. For example, data science techniques and tools may be applied commonly across a variety of disciplines, including those in the sciences and in the humanities. However, what gives data science its unique identity is that it draws on individual skills and concepts from a wide spectrum of disciplines that may not always overlap with one another, making it a truly multidisciplinary field.

As discussions continue regarding the distinctions among data science, computer science, statistics, and other fields, many U.S. academic institutions are considering how to best deliver data science education and thus better prepare graduates for the data-driven era that lies ahead of them. The need for data science instruction is much broader than just the major and extends to a wide range of students from varied programs.
Depending on the students' levels of interest and career goals, as well as institutional goals and resources, one can envision a variety of models for data science instruction, including discipline-centered data science courses offered by specific academic departments focusing narrowly on the skills needed by that department's majors, large introductory data science courses serving the campus-wide student body, highly structured course sequences within a formal data science major, online courses, boot camps, and other innovative approaches. To achieve this vision, data science education and practice demand a level of collaboration not necessarily seen in other fields, new approaches to evaluating educational outcomes, and a constant eye toward refining and evolving the undergraduate experience as this field continues to advance. Stand-alone data science departments may emerge naturally on some campuses when the level of collaboration surpasses the bandwidth of currently
established departments or when the student demand increases greatly. However, developing stand-alone departments is not the only means of effective delivery of data science education, nor may it be appropriate in all settings; equipping students with data science skills can be done through a variety of pathways, as will be discussed in this report.

A LOOK TO THE FUTURE

Imagine it is now 2040. Students born in 2018 are graduating from college. It is more than 30 years since billions of autonomous sensors and devices started continuously delivering data to cloud-based databases, which record the states and activities of vehicles, buildings, customers, patients, and citizens. Many other data-driven changes that were difficult to foresee have become pervasive and important. Thus, it is not farfetched to expect academic institutions to envision the data-driven world of 2040 as they shape the future undergraduate experience.

In the ideal case for the future evolution of data science, all private industries and public agencies would use data confidently and efficiently to operate fairly without gender or racial bias. Data science jobs would be plentiful. While some of these data science jobs would require vocational education, other data science subspecialties would require certificates, associate's degrees, and bachelor's degrees. Efforts would have been undertaken to distribute the workforce equitably over rural, urban, and suburban regions; socioeconomic strata; and ethnic identities. The importance of data skills would be appreciated in all high schools, and the vast majority of high school graduates would have basic understanding of data science. Data science methods would be used by data science programs to continuously evolve to meet the needs of their students. Data scientists' work would be varied, and different skill mixes would be needed for different data science positions.
Some of these individuals would have been trained in particular fields but have learned data science along the way. Others would have explicit degrees in data science. For those who need a degree in data science for their work, there will likely be many options. They might earn those degrees remotely, on site, or in combination. They might learn through a combination of interactive web applications and augmented reality simulations, interactions with fellow learners and multidisciplinary faculty, and immersive industry apprenticeships. Students in 2- and 4-year institutions would be exposed to important concepts through a range of motivating applications. Humanities, social sciences, and professional education (e.g., music, art, and architecture) would be taught for enrichment, for building cross-disciplinary communication skills, and as contexts in which to provide examples of different types of data. Ethical data concepts like privacy, justice, fairness, and reproducibility would be taught continuously in safe spaces where students learn from their mistakes without penalty and without harm to others. Faculty would use data science to continuously monitor their students' progress and to adapt their curriculum to ensure student competency, confidence, and well-being with respect to the needs of industry, government, and society.

The committee's vision for the world of 2040 has many debatable elements: whether the transformations just described will actually go nearly as far as depicted or whether this mostly utopian vision will develop dystopian elements. This much is not debatable: the undergraduate instructional framework will need to transform if it is to support the transition from the world of 2018 to the likely world of 2040.
This report outlines some considerations and approaches for academic institutions and others in the broader data science communities to help guide this transformation, but it is not intended to be a final word on undergraduate data science education. This vision needs to be continually evolved and refined as the field matures.

REPORT OVERVIEW

In Chapter 2, the committee considers what data science professionals will need to know. Because expectations and tasks for data scientists will vary across industries and over time, it is important to consider the skill sets, learning outcomes, and ethical considerations best suited for individual undergraduate students to be successful in their future careers. In Chapter 3, the committee lays the groundwork for exploring how these data science students can be educated and thus well prepared. Using data from existing data science education programs, the committee discusses the successes and challenges associated with implementing and delivering 2- and 4-year undergraduate programs and classes, alternative courses, and interdisciplinary approaches in an effort to guide individual institutions to follow the pathways that simultaneously align with their missions and meet the varied needs of the field of data science. In Chapter 4, the committee describes a number of challenges that arise in creating a new data science program. Acknowledging that the field of data science and the content of data science education will continue to change rapidly, the committee considers how to evolve from current to future data science education and practice in Chapter 5. The committee evaluates strategies to refine educational and administrative infrastructure, create professional development opportunities, and utilize professional societies. In Chapter 6, the committee offers a summary of the findings and recommendations that appear throughout Chapters 2 to 5.

REFERENCES

Berman, F., G. Fox, and A.J.G. Hey, eds. 2003. Grid Computing: Making the Global Infrastructure a Reality. West Sussex, UK: Wiley.
Columbus, L. 2017. IBM predicts demand for data scientists will soar 28% by 2020. Forbes, May 13.
Hey, T., S. Tansley, and K. Tolle, eds. 2009. The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Wash.: Microsoft Research.
NSF (National Science Foundation). 2018. "History: The NEON Project." http://www.neonscience.org/observatory/history. Accessed February 6, 2018.
Szalay, A., and J. Gray. 2001. The world-wide telescope. Science 293(5537):2037-2040.