Suggested Citation:"6 Workshop Lessons." National Research Council. 2015. Training Students to Extract Value from Big Data: Summary of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/18981.

6

Workshop Lessons

Robert Kass (Carnegie Mellon University) led a final panel discussion at the end of the workshop. Panelists included James Frew (University of California, Santa Barbara), Deepak Agarwal (LinkedIn Corporation), Claudia Perlich (Dstillery), Raghu Ramakrishnan (Microsoft Corporation), and John Lafferty (University of Chicago). Panelists and participants were invited to add their comments to the workshop; final comments tended to fall into four categories: types of students, organizational structures, course content, and lessons learned from other disciplines.

WHOM TO TEACH: TYPES OF STUDENTS TO TARGET IN TEACHING BIG DATA

Robert Kass opened the discussion session by noting that the workshop had shown that there are many types of potential students and that each type would have different training challenges. One participant suggested that business managers need to understand the potential and realities of big data better to improve the quality of communication. Another pointed out that older students may be attracted to big data instruction to pick up missing skill sets. And another suggested pushing instruction into the high-school level. Several participants posited that the background of the student, more than the age or level, is the critical element. For instance, does the student have a background in computer science or statistics? Workshop participants frequently mentioned three main subjects related to big data: computation, statistics, and visualization. The student’s background knowledge in each of the three will have the greatest effect on the student’s learning.

HOW TO TEACH: THE STRUCTURE OF TEACHING BIG DATA

Numerous participants discussed the types of educational offerings, including massive open online courses (MOOCs), certificate programs, degree-granting programs, boot camps, and individual courses. Participants noted that certificate programs would typically involve a relatively small investment of a student’s time, unlike a degree-granting program. One participant proposed a structure consisting of an introductory data science course and three or four additional courses in the three domains (computation, statistics, and visualization). Someone noted that the University of California, Santa Barbara, has similar “emphasis” programs in information technology and technology management. These are sought after because students wish to demonstrate their breadth of understanding. In the case of data science, however, students may wish to use data science to further their domain science. As a result, the certificate model in data science may not be in high demand, inasmuch as students may see value in learning the skills of data science but not in receiving the official recognition of a certificate.

A participant reiterated Joshua Bloom’s suggestion made during his presentation to separate data literacy from data fluency. Data fluency would require several years of dedicated study in computing, statistics, visualization, and machine learning. A student may find that difficult to accomplish while obtaining a domain-science degree. Data literacy, in contrast, may be beneficial to many science students and less difficult to obtain. A participant proposed an undergraduate-level introductory data science course focused on basic education and appreciation to promote data literacy.

Workshop participants discussed the importance of coordinating the teaching of data science across multiple disciplines in a university. For example, a participant pointed out that Carnegie Mellon University has multiple master’s degree offerings (as many as nine) around the university that are related to data science. Each relevant discipline, such as computer science and statistics, offers a master’s degree. The administrative structure is probably stovepiped, and it may be difficult to develop multidisciplinary projects. Another participant argued that an inherently interdisciplinary field of study is not well suited to a degree crafted within a single department and proposed initiating task forces across departments to develop a degree program jointly. And another proposed examining the Carnegie Mellon University data science master’s degrees for common topics taught; those topics probably are the proper subset of what constitutes data science.

A workshop participant noted that most institutions do not have nine competing master’s programs; instead, most are struggling to develop one. Without collective agreement in the community about the content of a data science program of study, he cautioned that there may be competing programs in each school instead of a single comprehensive program. The participant stressed the need to understand the core requirements of data science and how big data fits into data science.

Someone noted the importance of having building blocks—such as MOOCs, individual courses, and course sequences—to offer students who wish to focus on data science. Another participant pointed out that MOOCs and boot camps are opposites: MOOCs are large and virtual, whereas boot camps are intimate and hands-on. Both have value as nontraditional credentials.

Guy Lebanon stated that industry finds the end result of data science programs to be inconsistent because they are based in different departments that have different emphases. As a result, industry is uncertain about what a graduate might know. It may be useful to develop a consistent set of standards that can be used in many institutions.

Ramakrishnan stated that “off-the-shelf” courses in existing programs cannot be stitched together to make a data science curriculum. He suggested creating a wide array of possible prerequisites; otherwise, students will not be able to complete the course sequences that they need.

WHAT TO TEACH: CONTENT IN TEACHING BIG DATA

The discussion began with a participant noting that it would be impossible to lay out specific topics for agreement. Instead, he proposed focusing on the desired outcomes of training students. Another participant agreed that the fields of study are well known (and typically include databases, statistics and machine learning, and visualization), but said that the specific key components of each field that are needed to form a curriculum are unknown.

Several participants noted the importance of team projects for teaching, especially the creation of teams of students who have different backgrounds (such as a domain scientist and a computer scientist). Team projects foster creativity and encourage new thinking about data problems. Several participants stressed the importance of using real-world data, complete with errors, missing data, and outliers. To some extent, data science is a craft more than a science, so training benefits from the incorporation of real-world projects.

A participant stated that an American Statistical Association committee had been formed to propose a model for a statistical data science program; it would probably include optimization and algorithms, distributed systems, and programming. However, other participants pointed out that the initiative did not include computer science experts in its curriculum development and that their absence would alter the emphases.

One participant proposed including data security and data ethics in a data science curriculum.

Several participants discussed how teaching data science might differ from teaching big data. One noted that data science does not change its principles when data move into the big data regime, although the approach to each individual step may differ slightly. Temple Lang said that with large data sets, it is easy to get mired in detail, and it becomes even more important to reason through how to solve a problem.

Ramakrishnan recommended including algorithms and analysis in computer science. He noted that although grounding instruction in a specific tool (such as R, SAS, or SQL) teaches practical skills, teaching a tool can compete with teaching of the underlying principles. He endorsed the idea of adding a project element to data science study.

PARALLELS IN OTHER DISCIPLINES

Two examples in other domains that were discussed by participants could provide lessons learned to the data science community.

  • Computational science. A participant noted that computational science was an emerging field 25 years ago. Interdisciplinary academic programs seemed to serve the community best, although that model did not fit every university. The participant discussed specifically how the University of Maryland structured its computational-science instruction, which consisted of core coursework and degrees managed through the domain departments. The core courses were co-listed in numerous departments. That model does not require new hiring of faculty or any major restructuring.
  • Environmental science. Participants discussed an educational model used in environmental science. An interdisciplinary master’s-level program was developed so that students could obtain a master’s degree in a related science (such as geography, chemistry, or biology). The program involved core courses, research projects, team teaching, and creative use of the academic calendar to provide students with many avenues to an environmental-science degree.

As the availability of high-throughput data-collection technologies, such as information-sensing mobile devices, remote sensing, internet log records, and wireless sensor networks, has grown, science, engineering, and business have rapidly transitioned from striving to develop information from scant data to a situation in which the amount of information exceeds a human's ability to examine, let alone absorb, it. Data sets are increasingly complex, which potentially worsens problems such as missing information and other quality concerns, data heterogeneity, and differing data formats.

The nation's ability to make use of data depends heavily on the availability of a workforce that is properly trained and ready to tackle high-need areas. Training students to be capable of exploiting big data requires experience with statistical analysis, machine learning, and computational infrastructure that permits the real problems associated with massive data to be revealed and, ultimately, addressed. Analysis of big data requires cross-disciplinary skills, including the ability to make modeling decisions while balancing trade-offs between optimization and approximation, all while being attentive to useful metrics and system robustness. To develop those skills in students, it is important to identify whom to teach, that is, the educational background, experience, and characteristics of a prospective data-science student; what to teach, that is, the technical and practical content that should be taught to the student; and how to teach, that is, the structure and organization of a data-science program.

Training Students to Extract Value from Big Data summarizes a workshop convened in April 2014 by the National Research Council's Committee on Applied and Theoretical Statistics to explore how best to train students to use big data. The workshop explored the need for training and curricula and coursework that should be included. One impetus for the workshop was the current fragmented view of what is meant by analysis of big data, data analytics, or data science. New graduate programs are introduced regularly, and they have their own notions of what is meant by those terms and, most important, of what students need to know to be proficient in data-intensive work. This report provides a variety of perspectives about those elements and about their integration into courses and curricula.
