Given the wide-ranging applications, potential impacts, and important implications for society, the committee began its reflections on the future of data science with aspects of ethical conduct as part of a broader set of skills and capacities.
Emerging data science technologies and methodologies (1) blur differences between “public” and “private” data, (2) offer more widespread access to data and related tools, (3) influence and affect society at large, and (4) create greater opportunities for deeper insights through the use and integration of multiple data sources. As a result, data ethics take on an ever more prominent role in both data science curricula and data science practice.
The Hippocratic Oath, which details the ideal conduct of physicians in terms of their treatment of patients and interactions with colleagues, has historically been affirmed by physicians to acknowledge their understanding of key ethical principles for their profession (Box 5.1). Similarly, the Canadian “Calling of an Engineer” ceremony for engineering graduates helps establish shared moral and social responsibilities (NSPE, 2009). The pervasive impact of data science suggests that a similar oath would be beneficial for data scientists, whose work has a direct impact on individuals throughout society and on the advancement of the body of scientific knowledge. Data science students learn to solve complex problems in the world and use data to make decisions, while understanding limitations of data sets and methods.
An oath of this sort may be helpful in formalizing the role of data ethics and in inspiring future data scientists to practice with honor, “do[ing] no harm” to the subjects involved in or affected by their work. Such an oath would also formalize the professional role of the data scientist, offering guidance on appropriate conduct to those entering the field and encouraging collaboration across diverse communities.
What might a Hippocratic Oath for data science include? To explore this question, the committee developed the text in Box 5.2 as a preliminary form of a possible pledge for future data scientists. The proposed Data Science Oath highlights aspects of data ethics and the value of incorporating societal impact as part of data science education.
At the midpoint of its study, the committee finds that it is important that data science education incorporate real data, broad impact applications, commonly deployed methods, and ethical considerations, as well as provide support for work in teams. Other critical content areas include data description and curation, mathematical foundations, computational thinking, statistical thinking, data modeling, computing, reproducibility, and data ethics. Students would also benefit from developing deep analytic and communication skills so as to better work with large, complex data sets and engage with diverse audiences about real-world problems that data science can help solve. All of these promote the development of data acumen. Highly trained and flexible faculty, innovative cross-disciplinary pedagogical approaches, and diverse participation would enhance learning experiences. Such programs’ successes can then be evaluated and assessed using the very tools of experimental design and analysis common in the field of data science.
The findings from the preceding chapters are restated below along with key questions on which the committee would like to gather public input.
Finding 2.1: A critical component of data science education is to guide students to develop data acumen. This requires exposure to key concepts in data science, real-world data and problems that can reinforce the limitations of tools, and ethical considerations that permeate many applications. Key concepts related to developing data acumen include the following:
- Mathematical foundations,
- Computational thinking,
- Statistical thinking,
- Data management,
- Data description and curation,
- Data modeling,
- Ethical problem solving,
- Communication and reproducibility, and
- Domain-specific considerations.
The necessary levels of exposure to each area will vary based on the overall objectives and duration of the data science program as well as the goals for the students.
- Which key components should be included in data science curricula, both now and in the future?
- How could these components be prioritized or best conveyed for differing types of data science programs?
- How can opportunities to enhance data acumen (i.e., the ability to make good judgments and decisions with data) be integrated into data science educational programs?
- How can data acumen be measured or evaluated?
Finding 2.2: It is important for data science education to incorporate real data, broad impact applications, and commonly deployed methods.
- How can partnerships between industry and educational programs be encouraged?
- Could a focus on real problems serve as a means of attracting more diverse students?
- How can students gain access to real-world data sets?
Finding 2.3: Incorporating ethics into an undergraduate data science program provides students with valuable skills that can be applied to complex, human-centered questions across disciplines.
- How can ethical considerations be best incorporated throughout the data science curriculum?
- How can students be taught to apply ethical decision making throughout the problem-solving process?
Finding 2.4: Strong oral and written communication skills and the ability to work well in multidisciplinary teams are critical to students’ success in data science.
- How can communication and teamwork be fostered in data science programs?
- What types of multidisciplinary teams serve as effective models for the real world? Will these groupings be different in the future?
Finding 3.1: Data science curricula are enhanced by bringing together faculty from different disciplines, utilizing diverse pedagogical approaches, and building upon existing educational programs.
- What are known good practices for fostering collaboration between departments and existing programs?
- What new directions and opportunities exist for new curricular initiatives?
- What pedagogical approaches are particularly relevant to data science, both now and in the future?
Finding 3.2: Structured faculty training, meaningful incentives, and available time and funding to support curriculum development are all crucial to preparing faculty for data science education.
- What types of training would be beneficial to faculty?
- How could incentives be restructured to encourage more faculty development in data science?
Finding 3.3: Data science programs often adapt to the existing infrastructure and organizational structure of an academic institution, but infrastructure innovations by the institution (e.g., in data provision, data and code access, and data documentation) can help data science programs be more collaborative and multidisciplinary.
- What are current infrastructure obstacles and how can they be rethought going forward?
- How could organizational structures be modified and/or incentives added to encourage data science collaboration and innovation?
Finding 3.4: To keep up with the quickly evolving field of data science and recruit students with more diverse backgrounds, educational approaches in data science need to be flexible in terms of what concepts, skills, tools, and methods are taught; how students are recruited; and how departments and programs collaborate to provide a full data science experience to students.
- How can data science programs build in flexibility and adaptability so they can be most responsive to changes in the field?
- How can flexibility help attract and retain more diverse students?
Finding 4.1: Data science has the potential to draw in a diverse set of students and build in broad participation from the outset, rather than trying to broaden participation later. However, strategies are needed to recruit and retain these students.
- How can broad participation, diversity, and inclusion be ingrained in data science programs?
- What strategies to recruit and retain diverse students can data science programs deploy, and what examples can inform these efforts?
Finding 4.2: Partnerships between 2- and 4-year institutions provide a valuable opportunity to develop innovative curricula, reach more diverse student populations, and expand the reach of data science education.
- How can partnerships between 2- and 4-year institutions be facilitated?
- How do the skills and concepts taught at a 2-year institution vary based on students’ goals?
- What aspects of data science education are appropriate and feasible to develop at 2-year institutions?
Finding 4.3: Data science programs would benefit from ongoing curricular evaluation, especially with respect to how well curricular objectives are being met and the degree of curricular integration. Taking a cue from the field's own methods, programs could use the resulting evaluation data to inform data science instruction and curriculum.
- What evaluation and assessment objectives are currently being used in data science programs, and how will these differ in the future?
- What best practices in evaluation and assessment can inform data science programs?
- What data are available to evaluate the effectiveness of different data science approaches?
- What standard evaluation approaches should be adopted?
The committee seeks input from the growing data science community and the public on the following topics:
- Additional content for its study, including but not limited to case studies from institutions providing data science education, innovative ways to bring researchers together, best practices for program evaluation, and ideas for future topical webinars;
- The proposed Data Science Oath outlined at the beginning of this chapter; and
- The questions posed in the previous section.
Please visit the following webpage to provide input: http://www.nas.edu/EnvisioningDS.