Throughout this report, the organization of the work of digital curation has been discussed as a continuum. At one end are a great variety of professions (research scientists, business analysts, even video editors and sound masters) for whom digital curation may be only an occasional task but is nonetheless an essential part of their jobs. At the opposite end are specialists whose work consists primarily, if not exclusively, of the active management and enhancement of digital information assets for current and future use. The same continuum frames this chapter’s consideration of the education and training of a digital curation workforce. Because the activities of digital curation are conducted along a continuum by a broad spectrum of workers, preparing that workforce will require educational opportunities that span the entire continuum: graduate-level education in digital curation for some, discrete study programs and certificates for others, and perhaps supplementary courses inserted into established curricula in other fields or exposure through online courses or conferences.
As noted in Chapter 1, the two ends of the continuum may be very distant from each other, but they are also connected. This too has implications for preparing the workforce. Digital curation specialists will need some knowledge of the disciplines and domains in which the digital information they curate will be used. Without some familiarity with the problems to be addressed, the goals to be pursued, and the customary methods, nomenclature, and practices of the fields in which the digital information assets are used, curators will not be able to make good decisions as they manage and enhance those assets for current and future use. Similarly, those who conduct curatorial activities as only a small part of their work will need some study and command of the knowledge and skills of digital curation, regardless of how well they are educated in their own domains.
This chapter is the committee’s consideration of education across the entire continuum. It presents a vision of a proper educational program for a digital curation specialist. It also reflects on how a more limited program of study might be inserted or integrated into the preparation of other professions, whose practitioners will also have some responsibilities for digital curation. It then surveys the educational opportunities currently available along the entire continuum, for students training to be digital curation specialists, students acquiring some knowledge of curation while pursuing degrees in their other disciplines, and midcareer employees seeking to upgrade their skills. The chapter concludes by identifying next steps to be taken.
The proper preparation of professional digital curators is receiving increased attention. One purpose of the National Research Council’s “Symposium on Digital Curation in the Era of Big Data” (July 2012) was to explore views on educational requirements from a representative group of practicing experts and stakeholders from government, a range of scientific disciplines, the entertainment and computer industries, and research libraries. As part of that symposium, Elizabeth Liddy, Dean of the School of Information at Syracuse University, an iSchool with a strong track record in education for information professionals in e-science and data science, proposed core competencies for digital curation (Liddy, 2012).
The Library of Congress has also addressed the education of digital curators through its Digital Preservation Outreach and Education program, which aims to establish a national trainer network that will provide instruction for organizations seeking to preserve their digital content. One part of the program has been an effort to determine a baseline curriculum for digital preservation based on six concepts: identify, select, store, protect, manage, and provide.1
Perhaps the most in-depth effort to develop a graduate-level curriculum to prepare students to work in the field of digital curation was undertaken by the School of Information and Library Science at the University of North Carolina at Chapel Hill with funding from the Institute of Museum and Library Services. That effort resulted in the Digital Curation Curriculum (DigCCurr),2 a comprehensive, structured curriculum organized along six dimensions:3
- Mandates, Values, and Principles;
- Functions and Skills;
- Professional, Disciplinary, Institutional, Organizational, or Cultural Context;
- Type of Resource;
- Prerequisite Knowledge; and
- Transition Point in Information Continuum.
The committee reviewed the output of the DigCCurr project, considered the workshops held during the course of this study, and also consulted other independent research. Informed by that material, the committee proposes the following 11 distinct knowledge and skill areas as essential to the education of professionals in the field of digital curation. The descriptions are not meant to be comprehensive, but rather, suggestive of the relevant range of expertise:
i. General background and abilities. An educational background that includes mathematics and science, enhanced by a disciplinary or domain specialization, provides a sound foundation for careers in digital curation. Given the multifaceted nature of digital curation, though, being able to cope with issues such as heterogeneity, complexity, and volume of data and information can be highly useful. Skills typically considered “soft” are likely to be important as well. These include the ability to communicate effectively, to work both independently and collaboratively, to question assumptions and innovate creative solutions, and to negotiate solutions involving competing priorities.
ii. Data practices. Rather than being appended to and distinct from the activities of an ongoing enterprise, digital curation needs to fit as seamlessly as possible within the organizational and institutional context of data production and use. Those providing digital curation services can benefit from understanding this context, including:
- Disciplinary, professional, and institutional practices;
- Research methods, instruments, tools, and protocols;
- Standards of evidence, quality, and uncertainty;
- Data types and formats for both quantitative and qualitative data;
- Data processing, transformation, and documentation processes; and
- Relevant standards for data, data models, metadata schemas, ontologies, and technologies.
iii. Data collection and management. This category involves proficiency in a wide array of activities, starting with gathering and analyzing requirements, identifying and selecting data of interest, and developing effective processes for data acquisition or harvesting. Preparation of data for broader use would follow, including such activities as cleaning, normalizing, reformatting, and, perhaps, anonymizing information or taking related steps to preserve confidentiality. Additional complex processes are involved in preparing data for ingestion or deposit into a repository, including the generation of metadata aligned with relevant schemas and ontologies and the creation of unique identifiers for managing not only citations but also versions, components, subsets, and other derivative products. Tracking provenance and documenting the appropriate contextual information to support long-term preservation (e.g., OAIS4) are also essential to curation aimed at reuse of data for new purposes. Support for data annotation and publication may also be required, as well as facilities for building and sustaining linkages to related literature and data and integration of functionality.5 Processes for deselecting, removing, and destroying data that no longer satisfy retention criteria are likely to be a part of this process as well.
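As an illustration only, the preparation steps named in item iii (cleaning and normalizing, generating metadata, assigning a unique identifier, and recording provenance) can be sketched in a few lines of Python. Everything specific here is an assumption made for the example and is not drawn from the report or from any particular repository: the field names, the `normalize_record` and `describe_dataset` helpers, the file name, and the shape of the metadata record are all hypothetical.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def normalize_record(raw: dict) -> dict:
    """Lowercase and underscore field names, trim values, drop empty values."""
    cleaned = {}
    for key, value in raw.items():
        name = key.strip().lower().replace(" ", "_")
        text = str(value).strip()
        if text:
            cleaned[name] = text
    return cleaned


def describe_dataset(records: list[dict], source: str) -> dict:
    """Build a small, hypothetical metadata record for a cleaned dataset."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return {
        "identifier": f"urn:uuid:{uuid.uuid4()}",  # random unique identifier
        "version": 1,
        "sha256": hashlib.sha256(payload).hexdigest(),  # ties the record to this exact content
        "record_count": len(records),
        "provenance": {
            "source": source,
            "steps": ["normalize_record"],
            "prepared_at": datetime.now(timezone.utc).isoformat(),
        },
    }


# Hypothetical raw rows, as they might arrive from an instrument export.
raw_rows = [
    {" Site ": " Lake A ", "Temp C": "12.5", "Notes": "  "},
    {" Site ": "Lake B", "Temp C": " 13.1 ", "Notes": "recalibrated"},
]
clean_rows = [normalize_record(row) for row in raw_rows]
metadata = describe_dataset(clean_rows, source="field-sensor-export.csv")
print(json.dumps(metadata, indent=2))
```

In practice the metadata would follow a community schema and the identifier would come from a persistent identifier service rather than a random UUID; the sketch only shows how the steps relate to one another.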
iv. Data analytics. Data analytics, or the ability to explore, extract, and validate new relationships or features from a body of quantitative or qualitative data, draws on repository resources and services (e.g., looking for undiscovered relationships among Linked Open Data). Digital curators can benefit from understanding how researchers conduct this work in order to facilitate it. Significant aspects of data analytics include:
- Research design,
- Sampling techniques,
- Hypothesis development and testing,
- Data mining,
- Information extraction,
- Algorithmic thinking and programming, and
- Performance evaluation and risk analysis.
v. Presentation and visualization. While information presentation and visualization are typically viewed as the product, or output, of data analytics, they also have a role in the larger life cycle of digital repository services, to the extent that they can provide accessible insight into the nature of the curated resources. To that end, those providing digital curation services can benefit from an understanding of techniques used in presentation and visualization, including information design and contextualization, as well as the evaluation of products, algorithms, and specific programs.
4Open Archival Information System, ISO 14721:2012, see http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=57284.
vi. Archiving and preservation. The knowledge a digital curator ought to acquire regarding archiving and preservation overlaps in principle with that of a more traditional archivist, but includes a substantial number of areas of expertise that are not typically included in archival education. Technical areas include information technology skills such as managing a local server, a computer cluster, a cloud computing resource, or any combination of these. They also include a broad understanding of how archiving and preservation requirements are addressed for digital resources, which do not enjoy the permanence of physical artifacts and require, instead, sustained attention. Some of these skills include:
- Ensuring the integrity and security of digital resources;
- Authenticating those who seek access to the digital resources;
- Deploying appropriate approaches for long-term preservation, including media refreshment, emulation, migration, conversion, and canonicalization;
- Employing appropriate preservation models, such as OAIS, LOCKSS,6 or PLANETS;7
- Assessing the trustworthiness of digital repositories using resources such as TRAC;8 and
- Understanding the forensic role and responsibilities of digital repositories.
Furthermore, traditional archivists, who typically are trained to work with historical sources for use by scholars in the humanities and social sciences, genealogists, investigative journalists, and the like, will need deep immersion in the types of resources, methods, and data practices present in a much wider range of disciplines.
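To make the first skill in the list above concrete, ensuring the integrity of digital resources is commonly supported by fixity checking: recording a checksum for each file and later confirming that it has not changed. The following is a minimal sketch under stated assumptions, not a description of any particular repository system. The helper names (`sha256_of`, `build_manifest`, `verify`), the file names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file directly inside the folder."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def verify(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose checksum changed."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems


# Demonstration in a temporary folder with two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    (store / "survey.csv").write_text("id,value\n1,42\n", encoding="utf-8")
    (store / "readme.txt").write_text("Collected 2012.\n", encoding="utf-8")
    manifest = build_manifest(store)
    intact = verify(store, manifest)  # nothing has changed yet
    (store / "survey.csv").write_text("id,value\n1,43\n", encoding="utf-8")
    damaged = verify(store, manifest)  # the altered file is reported

print(intact, damaged)
```

A check of this kind detects change but does not repair it; preservation approaches that keep multiple copies pair such checks with a means of restoring a good copy.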
vii. Technologies, tools, and infrastructure. Digital curation involves building bridges from the producers of digital resources, who are found in many organizational contexts (e.g., in universities, government, industry, and the public) and work in extraordinarily varied areas (e.g., research, finance, manufacturing, medicine, entertainment, and so on), to an equally varied population of current and future users. Those providing digital curation services confront a daunting array of technology-enabled choices, and need to understand and anticipate the implications of their decisions. A representative set of areas in which such knowledge is needed includes:
- Data acquisition (e.g., from instrumentation, sensors, lab notebooks, and geographically enabled devices);
- Data modeling;
- Database design, construction, and management;
- Software development environments;
- Network architecture;
- Repository infrastructure;
- Web services;
- Access systems;
- Preservation systems;
- Markup languages;
- System administration;
- Usability testing;
- Technology assessment; and
8Trusted Repository Audit Checklist; see http://www.crl.edu/archiving-preservation/digital-archives/metricsassessing-and-certifying-0.
ix. Values and principles. Digital curation services ought to conform to the values and principles of the respective discipline, as well as those of the organization providing the services. The global expanse and ubiquity of network infrastructure, however, presupposes an underpinning of responsibility for principled activities grounded in fundamental values. The education of those working in digital curation is improved by the careful and deliberate attention to ethical, legal, cultural, and economic considerations that impact the production of knowledge and contribute to the evidentiary record. Digital curation providers will be accountable for the services offered, including conformance to relevant privacy provisions and the legitimate expectations of their users. The regulations, policies, norms, and values surrounding access, privacy, retention, repurposing, and manipulation of digital information are complex, ambiguous, and sometimes contradictory. Professionals engaged in digital curation should be prepared to analyze ethical dilemmas, identify conflicting principles and policies, and make informed recommendations or decisions to resolve such conflicts.
x. Services and support. As is true of many professions, service provision is at the heart of digital curation operations and key to their success, and covers a range of areas of responsibilities, including:
- Liaison and consulting;
- Instruction and training;
- Enhancement, including metadata, annotation, and linking;
- Information resource development;
- Outreach, advocacy, and promotion;
- Current awareness services (push and pull); and
- Support for virtual communities.
xi. Management and administration. Likewise, those working in digital curation should be competent in basic management and administrative processes, including the following:
- Cost-benefit analysis,
- Strategic planning,
- Project management and planning,
- Staff development,
- Grant and report writing,
- Cross-institutional coordination, and
- Expectation management and complaint handling.
The program of study in digital curation envisioned above for training curatorial specialists would be neither feasible nor relevant for those aiming to conduct digital curation as but one part of their research or practice in other domains. What would be the best strategy for educating these other students? This is an important concern. For example, as many domain researchers producing digital data now learn about data management in an ad hoc way, they are not attending to preservation aspects of data management and are unaware of existing data services (Jahnke et al., 2012).
Providing students in disciplines generating and using digital content with the necessary digital curation knowledge and skills might be achieved by selectively integrating that content into their existing curricula. This approach, sometimes referred to as “microinsertion,” would consist of introducing small amounts of new material on very specific curation topics into existing course materials, lectures, readings, exercises, and exams.10 Materials for microinsertion might include lecture notes, reviews of the literature, examples of best practice, test questions, practical exercises, videos, or any other form of pedagogical material. Material would be modularized to ease its insertion into existing programs and could be made available via the Internet, with appropriate methods of search, evaluation, and retrieval.
For disciplines that are particularly data intensive, greater weight might be given to the study of digital curation. Beyond additional course material or even entire courses, some fields might conceive subspecialties or programs, perhaps resulting in a certificate of completion rather than a degree. Such programs would yield a level of proficiency in digital curation beyond what might be expected of a typical student in the field, though not at the level of a professional digital curator.
Existing curricula in science, business, medicine, engineering, and other fields are already crowded with required material. Adding further content in digital curation will require carefully developed strategies, demonstration that the value of this additional training exceeds whatever it might displace, and leadership within each domain to advocate for such changes.
10Microinsertion has also been advocated as a suitable way to include material on ethics in programs that have never in the past given sufficient attention to it (Davis, 2006).
Many professional programs in library science, information science, archival science, and information management have responded to the immense growth of digital information and the need for its professional curation by creating programs in digital curation, digital preservation, digital libraries, and management of electronic resources. The most organized and concerted efforts to instill digital curation knowledge and skills are within professional schools of library and information science (LIS) or Information Schools (also known as iSchools). Many of the early efforts, identified in Gold’s (2010) comprehensive timeline of data curation initiatives in LIS schools, have matured into established programs. A small but growing number of iSchools now offer concentrations or certificate programs in digital curation.
Education initiatives supported by the Institute of Museum and Library Services (IMLS) Laura Bush 21st Century Librarian Program have contributed significantly to building capacity in digital curation education. Below are summaries of some of the notable initiatives sponsored by this IMLS program:
- Beginning in 2006, the University of North Carolina at Chapel Hill (UNC) and the University of Illinois at Urbana-Champaign received multiple awards to build capacity in digital curation education. UNC DigCCurr initiatives in digital curation, discussed in Section 4.1, culminated in a new post-master’s certificate in 2013. The University of Illinois’ specialization in data curation began in 2007 and was extended from the sciences to the humanities in 2008. A second specialization in sociotechnical data analytics was launched in 2012.
- The University of Arizona received two awards to advance digital collections and curation education efforts in 2006 and 2009.
- In 2008, the University of Michigan’s educational offerings were advanced by a collaboration to support internships and curriculum development in digital curation and preservation.
- Syracuse University started a program related to digital curation by developing training for e-science librarians in 2009, and introduced a certificate of advanced study in its data science program in 2012.
- The University of Tennessee received awards for training Ph.D.-level educators in science data and information in 2009, and for master’s students in scientific data curation in 2011.
- In 2011, the University of North Texas began development of a post-master’s graduate academic certificate in digital curation and data management.
- One of the newest efforts is the National Digital Stewardship Residency, a collaboration between IMLS and the Library of Congress Office of Strategic Initiatives to “build a dedicated community of professionals who will advance our nation’s capabilities in managing, preserving, and making accessible the digital record of human achievement.”11
The National Science Foundation (NSF) has also supported projects related to curation for many years and provided part of the early foundation for some of the efforts listed above. Of particular note among NSF-supported initiatives is the ongoing Open Data Integrative Graduate Education and Research Traineeship (IGERT) at the University of Michigan.12 The goals of the IGERT initiative are to promote conduct of responsible data-intensive science and engineering and to build a community of practice around open sharing and reuse of data in bioinformatics and materials science and engineering. The NSF has also funded other digital curation education initiatives. The new data science certificate program at Syracuse University was seeded by an NSF award for training cyberinfrastructure facilitators in 2008. The first step in the curriculum for the Specialization in Data Curation at Illinois was developed with an NSF award in 2006 for a Scientific Information Specialist program that evolved into a master’s degree option in the campus bioinformatics program.
Despite these various initiatives, education options for digital information professionals remain limited even as demand for digital curation competencies is increasing in the job market. An analysis of LIS-oriented data curation programs identified only 16 institutions offering data curation courses (Harris-Pierce and Liu, 2012). The Data Curation Curriculum database,13 which gathers information on programs and course descriptions related to data curation at LIS and iSchools, also suggests that educational opportunities are inadequate to the needs of training a skilled digital curation workforce (see Varvel et al., 2012, for details on objectives, methods, and limitations of the database14). In reviewing available information from the database on 475 courses in 158 separate programs at 53 institutions, the educational options appeared to be uneven, with limited opportunities for intensive digital curation preparation.
The limitations revealed by the Data Curation Curriculum database fall into several categories. A review of the course descriptions found in the database suggests that significant skill sets needed for digital curation, in areas such as metadata and digital preservation, are being covered in varying degrees. For example, of the approximately 60 courses covering metadata, about two-thirds explicitly considered digital information, with about one-third of those addressing digital research data. Only a few course descriptions specified coverage of scientific metadata standards, metadata for data management applications, or more general information modeling for digital content and data. Courses addressing preservation were even less aligned with a digital curation focus. Of the 62 course descriptions covering preservation, only 19 specified digital preservation as the primary subject of the course or a topic within the course. Project management course descriptions generally included planning and management for digital libraries or digital preservation, but only 14 such courses were identified across all schools.
Course descriptions contained in the Data Curation Curriculum database suggest that instruction in technology, statistics, and computer programming directly relevant to digital curation is also limited.15 Other areas important to digital curation that were addressed by only a moderate number of courses include selection and appraisal, access and use, legal issues, and research methods. Further, the Data Curation Curriculum database furnishes no evidence of adequate adaptation of other relevant areas traditionally covered within LIS programs, such as information behavior, information retrieval, collection development, and information policy. Other more specific topics important for digital curation were rare and scattered within different kinds of courses.16
15Course descriptions in the Data Curation Curriculum database reveal that among relevant technology courses, only 8 courses appeared to be exclusively devoted to database administration and design, and 13 systems analysis courses were identified. Statistics was explicitly covered in 16 courses, including topics on data mining, data analytics, descriptive and inferential statistics, and probability. Computer programming options were also limited. Typically in LIS programs, aspects of programming are embedded in a number of different kinds of courses, such as information processing and data mining. Students at most schools would have access to programming courses in computer science departments, but these classes would be unlikely to have an orientation to curation concerns, as in LIS courses that relate programming specifically to information resources and services.
Overall, the educational opportunities for individuals wishing to pursue professional training as digital curators have grown. New programs, many supported by IMLS and NSF, have been established in recent years. Capacity in digital curation education is being built. Nonetheless, the more traditional training programs are only beginning to adapt their course offerings to the needs of digital curation professionals. Although many of the principles and skills covered in conventional degree programs are integral to digital curation education, courses continue to be too general in nature, with inadequate attention given to the specific knowledge and skills needed for curation of digital information.
Students pursuing courses of study in a wide range of disciplines increasingly need, and are being provided with, exposure to the knowledge and skills necessary for digital curation. In some cases this involves the microinsertion of additional course content, but it may include entire course modules or certificate programs that convey a more substantial grounding. At whatever level, students incorporate this work into their primary fields of study. In some instances, the study of digital curation and that of a scientific discipline become entirely entwined. This is the case in several newly emerging hybrid fields.
Materials suitable for microinsertion are being developed in a number of disciplines. The Digital Library for Earth Science Education17 was organized over a decade ago precisely to develop such materials. Often, minicourses in digital curation are created by academic libraries. These provide instruction in data management skills for domain students and researchers, as part of an acceleration of campus-based services for research data. Students are the primary audience at a number of institutions, with programs targeting undergraduate and graduate students in medical and health programs (Piorun et al., 2012) and engineering and science graduate students for instruction in “data information literacy” (Carlson et al., 2011).
One promising model for transitioning domain experts into curation work has been developed by the Council on Library and Information Resources/Digital Library Federation as an extension of their Postdoctoral Fellowship Program. Recent Ph.D.s from any natural or social science discipline are eligible for placements at host institutions in units engaged in curation operations and research. To date, most positions have been in research libraries or situated within collaborations among libraries and other university units, with a few opportunities in data centers.
Minicourses, microinsertion of material, and postdocs provide some training in digital curation to those studying other fields. Fuller attention to digital curation may be accomplished through more extensive programs offered within academic departments. Programs in Geographic Information Science (GIScience) provide an example. Such programs now exist in a large number of institutions of higher education in the United States. Their purpose is to impart the skills and principles needed to work intensively with geographic (or geospatial) information, using technologies such as geographic information systems (GIS), satellite remote sensing, and geographically distributed sensor networks. Programs in GIScience have long included material related to the curation of geospatial data. It is common, for example, to find data sharing and metadata, the management of distributed databases, archiving, and online and cloud-based GIS, all discussed within the context of geospatial data. GIScience programs can be found within departments of geography, earth science, civil engineering, urban planning, and several other more traditional disciplines.
16For example, only one course explicitly included data harvesting and aggregation. Two courses covered data quality, and isolated courses were identified on data manipulation and exploratory data analysis. The topic of data transformation was covered in two courses on “Organization of Information in Collections” and “Digital Library Implementation.”
Increasingly, subspecialties addressing aspects of digital curation are evolving into certificate programs, either freestanding or housed within other academic departments. The committee reviewed 37 data-oriented programs compiled primarily from informal inventories (Fox, 2012; Varvel et al., 2011). These included programs embedded within computer science, informatics, business, and the sciences. The programs exist at the undergraduate, master’s, and doctoral levels, with specializations including data analytics, business analytics, predictive analytics, data science, web science, and information technology and systems. Notable examples include:
- Certificate in Data Science at the University of Washington eScience Institute,
- Informatics degrees at Indiana University, and
- Information Technology and Web Science program at Rensselaer Polytechnic Institute.
Many data science programs emphasize statistics and analytics and offer strong coverage of programming, databases, and data mining, with some attention devoted to areas such as system design and data management. Other prominent areas include information visualization and research methods. There appears to be limited coverage of storage, data processing and transformation, data collection, and ethics, all of which are important elements of competency in digital curation. Surprisingly, standards appeared to be strongly emphasized in only one program.
In some fields, students have the opportunity to integrate study in their chosen discipline with training in aspects of informatics. This is particularly the case at the graduate level, with many of the digital curation skills discussed in this report covered in graduate programs, as educational opportunities evolve in tandem with shifts toward data-intensive research and informatics. Research in certain fields, such as molecular biology, biodiversity, ecosystem studies, and climate science has become very data intensive, leading to hybrid specialties such as ecoinformatics, bioinformatics, biodiversity informatics, and climate informatics.
The emergence of the field of bioinformatics is a response to the need to manage the burgeoning data resources available in biology in order to take advantage of new analytical methods and techniques. Bioinformatics has grown rapidly as a career path, as evidenced by associated training programs. Currently, approximately 25 U.S. universities and colleges offer undergraduate programs in bioinformatics, most commonly as an area of concentration related to computer science, bioengineering, life sciences, or another field. At the graduate level, 19 universities offer doctoral programs and 32 offer master’s programs with bioinformatics in the title. These include programs related to medical informatics, computational biology, and genetics. The content of these programs, displaying an integration of informatics and biology, is typified by the program of study required for a master’s degree at The Johns Hopkins University:
- Core courses in molecular biology, genetics, algorithms, database concepts, and biological databases;
- Concentrations in informatics tools, protein structure data and proteomics, genome sequence data, microarray data, and semantic web studies; and
- Computer science courses including software engineering, XML (extensible markup language) design, data visualization, machine learning, distributed systems, and cloud computing.
Biodiversity informatics is another example of a hybrid field that has fully integrated the study of digital curation at all levels. For undergraduate students, the NSF has supported a Research Coordination Network in Undergraduate Biology Education called Advancing the Integration of Museums into Undergraduate Programs (AIM-UP!) to introduce museum-based informatics and data curation into the curriculum. Digital curation skills are increasingly taught as part of doctoral training in biodiversity research fields such as taxonomy and systematic biology. Short courses for established biodiversity researchers have also begun to appear over the past decade in some conferences in this and other fields.
Exposure to and practical experience with current digital curation practices can be valuable complements to formal programs of study. Unfortunately, practitioners often lack support or reward structures for participating in teaching and training. A few programs have developed promising strategies for providing students with internship opportunities (e.g., Kim et al., 2011). Other models for student field experiences include partnerships between academic programs and established national data centers to build on the advanced expertise developed within these institutions over decades (e.g., Kelly et al., 2013). Currently, such opportunities are available to a relatively small number of students.
Attention is often accorded to the development of courses and curricula for students. What of the continued education and training of midcareer employees? These individuals also need opportunities to deepen or upgrade their skills and knowledge in digital curation.
A large portion of the current digital curation workforce consists of midcareer professionals, some of whom received their education when most information was still in analog form. At that time, data volumes and computing capabilities were growing at a much slower pace, and print journals and books were considered the primary mechanisms for scholarly publication. Information professionals find that their training quickly becomes dated because of the rate of change in all aspects of information technology. In the scientific disciplines, many researchers with responsibilities for digital curation developed those skills without formal training, often following completion of graduate degrees.
Midcareer employees, practitioners, and researchers require opportunities to further their training in ways more flexible than traditional academic coursework and formal certification. Continuing education options for working professionals have been developed by a variety of institutions in various formats. These nondegree options include 1-day workshops, short courses, conferences, and longer institutionally based programs developed by iSchools and professional associations.
Two iSchools established regular institutes beginning in the mid-2000s, with DigCCurr at the University of North Carolina focusing on digital curation and the Summer Institute for Data Curation at the University of Illinois focusing on curation of research data. These early efforts emphasized the full curation life cycle and best practices and tools for digital curation. More recently, the E-Science Institute, developed by the Association of Research Libraries, the Digital Library Federation, and DuraSpace, offered a series of learning modules to assist academic libraries in advancing an agenda for e-research support, with a particular focus on the sciences, cyberinfrastructure, and data curation. Another active technical community working in libraries, museums, and archives developed the CURATEcamp “unconference” series in 2010.
Efforts in the specific area of digital preservation have been sustained for over a decade. One initiative in this area is the Digital Preservation Management Workshops series.18 This curriculum has three themes—organizational infrastructure, technological infrastructure, and requisite resources—for a target audience of managers of digital preservation programs in cultural institutions. The Library of Congress has also developed the Digital Preservation Outreach and Education program, as noted earlier in this report.
A number of professional organizations in the sciences are also active in fostering opportunities for continuing education for scientists in their disciplines. The Federation of Earth Science Information Partners (ESIP) and the American Geophysical Union (AGU) offer workshops and short courses on best practices in data management. The NSF-funded DataONE initiative has developed curriculum modules in data management. Emergent communities of practice that develop and share new knowledge also provide a kind of informal education through participation in their initiatives and the resources they generate. The Biodiversity Informatics Standards (TDWG) association (formerly, the Taxonomic Database Working Group), discussed in Chapter 2, is an example.
Employers are also concerned with upgrading the skill sets of their employees. Many seek to furnish in-house opportunities for professional development for their staff. Several programs are available for employers to use. One model program, developed by the Bibliothèque Nationale de France (Bermès and Fauduet, 2011), covers four areas: digital information, data models, project management, and long-term preservation. Topics such as formats, models, and standards for digital objects are highlighted, with an emphasis on areas that need to adapt to change, such as management, workflows, and proprietary rights.
Online continuing professional development is another attractive option for working professionals, and can range from webinars to executive degree programs. Among the more visible trends in higher education in recent years has been the introduction of Massive Open Online Courses (MOOCs). These online courses are offered free or at low cost by reputable institutions and carry no course or degree credit. MOOCs may provide a suitable way to promote education in digital curation, particularly to small institutions with limited resources for professional development.
This chapter has provided the committee’s vision of what will be necessary for the education of a workforce for digital curation, addressing the continuum from dedicated curatorial specialists to researchers and practitioners undertaking some curatorial activities. It has also surveyed current opportunities available to those pursuing education and training in digital curation along many different paths. Those opportunities will continue to evolve in response to the demand for training in digital curation.
18 Partially funded by grants from the National Endowment for the Humanities, the series was begun at Cornell University in 2003, moved to the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan in 2008, and is now hosted by the Massachusetts Institute of Technology.
Expertise requirements will clearly vary depending on application area, scale of operation, and work environment, with positions ranging from curation-centric roles in multidisciplinary repositories, archives, libraries, and data centers, to highly specialized discipline-, industry-, or application-specific roles (Hedstrom, 2012). Curation-centric positions will rely on the broad knowledge and skills underlying digital curation processes and technologies, with an emphasis on interoperability and the differing data requirements across disciplines. Specialized positions will demand more domain-specific knowledge and related informatics and computational expertise.
Of course, the full complement of knowledge and skills will rarely be found in one individual. An effective digital curation strategy is thus likely to engage the coordinated efforts of a team, leveraging the strengths of each member and balancing the best practices of digital curation with the intellectual traditions and requirements of the discipline. This will help ensure the appropriate accessibility and usability of the digital resources for new communities of users in the future.
This sketch of the future is reasonable, yet not certain or complete. Different institutions, disciplines, domains, and sectors of the economy display very different levels of awareness of the need for and value of digital curation. Furthermore, the pace at which they will recognize the benefits of digital curation, adopt standards and best practices, and invest in both automated solutions and human personnel remains to be seen. Pressure for greater access to digital information in scientific research, education, industry, government, and cultural heritage suggests that there is an immediate need to build capacity in the nation’s workforce to meet digital curation demands. This will not happen without resources and leadership.
During the next decade, there will be a particular need for leaders in a wide variety of organizations who can develop digital curation policies, programs, and technologies that reinforce each other and facilitate curation throughout the information life cycle. Those leaders will need to rely on professional curators who will create standards and best practices, monitor developments in the field, solve problems that result from new technologies or disruptive technological change, and train others in digital curation.
The importance of digital curation has long been recognized by some. As noted in Chapter 2, a substantial body of reports and studies has focused on digital curation (e.g., Lord and Macdonald, 2003; Swan and Brown, 2008; Interagency Working Group on Digital Data, 2009; National Science Board, 2005; Blue Ribbon Task Force on Sustainable Digital Preservation and Access, 2010; Auckland, 2012; Lyon, 2012). As the accumulation of digital information continues to increase exponentially, becoming both ubiquitous throughout society and critical to its functioning, the necessity for rigorous curatorial processes will become profoundly apparent. It is important for those in a position to influence this eventuality to lead the coordinated effort at resource mobilization, technological innovation, and education to prepare the workforce for effective, scalable, and affordable digital curation. Future efforts will build on the well-established foundations and service orientations of research libraries and archives, data centers, government agencies, research communities, and commercial service providers.
As presented above, early initiatives sponsored by the federal government and conducted by LIS programs and iSchools produced courses, concentrations within degree programs, and continuing education for practitioners. More recently, awareness of the need for curation services for digital research data has increased due to further federal actions, including the introduction of guidelines from funding agencies requiring research proposals to include data management plans, and the February 2013 memorandum issued by the Executive Office of the President, Office of Science and Technology Policy (OSTP)—Increasing Access to the Results of Federally Funded Scientific Research (Holdren, 2013), which covers both peer-reviewed publications and digital data. Progress in digital curation has benefited from the very substantial attention of NSF (from the disciplinary perspective of the sciences) and IMLS (from the professional perspective of library and information science). Sustaining this progress during a period of rapid change and growth is essential.
Beyond NSF and IMLS, many other federal agencies and departments have a fundamental responsibility to carry out well-informed digital curation, and therefore have a stake in the development of a strong, expert workforce in digital curation. They include the Library of Congress, the National Archives and Records Administration, the National Institutes of Health, the Department of Defense, the National Aeronautics and Space Administration, the Department of Energy, and the Census Bureau, to name only a few. Indeed, all government agencies can be encouraged to participate in the advancement of digital curation from their mission-oriented perspectives and to integrate the evolving state of the art into institutional policies and practices. The OSTP may be able to provide coordinating leadership across the federal government.
The private sector also has much to gain from effective and consistent digital curation processes, policies, and procedures, and therefore has a stake in the preparation of the digital curation workforce. Many businesses have major investments in digital information assets; some industries are entirely dependent on them. Business schools, which increasingly include data analytics and data mining in their curricula, also have a responsibility to educate business leaders about the value of and need for digital curation. Moreover, advances in digital curation will inevitably lead to proposed modifications to existing standards and proposals for new standards. Expertise in digital curation within the relevant standards bodies and professional associations will help ensure that the interests of all parties, private and public, are addressed.
Educating and training a workforce for digital curation will take resources and leadership. A sustained and targeted campaign could engage the full spectrum of relevant stakeholder communities. Possible sponsors of such a campaign would have to truly understand the extent of the challenge to speak with authority and conviction. The national libraries (particularly the Library of Congress and the National Library of Medicine) are natural candidates. They have the reputation and a record of achievement in related areas. The momentum of the campaign could be increased by enlisting other organizations. For example, relevant professional associations (e.g., Association of Research Libraries,19 American Library Association,20 Special Libraries Association,21 Coalition for Networked Information,22 Association for Information Science and Technology,23 American Association for the Advancement of Science,24 and EDUCAUSE25) can make such outreach part of their respective missions. The iSchools can build awareness among faculty, students, and alumni working in the field. The campaign would benefit from a bold and memorable tagline, perhaps something along the lines of “Dire Digital Stakes: Past, Present, and Future at Risk.”
In addition to a bold campaign, incremental steps will also contribute to progress toward a well-prepared workforce in digital curation. One such step is the recruitment of more students with backgrounds in the sciences into graduate programs for digital curation specialists. For the most part, students currently applying to graduate programs in library and information science earned their undergraduate degrees in the humanities and social sciences. A survey of 3,507 recent graduates of 39 LIS programs in North America (2000-2009) indicates that only 25 percent of the students had Bachelor of Science degrees (Marshall, 2012). An earlier survey of LIS graduates in North Carolina indicated that 242 of 2,633 students had science backgrounds, with 80 in applied science, about 60 each in mathematics and life sciences, and about 20 each in earth and physical sciences.
The rapid transition to data-driven science and business analytics would benefit from a curatorial workforce that is knowledgeable and proficient in the sciences as well as in digital curation. Educators in existing data curation and data science programs have reported difficulties recruiting students, especially those with a background in the domain sciences. Attracting students with science backgrounds into iSchools will likely require broader awareness of the intrinsic challenges and rewards of digital curation, its essential role in advancing scientific research, and the career options in the field (Weber et al., 2012).
To recruit and retain a quality workforce, a career path needs to be attractive and visible (Gregg, 2012). Career advancement might involve a path that crosses a range of professional contexts, including “small data, big data, individuals, small research teams, large corporate endeavors” and across organizations (Thomas, 2012). Professional opportunities will increase if skilled workers are able to move both within and across sectors.
The growth of communities of practice in the field of digital curation will also be essential for sustaining career opportunities and development. A professional community is forming, fostered in part by conferences and workshops such as the International Digital Curation Conference, the Committee on Data for Science and Technology—World Data System conferences, the Research Data Alliance Plenary meetings, the ASIS&T Research Data Access and Preservation Summit, and iPRES. Recurring institute training programs and sessions devoted to data science and informatics at many disciplinary conferences also help to build the professional community. A coordinating body would be useful to guard against fragmentation of this community, to support cross-fertilization across the growing education initiatives, and to ensure that education programs address emerging needs in research and practice.26
26 An excellent model for a discipline-specific coordinating body for an increasingly active and distributed digital curation community is the International Society for Biocuration, a not-for-profit organization for biocurators, developers, and researchers that promotes the biocuration field and serves as a forum for the exchange of information (Howe et al., 2008).
The digital curation workforce also includes a role for citizen science. The growth of citizen science has contributed substantially to various types of research, from collecting data (e.g., the Audubon Bird Count)27 to annotating them (cf. Flickr)28 and processing them (cf. SETI@home).29 To involve the public, the challenge is to identify tractable opportunities in digital curation and to make them accessible to the relevant communities of interest on a distributed basis. Public libraries can fulfill an important role in outreach, education, and fostering contributions from the public, such as metadata or tagging by amateur scientists, history enthusiasts, and others who have curatorial expertise.
One further step to be taken is the articulation of a research agenda. Digital curation continues to advance. It does so within a continually changing context. Thus the preparation of its workforce will also be in flux. Research is needed to investigate what challenges exist, how best to meet them, and how to translate those strategies into standard practice. Input to that research agenda should come from the full continuum of those engaged in digital curation activities, from curatorial specialists through to domain experts and industry practitioners.
Conclusion 4.1: Although the number and breadth of educational opportunities supporting digital curation have grown, existing capacity is low, especially for the initial education of professional digital curators and the midcareer training of professionals in other fields. In particular:
- Graduate and postgraduate certificate programs for educating professional digital curators (e.g., in LIS schools and iSchools) are expanding, but workforce demand is projected to exceed the output of existing programs.
- Midcareer practitioners with little or no formal education in digital curation rely on a spectrum of types of training, including online and in-person, experimental and time-tested, and just-in-time training, but this too is not sufficiently developed.
Conclusion 4.2: The knowledge and skills required of those engaged in digital curation are dynamic and highly interdisciplinary. They include an integrated understanding of computing and information science, librarianship, archival practice, and the disciplines and domains generating and using data. Additional knowledge and skills for effective digital curation are emerging in response to data-driven scholarship. More specifically:
- Individuals with an undergraduate degree in science, technology, engineering, or mathematics (STEM) disciplines and graduate-level education in digital curation are—and will continue to be—in particular demand as digital curators.
- Discipline specialists with informatics and digital curation expertise are, and will continue to be, in demand to provide discipline-focused curation services.
- Although the multidisciplinary character of digital curation as a career currently suggests a graduate education level, some knowledge and skills may be acquired through 2-year associate or 4-year bachelor’s degrees.
- Continuing professional education alternatives will need to be flexible and diverse, providing a range of introductory and more specialized options through several modes of delivery, such as workshops, tutorials, online course modules, and webinars.
Conclusion 4.3: The range of needs and opportunities in digital curation, particularly when reflected in Office of Personnel Management position descriptions and Bureau of Labor Statistics descriptions of occupations, will require building and advancing a diverse community supported by a core of professionals and practitioners.
Recommendation 4.1: OSTP should convene relevant federal organizations, professional associations, and private foundations to encourage the development of model curricula, training programs and instructional materials, and career paths that advance digital curation as a recognized academic and professional discipline.
Recommendation 4.2: Educators in institutions offering professional education in digital curation should create cross-domain partnerships with educators, scholars, and practitioners in data-intensive disciplines and established data centers. The goals of these partnerships would be to accelerate the definition of best practices and guiding principles as they evolve and mature, and to help ensure that educational and training opportunities meet the needs of scientists in specific disciplines, analysts in different business sectors, and members of other communities utilizing digital curation systems and services.
Recommendation 4.3: Federal agencies, private foundations, and industrial research organizations should foster research on digital curation that makes fundamental progress on problems with practical applications in their respective domains. Initial activities should focus on establishing research priorities and baseline analyses, including engagement and outreach through
- Conferences and symposia designed to recognize and communicate the need for, benefits of, and successes in digital curation; and
- Workshops for researchers in the public and private sectors to develop coordinated research agendas focused on enhancing the value and utility of digital resources, including metadata, interoperability, and automation.
The resulting agendas for research in digital curation should be tightly coupled with the curricula and offerings of educational programs to shape the field during a time of dynamic and dramatic growth and change.
Auckland, M. 2012. Re-skilling for Research: An Investigation into the Role and Skills of Subject and Liaison Librarians Required to Effectively Support the Evolving Information Needs of Researchers. Research Libraries UK. http://www.rluk.ac.uk/files/RLUK%20Re-skilling.pdf.
Bermès, E., and L. Fauduet. 2011. The human face of digital preservation: Organizational and staff challenges, and initiatives at the Bibliothèque Nationale de France. International Journal of Digital Curation 6(1):226-237.
Blue Ribbon Task Force on Sustainable Digital Preservation and Access. 2010. Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. http://brtf.sdsc.edu/.
Carlson, J. R., M. Fosmire, C. Miller, and M. Sapp Nelson. 2011. Determining data information literacy needs: A study of students and research faculty. Portal: Libraries and the Academy 11(2):629-657.
Davis, M. 2006. Integrating ethics into technical courses: Micro-insertion. Science and Engineering Ethics 12(4):717-730.
Fox, G. 2012. Data analytics and its curricula. Presented at Microsoft eScience Workshop, Chicago, October 9.
Gold, A. 2010. Data curation and libraries: Short-term developments, long-term prospects. Office of the Dean (Library) 27. http://works.bepress.com/agold01/9.
Gregg, M. 2012. Workforce demand and career opportunities: Scientific data centers. Presented to the Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements, National Research Council, Washington, DC, July 19.
Harris-Pierce, R. L., and Y. Q. Liu. 2012. Is data curation education at library and information science schools in North America adequate? New Library World 113(11 and 12):598-613.
Hedstrom, M. 2012. Digital data curation—workforce demand and educational needs for digital data curators. In Trusted Digital Repositories & Trusted Professionals International Conference Proceedings, Florence, Italy.
Holdren, J. P. 2013. Increasing Access to the Results of Federally Funded Scientific Research. Memorandum for the Heads of Executive Departments and Agencies. Office of Science and Technology Policy. http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.
Howe, D., M. Costanzo, P. Fey, T. Gojobori, L. Hannick, W. Hide, D. P. Hill, R. Kania, M. Schaeffer, S. St Pierre, S. Twigger, O. White, and S. Y. Rhee. 2008. Big data: The future of biocuration. Nature 455(7209):47-50.
Jahnke, L., A. Asher, and S. D. C. Keralis. 2012. The Problem of Data. CLIR Publication 154. Council on Library and Information Resources, Washington, DC. http://www.clir.org/pubs/reports/pub154.
Kelly, K., C. L. Palmer, V. E. Varvel, Jr., S. Allard, C. Tenopir, M. S. Mayernik, and M. Marlino. 2013. Model development for scientific data curation education. In Proceedings of the 8th International Digital Curation Conference, Amsterdam.
Kim, Y., B. K. Addom, and J. M. Stanton. 2011. Education for eScience professionals: Integrating data curation and cyberinfrastructure. International Journal of Digital Curation 6(1):125-138.
Liddy, E. 2012. Digital curation as a core competency. Presented to the Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements, Board on Research Data and Information, National Research Council, Washington, DC, July 19.
Lord, P., and A. Macdonald. 2003. e-Science Curation Report: Data Curation for e-Science in the UK: An Audit to Establish Requirements for Future Curation and Provision. Digital Archiving Consultancy Limited. http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf.
Lyon, L. 2012. The informatics transform: Re-engineering libraries for the data decade. International Journal of Digital Curation 7(1):126-138. http://ijdc.net/index.php/ijdc/article/view/210/279.
Marshall, J. G. 2012. WILIS 2, Ver. 9. University of North Carolina Odum Institute for Research in Social Science.
National Science Board. 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Washington, DC: National Science Foundation. http://www.nsf.gov/pubs/2005/nsb0540/.
Piorun, M. E., D. Kafel, T. Leger-Hornby, S. Najafi, E. R. Martin, P. Colombo, and N. R. LaPelle. 2012. Teaching research data management: An undergraduate/graduate curriculum. Journal of eScience Librarianship 1(1):8.
Rappa, M. 2012. Education for data scientists. Presented to the Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements, National Research Council, Washington, DC, July 19.
Raskino, M. 2013. 5 Facts About Chief Data Officers. http://blogs.gartner.com/mark_raskino/2013/11/06/5-facts-about-chief-data-officers/.
Swan, A., and S. Brown. 2008. Skills, Role & Career Structure of Data Scientists and Curators: An Assessment of Current Practice and Future Needs. Bristol, UK: JISC. http://www.jisc.ac.uk/publications/reports/2008/dataskillscareersfinalreport.aspx.
Thomas, C. 2012. Views of the sponsors: Institute of Museum and Library Services Scientific Data Centers. Presented to the Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements, National Research Council, Washington, DC, July 19.
Varvel, V. E. Jr., C. L. Palmer, T. C. Chao, and S. Sacchi. 2011. Report from the Research Data Workforce Summit, December 6, 2010, Chicago, IL. https://www.ideals.illinois.edu/bitstream/handle/2142/25830/RDWS_Report_Final.pdf.
Varvel, V. E. Jr., E. J. Bammerlin, and C. L. Palmer. 2012. Education for data professionals: A study of current courses and programs. Pp. 527-529 in Proceedings of the 2012 iConference. New York: ACM.
Weber, N. M., C. L. Palmer, and T. C. Chao. 2012. Current trends and future directions in data curation research and education. Journal of Web Librarianship 6(4):305-320.