Data Stewardship and Accessibility in the Social Sciences
The Inter-University Consortium for Political and Social Research (ICPSR) is an interdisciplinary institution established in 1962 to provide data stewardship and access for a wide range of datasets from the social sciences. Part of a global network of social science data archives, ICPSR is the world’s largest archive of digital social science data and is hosted by the University of Michigan. It is supported by dues from more than 600 member institutions, plus support from government agencies and other research sponsors.
ICPSR, which currently houses 7,500 studies and 500,000 data files, has recommended guidelines, but not requirements, for submission of data. As part of its mission, ICPSR proactively seeks out data at risk of being lost. It also emphasizes the importance of preparing good documentation, or metadata, which are critical to data interpretation and to successful data sharing and preservation. These metadata include project summaries, descriptions of data collection instruments, summary statistics, database dictionaries, and bibliographies. As technology progresses, ICPSR migrates data to new storage media and maintains sets of redundant copies in various locations.
Ownership of and access to data in the social sciences are determined by funding: contract-funded data belong to the sponsor, and grant-funded data belong to the grantee (typically a university). ICPSR does not acquire copyright to databases but instead requests permission to redistribute them. Barriers to data access and sharing in the social sciences include generally weak federal requirements to archive and provide access to research data and the heterogeneity of expectations across fields (with economics, demography, sociology, and criminology having a stronger tradition of data sharing than anthropology and epidemiology).
In a recent ICPSR study on data-sharing and archiving practices, researchers surveyed principal investigators from NIH- and NSF-funded projects and asked whether their projects had produced data and, if so, whether the data had been archived (see Figure 4-2). Of the 1,599 responses received as of late 2008, 327 studies had been archived, 876 studies were still in the hands of researchers, and 396 studies had been “lost.”
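The proportions behind these counts can be worked out directly. The short sketch below uses only the three figures quoted above; the variable names and category labels are illustrative and do not come from the ICPSR study itself.

```python
# Counts reported in the text for the 1,599 survey responses (late 2008).
# The dictionary keys are illustrative labels, not the study's own terms.
counts = {"archived": 327, "held by researchers": 876, "lost": 396}

# The three categories should account for every response.
total = sum(counts.values())

# Share of each category as a percentage, rounded to one decimal place.
shares = {status: round(100 * n / total, 1) for status, n in counts.items()}

for status, n in counts.items():
    print(f"{status}: {n} of {total} ({shares[status]}%)")
```

On these figures, roughly one study in five had been archived, somewhat more than half remained with researchers, and about a quarter had been lost.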
making data available on a long-term basis diffuses more widely and becomes easier to use.
Institutional and disciplinary digital data repositories have been growing steadily. The emergence of open-source software tools for building repositories (such as DSpace, EPrints, and Fedora), external repository hosting services,