Data Sharing Within Economics
Economists rely on an enormous variety of research data—for instance, administrative data from government records, datasets provided by companies to the federal government, or data provided directly to researchers by companies. Some economists rely on methods similar to those used by anthropologists, in which large quantities of data are collected and analyzed. Often the datasets are subject to confidentiality agreements because individuals could be identified from the data. Use of the data may even be restricted to “enclaves,” where a researcher has to work on a nonnetworked computer in a secure room from which materials cannot be removed.
Analysis of economic data may depend critically on highly complex computer programs. These programs, rather than the actual data, can be the most valuable part of an economist’s research, because many datasets are available publicly, whereas a computer program could embody months or years of individual effort. Thus, to assess the original analysis, other researchers often need access to the computer programs as well as to the original data.
Like other sciences, the social sciences have an expectation of reproducibility: if the data are available and analyzed with the same assumptions, the same results will emerge. But without considerable assistance from the original researchers, actual replication of published results in economics can be time-consuming, tedious, and error-prone. Furthermore, journals are reluctant to publish studies that are confirmatory rather than groundbreaking. Social scientists, like other scientists, are more interested in doing their own studies and getting credit for something new than in repeating work that has already been done.
Even if replication is not common, the data should be available to enable replication, but in economics this often is not the case.ᵃ Several years ago two economists wrote to the authors of every paper in the March 2004 issue of the American Economic Review, a leading journal in the field, and requested the data to replicate the research. Although the journal has a statement saying “Authors are required to maintain their data and supply it to other researchers upon request,” 14 of the 15 sets of authors to whom the economists wrote said that they did not have the data or would not share them. The two economists summarized their findings in an article and submitted it to the American Economic Review, which published their paper.
As a result of this and other cases, the American Economic Review adopted a new policy. For published articles, the authors must provide both the data and the programs sufficient for the articles’ findings to be replicated. These data and programs are then posted on the journal’s Web site. If the use of the data is restricted, the authors must provide instructions on how to obtain permission to use the data. If some of the data are proprietary, the editors try to work out ways for other researchers to use the data. In addition, the journal is encouraging studies that reanalyze data and replicate results.
The American Economic Review is supported by dues from 20,000 members and has the resources to institute such a policy, whereas journals with fewer resources could have difficulty adopting and enforcing the same or similar policies. Also, the data and programs are not requested at the time of submission of an article—only upon acceptance—so that the 92 percent of the papers submitted to the journal that are rejected do not fall under the new guidelines. Some economists have decided not to submit a paper to the American Economic Review because they do not want to release their data or software. Nevertheless, because authors want to publish their papers in the journal, it has considerable influence over their actions.