Sharing Research Data 15

cate universally strong interest in wider data sharing and strong convictions that, if data sharing were properly developed, substantial benefits would ensue. The benefits could change the environment in which researchers work. (Expected benefits are discussed further in the papers in Part II.)

Some investigators regard their work as definitive. Results are sometimes made to sound more sweeping than is justified. Trial analyses that do not look good may not be reported. Possible weaknesses in data and methods may be ignored, if they are not generally known, and otherwise may be treated as peripheral. The possibility that other researchers will subsequently find ways to collect more informative data and perform more incisive analyses is not contemplated. Investigators may defend and amplify what they regard as theirs, sometimes to the point of misrepresentation.

Few areas of research achieve such definitive results that improvements are not possible. Breakthroughs occur, but they are usually not fully understood or developed for some time. Meanwhile, less spectacular but still vital accretions of knowledge proceed. Data sharing would surely help some people overcome narrow views and pretentious habits. An improved spirit of research would benefit the products.

COSTS OF DATA SHARING

Data sharing involves costs as well as benefits. The costs may at times outweigh the benefits. And those who pay the costs often do not share in the benefits.

Most of the difficulties of data sharing could be overcome if the scientific community and funding agencies were to commit substantial resources to data sharing and if scientific recognition were given to researchers who shared their data. But the scientific community, funding agencies, and especially individual researchers have a good many other, and often higher, priorities.
An appreciation of the obstacles to and costs of data sharing may suggest some remedies as well as help in constructing some reasonable and workable principles for data sharing. This section summarizes some of the obstacles and costs.

Technical Obstacles

Technical obstacles to sharing computer-readable data include incompatibilities in machine and software systems and data file structures. In early computer technology, technical factors sometimes constituted nearly insurmountable barriers to transferring data from one computer to another. Now, however, difficulties encountered in transferring data are largely due to the practices of data collectors and processors rather than to technical factors.
16 Committee on National Statistics

Data collectors should, therefore, anticipate that data may be shared and make necessary plans.

Although the technical requirements and characteristics of computer programs and systems for data management and analysis do not prevent data sharing, they may complicate it. For example, data organized for analysis using the Statistical Package for the Social Sciences (SPSS) cannot be analyzed using the Statistical Analysis System (SAS) without reformatting and reorganizing the data. Most data-base dictionaries in use in the social sciences are tied specifically to certain software packages such as SPSS, OSIRIS (organized set of integrated routines for the investigation of social science data), or SAS; their conversion for use by other packages usually is not straightforward. Thus, researchers attempting to use data prepared by others often must forgo direct use of information contained in the "foreign" data-base dictionary. Researchers can facilitate data sharing by assimilating data in machine- and program-compatible formats.

Documentation

Typically, data sets are poorly documented. Researchers keep the details of data collection, variable construction, and particular quirks of the data in their memories and do not put them in writing. Data collectors sometimes prefer data preparation and documentation practices with which they are familiar, although these practices may be at odds with accepted standards. Accomplishment of the particular research goals of initial investigators may not require fully cleaned tapes and well-documented data; data are collected primarily to achieve these research goals, not to serve the purposes of data sharing and secondary analysis. The documentation requirements of research and scientific publication usually differ from those of data sharing.
Moreover, available financial resources often are seen as inadequate to support data collection and analysis and certainly inadequate for elaborate data preparation and documentation. Consequently, the documentation required for effective sharing is not done.

A distinction should be made between technical and substantive documentation. Basic standards for technical documentation have been established and are in use in the preparation of many research data collections (Geda, 1979; Roistacher, 1980). Less clear are the standards for matters such as descriptions and explanations of sampling procedures; the original design of the data collection and any deviations; the assumptions that underlie particular questions, combinations of questions, and derived measures; and the degree to which instruments were pretested and the results of those pretests.[4]

[4] Derived measures, such as scales or recodes that collapse variables, are often poorly documented. Sometimes, in order to maintain confidentiality, the actual data collected cannot be
Practices in this area are less consistent and probably generally less adequate than in the case of technical documentation. Yet these aspects of documentation are essential for the effective secondary use of data collection. Data may be in perfect technical order, but if substantive documentation is inadequate, data are subject to inadvertent misuse with the result of misleading or erroneous findings.

Costs to the Original Researcher

Although it serves science for researchers to share their data and permit reanalysis and replication, it is often not in their interest to do so. Researchers face the costs of documentation for the use of others, of storing and transferring data, and of conducting tutorials so that subsequent analysts understand the data.

Other costs are less susceptible to monetary valuation and to recompense but are no less real. Researchers face the possibility that errors in their original analyses will be exposed. Initial investigators may also fear that subsequent analysts may publish results before they do, a problem that is particularly vexing with panel studies. And researchers know that those who reanalyze data will be able to publish only if the reanalysis contradicts or goes beyond the original work.

Researchers may be concerned about the qualifications of investigators requesting data and fear that poor reanalysis may require burdensome rebuttal or reflect adversely on original research. Initial investigators may fear criticism that, even if unwarranted, may be detrimental. Researchers may even fear that data made accessible during the peer review process may be published by others. Sharing data involves loss of control over data, the purposes for which they are used, and the methods of analysis. That requests for the sharing of data are often met with delays and noncooperation is not surprising (see Wolins, 1962; see Hedrick, in this volume, for a detailed discussion of these issues).
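The documentation problems described above (data dictionaries tied to one software package, and derived measures whose construction goes unrecorded) can be illustrated with a brief sketch. It is only one possible approach, written in Python as a modern illustration and not drawn from the report itself: the data are written to plain comma-separated text, and a separate, program-neutral codebook records each variable, including how a derived scale was put together. The variable names (resp_id, q1, q2, q3, scale_a), the values, and the helper functions are all hypothetical.

```python
import csv
import io
import json

# Hypothetical survey records; names and values are invented for illustration.
records = [
    {"resp_id": 1, "q1": 4, "q2": 5, "q3": 3},
    {"resp_id": 2, "q1": 2, "q2": 1, "q3": 2},
]

# Derived measure: a simple additive scale over the three items.
for row in records:
    row["scale_a"] = row["q1"] + row["q2"] + row["q3"]

# Codebook kept as plain data, independent of any analysis package.
# The derived measure records its source variables and construction rule.
codebook = {
    "resp_id": {"label": "Respondent identifier", "type": "integer"},
    "q1": {"label": "Item 1 (1-5 rating)", "type": "integer"},
    "q2": {"label": "Item 2 (1-5 rating)", "type": "integer"},
    "q3": {"label": "Item 3 (1-5 rating)", "type": "integer"},
    "scale_a": {
        "label": "Additive scale A",
        "type": "integer",
        "derived_from": ["q1", "q2", "q3"],
        "rule": "sum of q1, q2, and q3",
    },
}


def write_csv(rows, fieldnames):
    """Serialize rows as comma-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def undocumented(rows, book):
    """Return names of variables that appear in the data but not in the codebook."""
    seen = set()
    for row in rows:
        seen.update(row)
    return sorted(seen - set(book))


csv_text = write_csv(records, list(codebook))
codebook_text = json.dumps(codebook, indent=2)
missing = undocumented(records, codebook)
```

A check such as undocumented() could be run before a data set is released, so that no variable leaves the original investigator's hands without at least a label and, for derived measures, a stated construction rule. It addresses only technical documentation; the substantive matters discussed above (sampling design, pretests, assumptions behind questions) still have to be written down by the people who know them.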
Costs to Subsequent Analysts

Subsequent analysts also encounter some costs. Despite more compatible equipment and careful planning by original collectors, not all data may be shared easily. Sharing may be time-consuming and expensive to the subsequent analyst as well as to the initial researcher, particularly if the data set is

shared, but aggregates or derived measures can be. It is particularly important in such cases to document for subsequent analysts how the combinations were put together.