should be disclosed are the subsets of data needed to verify and reproduce the specific conclusions. Expert judgment must be exercised during the editorial review process to determine whether the information in question is an integral part of the discovery or merely provides background.
Some members of the scientific community might like to have access to every available piece of information that an investigator has collected during the course of his or her research. In some fields, such as genome sequencing, groups of researchers have set up mechanisms for sharing some unpublished data. However, it is generally accepted that a scientist has not only the right but also the obligation to evaluate, organize, and ascertain the reproducibility of data before their dissemination via publication. Therefore, in presenting their final findings, authors are not obliged to provide all the raw or unprocessed data they have generated.
Sharing large datasets or databases that contain information about human subjects presents a special challenge because of the requirement to protect the rights and privacy of people who participate in research studies. Clinical databases might contain details that, when linked with other information, could identify research participants. The committee recognizes that databases arising from clinical studies or treatment trials must be made available in a manner that complies with applicable standards for protection of human subjects (Department of Health and Human Services, 2001).
Publications that deal with software or algorithms, like those involving large datasets or databases, are relatively new in the life-sciences literature. There are no consistent, accepted community standards for sharing such information. As with the other standards discussed in this report, the committee considers that those for sharing software and algorithms should be guided by UPSIDE, as enunciated for other categories of publication: that the purpose of publication is to enable