Twice during the workshop, the participants broke into four small groups to engage in discussions triggered by vignettes submitted in response to the planning committee’s call for accounts of real-life efforts and activities relevant to ethics, data, and international research collaboration. The breakout group discussion guide is included in Appendix C. Before the workshop ended, a rapporteur from each of the four breakout groups gave a 12-minute synopsis of those discussions.
Shanni Silberberg, AAAS Science and Technology Fellow at the U.S. Agency for International Development, said her group’s discussion started with the multidimensionality of the context for data collection, with that multidimensionality representing geographic differences in the population being sampled and even contextual differences within small populations, such as “indigenous people.” There is also a context to data depending on when they are collected, with the example being the seasonality of agricultural data and the bias that would result from collecting data at only one time of the year. She also noted it is important to consider the demand characteristics; that is, what the subjects believe the data will be used for may influence the answers they provide. For example, if someone from Feed the Future, with grant money in hand, asked community members what their food security situation was, they would likely say it was horrible so that they could access those funds. Silberberg’s group also noted that the identity of the researcher (male or female, for example) can introduce bias in the collected data. Taken together, these potential sources of bias should be noted in the metadata.
In its discussion of ethics, this group concluded there are different ethical considerations at different stages of a project and these should be examined and reexamined as a project evolves. Ethical issues include how to deal with using data for a different purpose or in a different study than originally intended and what the value chain is for data. It is also important, said Silberberg, to consider managing expectations about the use of the data and how they will be governed in terms of who controls the data and what the regulations and rules for their use are. A governance framework, she said, is necessary for setting out processes for data collection, storage, and dissemination, with dissemination including plans for returning the data and discussing the knowledge gained from the data with the community that was the source of the data. These plans need to consider what language and form to use in delivering the data and findings to the community and how to explain the conclusions in the context of that community.
The governance framework also needs to consider how privacy will be managed, how to balance privacy and having the data be open for wider use, and how to allocate intellectual property that arises from the data. The former becomes more important in collaborations and networks, where data sharing is essential to the success of such partnerships. Also important, she said, is giving credit to local scientists who are partners in international collaborations and considering the ethics of placing the local scientists last in the list of contributors. Transparency about who is funding a study is important as well.
Returning to the subject of metadata, this group defined it as all the information necessary and relevant to make ethical and data use decisions. While this does not mean that researchers must collect every single piece of information about the data and the subjects of the data, it does mean that there needs to be transparency about the data, its potential biases, and limitations. Each field, said Silberberg, needs to define the minimum requirements for metadata so that other researchers can use the data.
This group also discussed the need for a process for creating and sharing databases. That process, said Silberberg, would be defined by a broker or governance structure that, ideally, is unbiased. For context, the group referred to the World Health Organization, which sets certain standards that other organizations or governments consider, but are not bound to follow, when formulating their own regulations. She concluded with the group’s framing of this process as one that, to the extent possible, is thoughtfully designed to include advocates from different points of view who can inform or act as honest brokers. These honest brokers should be informed by an expanded view of metadata to include the context of data collection, such as the rights of vulnerable groups, the seasonality of data collection, and other features. The process should also include a declaration of conflicts of interest and the formulation of a best practices list, said Silberberg in closing. A second rapporteur from group A then talked briefly about the group’s discussions about secondary use of archived data. This discussion, she said, emphasized the importance of fair access principles and transparency.
Kristin Tolle, the rapporteur for group B, said her group struggled with the question of who should be responsible for other entities in a collaboration that are collecting and using data. The group wondered if the charter of the Research Data Alliance or the World Wide Web Consortium should be expanded to address this issue in the context of an international collaboration. The group also debated whether Facebook, which has two billion users, would be considered a member state of such an organization, what the penalty would be for people or organizations that violate the rules set by the governing body for data capture and reuse, and what kind of governance is needed for extraterrestrial data. It also discussed how to enforce the Common Rule and universal design while considering cultural issues and rules established by local communities.
Today, said Tolle, data seems boundaryless, which raises the issue of data ownership. Group B also raised the issue of long, incomprehensible end-user license agreements, the fact that few people read them, and the need for a better framework for dealing with privacy and choice. Such a framework needs to include categories for how it applies to researchers, individuals, governments, groups, and unions; how it applies at local and international levels; what the enforcement model will be; and how disputes over data access and ownership will be resolved. That discussion led to the question of whether it is even possible to develop such a framework given that the horse is out of the barn and how cultures that have a different concept of self will identify with such a framework. Finally, the group felt strongly that there needs to be broad representation in the discussions about this framework. Another group B participant added that the group thought the question “Do those who own the data own the future and what does data ownership mean?” might be an interesting topic for a future workshop.
Ruxandra Draghia and Andreas Rechkemmer served as the rapporteurs for group C, which focused on ethics and data in the context of international collaborative research involving data from an indigenous population. Some of the discussion, said Draghia, revolved around the notion of risk and the ability to create self-imposed limits to research. Put in the context of the cultural values of the participants, the decision to participate cannot be prescriptive, or as one group member put it, “We do not want to be missionaries.” That implies needing to build capacity in local communities to create a sustainable model for data collection and establish long-term relationships based on mutual trust and respect. This should be seen, said Draghia, as an iterative process rather than a snapshot of how this is done today. With new participants, discussions should cover goals and methodologies so that collaborators agree ahead of time on how to deal with potential challenges and risks in large-scale data management and collaborative scientific research.
For ethics, the group questioned what the carrots and sticks for the researchers involved would be. The example of Singapore was brought up to illustrate how some nations have developed laws that spread wealth across the population rather than concentrating it in the hands of a few by bringing on board civil society and having the local population take ownership of the process. This led to the suggestion of looking at the 17 sustainable development goals, picking one per year, and having the international community work in a collaborative manner, using all its tools, including data, to solve the issues for that goal. Doing so would require looking at each goal from a global, local, and discipline level and developing a plan for data collection, publication, dissemination, and communicating the results to the affected communities and the rest of the world’s population.
Rechkemmer said the group discussed the need for harmonization and standardization at the international level and the fact that many United Nations agencies are trying to do that. Group C also talked about the universalization and democratization of data sets and access to them, which is an ethical prerogative. In terms of next steps, this group wanted to express strongly its desire to continue this activity and hold follow-on activities to this workshop. “There are not many people worldwide who do this kind of work, and some people are shying away from the combination of data, ethics, and international research, so this is cutting edge and important,” said Rechkemmer. Ideas for future workshops would be to spend more time looking at equity and trust issues, as well as education and training. The group spent some time talking about certification versus standardization and was not clear about what certification in this space means.
One idea from group C was to apply a governance-of-the-commons approach, for which Elinor Ostrom won the Nobel Prize in Economics in 2009, to the data world. Rechkemmer noted that a group at the University of Washington is working on what it calls a data commons approach, and the group suggested identifying rules for a self-governance approach to managing data sets and the ethical challenges that would come with such an approach.
Austen Applegate, senior program assistant with the Board on Higher Education and Workforce at the National Academies, served as the rapporteur for group D, which discussed the challenges of dealing with the fundamentals of linguistics in a multinational collaboration involving people who come from different cultural backgrounds. The group noted the importance of consent and of guaranteeing that translations of consent documents and of the questions to be answered are faithful and accurate and that the translations back into the researchers’ language are also faithful and accurate. This group also discussed the challenges of dealing with the historical context of posing certain questions and the issues arising from questions that one culture might find acceptable and another might find insulting or a violation of privacy. It will be important, then, to involve the communities being studied when questions and methods are being formulated, rather than later.
The group then discussed the difference between asking “What is wrong with you?” versus “What is wrong with the situation?” and used the example of researchers studying the high incidence of diabetes in a tribal population. Rather than asking the people what is wrong with them, it was important to consider the historical context of this community losing its land and not being able to cultivate their own crops and use the land to their benefit. The question then became, “Why is this population suffering?”, which led to a different set of questions and a different type of study.
Group D also identified the context in which research is conducted and how long data can be accessible and available as important issues to resolve. The group felt that ownership of data should not change and therefore should not be transferable, which means that the original owner should be the responsible steward for the data. Going forward, the group stressed the importance of involving more diverse populations in these conversations, the need to raise the consciousness of the public as to how data are collected and used by large corporations, research entities, and the government, and the need to balance innovation and protection of data subjects. On a final note, another group D member added that data ethics training should be embedded in data science education and that data science programs should be available more widely through the education system.