
Reproducibility and Replicability in Science (2019)

Chapter: Appendix E: Conducting Replicable Surveys of Scientific Communities

Suggested Citation:"Appendix E: Conducting Replicable Surveys of Scientific Communities." National Academies of Sciences, Engineering, and Medicine. 2019. Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. doi: 10.17226/25303.


Prepublication copy, uncorrected proofs.

APPENDIX E: Conducting Replicable Surveys of Scientific Communities

Collecting reliable and valid survey data requires carefully constructing a sampling frame, ensuring that each respondent in that sampling frame has a known, nonzero chance of being selected, and putting procedures in place so that not only the most motivated respondents reply: follow-ups and other incentives should also help recruit hard-to-reach respondents (for a good overview, see Brehm, 1993). The quality of data collections also depends on how questions are worded and ordered and on how item nonresponse bias might influence results (for an overview, see Dillman et al., 2009). When assessing scientists’ attitudes about replicability and reproducibility, transparency in reporting methods and adherence to state-of-the-art methods for sampling representative groups of respondents and eliciting valid responses are particularly important. Unfortunately, even small deviations from scientific protocols can produce significantly skewed results that provide little information about what one wants to measure. Attempts to measure or even accurately record the attitudes of scientists about potential concerns related to replicability and reproducibility face a particularly difficult task: there is no easily accessible, comprehensive list of scientists in general, or even of researchers in a particular field, within any given country. The rest of this appendix discusses issues of sampling frame, response biases, and question wording and order.

SAMPLING FRAME

Many of the existing attempts to survey scientists about replicability and reproducibility issues have not used a carefully defined population of scientists.
Instead, data collections have drawn on nonrepresentative, self-selected populations that are convenient to survey (e.g., scientists publishing in particular outlets or members of professional associations) or used other haphazard sampling techniques, such as snowball sampling or mass emails to listservs, that make it impossible to discern which populations were or were not reached. As a result, researchers who might try to replicate these studies would not even be able to follow the same sampling strategy and would have no measurable indicators of how close a new sample, drawn on the basis of similarly nonsystematic methods, is to the original one. Fortunately, public opinion researchers (informed by related work in social psychology, political science, sociology, communication science, and psychology) have developed very sophisticated tools for measuring attitudes in a valid and reliable fashion. Like other surveys, any survey of scientists would be based on the assumption that one cannot contact everyone in the target population, that is, all scientists or even all researchers in a particular field. Instead, a carefully conducted survey of scientists would define a sampling frame that adequately captures the population of interest, draw a probability sample from that population, and administer a questionnaire designed to produce reliable and valid responses. At the sampling stage, this work typically involves developing fairly elaborate search strings to capture the breadth and depth of a particular scientific discipline or field (e.g., Youtie et al., 2008). These search strings are used to mine academic databases, such as Scopus, Web of Science, or Google Scholar, for the population of articles published in a particular field. The next step would be to shift from the article level to the lead-author level as the unit of analysis; in that form, those datasets could serve as the sampling frame for drawing probability samples for specific

time periods, for researchers above certain citation thresholds, or for other criteria (for overviews, see Peters, 2013; Peters et al., 2008; Scheufele et al., 2007). Most importantly, sampling strategies like these can be documented transparently and comprehensively in ways that would allow other researchers to create equivalent samples for replication studies.

RESPONSE BIASES

Minimizing potential biases related to sampling, however, is not just a function of defining a systematic, transparent sampling frame, but also a function of using probability sampling techniques to select respondents. Probability sampling (often confused with simple random sampling) means that each member of the population has a known, nonzero chance of being selected into the sample; unlike in simple random sampling, those chances need not be equal. A first indication of how successful a survey is in reaching all members of a population is given by its cooperation and response rates. Reporting standards developed by the American Association for Public Opinion Research (2016) for calculating and reporting cooperation and response rates not only take into account how many surveys were returned but also provide transparency with respect to sampling frames (e.g., respondents who could not be reached because of invalid addresses), explicit declines, and simple nonresponses. Unfortunately, many surveys of scientists on replicability and reproducibility to date do not follow even minimal reporting standards with respect to response rates and therefore make it difficult for other researchers to assess potential biases. Even response rates, however, provide only limited information on systematic nonresponse. Especially for potentially controversial issues, like reproducibility and replicability, it is possible that researchers in particular fields, at certain career levels, or with more interest in the topic are more likely than others to respond to an initial survey request.
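As a rough illustration of the AAPOR-style outcome rates discussed above, the sketch below computes a minimum response rate and a cooperation rate from hypothetical final disposition counts. The disposition categories and counts are illustrative simplifications, not the official AAPOR case codes, and real calculations would need to follow the full Standard Definitions (including estimates of eligibility among unknown cases).

```python
# Hypothetical final disposition counts for a survey of scientists.
# Categories loosely follow AAPOR's Standard Definitions; the labels
# and numbers here are invented for illustration only.
dispositions = {
    "complete": 412,           # completed questionnaires (I)
    "partial": 38,             # partial questionnaires (P)
    "refusal": 120,            # explicit declines and break-offs (R)
    "non_contact": 310,        # eligible but never reached (NC)
    "unknown_eligibility": 95, # e.g., invalid addresses, no reply ever (U)
    "ineligible": 25,          # out of scope; excluded from all rates
}

def response_rate_1(d):
    """Minimum response rate (RR1-style): completes divided by all
    potentially eligible cases, counting unknown-eligibility cases
    as eligible (the most conservative assumption)."""
    eligible = d["complete"] + d["partial"] + d["refusal"] + d["non_contact"]
    return d["complete"] / (eligible + d["unknown_eligibility"])

def cooperation_rate_1(d):
    """Cooperation rate (COOP1-style): completes divided by all cases
    that were actually contacted and capable of responding."""
    contacted = d["complete"] + d["partial"] + d["refusal"]
    return d["complete"] / contacted

print(f"Response rate:    {response_rate_1(dispositions):.3f}")
print(f"Cooperation rate: {cooperation_rate_1(dispositions):.3f}")
```

For these invented counts the response rate (412/975 ≈ 0.423) is far lower than the cooperation rate (412/570 ≈ 0.723), which illustrates why reporting only one of the two, or neither, hides how much of the sampling frame was never reached at all.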
As a result, state-of-the-art surveys of scientists typically follow some variant of the Tailored Design Method (Dillman et al., 2009), with multiple mailings of paper questionnaires over time, sometimes paired with precontact letters from the investigators, small incentives, reminder postcards, online follow-ups, and other tools to maximize participation among all respondents. Following this approach, regardless of the mode of data collection, is crucially important for minimizing systematic nonresponse based on prior interest, time constraints, or other factors that might disincentivize participation in a survey. Again, many of the published surveys of scientists on replicability and reproducibility issues either rely on single-contact data collections with limited systematic follow-up or do not contain enough published information for other researchers to ascertain the degree or potential effect of systematic nonresponse.

QUESTION WORDING AND ORDER

Survey results depend heavily on how questions are asked, how they are ordered, and what kinds of response options are offered (for an overview, see Schaeffer and Presser, 2003). Unfortunately, current attempts to measure scientists’ attitudes on replicability and reproducibility are highly inconsistent in how well their questionnaires guard against potential biases related to question wording and order. This issue complicates interpreting survey results. Simply using the term “crisis” to introduce questions in a survey about the nature and state of science is likely to influence subsequent responses by activating related considerations in a respondent’s memory (Zaller and

Feldman, 1992). A powerful illustration of this phenomenon comes from public opinion surveys on affirmative action. In some surveys, 70 percent of Americans supported “affirmative action programs to help blacks, women and other minorities get better jobs and education.” In other surveys that rephrased the question and asked if “we should make every effort to improve the position of blacks and minorities, even if it means giving them preferential treatment,” almost the same proportion, 65 percent, disagreed.1 This problem can be exacerbated by social desirability effects and other demand characteristics that have the potential to significantly influence answers. It is unclear, for example, to what degree author surveys sponsored by scientific publishers about a potential crisis incentivize or disincentivize agreement with the premise that there is a crisis in the first place. Similarly, some previous questionnaires distributed to researchers asked about the existence of a potential crisis, providing three response options (not counting “don’t know”):

• There is a significant crisis of reproducibility
• There is a slight crisis of reproducibility
• There is no crisis of reproducibility

Note that two of the options implied the existence of a “crisis of reproducibility” in the first place, potentially skewing responses. All of these factors confound and limit the conclusions that can be drawn from current assessments of scientists’ attitudes about replicability and reproducibility. We hope that systematic surveys of the scientific community that follow state-of-the-art standards for conducting surveys and for reporting results and relevant protocols will help clarify some of these questions.
Using split-ballot designs and other survey-experiment hybrids would also allow social scientists to systematically test the influence that the sponsorship of surveys, question wording, and question order can have on attitudes expressed by researchers across disciplines.

1 See action/ [January 2019].
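A split-ballot design of the kind described above amounts to randomly assigning each sampled respondent to exactly one question wording, so that differences in responses can be attributed to the wording itself. The sketch below shows one simple way to do this; the two wordings and the fixed random seed are illustrative assumptions, not items from any actual questionnaire.

```python
import random

# Two illustrative wordings of the same underlying question; in a
# split-ballot design each respondent sees exactly one version.
WORDINGS = {
    "A": "How would you describe the state of reproducibility in your field?",
    "B": "Is there a crisis of reproducibility in your field?",
}

def assign_ballots(respondent_ids, seed=42):
    """Randomly assign each respondent to one wording.

    Shuffling the full list and splitting it in half (rather than
    flipping an independent coin per respondent) keeps the two groups
    equal in size, which maximizes the power of the wording comparison.
    A fixed seed makes the assignment itself replicable.
    """
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {rid: ("A" if i < half else "B") for i, rid in enumerate(ids)}

assignments = assign_ballots(range(1000))
counts = {v: sum(1 for g in assignments.values() if g == v) for v in WORDINGS}
print(counts)  # balanced groups, e.g. {'A': 500, 'B': 500}
```

After fielding, comparing the response distributions between groups A and B (e.g., with a chi-square test) estimates the wording effect directly, rather than leaving it as an unmeasured confound.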

REFERENCES

American Association for Public Opinion Research. (2016). Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. Lenexa, KS: Author.

Brehm, J. (1993). The Phantom Respondents: Opinion Surveys and Political Representation. Ann Arbor, MI: The University of Michigan Press.

Dillman, D.A., Smyth, J.D., and Christian, L.M. (2009). Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method (3rd ed.). Hoboken, NJ: Wiley.

Peters, H.P. (2013). Gap between science and media revisited: Scientists as public communicators. Proceedings of the National Academy of Sciences, 110(Supplement 3), 14102. doi:10.1073/pnas.1212745110

Peters, H.P., Brossard, D., de Cheveigne, S., Dunwoody, S., Kallfass, M., Miller, S., and Tsuchida, S. (2008). Science communication: Interactions with the mass media. Science, 321(5886), 204-205. doi:10.1126/science.1157780

Schaeffer, N.C., and Presser, S. (2003). The science of asking questions. Annual Review of Sociology, 29(1), 65-88. doi:10.1146/annurev.soc.29.110702.110112

Scheufele, D.A., Corley, E.A., Dunwoody, S., Shih, T.-j., Hillback, E., and Guston, D.H. (2007). Scientists worry about some risks more than the public. Nature Nanotechnology, 2(12), 732-734.

Youtie, J., Shapira, P., and Porter, A. (2008). Nanotechnology publications and citations by leading countries and blocs. Journal of Nanoparticle Research, 10(6), 981-986.

Zaller, J., and Feldman, S. (1992). A simple theory of survey response: Answering questions versus revealing preferences. American Journal of Political Science, 36(3), 579-616.


