Prepublication copy, uncorrected proofs.

APPENDIX E: Conducting Replicable Surveys of Scientific Communities

Collecting reliable and valid survey data requires carefully constructing a sampling frame, ensuring that each respondent from that sampling frame has an equal and known chance of being selected, and putting procedures in place to ensure that not just the most motivated respondents respond but that follow-ups and other incentives also help recruit hard-to-reach respondents (for a good overview, see Brehm, 1993). The quality of data collections also depends on how questions are worded and ordered and on how nonresponse bias might influence results (for an overview, see Dillman et al., 2009). When assessing scientists' attitudes about replicability and reproducibility, transparency in reporting methods and adherence to state-of-the-art tools for sampling representative groups of respondents and eliciting valid responses are particularly important. Unfortunately, even small deviations from scientific protocols can produce significantly skewed results that provide little information about what one wants to measure.

Attempts to measure or even accurately record the attitudes of scientists about potential concerns related to replicability and reproducibility face a particularly difficult task: for scientists in general, or even for researchers in a particular field, there is no easily accessible, comprehensive list of scientists or researchers, even within any given country. The rest of this appendix discusses issues of sampling frame, response biases, and question wording and order.

SAMPLING FRAME

Many of the existing attempts to survey scientists about replicability and reproducibility issues have not used a carefully defined population of scientists.
Instead, data collections have drawn on nonrepresentative, self-selected populations that are convenient to survey (e.g., scientists publishing in particular outlets or members of professional associations) or used other haphazard sampling techniques, such as snowball sampling or mass emails to listservs, that make it impossible to discern which populations were reached or not reached. As a result, researchers who might try to replicate these studies would not even be able to follow the same sampling strategy and would have no measurable indicators of how close a new sample, drawn on the basis of similarly nonsystematic methods, is to the original one.

Fortunately, public opinion researchers (informed by related work in social psychology, political science, sociology, communication science, and psychology) have developed very sophisticated tools for measuring attitudes in a valid and reliable fashion. Like other surveys, any survey of scientists would be based on the assumption that one cannot contact everyone in the target population, that is, all scientists or even all researchers in a particular field. Instead, a carefully conducted survey of scientists would define a sampling frame that adequately captures the population of interest, draw a probability sample from that frame, and administer a questionnaire designed to produce reliable and valid responses. At the sampling stage, this work typically involves developing fairly elaborate search strings to capture the breadth and depth of a particular scientific discipline or field (e.g., Youtie et al., 2008). These search strings are used to mine academic databases, such as Scopus, Web of Science, or Google Scholar, for the population of articles published in a particular field. The next step would be to shift from the article level to the lead-author level as the unit of analysis; in that form, those datasets could serve as the sampling frame for drawing probability samples for specific
time periods, for researchers above certain citation thresholds, or for other criteria (for overviews, see Peters, 2013; Peters et al., 2008; Scheufele et al., 2007). Most importantly, sampling strategies like these can be documented transparently and comprehensively in ways that would allow other researchers to create equivalent samples for replication studies.

RESPONSE BIASES

Minimizing potential biases related to sampling, however, is not just a function of defining a systematic, transparent sampling frame, but also a function of using probability sampling techniques to select respondents. Probability sampling (often confused with simple random sampling) means that each member of the population has a non-zero, known, and equal chance of being selected into the sample. A first indication of how successful a survey is in reaching all members of a population is given by its cooperation and response rates. Reporting standards developed by the American Association for Public Opinion Research (2016) for calculating and reporting cooperation and response rates take into account not only how many surveys were returned, but also provide transparency with respect to sampling frames (e.g., respondents who could not be reached because of invalid addresses), explicit declines, and simple nonresponses. Unfortunately, many surveys of scientists on replicability and reproducibility to date do not follow even minimal reporting standards with respect to response rates and therefore make it difficult for other researchers to assess potential biases.

Even response rates, however, provide only limited information on systematic nonresponse. Especially for potentially controversial issues, like reproducibility and replicability, it is possible that researchers in particular fields, at certain career levels, or with more interest in the topic are more likely to respond to an initial survey request than others.
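The sampling pipeline described above, mining a database for article records, collapsing them to lead authors, and drawing an equal-probability sample from the resulting frame, can be sketched in a few lines of Python. The records, field names, and thresholds below are invented for illustration; in practice the records would come from a documented query against a database such as Scopus or Web of Science:

```python
import random

# Hypothetical article records, as they might be returned by a
# database query built from discipline-specific search strings.
articles = [
    {"lead_author": "a.alvarez", "year": 2016, "citations": 42},
    {"lead_author": "b.baker",   "year": 2017, "citations": 3},
    {"lead_author": "a.alvarez", "year": 2018, "citations": 11},
    {"lead_author": "c.chen",    "year": 2015, "citations": 87},
    {"lead_author": "d.diallo",  "year": 2018, "citations": 19},
]

def build_frame(records, min_year=2015, min_citations=5):
    """Shift from the article level to the lead-author level:
    each qualifying author enters the frame once, no matter how
    many of their papers the search strings retrieved."""
    frame = set()
    for r in records:
        if r["year"] >= min_year and r["citations"] >= min_citations:
            frame.add(r["lead_author"])
    return sorted(frame)  # sorted so the frame itself is reproducible

def draw_sample(frame, n, seed=0):
    """Equal-probability sample without replacement; documenting the
    seed lets other researchers reproduce the exact draw."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

frame = build_frame(articles)   # three distinct qualifying authors
sample = draw_sample(frame, 2)
```

Because every step (query, filters, deduplication rule, seed) is explicit, another team could regenerate the same frame and an equivalent sample, which is exactly the transparency the text argues for.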
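The AAPOR (2016) standards express these ideas as standardized outcome rates built from disposition counts. As a minimal sketch, two of the simplest are Response Rate 1 (completes over all eligible and unknown-eligibility cases) and Cooperation Rate 1 (completes over all cases actually contacted); the case counts in the example are invented:

```python
def response_rate_1(completes, partials, refusals, noncontacts,
                    other, unknown):
    """AAPOR Response Rate 1: completed interviews divided by all
    eligible cases plus cases of unknown eligibility."""
    return completes / (completes + partials + refusals
                        + noncontacts + other + unknown)

def cooperation_rate_1(completes, partials, refusals, other):
    """AAPOR Cooperation Rate 1: completed interviews divided by
    all cases that were actually contacted."""
    return completes / (completes + partials + refusals + other)

# Illustrative dispositions: 500 sampled addresses, of which 240
# completed, 20 partially completed, 90 refused, 120 were never
# reached, 10 were other eligible non-interviews, and 20 were of
# unknown eligibility (e.g., invalid addresses).
rr1 = response_rate_1(240, 20, 90, 120, 10, 20)   # 240/500 = 0.48
coop1 = cooperation_rate_1(240, 20, 90, 10)       # 240/360 ≈ 0.67
```

Publishing the full disposition counts, not just a single percentage, is what lets other researchers recompute these rates and judge the risk of nonresponse bias for themselves.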
As a result, state-of-the-art surveys of scientists typically follow some variant of the Tailored Design Method (Dillman et al., 2009), with multiple mailings of paper questionnaires over time, sometimes paired with precontact letters from the investigators, small incentives, reminder postcards, online follow-ups, and other tools to maximize participation among all respondents. Following this approach, regardless of the mode of data collection, is crucially important for minimizing systematic nonresponse based on prior interest, time constraints, or other factors that might disincentivize participation in a survey. Again, many of the published surveys of scientists on replicability and reproducibility issues either rely on single-contact data collections with limited systematic follow-up or do not contain enough published information for other researchers to ascertain the degree or potential effect of systematic nonresponse.

QUESTION WORDING AND ORDER

Survey results depend heavily on how questions are asked, how they are ordered, and what kinds of response options are offered (for an overview, see Schaeffer and Presser, 2003). Unfortunately, there is significant inconsistency across current attempts to measure scientists' attitudes on replicability and reproducibility with respect to how responsive questionnaires are to potential biases related to question wording and order. This issue complicates interpreting survey results. Simply using the term "crisis" to introduce questions in a survey about the nature and state of science is likely to influence subsequent responses by activating related considerations in a respondent's memory (Zaller and
Feldman, 1992). A powerful illustration of this phenomenon comes from public opinion surveys on affirmative action. In some surveys, 70 percent of Americans supported "affirmative action programs to help blacks, women and other minorities get better jobs and education." In other surveys that rephrased the question and asked if "we should make every effort to improve the position of blacks and minorities, even if it means giving them preferential treatment," almost the same proportion, 65 percent, disagreed.1

This problem can be exacerbated by social desirability effects and other demand characteristics that have the potential to significantly influence answers. It is unclear, for example, to what degree author surveys sponsored by scientific publishers about a potential crisis incentivize or disincentivize agreement with the premise that there is a crisis in the first place. Similarly, some previous questionnaires distributed to researchers asked about the existence of a potential crisis, providing three response options (not counting "don't know"):

• There is a significant crisis of reproducibility
• There is a slight crisis of reproducibility
• There is no crisis of reproducibility

Note that two of the options implied the existence of a "crisis of reproducibility" in the first place, potentially skewing responses. All of these factors confound and limit the conclusions that can be drawn from current assessments of scientists' attitudes about replicability and reproducibility. We hope that systematic surveys of the scientific community that follow state-of-the-art standards for conducting surveys and for reporting results and relevant protocols will help clarify some of these questions.
Using split-ballot designs and other survey-experiment hybrids would also allow social scientists to systematically test the influence that the sponsorship of surveys, question wording, and question order can have on attitudes expressed by researchers across disciplines.

1 See http://www.pewresearch.org/fact-tank/2009/06/15/no-to-preferential-treatment-yes-to-affirmative-action/ [January 2019].
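The split-ballot logic can be sketched as a simple randomized assignment: each respondent is randomly routed to one of two question wordings, so that any aggregate difference in answers can be attributed to the wording rather than to who happened to respond. The respondent IDs and answer coding below are hypothetical:

```python
import random

def assign_ballots(respondent_ids, seed=0):
    """Randomly split the sample into two ballot conditions.
    A documented seed makes the assignment reproducible."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"ballot_a": ids[:half], "ballot_b": ids[half:]}

def compare_agreement(answers, groups):
    """Share of respondents agreeing with the premise under each
    wording; a large gap signals a wording effect."""
    rates = {}
    for ballot, members in groups.items():
        agree = sum(1 for m in members if answers.get(m) == "agree")
        rates[ballot] = agree / len(members)
    return rates

groups = assign_ballots(range(100))
# ballot A might ask about a "crisis of reproducibility" while
# ballot B uses neutral wording; answers would come from the survey.
```

In a real study one would also test whether the observed gap exceeds sampling error (e.g., with a chi-square test), but the randomization step above is what makes that comparison interpretable.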
REFERENCES

American Association for Public Opinion Research. (2016). Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. Lenexa, KS: Author.

Brehm, J. (1993). The Phantom Respondents: Opinion Surveys and Political Representation. Ann Arbor, MI: The University of Michigan Press.

Dillman, D.A., Smyth, J.D., and Christian, L.M. (2009). Internet, Mail, and Mixed-Mode Surveys: A Tailored Design Method (3rd ed.). Hoboken, NJ: Wiley.

Peters, H.P. (2013). Gap between science and media revisited: Scientists as public communicators. Proceedings of the National Academy of Sciences, 110(Supplement 3), 14102. doi:10.1073/pnas.1212745110

Peters, H.P., Brossard, D., de Cheveigne, S., Dunwoody, S., Kallfass, M., Miller, S., and Tsuchida, S. (2008). Science communication: Interactions with the mass media. Science, 321(5886), 204–205. doi:10.1126/science.1157780

Schaeffer, N.C., and Presser, S. (2003). The science of asking questions. Annual Review of Sociology, 29(1), 65–88. doi:10.1146/annurev.soc.29.110702.110112

Scheufele, D.A., Corley, E.A., Dunwoody, S., Shih, T.-j., Hillback, E., and Guston, D.H. (2007). Scientists worry about some risks more than the public. Nature Nanotechnology, 2(12), 732–734.

Youtie, J., Shapira, P., and Porter, A. (2008). Nanotechnology publications and citations by leading countries and blocs. Journal of Nanoparticle Research, 10(6), 981–986.

Zaller, J., and Feldman, S. (1992). A simple theory of survey response: Answering questions versus revealing preferences. American Journal of Political Science, 36(3), 579–616.