5
Methods of Assessing Science

This chapter presents a general framework for thinking about methods of assessing science retrospectively or prospectively, reviews the conceptual and empirical literatures on the selected methods, and discusses their likely relevance and feasibility for research priority–setting decisions in the Behavioral and Social Research (BSR) Program of the National Institute on Aging (NIA). The focus here is on seeking methods that can provide science managers with the best possible input to priority-setting decisions while also achieving basic goals of accountability and rational decision making. Quantitative methods are attractive in terms of accountability, in the accountant’s sense of comparing different investments in research on a common numerical scale. They are also conducive to improved outcomes to the extent that the measures are valid indicators of what they purport to measure.

Most of this chapter is devoted to examining the strengths and weaknesses of various methods of assessing science. Science assessments have become commonplace and include assessments of fundamental science (National Science and Technology Council, 1996), of technology development programs (Link, 1996; Ruegg and Feller, 2003), and of the performance of specific academic and research laboratories, in both the United States and other countries (e.g., Bozeman and Melkers, 1993; Moed et al., 2004). The assessments include commission reports, agency-specific commissioned evaluations, and academic works. As Table 5-1 shows, various assessment methodologies have arisen from different disciplinary and multidisciplinary perspectives, measuring different aspects of scientific activity, and addressing various science and technology policy and assessment questions.




TABLE 5-1 Some Methodologies for Science Assessment and the Attributes of Scientific Activity They Measure

  Methods                                                                 Attributes Measured
  Coauthorship links, multinational research articles                     Scientific collaboration, globalization
  Patent citation analysis                                                Economic value of patents
  Cross-disciplinary coauthorships and citations                          Multidisciplinarity and interdisciplinarity of research
  Citations from clinical guidelines, regulations, and newspapers         Practical use of research
  Scientist-inventor relationships, citations from articles to patents    Knowledge flows from science to technology
  Co-occurring word and citation analysis                                 Sociocognitive structures in science
  Use of first names of authors or inventors                              Participation of women in science

  SOURCE: Moed et al. (2004).

These assessment efforts have generated several broadly accepted "best practice" principles. For example, the National Science and Technology Council (1996:xii) set forth the following nine principles for assessment of fundamental science programs:

1. Begin with a clearly defined statement of program goals.
2. Develop criteria intended to sustain and advance the excellence and responsiveness of the research system.
3. Establish performance indicators that are useful to managers and encourage risk-taking.
4. Avoid assessments that would be inordinately burdensome or costly or that would create incentives that are counterproductive.
5. Incorporate merit review and peer evaluation of program performance.
6. Use multiple sources and types of evidence, for example, a mix of quantitative and qualitative indicators and narrative text.
7. Experiment in order to develop an effective set of assessment tools.
8. Produce assessment reports that will inform future policy development and subsequent refinement of program plans.
9. Communicate results to the public and elected representatives.

These principles are generally sensible, but they leave some important questions unaddressed. One of these is how to establish useful performance indicators and incorporate peer review and evaluation at the same time.

In this chapter, we adopt a conceptual framework for thinking about assessment methods that we think will allow such questions to be addressed more systematically. Our recommendations are in Chapter 6.

A FRAMEWORK: ANALYSIS AND DELIBERATION AS ASSESSMENT STRATEGIES

We find it useful to consider the issues of research assessment, both prospective and retrospective, in light of a distinction made in a previous National Research Council (NRC) study. In Understanding Risk: Informing Decisions in a Democratic Society (1996), an NRC committee distinguished between two methods for seeking practical understanding that it called analysis and deliberation. Analysis "uses rigorous, replicable methods developed by experts to arrive at answers to factual questions"; deliberation "uses processes such as discussion, reflection, and persuasion to communicate, raise and collectively consider issues, increase understanding, and arrive at substantive decisions" (p. 20). In stylized terms, counting patents or citations to studies, constructing network diagrams of communication patterns, and enumerating publications in designated major journals are analytic methods, whereas peer review conducted through discussions in advisory panels is a deliberative method.

Understanding Risk noted that science policy decisions typically employ both analysis and deliberation and argued that it is appropriate for them to do so. Among the reasons identified for using deliberation are that the most useful type of analysis often is not self-evident and is best determined through dialogue involving both the potential producers and the users of the analysis, and that judgment is inevitably involved in finding the meaning of analytic findings and uncertainties for specific decisions, particularly when the decisions must be made against multiple objectives. The report defined the challenge for public policy as one of finding procedures (called analytic-deliberative processes in the report) that appropriately integrate the two methods. In an effective analytic-deliberative decision process, those involved in making a decision determine the kinds of analysis they need, see that the analysis is conducted as needed, and deliberate on the choices they face, informed by the analysis and discussion of its strengths and limitations.

A central point of Understanding Risk was that even in such enterprises as environmental risk assessment, which are commonly seen as relying almost completely on analysis, the need for deliberation is critical. Expenditures on analysis can have little practical value if the analysis is not directed to the most important questions for decision makers. Deliberation is needed to ensure that government procures the right science for the purpose.

Deliberation is also critical because a single set of scientific findings can have various implications for policy, depending on judgments about matters that analysis alone cannot resolve, such as how much weight to give to the different outcomes of a policy choice that has multiple consequences and how to act in the face of gaps and uncertainties in available knowledge. Deliberation is needed to give due consideration to the possible meanings of what is and is not known. So even in very analysis-heavy areas of policy, the value of analysis ultimately depends on the quality of the deliberation that shapes and interprets the analysis.

Research policy presents a different situation from environmental and health policy with respect to the roles of analysis and deliberation. The value of deliberation is well established for making decisions about scientific research portfolios, as reflected in the careful efforts that research agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH) make to devise and reevaluate their peer review and advisory processes. However, the value of analysis, especially that grounded in the use of quantitative measures, remains in dispute. Debate also continues about whether use of analytical methods has contributed to improved science policy decision making or has been dysfunctional (Perrin, 1998; Radin, 2000; Feller, 2002; Weingart, 2005). As already noted, the prevailing view in the scientific community emphasizes expert peer review as the most effective available method.

Reframing the debate along the lines suggested by Understanding Risk, that is, in terms of the appropriate roles of analysis and deliberation, may help to find optimal ways to use both sources of information. We begin by noting that all policy decisions in a democracy are ultimately deliberative. The issue is not whether to replace deliberation with analysis in making decisions, because decisions will continue to be deliberative. The issues are whether there are useful roles for analysis in a deliberative decision process and, if so, how the use and interpretation of analysis should be organized (and by whom) in research policy making. Thus, it is useful to focus attention on a set of empirical questions such as these:

- Can deliberations about the past progress of scientific fields and the best way to shape research portfolios be better informed by the use of appropriate analytic methods?
- If so, which analytical tools hold promise for better informing judgments about behavioral and social research on aging?
- What institutional structures and procedures are effective for selecting, shaping, and interpreting analysis to inform research policy choices?
- How do different structures and procedures for analytic deliberation affect the distribution of decision-making influence and authority among researchers, research managers, and representatives of society?

In keeping with the tenor of mainstream conclusions of the academic research community that is the primary performer of BSR-funded research, we find it convenient to start with the judgment of the Committee on Science, Engineering, and Public Policy (1999a) report on evaluating research programs, that expert judgment is the best means of evaluating research. We further accept the widespread assessment found in the bibliometric literature that there is no single approach that will always work best and therefore that it makes sense to develop a toolbox of methods, both analytic and deliberative, for informing judgment (e.g., Grupp and Mogee, 2004). Different analytic tools might have value for different assessment purposes. They might be useful for measuring research results, organizing information brought to bear by applying other analytic tools (e.g., to arrive at numerical weights for different kinds of information), or helping to structure deliberative processes. Thus, it is appropriate to ask both about the validity of particular measures or indicators for particular purposes and about how such measures might add value to a deliberative, judgment-based process.

For convenience, we divide the following discussion into methods that are primarily analytical, those that are primarily deliberative, and those that combine both strategies. On the basis of an initial review of a larger set of decision-making techniques, we have selected three analytical approaches as most applicable to the needs for prospective and retrospective assessment as defined by BSR: bibliometric analysis of the results of research and the connections among research efforts, reputational studies (such as can be obtained by surveying the members of research communities), and decision analysis.1 We also discuss peer evaluation procedures, usually a purely deliberative method. Finally, we turn to analytic-deliberative approaches. A familiar one in the context of the NIH is the Consensus Development Conference, which combines analysis and deliberation but has not been adapted for making research policy decisions. We also discuss one ongoing effort in the NRC to employ an analytic-deliberative approach to a problem of comparing research in different fields, in this case, energy research.

ANALYTICAL METHODS

As noted in Chapter 4, comparative analysis of scientific progress across fields presents major challenges. The uneven pace and seemingly unpredictable paths of scientific progress and of its application to practical problems make it hard to get unambiguous meaning from even the most systematic analysis of past events in a field. Comparisons across fields are even more difficult because the paths toward progress and the barriers to it may vary systematically from one field to another. These are among the reasons that scientists and science managers have at times resisted the use of analytical techniques, especially quantitative ones, for assessing science. In addition, there is the possibility that quantitative methods may be applied in automatic ways that exclude the judgment of the people who know the science best. In the discussion that follows, we presume that the value of analyses is not to replace judgment, but to inform it. We consider the potential roles of analytic techniques in that light.

Bibliometric Analysis

The term scientometrics broadly relates to the generation, analysis, and interpretation of quantitative measures of science and technology. As described by van Raan (1988b:1), the field is based on the use of "mathematical, statistical, and data-analytical methods and techniques for gathering, handling, interpreting, and predicting a variety of features of the science and technology enterprise, such as performance, development, dynamics." Bibliometrics, the quantitative study of patterns of published scientific output and their use (e.g., citations), is the subset of scientometrics that is our primary focus of attention.2

Bibliometric and other scientometric methods were developed originally for exploring the workings of the scientific enterprise, that is, as descriptive and analytical tools, not as evaluative or predictive ones (Thackray, 1978; Godin, 2002). Their descriptive accuracy was originally validated against expert opinion. Scientometric researchers believe that a better quantitative understanding of scientific processes is needed in order to build and validate theories in the sociology of knowledge (e.g., van Raan, 2004). The distinction between the descriptive uses of bibliometrics to understand the working of science and the evaluative uses to assess performance (van Leeuwen, 2004) is important because the strengths and weaknesses of any quantitative approach, and its value to its users, depend on the questions being posed and the use to which the technique is put.

Measurement of publications and citations can be used to describe the activities of a nation, an institution, a research group, or an individual; the dynamics of fields of science that can be specified in bibliometric terms (e.g., by their leading journals or by keywords that can be found in the titles or abstracts of publications); and the relationships between and among specified fields. It can be used to build and test theories of the content and structure of science (e.g., Price, 1963), to demonstrate the contribution of publicly funded science to technological innovation (e.g., Narin et al., 1997), to highlight "hot" areas of science or hot researchers, or to track the import and export of ideas among fields.3 When bibliometric measures are treated as outputs, they can be combined with input measures, such as expenditures or personnel complements, to compare the past performance of research institutes, departments, and the like, or of fields, subfields, and disciplines. In this use, they have the advantage of making different things comparable on the same scale.

One potentially valuable contribution of bibliometrics to the assessment of scientific fields is that it makes possible the assessment of the import and export of ideas between fields by following cross-citation patterns (van Leeuwen and Tijssen, 2000). By identifying the authors of articles published in or cited by a diverse set of journals, it is possible in principle to identify patterns of scientific collaboration across fields. It also is possible, by examining the scholarly profiles of the collaborators or their institutions or both, to assess whether particular established or newly emerging research fields are attracting the best and brightest of a nation's current and future scientists (Glanzel and Schubert, 2004; Morillo et al., 2001). Bibliometric data might also be useful for discerning and offsetting observed tendencies of proposal review panels to discriminate against "crossdisciplinary proposals that lack an established peer group" (Porter and Rossini, 1985:38).4
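As a concrete illustration of how such import-export patterns might be counted, the sketch below tallies citation flows between fields from a small, invented set of paper records. It is a minimal sketch, not a description of any particular bibliometric system: the field labels, the toy records, and the helper function are assumptions made for illustration, and a real study would work from a large citation database and a validated assignment of publications to fields.

    # Minimal sketch: counting citation flows between fields from a toy dataset.
    # Assumes each paper record carries a field label and a list of cited paper IDs;
    # real bibliometric work would derive these from a citation database and a
    # journal-to-field classification.
    from collections import defaultdict

    papers = {
        # paper_id: (field, [ids of papers it cites])
        "p1": ("cognitive aging", ["p3", "p4"]),
        "p2": ("cognitive aging", ["p4"]),
        "p3": ("economics of aging", ["p4"]),
        "p4": ("demography", []),
    }

    def cross_citation_matrix(papers):
        """Return counts of citations from papers in one field to papers in another."""
        flows = defaultdict(int)
        for citing_id, (citing_field, refs) in papers.items():
            for cited_id in refs:
                if cited_id in papers:
                    cited_field = papers[cited_id][0]
                    flows[(citing_field, cited_field)] += 1
        return dict(flows)

    if __name__ == "__main__":
        for (src, dst), n in sorted(cross_citation_matrix(papers).items()):
            print(f"{src} -> {dst}: {n}")

Asymmetries in such a matrix, with one field citing another far more often than it is cited in return, are the kind of pattern discussed in note 4 at the end of this chapter.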

However, bibliometric measures have shortcomings as a guide to evaluative use and research decision making by mission agencies. Bibliometrics emphasizes publications in peer-reviewed journals. It does not account for practical applications that may be of value to research sponsors, research performers, and society. It provides no place for nonacademics to apply their values in gauging the societal importance of research findings. As usually implemented, it advantages journal authors over book authors or others whose works are not in major databases (Lamont and Mallard, 2005), and it favors quantitative work (which is more likely to appear in journals than books) and authors who speak to narrower and academic audiences (Clemens et al., 1995, a study of sociology, cited in Lamont and Mallard, 2005). It favors types of research that suit high-impact journals over other types of research, such as clinical and application-based research (Kaiser, 2006). And it may overvalue scientific outputs that are frequently cited because they are controversial or wrong. Many of these shortcomings can be alleviated to a degree by careful research design, but however well this is done, the evaluative meaning of bibliometric comparisons requires interpretation, as we discuss below.

To move beyond a general review of the strengths and weaknesses of bibliometric techniques as a means of setting research priorities, we were briefed by Anthony F.J. van Raan, a leading developer and analyst of bibliometric techniques, on what one might learn from those techniques; we then commissioned a pilot study designed to determine whether it was possible to map the direction of behavioral and social science research in aging using bibliometric indicators. Committee members, whose expertise extends across (and beyond) the behavioral and social science domain of BSR's program, specified keywords intended to define certain areas of research on aging of programmatic concern to BSR. For each area, committee members supplied an initial list of core journals in which research containing these words was likely to be published. Ed Noyons, a distinguished bibliometrician and specialist in bibliometric mapping from the University of Leiden, the Netherlands, was commissioned to conduct a fuller search of bibliometric citations based on these keywords and to develop bibliometric maps of relationships between and among research clusters, journals, and authors.

The pilot study quickly revealed that the basic outputs of the exercise, such as the size of the corpus of work in a field and the boundaries of the field (e.g., which key articles are and are not included), were quite sensitive to the choice of keywords. The pilot study strongly suggested that if bibliometric indicators are to be used for research assessment, considerable reliance must be placed on the subject-matter experts to guide and review the work of the specialists in bibliometrics who will perform the actual studies. Several iterations of generation and analysis of data will probably be needed before the assigned experts are satisfied with the output. The reliability of this method, that is, the extent to which different experts' lists of keywords would yield similar results, is unknown. Thus, the meaning of analyses that are sensitive to expert judgment on the input end is likely to be open to different interpretations by experts who have different views of the research area in question. These concerns are likely to be most serious when bibliometric analysis is used to assess the dynamics of emerging research fields that lack established publication outlets or generally shared terminology.
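The keyword sensitivity that the pilot study revealed can be illustrated with a minimal sketch: two plausible keyword lists intended to define the same broad area can retrieve corpora that overlap only partially. The titles, keyword sets, and overlap measure below are invented for illustration and are not drawn from the pilot study itself.

    # Minimal sketch: how the retrieved corpus for a research area shifts with the
    # keyword list. Titles and keyword sets are invented for illustration.
    titles = [
        "Retirement timing and cognitive decline",
        "Social networks and health in later life",
        "Pension reform and labor supply of older workers",
        "Memory training interventions in older adults",
    ]

    keywords_a = {"retirement", "pension", "older workers"}
    keywords_b = {"cognitive", "memory", "older adults"}

    def retrieve(titles, keywords):
        """Return the set of titles containing any of the keywords (case-insensitive)."""
        return {t for t in titles if any(k in t.lower() for k in keywords)}

    corpus_a = retrieve(titles, keywords_a)
    corpus_b = retrieve(titles, keywords_b)

    overlap = len(corpus_a & corpus_b) / len(corpus_a | corpus_b)  # Jaccard overlap
    print(f"Corpus A: {len(corpus_a)} papers, Corpus B: {len(corpus_b)} papers, "
          f"overlap = {overlap:.2f}")

Printing the corpus sizes and the overlap for the two keyword sets makes the dependence on the input list visible, which is why iteration between bibliometric specialists and subject-matter experts matters.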

Reputational Studies

Surveys and interviews have often been used to solicit the views of representative samples of scientific communities about issues on which judgments are to be made. An example is the periodic surveys the NRC has organized to assess research doctorate programs in American universities (e.g., National Research Council, 1995b, 2003). The reputational approach has the advantages that, unlike informal peer-review discussions that draw on reviewers' understandings of the reputations of researchers and research fields, it is systematic, it can be used continually over time, and its methods can be made transparent. The approach also has significant validity problems for its usual purpose, which is to compare entities that are presumed to be of the same type (such as university departments of psychology or economics). The problems include biases that may be introduced by relying on reputation (e.g., sensitivity to name recognition effects driven by the size of the research unit or the presence of a single well-known individual) and the difficulties of comparability among entities that may have the same names but are quite different in composition or objectives. In addition, the nature of the entities being compared can change over time, as, for example, when taxonomies of fields become outmoded (see National Research Council, 2003, for further discussion).

Reputational approaches have additional limitations for the task of concern in this study, making comparisons across different scientific fields or subfields. Most fundamental is that there is no single scientific community that can be surveyed to get meaningful information. Very few individuals, if any, are equally well informed about each of the fields to be compared, so that sampling techniques create a difficult, perhaps insoluble, dilemma. It is possible to create an acceptable representative sample of researchers across the broad area in which comparisons are to be made (behavioral and social research on aging), but such a sample will include many respondents who are well informed about their own parts of this broad field but not about other parts. Alternatively, it is possible to create acceptable representative samples for each of the narrower fields to be compared, but this procedure will reproduce the problem that led to this study in the first place: the possibility that different standards of quality are being used in different subfields, making community judgments noncomparable across fields. We have been unable to identify a way out of this dilemma.

We do not see value in reputational studies for making comparisons of different fields without a prior demonstration that there is a valid method of eliciting comparable judgments. Value may be gained by systematically eliciting judgments of research progress from samples of narrow research communities, especially if there may be differences in judgments within the field (e.g., between younger scholars and the ones most likely to be placed on deliberative peer review groups). However, surveys should not replace judgment, and research managers need to judge whether the potential knowledge to be gained from adding a survey to judgment is worth the incremental cost of survey research. Our judgment is that it will be worthwhile only in special cases.

Decision Analysis

The above analytical methods all inform judgment by providing decision-relevant information that decision participants would not otherwise have. Decision analysis, by contrast, provides a set of techniques that can be used to organize and structure deliberation. Decision-analytic techniques have not been given much attention in science policy, and, when proposed as decision aids, they have often met stiff resistance from scientists (Fischhoff, 2000; Arkes, 2003). We see these techniques as worthy of renewed attention because they have proved useful for assisting choices in other practical contexts in which (a) decisions are complex, (b) decisions have consequences for multiple important outcomes, (c) considerable uncertainty exists about how each choice will affect the outcomes, and (d) opinions diverge about the relative value of the outcomes.

For example, these techniques have been used to help design safety features in complex technologies, to assess the environmental and public health risks of chemicals, and to inform decisions about the siting of hazardous waste facilities. Decision-analytic techniques help clarify and allow for separate consideration of the key elements of a decision, particularly the relationships between actions and their various consequences, the valuation of these consequences, and the relationships among the decision elements (e.g., Edwards, 1954; Behn and Vaupel, 1982; Howard and Matheson, 1989; Pinkau and Renn, 1998; van Asselt, 2000; Jaeger et al., 2001; North and Renn, 2005).

Decision analysis offers both quantitative and qualitative methods. Quantitative techniques include benefit-cost analysis, multiattribute utility analysis (von Winterfeldt and Edwards, 1986), value-tree analysis (e.g., Keeney and Raiffa, 1976), value-of-information analysis (Raiffa, 1968), quantitative characterization of uncertainties (Morgan and Henrion, 1990), and prediction markets (Berg and Reitz, 2003). The usefulness of these techniques depends on the availability of quantitative estimates of the effects of policy choices or new scientific information on highly valued outcomes that are reasonably accurate or have estimable uncertainties. It also depends on developing some justifiable method for aggregating different kinds of outcomes. Because of the shortcomings of fundamental understanding of how research activities lead to scientific or technological progress (see Chapter 4), the continuing uncertainty or loose coupling of such progress when it occurs to the desired societal objectives, and the difficulties associated with aggregating different kinds of outcomes, these basic requirements are not currently met for research policy on behavioral and social science and aging. Thus, we do not recommend the use of quantitative techniques of decision analysis to inform decisions about setting priorities for basic behavioral and social science research on aging.

Qualitative techniques of decision analysis, by which we mean techniques for structuring or organizing decision problems without attempting to quantify the effects of decisions, are more modest in their objectives than the quantitative approaches, but they seem to have greater potential for assisting with priority setting in research policy. Decision analysis, used to structure choices, can make decision processes more transparent, thus contributing to accountability, by creating frameworks for examining issues, focusing deliberation on explicit evaluative criteria, and helping diverse groups understand the bases of their divergent judgments (North and Renn, 2005). It is likely that the best ways to employ decision science approaches for structuring research policy choices will have to be developed over time and adapted to meet particular needs (Fischhoff, 2000). Here we note two approaches that may provide useful starting points for such development. Both involve developing simple conceptual models of how research might contribute to a set of science policy objectives.

One approach to doing this involves influence diagrams (Clemen, 1991). These are directed graphs in which each node represents a variable, arrows point from predictor variables to predicted variables, and the practical outcome variables that motivate research funding are prominently included (see Box 5-1). Another approach specifies the objectives of the choice at hand and further specifies elements or contributing factors to each objective as a way to structure consideration of the available options. This approach was used in a recent NRC study (2005a) that recommended five priority areas for social and behavioral science research to improve environmental decision making.

BOX 5-1
Influence Diagrams of the Impacts of Scientific Research

Fischhoff (2000) suggests that the process of developing influence diagrams of the pathways from research to its scientific results and societal benefits can clarify the place of various research activities in the larger enterprise and promote more focused discussion of priorities, even if credible numbers cannot be calculated to estimate the strengths of the relationships that the arrows represent. Such discussion could systematically address such questions as whether anyone in the scientific community is receiving research support to understand each element in the influence diagram and "whether the research investments are commensurate with the opportunities" (Fischhoff, 2000:82).

As an example, Fischhoff presents an influence diagram in which the variable of central interest is the public health risks of Cryptosporidium. The diagram shows the roles of events in the biophysical environment (e.g., contamination of drinking water resulting from a flood), responses of individuals and organizations to the events, engineering practices (e.g., routine testing of the water), mass media coverage, and other factors. In such a diagram, various kinds of scientists can locate the points at which their research is relevant to reducing the risks.

This diagram emphasizes a practical, health-related outcome that research might help improve. Similar conceptual models might be developed for NIA's practical goals for research, such as to "improve health and quality of life of older people" and to "reduce health disparities among older persons and populations" (National Institute on Aging, 2001); for considering other important NIA goals for research, such as to "understand healthy aging processes"; or for comparing research programs that contribute differentially to different research goals.
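To make the structure of such a diagram concrete, the sketch below encodes a small influence diagram as a directed graph and lists every variable upstream of a chosen outcome, the points at which different research programs might attach. The node names are loosely paraphrased from the public health example in the box and are assumptions for illustration only; they do not reproduce Fischhoff's actual diagram.

    # Minimal sketch: an influence diagram as a directed graph, with a helper that
    # finds every upstream variable influencing a chosen outcome. Node names are
    # illustrative, loosely inspired by the public-health example in the text.
    influences = {
        # predictor -> variables it feeds into
        "flood event": ["drinking water contamination"],
        "routine water testing": ["contamination detected"],
        "drinking water contamination": ["contamination detected", "public health risk"],
        "contamination detected": ["utility response", "media coverage"],
        "utility response": ["public health risk"],
        "media coverage": ["individual protective behavior"],
        "individual protective behavior": ["public health risk"],
    }

    def upstream(graph, target):
        """Return all nodes with a directed path into `target` (depth-first search)."""
        reverse = {}
        for src, dsts in graph.items():
            for dst in dsts:
                reverse.setdefault(dst, []).append(src)
        seen, stack = set(), list(reverse.get(target, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(reverse.get(node, []))
        return seen

    print(sorted(upstream(influences, "public health risk")))

In a deliberation, each research activity could be asked which of these upstream nodes its results would inform, and whether any node is receiving no research support at all.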

industry and public interest groups. Thus, the committee collected the available analytic information and organized it around the cells of the matrix, but it ultimately relied on deliberative processes to reach its conclusions (National Research Council, 2001a). The Committee on Prospective Benefits of DOE's Energy Efficiency and Fossil Energy R&D Programs prepared a prospective assessment (National Research Council, 2005c) using a modified analytic matrix that retained the distinction among the three objectives of the R&D programs. It considered the probability of the program achieving its goals of producing new technologies and the conditional probability of market acceptance of those technologies, evaluating these outcomes in relation to three scenarios of possible energy futures.

It would be possible to develop an analogous approach for assessing behavioral and social research on aging. BSR could develop a simple but flexible evaluation methodology that is transparent and that could be applied consistently across fields. NIA strategic planning documents could specify the key objectives of BSR research. Retrospective analyses could consider a matrix of results that assessed realized benefits (e.g., to health and well-being), options benefits (e.g., development of techniques and procedures in health care), and knowledge benefits from each field in relation to the NIA research objectives. Knowledge benefits include not only knowledge that is applicable to technology or health care, but also improved basic understanding of processes of aging even if that knowledge has no foreseeable application. Prospective analyses would involve judgments of the likelihood that research investments would yield knowledge, options, and realized benefits of the types desired by BSR and NIA.
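A minimal sketch of how such a matrix might be represented appears below. The field names, objectives, and entries are invented placeholders, not actual BSR assessments; the point is only that realized, options, and knowledge benefits could be recorded for each field-by-objective cell in a form that can be reviewed consistently across fields.

    # Minimal sketch: a retrospective benefits matrix keyed by (field, objective),
    # with realized, options, and knowledge benefits recorded for each cell.
    # All field names, objectives, and entries are illustrative placeholders.
    benefits_matrix = {
        ("cognitive aging", "improve health and quality of life of older people"): {
            "realized": "evidence informing cognitive training programs",
            "options": "candidate screening measures for early decline",
            "knowledge": "better models of normal age-related memory change",
        },
        ("economics of aging", "reduce health disparities among older persons"): {
            "realized": "findings used in retirement policy analyses",
            "options": "methods for linking survey and administrative data",
            "knowledge": "improved understanding of wealth-health gradients",
        },
    }

    def report(matrix):
        """Print the matrix one cell at a time, grouped by field and objective."""
        for (field, objective), cell in matrix.items():
            print(f"\n{field} / {objective}")
            for benefit_type in ("realized", "options", "knowledge"):
                print(f"  {benefit_type:9s}: {cell[benefit_type]}")

    report(benefits_matrix)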

Foresight Techniques

Foresight as a technique for aiding science policy decisions has been defined as "the process involved in systematically attempting to look into the longer-term future of science, technology, the economy, environment and society with the aim of identifying the areas of strategic research and the emerging generic technologies likely to yield the greatest economic and social benefits" (Martin, 1996, p. 158). The approach is predicated on the beliefs that there are many possible futures, and that "the choices made today can shape or even create the future" (p. 159). Foresight approaches emphasize consultative processes among relevant stakeholders, with extensive provision for feedback among participants. A variety of techniques are employed to elicit projections of future trends and opportunities. These include the creation of scenarios, trend analysis, Delphi techniques, and technology roadmapping, among others. Foresight differs from the use made of advisory groups by federal agencies in the United States to project or recommend future trends and opportunities in science in at least the following ways: it systematically engages a more diverse set of stakeholders in a single exercise; it employs specific techniques to structure future possibilities; and it incorporates iterative processes in which participants may modify their projections in light of information garnered about projections of other participants.

Foresight is a well-established approach for assessing prospective developments in science in several European countries, Canada, Australia, and Japan (Martin, 1996). For example, a recent review of the United Kingdom's Foresight Programme, launched in 2002, concludes that "the Programme has achieved its objectives of identifying ways in which future science and technology could address future challenges for society and identifying potential opportunities. It has succeeded in being regarded as a neutral interdisciplinary space in which forward thinking on science-based issues can take place" (Policy Research in Engineering, Science and Technology, 2006:3). Selected consideration and use of the technique is evident among U.S. science agencies (National Academy of Public Administration, 1999). However, it has been used less frequently than standing or specially constituted advisory panels that do not employ structured Foresight techniques. Selected advisory committees and external study commissions across agencies may have considered or used variations of Foresight. Several reasons may be adduced for the comparatively limited formal use of Foresight techniques in assessing and projecting future scientific trends in the United States. One is its association with the political imbroglios that led to the demise of the Office of Technology Assessment. Although we have not attempted to evaluate past experiences or current usage of Foresight methods, we note their relevance for possible adaptation to the needs of BSR for improved methods for informing science policy decisions.
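As one illustration of the iterative, structured elicitation these techniques involve, the sketch below simulates a bare-bones Delphi-style exercise in which panelists see the group median each round and revise part of the way toward it. The initial forecasts and the fixed revision rule are assumptions for illustration; real Delphi exercises rely on written rationales and facilitated feedback rather than a mechanical formula.

    # Minimal sketch of a Delphi-style iteration: each round, panelists see the
    # group median and move part of the way toward it. Values and the revision
    # weight are invented for illustration only.
    from statistics import median

    def delphi_rounds(estimates, rounds=3, pull=0.5):
        """Run simple feedback rounds; return the history of estimates per round."""
        history = [list(estimates)]
        for _ in range(rounds):
            m = median(history[-1])
            history.append([e + pull * (m - e) for e in history[-1]])
        return history

    # e.g., panelists' forecasts of when a hypothetical capability becomes routine (years)
    history = delphi_rounds([5.0, 8.0, 10.0, 20.0])
    for i, round_estimates in enumerate(history):
        print(f"round {i}: " + ", ".join(f"{e:.1f}" for e in round_estimates))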

NIH Consensus Development Conference Model

The Consensus Development Conference is a familiar analytic-deliberative process in NIH. This model, which has been used more than 120 times since 1977, follows a carefully thought out rationale and set of procedures (for a detailed description, see http://consensus.nih.gov/ABOUTCDP.htm). It is used to produce State of the Science Statements, which summarize available knowledge on controversial issues in medicine of importance to health care providers, patients, and the public. It is also used to produce Consensus Statements, which address issues of medical safety and efficacy, may go into economic, social, legal, and ethical issues, and may include recommendations.

Consensus development conferences are deliberative in that the appointed panels discuss the implications of available scientific information for medical practice and related issues and seek a consensus that reflects a collective judgment. They are different from the usual scientific peer review panels in that the membership is not restricted to scientists. They are analytic-deliberative because they rely not solely on judgment, but on systematic efforts to review the scientific literature and gather information from experts on the medical technology or treatment in question (analyses), and because the experts respond to inquiries from the panel and engage in discussion with it, thus closing the circle between analysis and deliberation in ways that can potentially change both processes.

The consensus development process includes various safeguards of the independence and credibility of the panels, whose members are screened for bias and conflict of interest and deliberate in executive session to protect their independence from outside influence. The consensus statements are widely disseminated by NIH, but they are not government documents. They are statements by the panel, and their credibility flows from the reputations of the panel members and the procedures for ensuring that the panel is balanced, well informed, and independent.

Consensus panels are notable for their breadth of participation. They are chaired by "a knowledgeable and prestigious person in the field of medical science under consideration" who is not "identified with an advocacy position on the conference topic." They include research investigators in the field, health professionals who use the technology in question, methodologists, and "public representatives, such as ethicists, lawyers, theologians, economists, public interest group or voluntary health association representatives, consumers, and patients." Members are selected for their ability to weigh evidence and to do collaborative work, as well as for their absence of identification with advocacy positions or financial interest related to the conference topic.

The consensus development model has been used in NIH for providing advice on a variety of policy-related topics, but not in the area of research policy. In principle, though, elements of this model could be included in an analytic-deliberative process for advising on research policy in BSR or more broadly in NIA. To do this, several issues would need to be confronted:

- Who would be represented? For example, how broad should the participation be beyond the research community? In particular, what roles should various beneficiaries of research, from health care professionals to patients, have in advising on NIA research priorities?

- How would analysis be organized to support deliberation? Given the limitations of all the analytical approaches available in research policy, attention would have to be given to ensuring that the results of bibliometric or other methods of analysis are presented as data to be interpreted judiciously, much as data from medical research are. The process would have to take into account the fact that the evidence on science policy choices is usually of lower quality than the evidence on medical treatments.

- How would results from a research policy consensus conference feed into institute decisions? This raises the same issues of research managers' levels of discretion and of the balance of influence and power between research managers and others that arises with ordinary scientific peer review. With nonresearchers at the table in a consensus conference setting, these issues take on a different tone.

- What are the advantages and disadvantages of this model compared with current, more purely deliberative, peer review and advisory processes?

As these examples and experience in other areas of public policy decision making suggest, processes that incorporate relevant analytic techniques and information into deliberations in groups that represent the range of scientific knowledge and policy perspectives needed for wise decisions can result in recommendations and decisions with several desirable properties. The recommendations and decisions can be well informed about the available evidence, systematic in consideration of the evidence from all relevant policy perspectives, accountable, and even consensual among groups representing diverse perspectives. Because well-organized analytic-deliberative processes can entrain the full range of knowledge sources and perspectives on its interpretation, they are well suited to producing these desirable results, but they do not always produce them. Although research in some fields of public policy is beginning to identify the conditions and practices that are conducive to achieving these results (e.g., National Research Council, 1996, 1999; Renn et al., 1996), similar bodies of research have not yet been developed for the use of analytic-deliberative processes in science assessment. At present, it is worthwhile to seek to adapt practices from other fields, such as those described above, while also working to improve systematic knowledge about which processes of science assessment best meet the needs of organizations like BSR.

CONCLUSIONS AND RESEARCH NEEDS

Conclusions

Assessing the progress and potential of scientific fields is a complex problem of multiattribute decision making under uncertainty. Scientific research activities have multiple objectives, including those of advancing pure science, building scientific capacity, and providing various kinds of societal benefits. Every research policy choice and every research activity will have its own profile with regard to effects on different objectives, and there is no agreed weighting among the objectives. Consequently, judgment is required to assess the evidence regarding how science is progressing toward each objective, as well as to consider the weight to be given to progress toward each objective.

None of the available analytical methods of science assessment is sufficiently valid to justify its use for assessing scientific fields or setting priorities among them. Judgment must be applied to interpret the results from these methods and discern their implications for policy choices. This situation seems unlikely to change any time soon. Therefore, the most appropriate use of quantitative methods is as inputs to analytic-deliberative processes of decision making.

Analytic methods have the advantage in principle of making it possible to account for the progress of different fields in the same units, thus supporting priority-setting decisions in an accountable way. Each of them, however, has significant practical limitations. For example, bibliometric studies provide measures of scientific activity and of the extent to which disciplines and fields influence one another. They also have well-known limitations: they emphasize publications in the periodical literature over other scientific activities, and information about publications and citations must be interpreted in terms of the importance, correctness, and mission relevance of those activities. In addition, citation measures have been criticized as being susceptible to gaming, and reputational studies have the same limitation. Surveys of scientists to elicit their judgments are unlikely to be useful for comparing different research fields because few scientists are knowledgeable across fields, and no method is available for ensuring the comparability of judgments across the potential respondents. Quantitative methods from decision analysis are not suitable for informing science policy decisions by BSR because there is insufficient basic understanding of the paths from research activities to scientific or technological progress.

Choices within NIA that involve comparisons among fields of behavioral and social science research can be better informed, more systematic, more accountable, and more strongly defensible if they are informed by appropriate systematic analyses of what these fields have produced and are likely to produce. We consider it possible to constitute expert review panels that draw on their own experiences and insights, augmented by quantitative data on the outputs, outcomes, impacts, productivity, or quality of research, to arrive at better informed and more systematically considered expert judgments about the progress and prospects of scientific fields than they could reach without quantitative data. We think that processes that organize ongoing exchanges of judgments between bodies of scientists and science managers can produce wiser decisions than processes based on either-or thinking. Although analytic techniques should not substitute for careful deliberation, deliberation informed by analysis can produce better results than deliberation not so informed. In Chapter 6, we offer recommendations for structuring decision processes toward this end.

Research Needs

Several lines of research can contribute to the knowledge base needed for a social science of science policy (Marburger, 2005) that would improve science policy decision making. This research would aim to fill the above gaps in knowledge. The research effort should be broadly based to provide broader benefits and clearer knowledge about which aspects of scientific progress are general and which are domain- or discipline-specific. In addition, a broad effort may provide general lessons about advancing interdisciplinary and mission-relevant science that can flow from research sponsored by any of a number of agencies. Research is needed to achieve the following three objectives.

Improving basic understanding of scientific progress and the roles of research funding agencies in promoting it.

Research is needed to examine the nature and paths of progress in science, including the roles of decisions by science agencies. To support BSR, research is needed on progress in fields of behavioral and social science related to aging. Scientific progress is usefully understood in terms of a causal stream that roughly moves from (a) processes that structure research to (b) inputs to research to (c) scientific outputs to (d) scientific outcomes to (e) impacts on society, as these terms are defined in National Research Council (2005c) and elaborated in Chapter 4. Society closes the circle by providing inputs and structure for research, generating research questions, and in other ways. But for the purpose of evaluating the programs of science agencies, it is useful to focus on how variables earlier in the stream affect variables later in the stream. Thus, scientific progress is usually evaluated on its outcomes and impacts. Assessments of research programs must consider these consequences in light of the level of effort (e.g., processes and inputs) that went into trying to achieve them.

Research on the nature and paths of scientific progress can build basic understanding of conditions that facilitate and impede such progress. The research might include:

- Historical analyses of the evolution of scientific fields (their rise, continued fecundity, or decline), performed by or vetted by professional historians to ensure adherence to professional standards, especially in attributing causation. "Stories of discovery" or progress, as supported by BSR and other federal science agencies, while useful in putting a face to agency claims of contributing to scientific advance, are limited as tools of analysis. They are subject to selection bias that arises from examining only the successes from among the investments made by an agency or program. They also tend to highlight agency-specific contributions, without considering the importance of other sources of contributions to progress. What are needed are studies of fields that are generally considered to have been productive and of fields that are not so considered, conducted in a manner that meets professional historical standards (e.g., Nye, 1993; Kohler, 2002). These studies could usefully focus on how the processes that organize research programs and the inputs to those programs have affected scientific outputs, outcomes, and impacts.

- Advanced bibliometric analyses of the development of research fields, to provide a window into the development of research fields over time and the flows of influence among them. These studies should look at outputs in relation to measures of inputs and processes and also in relation to indicators of scientific outcomes and impacts. Particular emphasis should be placed on the cross-fertilization of research findings from one disciplinary domain to another and the emergence of new fields of knowledge. Some of the studies should consider the usefulness of bibliometric indicators of research outputs as measures of research progress, defined in terms of each of the sponsoring agency's program goals. Such potential indicators will require careful methodological analysis to assess their validity and potential biases before they are ready for practical use, even as inputs to decisions.

- Studies of scientific progress using emerging databases of conference proceedings or other prepublication scientific outputs. In many fields, new research results are first presented in technical reports or at conferences. Data on such kinds of activity may provide earlier indicators of scientific progress than bibliometric measures.

- Analyses of research vitality or interest shown by active scientists in lines of research, focusing on research directions that are widely considered in hindsight to have been successful or unsuccessful in terms of yielding major scientific advances or societal impacts. The studies should examine the ways that the vitality of scientific fields may relate to subsequent scientific outcomes and impacts. For example, studies should be made of the career paths of productive scientists ("stars") in terms of their choice of research topics, the journals in which they publish, and the career paths of the graduate students they train. Such studies could test the hypothesis that progress in a field can be predicted from the quality of the researchers who are willing to allocate their time to a specific line of inquiry.

- Studies of the effects of the structure of research fields on their progress. These studies might compare the consequences for the development of scientific fields, particularly new fields, of research portfolios that emphasize large centers, database development efforts, or interactive workshops, with more traditional research portfolios emphasizing funding to individual investigators and small research groups.

Research on the roles of science agency decisions in scientific progress can help the offices and agencies that sponsor it to make decisions about how to select and train research managers and organize advisory groups so as to better promote program goals for advancing science. This research might include:

- Studies of the role of officials in science agencies in promoting scientific progress. Some of these studies might follow the example of past research done for U.S. foundations (e.g., Kohler, 1991; Rooks, 2006) that has investigated how program managers have acted as entrepreneurs who help build new research fields and as stewards of vital fields. The research might also include studies of the characteristics of effective research entrepreneurs and stewards and studies of the effects of science agencies' practices of hiring, training, and evaluating program managers on their scientific entrepreneurship and stewardship.

- Studies of how expert advisory groups, including study sections and advisory councils, make decisions affecting scientific progress (e.g., comparing decision making in disciplinary versus interdisciplinary advisory groups; examining the effects of emphasizing explicit review criteria, such as innovativeness, on group decisions; examining how review groups consider multiple decision criteria; investigating hypotheses, such as that peer review groups generally select in favor of methodological rigor at the expense of innovation and that different advisory groups have distinct cultural differences that affect their ability to nurture scientific innovation).

- Studies of the effects of the organization of advisory groups on their success at promoting interdisciplinary and problem-focused scientific activity and ultimately at improving scientific outcomes and societal impacts. These studies might examine the roles of advisory group chairs in shaping group decision rules; the effects of the characteristics of group members individually and collectively; and the processes of training, mentoring, and socializing advisory group members and of oversight of advisory group processes.

Improving understanding of the uses of analytic techniques in making research policy decisions.

This research would support the development, trial use, and empirical investigation of a variety of quantitative measures and decision-analytic techniques for assessing the results of past research investments and setting research priorities. The studies would seek to validate analytical techniques and to determine their best uses, which may be different for different analytic techniques. The research might include:

- Studies comparing multiple indicators of research vitality, outputs, outcomes, or impacts of lines of research with each other and with the unaided judgment of experts in these areas to see whether it is possible to develop reliable and valid quantitative measures of scientific progress through a convergence of indicators and to determine whether any such measures might be useful as leading indicators that predict critical scientific outcomes or impacts (see the sketch following this list).

- Comparative studies of fields that are widely judged to differ in rates of progress toward positive outcomes and impacts to see whether particular quantitative indicators or a convergence of indicators yield results consistent with expert judgment.

- Studies to assess the value of providing information developed through specific analytic techniques, such as bibliometric studies, for research priority setting. Studies using cross-citation patterns or analyses of academic and professional career trajectories of researchers and students can show whether such analyses add significantly to the decision-relevant knowledge of expert review groups and whether and how this information alters their recommendations.

- Studies of scientific impact using databases that cover citations in policy documents and the popular press, with the results examined from the perspectives of research scientists and policy makers.

- Tests of ways to employ a convergence of information from different analytic methods to inform priority setting. This research might identify whether certain ways of combining information from multiple sources can contribute to more robust and reliable decision making than reliance on any single method.
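For the first study type in the list above, a minimal sketch of an indicator-convergence comparison is given here. The indicator values and expert ratings are invented, and Spearman rank correlation is only one plausible agreement measure; an actual study would need real indicator series, defensible field definitions, and attention to the biases discussed earlier in this chapter.

    # Minimal sketch: comparing candidate indicators of field vitality with each
    # other and with expert ratings via Spearman rank correlations. All numbers
    # are invented for illustration; the five fields are hypothetical.
    from scipy.stats import spearmanr

    citations_per_paper = [4.1, 2.3, 6.0, 1.8, 3.5]
    new_investigators = [12, 5, 20, 3, 9]      # e.g., first-time grantees per year
    expert_rating = [3, 2, 5, 1, 4]            # a panel's 1-5 judgment of vitality

    for name, indicator in [("citations/paper", citations_per_paper),
                            ("new investigators", new_investigators)]:
        rho, p = spearmanr(indicator, expert_rating)
        print(f"{name:18s} vs expert rating: rho = {rho:.2f} (p = {p:.2f})")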

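The cross-citation studies mentioned above, like the disciplinary asymmetries discussed in note 4, rest on a simple tabulation: for each pair of fields, what share of one field's outgoing journal citations is directed to the other? The sketch below uses invented counts purely for illustration and shows one way such an asymmetry table might be computed.

    # Minimal sketch of a cross-citation asymmetry tabulation (invented counts).
    # cites[(a, b)] = number of citations from journals in field a to journals in field b.
    cites = {
        ("economics", "economics"): 9000,
        ("economics", "sociology"): 300,
        ("economics", "political science"): 250,
        ("sociology", "sociology"): 4000,
        ("sociology", "economics"): 900,
        ("sociology", "political science"): 600,
        ("political science", "political science"): 3500,
        ("political science", "economics"): 800,
        ("political science", "sociology"): 700,
    }

    fields = sorted({f for pair in cites for f in pair})

    # Share of each field's outgoing citations directed to every field (row-normalized).
    for source in fields:
        total = sum(cites.get((source, target), 0) for target in fields)
        row = "  ".join(f"{t}: {cites.get((source, t), 0) / total:5.1%}" for t in fields)
        print(f"{source:>18} -> {row}")

    # A simple pairwise asymmetry measure: citations a->b versus b->a,
    # each normalized by the citing field's total outgoing citations.
    def asymmetry(a, b):
        out_a = sum(cites.get((a, t), 0) for t in fields)
        out_b = sum(cites.get((b, t), 0) for t in fields)
        return (cites.get((a, b), 0) / out_a) - (cites.get((b, a), 0) / out_b)

    print("sociology -> economics minus economics -> sociology:",
          f"{asymmetry('sociology', 'economics'):+.3f}")

A real analysis would draw counts from a bibliographic database such as the Web of Science or Scopus and would need to settle journal-to-field assignments, self-citation handling, and time windows before any asymmetries could be interpreted.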
Improving the incorporation of techniques for analysis and systematic deliberation into advisory and decision-making procedures. This research should explore and assess techniques for structured deliberation, some of them incorporating information from potential quantitative indicators of scientific progress and potential, for retrospective assessment and priority setting. The research would be used to elaborate and refine deliberative methods for organizing peer review and expert advice. It should include the following:

•  Studies to develop techniques for structuring decision analysis for use in the research priority-setting tasks facing BSR. Some studies might develop influence diagrams modeling the relationships of scientific activities (processes, inputs, outputs) to BSR goals (especially outcomes and impacts) and explore the feasibility of using these diagrams to structure deliberation (a small illustrative sketch follows this list). The influence diagrams might be developed by outside researchers or by BSR staff, in consultation with the program's advisory council. Some studies might explore ways to structure discussions within deliberative groups around the multiple goals in the NIA program plan or around lists of types of scientific outputs, outcomes (e.g., dimensions of scientific progress), and impacts. These studies might involve the use of simulated advisory groups.

•  Trials of analytic techniques for informing and structuring decisions in the deliberations of actual review and advisory panels or of shadow panels created for experimental purposes. Some studies might provide panels with the most relevant available quantitative indicators for their tasks and set aside time during their deliberations to discuss the meaning of the indicators for the decision at hand. Resources permitting, parallel panels could serve as comparison groups. In some studies, panels would be asked to apply structured methods for considering quantitative and qualitative information about the activities in the fields to be compared in relation to explicit criteria, such as lists of BSR strategic goals or dimensions of scientific progress, or to use influence diagrams showing plausible paths from research to the achievement of desired program goals. The studies would examine the effects of the interventions on (a) panel members' reports of whether and how their thinking or their recommendations were affected; (b) indicators of decision quality, such as the number of relevant decision objectives, and of pathways from decisions to the achievement of objectives, considered in the deliberations; and (c) the creation of a sufficiently explicit record of the rationale for the advisory panel's recommendations to improve accountability and allow a better informed exchange of judgments between researchers and research managers.

•  Studies to adapt existing analytic-deliberative assessment approaches, such as the NIH Consensus Development Conference model, to the purposes of assessing research areas and setting research priorities in BSR. Some of these studies might incorporate the above techniques for informing and structuring decisions. Some might include nonscientists, selected to represent the perspectives of the potential users or beneficiaries of the research, in the analytic-deliberative process. These studies could explore how adding these perspectives may affect the ways in which advisory groups assess the benefits of research for basic understanding and for society.

•  Comparative studies of advisory panels of different composition, particularly for recommending research priorities. For example, BSR, NIA, and the NIH Center for Scientific Review might vary the breadth of expertise represented on panels or the balance between senior and junior researchers. Such research would provide an empirical base for assessing the reliability of deliberative advice and the sensitivity of that advice to the intellectual backgrounds and practical orientations of panel members. Such experiments would also offer evidence for evaluating claims such as that panels of researchers are too conservative to support promising high-risk research or too uncritical in the areas of expertise of only one or two panel members.

•  Studies involving the instruction and training of advisory panel members to consider specific BSR and NIA objectives, including mission relevance, that go beyond generic considerations of the quality of proposed research.

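Influence diagrams of the kind described in the first item above could be prototyped with very lightweight tooling before any formal decision-analytic software is adopted. The sketch below is only an illustration under assumed node names, not a BSR product: it encodes a small directed graph from research activities through scientific outcomes to program-level impacts and enumerates the plausible paths from a given activity to a given goal, the sort of artifact a panel might annotate during deliberation.

    # Minimal sketch of an influence-diagram-style graph (assumed, illustrative node names).
    # Edges point from scientific activities and outputs toward outcomes and impacts.
    edges = {
        "fund longitudinal surveys":    ["harmonized aging databases"],
        "fund investigator grants":     ["new measures of cognition", "theory development"],
        "harmonized aging databases":   ["findings on disability trends"],
        "new measures of cognition":    ["findings on cognitive decline"],
        "theory development":           ["findings on cognitive decline"],
        "findings on disability trends": ["evidence for long-term-care policy"],
        "findings on cognitive decline": ["interventions to delay dementia"],
    }

    def paths(graph, start, goal, prefix=None):
        """Enumerate all directed paths from start to goal (graph assumed acyclic)."""
        prefix = (prefix or []) + [start]
        if start == goal:
            return [prefix]
        found = []
        for nxt in graph.get(start, []):
            found.extend(paths(graph, nxt, goal, prefix))
        return found

    # Which plausible routes link a funding decision to a program-level impact?
    for p in paths(edges, "fund investigator grants", "interventions to delay dementia"):
        print(" -> ".join(p))

In an actual study the nodes and links would be supplied by BSR staff, outside researchers, and the advisory council, as the text indicates, and the diagram would likely carry annotations (strength of evidence, time horizons) rather than bare edges.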
NOTES

1. Treated as subsets of the broader methodologies covered in this report, and thus omitted from specific discussion, are various Foresight techniques (Irvine and Martin, 1984) and mechanisms for scoring R&D priorities.

2. Analyses of other prominent performance measures contained within the larger scope of scientometric inquiry, such as patent statistics and publication-patent relationships, are not relevant to much of BSR's research portfolio, which produces different kinds of impacts.

3. Debates about the relative contributions of theoretical and empirical approaches to scientific advance, and about leader-follower relationships between them, are staples in the history of science and entail issues that extend well beyond the scope of this report (see, e.g., Galison, 1999).

4. Bibliometric evidence on the social sciences, for example, consistently shows that sociologists and political scientists cite articles from economics journals more frequently than economists cite sociological or political science journals. These findings have been interpreted alternatively as indicating the greater generalizability and precision of economic modes of analysis, and thus their greater intellectual vitality, and as documenting the intellectually closed-loop, solipsistic nature of economic thinking (Laband and Piette, 1994; MacRae and Feller, 1998; Reuter and Smith-Ready, 2002).