The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
3  The Role of Metrics

Metrics are necessary for evaluation, rational decision-making, and appropriate allocation of resources. It is useful to distinguish three classes of metrics: for inputs, outputs, and outcomes. Inputs are often measured in dollars spent, in part because such figures are relatively easily determined. Outputs are activity and productivity, whereas outcomes are effects and progress toward overall goals. Outcomes depend heavily on program objectives. Often, inputs are used as a proxy for outputs, but they are generally a poor substitute in that they do not account for the effectiveness or efficiency of a funded activity. A good metric for output should be an accurate measure of whether the desired outcomes of an activity have been achieved—outcomes that represent the value that the activity was intended to generate. In fact, however, many accepted quantitative metrics are used to measure what can be easily measured rather than the value created in the course of the activity.

The relationship between metrics for output and for outcomes of the National Nanotechnology Initiative (NNI) can be illustrated by analogy with manufacturing. In manufacturing, a material or product is measured for three reasons: for quality control, for quality improvement, and to establish that a legal requirement specified in a contract between a supplier and a customer has been met. In the first case, all that is needed is a simple, reliable measure to identify when an acceptable outcome is no longer being produced; measurement yields a result as simple as “acceptable/unacceptable,” and the information that it provides stays local to provide quality control. In the second case, measurement is more quantitative and guides changes to produce better outcomes than previously obtained. In the third case, a supplier agrees to provide to the customer a material that has specific properties as measured by specific agreed-on, standardized techniques.
In each of those cases, there is a well-established model that relates the measurement to the desired outcome, and the measures may differ for the three functions of metrics. Applying that analogy to the NNI, many NNI metrics are designed primarily for quality control within the individual agencies on the basis of their individual missions, and many of the possible metrics listed in Chapter 4 of this interim report are in that category. The issue, however, is how to assess the success of the NNI as a whole, as opposed to the success of the individual agencies in fulfilling their missions.

Output data gathered by different NNI participating agencies cannot now be usefully compared. The measurement systems are not the same, and the metrics and processes used for quality control are peculiar to each agency, its mission, and its historical way of doing things. Furthermore, researchers and organizations know that they have been funded by a particular agency and are familiar with the agency’s metrics and desired outcomes. In contrast, the committee learned that many programs and associated researchers do not know that their federally funded research and development (R&D) projects have been included in their funding agencies’ reported NNI program dollars.

METRICS FOR ASSESSING THE NATIONAL NANOTECHNOLOGY INITIATIVE—SOME CONSIDERATIONS

The NNI is being asked to establish metrics for quality improvement, that is, improvement of the NNI and its R&D system for addressing the four NNI goals, and contractual metrics, that is, metrics regarding

the effective customer-supplier contracts between parties such as taxpayers and the government, Congress and the NNI, principal investigators or companies and the agencies, workers and those who regulate nanotechnology in the workplace, and consumers and the agencies that are responsible for food and product safety. For each such set of “customers-suppliers,” there must be a model that relates what is measured—outputs—to the short-term, intermediate-term, and long-term outcomes that the customer is paying for, and there must be an accurate system of quantitative and qualitative metrics that supports the model. Without the model, metrics for output will probably lead to an incomplete and inaccurate assessment of whether the outcomes are being met—that is, whether the quality of the NNI program is high, the NNI is increasing its impact, and the NNI is meeting its “contract” with all its “customers.”

Additional characteristics of a good metric are that the information supporting it is reliably and relatively easily obtainable and that, at the very least, the benefits contributed by the metric to evaluation, strategy, and priority-setting justify the cost of obtaining the information. The information generated by the metric should also be able to provide the basis of program decision-making; in other words, it should be actionable. Many metrics are too general to contribute to the discussion of any specific, important issue.

The quest for good metrics is often framed as a quest for quantitative metrics, which can be measured in an objective way and for which the result is a number or a collection of numbers. However, the emphasis on having objective, numerical metrics often leads to collecting output data that are peripheral to the goals and outcomes of an activity. For example, the number of papers published per year by a researcher is not by itself an adequate metric of scholarly achievement; clearly, some consideration of the quality and impact of the output is also required.
Various metrics related to citation may be of partial use in evaluating the quality of a body of publications, but if, for example, the utility of the results presented in publications is the quality or value being sought, citation-count metrics are poor indicators. Furthermore, there is general awareness that the choice of metrics may change the behavior of participants in ways not necessarily conducive to successful outcomes. That is a known and difficult problem that has received considerable attention.

Academe’s answer to such problems is to evaluate a person on the basis of a model of academic success that uses a set of subjective, qualitative metrics supported by quantitative data on output and subjective evaluation of those data. That combination of subjective evaluations and quantitative output metrics has evolved to support a model of academic success for faculty at different career stages and performance levels, from assistant to full professor. Dependence on the subjective evaluation of a group of experts chosen for some mix of technical expertise, judgment, and breadth of knowledge of a field is key to this approach. Although the results of applying qualitative metrics are subjective, such metrics have been demonstrated both to be reasonably reproducible and to encourage desired outcomes successfully; this suggests that the model on which they are based and the methods used are reliable. The process has also been developed to ensure that the experts who provide the assessments have sufficient personal independence from the people being evaluated to render objective evaluations.

Notwithstanding those issues, given the investment in and the scope of the NNI, quantitative and qualitative metrics can be applied to assess the impacts of NNI-related activity.
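As an illustration of why citation-count metrics are poor indicators of the value being sought, consider the widely used h-index, which compresses an entire publication record into a single number. The sketch below uses hypothetical citation counts (not drawn from any NNI data) to show two very different bodies of work that the metric cannot distinguish:

```python
def h_index(citations):
    """h-index: the largest h such that h papers each have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# Two hypothetical researchers with very different profiles:
steady = [6, 6, 5, 5, 4, 4]       # consistent, moderately cited output
one_hit = [120, 6, 5, 4, 1, 1]    # one landmark paper dominates the record
print(h_index(steady), h_index(one_hit))  # both 4; the metric hides the difference
```

The number is easy to compute and objective, which is exactly the committee's point: objectivity and ease of measurement do not guarantee that a metric captures the quality or utility being sought.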
Many major federal funders of nanotechnology research are working on the problem of defining a set of quantitative metrics that relate program outputs to desired outcomes in arenas that overlap with the NNI. A prime example is the National Institute of Standards and Technology’s (NIST’s) leadership in developing metrics for technology transfer from federal agencies that have research facilities to the commercial marketplace.1 The resulting metrics should be taken into account in the review of NNI activities with qualitative and semiquantitative assessments by experts. Ideally, such assessments would improve the efficiency, quality, and completeness of the review process. Such a collection of metrics, taken as a whole, may be viewed as an indicator of impact or success and may provide guidance for decision-making and for allocation of resources.

1 See http://www.nist.gov/tpo/publications/upload/DOC-FY2011-Annual-Tech-Transfer-DOC.pdf. Accessed August 27, 2012.

Quantitative metrics require various kinds of output and outcome data—such as people trained, jobs created, papers published, awards earned, patents filed, companies started, and products created—measured over time for the agencies, organizations, researchers, and so on. To provide sound input for assessments, those data must be combined in a weighted fashion that respects the missions, nature, and objectives of the responsible agencies or programs.

Clearly, uniform models and metrics for all 26 NNI participating agencies are neither practical nor appropriate. Five agencies (the National Science Foundation [NSF], the Department of Defense, the Department of Energy, the National Institutes of Health, and NIST) account for well over 90 percent of the funds and effort expended. The other agencies play different, although still critical, roles in the development of nanotechnology and the NNI. The committee believes that it is important to select output and outcome metrics so as to minimize the burden on each agency of gathering and reporting data that are not central to its mission or that would require substantial added effort without substantial benefit to the NNI.

The committee recognizes the great difficulty of defining robust models and metrics for a field as diffuse as nanotechnology and for agencies as diverse as the 26 NNI participating agencies. However, it urges that, as difficult as this task may be, whatever models and metrics are applied should be rigorous; that is, they should have clearly and publicly defined assumptions, sources, methods, and means to test whether the models and data are accurate. Despite the recognizable value of many of the data provided to and by the NNI agencies and the National Nanotechnology Coordination Office, the origins of the data and the assumptions used in collecting or collating them were not always clear.
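The idea of combining output data "in weighted fashion" so that each agency's mission is respected can be sketched as a simple weighted composite. Everything here is hypothetical: the categories, the weights, and the normalized values are illustrative placeholders, not anything defined by the NNI or its agencies:

```python
# Illustrative sketch only: categories and weights are hypothetical, not NNI-defined.
# Each agency weights output categories according to its own mission, so raw
# counts are never compared directly across agencies.
MISSION_WEIGHTS = {
    "NSF":  {"papers": 0.5, "students_trained": 0.4, "patents": 0.1},
    "NIST": {"papers": 0.2, "standards": 0.5, "patents": 0.3},
}

def composite_score(agency, outputs):
    """Weighted sum of normalized outputs (values in [0, 1]) under an
    agency-specific weighting model."""
    weights = MISSION_WEIGHTS[agency]
    return sum(weights.get(category, 0.0) * value
               for category, value in outputs.items())

nsf_score = composite_score("NSF", {"papers": 0.8, "students_trained": 0.9, "patents": 0.2})
nist_score = composite_score("NIST", {"papers": 0.3, "standards": 0.9, "patents": 0.6})
```

The point of the sketch is the structure, not the numbers: without a validated model behind the weights, such a composite is exactly the kind of metric the committee warns can look rigorous while measuring little.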
Furthermore, the committee believes that data arising from “self-identification” or “self-reporting” do not always give an accurate and complete picture of the status of a field. If the data used are inaccurate, or if the models or understanding that link even accurate data to desired outcomes have not been well established, evaluation, rational decision-making, and allocation of resources become compromised. The provenance of data, including the original assumptions and calculations used to develop them, must be clearly established, documented, and maintained. Although source data are not likely to be perfect, the intent should be to make the process of data selection and the results as transparent as possible.

The committee sees promise in many aspects of the NSF Star Metrics project2 but also grounds for concern. Directly accessing institutional human-resources databases to automate data collection on personnel, for example, seems excellent. However, the software algorithms used to parse project summaries to identify emerging fields of research may not be ready for application, given the sample outputs shown to the committee, so implementation of the Star Metrics approach to define fields and current funding levels without independent validation could lead to erroneous conclusions. That observation reflects the state of research that applies machine learning to social-science problems: advances in machine learning and automated inference from large datasets have proceeded rapidly, but validation of the calculated measures has lagged far behind. The lag results from the difficulty of validation, which requires careful sampling of adequate observations for field-work validation, such as interviews, surveys, and historical case studies; from the lack of collaboration between experts in quantitative data analysis and experts in social-science field research methods; and from the lack of validated models that relate the output data to the desired outcomes.3
Although software algorithms and data-mining offer promising approaches to data collection, the committee believes that use of a specific set of keywords or field categories, identified by research investigators or program managers, could be improved sufficiently with relatively little effort to be useful for future data collection. However, the committee was surprised to learn that the current software system for project monitoring in NSF, called FastLane—whereby investigators enter data into multiple fields to describe project participants, results, and outcomes, including papers published—apparently could not be used to mine the data supplied by NNI-supported projects.

2 See www.nsf.gov/sbe/sosp/workforce/lane.pdf. Accessed September 27, 2012.
3 G. King, Ensuring the Data-rich Future of the Social Sciences, Science 331:719-721, 2011.
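The keyword-based tagging the committee describes can be sketched in a few lines. The field categories and keywords below are illustrative inventions, not an official NNI or NSF taxonomy; the point is only that curated keyword lists, supplied by investigators or program managers, make field assignment transparent and easy to audit in a way that opaque text-mining algorithms are not:

```python
# Hypothetical sketch of keyword-based field tagging of project summaries.
# Categories and keyword lists are illustrative, not an official taxonomy.
FIELD_KEYWORDS = {
    "nanomaterials": {"nanoparticle", "nanotube", "graphene"},
    "nanomedicine": {"drug delivery", "nanoparticle", "tumor"},
}

def tag_fields(summary):
    """Return every field whose keyword list matches the summary text."""
    text = summary.lower()
    return {field for field, keywords in FIELD_KEYWORDS.items()
            if any(kw in text for kw in keywords)}

fields = tag_fields("Targeted drug delivery using functionalized nanoparticle carriers")
# fields contains both "nanomaterials" and "nanomedicine"
```

Because every assignment traces back to a visible keyword match, a reviewer can validate or correct the category list directly, which is the "relatively little effort" improvement path the committee has in mind.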

In general, metrics will be poor if they present misleading information about actual or probable success in accomplishing desired goals, that is, the desired outcomes. There are several characteristics to avoid or minimize in developing metrics. For example, ambiguity in the definition of a metric can lead to combining incoherent data and to analyses of questionable value. Such ambiguity can result from metrics that are too complex; it is better to have simple metrics without too many qualifiers. Another type of problematic metric is one for which optimization of an individual result is easily accomplished at the expense of another important goal, especially if the latter is not captured by a corresponding metric.

A great deal of care must be taken to understand the use of specific metrics in different NNI communities and agencies. For example, some communities write more and shorter papers and cite sparsely, whereas others write fewer and longer papers and cite generously. The different practices can produce different distributions of measures of output and impact, and comparisons among fields can become problematic. The effectiveness of a metric may also be compromised by lack of availability or accuracy of the corresponding data, owing, for example, to small samples, a dearth of accurate sources, estimation errors, and the burden of responding to numerous requests for data. For all those reasons, a model that has a balanced set of metrics should be established.

In summary, the committee finds that strictly quantitative metrics of output are not by themselves dispositive in evaluating the success of the NNI in achieving its goals. Well-crafted qualitative and semiquantitative metrics and their review, supported by quantitative metrics, are more likely to be useful in producing evaluations that measure success and can be applied in setting NNI goals and policy.
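One common remedy for the cross-field comparison problem noted above (sparse-citing versus generous-citing communities) is field normalization: dividing a raw citation count by the average citations per paper in the paper's own field. The baselines below are assumed round numbers for illustration only, not measured values:

```python
# Sketch of field normalization with hypothetical field baselines
# (assumed average citations per paper; not real measured values).
FIELD_BASELINE = {"mathematics": 4.0, "cell biology": 40.0}

def normalized_impact(citations, field):
    """Citations relative to the average paper in the same field."""
    return citations / FIELD_BASELINE[field]

math_paper = normalized_impact(8, "mathematics")   # 2.0: twice the field average
bio_paper = normalized_impact(40, "cell biology")  # 1.0: exactly the field average
```

Even this simple adjustment requires a validated model of what the field baselines are and how fields are delineated, which is precisely where, as the committee notes, validation has lagged.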
A POSSIBLE FRAMEWORK FOR ASSESSING SUCCESS

The goal of this interim report is to consider definitions of success for the NNI (the desired short-term, intermediate-term, and long-term outcomes), metrics, and methods for assessing the NNI’s progress toward its goals. Establishing the connections between inputs, outputs, and short-term to long-term outcomes is difficult and requires articulation and validation of a model. A possible open framework of a model and system for assessing success in achieving desired outcomes for the individual funding agencies and the NNI as a whole is shown in Figure 3.1.

In the framework, application programming interfaces and linked databases provide access to input and output data that may be used to trace the connections between inputs, outputs, and some short-term outcomes. Inputs may originate with persons or grants, whereas outputs can include publications, patents, or organizations; arrows show explicit connections, such as collaborations between people and organizations and the number of times that papers are cited in other publications. Essentially, the framework links NNI research products, including grants, papers, and patents; NNI people; NNI agencies and other corporate, government, and academic institutions; and short-term, intermediate-term, and long-term NNI outcomes.

Many of the proposed metrics for assessing output are available to or are under development by various agencies and firms. Google Scholar, for example, has disambiguated and linked the publication and patenting careers of many scientists and inventors; that the effort remains proprietary, however, highlights the importance of an open framework. Once in place, such a framework could be used to generate metrics of output at various levels of analysis, including specific awards, principal investigators, institutions, or entire nanotechnology subfields.
The resulting metrics for output will require careful validation, as discussed above. Although the framework would require substantial investment in record linkage and disambiguation, it would provide flexibility and allow that investment to be reused across different scientific fields and bibliometric databases.
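The linked-record structure of Figure 3.1 can be sketched as a small typed graph. The entity identifiers and relation names below are illustrative inventions, not a real schema; the sketch only shows how grants, people, and research products become nodes whose typed edges let outputs be traced back to the inputs that funded them:

```python
from collections import defaultdict

class ResearchGraph:
    """Minimal sketch of a linked-record framework: nodes are entity IDs
    (grants, people, papers, patents); edges are typed relations."""

    def __init__(self):
        # (source entity, relation) -> set of target entities
        self.edges = defaultdict(set)

    def link(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def outputs_of(self, grant):
        """Trace a grant to its directly linked research products."""
        return self.edges[(grant, "produced")]

g = ResearchGraph()
# Hypothetical identifiers for illustration only:
g.link("grant:NSF-0001", "funded", "person:pi_smith")
g.link("grant:NSF-0001", "produced", "paper:doi-10.0/xyz")
g.link("grant:NSF-0001", "produced", "patent:US-1234567")
products = g.outputs_of("grant:NSF-0001")  # the paper and the patent
```

The hard work in practice lies not in the data structure but in the record linkage and disambiguation the committee mentions: deciding that two records refer to the same person or product before any edge is drawn.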

FIGURE 3.1 How inputs lead to outputs and, eventually, benefits: National Nanotechnology Initiative-related research funded through federal agencies leads, in one mode of translation, to publications and patents, which in turn lead to societal benefits realized in the creation of new knowledge, products, companies, and jobs.