On February 25-26, 2010, a group of behavioral and social scientists met to explore the feasibility of developing well-grounded common metrics to advance behavioral and social science research. With support from the William and Flora Hewlett Foundation, the Committee on Advancing Social Science Theory: The Importance of Common Metrics planned and organized the event to gather information and examine the issues involved. The idea for the workshop was suggested by Marshall S. Smith when he was at the Hewlett Foundation. He posed the thesis that one reason the social sciences have greater difficulty than other sciences in advancing theory is that they have less commonality among their metrics.
WORKSHOP GOALS AND ISSUES
The Workshop on Advancing Social Science Theory: The Importance of Common Metrics had three goals:
To examine the benefits and costs involved in moving from metric diversity to greater standardization, both in terms of advancing the development of theory and increasing the utility of research for policy and practice.
To consider whether a set of criteria can be developed for understanding when the measurement of a particular construct is ready to be standardized.
To explore how the research community can foster a move toward standardization when it appears warranted.
The planning committee considered a large range of issues in designing the workshop and selecting the presentations and participants. For example, it might seem as if the benefits of common metrics are obvious. Just as a common language facilitates learning and communication of knowledge for many purposes, so do common metrics facilitate cumulative and comparative research and its dissemination for policy, practice, and common understanding. However, the importance attached to common metrics varies tremendously across the behavioral and social sciences. In economics, there is a history of reliance on theory to define measures, although that is not always the case, and the development of standardized economic measures has accompanied the development of the idea of data in the public service. In the health field, a diverse set of morbidity-based indicators suggests that less arbitrary ways of summarization are needed, with the Patient-Reported Outcomes Measurement Information System offering a roadmap for the future. And in psychology, in the cases where psychological processes lack an overall theory, a reward structure has developed that tends to place a premium on inventing new measures for the same construct rather than encouraging the use of common metrics.
The benefits of standardized measures depend ultimately on their acceptance by the research and policy communities. Use drives the creation of measures in the first place and therefore determines whether they become standardized. Measurement must begin with the end in mind, and, if common metrics are the goal, then their purposes must be considered. That said, one size does not fit all. In this regard, it may be that the ideal is not a single common metric per se but rather a few widely used metrics.
Another issue considered by the planning committee is that different metrics serve different purposes. When no measure is a candidate for widespread application, the use of multiple measures can help to triangulate a construct and to test the robustness of effects across different operational definitions. Thus, harmonization of measures might be possible when standardization is not. Scientists tend to favor harmonization because it reflects the competition of ideas, and persistent use is evidence of a measure’s utility. Harmonization is seen as a form of standardization established among scientists, not imposed on them.
Although the original intent of the workshop was to focus on the importance of common metrics for advancing social science theory, in fact the discussion centered predominantly on how theory can inform measurement and on how common metrics can inform policy. Because common metrics require common concepts and construct validity, agreeing on an underlying theory is important. Sometimes theory is necessary but not sufficient for metric development in the social sciences. Often the lack of strong theories is reflected in the dearth of well-accepted common metrics. At other times, it is impossible to measure the variables demanded by the theory. A consistent theme at the workshop was the paramount need for theory as well as for a public policy purpose in motivating standardization of measurement for a particular construct.
Good theory and good measurement are often prerequisites for a standardized measure. Sometimes a measure is introduced and becomes popular and thus is accepted as the standard. Sometimes the need for or utility of a measure drives the momentum toward a standardized measure. Sometimes a concept is based on a theory that is widely accepted in the scientific community and that prescribes how the concept is to be measured. The ability to develop a standardized measure thus depends in part on the state of theory in different fields.
Although theory guides measurement for scientific purposes, political judgments often influence the development of standardized measures. The more consequential a measure is for policy, the more likely that politics will override science in establishing a standardized measure. And of course how a social concept, such as poverty or disability, is measured has serious policy implications. The standardization of measures is a social and political process involving negotiation. In some situations, what is measured may be less important than how it is perceived and classified. An example is the challenge of assessing change that involves not only aging but also the perception of the change with age. Skepticism often accompanies metrics that are generated from a process that is too obviously political. The integrity of statistical agencies is more easily maintained if the construction of measures is guided by accepted theories and is as resistant as possible to political and other pressures.
The social and political context of the academic community is another consideration. Even when there is benefit to standardization, the incentives to develop common metrics may be inadequate, especially in fields that tend to reward the development of novel methods, concepts, and constructs or new measures for the same construct.
Workshop participants had diverse ideas responding to the question of what the research community can do to foster common metrics when they are warranted. If the process of adopting an official standardized measure for policy purposes is transparent, that may create an opportunity for the scientific community to weigh in on its scientific suitability. Because common concepts and constructs are measured differently by different disciplines, it is important to learn how each one uses terms and interprets their connotations and denotations. Improvements in theory may come from greater interactions among the social sciences, as well as between these disciplines and others, with a movement toward greater interdisciplinary research. Agreeing on the type of data to collect could be another way of promoting common metrics. The use of common metrics also can be effectively encouraged by grant-making institutions as part of the peer review process and by journal editors.
Despite the interest in common metrics, some measures appear to defy standardization. As we have come to understand race in social and cultural terms, for example, the concept of race has become inherently difficult to measure, let alone in a standardized way. Measures may obscure important information in the underlying data or may fail to recognize the complexity of the dimension of interest. In such cases, data in their raw, disaggregated form are often more useful than when clothed in a composite measure or in meta-analysis. Both location and metric can affect comparisons; rather than assuming a common scale, calibrating individual scales, for example with anchoring vignettes, can help circumvent some of these problems.
Measures may also need to change over time, because concepts and what society considers important change. For example, the concept of poverty has changed over time, along with prices, products, and social norms—and a useful measure will reflect these changes. In health care, ignoring improvements in treatment would underestimate growth in medical output. And in recent years, there has been greater interest worldwide in measuring less tangible concepts, such as subjective well-being, satisfaction, and social connectedness, as well as a movement from single measures to indices and from activities to outputs and outcomes. And even if change is warranted, changing a well-established measure may be difficult, if not impossible.
Although the exploration of common metrics is to be encouraged, the meeting did sound a cautionary note on the prospects for useful and valid common metrics in the social sciences and the dangers of using imperfect or incomplete standardized measures to guide policy. In certain situations, even an imperfect indicator can be good enough for promoting competent discussion about actions to take. However, concerns were expressed about the premature application of standards and the lack of appreciation for the role of successful science in generating standardization. Participants also noted that there is a risk that unnecessary standardization can mean that weaknesses get codified and reinforced over time and that distortions will occur from linking indicators too closely to policy decisions, particularly if indicators are meant to promote accountability. Common measures may also be lacking if there is no common understanding as to what the measures represent.
Although theory is useful in the development of metrics, some common metrics are not based on theory. An example is the unemployment rate, for which no economic theory appears to apply.
Finally, measurement breakthroughs can take a long time and require persistence, but the effort is well worth the investment. The development of standard metrics that are useful in theory and in practice is important and scientifically rewarding.
ABOUT THIS REPORT
This report is a summary of the 2 days of presentations and discussions that took place during the workshop. The workshop participants included the members of the committee that planned the workshop, along with invited speakers and a number of other participants. A complete list of participants is in Appendix A.
It is important to be specific about the nature of this report, which documents the information presented in the workshop presentations and discussions. Its purpose is to lay out the key ideas that emerged from the workshop, and it should be viewed as an initial step in examining the research and applying it in specific policy circumstances. The report is confined to the material presented by the workshop speakers and participants.
A separate volume of the papers presented at the workshop is planned. Readers are directed to that compilation for a more nearly complete list of references than is included in this report. The papers, in the form in which they were submitted for the workshop, are available online at http://www7.nationalacademies.org/dbasse/Workshop_on_Common_Metrics_Agenda.html. Authors may have later versions.
Neither the workshop nor this summary is intended as a comprehensive review of what is known about the topic, although it is a general reflection of the field. The presentations and discussions were limited by the time available for the workshop. A more comprehensive review and synthesis of relevant research knowledge will have to await further development.
This report was prepared by a rapporteur and does not represent findings or recommendations that can be attributed to the planning committee. Also, the workshop was not designed to generate consensus conclusions or recommendations but focused instead on the identification of ideas, themes, and considerations that contribute to understanding the topic.
Structure and Organization
The organization of the report closely follows that of the 2-day workshop. Chapter 2 begins with an overview of measurement in the social sciences, followed by presentations on the challenges involved in developing common metrics and lessons from the economic sciences and the health sciences. These presentations provided a sampling of past experience with common measurements, both in policy domains and in research on some of the core concepts in a diversity of social science fields.
Chapter 3 takes up the issues involved in indicators used for policy making and decision making, with examples drawn from the context of disability, high school completion and dropout rates, and race and ethnicity.
Chapter 4 focuses on social science constructs in the more basic social and psychological sciences. Social scientific examples of standardization range from qualitative classifications, like race/ethnicity and social class; to numerical scales describing psychological traits, social standing, or economic amounts; to normalized measures of the fit of statistical models and the effects of variables in such models. Three important aspects of standardization are identified: ontology, representation, and procedures. Examples are drawn from a number of constructs—including poverty, intergenerational mobility, and self-regulation—that highlight the obstacles to development of common metrics in the social sciences.
Chapter 5 summarizes the final discussion session of the 2-day event. The report includes two appendixes: Appendix A presents the workshop agenda and a list of participants, and Appendix B presents biographical sketches of the workshop speakers.