In the final session, Miron Straf (National Research Council) made a list of some of the themes articulated by participants during the workshop:
Good measurement begins with the end in mind. If common metrics are the goal, it is important to consider both their purpose and criteria.
One size does not fit all. In this regard, the goal may not be common metrics per se, but rather a few metrics widely used.
Common metrics require common concepts—which are facilitated by agreement on theory.
The issue may not be so much what is measured as how it is perceived and classified. Ontology is very important.
Useful standardization is balanced with construct validity.
Just as perception can trump reality, politics trumps science. And public and political demands can trump scientific review.
Some measures defy standardization—such as self-regulation and social class.
Measures will need to change over time because concepts do, and in particular what is considered important changes over time.
Raw data—whether collected, compiled, or pooled—may be grist for the measurement mill, but they do not become refined in that mill. Data in their disaggregated form are often more useful than a metric.
Meta-analysis is no substitute for primary analysis.
Useful social science needs measures that are widely accepted.
George Bohrnstedt repeated some of his challenges to the group to consider when standardization makes sense. Is there a set of criteria? When does it not make sense to standardize? What are the costs from not standardizing? Even when there is benefit to standardization, the incentives to develop common metrics may be inadequate, especially in some fields in which academic reputations are built on development of a new method, concept, or construct.
Norman Bradburn observed that the question of the importance of standardization has two parts: (1) When does it make a difference and when is it useful for science? (2) When is it useful for policy issues? On the science side, when concepts are sufficiently well defined and theory is sufficiently well formulated, then standardization is important. In terms of metrics or procedures, confidence that the same construct is being measured is important for advancing theory. He further observed that the lack of overall theory about psychological processes has led to a reward structure that places a premium on inventing new measures.
On the policy side, Bradburn elaborated on the use of measures of the effectiveness of social, economic, or educational policies and the push in the last decades toward accountability. He commented that any measure (like the current poverty measure) that is insensitive to the policy lever used to change it seems to be a bad measure. It would seem that any politician should want to effectively measure improvements to demonstrate program success.
Bohrnstedt agreed that science and politics have roles to play. If politics trumps science, he asked, what can the academic community do to mobilize action? In response to this question, David Grusky commented that society must choose where it wants politics to intrude in policy decisions. There can be a cacophony of measures, and politics will intrude in deciding which measure to feature. Or alternatively, science could advocate for some official standard measure, and then politics will intrude on the selection of that measure. At least the latter is a more transparent process, which gives scientists an opportunity to provide input.
Bohrnstedt revisited the two measures of intergenerational mobility (one social, one economic) that society cares immensely about in its efforts to reduce inequality. He believes that having good measures of social mobility and economic mobility that draw on administrative records is a good idea. Education is ultimately a way to reduce inequality and to facilitate intergenerational mobility. Grusky believes that when there is more transparency, there is more opportunity for the scientific community to weigh in at the point of adoption of some sort of official standardized measurement.
Dennis Fryback questioned what is meant by standardization, specifically in the health care context, and focused on the difference between classical and modern test theory. The notion that standardization means adoption of the same questions is passé. It may be that the latent constructs that hearken back to theory are what need to be standardized. He is most familiar with more parochial politics about the appropriate survey questions. In contrast, the Patient-Reported Outcomes Measurement Information System lets everyone see their questions in the item bank, and it transcends the single questionnaire. He questioned whether there could be a common construct underlying different definitions of poverty if different measures of poverty could be subjected to item response theory–type analysis.
Bohrnstedt reaffirmed the idea that having common concepts does not mean that the indicators will not change over time. He cited the view, expressed by Geoff Mulgan, that indicators should change over time because they are culture- and history-bound, although the concepts should remain the same.
David Johnson suggested that researchers align themselves with policy makers and statistical agencies to develop standardized measures, accepting that a perfect measure (e.g., for poverty) is sometimes not possible. He pointed to work currently being undertaken by the Census Bureau to measure same-sex marriage (in which decisions are being made today for implementation in eight years) and vocational education. The Census Bureau has solicited advice about developing a measure that may not be perfect. As an indication of progress, he reported that there is a provision of $7.5 million in the president’s budget that directs the Census Bureau to develop a new supplemental poverty measure. Robert Pollak was not nearly so optimistic about the adoption of new poverty measures; changing the definition will change eligibility for benefits, he said. There are strong constituencies that will resist this type of change.
Bradburn mentioned three ways in which major indicators become accepted. First, he noted that the poverty measure is implemented by the Office of Management and Budget (OMB), whereas the employment rate is implemented by the Bureau of Labor Statistics. If responsibility for a measure is lodged in the domain of the president’s office (e.g., OMB), it is likely to be politicized. If responsibility is lodged in one of the statistical agencies, where the decision makers are generally science professionals, it will be easier to change the measure (if it is done by the government). Second, some very farsighted scientists can set about constructing a measure before it is needed (an example is the National Assessment of Educational Progress, or NAEP), and the measure can become adopted as the accepted measure before it becomes politicized. For NAEP this was largely done by the private sector. Finally, a bipartisan public-private effort referred to as SUSA (State of the USA) publicizes on a regular basis a set of indicators across all sectors of economics and society and the environment, in an effort to inform the democratic process. So there is attention to making some indicators easily available to the broad public.
Geoff Mulgan said that it is helpful to consider the three sets of interest groups: the scientific community, the government, and the public. In his view, the scientific community has an obligation to itself, to science, and to a degree to the public but not to the state. The closer any indicator gets to being used for actual administrative decisions, as with the poverty indicator, the less appropriate it is for the scientific community to lend its legitimacy to it because of the risks of distortion. However, to treat any indicator as essentially a feedback system, there are different interests in place as to what counts as good feedback. For the scientific community, there is a lengthy time scale, cumulative knowledge, etc. For the public, one of the criteria could be whether to hold the state to account. Mulgan observed that different indicators will respond to these three interests in different ways at different times.
Arthur Kendall (U.S. General Accounting Office, retired) shared his perspective as a social psychologist and mathematical statistician. He advised that when dealing with a concept in a particular construct, it is important to look across disciplines to see what connotations and denotations the terms have in other disciplines. Ontology could be semantics. It is important to pay attention to how other people are using the concepts. He believes that an important role of the scientific community is to facilitate communications among the disciplines and between the disciplines and the policy, intelligence, government, and current administration and congressional groups. He added that if something is incomplete, that does not mean it is wrong. He also pointed to the importance of level of analysis—for example, a change in the number of children counted as proficient is not the same as a change in the number of those whose proficiency has changed.
Robert Hauser underscored the importance of persistence in getting a measure accepted. Measurement breakthroughs can take a long time. The fundamental measurement work that showed how old the universe is and that it is expanding was based on measurements that began in 1974. Another example is Measuring Poverty, the 1995 National Research Council report. It has persisted and perhaps may yet have the kind of effect that was originally intended. There was recently action in Congress to move it forward, championed by Mayor Michael Bloomberg in New York. A third example is the addition of occupational mobility questions to the Survey of Income and Program Participation (SIPP). Despite initial sentiments that there was no national interest in measuring social mobility, Hauser and his colleagues succeeded in adding three questions to the SIPP on this topic. He believes it is time to try again to add more questions.
To move ahead, Matthew Snipp recognizes the need to decide what can be and what should be standardized. Even if standardization is not possible, harmonization might be, especially across time and space. He called for the participation of another set of actors (representatives of statistical agencies, the Association of Public Data Users, and the Council of Professional Associations on Federal Statistics, among others) who have a direct interest in the production of federal statistics and are proactive in making their views known. Bohrnstedt agreed that harmonization could be possible when standardization is not. He noted, for example, that in the National Center for Education Statistics, various measures of social class or socioeconomic status are used. He welcomed greater efforts by U.S. statistical agencies to harmonize measures across agencies at a given point in time, so that different statistical agencies, or different units in the same statistical agency, are not measuring the same construct or concept in vastly different ways.