Social Science Constructs
The second day of the workshop began with a session on the theory of measurement and the identification and integration of three important aspects of standardization: ontology, representation, and procedures. A number of social science constructs were examined to better understand when standardization of a scientific concept makes sense. The examples illustrate a number of reasons for the lack of a standard measure: paucity of scholarly interest, balkanization of fields, sparse data, and politics. Consideration was given to rethinking incentives for researchers to work collaboratively on common measures that then improve and extend discourse.
THE THEORY OF MEASUREMENT
Norman Bradburn (National Opinion Research Center, University of Chicago) began his presentation by defining measurement as the assignment of values in a systematic and grounded way for some practical purpose. Toward this end, three aspects are paramount: (1) ontology—a definition of the quantity or category that identifies its boundaries, fixing what belongs to it and what does not; (2) representation—a metrical system that appropriately represents the quantity or category; and (3) procedures—rules for applying the metrical system to produce the measurement results. All three must mesh properly to realize useful and proper measurement.
Beginning with the issue of ontology, Bradburn distinguished between two kinds of concepts. The first are the more traditional, scientific concepts that refer to specific features, such as age, minimum wage, etc. The second are “Ballungen” concepts that sort things into categories based on a loose
set of criteria in which the members of the same category do not share any specific set of features but rather have what Wittgenstein referred to as “family resemblance.” Such concepts are conglomerations with less precise boundaries, such as happiness, prestige, social exclusion, and the like.
Definitions depend on their purpose. Bradburn recalled Pollak’s mention of disability and marital status as examples of concepts that could be defined for a scientific use in order to fit into a theory or be used to make predictions, or they could be related to policy needs or social descriptive purposes. He said that concepts can be characterized by explicit definition (e.g., formulas, such as income = consumption + savings), by implicit definition (e.g., from scientific uses or attempting axiomatic definitions), or by operational definition (e.g., IQ). The usual trade-off with respect to common metrics is between the accuracy of characterization and the purpose and breadth of applicability.
Once there is a definition, the next concern is that the representation matches the concept. Thus, concepts referring to specific features like age or income to some extent can have single-value functions that measure the values of concern. However, Ballungen concepts are often measured by indicators or indices. It is often difficult to do much more than simply count up different indicators, unless some mathematical structure can be imposed on them. Measurement procedures may combine variables with different underlying relations to other concepts (e.g., happiness and satisfaction). Bradburn observed that one of the tensions in the social sciences is that the more one refines a concept and the more precise one tries to make it, the more one may lose some of the associations and original meaning, and comparability across uses may suffer. To consider large numbers of indicators over time, one ends up reducing or weighting them. Where the weights come from is of crucial importance to the validity of the measure. Bradburn saw the need to address these issues of narrowing and redefinition if a particular set of indicators are to be used for prediction or explanation.
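The weighting problem Bradburn raises can be made concrete with a small sketch (a hypothetical illustration in Python: the indicator names, scores, and weights are invented for the example and carry no claim about how happiness should actually be measured):

```python
# Hypothetical sketch: reducing several indicators of a "Ballungen"
# concept to a single weighted index. The indicator names, scores,
# and weights are invented; in practice, where the weights come from
# is exactly the validity question Bradburn raises.

def composite_index(indicators, weights):
    """Weighted average of indicator scores (each on a 0-1 scale)."""
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must cover the same items")
    total_weight = sum(weights.values())
    return sum(indicators[k] * weights[k] for k in indicators) / total_weight

scores = {"life_satisfaction": 0.8, "positive_affect": 0.4, "purpose": 0.6}

# Two weighting schemes applied to the same scores give different
# index values -- the measure depends on the chosen weights.
equal = {k: 1.0 for k in scores}
heavy = {"life_satisfaction": 2.0, "positive_affect": 1.0, "purpose": 1.0}

print(composite_index(scores, equal))
print(composite_index(scores, heavy))
```

The point of the two weighting schemes is simply that the same underlying scores yield different measured values, so the validity of the composite rests on the justification for the weights.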
He turned next to two aspects of procedures. One is accuracy, getting the true value of what one is trying to measure; the other is precision, getting a narrow range of estimation. In the social sciences, researchers do not do much with instrumentation. The issue he identified is whether survey questions actually measure what one thinks they are measuring. He observed that there is no gold standard for almost all measures of concepts of interest to social scientists. However, in psychology at least, this problem was addressed years ago using the multitrait-multimethod approach—that is, using different measurement modes and different aspects of the concept to measure something in different ways, which all roughly converge on the same answer. Such empirical regularities strengthen the view that the measurement is correct, particularly if it is for scientific purposes, and they help to ensure that different procedures measure the same thing.
Cartwright and Bradburn (2010) proposed a number of general rules, including that the procedures need to be consistent with definitions of the concept and the particular representation of them. Empirical regularities are central to this. Cartwright added that procedures are a way of zeroing in on the concept to be defined. Most procedures are situation specific; many procedures zero in on the concept in different ways. In a new context, the linkage between concept and procedures may not hold.
One of the problems with Ballungen concepts is that the measurement procedures may violate the commonsense understanding of the concept. Bradburn considered unemployment to be a good example of this, because the way in which it is actually measured seems to violate the commonsense understanding of unemployment (in that it removes discouraged workers from the denominator). He emphasized that the subjective component can be very important. The meaning of “looking for work” is somewhat ambiguous, especially for youth. In the Current Population Survey, the report on youth behavior often comes from the parent, and the parent’s view about whether a child is looking for a job could differ from that of the child.
Another often-used measurement procedure is combining different variables and questions. Bradburn cautioned that it is important to assess whether the underlying relationship of those variables to other factors is the same. As an example, he has found that the concepts of happiness and satisfaction have different relationships with age. Yet in the literature to date, happiness and satisfaction are treated as if they were the same. In fact, they are related in different ways to underlying concepts.
Bradburn continued that the concepts with different procedures can suit different purposes. Measures of quality of life, even the ones from the Patient-Reported Outcomes Measurement Information System, are different for different purposes. Particularly with respect to policy-related indicators, the explicit values become an important part of the measures. These indicators, if adopted for a considerable time, become very difficult to change, because some groups have been advantaged by one set of procedures, without necessarily having a scientific basis for the choice. Values and value implications are hard to eliminate.
The kind of distinctions made in Cartwright and Bradburn (2010) have three major implications. First, common metrics are possible and desired if the definitions, representations, and procedures are all well specified and appropriate. Second, when concepts are used for different purposes, so that the definitions, representations, or procedures are different—or all of the above—then there will be difficulty getting to common measures. Third, many policy-related social science concepts lack a firm scientific or theoretical basis for their definition, and often their definitions depend on values.
The varying purposes for which they are used make common measures very difficult, if not impossible.
MEASURING POVERTY: THE QUESTION OF STANDARDIZATION
Robert Michael (University of Chicago) discussed the measurement of poverty in terms of the advantages and disadvantages of standardizing a scientific concept. He began by reviewing the measurement of poverty—how it is done and whether or not science is involved. He then reflected on lessons learned from the fact that, for the past half-century, the United States has had an officially sanctioned standardized measure of this particular construct, which forms the basis of many programs. He traced five steps to measuring poverty:
1. Choose a concept of poverty. It can be a relative or an absolute concept. Science can provide guidance about the concept, but it cannot settle the choice between relative and absolute. It can explain the implications but not distinguish right from wrong.
2. Select a unit of observation or analysis—individual, family, or household. The individual is probably the best unit for measuring poverty, because utility and well-being are generally individualized notions. However, individualized metrics of poverty are not conventionally seen. Most use family (connected by blood or contract) or household (everybody living under one roof and pooling resources).
3. Determine the poverty threshold level and decide how to adjust that level across units, time, and location. Acceptable equivalents across units must be determined, and this typically is based on some kind of underlying understanding of the science involved.1 Adjustments also must be made over time: prices change, the consumption bundle underlying the notion may change, and the product and the social norms may change. Adjustments for region or location may be required if prices vary by geography.
4. Determine what resources to include. Theory or science may call for consumption as the appropriate concept to measure, but because there are often too many public goods for which consumption is impossible to capture, expenditures or income are often used for practical purposes.
5. For each unit, compare the threshold to the resources; if the threshold is higher, that unit is "in poverty," otherwise not.
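The comparison in the final step can be sketched as follows (a simplified, hypothetical illustration: the base threshold and the equivalence-scale exponent are invented values for the example, not official figures):

```python
# Simplified sketch of the final step: compare each unit's resources
# to a threshold adjusted for unit size. The base threshold and the
# equivalence-scale exponent are illustrative assumptions, not the
# official U.S. poverty thresholds.

BASE_THRESHOLD = 15000.0   # hypothetical threshold for a one-person unit
SCALE_EXPONENT = 0.5       # hypothetical equivalence-scale parameter

def adjusted_threshold(unit_size):
    """Scale the one-person threshold for larger units."""
    return BASE_THRESHOLD * unit_size ** SCALE_EXPONENT

def in_poverty(resources, unit_size):
    """A unit is 'in poverty' when the adjusted threshold exceeds its resources."""
    return resources < adjusted_threshold(unit_size)

print(in_poverty(20000.0, 1))  # one person with $20,000: above the threshold
print(in_poverty(20000.0, 4))  # four people with $20,000: below the threshold
```

The sketch also shows where Michael's earlier steps bite: the choice of resources, the equivalence scale, and the threshold itself all enter before the final comparison is made.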
Michael observed that science can provide much guidance on many but not all of these points, and it depends on the purpose of the measure. He identified three purposes for which a poverty measure is needed: (1) as a scientific measure of economic deprivation, (2) as a measure of social compassion, and (3) to determine eligibility for social programs. Standardization makes sense for the third purpose because of the importance placed on equitable treatment in eligibility. For the first two purposes, Michael does not believe that standardization necessarily makes sense.
In his view, politics and vested interests explain why it is so difficult to shift away from the use of a clearly imperfect poverty measure. Any time a scientific measure translates into policy, politics will trump science, he said. Poverty is one of those issues that affect the allocation of funds, so it is understandably of immense interest to politicians. He pointed as an example to a major National Research Council (NRC) effort that tried to uncouple poverty measurement from program eligibility (National Research Council, 1995); the report, Measuring Poverty: A New Approach, has never gained traction, despite being a good idea.
Michael closed by listing a number of lessons learned related to standardization:
If the science does not suggest a consensus, imposing one cannot be expected to achieve consensus. It is not worth the effort to pursue standardization if it is not needed. One risk of unnecessary standardization is that weaknesses get codified and reinforced over time.
Competition in general is good. Others will adopt what is seen as the better measure. For example, national income accounts have been adopted because they are a good idea. This is also true of the “earnings function” of Jacob Mincer. It too became the standard because it won the competition of ideas and because of its clarity and feasibility.
A community of scientists who are freely cooperating powers scientific discovery. Each person, acting on his or her own initiative, acts to further the entire group’s achievements (see Michael, 2010).
A NATIONAL PROTOCOL FOR MEASURING INTERGENERATIONAL MOBILITY?
David Grusky began his presentation by observing that there is no standardized measure for intergenerational mobility largely because of the paucity of scholarly interest in standardization, the balkanization of fields, and sparse data. Much academic research on intergenerational mobility is conducted in economics and sociology, quite independently and separately from one another. Economists are focused on economic standing and economic mobility; sociologists are focused on occupations and social mobility. This balkanization of fields may be precluding the rise of a standardized measure for intergenerational mobility. Researchers have to date been more focused on the science itself and moving the academic debate within their own disciplines. In addition, he argued, the data are not available to carry out the study of mobility in any compelling way. The paucity of data has led to a “cacophony of very clever models,” a situation that does not lend itself to the rise of a single standardized approach.
In each of the two disciplines, there is some amount of infighting, Grusky observed. In economics, the concept of economic standing is seen as important, but there is debate about how to operationalize it. In sociology, there is consensus on how to measure occupation, but there is debate about how best to understand occupational mobility and what it means about the social world.
In economics, the preferred method is calculating the intergenerational elasticity of income, but its calculation has been hampered by small sample sizes and measurement error. The consensus view is that there is insufficient sample size in the Panel Study of Income Dynamics and the National Longitudinal Surveys to reliably glean trends, and there are not enough repeated observations of income. These deficiencies have generated two cottage industries to provide tabular analyses of income mobility (based on quintiles) and wealth mobility.
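The elasticity referred to above is conventionally the slope coefficient from regressing log child income on log parent income. A minimal sketch, using invented data (the incomes below are illustrative, not drawn from any survey):

```python
# Minimal sketch: the intergenerational elasticity of income is the
# slope beta in  ln(y_child) = alpha + beta * ln(y_parent) + error.
# A beta near 1 implies low mobility; near 0, high mobility.
# The income pairs below are invented for illustration.
import math

parent = [30000, 45000, 60000, 80000, 120000]
child = [35000, 40000, 70000, 75000, 110000]

x = [math.log(v) for v in parent]
y = [math.log(v) for v in child]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Ordinary least squares slope: cov(x, y) / var(x)
beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
       sum((xi - mx) ** 2 for xi in x)
print(round(beta, 2))
```

The small-sample and measurement-error problems Grusky describes are visible even here: with only a handful of noisy parent-child pairs, the estimated slope is highly unstable, which is why the panel surveys' limited samples have been such an obstacle.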
In sociology, occupation is considered an omnibus extra-economic measure of social position, comparing, for example, the occupation of fathers with that of sons or daughters. Perhaps the most compelling argument on behalf of an occupational operationalization of mobility is that it embodies information about where an individual stands in the social world. It signals the skills and credentials (and hence life chances) of the individual, socioeconomic status and prestige, consumption practices and leisure activities, and the social and cultural milieu in which he or she lives.
Grusky considered it a potentially useful division of labor for economics to focus on economic mobility and for sociology to focus on social mobility. This permits examination of the extent to which the social worlds in which people find themselves are the same as those in which their parents
find themselves. Both sociology and economics are focused on the economic standing of individuals and how it is transferred from one generation to the next. However, one could take a narrower interpretation of occupational income as a measure of permanent income, so that the annual variations in income one observes could be seen as noise centering on the occupational mean.
Another line of debate in sociology is about how the reproduction of social standing from one generation to the next is secured. Grusky described three types of reproduction, each with its own subtradition of analysis:
Gradational form—parents pass on a hierarchical position (i.e., amount of resources) associated with a particular occupation. Children of parents with many resources (social, cultural, economic) end up in good occupations; children of parents with few resources fare less well.
Big-class form—children inherit a big class of origin (e.g., children of professionals become professionals) with associated cultures, networks, and skills. Class-specific resources are transferred from one generation to the next, which would raise the probability of class reproduction. Two big classes of the same overall desirability (e.g., proprietors, nonmanual laborers) do not convey identical mobility chances.
Micro-class form—children benefit by resources or perspectives quite specific to the detailed occupations that parents might have. For example, the attack on the World Trade Center might generate family discussion about motivation and cultural differences in a family of sociologists, but discussion about structural integrity and construction materials is more likely to occur in a family of engineers.
Putting aside narrow-gauge methodological problems for now, Grusky underscored the primary need to overcome two main structural obstacles to developing a national protocol for measuring intergenerational mobility: the balkanization of economics and sociology traditions and sparse data. He sees value in maintaining both economic and sociological approaches to studying mobility. Economic position is distinct from occupation as an omnibus measure of social position. One obviously cares about how much money people have, but one also should care deeply about the social and cultural milieu in which they live and whether or not the milieu in which they grew up is also the one in which they find themselves as adults. This question is distinct from whether the economic standing of individuals is the same from one generation to the next.
Possible solutions to the sparse data problem include better surveys,
linking surveys to administrative records, and building exclusively and directly on administrative data, such as those from the Internal Revenue Service and the Social Security Administration. Grusky argued for the latter approach, because it would generate an extremely large data set that would facilitate cross-group comparisons, permit analyses of long-term income histories that better approximate permanent income, make detailed occupations available and linkable to those of dependent children, and provide data on family structure and (imputed) wealth. Of course, administrative records have limitations, but Grusky believes the quality of the data would improve over the long run if monitoring efforts depended on them.
He then discussed the merits of having a standardized measure of intergenerational mobility. Detractors argue that it would saddle the field with a problematic standard and suppress innovation. The alternative view is that some sort of national measurement system for monitoring mobility would in fact inspire more critical research. Whether more research is beneficial depends on the opportunity cost, that is, what other research is being squeezed out that is more important to pursue.
In discussing these presentations, Christine Bachrach observed that a number of concepts (e.g., marital status, social mobility, poverty) have been characterized as Ballungen. In some cases the concepts truly are not precisely defined, like happiness. But some of the others seem amenable to disaggregation into very precisely defined smaller components. In the case of marital status, it appeared to her that new meanings were being attached to a measure and a concept that is actually very precisely defined: marriage is a legal status, precisely defined by law. She questioned whether introducing such dimensions as living arrangements, relationship stability, and relationship status into marital status might lead to a definition that is unnecessarily imprecise.
Nancy Cartwright responded that, for many concepts, it is certainly possible to provide more precise definitions, which is necessary for making scientifically defensible comparisons and tracking changes. Bradburn added that the more one defines a concept precisely for scientific purposes, the further it can depart from its originally intended meaning and the rich everyday concept that people think it means. On one hand, with respect to poverty, Cartwright suggested, it might be more helpful to simply have the array of poverty definitions available if the ordinary concept of poverty is not described properly by any single one of them. On the other hand, on specific occasions one of the definitions might be the right one to use.
David Johnson (U.S. Census Bureau) raised a question about the lack of a single accepted disability measure. He observed that there is a disability
rate to evaluate health outcomes, a disability rate to evaluate employment outcomes, and a disability rate to evaluate adherence to the Americans with Disabilities Act. Bradburn responded that there would be different measures depending on the purpose, that there may not be one perfect measure. Michael endorsed the idea of increasing transparency and clarity by posting a whole range of estimates and letting analysts pick the right one for their purposes.
MEASURING AND MODELING OF SELF-REGULATION: IS STANDARDIZATION A REASONABLE GOAL?
Compared with the concepts of poverty and intergenerational mobility, Rick Hoyle (Duke University) observed, the concept of self-regulation has no apparent consequences for politics, at least at this point in time. The implications of standardization and the adoption of a common metric would in this case have far more to do with the accumulation of evidence in the progress of science than with policy. As a social psychologist, Hoyle had not really considered the likely payoff of, or the impediments to, a shared understanding of how such things might be measured. In fact, the field of social psychology is more likely to place value on originality and creativity in developing alternative ways to measure concepts. He has discerned not even a hint of movement to standardize the measurement of self-regulation. Instead, he approached his presentation as a thought exercise, asking whether there is value in moving toward a common understanding of the construct and how it should be measured.
Hoyle described self-regulation as a relatively new construct that has become of increasing interest from both a scientific and a lay perspective, and it will become increasingly important, for example as an education policy topic. He dated empirical research on the topic back to the late 1960s, with the first bona fide theoretical model appearing in 1972. Self-regulation is primarily a topic of study in social psychology, with applications in clinical psychology/psychiatry, education, and increasingly other areas that relate to goal-directed behavior—for example, a general theory of crime, lack of self-control, health behavior, sport, and delinquency. There has been a rapid increase in use of the construct, currently accumulating at a rate of about 120 published articles per year. As evidence has accumulated, social psychologists have begun to pull together handbooks that summarize the state of the art, with a total of 114 chapters published in the last 10 years on the topic of self-regulation.
He attributed the increased interest in part to a number of developments that exemplify lack of self-regulation: (1) the significant amount of U.S. consumers’ revolving credit debt, (2) rising obesity rates, and (3) the recent economic crisis, which is attributable in part to excessive borrowing
and lending and high-risk investments made with little or no concern for potential long-term consequences.
It is difficult for Hoyle to imagine how he might have a measure without a model. However, it is very clear to him that there is no commonly accepted model of self-regulation at this time. Although there is currently no consensus regarding even its definition, a working definition of self-regulation might be the various means by which human beings manage themselves, including the following:
Attention—the degree to which one is able to stay focused on an important task in the face of distraction;
Cognition—the degree to which one is able to produce positive thoughts or suppress negative thoughts when distressed;
Motivation—finding the will to continue in the face of challenge and stopping when continuing is unlikely to produce a desired outcome;
Emotion—seeking or prolonging pleasant emotions and resisting or quickly banishing unpleasant emotions; and
Behavior—for example, declining a second helping of food when it is offered, going to the gym when it is inconvenient or requires sacrificing preferred behavior.
In each of these systems, Hoyle noted two conceptual distinctions: first, the idea of self-stopping versus self-starting and, second, the idea of deliberate versus automatic actions.
Hoyle next provided evidence of the predictive potency of self-regulation from three research studies. Building on earlier studies of children's ability to self-regulate by delaying gratification, Walter Mischel and colleagues (1989) found that preschool delay time predicted a number of fairly consequential outcomes, including academic and social competence, coping ability, and personality characteristics in adolescence (e.g., greater attentiveness, planfulness, and reasoning ability). Caspi and Moffitt's large-scale birth cohort study revealed that children who were considered "under-controlled" at age 3 were, at age 18, high on impulsivity, danger-seeking, and various other traits related to poor self-control; at age 21, some 18 years after their initial assessment, they were more than twice as likely as their counterparts to engage in a variety of problem behaviors. Finally, James Heckman's research on early deficits in self-regulation found that they translate into reduced personal, social, and economic productivity in adulthood. Heckman posits that early childhood investments that narrow the gap in noncognitive abilities can offer a ninefold return on investment, yielding a 15-17 percent increase in adult economic productivity and making a compelling case for early intervention.
In the continuum between metric diversity and common metrics, the concept of self-regulation is clearly in the direction of metric diversity. In the literature, one finds most data generated by small-scale experiments and three types of measures of self-regulation in use: rating scale measures,2 personality inventories, and measures derived from behavior.
The advantages of rating scale measures include their specific focus on self-regulation and their frequent use of multiple subscales that allow for fine-grained assessment of the construct. Personality inventories, generally for adolescents and adults, were not originally designed to measure self-regulation, but they often include subscales addressing it (conscientiousness and constraint being two clearly relevant personality dimensions) that are so widely used that normative data are typically available. Apart from these normative comparisons, neither the rating scale measures nor the personality inventories have inherent meaning. Both require self-reports and are generally suitable only for adolescents and adults. Hoyle took issue with the reliance on self-reports, given the evidence that people are poor at reporting their own mental states, and with the resulting inability to track self-regulation over the life course beginning at much earlier ages.
Measures derived from behavior are typically generated in small-scale controlled experiments. Examples include duration of self-imposed delay, control of emotional expression when exposed to emotion-invoking stimuli, pain tolerance, and inhibition of interference. These measures offer a number of advantages, including their reliance on observable behavior (i.e., self-reports are not required) and the facts that situations can be devised that generate scores even for young children, and that the metrics often have inherent meaning (e.g., time, number of attempts). However, there is no generally accepted paradigm, behaviors are likely to reflect other constructs in addition to self-regulation, and there are no manipulation checks. As a result, Hoyle stated, it is difficult to know whether a finding should be attributed to self-regulation or to some other construct that one has unwittingly manipulated.
Hoyle’s review of current measurement approaches indicates that no existing measure stands out as particularly promising for developing a standardized metric. Rather than “habitual measurement” and “seductions of theory,” Hoyle saw the concept of self-regulation characterized by ad hoc measures and “seductions of novelty.” Social psychologists gain recognition when they coin a new term or develop a measure that is somehow different from what is currently in the books. This culture works against standardization and common metrics. It also is not clear what the form of a standard measure should be (e.g., global, domain-specific), nor which quality of self-regulation matters most (e.g., capacity, style, capability). The focus has been on process rather than classification.
He summarized what he considered features of a desirable metric. It must be intuitive, that is, phrased in terms that have inherent meaning. The units should have a basis in commonly accepted reality, so that change can be expressed in meaningful units. And finally, the metric must have the same meaning across the range of characteristics on which comparisons would be made (e.g., preschool to adulthood). He saw a number of advantages to standardization: (1) results across studies and research programs could be compared, (2) empirical evidence would accumulate more readily and quickly, and (3) the construct might be more likely to be assessed or discussed routinely outside the academy, thus drawing social psychologists more into discussions of social issues and into informing policy development and evaluation.
Hoyle recognized that there are many reasons why standardization may not be a good idea at a particular time. When no measure is a candidate for widespread use, the use of multiple measures can help to triangulate a construct and test the robustness of effects across operational definitions. He also appreciated the benefits of mid-range models, that is, models that spring up for different reasons and are not really trying to serve as a comprehensive explanation for self-regulation. He feared that standardization might thwart this, because it would be unlikely that a single measure would map onto and satisfy the needs of every given approach to thinking about the construct. A standardized approach might also shift examination away from process, which he thought would be a mistake at this point in the history of the construct. As evidence accumulates, models can be integrated, trimmed, and simplified.
Hoyle drew a number of lessons from his review of self-regulation measures:
Standardization does not seem necessary for a research literature to thrive or for research to attract funding.
Without convergence on a common model or set of prominent features of the construct, there can be no convergence on a common metric.
Pressure to standardize measurement at this time would stymie research on process, continued refinement of the construct, and operational definitions.
Without standardization or a common metric, the construct rarely enters into discussions of social issues and social policy.
Although attempts at standardization would be premature, there are advantages to working toward standardization and a common metric while allowing metric diversity to continue.
Rebecca Maynard (University of Pennsylvania) said that the presentations in this session collectively have done a good job of modeling what is often desirable, and sometimes not, about common metrics in the social sciences. They also illustrated for her the limitations of moving too quickly to common metrics. She then made a number of observations.
She first observed that even when the science and technology for developing common metrics exist, there is a time and place for them. Cartwright and Bradburn (2010) laid out a three-step process of defining what is to be measured, selecting the metric for measuring it, and applying the metric. These same steps are also the gatekeepers demarcating readiness for common metrics. The current poverty index came about because there was a readiness—a need in the war on poverty, a ready metric, and an ability to apply that metric. There has been little progress in changing this measure—despite very good work by the NRC and other researchers demonstrating the pitfalls of the current measure and better ways to measure poverty—in large part because of inertia, but also because there has been no compelling reason to adopt an alternative.
One of the areas in which Maynard hopes common metrics will be developed is what she termed 21st-century skills: skills needed to improve both the labor market readiness of those at the bottom of the skills distribution and national productivity. Such vocational skills include aspects of self-regulation (or social competence), the ability to take direction, and reading. It seemed to her that the research literature may provide a strong foundation for understanding what to measure, as well as the psychometric capacity to develop such a metric. Although none of the papers explicitly cautioned against creating common measures “before their time,” she believes that the papers by Grusky and Hoyle came close.
Maynard next observed that there is a temptation to clump concepts—the things to be measured—under neat labels and to want common measures for them. In some cases, she surmised, consensus and utility might be much quicker to achieve for narrower concepts. In each of the three domains considered in this session—poverty, social mobility, and self-regulation—the concepts to be measured could well be context specific. She noted Michael’s point about why different definitions of poverty might be needed or different measures advantageous if the intent is to apply the measure cross-nationally. The concept of poverty also might differ if the focus is on children, prime age adults, or the elderly. Similarly, she noted Grusky’s
compelling examples of the theoretical and practical implications of different definitions of social mobility. For example, what to measure and the appropriate metrics would be different for understanding and comparing social status and relationships intergenerationally than if the purpose is to monitor and promote equal opportunity in education or economic welfare.
In Maynard’s view, Hoyle made a convincing case that for self-regulation there is neither a compelling need for a common metric nor any likelihood that a single measure would ever suffice. The concept of self-regulation varies with age, setting, and goal. It is an umbrella concept that, for scientific, political, and practical purposes, would probably need to be greatly refined and tailored to the intended use.
For Maynard, one of the implications from this meeting is that it would be desirable to embark on a strategy of encouraging and facilitating the use of common metrics in cases in which well-established, meaningful metrics exist or in which such measures could be constructed and made accessible with reasonable effort. This could take the form of doing a better job of ensuring that good metrics are well defined, that they have established psychometric properties, and that the means of applying them are in the public domain. Royalties for the use of measures would be a deterrent to adoption, regardless of their quality.
Maynard also shared three smaller observations:
• The process of developing common metrics will be facilitated by encouraging the adoption of common items (anchor items) that can provide cross-walks across studies that intentionally use different measures of similar constructs—for example, because their contexts or purposes differ or because good measures are still under development.
• Greater use of “linking” studies could and should be encouraged when there is an interest in comparing across studies or data sets that use different measures of purportedly the same construct, like poverty or social mobility.
• It may be necessary to change the incentive structure for the scientific community to discourage the creation of new measures for the wrong reasons, such as to advance a professional career or for financial gain. More thought needs to be given to rewarding researchers for replicating and extending existing measures and to the relevance of the measures and the metrics they use.
Sheila Jasanoff (Harvard University) began the discussion by asking whether there is benefit to thinking about standardization itself as being
on some sort of conceptual sliding scale. There seems to be a gradation in the level of social articulation at which a concept, construct, or ontology develops. She asked whether different conceptual unpacking could be employed to avoid using a single word across very different domains of the social sciences and their relationship to policy. One might also think of standardization as potentially a form of social production or reproduction that relates to the evolution of the construct itself.
Grusky said that it is important to know how a particular construct is being used in public discourse (e.g., social mobility) and the way in which the science itself has proceeded. For social mobility, the field has recognized that the concept is best understood in a more disaggregated form. He believes it is possible to demand precision in the scientific context by recognizing that there are quite distinct and important types of mobility, all of which should be monitored simultaneously and operationalized in a credible way and also combined into a single model in order to tease out the relationships among different types.
Robert Pollak commented on the idea of deconstructing concepts into more distinguishable pieces. For example, he found it interesting to consider two distinct concepts inherent in self-regulation—self-regulation of attention and self-regulation of behavior—that might be measured separately. He cautioned against standardization if it means imposing a unitary or dual construction from the outside in a bureaucratic way. In Grusky’s view, standardization may be seen as a kind of correct representation of the simultaneous consideration of constructs and measures that are now independent.
Turning to the notion of intergenerational mobility, Pollak observed that much of the early literature on intergenerational mobility assumed that people were raised in two-parent families, and the main focus was on transmission from fathers to sons. This formulation is no longer appropriate in the context of changing family structures, for example the growing prevalence of female-headed families, nonmarital fertility, and the effects of immigration. Grusky agreed with Pollak on the importance of factoring in mother’s income and occupation; ignoring mother’s occupation will result in profound misunderstanding about the direction of the trend in intergenerational mobility in the family.
Pollak also remarked that although there is no standardization between economics and sociology, the collection of data essentially involves choices about which questions to ask. In collecting income and occupational data, there is no requirement that the users of the data must focus on the occupations piece or the earnings piece. Agreeing on the type of data to collect could be another way of promoting common metrics.
Robert Hauser returned to the issue of self-regulation. He stated that the economists’ original notion of ability in human capital was a very global concept: whatever was left over in the psychology of individuals. The arrival of easily accessible data from IQ tests created a huge market for the use of IQ as the “ability” in economic models of education and of educational and economic success. This resulted in a dominant line of interest in the consequences of cognitive ability. There is now a research program centered in Scotland looking at the correlation between IQ and mortality (which appears all over the world), but there is nothing in the literature that explains why the correlation occurs. Hauser argued that it is exceptionally important to have a few widely accepted measures of self-regulation. In the Wisconsin Longitudinal Study, Hauser has examined the IQ-mortality relationship over a span of 52 years from ages 18 to 68 and found the expected relationship, which he attributes to a simple explanation: the effect of IQ is completely mediated by rank in high school class, which he believes is closely tied to self-regulation, conscientiousness, dependability, and other regularities in behavior. He further argued that there is a compelling public interest in getting the story straight, and that doing so requires widely accepted metrics. He noted that the same was true years ago of social standing and occupational standing. Rather than novelty, he believes that something socially useful, which helps to pin down narrowly defined cognitive measures, will make a difference in people’s lives. Hoyle agreed with Hauser but was not clear how to move to a widely accepted measure of self-regulation.
Rick Moser (National Cancer Institute) was intrigued by the idea of creating incentives for the use of standardized measures. A psychologist by training, he understands the rewards for innovation in his field but expressed concern that, as a result, psychology has suffered in the building of cumulative knowledge. The National Cancer Institute is creating a tool to facilitate standardization and has wrestled with how to create incentives for the use of standardized measures, especially in light of the competing rewards working against this. He recognizes that some constructs and associated measures are not ready for standardization, but he questioned at what point refinement needs to stop and use begin.
Maynard sought to discover ways to encourage people to start with the best, most relevant measure, improve on it using new data, and ultimately create cross-walks between studies. She also encouraged making data sets publicly available after publication. Funding agencies can help by requiring that contractors and grantees draw on what exists or justify why they need to deviate. Widespread adoption of measures is more likely if the measures are publicly or readily available. Maynard said she is aware of a major ongoing initiative of the Department of Education for a compendium of measures; other federal agencies also support similar efforts.
George Bohrnstedt also thinks that federal agencies can be influential in pushing for cooperative agreements and use of common measures. Hoyle observed that the problem can be one of framing, not just incentives. Once
the frame shifts from the impact of “my” work to the impact of “our” work, then there must be some agreement on what we are doing, why we are doing it, and how we are doing it. Moser observed that this type of effort is challenging because it requires an altruistic stance on the part of the field.
In thinking of the criteria for standardization on one hand, and the coherence and robustness of the metric on the other, Geoff Mulgan pointed to the need for some assessment of how the standardized metric will be used and also the cost of not having a standardized metric. He supplied three examples that follow from the comments above.
Social mobility is at the moment very politically contested, in the United Kingdom and in other countries, because of cross-national studies appearing to show deceleration or stagnation of social mobility. However, there is no agreement about the appropriate statistics and their meaning, and this is impeding basic democratic debate about what society should do about the issue. Even an imperfect indicator can be important to allow a society to have a competent discussion about proper actions to take.
There is a traditional materialist bias in all the poverty measures that no longer reflects what poverty really means in essentially abundant societies, in which social support and psychological needs matter as much as material needs. This disjuncture makes it difficult for society to have a serious conversation about what should be done about need, and it undermines the legitimacy of actions that appear to follow from the measures. Again, Mulgan would rather have a good-enough set of reasonably widely agreed-on measures than perfect agreement on a measure that does not fit with the underlying public discourse on the issues.
He is involved in setting up a network of schools that emphasizes the development of social intelligence, self-regulation, and cognitive skills. The effort must demonstrate success to a very metric-focused school system. There is an urgent need for a good-enough metric, which may be one or two measures of self-regulation. The school system cannot wait 5-10 years for the perfect metric. He called for consideration of the conditions under which it is acceptable to create measures that are imperfect but good enough.
Grusky contended that the case for standardization could be made more forcefully, particularly for social mobility. He noted that the Pew Charitable Trusts is supporting an economic mobility project and is actively publicizing the results. If its measures were made official, they could be better than good enough as standardized measures. Grusky believes such an effort could crystallize the best that can be found in the scientific community, and having a national mobility accounting framework would provide the impetus to go beyond good enough to a gold standard.
Turning to poverty measures, Michael vehemently disagreed with the notion that there is no compelling reason to adopt a better poverty measure. He believes the standard currently in use in the United States is embarrassing and illogical, and there clearly are many intellectually superior alternatives. In his view, the obstacle is not inertia but politics. He expressed frustration that Measuring Poverty, the NRC report from 15 years ago, has not gained much traction.
With regard to survey data activities, Michael supported the idea of linking to administrative records, since this could reduce costs by reducing survey time and could increase sample sizes.
Revisiting the distinction between standardization and harmonization, Michael viewed standardization as top-down and harmonization as bottom-up. In his view, people will adopt measures that work well for their purposes, and he favored reliance on competition in the marketplace of ideas. He emphasized that science is all about standardizations established among scientists, not imposed on them. He therefore did not see that the benefits of imposed standardization outweigh the costs—quite the contrary.