The social sciences comprise a vast array of research methods, models, measures, concepts, and theories. This appendix provides a brief overview of five common research methods or approaches and their assets and liabilities: experiments, observational studies, evaluation, meta-analyses, and qualitative research. We begin with a brief comment on cause and effect and close with a discussion of new sources of data.
To inform public policy, researchers often frame their studies in terms of causal conclusions and reason from an intervention to its intended outcomes. Many types of research methods, as well as statistical analyses, are used for this purpose.
Research that can reach causal conclusions has to involve well-defined concepts, careful measurement, and data gathered in controlled settings. Only through the accumulation of information gathered in a systematic fashion can one hope to disentangle the aspects of cause and effect that are relevant to a policy setting. Statistical methodology alone is of limited value in the process of inferring causation.
The literature on causality spans philosophy, statistics, and social and other sciences. Our use here is consistent with the recent literature describing causality in terms of counterfactuals, interventions or manipulation, and probabilistic interpretations of causation.
In the simplest study of an intervention, one group of subjects who receive the intervention (the treatment group) is compared with another group of subjects (the control group) who do not. When the control group receives no other intervention, it serves to depict the counterfactual: what would happen in the absence of the intervention. Many studies, however, are more elaborate and may involve multiple interventions and controls.
An experiment is a study in which the investigator controls the selection of the subjects who may receive the intervention and assigns them to treatment and control groups at random. Experiments can be conducted in highly controlled settings, such as in a laboratory, or in the field, such as at a school, so as to better reflect the context in which an intervention would be implemented in practice. The former assess efficacy, or whether the intervention produces the intended effect. The latter, called randomized controlled field trials (RCFTs), assess effectiveness, or whether the intervention produces the intended effect in practice.
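A minimal Python sketch of random assignment and the simplest analysis of a two-group experiment follows. It assumes equal-sized groups, and the function names `randomize` and `difference_in_means`, the subjects, and the outcome values are hypothetical illustrations, not part of any particular study.

```python
import random
from statistics import mean


def randomize(subjects, seed=None):
    """Randomly split subjects into treatment and control groups.

    The list is shuffled and the first half (rounded down) is assigned to
    treatment, so the two group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {s: ("treatment" if i < half else "control") for i, s in enumerate(shuffled)}


def difference_in_means(outcomes, assignment):
    """Estimate the average effect as mean(treatment) minus mean(control)."""
    treated = [outcomes[s] for s, g in assignment.items() if g == "treatment"]
    controls = [outcomes[s] for s, g in assignment.items() if g == "control"]
    return mean(treated) - mean(controls)


# Hypothetical data: every subject has a baseline outcome of 10 and the
# intervention adds 5; the control group depicts the counterfactual.
subjects = [f"s{i}" for i in range(10)]
assignment = randomize(subjects, seed=1)
outcomes = {s: 10 + (5 if assignment[s] == "treatment" else 0) for s in subjects}
print(difference_in_means(outcomes, assignment))
```

In this noise-free example the estimate equals the built-in effect of 5; with real data the difference in means is an estimate with sampling error.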
One important advantage of RCFTs is that randomization keeps secondary variables from confounding the effects of an intervention. In an ideal study, an investigator wants to compare the effects of an intervention on a treatment group that is as similar as possible to the control group in all important respects except for having received the intervention. But this ideal can be undermined by secondary or intervening variables: factors that are not of primary interest but on which the treatment group differs from the control group. Such factors can influence the outcome of an experiment and thus confound the effects of the intervention. In an RCFT, however, these secondary variables do not necessarily need to be controlled for in the design or the analysis: because random assignment balances them between the groups on average, it obviates even the need to identify them.
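The claim that randomization balances secondary variables without anyone identifying them can be checked with a small simulation. This is a hypothetical sketch: the covariate `x`, its weight on the outcome, the sample size, and the true effect of 2.0 are assumptions chosen for illustration. The analysis never uses `x`, yet across many replications the groups differ on it by roughly zero on average and the simple difference in means centers on the true effect.

```python
import random


def _avg(values):
    values = list(values)
    return sum(values) / len(values)


def average_imbalance_and_estimate(n=200, true_effect=2.0, covariate_weight=3.0,
                                   reps=1000, seed=0):
    """Repeat a hypothetical randomized trial `reps` times.

    Each subject has a covariate x, standing in for a secondary variable,
    that shifts the outcome by covariate_weight * x. The analysis ignores x
    and compares group means only. Returns the average between-group
    difference in x and the average estimated effect across replications.
    """
    rng = random.Random(seed)
    imbalances, estimates = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        order = list(range(n))
        rng.shuffle(order)
        treated, control = order[: n // 2], order[n // 2:]  # random assignment
        y = [covariate_weight * xi + rng.gauss(0, 1) for xi in x]
        for i in treated:
            y[i] += true_effect
        imbalances.append(_avg(x[i] for i in treated) - _avg(x[i] for i in control))
        estimates.append(_avg(y[i] for i in treated) - _avg(y[i] for i in control))
    return _avg(imbalances), _avg(estimates)


imbalance, estimate = average_imbalance_and_estimate()
print(f"average imbalance in x:   {imbalance:+.3f}")
print(f"average estimated effect: {estimate:.3f} (true effect 2.0)")
```

Any single replication can still be unbalanced by chance; randomization guarantees balance only on average, which is why the precision of an experimental estimate is reported alongside it.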
For many policy purposes, however, the effects of secondary variables are often critical, especially when the intervention is implemented as the result of a policy action. For this reason, the designs of RCFTs are often complex and incorporate individual differences among subjects and contextual variables so that their effects can be analyzed.
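One common way to build contextual variables into the design is blocked (stratified) randomization: subjects are randomized separately within each block, such as a school or region, so that effects can be compared across contexts. The sketch below assumes hypothetical blocks labeled `urban` and `rural` and made-up outcome values; it illustrates the idea and is not drawn from any RCFT described here.

```python
import random
from collections import defaultdict


def block_randomize(subjects, block_of, seed=None):
    """Randomize separately within each block (for example, each school).

    Half of each block (rounded down) is assigned to treatment, so every
    block with two or more subjects has both treated and control subjects.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of[s]].append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, s in enumerate(members):
            assignment[s] = "treatment" if i < half else "control"
    return assignment


def effect_by_block(outcomes, assignment, block_of):
    """Treatment-minus-control difference in mean outcomes within each block."""
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for s, g in assignment.items():
        groups[block_of[s]][g].append(outcomes[s])
    return {
        b: sum(d["treatment"]) / len(d["treatment"]) - sum(d["control"]) / len(d["control"])
        for b, d in groups.items()
    }


# Hypothetical trial: four subjects in each of two contexts, with a larger
# effect in urban settings (3) than in rural ones (1).
block_of = {f"{b[0]}{i}": b for b in ("urban", "rural") for i in range(4)}
assignment = block_randomize(list(block_of), block_of, seed=2)
baseline, effect = {"urban": 20, "rural": 10}, {"urban": 3, "rural": 1}
outcomes = {s: baseline[b] + (effect[b] if assignment[s] == "treatment" else 0)
            for s, b in block_of.items()}
print(effect_by_block(outcomes, assignment, block_of))
```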
Even for the most rigorously conducted RCFTs, however, the results from one setting may not generalize to all other settings. Consequently, it may be difficult to identify “what works” in different settings from just one RCFT. Moreover, RCFTs often require more time and expense than other approaches, which can limit their use, and ethical considerations may preclude them altogether.
Still, myriad RCFTs have been successfully conducted to inform social policy. The Digest of Social Experiments (Greenberg and Shroder, 2004) and its successor journal, Randomized Social Experiments, provide many examples.
Observational studies are nonexperimental research studies in which subjects or outcomes are observed and measured. If two groups are to be compared, the assignment of subjects between the two groups is not under the direct control of the investigator. Two types of observational studies are quasi-experiments (Campbell and Stanley, 1963) and natural experiments (see, e.g., Campbell and Ross, 1968). In the former, the investigator may manipulate the intervention; in the latter, it arises naturally. In neither type of study, however, does the investigator control which subjects receive the treatment. Observational studies can involve more than passively observing and analyzing data: for example, they may involve systematic measurement and aspects of “control,” such as manipulating the timing of an intervention applied to predefined, although nonrandomized, groups.
Because they do not involve randomization, however, observational studies may not control for the effects of secondary variables. Without experimental confirmation, the observed outcomes could be the result of any combination of a range of confounding factors. For example, subjects may be self-selected, such as students in a private school who are to be compared with students in a public school, or they may be selected by others in ways that leave the groups differing in characteristics, known or unknown, that may influence the outcome of the intervention. This possible influence is called selection bias. If there is selection bias, how the intervention affects the outcome for the treatment group in comparison with the control group must be described by a model, and that model will always include some assumptions. The model may or may not help with inference about what would have happened in a randomized experiment (see National Research Council, 1998). Moreover, the assumptions underlying the model may not be widely accepted in the scientific community.
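Selection bias can be illustrated by simulating the private-versus-public-school situation in stylized form. In this hypothetical sketch, an unmeasured covariate (think of family motivation) raises the outcome and also makes subjects more likely to opt into treatment; the probabilities 0.8 and 0.2, the true effect of 2.0, and the covariate weight of 3.0 are assumptions. The randomized comparison recovers an effect near 2.0, while the self-selected comparison is biased upward, to about 4.9 in expectation under these assumptions.

```python
import random


def estimated_effect(self_select, n=20000, true_effect=2.0, covariate_weight=3.0, seed=0):
    """Difference in means from one hypothetical study of n subjects.

    Each subject has a covariate x that raises the outcome. With
    self_select=True, subjects with x > 0 opt into treatment with
    probability 0.8 and the rest with probability 0.2; otherwise each
    subject is assigned to treatment by a fair coin flip.
    """
    rng = random.Random(seed)
    treated_y, control_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        p = (0.8 if x > 0 else 0.2) if self_select else 0.5
        treated = rng.random() < p
        y = covariate_weight * x + (true_effect if treated else 0.0) + rng.gauss(0, 1)
        (treated_y if treated else control_y).append(y)
    return sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)


print(f"randomized:    {estimated_effect(self_select=False):.2f}")
print(f"self-selected: {estimated_effect(self_select=True):.2f}")  # true effect is 2.0
```

No amount of additional data removes this bias; only a model of how subjects selected themselves, with its attendant assumptions, can attempt to correct for it.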
Observational studies, however, are valuable in revealing important associations and in guiding the formulation of theory and models. The observation of a single case can reveal unsuspected patterns and provide explanations for unmotivated forms of behavior. As put by Coburn et al. (2009, p. 1,121): “The in-depth observation made possible by the single case study provides the opportunity to generate new hypotheses or build theory about sets of relationships that would otherwise have remained invisible.”
Observational studies also serve many other important purposes for the use of social science knowledge as evidence for public policy. The country’s wide range of longitudinal studies, for example, provides much information to guide public policy, from the extent to which people save for retirement (information provided by the Health and Retirement Study) to what different types of social welfare program benefits are actually obtained by families living in poverty (information from the Survey of Income and Program Participation). Observational studies, together with historical studies, provide the rich context in which public policy can benefit society. This use may be their most important role.
Policies are typically implemented with large and highly heterogeneous populations. Even if a policy is based on carefully designed RCFTs or other studies, implementation beyond the confines of the original study population requires careful monitoring and evaluation to make sure that the results observed in the study hold in a larger context.
A researcher must always ask whether the new program is producing the same desirable outcomes in the general population as it did in the experimental setting. In the absence of a closely monitored implementation program, issues of measurement, interpretation, and purposeful or accidental deviations from a protocol inevitably creep in, with unpredictable effects on the outcome. Policies implemented in the general population may be rolled out without carefully planned designs or randomized allocation of units to treatments. Unless the policy is closely monitored during implementation, it may not even be known whether the intervention as originally devised is what was actually implemented.
Furthermore, the ultimate goal of a policy intervention may well be something to be observed in the future, when follow-up data may be difficult to obtain. For example, although some intermediate outcomes of a program to integrate addicts into the labor force—such as the proportion of participants who are drug free and are employed after a month of treatment—can be measured more or less precisely, it is much more difficult to determine that proportion a year after treatment. Moreover, even if one is able to obtain those data, how could one determine that the results are attributable to the program and not to other factors?
Today’s trend toward accountability means that anyone proposing a new policy or intervention is also expected to prove that the intervention will “work.” Thus, thinking about credible approaches to carry out evaluation studies is almost as critical as conducting the study itself. The principles of experimental design can play an important role, even for observational evaluation.
One approach, for example, is to compare a population before and after an intervention has occurred. As long as the study includes a well-defined reference group and as long as the investigator is reasonably certain that selection bias is not important, such studies can offer some evidence of the effectiveness (or lack thereof) of an intervention. Alternatively, an evaluation study can be planned as an RCFT, in which the goal is to understand whether the original conclusions about the efficacy of the intervention hold when other factors (e.g., the target population) are not exactly the same.
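When a before-and-after comparison includes a reference group, a common way to formalize it is a difference-in-differences estimate: the change in the treated population minus the change in the reference group. The sketch assumes that, absent the intervention, both groups would have changed by the same amount (the parallel-trends assumption), which is the kind of assumption the investigator must be reasonably certain of; the employment rates used are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              reference_before, reference_after):
    """Change in the treated population minus change in the reference group.

    The reference group's change stands in for the change the treated
    population would have shown without the intervention (parallel trends).
    If that assumption fails, for example because of selection bias, the
    estimate is biased.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    reference_change = mean(reference_after) - mean(reference_before)
    return treated_change - reference_change


# Hypothetical employment rates (percent) at two sites per group.
print(difference_in_differences([48, 52], [56, 60], [46, 50], [50, 52]))
```

Here the treated sites improve by 8 points and the reference sites by 3, so the estimated effect of the intervention is 5 points; a plain before-and-after comparison would have credited the intervention with all 8.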
Both experimental and observational studies can be used to evaluate the long-term effects of interventions. An example of such an experimental study is the work of Kellam et al. (2008) on the effect on behavioral, psychiatric, and social outcomes in young adults of a classroom behavior management program carried out when they were in first and second grades. An example of an observational study is the work of Goodman et al. (2012) on the effects of childhood physical and mental problems on adult life, based on an analysis of longitudinal data from the British National Child Development Study.
The evaluation and monitoring of an intervention as implemented is closely related to the more general concept of evolutionary learning, a process to explore how the outcome of interest responds to changes in the original intervention. Consider, for example, a new teaching method shown to be effective in a small class setting. Will it also be as effective when class sizes are large?
A critical aspect of evolutionary learning is the need to proceed in a highly controlled manner in order to understand which of several factors that can be varied, alone or in combination, influence the outcome. Alternatively, a sequence of experiments can be designed in which two or more factors are varied simultaneously according to a specified plan. In the absence of carefully designed sequential learning studies, it may be difficult to untangle how each of several factors under investigation affects the outcome.
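For two factors, planned variation of this kind can take the form of a 2×2 factorial design, in which each factor is set at a low (−1) and a high (+1) level. Using the teaching-method and class-size question as a hypothetical example, the sketch computes the two main effects and their interaction from the four cell means; the cell means are invented for illustration.

```python
def factorial_effects(cell_means):
    """Main effects and interaction for a 2x2 factorial design.

    cell_means maps (a, b), with each factor coded -1 (low) or +1 (high),
    to the mean outcome in that cell. Each main effect is the average
    outcome at the high level minus the average at the low level.
    """
    m = cell_means
    effect_a = (m[(1, -1)] + m[(1, 1)] - m[(-1, -1)] - m[(-1, 1)]) / 2
    effect_b = (m[(-1, 1)] + m[(1, 1)] - m[(-1, -1)] - m[(1, -1)]) / 2
    # Half the difference between the effect of A at high B and at low B.
    interaction = (m[(1, 1)] - m[(-1, 1)] - m[(1, -1)] + m[(-1, -1)]) / 2
    return effect_a, effect_b, interaction


# Hypothetical test-score means. Factor A: new (+1) vs. old (-1) teaching
# method. Factor B: large (+1) vs. small (-1) classes.
means = {(-1, -1): 70, (1, -1): 80, (-1, 1): 66, (1, 1): 68}
print(factorial_effects(means))  # (6.0, -8.0, -4.0)
```

With these numbers the new method raises scores by 10 points in small classes but only 2 in large ones, and the negative interaction (−4) captures that difference; varying one factor at a time from a single baseline would not reveal it.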
As in the case of evaluation and monitoring, a theoretical framework has been developed for sequential learning in studies in which the response of interest is an unknown and possibly complex function of a large number of inputs. The approach, often known as response surface analysis, was developed for engineering processes in the early 1950s by Box and Wilson (1951). The idea is to vary the settings of the input variables sequentially so that the response keeps improving.
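The sequential logic of response surface analysis can be sketched as the method of steepest ascent: run a small factorial experiment around the current settings, fit a first-order (linear) model, and move the settings in the direction the fitted slopes indicate. This is a simplified illustration under stated assumptions: the response function is hypothetical and, unlike in practice, free of noise, and the fixed step length with step halving is one convenient rule among many. The full methodology also checks for curvature and switches to second-order models near an optimum, which this sketch omits.

```python
import math


def fit_first_order(center, h, response):
    """Run a 2x2 factorial design around `center` and fit y = b0 + b1*z1 + b2*z2.

    Inputs are coded so that z = -1 or +1 means center -/+ h. For this
    orthogonal design, the least-squares slopes reduce to sum(z * y) / 4.
    """
    runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    ys = [response(center[0] + z1 * h, center[1] + z2 * h) for z1, z2 in runs]
    b1 = sum(z1 * y for (z1, _), y in zip(runs, ys)) / 4
    b2 = sum(z2 * y for (_, z2), y in zip(runs, ys)) / 4
    return b1, b2


def steepest_ascent(response, start, h=0.5, step=0.5, iterations=20):
    """Move the design center along the estimated path of steepest ascent.

    After each local experiment, try a step of length `step` in the
    direction (b1, b2); keep it only if the response improves, otherwise
    halve the step size.
    """
    center = start
    for _ in range(iterations):
        b1, b2 = fit_first_order(center, h, response)
        norm = math.hypot(b1, b2)
        if norm == 0:
            break  # locally flat: no direction of improvement
        candidate = (center[0] + step * b1 / norm, center[1] + step * b2 / norm)
        if response(*candidate) > response(*center):
            center = candidate
        else:
            step /= 2
    return center


# Hypothetical response surface, unknown to the experimenter; maximum at (3, 2).
def response(x1, x2):
    return 100 - (x1 - 3) ** 2 - (x2 - 2) ** 2


end = steepest_ascent(response, (0.0, 0.0))
print(end, response(*end))
```

Starting from (0, 0), the design center moves toward the maximum at (3, 2), and each accepted step improves the response.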
Although developed for engineering processes, where a closely related procedure for improving running production processes is known as evolutionary operation (Box and Draper, 1969), the approach appears to be well suited to the social sciences, in which the relationship between inputs and outputs is typically difficult to measure precisely (see the discussion in Fienberg et al., 1985). It is akin to what is referred to as a learning system, one that takes full advantage of each application of an intervention and extends the opportunity for discovery throughout the life cycle of the intervention: its development, implementation, and evaluation.
Meta-analysis is an application of quantitative methods to combine the results of different studies (see Wachter and Straf, 1990). In a meta-analysis, a common numerical summary, such as an effect size, is typically drawn from each study and then analyzed statistically (Hedges and Olkin, 1985). Today, there are many guides to conducting a meta-analysis: see, for example, Cooper (2010) and Cooper et al. (2009). Meta-analyses can lead to new hypotheses and theories and inform the design of an experiment or other research study to test them.
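A common numerical summary for a study comparing a treatment group with a control group is the standardized mean difference. The sketch computes Hedges' g, the mean difference divided by the pooled standard deviation and multiplied by a small-sample correction factor, together with its approximate sampling variance, following the standard formulas in the meta-analysis literature (e.g., Hedges and Olkin, 1985). The summary statistics in the example are hypothetical.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Hedges' g and its approximate sampling variance for two independent groups."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor J
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


# Hypothetical study: 20 subjects per group, both standard deviations 4.
g, var_g = hedges_g(mean_t=12.0, mean_c=10.0, sd_t=4.0, sd_c=4.0, n_t=20, n_c=20)
print(f"g = {g:.3f}, standard error = {math.sqrt(var_g):.3f}")
```

Because g is unit-free, studies that measured the same construct on different scales can be placed on a common footing before they are combined.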
A major purpose of meta-analyses and other research syntheses is to reduce the uncertainty of cause-and-effect assessments of policy or program interventions. By statistically combining the results of multiple experiments, for example, the effect of a policy or program can be estimated more precisely than from any single study of an intervention. Moreover, comparing studies that are conducted with different participants in different settings allows for the examination of how different contexts affect the outcomes of a policy or program. However, if individual studies are flawed, a meta-analysis of them will be flawed as well: thus, meta-analyses often specify standards of quality for the studies to be included.
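The gain in precision from combining studies can be seen in the simplest pooling rule, the fixed-effect inverse-variance weighted average, sketched below. The three study estimates and variances are hypothetical. The function also returns Cochran's Q, a standard statistic for judging whether studies conducted in different contexts disagree more than chance alone would explain; the fixed-effect model itself assumes they share one true effect.

```python
import math


def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted (fixed-effect) combination of study estimates.

    Returns the pooled estimate, its standard error, and Cochran's Q, a
    heterogeneity statistic to compare with k - 1 degrees of freedom.
    """
    weights = [1 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, se, q


# Three hypothetical studies, each with a standard error of 0.2.
pooled, se, q = fixed_effect_pool([0.3, 0.5, 0.4], [0.04, 0.04, 0.04])
print(f"pooled = {pooled:.2f}, SE = {se:.3f}, Q = {q:.2f}")
```

Here the pooled standard error, about 0.115, is smaller than the 0.2 of any single study, and Q = 0.5 on 2 degrees of freedom gives no sign of heterogeneity across the studies.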
The amalgamation of results from disparate studies can also be done with careful statistical modeling that is distinct from the approaches of meta-analysis. A good example of this approach is Toxicological Effects of Methylmercury (National Research Council, 2000b): its analysis is based on Bayesian methods developed by Dominici et al. (1999) to pool dose-response information across a relatively large number of studies. Other examples are in Neuenschwander et al. (2010) and Turner et al. (2009).
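The kind of pooling used in hierarchical Bayesian analyses can be approximated, for illustration, by an empirical-Bayes version of the normal hierarchical model: each study's true effect is drawn from a common distribution, and each study's estimate is pulled toward the overall mean in proportion to its noise. This sketch is a simplified stand-in, not the fully Bayesian method of Dominici et al. or of the methylmercury report: it plugs in a DerSimonian and Laird moment estimate of the between-study variance instead of giving it a prior, and the example data are hypothetical.

```python
def dersimonian_laird_tau2(estimates, variances):
    """Method-of-moments (DerSimonian and Laird) estimate of between-study variance."""
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]
    s1, s2 = sum(w), sum(wi ** 2 for wi in w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / s1
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    return max(0.0, (q - (len(estimates) - 1)) / (s1 - s2 / s1))


def shrink(estimates, variances):
    """Empirical-Bayes partial pooling under a normal hierarchical model.

    Model: y_i ~ N(theta_i, v_i) and theta_i ~ N(mu, tau2). Returns mu, tau2,
    and each study's shrunken estimate B_i * mu + (1 - B_i) * y_i, where
    B_i = v_i / (v_i + tau2), so noisier studies move further toward mu.
    """
    tau2 = dersimonian_laird_tau2(estimates, variances)
    w = [1 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    shrunk = [(v / (v + tau2)) * mu + (tau2 / (v + tau2)) * y
              for y, v in zip(estimates, variances)]
    return mu, tau2, shrunk


# Hypothetical study-specific effects, each with variance 0.01.
mu, tau2, shrunk = shrink([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
print(f"mu = {mu:.3f}, tau2 = {tau2:.3f}, shrunk = {[round(s, 3) for s in shrunk]}")
```

In the example the estimated between-study variance is 0.15, and the outer studies move only slightly, from 0.1 and 0.9 to 0.125 and 0.875; if the studies agreed closely, the estimated variance would be zero and every study would be pulled all the way to the common mean.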
Methods for evaluating the effectiveness of a policy intervention from the total body of relevant research assembled from interdisciplinary studies have not been fully developed. One success, however, comes from researchers in early childhood intervention who have integrated knowledge about the developing brain, the human genome, molecular biology, and the interdependence of cognitive, social, and emotional development. These researchers have built a unified science-based framework for guiding priorities for early childhood policies around common concepts from neuroscience and developmental-behavioral research and broadly accepted empirical findings from four decades of program evaluation studies: see, for example, Center on the Developing Child at Harvard University (2007).
In addition to experimental and observational studies, qualitative research can play important roles in developing knowledge about the societal consequences of a policy. The term covers many different types of studies, including ethnographic, historical, and other case studies; focus group interviews; content analysis of documents; interpretive sociology; and comparative and cross-national studies. The research may be derived from documentary sources, field observations, interviews with individuals or groups, and discourse between participants and researchers.
Structured, focused case comparisons are an important example of qualitative research. They are particularly useful when it is difficult to carry out studies that require high levels of control (see George, 1979; George and Bennett, 2005). By compiling and comparing case studies, it is possible to refine theory and also to develop useful assessments of the effectiveness of various types of policy interventions and of the conditions that favor the effectiveness of one or another policy strategy. Structured case comparison methods have been used to inform diplomacy (Stern and Druckman, 2000), to assess policy strategies for resolving international conflicts (National Research Council, 2000a), to manage environmental resources at levels from local to global (National Research Council, 2002; Ostrom, 1990), and to evaluate efforts to engage the public in environmental decisions (Beierle and Cayford, 2002; National Research Council, 2008).
Archival studies are another example of qualitative research. They involve applying a model based on past evidence or decisions to a behavior or intervention for purposes of predicting future behavior (see, e.g., Institute of Medicine, 2010). Archival data may include public data sets collected by academic institutions or government agencies, such as Supreme Court records and corporate filings, or private data sets, such as medical records collected by public or private organizations.
Qualitative research allows for a rich assessment of respondents, often unattainable in other types of studies (Institute of Medicine, 2010). Like some quantitative observational studies, qualitative studies can provide the rich context in which public policy can benefit society.
THE FUTURE: NEW SOURCES OF DATA
Advances in social science and in computing technology have generated a wealth and diversity of data sources. Although privacy and proprietary concerns pose ongoing challenges for the accessibility of these sources to researchers, the data represent tremendous potential and opportunities to study social phenomena in unprecedented ways.
Federal, state, and local governments collect administrative data on populations as a by-product of program responsibilities, such as the employment history data maintained by the Social Security Administration and the personal income and wealth data maintained by the Internal Revenue Service. There are health records, school records, land-use records, and much more. A growing interest in improving and using administrative records for scientific and policy purposes has generated increased attention to the issues of privacy, data sharing, data quality, and representativeness that have been central to census and survey data for decades.
The challenges and opportunities are even more pronounced with regard to digital data. With the rise and diffusion of advanced information, communication, and computing technologies, an enormous quantity of electronic data, from demographic and geographic variables to transaction records, is amassed at an exponential rate (see Prewitt, 2010). Although much of it is commercially collected and thus proprietary, the vast reservoir of digital data increasingly includes data collected by government agencies for public use. With respect to data quality, use of these data is constrained by the relative brevity of the time series available for variables whose collection began only recently, as well as by the constantly changing definitions of variables.
The sheer quantity and diversity of digital data with the potential for social scientific use is astounding. As examples, consider continuous-time location data from cell phones; health data from electronic medical records and monitoring devices; consumer data from credit card transactions, online product searches and purchases, and product radio-frequency identification; satellite imagery and other forms of geocoded data; and data from social networking and other forms of social media.
The increasing “democratization of data” will enable policy analysts and policy makers to obtain much information for themselves, and it will continue to open new frontiers for social scientists. Automated information extraction and text mining have the potential to extract relevant data from the unstructured text of emails, social media posts, speeches, government reports, product reviews, and other web content. Crowdsourced information can be gathered from social networking websites. Longitudinal data can be compiled on millions of people with information on their locations, financial transactions, and communications. The data can be analyzed with methods of the emerging field of computational social science: network analysis, geospatial analysis, complexity models, and system dynamics, agent-based, and other social simulation models. Researchers and interested policy actors have only begun to scratch the surface of the potential of new data sources to contribute to policy making (King, 2011).
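Network analysis, one of the computational social science methods named above, often starts with simple descriptive measures. The sketch computes normalized degree centrality (a node's number of distinct neighbors divided by n − 1) from an undirected edge list, such as ties extracted from a social networking site; the account names are hypothetical, and libraries such as NetworkX provide this measure (`degree_centrality`) along with many others.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected network.

    `edges` is an iterable of (u, v) pairs. Duplicate ties and self-loops
    are ignored, and only nodes with at least one tie are included. Each
    node's score is its number of distinct neighbors divided by n - 1.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical ties among four accounts on a social networking site.
ties = [("ana", "ben"), ("ana", "cal"), ("ana", "dee"), ("ben", "cal")]
print(degree_centrality(ties))
```

A score of 1.0 marks an account tied to everyone else in the network; with millions of nodes, the same measure is computed with the same logic but on sparse, distributed data structures.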
REFERENCES
Beierle, T.C., and Cayford, J. (2002). Democracy in Practice: Public Participation in Environmental Decisions. Washington, DC: Resources for the Future.
Box, G.E.P., and Draper, N.R. (1969). Evolutionary Operation: A Statistical Method for Process Improvement. New York: Wiley.
Box, G.E.P., and Wilson, K.B. (1951). On the experimental attainment of optimum conditions (with discussion). Journal of the Royal Statistical Society, Series B, 13(1), 1-45.
Campbell, D.T., and Ross, H.L. (1968). The Connecticut crackdown on speeding: Time series data in quasi-experimental analysis. Law and Society Review, 3(1), 33-54.
Campbell, D.T., and Stanley, J.C. (1963). Experimental and Quasi-Experimental Designs for Research. Boston, MA: Houghton Mifflin.
Center on the Developing Child at Harvard University. (2007). A Science-Based Framework for Early Childhood Policy: Using Evidence to Improve Outcomes in Learning, Behavior, and Health for Vulnerable Children. Available: http://developingchild.harvard.edu/files/7612/5020/4152/Policy_Framework.pdf [August 2012].
Coburn, C.E., Toure, J., and Yamashita, M. (2009). Evidence, interpretation, and persuasion: Instructional decision making at the district central office. Teachers College Record, 111(4), 1,115-1,161.
Cooper, H.M. (2010). Research Synthesis and Meta-Analysis: A Step-by-Step Approach (fourth edition). Thousand Oaks, CA: Sage.
Cooper, H.M., Hedges, L.V., and Valentine, J.C. (Eds.). (2009). The Handbook of Research Synthesis (second edition). New York: Russell Sage Foundation.
Dominici, F., Zeger, S.L., and Samet, J.M. (1999). Combining evidence on air pollution and daily mortality from the largest 20 U.S. cities: A hierarchical modeling strategy (with discussion). Journal of the Royal Statistical Society, Series A, 163, 263-302.
Fienberg, S.E., Singer, B., and Tanur, J. (1985). Large-scale social experimentation in the United States. In A.C. Atkinson and S.E. Fienberg (Eds.), A Celebration of Statistics: The ISI Centenary Volume (pp. 287-326). New York: Springer Verlag.
George, A.L. (1979). Case studies and theory development: The method of structured, focused comparison. In P.G. Lauren (Ed.), Diplomacy: New Approaches in History, Theory, and Policy. New York: The Free Press.
George, A.L., and Bennett, A. (2005). Case Studies and Theory Development in the Social Sciences. Cambridge, MA: MIT Press.
Goodman, A., Joyce, R., and Smith, J.P. (2012). The long shadow cast by childhood physical and mental problems on adult life. Proceedings of the National Academy of Sciences, 108, 6,032-6,037.
Greenberg, D., and Shroder, M. (2004). The Digest of Social Experiments (third edition). Washington, DC: Urban Institute Press.
Hedges, L.V., and Olkin, I. (1985). Statistical Methods for Meta-Analysis. San Diego, CA: Academic Press.
Institute of Medicine. (2010). Bridging the Evidence Gap in Obesity Prevention: A Framework to Inform Decision Making. Committee on an Evidence Framework for Obesity Prevention Decision Making, S.K. Kumanyika, L. Parker, and L.J. Sim, Eds. Food and Nutrition Board. Washington, DC: The National Academies Press.
Kellam, S.G., Reid, J., and Balster, R.L. (2008). Effects of a universal classroom behavior program in first and second grades on young adult problem outcomes. Drug and Alcohol Dependence, 95, S1-S4.
King, G. (2011). Ensuring the data-rich future of the social sciences. Science, 331(6,018), 719-721.
National Research Council. (1998). Assessing Evaluation Studies: The Case of Bilingual Education Strategies. Panel to Review Evaluation Studies of Bilingual Education, M.M. Meyer and S.E. Fienberg, Eds. Committee on National Statistics, Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2000a). International Conflict Resolution After the Cold War. Committee on International Conflict Resolution, P.C. Stern and D. Druckman, Eds. Washington, DC: National Academy Press.
National Research Council. (2000b). Toxicological Effects of Methylmercury. Committee on the Toxicological Effects of Methylmercury, Board on Environmental Studies and Toxicology, Commission on Life Sciences. Washington, DC: National Academy Press.
National Research Council. (2002). The Drama of the Commons. Committee on the Human Dimensions of Global Change, E. Ostrom, T. Dietz, N. Dolsak, P.C. Stern, S. Stonich, and E.U. Weber, Eds. Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
National Research Council. (2008). Public Participation in Environmental Assessment and Decision Making. Panel on Public Participation in Environmental Assessment and Decision Making, T. Dietz and P.C. Stern, Eds. Committee on the Human Dimensions of Global Change, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
Neuenschwander, B., Capkun-Niggli, G., Branson, M., and Spiegelhalter, D.J. (2010). Summarizing historical information on controls in clinical trials. Clinical Trials, 7(1), 5-18.
Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for Collective Action. New York: Cambridge University Press.
Prewitt, K. (2010). Science starts not after measurement but with measurement. The Annals of the American Academy of Political and Social Science, 631(1), 7-13.
Stern, P.C., and Druckman, D. (2000). Evaluating interventions in history: The case of international conflict resolution. International Studies Review, 2(1), 33-63.
Turner, R.M., Spiegelhalter, D.J., Smith, G.C.S., and Thompson, S.G. (2009). Bias modelling in evidence synthesis. Journal of the Royal Statistical Society, Series A, 172, 23-47.
Wachter, K.W., and Straf, M.L. (1990). The Future of Meta-Analysis. New York: Russell Sage Foundation.