Evaluation is key to improving the overall quality of out-of-school STEM programs and to understanding how they contribute to the learning ecosystem.o Evaluations can inform program developers, researchers, policy makers, and the public about what out-of-school STEM programs contribute to interest and learning. They can also provide information about the broader context of STEM learning in a community. In this chapter, we describe the complex nature of evaluating the outcomes of out-of-school programs and what can be done to provide a clearer picture of which programs work best, under what circumstances, for whom, and how the programs fit into the larger STEM learning ecosystem. The chapter provides a framework to guide evaluation efforts.p
Evaluation has many purposes, including continuous improvement, accountability, informing management, and demonstrating value. It can also take many forms, including one-time studies, ongoing cycles of data collection and reflection, and participatory evaluation. It can marshal the entire methodological toolkit available in social science and educational research, drawing on a range of study designs and quantitative, qualitative, and mixed data collection methods, often in collaboration with in-house or external evaluation experts.
With all the possibilities for how evaluations can be used to document program implementation and outcomes, decisions about evaluation design and execution need to consider three elements: (1) the program’s design (how the program is supposed to work, for whom, and with what resources); (2) the larger policy environment in which it operates; and (3) the most current knowledge in the field of evaluation itself. For example, deciding when to use different evaluation approaches depends on the maturity and focus of the program and the goals of the evaluation. Evaluations of new initiatives or programs may best focus on the qualities of the program’s design and implementation, with an emphasis on formative feedback from participants about the program content and pedagogy. Then, after a program is more stable, an evaluation could begin to focus more on whether the program is achieving its expected individual-level outcomes, which might be done with the structure of a more formal, summative evaluation. Once a program has undergone summative evaluation, it may be appropriate to conduct a comparative evaluation to understand a program’s relative strengths and weaknesses in contrast to similarly designed programs or relative to programs that serve similar participant populations.108

oBroadly speaking, evaluation of an out-of-school program is the systematic process of collecting information (data) to enhance understanding of how a program is operating and to inform decisions.

pThis chapter draws heavily on the papers by Barron and by Hammer and Radoff; see Appendix B.
The current climate of evidence-based policy and decision making increasingly requires that programs demonstrate their intended outcomes. In the field of education, broadly, funders, policy makers, and the public expect to see evidence of learning. Consequently, evaluations of education programs typically focus on individual learning assessments, where learning is defined in terms of gains in specific knowledge or skills.q How these outcomes are measured depends to a considerable degree on how a program’s designers have defined learning outcomes and the factors affecting them. How do young people learn STEM? What does learning look like in action? What factors contribute to learning? The answers to such questions affect how evaluation studies define and measure learning.
qWe note a useful distinction made by researchers between assessment and evaluation. “The educational research community generally makes a distinction between assessment—the set of approaches and techniques used to determine what individuals learn from a given instructional program—and evaluation—the set of approaches and techniques used to make judgments about a given instructional program, approach, or treatment, improve its effectiveness, and inform decisions about its development. Assessment targets what learners have or have not learned, whereas evaluation targets the quality of the intervention.” (National Research Council, 2009, p. 54). Therefore, assessment of learning can be an element in the evaluation of a program, but it is not necessarily the only element that determines whether a program is productive.
Evidence regarding which out-of-school programs support STEM learning and stimulate interest in STEM, how they do so, and for whom and under what circumstances has been slow to emerge due to the complex nature of STEM learning, the wide variation in the nature of out-of-school programs, and the uneven quality of evaluations. Evaluations of out-of-school STEM programs are challenged by a number of theoretical and practical factors. We emphasize the accumulation of experiences, change at multiple levels, the idiosyncratic nature of learning, and additional evaluation challenges because they were the focus of discussions at the National Summit on Successful Out-of-School STEM Learning, they were highlighted in the background papers commissioned for this report, and they have been cited in major reports on out-of-school STEM learning. Although these same issues also create challenges for evaluating learning in all settings, we focus on what they mean for out-of-school programs.
ACCUMULATION OF EXPERIENCES
The success of out-of-school STEM programs depends on the possibilities they create for young people to expand, deepen, and reinforce their cumulative STEM experiences. Since a wide array of activities, people, programs, material resources, and facilitators sustain engagement, the accumulation of learning opportunities usually accounts for the development of expertise and interest (though, occasionally, one powerful experience is transformative).109 A single experience may not have an immediately recognizable or detectable effect on knowledge or interest, but it may have a relatively profound effect if it serves to orient, inspire, or motivate a young person to be open to new STEM learning opportunities.110 It is very difficult to know whether an after-school hike, an intriguing video, or a hands-on exhibit, or, importantly, some combination of such experiences, has a cascading effect on learning choices and motivations, especially over the span of years (or even decades). Biographical studies of scientists and everyday citizens suggest that out-of-school STEM learning experiences can play a powerful role in shaping an individual’s pursuit of STEM careers or hobbies.111 A broad range of evaluation approaches that capture the complexity of STEM learning and interest development across time and settings is needed to better understand how young people make connections across settings and experiences, and what elements of those connections contribute to the continuities that support sustained engagement and learning.
CHANGE AT MULTIPLE LEVELS
As noted throughout this report, an individual is not the only point of change or growth in a STEM learning system. Communities, organizations, programs, and small groups (peers, friends, and families) undergo changes and transformations over time, moving to new ways of thinking and doing.112 A group, program, organization, or community may change its objectives, organizational structure, resource allocation, established policies and procedures, styles of interaction, levels of
collegiality, and even membership in pursuit of greater effectiveness, efficiency, or enjoyment, among other goals. This growth can be in terms of understanding of or interest in STEM, just as with individuals, and such growth is an important object of analysis for evaluators.113
IDIOSYNCRATIC NATURE OF LEARNING
A critical issue in evaluating out-of-school STEM programs is that learning occurs in diverse and unpredictable ways. For example, ethnographic studies of children’s engagement in science outside of school114 and retrospective studies of scientists, science teachers, and science-interested individuals show that there are multiple pathways by which young people develop enduring interests.115 Examining the scientific talk of young people makes clear that their personal feelings, intentions, purposes, and preferences shape their forms of engagement and ideas. It is also clear, however, that talk-focused studies typically prioritize Western middle-class forms of talk as evidence of understanding.
Evaluators’ awareness of the idiosyncratic nature of learning is important for ensuring that indicators and measures are not exclusively focused on predetermined outcomes and dominant social norms. An openness to and investigation of unintended effects of a program or experience is important for ensuring that an evaluation does not prioritize easily measurable outcomes, which can contribute to narrowing the role of out-of-school STEM environments and the possibilities they offer. It is also essential that evaluators understand the cultural patterns of social discourse of participating communities so that evaluations accurately capture a program’s effects.
ADDITIONAL EVALUATION CHALLENGES
There are many additional challenges to evaluating STEM learning in out-of-school programs. Importantly, young people participate in out-of-school programs based on their interests and motivations and use program resources in different ways. Because of this, out-of-school program evaluators have little control over who participates in a program, which can make it difficult to know whether the outcomes of the evaluation could be replicated with different participants. In addition, if the differences in program experiences among the participants are not well understood, it is difficult to describe what led to any measurable outcomes. For example, young people who consistently attend an out-of-school program are more likely to reap the benefits, compared with those who attend sporadically.116
Understanding the key features of any STEM learning environment and being able to capture, categorize, and analyze participants’ diverse responses are fundamental challenges in making sense of how such environments do and do not promote growth and change.
In the social context of most out-of-school settings, individual assessments, such as tests and surveys, would typically interrupt the normal flow of activity, not be expected by the participants, and negatively change the nature of the environment. For this reason, some evaluators working in the informal STEM field have been particularly concerned with developing unobtrusive means to measure and document learning in such settings. Unobtrusive assessments would be built into the learning experience—embedded in activities such as games or challenges—or be derived observationally from the natural interactions of participants. Such “naturalistic” assessment would rely on documenting the ways in which learners seek help, share ideas, notice one another’s capabilities, build reputations, and in other ways notice and make use of resources in their environment.117
Evaluations of out-of-school programs typically document short-term outcomes. Since learning is understood to occur over time and across settings, it is important to take more comprehensive and layered approaches to evaluation by considering both short-term and long-term factors and outcomes. For example, an early evaluation might focus on short-term outcomes such as whether program goals were achieved and how the design of the program did or did not contribute to achieving those goals. The evaluation might also focus on how a given program fits within the larger learning ecosystem, documenting how it diversifies, deepens, or enhances possibilities for STEM learning in a given community. In addition, the evaluation could measure the consequences of individual differences among participants and longer-term outcomes.
From an ecosystem perspective on learning, a comprehensive out-of-school STEM program evaluation includes measurement at three interrelated levels: individual, program, and community.
At the individual level, evaluation of the quality of an out-of-school program would include measures of an individual’s intellectual development in STEM; positive STEM identity and dispositional development; and expansion of an individual’s horizons (awareness, connections, and choices) in the context of lifelong, academic, and career engagement with STEM. Measurement at the individual level, especially when conducted longitudinally, can shed light on how out-of-school programs are, individually and collectively, responsive to an individual’s learning needs, perceptions of ability, and interest in STEM.
At the program level, evaluation can document the resources and opportunities provided by the out-of-school STEM program. Evaluation at this level can suggest the ways in which program design and implementation can be augmented to better support young people’s intellectual and social and emotional engagement, and how responsive the program is to participants’ interests and experiences. Program-level evaluation can also measure how a program intentionally engages
participants with community resources and possibilities in order to expand their horizons. Questions can also be asked about the capacity of adult facilitators/educators and whether they have opportunities to enhance their skills.
Program-level evaluation that considers the dimensions of engagement in STEM, responsiveness to young people, and connectivity with the community would include descriptions of program activities, information on staff training and development, information about levels of participation, and whether participants also take part in other STEM learning experiences at school or in the community. Program-level evaluation allows staff and evaluators to connect a program’s resources and activities with individual outcomes in order to see what is working well, and for whom, and to consider opportunities for change. In addition, program-level evaluation allows staff and evaluators to connect programmatic resources and activities with community-level resources and activities.
At the community level, evaluation can focus on the distribution of diverse STEM learning opportunities (across domains, practices, and levels of advancement); the ways in which a given program is synergistic with the resources within a community and across settings; and the ways in which a program affects the community by expanding learning opportunities and brokering additional engagement in STEM learning across different community settings. Community-level indicators signal the extent to which community-level resources are in place to support effective out-of-school STEM programming, to support connections among in-school and out-of-school learning, and to identify any need for action.
An evaluation at the community level can inform program design. Asset mapping and needs analysis are fundamental to the design of both individual programs and a set of opportunities across a community. They can identify areas of need in a community and allow stakeholders to understand the nature of local opportunities, what may or may not be working well, and where to best invest resources and new design and implementation efforts. Such mapping work is ongoing in the 42 statewide after-school networksr that have developed online repositories of out-of-school STEM curricula and information. The networks have also created an online database that maps STEM programming and connected learning opportunities. In addition, there are a number of guides to developing asset maps, including those from the W.K. Kellogg Foundation, the Community Tool Box, and Community Science.s

Examples of Evaluation at the Individual, Program, and Community Levels

Individual-Level Evaluation The Detroit Area Pre-College Engineering Program (DAPCEP) is a nonprofit organization that involves partnerships with universities, training programs, and K-12 school systems to connect historically underrepresented youths with high-quality STEM learning experiences. DAPCEP engages youths in out-of-school STEM learning experiences across several years, providing hands-on mathematics and engineering activities. Program activities are led by classroom teachers who seek to make explicit connections between the content of the hands-on activities and the mathematics that youths work with at school. Students also engage with local industries and professionals to see how mathematics and engineering translate to jobs and careers. Program evaluation has documented the positive effects of DAPCEP on participants’ high school graduation rates, college enrollment, and selection of STEM-related majors.*

Program-Level Evaluation The Intel Computer Clubhouse Network (ICCN)** uses program-level evaluation to inform programmatic decisions. The ICCN has long engaged evaluators to help analyze and document the ways in which its approaches are shaping and affecting the lives of participating youths. Evaluation partners have conducted interviews, surveys, observations, and reviews of staff reports to both provide feedback to the organization and support its program development.

Community-Level Evaluation To better support the development and coordination of its ecosystem of STEM learning opportunities, the Mozilla Hive NYC Learning Network works with Hive NYC partners and the Hive Research Lab to capture and share best practices and collective wisdom. For example, in 2014, Hive NYC members and stakeholders convened meetings to develop principles and guidelines for “working open,” a model for reflective, evaluative practice to support the continuous improvement of programs and outcomes at the community level. The model includes rapid prototyping,*** public storytelling to illustrate key findings, community contributions for co-development of approaches, and making the content of the network’s activities openly accessible.

*Bevan, B., Michalchik, V., Remold, J., Bhanot, R., and Shields, P. (2013). Final Report of the Learning and Youth Research and Evaluation Center. San Francisco, CA: The Exploratorium.

***Rapid prototyping is the process of quickly fabricating a scale model using three-dimensional computer-aided design data.
A three-level model would include evaluation of how the outcomes for individual participants are directly influenced by the program qualities and how both are shaped and supported by the community context. Evaluations of an out-of-school STEM program would focus on these elements, characteristics, and outcomes, while at the same time identifying any shortcomings, misalignments, and unintended effects, as well as any possibilities for new directions and innovations.
Work is now under way to develop new models for evaluations that may prove to be less disruptive, less obtrusive, and more meaningful than many commonly used near-term measures of individual learning and changes in attitudes and interests, such as surveys. One suggested approach is for evaluators to develop a framework for how formative (process), summative (outcome), and comparative evaluations interact. Existing measurest and program evaluationsu are typically conducted at the individual or program level and typically focus on short-term outcomes. Two notable exceptions are the longitudinal evaluations of 4-H Sciencev and FIRST®w (For Inspiration and Recognition of Science and Technology).x Greater investments in developing methods for longitudinal and community-level evaluations would make it possible for more evaluations to take an ecosystem perspective.

rFor example, see http://www.indianaafterschool.org/state/mapping-database/ [May 2015].

sFor more information, see http://www.abcdinstitute.org/docs/kelloggabcd.pdf [May 2015], http://ctb.ku.edu/en/table-of-contents [May 2015], and http://communityscience.com/knowledge4equity/AssetMappingToolkit.pdf [May 2015].

tFor example, see the database of Assessment Tools for Informal Science (ATIS) at http://www.pearweb.org/atis/tools/browse?content=true [May 2015].

uFor example, see the evaluations of public education programs at http://informalscience.org/evaluation/browse?type=evaluations [May 2015].

vFor more information, see http://www.4-h.org/about/youth-development-research/positive-youth-development-study/ [August 2015].
For out-of-school programs, for example, immediate measures of individual experiences could be developed to provide formative feedback to program leaders. Such measures could include what individuals are interested in or confused about. The resulting data could be used for program design and implementation. Long-term outcome measures, such as levels of interest in STEM or documentation of course and career choices, could be used to evaluate whether a program achieved its targeted goals and outcomes, and how it did so. Similarly, program measures could be seen as formative from the community perspective by addressing such questions as: Where are investments needed? Where are opportunities for action? What community resources might strengthen a program?
The need to both consolidate and diversify evaluation methods at the individual level is an active area of research. Some researchers have pursued the development of standard metrics for measuring STEM interest and motivation across the range of out-of-school STEM environments.118 Others use qualitative means to probe and document the way that out-of-school experiences shape young people’s life trajectories, as evidenced by choices, pathways, and “ways of being”—for example, interacting with phenomena or appraising ideas, designs, and products.119
The most common approaches to research and evaluation focus on near-term measures that are easy to administer and score. Well-designed tools of this kind are an important component of an evaluation toolkit, and there are several ambitious initiatives under way to develop suites of tools that can be shared across projects:
- the Youth, Engagement, Attitudes and Knowledge (YEAK) Survey developed by 4-H;y
- the suite of tools developed by the Program in Education Afterschool and Resiliency (PEAR) at Harvard University;z
- the measures developed by the Activation Lab, a collaboration among the Learning Research and Development Center at the University of Pittsburgh, the Lawrence Hall of Science at the University of California, Berkeley, and SRI International;aa and
- the Developing, Validating, and Implementing Standardized Evaluation Instruments (DEVISE) Project at Cornell University.ab

xTwo exceptions to the general statement about existing measures and program evaluations were added to the text.

yFor more information, see http://www.4-h.org/about/youth-development-research/science-program-research/ [May 2015].
Policy makers understandably want a single, low-cost, easy-to-administer tool that can provide data that allow them to measure the effects of educational investments. Creating a single metric that could be used in the diversity of out-of-school STEM programs will not be simple120 because it needs to be sensitive to differences among individuals (e.g., age, culture, level of participation) and programs (e.g., intensity and length, delivery method, goals) while not intruding on the program’s design. Yet progress has been made in developing some common standardized measures that can track the long-term trajectories of young people’s development and possibly (if linked to detailed accounting of program and community arrangements) also provide understanding across programs as to what elements of out-of-school settings and programs contribute to learning. Such measurement instruments allow for the comparison and aggregation of data across programs and settings. However, there are significant concerns about the ways that common measurements may constrict educational opportunities and approaches in schools and otherwise negatively affect learning in out-of-school settings. Clearly there are benefits and limitations of common metrics, and this is an area of work that deserves careful investment and study over time.121
Common approaches to measurement of youth outcomes are generally meant to document the contributions of out-of-school programs to STEM learning122 or to determine whether those contributions vary for different populations of young people. When common metrics are used for these purposes, evaluators need to continue to gauge whether program goals are being accomplished and to watch for any unintended consequences (e.g., intruding on program designs or using any one measure as the sole metric of outcomes).
The efforts to develop common metrics of important constructs—such as learning, engagement, and identity—have generated conversations about what should “count” as outcomes of out-of-school STEM programs and for which outcomes out-of-school programs should be held accountable. Work in this area has included metrics for measuring both youth outcomesac and program quality.ad
acSee the Common Instrument at http://www.pearweb.org/tools/commoninstrument.html [May 2015] and the Youth, Engagement, Attitudes and Knowledge Survey at http://www.4-h.org/about/youth-development-research/science-program-research/ [May 2015].