Assessing Learning Outcomes
A key to designing informal experiences to support learning is to clearly articulate the goals for a particular experience. That is, what should participants learn through their experience? In this book we define science learning in terms of six strands: experiencing excitement and interest sparked by an aspect of an informal experience; understanding scientific knowledge; engaging in scientific reasoning; reflecting on the nature of science; gaining comfort with the tools and practices of the scientific community; and identifying with the scientific enterprise. In any given experience, particularly one that is very brief, it may be impossible to touch on all six strands. However, the design process should include explicit decisions about which outcomes are of primary interest, and for what audience.
Although informal science settings do not use the same tools to assess learning as schools do—tests, grades, and class rankings, for example—researchers, evaluators, and practitioners are nonetheless very interested in assessing how informal experiences contribute to the development of scientific knowledge and capabilities. The nature of informal settings presents a unique set of challenges in this effort, and the field struggles with theoretical, technical, and practical aspects of measuring learning. This chapter explores some of these challenges and the ways they have been addressed.
CHALLENGES OF ASSESSING SCIENCE LEARNING IN INFORMAL SETTINGS
The characteristics of informal learning environments make it very difficult to develop practical, evidence-centered ways to assess learning outcomes. For example, during a short trip to a museum, not only is assessment logistically complex, but also the data gathered are hard to interpret. It can be difficult to separate the
effects of a single visit from other factors that could be contributing to positive learning outcomes. And arranging for tests before and after the experience or setting up other traditional measures in many museums and science centers can be disruptive, or even inappropriate for the purpose that assessment may serve (for instance, assessment that is part of exhibit or program design and improvement). Thus, it is important to consider the rationale for assessing learning in informal science learning settings.
Another feature of informal science learning environments that creates challenges for assessment is that experiences cannot fully be prescribed or predetermined. Rather, the environments are learner centered, and much of what happens emerges during the course of activities. Because each visitor, participant, or audience member seeks out his or her own unique experience, it is extremely difficult to establish a uniform intervention or activity that captures the overall impact of the informal science environment. Part of the problem, too, is the importance of not interfering with the unique, free-choice or self-directed experience itself, because it is often that very characteristic that inspires learning in the first place. The challenge thus becomes how to document the learning that occurs while not sacrificing the freedom and spontaneity that are integral to the experience.
The collaborative and social aspects inherent in many informal experiences also pose a challenge for assessing learning. Participants in summer camps, science centers, family activities, hobby groups, and the like are generally encouraged to take full advantage of the social resources available in the setting to achieve their learning goals. The team designing a submersible in camp or a playgroup engineering a backyard fort can be thought of as having implicit permission to draw on the skills, knowledge, and strengths of those present, as well as any additional resources available, to accomplish their goals. “Doing well” in informal settings often means acting in concert with others and accomplishing results in the process. Thus, assessments that focus on an individual’s performance alone may “under-measure” learning because they fail to take into account the material and human resources in the environment, even though making use of such resources is a hallmark of competent, adaptive behavior.1 In addition, assessing whether participants working in a group have grasped the science is important, but measuring the role that collaboration and problem solving have played in learning may be equally so. Teasing out this variable from individual assessment has proven difficult, and some have challenged the rationale for doing so in the first place. Moreover, the learning accomplishment might be an integral part of the experience: the backyard fort is strong and complete; a new level of a game has been reached; an arch built with wood blocks stands on its own. In such cases, separate “measures” of accomplishment might be inherently rejected by participants who may not understand how their experience in the activity can be tied to a school-like assessment task.
Despite the difficulties of assessing outcomes, researchers have managed to do important and valuable work. Many of their approaches rely on qualitative interpretations of evidence, in part because researchers are still in the stages of exploring features of the informal learning process rather than quantitatively testing hypotheses.2 Yet, as a body of work, assessment of learning in informal settings draws on the full breadth of educational and social scientific methods, using questionnaires, structured and semistructured interviews, focus groups, participant observation, journaling, think-aloud techniques, visual documentation, and video and audio recordings to gather data.
DEVELOPING APPROPRIATE ASSESSMENTS
A first step in developing assessments is identifying the anticipated learning goals. In order to identify appropriate goals it is also important to determine the audience. This determination can be complex. It is not sufficient to simply target a demographic group as the audience, such as teenagers or Latinos, in part because of the broad diversity within social or demographic groups and because of the risk of stereotyping. It is equally important to understand what knowledge, skills, and beliefs the target audience brings to the learning situation. For this reason, key stakeholders in the informal learning experience, including representatives from the institution or organization involved in designing it and members of the community it is meant to serve, should be brought into the planning process. In fact, defining outcomes and target audiences for informal science learning experiences can be among the most challenging tasks in the assessment process because they require a deep understanding of purpose and of the various ways in which informal experiences may be connected to past and future learning experiences.
Once goals and audience have been identified, the means of measuring these goals need to be established. The development of assessments appropriate for science learning in informal environments should be guided by three criteria. First, the assessments must address the range of capabilities that the designers have in mind, including not only cognitive outcomes, but also attitudinal, behavioral, and social outcomes. The six strands introduced in Chapter 2 can guide a discussion
of learning outcomes. Second, assessments should fit with the kind of participant experiences that make informal learning environments attractive and engaging. Any assessment activities undertaken in these settings should not undermine the very features that make for effective learning. Third, the assessments must be valid; that is, they should measure what they purport to measure (construct validity) and align with opportunities for learning that are present in the environment (often referred to as ecological validity). In short, assessment measures should capture as much of the breadth of learning as a reasonable audience could experience, should align with the nature of the learning experience, and should represent in some faithful way the learning that actually occurs. Doing so is not easy.
Keeping these three criteria in mind, let’s consider how information was collected in some of the examples included in this book. WolfQuest, the computer game discussed in Chapter 1, used online surveys to collect data about the project. The survey asked questions related to the learning goals, as the first criterion suggests. Because this was an online experience, it is fitting that an online assessment instrument was used; given the nature of the activity, assessment did not interfere with the experience. Finally, the assessment was valid: the questions asked on the survey aligned with the project’s goals.
Through the WolfQuest online survey, evaluators asked participants what they knew about wolves before playing the game and what they learned as a result of the game. This pre-post strategy is often used to document changes in learning as a result of the informal experience. However, the evaluators for WolfQuest went further. They conducted content analyses of discussions between WolfQuest players and learned a great deal about the social dimension of WolfQuest and the ways in which playing the game encouraged cooperative learning and even extended the experiences into other parts of a player’s life. The evaluators also analyzed other forms of “embedded” data—data that are generated through natural engagement with the game (or, by extension, an exhibit, program, interpretive walk, etc.) and can be used to infer outcomes—to examine how using knowledge about wolf behavior and ecology helped players advance in the game. Using embedded data ensures that data collection does not interfere with the experience itself, thus fulfilling two of the three criteria mentioned above: alignment with the experience and ecological validity.
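The pre-post logic described above can be sketched in a few lines of analysis code. The following Python example is purely illustrative: the participant identifiers, the 1–5 self-rating scale, and all of the numbers are invented, not data from the actual WolfQuest evaluation. It simply shows how paired before-and-after responses might be summarized.

```python
# Hypothetical pre-post survey records: each participant rates their
# knowledge of wolf ecology on a 1-5 scale before and after playing.
# All names and values below are invented for illustration.
responses = [
    {"participant": "p01", "pre": 2, "post": 4},
    {"participant": "p02", "pre": 3, "post": 3},
    {"participant": "p03", "pre": 1, "post": 4},
    {"participant": "p04", "pre": 4, "post": 5},
]

def summarize_gains(records):
    """Return the mean self-reported gain and the share of
    participants who reported any gain at all."""
    gains = [r["post"] - r["pre"] for r in records]
    mean_gain = sum(gains) / len(gains)
    share_improved = sum(1 for g in gains if g > 0) / len(gains)
    return mean_gain, share_improved

mean_gain, share_improved = summarize_gains(responses)
print(f"mean gain: {mean_gain:.2f}, improved: {share_improved:.0%}")
# → mean gain: 1.50, improved: 75%
```

A real evaluation would, of course, need far more than a mean difference—attention to self-report bias, attrition between the two waves, and whether the change can be attributed to the experience at all—but the pairing of each participant’s pre and post responses is the core of the strategy.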
There is increasing interest among practitioners, researchers, and evaluators in documenting long-term learning from informal experiences. When evaluators are interested in finding out whether information was retained over time, they typically get in touch with participants through phone or e-mail 1 to 4 months
after the experience to see whether the experience had a lasting impact. The evaluation conducted for the IMAX film Coral Reef Adventure (Chapter 5) used this strategy and collected compelling information about how the film led to changes in attitudes and behaviors.
Another, more complex form of data collection is recording visitors’ conversations (or, as noted above in the WolfQuest example, analyzing written comments, blog contributions, etc.). As discussed in Chapter 4, this approach is logistically difficult to execute, and interpreting the data is equally challenging. Nonetheless, the information collected can be rich and revealing, indicating what visitors are focusing on, thinking, and feeling; how they draw conclusions, make judgments, and form connections; and whether the experience has evoked powerful memories. The researchers investigating conversations in the frog exhibition at the Exploratorium by “listening in” on exchanges between parents and children thought long and hard about how to gather data without interfering with the learning experience. The conversations revealed the full range of learning that occurred.
Precisely because of these challenges, collecting conversations is done less frequently than more traditional measures, such as exit interviews and surveys or tracking-and-timing methods, which are used to measure levels of engagement with an informal science experience. Tracking-and-timing studies, typically used in museums, rely on structured observation to record what visitors pay attention to and for how long. In Cell Lab (Chapter 3), evaluators noted that visitors spent considerably more time than usual at the wet-lab benches. This finding illustrated that when learning is more complex, more time is needed—even if fewer people can go through the exhibit in one day. The measure does not, however, distinguish whether learners needed more time because of the complexity of the experience or deliberately spent more time because the experience was more engaging and satisfying than other, similar experiences.
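Tracking-and-timing observations lend themselves to simple quantitative summaries. The sketch below is hypothetical—the exhibit element names and dwell times are invented, not drawn from the Cell Lab evaluation—but it illustrates the kind of per-element summary such studies produce.

```python
from statistics import median

# Hypothetical tracking-and-timing records: (visitor, exhibit element,
# seconds spent). Element names and durations are invented.
observations = [
    ("v1", "wet-lab bench", 420),
    ("v2", "wet-lab bench", 310),
    ("v3", "wet-lab bench", 505),
    ("v1", "wall panel", 35),
    ("v2", "wall panel", 20),
    ("v3", "video kiosk", 90),
]

def median_dwell(records):
    """Group records by exhibit element and return the median
    number of seconds visitors spent at each one."""
    by_element = {}
    for _visitor, element, seconds in records:
        by_element.setdefault(element, []).append(seconds)
    return {element: median(times) for element, times in by_element.items()}

for element, secs in sorted(median_dwell(observations).items()):
    print(f"{element}: {secs} s")
```

As the chapter notes, a long median dwell time by itself cannot say *why* visitors lingered—complexity and engagement look the same in the timing data—which is one reason such measures are usually paired with interviews or observations of behavior.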
The Nature of Outcomes for Informal Science Learning
Although there is a diversity of thought in the informal science learning community about what outcomes are most important and what approaches to measuring them are most appropriate, there is an emerging consensus on core assumptions regarding the nature of outcomes in informal science learning.4 This consensus aligns with the three criteria mentioned earlier.
Outcomes can be broad in nature. Currently, many types of individual outcomes are being investigated by researchers and practitioners in the field. The breadth of these outcomes is captured in the six-strands framework. Measurements of outcomes could also allow for varied personal learning trajectories and for learning that is complex and holistic, rather than narrowly defined.
Outcomes can be unanticipated. Outcomes can be based on the goals and objectives of the program, or they can be unplanned and unanticipated, based on what individual learners find to be most valuable. Researchers and practitioners typically begin with outcomes that are defined in advance but then add outcomes that emerge from learners’ experiences.
Outcomes can become evident at different points in time. While short-term outcomes have long been used to assess the impact of informal learning experiences, it is becoming increasingly evident that these experiences can have enduring, long-term impacts as well.
Outcomes can occur at different scales. To date, most outcome measures are focused on determining how the individual was influenced by the experience. But it is also useful to consider how the entire social group was influenced. For example, did group members learn about one another? Did they reinforce group identity and history? Did they develop new strategies for collaborating with each other? In addition, outcomes can be defined on a community scale, measuring how an activity, exhibition, or program affected the local community.
In practical terms, the kinds of assessments that work best in informal settings are likely to be the ones that most closely match the setting’s learning activities. Before drawing conclusions about whether a particular experience has led to a particular outcome, researchers and practitioners should ask themselves:
Are the assessment activities similar in relevant ways to the learning activities? At the Cell Lab stations, the process of doing the activity also served as an assessment of how well the participants understood the point of the experiment and how to interpret their results. Designing activities that can also serve as an assessment tool works particularly well in informal settings. This process is referred to as embedded, or authentic, assessment and is currently seen as one of the most appropriate approaches for assessing informal experiences, since assessment and experience align, supporting a high degree of validity.
Are the assessments based on the same social norms as those that promote engagement in the learning activities? For example, in assessing WolfQuest, the researchers used an online forum for assessment, which matched the nature of the activity. Social norms can easily be violated by traditional assessment systems based on school-like testing procedures and measures. Testing participants individually when the experience was meant to be shared, or assessing through skills that should not be assumed (such as particular verbal or written skills), are examples of ways the validity of an assessment can be threatened.
Is it clear that the learners have had ample opportunity to both learn and demonstrate desired outcomes? The teens working with young children at the homeless shelter in St. Louis (Chapter 3) illustrated how, over time, they not only learned relevant science content but also could demonstrate their new learning in multiple ways. However, it would have been inappropriate to assess the teens using a typical written exam, because they did not have the opportunities to learn or to demonstrate their competence in a similar form.
The NSF Evaluation Framework for Informal Science Education
Recognizing the challenge of defining appropriate, measurable, and valid impacts for informal science learning experiences, the National Science Foundation (NSF) developed a set of impact categories that can be used to help guide planning, assessment, and evaluation of projects.5 The impact categories are as follows, with connections to the six strands identified where appropriate:
Knowledge. Similar to Strand 2 (understanding scientific content and knowledge), this impact refers to knowledge, awareness, or understanding that visitors can express in words or pictures that illustrate what has been learned during, immediately after, or long after a given experience.
Engagement. Similar to Strand 1 (sparking interest and excitement), this impact focuses on learners’ engagement and interest in science, including the emotions evoked by the experience. These emotions can range from excitement and delight to negative feelings, such as anger or sadness.
Attitude. This impact refers to a change in worldview or an increase in empathy as a result of an experience in an informal setting. Changes in attitude toward science, math, engineering, or technology are connected to Strand 6.
Behavior. This impact refers to projects whose purpose is to change visitors’ behaviors over the long term. Often these changes are sought after experiencing environmental or conservation projects. This category does not have a direct equivalent within the strands.
Skills. Similar to Strand 3 (engaging in scientific reasoning) and Strand 5 (using the tools and language of science), this impact focuses on the skills of scientific inquiry, such as observing, asking questions, predicting, testing predictions through experimentation, collecting data, and interpreting them.
The framework also allows for impacts other than these five predetermined categories, thus recognizing that informal science settings may influence visitors, audiences, or participants in many different and important ways.
As a final note, while it is important to document the unique and valuable contributions of informal opportunities for learning, there is a tension in the field regarding the degree to which one can or should try to standardize assessments of learning. On the one hand, the field has an overarching commitment to valuing the great diversity of ways in which informal learning experiences can positively affect participants. On the other hand, researchers and practitioners recognize the importance of building consensus in the field regarding standards for research methods and learning outcomes. Without a common framework specifying outcomes and approaches, it is difficult to show gains in learning that occur across experiences and/or across time. Success in creating more rigorous, meaningful, and equitable opportunities for science learning depends on understanding what opportunities for science learning exist across the educational landscape; what the nature of this learning is in a variety of environments; how outcomes currently complement and build on one another; and how designs, processes, and practices for supporting learning can be improved in the future.
ASSESSMENT AND EVALUATION
The educational research community generally makes a distinction between assessment and evaluation. Assessment is the set of approaches and techniques used to determine what individuals learn from an experience. Evaluation is the set of approaches and techniques used to make judgments about the effectiveness or quality of a program, approach, or treatment; improve its effectiveness; and inform decisions about its design, development, and implementation. Assessment targets what learners have or have not learned, whereas evaluation targets the quality of the experience or intervention. While assessment is often, though not always, part of an evaluation, it is important to recognize that they are separate endeavors.
The broad enterprise of evaluation includes various phases, which in the field of informal science education tend to be referred to as front-end evaluation, formative evaluation, and summative evaluation. The first stage of program development, which includes identifying appropriate goals and determining the audience, is often referred to as front-end evaluation. Front-end evaluation helps challenge assumptions and provides important information that helps conceptualize a project. During the design and development phase of a project, formative evaluation is often conducted. The purpose of this step is to determine what is working—or not working—before the project is completed. Formative evaluation is part of evidence-based design processes that are open to cycles of design and testing. In The Mind exhibition developed at the Exploratorium, for example, Erik Thogersen went through a formative evaluation phase by prototyping different possibilities for the exhibition with visitors to the museum. Some ideas became part of the finished exhibition, and others were rejected. This process provides a midpoint check for developers so that they continue to question their assumptions about the project, consider whether goals and objectives are being met, and make necessary changes before the project is completed.
The final phase of the evaluation process is the summative evaluation. Conducted after the project is completed, its purpose is to document whether the learning goals (or any other goals, for that matter) established at the beginning of the project (and likely updated over time) were met and whether there is room for improvement. During this phase, some unplanned-for learning outcomes may also be noted. Summative evaluations document program or project success and are often done as part of accountability measures, although, increasingly, calls for generalizable results are moving many summative evaluation designs into research with the potential to inform practice or contribute to the overall knowledge base.
A full discussion of the complexities of evaluation, including appropriate designs, is beyond the scope of this book. The NSF Evaluation Framework is a good starting point for a deeper exploration of issues related to evaluation and includes a list of resources in its appendixes.
This chapter has discussed an approach that can serve as a guide in planning and executing assessments in informal science environments. A key element of this approach is up-front planning, during which it is important to set goals for the project and get to know the audience. Knowing what the learning goals are at the beginning of a project can help contribute to its success and is essential for effective assessment.
The next phase of the assessment process is the development of assessments that appropriately measure what they purport to measure in ways that are in keeping with the nature of the learning experience. The free-choice, self-directed nature of many informal learning experiences makes the development of such assessment measures difficult. Assessment measures need to be collected in ways that do not violate the experience itself. Embedded or authentic assessments may most appropriately document science learning from informal experiences.
Although much progress has been made in understanding how to conduct assessments and interpret data gleaned from the process, more work needs to be done. By continuing to build consensus about the best way to perform assessments, it will become easier to develop effective ways to document the learning that occurs in a wide range of informal science settings.
Things to Try
To apply the ideas presented in this chapter to informal settings, consider the following:
Is there a clear link between planning and assessment? Evaluators are realizing the importance of connecting the planning process to evaluation goals. Has this idea gained traction in your setting? Do you see ways to link the two processes? Do you align the kinds of data you collect with your goals?
Consider whether you have defined appropriate goals, outcomes, and indicators that guide assessment. Are these goals appropriate for the experience? Are you aiming too high (“increase science literacy in the United States through interpretive walks in our arboretum”) or too low (“the 3-week biodiversity trip to Costa Rica’s main nature preserves will increase awareness for the need to protect local resources”)? Are goals defined in ways that capture the breadth and depth of learning outcomes for all important audiences?
Consider unintended outcomes. Assessment can focus on clearly defined outcomes, but in informal settings there are often far more outcomes than can be planned for or assessed. Do you have a full understanding of the learning benefits that your audiences or participants derive? Talk to your participants or audiences about the ways they see themselves benefiting from the experiences your setting offers. This can help you identify additional unintended outcomes if brainstorming and careful analysis have not already mapped the entire spectrum of possible outcomes.
Think about how to refine assessment instruments. This chapter discussed assessment instruments and some assessment approaches and how they fit the outcomes they are designed to measure. Consider whether assessment instruments currently being used at your institution can be modified and improved based on these ideas.
Share with others. There are many ways to share your assessment experience with others. Make your insights available to the community of informal science educators and draw from other experiences as well.
Utilize external resources. A variety of online resources are now available to support assessment and evaluation, and a growing number of high-quality articles and books address these issues. Websites that archive resources in informal science learning and teaching (http://www.informalscience.org), after-school program assessment (http://atis.pearweb.org/), or visitor studies (http://www.visitorstudies.org) provide gateways into the assessment and evaluation community.
Seek outside expertise. Cooperation and collaboration with academic institutions or professional evaluators can provide access to important knowledge and skills, but ensure that the professionals have the appropriate qualifications and experience to address the unique features and complexities of informal science learning.
For Further Reading
Clipman, J.M. (2005). Development of the Museum Affect Scale and Visit Inspiration Checklist. Paper presented at the annual meeting of the Visitor Studies Association, Philadelphia. Available: http://www.visitorstudiesarchives.org [accessed February 2010].
Falk, J.H., Reinhard, E.M., Vernon, C.L., Bronnenkant, K., Deans, N.L., and Heimlich, J.E. (2007). Why Zoos and Aquariums Matter: Assessing the Impact of a Visit. Silver Spring, MD: Association of Zoos and Aquariums.
Friedman, A. (Ed.). (2008). Framework for Evaluating Impacts of Informal Science Education Projects. Available: http://insci.org/resources/Eval_Framework.pdf [accessed February 2010].
Garibay, C. (2005, July). Visitor Studies and Underrepresented Audiences. Paper presented at the annual meeting of the Visitor Studies Association, Philadelphia.
National Research Council. (2009). Introduction. Chapter 3 in Committee on Learning Science in Informal Environments, Learning Science in Informal Environments: People, Places, and Pursuits. P. Bell, B. Lewenstein, A.W. Shouse, and M.A. Feder (Eds.). Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
Center for the Advancement of Informal Science Education (CAISE): http://caise.insci.org/
Informal Science: http://www.informalscience.org/
Institute for Learning Innovation: http://www.ilinet.org
Visitor Studies Association: http://www.visitorstudies.org