Part 1 of Session 5 focused on theory concerning the guidelines, and Part 2 focused on evaluation and practice.
|Moderator||Theresa Schwerin, Institute for Global Environmental Strategies|
|Speakers||Martin Orland, WestEd|
||Steve Schneider, WestEd|
Making the Right Choices: How to Get the Most Value out of eVALUation!
Martin Orland and Steve Schneider, WestEd
The presentation, prepared by Martin Orland and Steve Schneider, both of WestEd, was delivered by Orland. He began by stating that at its core, scientific inquiry is the same in all fields. It is “a continual process of rigorous reasoning, supported by a dynamic interplay among methods, theories, and findings. It builds understandings in the forms of models or theories that can be tested,” he said. Orland commented that during the workshop, participants had spoken a great deal about evidence and how to encourage a passion for evidence among students. He suggested that an equal agenda and passion are needed with respect to the nature of scientific inquiry and education.
Orland explained that the Common Guidelines for Education Research and Development (generally referred to as “the Guidelines”) that were jointly developed by the National Science Foundation (NSF) and the Institute of Education Sciences of the Department of Education are an attempt to demystify scientific processes as they are applied to educational research and evaluation.1 The Guidelines are a cross-agency framework that describes broad types of research and development (R&D) and “the expected purposes, justifications, and contributions of various types of agency-supported research to knowledge generation about interventions and strategies for improving learning.”
Orland said the Guidelines are necessary because the American education system needs research to produce stronger evidence at a faster pace. Constrained federal resources demand that NSF, the Department of Education, and other agencies purposefully build on each other’s research and development portfolios. The Guidelines provide a cross-agency vocabulary and set of research expectations that are critical for effective communication.
1 Institute of Education Sciences, U.S. Department of Education, and the National Science Foundation, “Common Guidelines for Education Research and Development,” 2013, pp. 1-53.
Knowledge development in education is not strictly linear, Orland said. There are three overlapping categories of educational research: core knowledge building, design and development, and studies of impact. Such work requires researchers and practitioners representing a range of disciplines and expertise. It may require more studies for basic exploration and design than for testing the effectiveness of a fully developed intervention or strategy. It also requires assessment of implementation, not just estimation of impacts. Finally, it includes attention to learning in multiple settings, both formal and informal, Orland explained.
The Guidelines are organized according to the following:
- Purpose. How does this type of research contribute to the evidence base?
- Justification. How should policy and practical significance be demonstrated? What types of theoretical and/or empirical arguments should be made for conducting this study?
- Outcomes. Generally speaking, what types of outcomes (theory and empirical evidence) should the project produce?
- Research plan. What are the key features of a research design for this type of study?
- External feedback plan. A series of external, critical reviews of project design and activities. Review activities may entail peer review of the proposed project, external review panels or advisory boards, a third party evaluator, or peer review of publications. External review should be sufficiently independent and rigorous to influence and improve quality.
Orland said that the Guidelines will not preclude innovative projects. They are intended to help principal investigators in proposal preparation. He said the key point is to ensure that projects are explicit about their research questions, methods, and analytic approaches in their proposals. The criteria should be relevant for all types of education R&D efforts. The Guidelines can help practitioners develop a better understanding of what different types of education research should address and might be expected to produce, he said.
Orland explained that the Guidelines apply to proposals, but they foreshadow what will come from the R&D effort. Each section of the Guidelines is connected to evidence of some aspect of the proposal and the proposed work. Throughout, the Guidelines provide explicit and implicit messages about what counts as evidence and what needs to be considered.
Orland concluded by noting, “So in a sense, you are building the airplane while you are flying it. You are both doing this exploratory research, and you are learning as you are going, so that you are refining and improving.”
|Moderator||Theresa Schwerin, Institute for Global Environmental Strategies|
|Speaker||Hilarie Davis, TLC, Inc.|
|Panelists||Bonnie Eisenhamer, Space Telescope Science Institute|
||Jenny Gutbezahl, Brandeis University|
||Frances Lawrenz, University of Minnesota|
Using Evaluation to Increase and Measure the Impact of Education
Hilarie Davis, TLC, Inc.
Hilarie Davis of TLC, Inc., started Part 2 of Session 5 by showing an objective from NASA’s 2014 strategic plan:2
Advance the nation’s STEM [science, technology, engineering, and mathematics] education and workforce pipeline by working collaboratively with other agencies to engage students, teachers, and faculty in NASA’s missions and unique assets.
2 NASA, NASA Strategic Plan 2014, Washington, D.C., p. 34.
Davis discussed the barriers to evaluation identified from surveys of NASA education specialists. These include the following:
- Evaluation seems like it is being done for someone else [outside of the school system] instead of for improving the program;
- The evaluation topic is not close enough to the work being done to be meaningful to the teacher;
- The evaluation is not realistic in its scope or methods;
- Evaluations are too costly for the perceived value; and
- The evaluations feel like an audit or judgment of the people and/or program.
She said that evidence indicated that the barriers could be overcome. NASA’s Science Mission Directorate (SMD) held forums at which more than 200 people attended evaluation sessions. The attendees reported that the session had significantly affected their understanding of evaluation, their perception of its value, and their intention to use it in the future. They identified strategies for overcoming barriers. These included the following:
- Embed evaluation in the whole project cycle—provide feedback and support for this;
- Give the evaluation credibility by involving the stakeholders appropriately;
- Build the evaluation around questions that are important;
- Use reasonable, practical approaches to collect data;
- Be clear about the purpose of the evaluation; and
- Use the results of the evaluation to guide decision-making about program elements, goals, and funding.
An example was given of a valuable evaluation that was done for seventh and eighth graders using results from the Global Precipitation Measurement (GPM) mission (Figure 6.1). Curriculum concepts were taught to one group of students using GPM as an example, and those students were then compared with students taught the general curriculum. The two groups did equally well on knowledge tests for seven lessons during the year, but the students given the GPM example did better than the general curriculum students on the retention of concepts in an end-of-year test. The retention of the science curriculum was enhanced by the context.
FIGURE 6.1 An example of data produced by NASA’s Global Precipitation Measurement (GPM) mission. GPM data were used in curriculum taught to students as part of a test of how well they retained concepts over time compared to the standard curriculum. SOURCE: Courtesy of NASA’s Scientific Visualization Studio; data provided by the joint NASA/JAXA GPM mission.
Davis mentioned the intern program at NASA Ames Research Center, where rural high school students worked with Ames astrobiologists studying extremophiles in nearby Lassen Volcanic National Park. Before they started, and at additional times during the year, the students answered core questions about astrobiology that the science team used to guide their interactions with the students. The students collected and analyzed data. The students later presented their findings to the community to demonstrate their understanding of the science.
Davis also discussed several other NASA projects, including the Magnetospheric Multiscale Mission, the Heliophysics Education Ambassadors, and the Adler IBEX After School Club.
Davis said that evaluation can be as important as the work itself, and whenever possible one should include an evaluator at the beginning stages of designing new projects. “There is no point in doing the work unless you can prove its worth,” she noted.
As to why the strategies employed to overcome barriers work, she listed several reasons:
- People like feedback—not judgment. Judgment feels punitive while feedback feels helpful.
- People want to do well—they set out to succeed, not to fail—so they appreciate a fair assessment that may help them improve.
- Evaluation throughout the project cycle improves it every step of the way, so there are a lot of chances to improve.
- People want answers to their questions, so when they help develop the questions, they care about the answers.
- People improve when they have a clear path to getting better, which is why they say the project cycle rubric helps.
- People delivering programs know where and how good data can be most effectively collected.
- Evaluators do a better job when stakeholders evaluate their evaluation plans, methods, and measures for value and validity. Stakeholders are also experts.
- Decisions based on good data about a program are honest and productive; decisions made without good evaluation data are suspect and feel arbitrary, which discourages productivity.
Davis concluded by stating that “through evaluation we are able to collect evidence and develop explanatory models of how to bring back the wonder for teachers and students to know, care about, and pursue NASA and STEM learning.”
Hilarie Davis and Theresa Schwerin joined Bonnie Eisenhamer, Jenny Gutbezahl, and Frances Lawrenz for the panel discussion. The organizing committee developed the following guiding questions to provide focus to the panel discussion:
- Why and how does NASA evaluate the programs it executes?
- What are examples of evidence that the evaluation of NASA’s programs is providing useful information to improve the programs?
- How does NASA make a difference in STEM education, and how is this known?
- What are the greatest challenges or barriers that people have encountered related to SMD education evaluation? What strategies have been used or recommended for addressing these barriers?
- How does the evaluation of NASA programs compare to the model presented for education by Orland and Schneider in Part 1 of this session?
- What is the mechanism by which the results of evaluation change NASA education programs?
Lawrenz started the session by noting that evaluation has been underfunded. There have also been unrealistic expectations, an inability to address “real” questions, and sampling bias. But new models are available for NASA. She suggested that “less is more” and that it is important to be selective about which battles to fight.
Eisenhamer said that it is important to “plan evaluation with your end goal in sight.” In addition, it is important that
- Evaluation questions are well matched to the defined purpose and strategies; and
- Evaluation questions and methods are appropriate to the stage and maturity of the program.
Eisenhamer noted that front-end planning is needed so that evaluation is designed with the end outcomes in mind. A new direction for SMD education will require the involvement of an evaluator from the beginning of the program, she said.
Gutbezahl was involved in the 1997-2000 NASA Space Science Education and Public Outreach effort and is now seeing many of the same challenges she observed more than 15 years ago. These include a culture clash between scientists and educators, a lack of coordination across the system leading to gaps and redundancies, and the tension between going for depth versus breadth. She said that she is also seeing many of the same strategies to address these problems, including creating common goals to overcome cultural differences and going to the users to discover their needs.
Gutbezahl also noted that current evaluation places more emphasis on “empirical evidence,” meaning numbers, “which are really not any more empirical than qualitative data,” she said. This leads to an “emphasis on breadth, because it is easier to count noses than measure true impact.”
There are often two outcomes: not enough data on what works, or great data showing that something does not work. The panelists stressed that the issues are long-term and have to be addressed as such.