Suggested Citation: "5 Program Evaluation." National Research Council. 2008. NASA's Elementary and Secondary Education Program: Review and Critique. Washington, DC: The National Academies Press. doi: 10.17226/12081.



5 Program Evaluation

In this chapter we examine the Office of Education's approach to program and project review and evaluation. Evaluation of its K-12 education activities is the mechanism NASA can use to determine the extent to which the Elementary and Secondary Program is meeting its goals. This determination is critical not only because there is a need for accountability regarding the expenditure of government funds, but also because there is a need for ongoing program improvement. Program and project evaluation can answer questions about whether projects are advancing scientific and mathematical literacy; motivating young people's interest in science, technology, engineering, and mathematics (STEM) subjects; increasing students' knowledge of STEM content; and encouraging young people, especially those from groups that are underrepresented in STEM fields, to become familiar with and pursue STEM careers.

Evaluation of NASA's K-12 education program and its related projects is challenging and requires significant resources and expertise in evaluation. The program goals are broad, and the projects are diverse in their scope and design. The goal of engaging students in STEM activities is particularly challenging for evaluation because "engagement" is difficult to measure, and it requires tracking over time. In addition, NASA's K-12 education projects, in an attempt to address local or regional issues, often vary from location to location, and evaluation design must take that variation into account. Finally, due to the wide range of experiences and activities that teachers and students bring to and participate in at school and in their everyday lives, the specific effect of NASA's programs, particularly short-term programs, may be difficult to determine.

This chapter is not intended to provide step-by-step guidance on how to conduct evaluations. Rather, we describe major stages in the evaluation process and discuss how NASA could improve its efforts related to each of those stages. Following an initial discussion of evaluation issues with some reference to NASA, the chapter is organized by the major components involved in evaluating programs, from design to evaluation of impact.

The chapter draws in part on a paper the committee commissioned from Frances Lawrenz to review a set of ten external evaluations of NASA's K-12 projects, including the Aerospace Education Services Project (AESP), NASA Explorer Schools (NES), a module of the Digital Learning Network (DLN), and EarthKAM (Lawrenz, 2007). Lawrenz also reviewed evaluations of two programs that are outside the headquarters Office of Education: GLOBE and the Sun-Earth Day event. Table 5-1 summarizes key aspects of the evaluations, including the questions and the design or methods.

ISSUES IN EVALUATION

The evaluation of education programs is a well-codified practice. There is a professional organization of evaluators, several related journals, and a code of ethics. There are established methods for framing evaluation questions; for hypothesizing the theories of change or of action by which a program expects to reach its goals; for developing measures of the extent to which the stages of a theory are realized; and for crafting an evaluation design, collecting data, analyzing the data, and reaching conclusions about the import of the investigation. Although there are disputes in the field about such issues as the best design to use for particular kinds of questions, the practices are widely understood and accepted.
In carrying out a specific program evaluation, it is important to be clear about the intended goals and objectives of a program, as well as to distinguish the purposes of the evaluation itself, in order to frame questions appropriately and design the evaluation to address those questions. The key to an effective evaluation is a design that answers the specific questions that are relevant for decisions at a given time. Sometimes, quantitative data may be necessary; at other times rich qualitative data are more responsive to the specific questions.

One way to arrive at priority questions for an evaluation is to consider the major audience for the evaluation and how the results from the evaluation will be used. It is important to recognize that one evaluation by itself may not be able to provide the necessary information to meet the needs of different audiences or the decision at hand. For example, program or project developers might want information on how to improve a program; congressional aides might want to know if the program improves student achievement and contributes to the national scientific effort; and high-level NASA administrators may want to know that the educational programs are consistent with the agency's overall goals. Evaluators need to consider which types of questions would be most relevant and produce the most useful outcomes by discussing the evaluation with the various audiences and establishing priorities. Resources will always be limited, and how the data are likely to be used should affect the basic questions and design.

TABLE 5-1 Descriptions of Reports from External Evaluations of the Core Projects

Program: NASA Explorer Schools
Report Title: Brief 1: Evaluation Framework: Evaluating the Quality and Impact of the NASA Explorer Schools Program (McGee, Hernandez, and Kirby, 2003)
Evaluation Questions/Focus: Plan for a design experiment and a comparative study.
Type of Data: No data available.

Program: NASA Explorer Schools
Report Title: Brief 2: A Program in the Making: Evidence from Summer 2003 Workshops (Hernandez et al., 2004)
Evaluation Questions/Focus:
• What is the profile of schools designated as NASA Explorer Schools?
• What are the top target standards of selected schools?
• What are the participants' perspectives and beliefs about teaching, learning, and technology?
• Who participated in the summer 2003 workshops and what did they do?
• What was the participants' feedback on summer workshops?
Type of Data: Surveys and review of applications and workshop agendas.

Program: NASA Explorer Schools
Report Title: Brief 3: A Program in the Making: Year 1 Annual Report (Hernandez et al., 2004)
Evaluation Questions/Focus:
• What is the contextual background/conditions of participating schools?
• How did the school teams organize to meet their goals?
• How did school teams' strategic planning approaches work?
• What is the quality of professional development supports?
• How did overall NES program guidelines/supports facilitate participation?
• What is the impact of program participation at end of year 1?
Type of Data: Use of existing data, surveys of participants, focus groups of participants and program personnel.

Program: NASA Explorer Schools
Report Title: Brief 4: Evidence That the Model is Working (Davis, Palak, Martin, and Ruberg, 2006)
Evaluation Questions/Focus:
• How is the NES model being implemented?
• How does NES encourage more involvement with NASA program products and services?
• How does NASA involvement increase teacher professional growth?
• What is the effect of the program on school administrators?
• What is the effect of the program on family/caregiver involvement?
• What is the effect of the program on students' interest, career aspirations, and knowledge of science, technology, engineering, mathematics, and geography?
Type of Data: Several different data gathering methods were used from three main perspectives: NASA personnel, schools, and students and families. Methods included surveys, content assessments, interviews, observations, document analyses, and interactions.

Program: NASA Explorer Schools
Report Title: Evaluation Plan 2006-2007 (Paragon TEC, 2006)
Evaluation Questions/Focus:
• Overall Question: What is the relationship of the nature and extent of a school's involvement to their success in developing teachers' competence in using NASA STEM-G resources and student interest, attitude and achievement in STEM-G?
• What is the nature of an NES school's use of NASA resources?
• What is the extent of an NES school's use of NASA resources?
• In what ways and to what extent do the short-duration professional development activities associated with being a NASA Explorer School affect teachers' confidence, competence, and use of NASA for STEM-G instruction?
• In what ways and to what extent do the long-duration professional development activities associated with being a NASA Explorer School affect teachers' confidence, competence and use of NASA for STEM-G instruction?
• In what ways does NES involvement affect family involvement?
• To what extent does NES involvement affect family involvement?
• To what extent does NES involvement affect student interest in STEM-G topics?
• To what extent does NES involvement affect student attitude toward STEM-G careers?
Type of Data: Data not yet available; proposed NEEIS data and other surveys of teachers and students; student content tests for selected students; surveys of staff.

Program: NASA International Space Station EarthKAM Program
Report Title: NASA International Space Station EarthKAM Program Evaluation Report (Ba and Sosnowy, 2006)
Evaluation Questions/Focus: To evaluate the program against the NASA educational goals and provide strategic recommendations for future directions.
Type of Data: Interviews of project staff and participants; use of NEEIS data; site visit to UCSD.

Program: Digital Learning Network (DLN)
Report Title: Digital Learning Network Evaluation Tool Development Reduced Gravity Module (Davis, Davey, Manzer, and Peterson, 2006)
Evaluation Questions/Focus:
• Develop an assessment device for the reduced gravity module.
• Develop a rubric for assessing the quality of DLN modules with extended definitions.
Type of Data: Content assessment test.

Program: Aerospace Education Services Project (AESP)
Report Title: Evaluation of the NASA Aerospace Education Services Project (Horn and McKinley, 2004)
Evaluation Questions/Focus: There are 19 evaluation questions addressing the following 5 areas:
• program design and management.
• support of systemic improvement.
• teacher preparation and enhancement programs that support systemic reform.
• student support.
• curriculum and dissemination.
Type of Data: Delphi survey, surveys of specialists and telephone interviews with center staff, AESP State Impact Survey, face-to-face interviews with AESP and center staff, site visits, document review, NEEIS data.

Program: AESP
Report Title: The Final Report of a Study of the Aerospace Education Services Project (AESP) Role and Impact Among Selected Partners (Horn and McKinley, 2006)
Evaluation Questions/Focus:
• With whom does AESP cooperate and support for delivery of NASA programs to students, teachers, and others?
• What is the form and nature of this cooperation and delivery of services?
• How effective is AESP in its provision of support services for its NASA and non-NASA partners?
• How do these cooperative actions and provision of services to other NASA partners impact on the traditional role of AESP?
• What are the elements or activities of AESP that contribute most to NASA's major education goals?
• What are some exemplary cases in which AESP specialists' work has impact?
Type of Data: Site case studies, surveys, NEEIS data.

NOTES: STEM-G = Science, technology, engineering, mathematics, and geography; NEEIS = NASA Education Evaluation Information System.

Broadly speaking, there are three sequential, overlapping stages in program evaluation:

1. evaluation for purposes of developing a program;
2. evaluation to find out how a program has been implemented in a number of settings, including adherence to the original design or effective local adaptation (formative evaluation); and
3. evaluation of the effects (impact) of the program, both short and long term (summative evaluation).

As an evaluation proceeds through these stages, it generally progresses from a situation in which a close connection between the program developer or implementer and the evaluator is necessary, to one in which a distinct separation between the program evaluator and the program itself is important. In most cases, an impact evaluation should be carried out by an individual or organization external to a program's administration.

An Evaluation Plan

An overall evaluation plan is needed to address how well a program as a whole is achieving its stated goals and objectives. Such a plan must be based on focused evaluation of the outcomes of individual projects. With appropriate analysis, the individual project evaluations can show how well overall goals are being achieved. Currently, the NASA Office of Education lacks an overall evaluation plan for the K-12 education program and its projects.
Given resource constraints, evaluations of individual projects can be scheduled on a cyclical basis, with high priority given to projects intended to have the greatest impact on student engagement and learning and to projects that face important questions about activities, participants, staffing, funding, or organization. Both formative and outcome evaluations can usually be scheduled in advance. For example, reports about program effectiveness may be scheduled on a periodic basis: staff can plan for outcome evaluations in advance over a 4-5 year period, rotating the projects in the portfolio.

On occasion, questions may arise unexpectedly, and an evaluation would be useful in answering these questions. For example, during the development of a new program, early experience may suggest that the target audience is not engaged. Evaluation may help to answer whether the wrong age group is being addressed, the wrong materials are being used, the nature of the pedagogy is inappropriate, or the activities are already being provided from other sources.

An evaluation plan would also outline the mechanisms by which evaluation results can be communicated to decision makers and help to inform project implementation. Lawrenz's (2007) review of existing external evaluations suggests such mechanisms are currently absent in NASA. It appears that few, if any, of NASA's decisions about the agency's education programs have been based on evaluation reports. Lawrenz speculates that the evaluations may not have provided the information needed to make decisions or that the political environment may move more rapidly than the evaluation environment, and perhaps the reports were not available when decisions were made. Factors like these need to be taken into account when developing an evaluation plan.

Currently, the overall Elementary and Secondary Program is periodically reviewed, but it has not undergone a true external evaluation. Moreover, the timing of external evaluations of individual projects appears to have been determined by individual project officers with little strategic coordination across the program. This situation does appear to be changing. There is a plan for evaluation in the Strategic Framework for Education (National Aeronautics and Space Administration, 2006a).
In testimony before the House Subcommittee on Research and Science Education of the Committee on Science and Technology on June 6, 2007, Joyce Winterton, the assistant administrator for education at NASA, acknowledged the need for program and project evaluation and outlined the steps NASA has taken to address evaluation (Winterton, 2007):

"The Agency's many Education initiatives have not been evaluated in a comprehensive, rigorous manner to indicate how well all of our programs are performing in support of our outcome goals. We are committed, however, to enhancing and improving our evaluation procedures. The Agency has taken several major steps to improve the evaluation function by: (a) incorporating a detailed evaluation plan into its Education Strategy Framework; (b) defining an enhanced set of outcome-based performance measures; articulating specific roles and responsibilities to ensure accountability; and, (c) allocating the resources necessary to support rigorous evaluations and the overall evaluation function."

Costs

Evaluation, especially evaluation of impact, can be expensive. Past headquarters Office of Education budgets for evaluation appear to be relatively small, but it was difficult for the committee to obtain exact figures because evaluation costs are not listed as a separate budget category; rather, they are included in overall project costs. As a result, there is no way to account for the total amount spent on evaluation across projects. A rule of thumb for evaluating programs is that at least 5 percent of the total budget should be devoted to evaluation; project managers report that this level of funding for evaluation has not been provided. Insufficient funds severely limit the scope and nature of any evaluation. Given limited overall funds, it is critical that NASA develop a plan for allocating the funds that are available for evaluation.

PROGRAM AND PROJECT DESIGN

The evaluation process can and should begin with the initial design of a program or project. For example, once the goals of a proposed program are specified, the agency can describe the theory of action underlying the program design (how the planned activities are expected to lead to the desired outcomes), citing the appropriate evidence that supports particular elements of the program design (Weiss, 2007). As a next step, a "design critique" of a proposed program or project may be appropriate to help improve the design, or in some cases that step will lead to a decision to not go forward if the objectives cannot be met with the proposed design. This kind of design critique is not expensive and requires only modest amounts of time from people who understand both the system that is being targeted for improvement and what has been learned in prior efforts (Weiss, 2007).
It may also be appropriate at the design stage to carry out a planning evaluation in which evaluators are involved to help diagnose and define the condition that a given project is designed to address, to state clearly and precisely the goals of the project, and to review the proposed procedures for obtaining accurate information and for the soundness of the evaluation methods (Rossi and Freeman, 1993; Weiss, 1998). The result can provide a more detailed description of a project, including major goals and objectives, activities, participants, resources, timeline, and intended accomplishments. It can also help to document the state of key outcomes prior to the project in order to provide a baseline for measuring impact. NASA has begun to build a theory of action in its strategic framework with the pyramid and the push-pull model, described in Chapter 2. This framework and model, however, have very little specificity. More detail about mechanisms and expected effects based on research is needed for individual projects. The model developed as part of the NES evaluation is an example, though it is very detailed and somewhat difficult to use because of its complexity.

As noted in Chapter 4, NASA could improve efforts to subject program and project designs to appropriate analysis. One approach would be to have the program or project design and theory of action and evidence presented in support of the design critiqued by a small number of external experts, perhaps by forming an advisory group, or through soliciting ad hoc reviewers (Weiss, 2007).

Specifying and Measuring Program Outcomes

An important element of project design is the specification of desired outcomes and deciding how those outcomes will be measured. NASA has taken this step at the program level by specifying a set of outputs and outcomes for the major objectives of the K-12 education program as a whole (see Table 5-2). These specifications are important for guiding both internal and external evaluations of the overall program.

Although NASA's specified outputs and outcomes developed for each program objective are appropriate, there are three areas for improvement. First, in some cases, the proposed outcome is not a good representation of the objective: that is, the outcome does not have good face validity as a measure of the objective (Kerlinger and Lee, 2000; Moiser, 1947). Second, the proposed outcome is actually difficult or impossible to measure. Third, the data collected for an outcome will be difficult to interpret. The rest of this section discusses NASA's specified objectives, outputs, and outcomes and the areas for improvement; Table 5-2 provides an overview of the objectives, outputs, and outcomes.

Educator Professional Development: Short Duration

The objective for short-term professional development sessions is to engage teachers. The output identified for this objective is the number of teachers participating in a session. The outcome is the number of teachers using "NASA STEM resources" and rating them as effective. Given the limited goal of engagement and the short duration of the session, these measures seem reasonable.
However, this sort of measure of output could press NASA to simply offer sessions to more and more teachers, regardless of how effectively they might be turning their engagement into implementing any changes in their teaching. Furthermore, the count of teachers who participated in a session may be difficult to interpret and may make sense only when compared to previous years' enrollment or some other such measure. Finally, given the approximately 2 million teachers in the United States, the number of teachers reached is unlikely to be significant by itself.

The outcomes (use and perceived effectiveness of materials) may also present some difficulties. Use implies that a shift in the science curriculum in these teachers' classrooms is expected as a result of a relatively short intervention. Given the research summarized in Chapter 4, such a shift is highly unlikely. Many previous evaluations in different fields show that teachers rarely change their classroom practice, especially as a result of low-intensity, outside intervention (DeSimone et al., 2002; Garet et al., 1999). Yet since many of the short-term sessions are requested by teachers or schools already familiar with NASA's resources, it is possible that the teachers are inclined to use the resources even with only short exposure to them. Moreover, the information about use of the materials may be difficult to interpret as things can change in the time between a brief session and the use of materials in a classroom, and these changes do not necessarily reflect on the quality of the session or the quality of the materials. For example, teachers may not have time in their curriculum to introduce new materials, or they may already have similar materials from other sources.

TABLE 5-2 Objectives, Outputs, and Outcomes for the Elementary and Secondary Program

2.1 Educator Professional Development: Short Duration
Objective: (Engage) Provide short-duration professional development and training opportunities to educators, equipping them with the skills and knowledge to attract and retain students in STEM disciplines.
Outputs:
• 2.1.1 Number of elementary and secondary educators participating in NASA-sponsored short-term professional development opportunities.
Outcomes:
• 2.1.2 Percentage of elementary and secondary educators using NASA content-based STEM resources in the classroom.
• 2.1.3 Percentage of elementary and secondary educators using NASA content-based STEM resources in the classroom who rate the resources as effective.

2.2 Educator Professional Development: Long Duration
Objective: (Educate) Provide long-duration and/or sustained professional development training opportunities to educators that result in deeper content understanding and/or competence and confidence in teaching STEM disciplines.
Outputs:
• 2.2.1 Number of elementary and secondary educators participating in NASA-sponsored professional development opportunities.
• 2.2.2 Number of colleges and universities training elementary and secondary educators who partner with NASA in their STEM teacher educator programs.
Outcomes:
• 2.2.3 Number of teachers who use NASA content or resources as a result of another teacher's direct involvement with a NASA program.
• 2.2.4 Percentage of NASA teacher program participants who become active within a national network to train other teachers.
• 2.2.5 Percentage of elementary and secondary educators who participate in NASA training programs who use NASA resources in their classroom instruction.
• 2.2.6 Evidence that teachers who use NASA resources perceive themselves as more effective teachers in achieving STEM results with their students.
• 2.2.7 Percentage of higher education partners that use NASA resources in STEM preservice education methods courses and student teaching experiences.

2.3 Curricular Support Resources
Objective: (Educate) Provide curricular support resources that use NASA themes and content to (a) enhance student skills and proficiency in STEM disciplines (Educate); (b) inform students about STEM career opportunities (Engage); (c) communicate information about NASA's mission activities (Engage).
Outputs:
• 2.3.1 Quantity, type, and cost of educational resources being produced.
• 2.3.2 Quantity, type, and cost of educational resources approved through the NASA education product review process.
• 2.3.3 Number of approved materials that are electronically accessible.
Outcomes:
• 2.3.4 Customer satisfaction data regarding relevance of NASA educational resources.
• 2.3.5 Customer satisfaction data regarding effectiveness of NASA educational resources.

2.4 Student Involvement K-12
Objective: (Engage) Provide K-12 students with authentic first-hand opportunities to participate in NASA mission activities, thus inspiring interest in STEM disciplines and careers.
Objective: (Engage) Provide opportunities for family involvement in K-12 student learning in STEM areas.
Outputs:
• 2.4.1 Number of elementary and secondary student participants in NASA instructional and enrichment activities.
• 2.4.2 Number of elementary and secondary student participants in NASA-sponsored extended learning opportunities.
• 2.4.3 Number of opportunities for family involvement.
• 2.4.4 Percentage increase in number of elementary and secondary student participants in NASA instructional and enrichment activities.
Outcomes:
• 2.4.5 Activities and investigations result in increased student interest in STEM.
• 2.4.6 Activities and investigations result in increased student knowledge about careers in STEM.
• 2.4.7 Family participants will show an increased interest in their student's STEM coursework.
• 2.4.8 Level of student learning about science and technology resulting from elementary and secondary NASA education programs.
• 2.4.9 Level of student interest in science and technology careers resulting from elementary and secondary NASA education programs.

SOURCES: NASA Education Program-Outcomes, Objectives, & Measures for Performance Accountability Report (PAR) and Performance Measurement Rating Tool (PART) and personal communication, Malcom Phelps, director, Research and Evaluation, NASA Office of Education. These are draft items subject to final approval. Ratings on output 2.4.4, outcome 2.1.2, and outcome 2.2.5 are available at http://www.whitehouse.gov/omb/expectmore/detail/10002310.2007.html [accessed November 2007].

Educator Professional Development: Long Duration

The objective for long-duration professional development is to educate teachers: to deepen their content knowledge and their competence in the classroom. The output measures are the number of teachers participating and the number of colleges and universities participating. The outcome measures are the number of teachers who use NASA content or resources as a result of another teacher's direct involvement with a NASA program; the percentage of participants who become active in a national network to train other teachers; the percentage of participants who use NASA resources in their classroom instruction; and evidence that teachers who use NASA resources perceive themselves as more effective teachers of STEM subjects. For this objective, there is a mismatch between three of the outcome measures and the objective.
Only one of the outcomes speaks directly to participants' feelings of competence in teaching STEM subjects. Three of the other outcomes deal with the percentage of teachers and colleges and universities that become active in using and further disseminating NASA materials. Moreover, the measure of teachers' competence, although relevant, is based on self-reports. There is no objective measure of teachers' competence or increased knowledge, such as pre- and post-activity assessments of teachers' knowledge or classroom observation of teaching practices by external evaluators.

Furthermore, it would be difficult and costly to collect data on "the number of teachers who use NASA content or resources as a result of another teacher's direct involvement with a NASA program." Similarly, "the percentage of NASA teacher program participants who become active within a national network to train other teachers" is difficult to measure accurately. There is no systematic national network of this kind. Participation would therefore depend largely on local factors and whether teachers have an opportunity to train or coach other teachers. Such wide variation may make it impossible to isolate the role of NASA's intervention in fostering participation.

Curricular Support Resources

The objective for curricular support resources includes both educating and engaging students. In educating students, the intent is to enhance students' skills and proficiency in STEM disciplines. In engaging students, the intent is to inform students about STEM career opportunities and communicate information about NASA's mission activities. The output measures include quantity, type, and costs of materials produced and approved through the NASA review process and the number of approved materials that are accessible electronically. The outcome measures consist of customer satisfaction with the relevance and effectiveness of the materials. Presumably, customers will be asked to rate relevance and effectiveness in terms of students' skills and proficiency in STEM subjects, knowledge of STEM career opportunities, and knowledge of NASA's missions.

Again, there is a mismatch between the outcome measures and the objectives. Although measures of customer satisfaction are important, they are not direct measures of students' interests, proficiency, or knowledge of career opportunities and NASA missions. In addition, the satisfaction measures could be supplemented with measures of how many customers access and use the materials, which is not included.

Student Involvement

In contrast to the curricular support objective discussed above, the stated objective for student involvement is only to engage students. Again, output measures are the number of students and families participating in NASA instructional and enrichment programs.
Two outcome measures seek to document students' increased interest and knowledge of STEM careers: one measures families' increased interest in students' STEM coursework; the other measures the level of students' learning about science and technology. The second outcome measure is interesting in that the objective is not targeted at educating, but the outcome documents learning. These outcomes are sensible, but they require systematic surveys and pre- and post-activity assessments. It is not clear how such data would be collected and analyzed and over what time periods.

Measuring learning is not easy. If standardized tests are used, the tests may have only a few items that are specific to the content that NASA covered in its activity. Tests cover broad areas, and NASA curriculum materials are quite specific. Even if NASA input is able to change students' performance on a few test items, a noticeable change in score is unlikely. If tests based on specific NASA curriculum materials are used, they must be developed using standard methodology for constructing tests so that their reliability and validity can be established.

Specifying and Measuring Project Outcomes

In addition to the program-level objectives, outputs, and outcomes, a parallel set of objectives and measures should be developed for each project. These objectives and measures can mirror those for the overall program, but they also need to take into account the specific goals, scope, and target audience of the project. If evaluators are included in the planning stages, they can offer input related to setting those objectives and identifying outputs and outcomes. Such involvement will help to facilitate long-term evaluation of a project.

Lawrenz's (2007) paper suggests that this step might be useful. It notes that the goals for most NASA projects are very broad and that it would be difficult for any project, much less one with limited funding available, to achieve these goals in any depth. It suggests that these issues might be resolved during the evaluation planning stages with careful discussions that would include development of targeted goals for projects that would be more amenable to evaluation.

FORMATIVE EVALUATION

The purpose of formative evaluation is to provide feedback on the development of a program or project and its implementation. An overarching formative question is "how is the project operating?" The specific questions focus on how the project is being implemented and may include questions about specific features of a project or program, such as recruitment strategies, participant attributes, materials, and attendance.
Whether NASA is developing a new program or revising an existing program, questions can arise about how well the program is operating in its early phases. Identifying program successes and challenges early in the process can help staff make adjustments that might improve the overall implementation or outcome of the program. Sometimes, a pilot version of a program can be run in the developmental phase, and an evaluator can assist developers as the program takes shape.

Other kinds of questions may surface unexpectedly during a project's early implementation. For example, if a recruitment crisis occurs in several different locations, it may raise questions about teacher receptivity to certain kinds of professional development activities. In such situations, it may be helpful to have a rapid-response evaluation plan in place to study the issue. This type of evaluation will usually involve small-scale studies of limited issues.

More subjective feedback from participants regarding the programs in which they take part is often useful. For example, evaluators can ask participants to rate how much they like and value NASA program activities. Such information is not a real "evaluation" of the programs or activities; rather, it is a measure of their popularity. Nonetheless, it can provide valuable feedback. This type of information can be made part of a common information system. NASA currently gathers much of this type of information, though the information system, NEEIS, is flawed (see the section in this chapter on NEEIS).

Lawrenz's review of NASA's external evaluations of projects suggests that the headquarters Office of Education is doing an adequate job of formative evaluation. All of the evaluations she reviewed addressed formative questions. They all reported on how the projects were operating and how those operations fit with NASA's larger goals. They all also provided recommendations as to how the projects might be improved or changed. Most also provided a good deal of information about how participants and administrators viewed the projects. In Lawrenz's view, they provided interesting descriptive information about the projects from the perspective of those actually participating in them.

OUTCOME OR EFFECTIVENESS EVALUATION

Determining how well a program or project is achieving its goals and objectives is at the heart of any evaluation process. Data on outcomes are needed to demonstrate a program's strengths and weaknesses, both to the public and to program and project administrators. The data from outcome evaluations are also useful for initiating program or project improvement.
Evaluation of a project's outcomes, also called summative evaluation, can be designed to address several questions. One is to determine whether, and to what extent, a program or project results in the desired outcomes. Another is to determine whether the teacher or student outcomes are the same or different in comparison with the outcomes of other STEM education programs. For the NASA projects, a principal focus of attention in outcome evaluation is the extent to which teachers and students have achieved the attitudes and learning specified in the project's goals and outcomes.

Outcome evaluation can be a flexible process. Evaluators need not limit themselves to just collecting data on outcomes, but can also collect data on characteristics of the program or project in different sites, characteristics of the participants and staff, materials, time, frequency, intensity of exposure, and settings. Using data related to the conditions in and around the program, evaluators can analyze which conditions are associated with different outcomes. For example, does the program have better outcomes for girls or boys? Are outcomes better when the teacher has taken a workshop in space science or technology, or when project materials are introduced in classes daily, or when the school principal supports the NASA intervention? Such data can indicate which features of the project are most desirable under which circumstances and thus help provide guidance for project improvement.

Evaluation Designs

Evaluation of outcomes calls for high standards of research design. In order to know whether the outcomes observed are the result of the intervention and not of other conditions to which participants have been exposed, randomized control-group or comparable comparison-group designs are desirable. Such designs allow the evaluator to attribute effects specifically to the intervention. Over the past decade, the demand for federal education programs to demonstrate their effectiveness has grown considerably. Policy makers have raised their expectations for program evaluation and now ask for "scientifically based" evidence of impact. Simply documenting the numbers of participants or the geographic dispersion of project sites is not sufficient for demonstrating a program's value (Weiss, 1998). Broadly, the demand for evidence about a program's impact has generated a national debate about appropriate designs for evaluation, and that national debate has major implications for NASA's approach to evaluating its education programs.
Currently, some evaluators and the organizations that fund them advocate randomized clinical trials as the preferred evaluation design (sometimes called the "gold standard" of evaluation). For example, the Academic Competitiveness Council (ACC) report (U.S. Department of Education, 2007a) identifies a hierarchy of designs with randomized clinical trials as the most desirable.

Randomized clinical trials call for the random assignment of some people to the treatment group (people who will be exposed to the program) and some to the control group (people who will not be exposed). Randomization enhances the chances that groups are essentially identical at the outset so that any differences between the groups at the conclusion of the trial can be attributed to the program. Although such a trial is an excellent mechanism for ruling out many rival explanations for differences between groups, it is by no means the only appropriate design for evaluation. When a key question is whether the people who are exposed to the program attain some specified outcome, a randomized clinical trial is often the method of choice. However, under certain conditions, other methods may be more appropriate for determining impact (for discussions of designs for evaluation and research, see NRC, 2002; Rossi, Lipsey, and Freeman, 2003; Shadish, Cook, and Campbell, 2002; Weiss, 1998).

There are major difficulties in conducting a randomized clinical trial in order to determine a program's impact, especially for the types of programs and projects that NASA supports (see Rossi et al., 2003, and Weiss, 1998, for discussions of the challenges of conducting randomized clinical trials). First, it can be difficult and costly to mount a trial, especially if an intervention is provided over an extended duration or if the impact needs to be studied over a substantial period of time. It is also difficult to mount a clinical trial for projects that are intended to be tailored to local needs and may not have identical features across sites. Second, program managers may be unwilling to randomly assign units (students, teachers, classrooms, schools). Third, participants (students, teachers, or schools) may be unwilling to accept random assignment to a program or control group. Fourth, randomized clinical trials are not foolproof, and studies can be biased even when randomized (Als-Nielsen et al., 2003; House, 2006; Moiser, 1947; Torgerson and Roberts, 1999; Torgerson and Torgerson, 2003).
These four issues must all be considered when determining the timing and scope of evaluations that use a randomized clinical design.

Measuring Inspiration and Engagement

As noted in Chapter 2, NASA is particularly well positioned to build on teacher and student interest in STEM subjects. The objectives for the Elementary and Secondary Program and its constituent projects are appropriately focused on this interest, particularly the inspiration and engagement that NASA's programs can generate. Measuring inspiration and engagement, however, is challenging. Students can be expected to offer credible responses, both immediately after a project and over time, about how excited they have become about space science and how much they were inspired to pursue STEM subjects both in and out of school. However, widely used and validated measures of these outcomes are not available.

Engagement may also be measurable in terms of course taking and leisure-time behavior. In fact, the metrics developed by the ACC include measures such as the number of Carnegie units earned by high school students in mathematics and science and the percentage of students participating in extracurricular activities in mathematics and science (U.S. Department of Education, 2007a). When these kinds of measures are used, however, it would be valuable to use a control or comparison group of students who were not exposed to the NASA intervention to determine whether it was the NASA input that made the difference in inspiration and engagement.
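The comparison-group logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the choice of outcome (a count of extracurricular STEM activities per student), and all of the numbers are hypothetical and do not come from any NASA evaluation.

```python
from statistics import mean


def mean_gain(records):
    """Average change (post minus pre) across a list of (pre, post) pairs."""
    return mean(post - pre for pre, post in records)


def difference_in_gains(participants, comparison):
    """Mean gain among participants minus mean gain in the comparison group."""
    return mean_gain(participants) - mean_gain(comparison)


# Hypothetical (pre, post) counts of extracurricular STEM activities per student.
participants = [(1, 3), (0, 2), (2, 3), (1, 2)]  # exposed to the program
comparison = [(1, 2), (0, 1), (2, 2), (1, 2)]    # not exposed

print("Participant gain:", mean_gain(participants))
print("Comparison gain:", mean_gain(comparison))
print("Difference in gains:", difference_in_gains(participants, comparison))
```

Looking at the participant gain alone would credit the program with the whole change, including whatever change students would have shown anyway. Subtracting the comparison-group gain removes that shared trend. The estimate is only as credible as the comparison group: unless students were assigned at random, or the groups are otherwise comparable, the remaining difference may still reflect who chose to participate rather than the program itself.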

Longitudinal Studies

Measures of continued engagement require longitudinal studies of students who have participated in NASA programs in order to establish, for example, their enrollment in nonrequired science courses in high school, their majors in undergraduate education, whether they undertake graduate study, and even their eventual careers. Even short-term outcomes, such as participation in STEM coursework or other STEM activities, require some follow-up after students have left a project. Unfortunately, studies that follow students over a period of ten years or more are difficult to carry out, are expensive, and are likely beyond the resources that NASA wants to invest in program evaluation. The challenge, therefore, is to develop meaningful measures for individual projects beyond simply counting participation, while at the same time developing a strategy for determining how well a program is achieving its goals.

One possible approach is to mount a large-scale, multiyear evaluation study for the Elementary and Secondary Program as a whole, rather than attempting to do longitudinal studies for individual projects in the program. Alternatively, longitudinal studies might be carried out only for those projects in which tracking individual students is facilitated by the design of the project, such as the proposed INSPIRE project. An evaluation effort of this scale and expense is not appropriate for projects that involve short-term activities with little potential to generate long-term effects.

Current Status of Evaluations

Lawrenz's (2007) review of external evaluations of projects indicates that all of the evaluations she reviewed combined formative and summative elements; however, they were all much stronger on the formative side. Many of the weaknesses she identified make it difficult to draw reliable conclusions about the impact of the projects in question.
For example, the evaluation designs were mostly retrospective and involved only the treatment group and self-report data. On the latter point, much prior research has shown that participants are not always reliable informants. On the former point, the lack of a comparison group makes it virtually impossible to draw any meaningful conclusions about the cause of observed outcomes. Moreover, the samples of the treatment group were often convenience samples, that is, they came from people who were easy to obtain data from. This approach often involves selecting the best cases, the ones that are easiest to locate, or the ones that are geographically close. As a result, the sample on which the conclusions were based was not representative of the project population. Response rates were often low, and there were few studies that focused on identifying the people who did not respond.

Most of the instruments reviewed by Lawrenz appeared to be sound, but little information on the construction of the instruments or indications of their validity was provided. One exception was a student assessment instrument, but the analyses provided showed that it was probably not a particularly strong instrument. There were many instances of case studies and interviews with varying amounts of detail about how they were conducted. There was almost no direct evaluator observation of programs.

There were very few evaluations that actually sought to track changes, with pre- and post-program measures, related to program outcomes. There were several retrospective questions that asked participants to comment on how much they felt they had changed, and most people reported that the programs had affected them very positively. However, this kind of measure is generally unreliable. There were only a few attempts at comparative studies, and these were flawed by selection bias.

In sum, past efforts to evaluate the impact of projects have been seriously flawed. It is difficult, if not impossible, to draw conclusions about a project's effectiveness based on the kinds of evaluations that have been used for most NASA activities. The agency has recognized the need for more rigorous evaluations of impact and is currently developing a plan to do this (Winterton, 2007).

ACCOUNTING AND PROJECT MONITORING

The new plan for the Elementary and Secondary Program specifies accounting and review requirements for individual projects. Project managers are responsible for ensuring continuous input to the NASA Education Evaluation and Information System (NEEIS) for capturing annual data and metrics (National Aeronautics and Space Administration, 2006c). The measures entered into NEEIS generally include counts of participants and participants' subjective evaluation of their experiences.
It is not possible in NEEIS to track individual participants over time or from project to project.

The NASA Education Evaluation and Information System

Reports from both outside evaluators and current and former NASA staff indicate NEEIS is cumbersome to use. There are difficulties associated with data entry, data quality, and data extraction.

Data Entry

NEEIS is a highly centralized system. Data entry must be done directly into the central NEEIS website on forms that are slow and cumbersome to use. During times of peak data entry, such as at the end of the fiscal year, the system tends to get overloaded, and it responds very slowly or not at all. In addition, different forms are needed for each type of data entry (e.g., institutional information, individual program managers' contact information, and several other aspects of project information). Moving between the different forms requires navigating multiple layers of nested menus.

Projects that keep their own data are generally not allowed to transfer the data directly to NEEIS. Instead, the data must be reentered. Projects that want to maintain data that are not in the standard NEEIS forms must have NEEIS staff build custom forms. This can create a bottleneck because the small central team must service the needs of all of NASA's education projects.

Data Quality

There is no quality control over the data entered in NEEIS, nor is there any internal scrubbing of data. There is no attempt at standardization of data elements, such as the names of universities or project managers. If different users enter variants of the same name, the data are treated as if each name represents a separate entity.

Data Extraction

Extracting data from NEEIS can be difficult. Accessing data as it is gathered in standard NEEIS forms is straightforward. However, summarizing data in nonstandard ways requires building a form through a complicated interface or having the central NEEIS staff build such a form. It is not possible to simply extract bulk data that has been entered. In fact, some external evaluators specifically mentioned difficulty with accessing and analyzing data.
For example, the evaluators who conducted a recent evaluation of EarthKAM cite NEEIS as a major limitation to their work:

Another challenge we faced was accessing and using the NEEIS. Learning to use NEEIS was not intuitive and navigating the database was a slow and cumbersome process, which required several steps for each EarthKAM report accessed. These steps slowed the evaluation process and posed a challenge to selecting a representative sample of all the data available. Given the time it takes to access each report, the volume of reports currently available, and the presence of inconsistencies within the data, such a process will undoubtedly pose problems to future efforts to evaluate any NASA program that relies on this system. (Ba and Sosnowy, 2006, p. 6)

The inadequacies of the system were also pointed out in the 2001 evaluation of the Science, Engineering, Mathematics and Aerospace Academy (SEMAA). The evaluators state that, in their judgment, the data in the central database (at the time called EDCATS) offered little of value in the conduct of the SEMAA evaluation (Benson, Penick, and Associates, Inc., 2001). They recommended that the project obtain authorization for the design and utilization of the project's own comprehensive, universal database that is aligned with SEMAA's objectives.

Project Monitoring and Reporting

In addition to entering data into NEEIS, projects are required to submit monthly and annual performance reports, and they are encouraged to submit a weekly activity report. Projects are reviewed quarterly and annually. The annual review is based primarily on written documentation summarizing the goals, objectives, organization, resources, and accomplishments of each project. The results of the annual review are used to develop an improvement plan.

Presumably, the data entered in NEEIS become an integral part of the annual reports. However, the limitations of NEEIS seem likely to hinder the capability of projects to easily summarize data for reports and to use the data in the system to inform project implementation and improvement. One solution might be for individual projects to maintain their own databases, though there are inefficiencies in this model given that projects are required to enter data in NEEIS.

Currently, individual projects appear to vary in terms of whether they maintain databases or other systematic project files outside of NEEIS. For example, NES maintains school plans and other documents in an online format outside of NEEIS. However, the 2001 evaluation of SEMAA indicated that the project did not maintain electronic records that would allow it to track the progress of individual students over time.
In order for projects to effectively collect and learn from data, some improvement to NEEIS is essential.
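As one illustration of the data quality problem noted earlier in this section, in which variants of the same name are treated as separate entities, the following Python sketch shows a very simple form of name standardization. It is a hypothetical example only: the institution names are invented, the `normalize_name` helper is an assumption made for illustration, and nothing here describes how NEEIS is actually built.

```python
import re
from collections import Counter


def normalize_name(raw):
    """Lowercase a name, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.casefold())
    return " ".join(cleaned.split())


# Hypothetical entries typed by different users for the same two institutions.
entries = [
    "Example State University",
    "example state university",
    "Example  State University.",
    "Sample College",
]

raw_counts = Counter(entries)
clean_counts = Counter(normalize_name(entry) for entry in entries)

print("Distinct names as entered:", len(raw_counts))
print("Distinct names after normalization:", len(clean_counts))
```

A rule this simple catches only differences in case, spacing, and punctuation. Abbreviations, misspellings, and reordered words would need a maintained lookup table or manual review, which is the kind of internal data scrubbing the section reports as missing.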


