During the course of the committee’s review of the existing literature, numerous opportunities for research were identified that could deepen understanding of undergraduate research experiences (UREs). This chapter identifies priorities for research and discusses multiple methodological approaches needed to answer questions about UREs, especially questions about the value-added of these experiences over programs that lack such experiences. This research is challenging due to the heterogeneity of research experiences. It will benefit from a clear conceptual framework that guides researchers to identify key questions and mechanisms for further investigation.
Conducting research can be expensive and time consuming, so it is important to consider the cost-effectiveness of the various research approaches and the relative importance of the questions so that resources can be targeted appropriately. Although all URE programs should conduct some type of evaluation to measure whether they are meeting their goals, not all UREs must or should be part of a research study. However, it is critical that some research studies be conducted to collect and analyze information that will allow the community to better define and describe UREs and their features and to clarify their mechanisms and effects. The results of such research would inform the planning of future UREs.
Based on the committee’s review of dozens of empirical studies, we have found a rich descriptive foundation for testable hypotheses about the effects of UREs on student outcomes. The descriptive evidence, predominantly from self-reports, suggests that research on URE participation should
focus on its impact on disciplinary and research understanding; identity as a researcher; persistence in a science, technology, engineering, and mathematics (STEM) major; and increased enrollment in graduate programs in STEM (Blockus, 2016; Dolan, 2016; Hathaway et al., 2002; Hunter et al., 2007; Nagda et al., 1998; Sadler, 2010; Seymour et al., 2004). Since few studies employ research designs that allow for strong causal inferences about the effects on students of participating in UREs compared to programs without UREs, the next step for research on UREs is to gather this information. This chapter provides recommendations to create a firmer research base and address numerous gaps, as well as ideas for other types of research that would be beneficial to the field.
In approaching the task of creating a research agenda for strengthening UREs, this committee found it useful to build on an earlier National Research Council (2002) report, Scientific Research in Education. As discussed in Chapter 2, that report distinguished among three types of research questions in education research: descriptive, causal, and mechanistic. Research intended to answer descriptive, causal, and mechanistic questions requires a combination of theory, method, measurement, and analysis, ideally based on a shared conceptual framework. Researchers seeking to address complex questions about the underlying mechanisms and outcomes of UREs need to use the tools of the social sciences, build on prior research, and draw from existing information about learning and teaching. At the start of their projects, investigators need to identify appropriate and feasible ways to document impacts; this involves planning studies with appropriate comparison groups, creating ways to measure important elements of research and course experiences, using valid, reliable measures of the outcomes of interest, and when possible acquiring longitudinal data. As discussed later in the chapter, there can be logistical and financial challenges to some of these approaches.
All three types of research are necessary to provide the information needed to improve undergraduate training and experiences in STEM fields. The three types must proceed along parallel tracks. Given the paucity of strong causal evidence about the effects of UREs and about the mechanisms that are most effective in achieving desired outcomes, the committee urges funding agencies to provide funding for research projects intended to generate causal and mechanistic evidence. Such evidence will be useful in guiding investments. The evidence need not come from large-scale, multisite randomized controlled trials (RCTs). Small-scale experiments at individual campuses or well-designed quasi-experimental studies across courses within a college or department can provide important building blocks for the evidentiary foundation needed. If the evidence is consistent with the many descriptive studies already available and with experiences of faculty, then it can be used to advocate for greater resources for UREs that build upon
this strengthened body of research. To successfully carry out research on the individual characteristics (e.g., collaboration and reflection) and potential impacts (e.g., retention in STEM and integration into the STEM culture) of UREs, including the mechanisms by which those impacts are realized, the field needs testable hypotheses about what, why, and for whom UREs work best and about how to improve the structure and provision of UREs to reach a larger and more diverse pool of students. These ends are best accomplished through design-based and mixed methods research.
The following section introduces two key challenges to understanding the effects of UREs: nonrandom selection into UREs (as a function of student, faculty, or mentor choice) and high-quality measures of outcomes. Based on the needs of the field, we then present potential approaches that meet these challenges for research on UREs.
STEM practitioners and researchers may find that forming or joining multidisciplinary teams/partnerships with researchers who have expertise in the behavioral/social sciences, education research, and program evaluation can provide a rich opportunity for collaboration to investigate and strengthen UREs for students. For example, the multidisciplinary community of Understanding Interventions has been focused for years on creating dialogue among members of the education community participating in STEM intervention programs.
To build a stronger research literature that informs the community about the effects that UREs can have, researchers need to be aware of the advantages and limitations inherent in various research designs. In addition, researchers need to be aware of issues and challenges related to selection and measurement. This section discusses the challenges, and the next section focuses on approaches to the research about UREs.
Nonrandom Selection into UREs
Selection bias arises when the characteristics of the students and faculty/mentors participating in a given URE differ systematically from those of other potentially URE-eligible participants, which makes comparisons across UREs difficult. There are at least three common ways that selection bias can creep into the research process and affect the estimated effects of UREs: (1) student self-selection, (2) program-based selection, and (3) selective attrition (e.g., weaker students or those for whom STEM research is not a good fit may be less likely to complete URE projects, remain as STEM majors, or elect to participate in longitudinal surveys about their experiences).
First, with respect to self-selection, students who do or do not pursue opportunities for UREs likely differ from one another in important ways. Those students who seek out or take advantage of opportunities to engage in research may be better prepared academically, more motivated, or more interested in and/or more committed to STEM fields than otherwise similar students who choose not to participate in UREs.
Second, in many instances, students are not the only people involved in the URE choice process. Faculty and program staff may choose to recruit students who share similar interests and values as the faculty member or are deemed as having the greatest likelihood of success in college in general or STEM fields in particular (program-based selection). Such a process would, again, lead to a group of students participating in UREs who would be more likely to succeed (e.g., stay in a STEM major, graduate in STEM) than nonparticipants, even absent the URE participation.
Finally, attrition is another form of selection bias. Students who continue to participate in UREs and/or studies of UREs until outcomes are measured may consistently differ in outcome-relevant ways from students who withdraw or fail to respond to a survey. Students who are not satisfied with their experience in undergraduate research, who struggle academically, or who confront challenges outside of school that hinder their academic progress are more likely than other students to withdraw from courses or from the university itself. As a result, students who persist in the URE may on average be more successful by other measures as well, leading to a falsely inflated estimate of the effect of the URE on the selected outcome measures.
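The inflationary effect of self-selection described above can be made concrete with a small simulation. The sketch below is illustrative only and not drawn from the report: it assumes a single unobserved trait ("motivation") that both raises the chance a student opts into a URE and independently improves the outcome, so a naive comparison of participants and nonparticipants overstates the URE's true effect.

```python
# Illustrative simulation (made-up numbers): self-selection on an
# unobserved trait inflates a naive estimate of a URE's effect.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

motivation = rng.normal(size=n)  # unobserved student trait
# More-motivated students are more likely to opt into the URE.
participates = rng.random(n) < 1 / (1 + np.exp(-motivation))
# True URE effect on the outcome is 0.2; motivation also helps directly.
outcome = 0.2 * participates + 0.5 * motivation + rng.normal(size=n)

# Naive difference in means confounds the URE effect with motivation.
naive_effect = outcome[participates].mean() - outcome[~participates].mean()
print("true effect:  0.20")
print(f"naive effect: {naive_effect:.2f}")  # noticeably larger than 0.20
```

Because motivation differs between the groups at baseline, the naive estimate absorbs part of motivation's direct effect, which is exactly the bias the three selection mechanisms above can introduce.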
There are at least two ways to deal with this challenge: (1) demonstrate the equivalence of the URE and non-URE groups (e.g., the control and experimental groups) as measured by their performance on a dependent variable (e.g., knowledge, motivation, attitudes) before and after the implementation of the URE, so that claims about the impact of UREs can be based on the functional equivalence of the groups. If the comparison groups differ from one another at the beginning of the study, the results of the study are biased. (2) Track the characteristics of the students (e.g., grade point average, previous research experience, gender) to determine their equivalence with non-URE students. This second strategy enables accumulation of knowledge about for whom UREs with certain characteristics (including the mentors' characteristics) work; that is, which characteristics of students and mentors are associated with positive outcomes. Tracking is most easily done within an institution, but it could potentially be done across institutions.
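Strategy (1) can be sketched as a baseline-equivalence check. The example below is a hypothetical illustration: the covariate (incoming GPA), the group sizes, and the 0.1 cutoff (a common rule of thumb for standardized mean differences) are all assumptions, not values from the report.

```python
# Hypothetical baseline-equivalence check between URE and comparison
# groups using a standardized mean difference (SMD) on one covariate.
import numpy as np

def standardized_mean_diff(treated, control):
    """Cohen's d-style standardized difference for one baseline covariate."""
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    return (treated.mean() - control.mean()) / pooled_sd

rng = np.random.default_rng(1)
# Simulated incoming GPAs for URE participants and a comparison group.
ure_gpa = rng.normal(3.4, 0.4, size=200)
non_ure_gpa = rng.normal(3.1, 0.4, size=200)

smd = standardized_mean_diff(ure_gpa, non_ure_gpa)
if abs(smd) > 0.1:  # common rule of thumb for meaningful imbalance
    print(f"baseline imbalance detected (SMD = {smd:.2f})")
```

A large SMD before the URE begins signals that outcome differences cannot be attributed to the URE without further adjustment.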
High-Quality Measures of Outcomes
Measurement is simply the process by which one observes and records observations as part of a research study. It is important to ensure that the instruments employed to investigate the subject of the research are reliable and accurately capture the construct of interest. Some measures, such as graduation rates, are readily obtained and objectively defined. However, careful consideration needs to be given to selecting appropriate measures of learning gains and/or acquisition of content knowledge and skills by students who do, and those who do not, participate in UREs. The committee’s review of the literature in Chapters 2 through 6 showed that future studies need more rigorous measurement and more-valid indicators. Validation of self-reported information, for example, can be improved by cross-referencing analysis of research products, such as presentations and reports, essay examinations, or other observations of student activities.
Researchers studying URE outcomes often call for assessments that measure a student’s ability to form arguments using evidence from research in the student’s field of study, such as analyses of primary scientific literature (Dasgupta et al., 2014; Gormally et al., 2012; National Research Council, 2007). Although the use of such indicators appears to be rare in the context of UREs, the approach has proven successful in assessing learning in some courses (e.g., Brownell et al., 2014).
Many studies rely on student self-reports to measure constructs such as identity as a STEM professional, interest in and motivation to study STEM, and career plans. Because these constructs are inherently subjective, self-reports are a natural choice, but relying on them poses some challenges. One limitation of self-reports is that students’ responses may be influenced by recent events: a failed experiment, an unpleasant interaction with a collaborator, or an unexpectedly high grade. Self-report measures may mean different things to different students, depending on their perspectives and experiences. Students from different parts of the country or different parts of the world may not choose the same words to describe similar experiences. Students who have never met an engineer, for example, may respond differently from those whose family friends include engineers. Finally, self-report measures can be influenced by situational factors such as the expectations of the person administering the test or interview or the feeling that it is socially desirable to express interest in STEM. With these caveats noted, there are existing self-report measures that have been shown to be reliable across time, predictive of long-term persistence, and valid.
To establish the validity of new self-report measures, researchers can use multiple indicators to ensure that the intended construct is accurately measured. Promising indicators include observations of participation in experiments; logs of student activities on a project; analysis of transcripts;
analysis of journals that capture responses across weeks or months; and interviews that probe for individual characteristics such as perspectives on prior STEM activities, personal details (such as anecdotes about mentors), and confusions or conundrums about their possible futures. Additional cross-validating indicators are perceptions of peers, instructors, or advisors. Researchers can strengthen the evidence base for self-report measures by using one or more of these indicators, along with the self-report measure, to form an input construct. Moreover, by following students longitudinally, researchers can see how well their chosen indicators predict future decisions and career paths.
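When several indicators are combined into a single construct such as STEM identity, a standard first check is internal consistency. The sketch below is illustrative (simulated students, a single assumed latent construct) and shows Cronbach's alpha, one common reliability statistic; it is offered as an example of the kind of cross-validation discussed above, not as the committee's prescribed method.

```python
# Illustrative internal-consistency check (Cronbach's alpha) for a
# multi-indicator construct, using simulated survey data.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x indicators matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
# Simulate 300 students; four indicators driven by one latent construct.
latent = rng.normal(size=(300, 1))
items = latent + 0.7 * rng.normal(size=(300, 4))

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")  # high alpha -> indicators cohere
```

A high alpha indicates that the indicators move together, which supports (but does not by itself establish) that they tap a single underlying construct.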
In order to characterize, assess, and compare student learning in different laboratory contexts (that use a wide variety of discipline-specific research questions and experimental methods), researchers need to identify appropriate measurement instruments. A recent paper (Shortlidge and Brownell, 2016) provides a table of possible assessment tools for CURE instructors; some of these tools will also be useful for running other types of URE programs. In some cases instruments will need to be generalizable across different fields and scalable for use with a large number of students. Possible areas for development of such instruments include poster presentations or similar reports, notebooks and journals, responses to a challenge requiring data analysis, and other measures of STEM-specific activities. For example, recent efforts to develop rubrics to assess undergraduate writing across courses offer promise (Timmerman et al., 2011).
Establishing causal findings requires analytic strategies that can rule out alternative explanations for impacts of UREs. Causal questions related to learning outcomes could include the following: Did URE participation increase STEM literacy? Did URE participation alter the ability to navigate uncertainty or professional STEM efficacy? Causal questions about longer-term career pathways might include: Did a specific URE help to sustain a student’s interest in STEM along the path the student was already on? Did it support the student or enable her to change in some way that she would not have changed absent the experience? Did the effect vary depending on the student’s expectations or specific experiences in the laboratory?
To answer these central questions regarding the gains from URE participation in learning and persistence, studies need rigorous comparisons to alternatives and may require nuanced analysis of (multiple) outcomes. Thoughtful attention to the organization of the study before the implementation of new UREs would allow for robust conclusions to be made about UREs and how they work.
Experimental and Quasi-Experimental Designs
Whenever possible, the use of experimental designs is recommended, as these approaches may be particularly useful for those seeking to document the added value of UREs. Randomization is possible in instances of excess demand (e.g., by using a lottery). Scholars at the University of Michigan successfully employed this approach to study the causal effect of Michigan’s Undergraduate Research Opportunity Program (Nagda et al., 1998). This approach requires that demand exceed supply; another study attempting to employ this design was unsuccessful because too few students signed up for the course (Brownell et al., 2012). However, more students enrolled the next year, and randomized assignment became possible (Brownell et al., 2013). Randomization is also possible when students agree to be randomized into experimental and control classes and/or when balancing across sections/groups is feasible (Schultz, 2004). When programs have small numbers of students, studies sometimes attempt to cluster data across sites in a consortium (Reardon, 2013).
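The lottery mechanism is simple to implement. The sketch below is a minimal, hypothetical illustration (student names and the slot count are invented): applicants are shuffled, the first n fill the available URE slots, and the remainder form a natural randomized comparison group.

```python
# Minimal sketch of lottery-based assignment when demand for URE slots
# exceeds supply; applicants not drawn serve as the comparison group.
import random

def lottery(applicants: list[str], n_slots: int, seed: int = 0):
    """Randomly assign applicants to URE slots; the rest form the control group."""
    rng = random.Random(seed)
    shuffled = applicants[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled[:n_slots], shuffled[n_slots:]

# Hypothetical applicant pool: 120 students competing for 40 slots.
applicants = [f"student_{i}" for i in range(120)]
treated, control = lottery(applicants, n_slots=40)
print(len(treated), len(control))  # 40 80
```

Recording the seed (or the full assignment) preserves an audit trail showing that assignment was genuinely random.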
Quasi-experimental designs can provide causal evidence in the absence of RCTs. Although the RCT is a gold standard in many research fields for establishing causal evidence for efficacy of a particular intervention (e.g., pharmaceutical clinical trials), the use of RCTs in educational research is often limited by practical, political, and ethical constraints. Absent successful random assignment, researchers can pursue a number of quasi-experimental approaches to establish that subjects (students) experiencing different treatments (courses/experiences) are on average the same and that prior to treatment, nothing about either the subjects or the treatments predicted who would end up in what treatment. For example, one approach might be to match students in the treatment pool with students in the nontreatment pool on relevant variables (e.g., preparation, ethnicity). Any quasi-experimental solution to the problem of group comparability, however, requires an additional set of assumptions. For example, as an initial step, researchers might statistically adjust (or control) for students’ high school grades and SAT or ACT math scores or for student performance on a pre-test measure of achievement and assume that conditional on these pre-existing differences, students were more or less the same on average.
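The statistical-adjustment step can be sketched with simulated data. In the hedged example below, selection into the URE depends only on an observed pre-test score (the "selection on observables" assumption noted above), so regressing the outcome on both the treatment indicator and the covariate recovers a treatment effect close to the truth; all coefficients and data are invented for illustration.

```python
# Hedged sketch of covariate adjustment under selection on observables:
# OLS with the observed pre-test as a control recovers the URE effect.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
pretest = rng.normal(size=n)  # observed covariate, e.g. standardized math score
# Students with higher pre-tests are more likely to join the URE.
treated = rng.random(n) < 1 / (1 + np.exp(-pretest))
# True URE effect is 0.3; the pre-test also predicts the outcome.
outcome = 0.3 * treated + 0.6 * pretest + rng.normal(size=n)

# Naive comparison is biased because treated students had higher pre-tests.
naive = outcome[treated].mean() - outcome[~treated].mean()

# OLS adjustment: outcome ~ intercept + treated + pretest.
X = np.column_stack([np.ones(n), treated, pretest])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f} (true effect 0.30)")
```

The adjustment succeeds here only because the simulation builds in the identifying assumption; with real students, unobserved differences (e.g., motivation) can survive any set of measured controls, which is why the text stresses that quasi-experimental solutions rest on additional assumptions.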
Both experimental and quasi-experimental designs benefit from planning for assessment of longitudinal effects. Panel attrition can undermine the validity of panel studies to the extent that those who persist in a study are different from those who drop out. Gaining consent from research subjects at the beginning of the study to link to their administrative records (e.g., grades, final major, degree attainment) is critically important. Use of such records can help to minimize the harm done by sample
attrition. In addition, there is evidence that the technique of tailored panel management can help retain panels with a higher response rate (Estrada et al., 2014).
Research experiences occur in complex contexts and often have differential impacts on students due to the students’ prior experiences and expectations. Box 7-1 provides some examples. These factors may undermine the utility of large-scale comparative studies for course developers. A promising alternative is research that has value for the developers of the innovations and also has the potential to reveal mechanisms of use to others. Instructors/directors of UREs who engage in evidence-based practices can study their own programs to identify features and elements for improvement. Courses and programs can then be improved via iterative refinement. Research comparing successive versions of a course can shed light on the impacts of the improvements (Cobb et al., 2003).
Design-based research provides a methodology, common among researchers in learning sciences, wherein interventions are conceptualized and implemented iteratively in a natural setting to test a hypothesis (Barab and Squire, 2004). The methodology applied to education can effectively capture the effect of an innovation in a complex, local system (Johri and Olds, 2011). Design-based research may result in plausible causal accounts, assist in the identification of contextual factors and mechanisms that alter program impacts, and deepen the understanding of the nature of the intervention/feature. Iterative cycles of development, implementation, and study allow researchers to gauge how an intervention is or is not succeeding in ways that may then inform an improved approach (Barab and Squire, 2004). In all such studies, the researcher (or program director) will need to
obtain Institutional Review Board approval and the informed consent of participating students, prior to the start of the study.
Other Considerations in Research About UREs
Mixed methods approaches integrate quantitative and qualitative research. For example, qualitative data can inform an RCT. A well-constructed mixed methods study might include collecting quantitative measures of student learning outcomes (e.g., surveys or tests such as the Force Concept Inventory or ETS’s Major Field Test for Physics) and qualitative evidence from observations, interviews with participants, and collection of artifacts (e.g., reports, lab notebooks, presentations). The combination of these data can uncover “links between theory and empirical findings, challenge theoretical assumptions and develop new theory” (Östlund et al., 2011). Because social phenomena are very complex, mixed methods designs can help to elucidate critical factors in the phenomenon of interest (Creswell et al., 2003; Greene et al., 1989). Mixed methods designs should be considered when planning studies aimed at understanding the roles and impacts of various features on the outcomes of UREs.
Longitudinal studies provide the opportunity to track students from entrance into a URE to completion of the experience and beyond. These studies provide additional insight into the impact of UREs and may identify the impact of participation in UREs on student persistence, completion of STEM degrees, enrollment in graduate school, entrance into the STEM workforce, participation in the STEM community through publication or presentation, or other career or educational outcomes. Mixed methods experimental or quasi-experimental approaches should be used that account for the influence of students’ incoming interest, motivation, expectations, and academic background on student outcomes.
Longitudinal studies will require researchers to document the number and types of UREs that students participate in, the characteristics of those UREs, and the duration and timing of the UREs within the students’ educational experience. Longitudinal studies measuring the development of students’ knowledge and skills, such as scientific thinking or experimental design abilities, argumentation skills, STEM communication abilities, or problem-solving skills, from participation in UREs would also be valuable, after valid, reliable assessments of these outcomes are developed. These longitudinal studies are not trivial tasks but are necessary to fully understand the way UREs impact career choices.
To strengthen UREs, the committee has identified a series of high-priority study areas that merit careful consideration by URE program directors, education researchers, faculty, and funding agencies. More general recommendations about the use of UREs in undergraduate education are presented in Chapter 9; because a recommendation on the importance of conducting quality research about UREs takes a bigger-picture view, it appears with the general recommendations of the report rather than in this chapter.
In order to meet the call for expanded research tools and active research to study the impacts of UREs, funding agencies that typically support UREs will need to examine their research portfolios and priorities, as well as funding practices (such as length of grants, which can affect the ability to carry out longitudinal studies). Well-designed summative cross-site external evaluations and studies of URE programs and their features are of potential value to the nation’s students and the national STEM education community. Optimally, the studies outlined below would be conducted by teams composed of members with strengths in the design and analysis of behavioral science and educational research, members with strengths in URE program implementation, and members who are STEM practitioners. This type of research should not be expected of every faculty member who runs a URE, but the community should work together to ensure that these questions are addressed and the results disseminated to the community in order to inform future UREs. As is always the case, studies should be designed in ways that respect the needs of students, and any necessary Institutional Review Board approval should be established before studies begin.
RESEARCH RECOMMENDATION 1 Researchers should develop and validate tools that can be readily used by people who direct undergraduate research experiences to assess student outcomes. Assessment should address both conceptual knowledge and development of skills important to STEM professionals. Some of these tools will be useful to those studying UREs in many different disciplines, whereas others will focus on concepts and content of a particular discipline.
Formative assessment by research mentors, program directors, and instructors can be used to monitor student development and achievement through a URE and to make appropriate adjustments along the way. If researchers are able to develop validated, theoretically informed tools, such tools could be used by faculty running UREs to better assess the impact of UREs on students and to identify the most influential and beneficial factors in UREs. Tools intended to assess content knowledge need to be
developed with input from subject matter experts. Potential tools would include scoring rubrics for posters, presentations, or laboratory notebooks. Instruments need to be made broadly accessible to leaders and developers of undergraduate research programs.
Tools need to be reliable and valid for various types of UREs and populations of students. For example, validated measures of student growth in knowledge and skills should work similarly for men and women, for students from historically underrepresented racial/ethnic groups, and for those who are not part of those groups. This uniformity is important for determining the broad impact of UREs across student populations. It may entail developing tools that are readily customized to the discipline, student population, or research experience goals. Tools need to be in a form readily used by program directors without social science training, and they must be relatively inexpensive to score. Research is needed to develop valid and reliable measures of important outcomes of UREs in order to allow for comparisons between UREs and other types of experiences, such as typical courses.
RESEARCH RECOMMENDATION 2 Future studies should seek to identify and measure the variables that explain why specific aspects of UREs have impact (or not) on the students participating in a URE. Researchers should consider a range of student outcomes (e.g., improved persistence, development of STEM identity, understanding of the nature of research, and development of specific skills or disciplinary knowledge). The number of UREs that a student participates in, the duration of the experience, and the timing of those experiences within the student’s undergraduate education should also be examined.
Proponents of UREs believe that they have an impact on student trajectories that is superior to that of traditional courses of instruction. While the available evidence is consistent with these beliefs, few studies have been sufficiently rigorous to offer a strong test of them. For example, does participation in a URE impact performance in future upper-division courses? Evaluation of how UREs enhance student outcomes when compared to other experiences is needed and can be informed by research on inquiry instruction and identity processes. (Further information about these approaches can be found in Furtak et al., 2012, and Nasir and Cooks, 2009.) Specific objectives of UREs may include improvements in students’ understanding of the nature of STEM, of the process of research and associated skills, or of scientific and technical communication. Other objectives may include skill development for career preparation, collaboration, and teamwork.
Researchers should characterize the type of value the URE will add for
those who participate and document the mechanisms that enable the value to be delivered. The evidence required will come from comparing UREs to other experiences and other learning approaches, including traditional courses. Research on UREs needs to take into consideration the duration of these activities, as well as their variety and goals. Many students participate in multiple UREs, so studies that compare the presence or absence of a URE in a student’s education may not adequately reflect today’s environment. Studies should attempt to identify the value added by the different types of experiences, including the importance of the scheduling/timing of the experience in a student’s educational progression and pathway, how the nature and characteristics of the URE affect the student, and the role that research experience(s) play in contributing to student outcomes.
To make conclusions about a particular outcome, multiple measures are needed. These measures may include self-report on some psychosocial measures (potentially including efficacy, identity, values, belongingness, stereotype threat, micro-aggression, and micro-affirmations), analysis of research products, and documentation of research experiences (potentially including type of URE, timing and duration of the experience, type of mentoring, opportunity for autonomous investigation and decision making, and development of research techniques). Not all of these measures would be relevant to every study, but a combination of measures would likely be required for each study.
Beyond measuring the impact of UREs on learning and student retention, studies should be undertaken that seek to answer the question of why these programs have (or have not) achieved successful outcomes. Results that explain “why” have the potential to advance theory in both educational and behavioral sciences. Further, these sorts of results inform science educators about how to refine and increase the effectiveness of their programs. For example, if UREs result in the development of a professional identity and it is found that URE students who develop a professional identity are more likely to go to graduate school in a STEM field, then educators might actively foster activities that help students grow their professional identity. Research that seeks to measure the “why” will benefit from large numbers of study participants, longitudinal data collection methods, and measurement (both self-report and objective measures) of URE experience, as well as measures of STEM career engagement. Because these types of studies are expensive and time consuming, there should be no expectation that all faculty who run UREs would conduct research meeting these requirements as a matter of course. Such studies should be carefully designed by teams of researchers with appropriate training in the relevant skills. A small number of well-designed and carefully executed studies will be of greater value than a large number of partial studies.
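One quantitative form the "why" question can take is a mediation analysis. The sketch below is entirely illustrative: it simulates the professional-identity example with invented effect sizes and uses a simple product-of-coefficients decomposition, which assumes linear models and a randomized URE; real studies would need stronger designs and sensitivity checks.

```python
# Illustrative (simulated) mediation analysis: does the URE's effect on
# graduate-school intent run through professional identity?
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
ure = rng.integers(0, 2, size=n)  # randomized URE participation
# Mediator: professional identity, partly shaped by the URE (assumed 0.5).
identity = 0.5 * ure + rng.normal(size=n)
# Outcome: grad-school intent, driven by identity (0.4) plus a small
# direct URE effect (0.1); all effect sizes are made up.
intent = 0.4 * identity + 0.1 * ure + rng.normal(size=n)

def ols(y, predictors):
    """Least-squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(identity, [ure])[1]              # URE -> identity
coefs = ols(intent, [identity, ure])
b, direct = coefs[1], coefs[2]           # identity -> intent; direct URE effect
print(f"indirect effect (a*b): {a * b:.2f}, direct effect: {direct:.2f}")
```

A sizable indirect effect relative to the direct effect would suggest that fostering professional identity is a mechanism worth targeting, which is the kind of actionable "why" finding the paragraph above describes.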
RESEARCH RECOMMENDATION 3 Future studies should systematically analyze the impact that various characteristics of UREs have on different student populations, to better identify what works for whom and under what conditions.
Descriptive research suggests that individual responses to UREs may vary depending on a student’s prior experience and academic preparation, the student’s sense of belonging to the STEM enterprise, URE goals, the timing and duration of the experience, and other factors. There is little empirical evidence showing which student characteristics moderate the effects of UREs. The sheer number of possible variables makes it impossible to investigate how all possible combinations of student cultural and experiential characteristics fare in each of the variations in UREs. Research in this area needs to be informed by prior research, theoretical frameworks, and policy priorities. For example, data on student participation could be used to analyze demographics of the participants to better understand access issues relating to barriers to participation, disciplinary differences, trends in engaging underclassmen, and information on students participating in more than one opportunity.
For this research question, it would be valuable to collect participant demographic information (race/ethnicity, age, generation, and socioeconomic status) in combination with URE characteristics (see conceptual framework) and to conduct carefully designed comparisons between specific UREs. For example, a study comparing mentoring practices could examine possible interactions of those practices with cultural or experiential background characteristics of the protégé and mentor. Such studies might identify possible mentoring mechanisms that could be recommended for broad implementation. It is possible that even with such findings, instructors will need to customize the mentoring mechanism to the characteristics of their protégés/mentees.
A major research priority is to understand the critical factors that contribute to the success of diverse groups engaged in UREs. For example, longitudinal research on the role and impact that mentors have on the persistence of diverse groups in STEM fields could help shape mentor-mentee interactions (see Research Recommendation 5). Any research design needs to pay attention to how theoretically derived factors associated with student persistence, including self-efficacy, science identity, and values, vary as a function of gender, racial/ethnic group membership, and their intersection (e.g., Byars-Winston et al., 2016).
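To make the idea of a moderation analysis concrete, the sketch below computes a simple difference-in-differences of STEM persistence rates for URE participants versus nonparticipants, split by a single student characteristic. All numbers and variable names here are invented for illustration; a real study would use longitudinal student records and a regression model with interaction terms rather than a two-by-two table.

```python
# Hypothetical sketch: does a student characteristic moderate the URE effect?
# All rates below are invented for illustration only.

# Persistence rates (fraction persisting in a STEM major), by group:
# key is (participated_in_URE, first_generation_student)
rates = {
    (True,  True):  0.78,
    (True,  False): 0.80,
    (False, True):  0.55,
    (False, False): 0.70,
}

def ure_effect(first_gen: bool) -> float:
    """Difference in persistence between URE and non-URE students
    within one subgroup."""
    return rates[(True, first_gen)] - rates[(False, first_gen)]

# The interaction (difference-in-differences): if nonzero, the URE
# effect differs by first-generation status, i.e., the characteristic
# moderates the effect rather than the effect being uniform.
interaction = ure_effect(True) - ure_effect(False)
print(f"URE effect (first-generation students): {ure_effect(True):.2f}")
print(f"URE effect (continuing-generation):     {ure_effect(False):.2f}")
print(f"Interaction (moderation):               {interaction:.2f}")
```

In practice, studies of the intersectional effects discussed above would extend this to multiple characteristics at once (e.g., gender by race/ethnicity), with appropriate statistical models and sample sizes.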
RESEARCH RECOMMENDATION 4 Researchers should systematically study the impact of a URE’s characteristics on faculty and other mentors, to better understand the diversity of benefits that faculty and mentors obtain.
While an evidentiary foundation for the causal effects of UREs on students is only now beginning to be established, a comparable foundation for effects on faculty and mentors is almost nonexistent. Hypotheses have been offered that UREs can increase or decrease faculty productivity depending on the circumstances at the institution, the structure of the URE, and the particular students involved. The value placed on UREs, and on teaching in general, on a particular campus may shape the incentives and rewards that influence faculty decisions regarding UREs.
Although there is a long tradition of mentoring in STEM education, there is limited empirical evidence to explain the specific ways that mentoring affects URE students (Pfund, 2016). More methodologically rigorous studies of mentoring are needed. The research community lacks a refined set of common variables; a first step would be for the field to define a set of common input and output variables, which would improve the chances of generating reproducible results when investigating mentoring.
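One hypothetical form such a shared variable set could take is a structured record agreed upon across studies, so that every site measures and reports the same inputs and outcomes. Every field name below is invented for illustration; the field itself would need to agree on the actual constructs and validated instruments.

```python
from dataclasses import dataclass

# Hypothetical sketch of a common variable set for mentoring studies.
# Field names are illustrative only, not an established standard.

@dataclass
class MentoringStudyRecord:
    # Common input variables
    mentor_training_hours: float        # formal mentor training received
    meeting_frequency_per_month: float  # mentor-mentee contact frequency
    mentee_prior_research_terms: int    # prior research experience (terms)
    # Common output variables
    science_identity_score: float       # from a validated survey instrument
    persisted_in_stem: bool             # enrolled in STEM the following year

def complete(record: MentoringStudyRecord) -> bool:
    """Check that a record carries every agreed-upon common variable."""
    return all(getattr(record, name) is not None
               for name in record.__dataclass_fields__)

example = MentoringStudyRecord(
    mentor_training_hours=8.0,
    meeting_frequency_per_month=4.0,
    mentee_prior_research_terms=1,
    science_identity_score=3.7,
    persisted_in_stem=True,
)
```

A shared schema of this kind is what would allow results from separate single-institution mentoring studies to be pooled and compared.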
RESEARCH RECOMMENDATION 5 Additional research should examine the specific role(s) of the mentor and the impact of the mentoring relationship on the undergraduate mentee, compared to the immersive URE itself.
Using theoretical models to understand the mechanisms contributing to persistence is one promising approach for providing insights into how and why mentoring relationships contribute to success (Byars-Winston et al., 2015; Estrada et al., 2011; Hurtado et al., 2009; Packard, 2016; Pfund, 2016). Mentoring has been proposed as a critical factor affecting the persistence of STEM students, and it offers a potential target for further investigation. Good mentoring is potentially a key way to provide an intervention that benefits students.
Research is also needed to uncover the mechanisms by which mentoring relationships foster particular outcomes and how these outcomes may differ, based on the mentoring model or student population. Potentially relevant factors include persistence, engagement in or commitment to the discipline, belonging, and educational and career decision making.
Progress on these research questions will require financial support. The results will increase knowledge of the ways that UREs affect students and provide guidance for design of future UREs that may have a more significant impact on students. Teams of researchers with strengths in the design
and analysis of experimental and quasi-experimental educational research, as well as those with strengths in URE program implementation working in concert with STEM researchers, may be needed to make progress on the research agenda identified here. Funding agencies may want to coordinate and/or pool their efforts in this regard to achieve maximum return per dollar spent.
Well-designed summative cross-site external evaluations and research on URE programs and their design features are of potential value to the nation’s students and the national STEM education community. Using rigorous research approaches to study UREs will cost more than small-scale, outcome-centric evaluations, but it is important that some in-depth research studies be conducted.
In addition to considering research about UREs, funding agencies may want to assemble guidelines for effective assessments of funded programs that are not part of a research study. These guidelines might suggest some key elements to consider when designing and choosing assessments. Alternatively, funding agencies could focus some resources on development of an overall assessment unit that all funded projects must use. The limitation of the second approach is that funding agencies will want to allow for some flexibility, so that at least part of the assessment can take into account the specifics of the URE under study, in terms of its structure, setting, organization, and population of students served. Nonetheless, a shared rubric can enable a study encompassing a larger number of students and provide greater opportunity to discern which differences between implementations matter. Many prior studies of UREs have been conducted at a single institution; multisite studies would enhance the understanding of URE programs, their characteristics, and their outcomes in different institutional contexts and for various populations of students.
Institutions of higher education are looking for effective methods to maximize educational impact on students while minimizing cost, during a time when information systems and technology are changing rapidly. Careful, well-designed research has the potential to illuminate mechanisms that could help designers make informed decisions. As discussed above, three areas of research are needed. First, research that measures outcomes and tracks types of URE engagement would be very useful. For example, research is currently needed on the components of apprentice-style UREs, how they differ from the components of course-based undergraduate research experiences (CUREs) and other types of UREs, and comparative outcomes. Second, research is needed to assess how the same URE affects students differently because of their prior experiences, expectations, cultural commitments, and stage in their education. Third, there is a need to evaluate why a given URE has the outcomes it does. Researchers need to be clear about which outcomes they are studying, and they need to draw on previous knowledge on the topic as well as evidence from discipline-based education research and from studies on topics such as retention and persistence. Multidisciplinary teams that bridge the expertise of education researchers, STEM educators, social scientists, natural scientists, and engineers are critical to conducting this research.
Whether the goal is to evaluate an existing program or to modify a program to better achieve a particular student outcome, funders, program administrators, and faculty need to keep in mind the importance of rigorous methodological design and of identifying the specific questions of interest. This may include validating existing tools and/or developing better tools before more causal questions can be addressed. Moreover, the state of the existing evidence may suggest that additional descriptive studies are needed before a theory or model can be developed that identifies potential mechanisms for further investigation in that setting.
REFERENCES

Barab, S., and Squire, K. (2004). Design-based research: Putting a stake in the ground. Journal of the Learning Sciences, 13(1), 1-14.
Blockus, L. (2016). Strengthening Research Experiences for Undergraduate STEM Students: The Co-Curricular Model of the Research Experience. Paper commissioned for the Committee on Strengthening Research Experiences for Undergraduate STEM Students. Board on Science Education, Division of Behavioral and Social Sciences and Education. Board on Life Sciences, Division of Earth and Life Studies. National Academies of Sciences, Engineering, and Medicine. Available: http://nas.edu/STEM_Undergraduate_Research_Apprentice.
Brownell, S.E., Kloser, M.J., Fukami, T., and Shavelson, R. (2012). Undergraduate biology lab courses: Comparing the impact of traditionally based “cookbook” and authentic research-based courses on student lab experiences. Journal of College Science Teaching, 41(4), 36-45.
Brownell, S.E., Price, J.V., and Steinman, L. (2013). A writing-intensive course improves biology undergraduates’ perception and confidence of their abilities to read scientific literature and communicate science. Advances in Physiology Education, 37(1), 70-79.
Brownell, S.E., Wenderoth, M.P., Theobald, R., Okoroafor, N., Koval, M., Freeman, S., Walcher-Chevillet, C.L., and Crowe, A.J. (2014). How students think about experimental design: Novel conceptions revealed by in-class activities. BioScience, 64(2), 125-137.
Byars-Winston, A.M., Branchaw, J., Pfund, C., Leverett, P., and Newton, J. (2015). Culturally diverse undergraduate researchers’ academic outcomes and perceptions of their research mentoring relationships. International Journal of Science Education, 37, 2533-2554.
Byars-Winston, A., Rogers, J., Branchaw, J.L., Pribbenow, C.M., Hanke, R., and Pfund, C. (2016). New measures assessing predictors of academic persistence for historically underrepresented racial/ethnic undergraduates in STEM fields. CBE–Life Sciences Education, 15(3), ar32. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008879/pdf/ar32.pdf [February 2017].
Cobb, P., Confrey, J., diSessa, A., Lehrer, R., and Schauble, L. (2003). Design experiments in education research. Educational Researcher, 32(1), 9-13.
Creswell, J.W., Plano Clark, V.L., Gutmann, M.L., and Hanson, W.E. (2003). Advanced mixed methods research designs. Pp. 209-240 in A. Tashakkori and C. Teddlie (Eds.), Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: Sage Publications, Inc.
Dasgupta, A.P., Anderson, T.R., and Pelaez, N. (2014). Development and validation of a rubric for diagnosing students’ experimental design knowledge and difficulties. CBE–Life Sciences Education, 13(2), 265-284.
Dolan, E. (2016). Course-Based Undergraduate Research Experiences: Current Knowledge and Future Directions. Paper commissioned for the Committee on Strengthening Research Experiences for Undergraduate STEM Students. Board on Science Education, Division of Behavioral and Social Sciences and Education. Board on Life Sciences, Division of Earth and Life Studies. National Academies of Sciences, Engineering, and Medicine. Available: http://nas.edu/STEM_Undergraduate_Research_CURE [February 2016].
Estrada, M., Woodcock, A., Hernandez, P.R., and Schultz, P. (2011). Toward a model of social influence that explains minority student integration into the scientific community. Journal of Educational Psychology, 103, 206-222.
Estrada, M., Woodcock, A., and Schultz, P. (2014). Tailored panel management: A theory-based approach to building and maintaining participant commitment to a longitudinal study. Evaluation Review, 38(1), 3-28.
Furtak, E.M., Seidel, T., Iverson, H., and Briggs, D.C. (2012). Experimental and quasi-experimental studies of inquiry-based science teaching: A meta-analysis. Review of Educational Research, 82(3), 300-329.
Gormally, C., Brickman, P., and Lutz, M. (2012). Developing a Test of Scientific Literacy Skills (TOSLS): Measuring undergraduates’ evaluation of scientific information and arguments. CBE–Life Sciences Education, 11(4), 364-377.
Greene, J.C., Caracelli, V.J., and Graham, W.F. (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255-274.
Hathaway, R.S., Nagda, B.A., and Gregerman, S.R. (2002). The relationship of undergraduate research participation to graduate and professional education pursuit: An empirical study. Journal of College Student Development, 43, 614-631. Available: http://www.eric.ed.gov/ERICWebPortal/detail?accno=EJ653327 [February 2017].
Hunter, A.B., Laursen, S.L., and Seymour, E. (2007). Becoming a scientist: The role of undergraduate research in cognitive, personal and professional development. Science Education, 91(1), 36-74.
Hurtado, S., Cabrera, N.L., Lin, M.H., Arellano, L., and Espinosa, L.L. (2009). Diversifying science: Underrepresented student experiences in structured research programs. Research in Higher Education, 50, 189-214.
Johri, A., and Olds, B.M. (2011). Situated engineering learning: Bridging engineering education research and the learning sciences. Journal of Engineering Education, 100(1), 151-185.
Nagda, B.A., Gregerman, S.R., Jonides, J., von Hippel, W., and Lerner, J.S. (1998). Undergraduate student-faculty research partnerships affect student retention. Review of Higher Education, 22, 55-72. Available: http://scholar.harvard.edu/files/jenniferlerner/files/nagda_1998_paper.pdf [February 2017].
Nasir, N.S., and Cooks, J. (2009). Becoming a hurdler: How learning settings afford identities. Anthropology and Education Quarterly, 40(1), 41-61. doi:10.1111/j.1548-1492.2009.01027.x
National Research Council. (2002). Scientific Research in Education. Committee on Scientific Principles for Education Research, Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2007). Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future. Committee on Prospering in the Global Economy of the 21st Century: An Agenda for American Science and Technology. Committee on Science, Engineering, and Public Policy. Washington, DC: The National Academies Press.
Östlund, U., Kidd, L., Wengström, Y., and Rowa-Dewar, N. (2011). Combining qualitative and quantitative research within mixed method research designs: A methodological review. International Journal of Nursing Studies, 48(3), 369-383.
Packard, B. (2016). Successful STEM Mentoring Initiatives for Underrepresented Students: A Research-Based Guide for Faculty and Administrators. Sterling, VA: Stylus.
Pfund, C. (2016). Studying the Role and Impact of Mentoring on Undergraduate Research Experiences. Paper commissioned for the Committee on Strengthening Research Experiences for Undergraduate STEM Students. Board on Science Education, Division of Behavioral and Social Sciences and Education. Board on Life Sciences, Division of Earth and Life Studies. National Academies of Sciences, Engineering, and Medicine. Available: http://nas.edu/STEM_Undergraduate_Research_Mentoring.
Reardon, S.F. (2013). The widening income achievement gap. Educational Leadership, 70(8), 10-16.
Sadler, D.R. (2010). Beyond feedback: Developing student capability in complex appraisal. Assessment and Evaluation in Higher Education, 35(5), 535-550.
Schultz, T.P. (2004). School subsidies for the poor: Evaluating the Mexican Progresa poverty program. Journal of Development Economics, 74, 199-250. Available: http://www.sciencedirect.com/science/article/pii/S0304387803001858 [February 2017].
Seymour, E., Hunter, A.B., Laursen, S.L., and DeAntoni, T. (2004). Establishing the benefits of research experiences for undergraduates in the sciences: First findings from a three-year study. Science Education, 88, 493-534.
Shortlidge, E., and Brownell, S. (2016). How to assess your CURE: A practical guide for instructors of course-based undergraduate research experiences. Journal of Microbiology and Biology Education, 17(3), 399-408.
Timmerman, B.E.C., Strickland, D.C., Johnson, R.L., and Payne, J.R. (2011). Development of a “universal” rubric for assessing undergraduates’ scientific reasoning skills using scientific writing. Assessment and Evaluation in Higher Education, 36(5), 509-547.