How can we know the impact of an educational experience on a student? To answer this question, we first must ask what kinds of student outcomes we expect from that educational experience. If the answer is “a better SAT score,” “a better graduation rate,” or “more students liked the class,” then we can provide a quantitative measure of the impact. Indeed, many national education initiatives in the past two decades have set such quantitative benchmarks as proxies by which schools can measure their effectiveness in promoting student learning.1 But if the answer is “a change in perspective” or “new insights into the human condition,” narrative and qualitative evidence would likely offer more information.
As the committee approached its task to examine “the evidence behind the assertion that educational programs that mutually integrate learning experiences in the humanities and arts with science, technology, engineering, mathematics, and medicine (STEMM) lead to improved educational and career outcomes for undergraduate and graduate students,” committee members found it necessary to first examine the nature and meaning of “evidence,” and how different kinds of evidence (e.g., qualitative, quantitative, narrative, anecdotal observation) inform decision making in real-world contexts. This examination revealed that different stakeholders have different perceptions of what forms of evidence are appropriate and informative in decision making.
1 Goals 2000: Educate America Act, Pub. L. No. 103-227 (1994). No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002). America COMPETES Act of 2007, Pub. L. No. 110-69 (2007).
We found that, just as some believe the only legitimate approach to evidence-based decision making is one that relies on quantitative, randomized, controlled studies, many at the opposite end of the spectrum believe that quantitative approaches are insufficient for evaluating deeply human and social issues because they reduce the complex experiences and behaviors of individuals to numbers and the characteristics of a population. The committee concluded that these perspectives represent the extreme ends of a continuum and that, in most cases, evaluating something as complicated as human learning will require an approach that lies somewhere between strictly qualitative and strictly quantitative methods. The optimal approach to evaluation will vary according to the questions that researchers or course designers seek to answer and the real-world constraints that influence which methodological approaches are possible.
This chapter describes the committee’s consideration of the value and nature of different forms of evidence and the realities of evidence-based decision making in real-world contexts. We discuss the challenges of generalizing the “evidence of improved educational and career outcomes” when different stakeholders interpret positive educational outcomes differently, assess outcomes in different ways, and use different pedagogical structures to approach integrative teaching and learning. We conclude that it is both appropriate and necessary to weigh multiple forms of evidence when evaluating the impact of an educational experience on a student, and that approaches to evaluating the impact of courses and programs that integrate the humanities, arts, and STEMM will necessarily be diverse and should be aligned with the specific learning goals of the course or program in its own institutional context.
As the committee considered the value of multiple forms of evidence in decision making, we found it helpful to consider examples of evidence-based decision making in real-world contexts. A brief examination of real-world, evidence-based decision making demonstrated that, although we often demand longitudinal, controlled, randomized, causal studies, it is not always possible, or even necessary, for researchers to collect this form of evidence for decision making. This is true even when we consider how evidence is gathered in fields traditionally associated with rigorous quantitative methods.
Take the field of medicine, for example. If we study the process by which drugs are brought to market, from initial discovery to final approval by the U.S. Food and Drug Administration (FDA), we learn that not every drug is required to be tested in double-blinded, randomized patient populations with a placebo control.2 In fact, many of the drugs currently on the market, including morphine, penicillin, vitamins, and aspirin, have never undergone FDA testing.3 Medical professionals consider the combination of efficacy and safety of certain compounds so obvious that these compounds have been “grandfathered” into use. Other drug trials cannot adhere to strict experimental conditions because of the nature of certain diseases and conditions. For instance, some chemotherapies cannot be blinded because severe side effects make patients receiving the treatment easily distinguishable from those receiving the control.4 For diseases that are extremely rare, there may be too few patients for control groups, so cross-over studies—in which a single group of patients is alternately treated and then taken off the treatment several times—are sometimes used instead (Delaney and Suissa, 2009).
Importantly, in many instances, clinical trials are performed only after preliminary studies have indicated the probability that clinical trials would succeed. Furthermore, many clinical trials in medicine begin only after an “n of 1” observation suggests to a biomedical researcher that a unique, unexpected, and possibly useful phenomenon has revealed itself. An example is John Fewster’s observation that a farmer who had been infected with cowpox was subsequently protected against smallpox (an example of anecdotal evidence in the form of clinical observation) (Boylston, 2018). In such cases, the n of 1 observation is followed up with further observations that may confirm or disconfirm the initial hypothesis. These observations are followed by experiments—often without controls or measurements—to determine whether the phenomenon can be reproduced in an animal or human subject under laboratory conditions. If such uncontrolled experiments are successful, fully controlled experiments may be performed in animals. If these warrant further study, then a safety trial, without any measurements of efficacy, will be done on a very small cohort of healthy human subjects who cannot benefit from the treatment. If the treatment is safe, then another very small clinical trial will be done on patients with end-stage disease to determine whether the treatment may be efficacious. Only after all these stages of preparation have been completed is a “gold-standard” study (double-blind, randomized, placebo-controlled) performed.
This description of the generation and use of multiple forms of evidence in medicine makes several points relevant to evaluating the impact that integrating the arts and humanities with STEMM has on students in higher education. The first is that evidence, regardless of type, is developed in stages. New discoveries begin with observations that then progress to more formal study. Although it is true that an anecdote offers limited predictive evidence, it is also true that an anecdote may be the first step toward a meaningful discovery. Collecting evidence on the impact of anything, be it a drug or a curricular intervention, is a process that proceeds from uncontrolled observations or interventions through formal qualitative and/or quantitative analyses, which may or may not eventually lead to randomized, controlled trials. To accept only the end point as legitimate evidence would undermine the process by which evidence is gathered and informs decision making. The committee therefore cautions against a one-size-fits-all decision-making process or framework, and instead encourages evaluative practices that incorporate multiple forms of evidence, along with contextualized metrics of quality and value.

2 See https://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm073137.pdf (accessed August 17, 2017).
3 See https://www.fda.gov/Food/IngredientsPackagingLabeling/GRAS/ (accessed August 17, 2017).
4 See https://www.cancer.gov/about-cancer/treatment/research/placebo-clinical-trials (accessed August 17, 2017).
Not all situations lend themselves to randomized, controlled trials. Integration seeks to achieve a wide variety of outcomes—content mastery, confidence, empathy, creativity, communication skills, teamwork, critical thinking, motivation, and lifelong learning attitudes, among others—and not all are equally amenable to quantitative approaches. Even when a quantitative approach would be most informative, a controlled, longitudinal, blinded, randomized study is often not possible given the circumstances. In research on education there are limits to the number of variables that we can control, and even when we have considerable control over certain variables, some aspects of student experience remain beyond our control: some students come to class every morning having eaten breakfast, while others do not; some students have parents who encourage them to study, while others do not; some students are holding down a job and supporting a family, while others do not have these responsibilities. Further, randomly assigning students to curricular “treatment groups” in higher education research is often challenging because course taking, program participation, and major selection are all areas in which students are given the agency to choose whether to be involved. Even when choosing to be involved in something, students often vary in their level or manner of involvement. Although there are methods for dealing with nonrandom assignment, accounting for all the possible variations is an extremely difficult task.
The committee expects that integrated educational experiences will have multiple impacts on students, some of which lend themselves to quantitative or qualitative approaches to data collection and analysis, whereas others defy traditional measures of impact. For example, can the impact of a work of art or a musical performance always be sufficiently described in words or numbers? One could imagine a host of qualitative and quantitative data we could collect in an effort to address such a question,
but essential elements of a transcendent musical experience would still be overlooked in any conventional analysis. We maintain that a combination of evidence will be necessary to demonstrate the effects of integration on students, and such a combination will be more convincing than any one type of evidence. The collection and analysis of evidence regarding integration will be ongoing, cumulative, and multifaceted.
There are unique challenges to the evaluation of integrative educational courses and programs that stem from the fact that the various disciplines generate, evaluate, and disseminate evidence in different ways. Scholarly evidence in the fine arts is different from scholarly evidence in the humanities, and both are distinct from scholarly evidence produced in STEMM fields. Indeed, differences in judgments about what counts as evidence to warrant particular sorts of claims are among the key elements that define and distinguish disciplines. Communities of scholarly practice use different epistemologies and different methods of research (Kuhn, 2012). The artifacts they produce have different purposes and audiences. They evaluate the relevance and quality of evidence differently and consider different types and standards of acceptable evidence in making judgments. This diversity is one of the greatest strengths of interdisciplinary work: the ability and willingness to draw on these rich traditions of data, research, and analysis.
Similarly, integrative teaching and learning can also disrupt the conventions for judging quality. Within particular disciplines, scholars seek evidence of the quality of learning that is valued within that discipline. For instance, while all students might be expected to give nearly identical answers when solving a mathematical equation, an instructor might expect quite different interpretations of the same passage from Hamlet. Yet for both subjects there are better and worse, correct and incorrect answers. The desired and measurable learning outcome in the math course might be the ability to fundamentally understand and logically solve the equation, while in the English course it may be the capacity to engage in a close reading and informed interpretation of a literary text. Understanding the impact of a course that integrates mathematics and English could be confounded by the challenge of developing an assessment tool that can adequately compare two very different approaches to education or discipline-specific conceptions of desirable learning outcomes. This is often further complicated by a lack of appropriate baseline data by which to measure any change in outcome with the introduction of a new pedagogical or curricular approach. The evidence gathered may or may not appear valuable, and evidence may
be applied, understood, or interpreted in diverse ways. Thus judging the quality of scholarly productions that cross disciplines and fields almost invariably involves different conventions for making judgments. These conventions can be difficult to reconcile, making an overall judgment of quality difficult. As a result, the diversity of disciplinary approaches and conventions can be an impediment to innovation in integrative education because such standards may get in the way of developing shared definitions of evidence and quality.
Given the challenges of evaluating interdisciplinary learning, there is scant empirical literature on the career, academic, and personal and interpersonal outcomes of these programs (Borrego et al., 2009; Borrego and Newswander, 2010; Ge et al., 2015; Ghanbari et al., 2014, 2015; Grant and Patterson, 2016), and the existing published research has often used less-than-rigorous methodological approaches. For example, a prominent feature of the existing research literature on integration is that it often lacks appropriate control groups as a design element. This is not unique to studies of integration; education scholars often struggle to design studies that examine students in their natural, nonrandom learning environments (see earlier discussion). Instead, scholars implement the best designs they can to address associations between experiences and outcomes. These designs often, but not always, include the following features: longitudinal approaches, comparison groups, theoretically validated and empirically derived measures of student learning, and samples adequately robust to address the questions asked. Using these design elements as evidence of empirical trustworthiness is a strategy consistent with that offered in the largest and most cited literature synthesis in higher education: How College Affects Students, Volume 3: 21st Century Evidence That Higher Education Works (Mayhew et al., 2016). When possible, the committee considered studies that used many or all of these design elements to understand the relationship between integrated experiences and student learning outcomes. However, the generalizability of the student learning outcomes from integrative courses is limited by the fact that no two integrative courses or programs are exactly the same.
Each course or program features a different syllabus or curriculum, instructor(s), pedagogical approach(es), class of students, and institutional infrastructure.5 Additionally, the same course with the same syllabus taught at multiple institutions will have varied results and effects on student learning. There is tremendous variation in how disciplines are integrated, as well as course-to-course variation in teaching and institution-to-institution variation in practice, to name just a few factors. To use a medical analogy, if we consider integration akin to a drug treatment, then, at a minimum, the kinds of integration studied by the committee amount to thousands of different drug treatments. Even defining one of those treatments can be difficult. For example, what is the treatment in a course that mixes the restoration of a motorcycle with literary analysis of Zen and the Art of Motorcycle Maintenance? Is the entire course the treatment, is it the mixing of the two fields, is it altered or new teaching practices, or something else? A further challenge is that students may manifest the outcomes of an integrative educational experience on a timescale far beyond a semester or other curricular milestone.

5 This is often true of disciplinary curricula as well.
Despite the many challenges to evaluation and assessment, it is possible for faculty and researchers to chart a path toward a meaningful evaluation of an integrative learning experience. The challenge for faculty and scholars who strive to evaluate integrative learning experiences will be to develop frameworks that permit them to evaluate the student learning outcomes they value and hope to provide, rather than those that are easy to measure. This process begins with the faculty member or evaluator asking a series of questions: What are the expected student learning outcomes from this integrative course or program? What is the particular added value to the student experience from this integration? What kinds of methodological approaches will be informative given the learning goals of this integrative education experience? Which methodological approaches will be possible in light of the real-world constraints of evaluating student learning outcomes at my institution? The answers to these questions will necessarily vary according to the instructor, student population served, and specific institutional constraints, even if the same program is implemented across multiple schools in multiple contexts. But the information gained from this introspection can help guide the course or program evaluation appropriately, while also taking into account how access to resources and real-world constraints may shape which research designs are possible and practical. Given the great diversity of integrative approaches in higher education, it is not surprising that a great many diverse learning goals are associated with different courses and programs. Evaluation of these diverse courses and programs will depend on first articulating the expected learning outcomes of the particular course, program, or approach. The practice of “backward design” (described in Box 4-1) could be applied to develop meaningful evaluations of integrative courses and programs. 
Ultimately, the evidence for integrative approaches in higher education will need to be integrative itself. Causal mechanisms revealed through the use of carefully designed evaluations that feature control groups and longitudinal study designs may be the most informative approach for understanding the
impact of one integrative educational experience (and more convincing to one group of stakeholders) (Choy, 2002), whereas the impact of another integrative educational experience might be best understood (and more convincing to another group of stakeholders) through personal narratives of individual success or demonstrations of students’ work, as made possible by e-Portfolios (Gulbahar and Tinmaz, 2006). But to move forward with a research agenda will require proponents of integration rooted in the arts, humanities, and STEMM disciplines to come together with each other and with scholars of higher education research to agree on anticipated learning outcomes of integrative courses and programs and to develop appropriate approaches to assessment. The committee anticipates that the publication and dissemination of this report will stimulate this collaborative process, but we also expect that the sustained engagement of these diverse stakeholders over the long term will be necessary.
Multiple goals and student outcomes will be associated with integration, and these goals and outcomes should influence the approach to evaluating integrative programs. To support those efforts and bolster evidence-based evaluative practices, the committee suggests a research program to generate and collect robust forms of the following types of evidence in evaluating the impact of integrative educational experiences:
- Qualitative, longitudinal testimony from students, teachers, and administrators on the impact of integrative programs and courses
- Quantitative, controlled, longitudinal data on the impact of integration on students’ grades, attitudes, and competencies, retention and graduation rates, college major, employment status, salary, civic engagement, and satisfaction in life and career
- Narrative case studies that offer an in-depth description and analysis of the nature of integrated programs and courses, the learning goals of the students taking the course, and the goals of the professor teaching the course or program
- Detailed descriptions of the curricula and pedagogies of integrative courses and programs
- Portfolios of student work, performances, or exhibitions
Currently, only limited evidence from each of these categories is available. While this is unfortunate, it is not surprising. Decisions about curricular offerings in higher education are rarely made in response to research studies on the impact of specific educational approaches on students. Rather, decisions about curricular offerings are often driven by the interests and limitations of faculty (e.g., lack of time, greater emphasis on research productivity for professional advancement) and the cultural
and administrative structure of the department or university.6 Although we urge faculty and administrators to consider research on student learning and development when they make choices about curricular offerings, we kept the reality of curricular decision making in higher education in mind as we crafted the recommendations we put forth in this report. Though we conclude that the evidence on the impact of integrative approaches in higher education is limited, we do not think it is practical for institutions with an interest in adopting integrative approaches to wait for additional research before they begin supporting, implementing, and evaluating integrative approaches, especially if current discipline-based curricular approaches are not serving their students’ learning goals.
Given that the limited evidence available is promising and suggests positive learning and career outcomes for students, the committee urges a new nationwide effort to develop and fund the research needed to establish appropriate protocols for the collection of the kinds of robust and multifaceted evidence that the broader educational community can accept and embrace. Institutions of higher education that have implemented integrative models, or that plan to do so, might consider working with faculty and higher education researchers to evaluate courses and programs in ways that could address a number of outstanding research questions. Among the questions researchers should consider addressing are the following:
- What are the shared or different hypothetical learning outcomes across similar and different integrative courses and programs?
- How might different levels of integration (e.g., multidisciplinary, interdisciplinary, transdisciplinary) impact students differently?
- How might in-course, within curriculum, and co-curricular integrative experiences impact students differently?
- How does integration between closely related disciplines (e.g., the integration of chemistry and biology) impact students compared with integration across more distantly associated disciplines (e.g., engineering and history)?
- What specific mechanisms contribute to the positive outcomes associated with integration?
- What role does pedagogical approach play relative to subject matter?
- How do students make sense of integrative experiences?
- What are the longitudinal and long-term influences of integration?
- Does the integration of the humanities, arts, and STEMM lead to integrative learning? If so, how is integrative learning revealed in student work products?

6 Lattuca, L. R., and Stark, J. S. (2011). Shaping the college curriculum: Academic plans in context. John Wiley & Sons.
- Can a successful integrative program or course at one college or university be implemented at another such that it results in the same kinds of learning outcomes? To what extent can the positive impacts of integrative efforts be achieved independently of the idiosyncratic professors and administrators who implement such courses and programs?
- To what extent are students choosing or required to take integrative courses? Courses in their major? Courses outside their major? In other words, what does course taking tell us about the extent to which higher education is siloed?