Productivity measurement involves a conceptually simple framework. However, for the case of higher education, complexities are created by a number of factors, the following among them:
- Institutions of higher education are multi-product firms (that is, they produce multiple kinds of services);
- Inputs and outputs of the productive process are heterogeneous, involve nonmarket variables, and are subject to quality variation and temporal change; and
- Measurement is impeded by gaps in needed data.
None of these complexities is unique to higher education, but their severity and number may be.1 In this chapter, we examine each of these complexities because it is essential to be aware of their existence, even while recognizing that practical first steps toward measurement of productivity cannot fully account for them.
1A wise tempering of this assertion is offered in the Carnegie Foundation report (Cooke, 1910:5):
It is usual in the industrial world to find manufacturers and business men who look upon their own undertakings as being essentially different from every other seemingly like undertaking. This could not be otherwise, because every one knows the difficulties of his own work better than those of his neighbor. So I was not surprised to learn that every college feels that it has problems unlike, and of greater difficulty of solution than, those to be encountered at other colleges. As a matter of fact, from the standpoint of organization, uniformity in collegiate management is a much easier problem than it is in most industries, because in any industry which I know about, the individual plants vary considerably more than do the colleges.
The greatest barriers to estimating the output of higher education derive from the fact that most institutions are multi-product firms.2 Large research universities produce undergraduate, professional, and graduate degrees, research (including patents and pharmaceutical development), medical care, public service activities (especially at land grant universities), entertainment (such as cultural and athletic events), and other goods and services from a vector of capital, labor, and other inputs. Community colleges produce remedial education; degree and certificate programs designed for graduates entering directly into careers; academic degree programs that create opportunities for transfer to four-year institutions; and programs designed to meet the needs of the local labor market and specific employers. It is admittedly difficult to develop accounting structures that capture the full value of these outputs, which accrue to both private and public entities.3
Firms and sectors in other areas of the economy produce multiple goods and services as well. An automobile manufacturer, for example, may produce cars, trucks, and airplane parts; a bank may offer loans as well as an automatic teller machine, checking accounts, and a range of other services. While it can be difficult to specify a functional form that represents the technological input-output relationships that exist for multi-product producers, it has been done (Christensen, Jorgenson, and Lau, 1973; Diewert, 1971). The range and nature of outputs produced by higher education, however, make such estimation much more complex than for most other industries.
Though the panel’s recommendations in Chapters 5 and 6 focus on improving measurement of instructional inputs and outputs, research and other scholarly and creative activities should be acknowledged in a comprehensive accounting because they are part of the joint product generated by universities. Among the difficult analytical problems created by joint production are how to separate research and development (R&D) production costs from degree production costs; how to compare the relative value of research and degree output; and how to assign faculty and staff time inputs to each (which raises the problem of separating different kinds of research, whether done at a faculty member’s initiative or with outside sponsorship). Judgments must be made in the process of separating the instructional and noninstructional components of the higher education production function.
2Triplett (2009:9) writes:
Measuring medical care output is difficult. Measuring the output of education is really hard.… The fundamental difficulty in education has little to do with test scores, class sizes and similar attributes that have figured so intensively in the discussion so far, though those measurement problems deserve the attention they are getting. More crucially, the output of educational establishments is difficult to measure because they are multi-product firms. They do not produce only education, they produce other things as well.
3McPherson and Shulenburger (2010) provide an excellent description of the multi-product nature of higher education institutions, plus a sensible first attempt to separate these into educational and other components. On the regional impact of universities, see Lester (2005).
Additionally, the linkage of research and research training coupled with responsibility for baccalaureate and professional education is relevant to the instructional component of output, and a defining and internationally distinctive characteristic of the U.S. system of higher education.4 Statistics on degrees and research activity document the central place of research universities in the generation of advanced degrees in scientific and engineering fields. Research universities—defined here, using the Carnegie Classification of Academic Institutions, as doctorate-granting institutions—are few in number (approximately 283) relative to the total number of U.S. colleges and universities (estimated at 4,200). Nonetheless, they awarded 70 percent of doctorates, 40 percent of master’s degrees, and 36 percent of bachelor’s degrees in science and engineering in 2007.5 The connection between research and graduate instruction in America’s universities is well understood and indeed is a core rationale for their substantial role in the national R&D system.6
While fully appreciating the value and variety of higher education outputs, the panel decided to focus on instruction. This decision entails analytical consequences. Specifically, a productivity measure of instruction can provide only a partial assessment of the sector’s aggregate contributions to national and regional objectives. In particular, the omission of some kinds of research creates a truncated view not only of what colleges and universities do but also of their critical role in national research innovation and postbaccalaureate educational systems. And, just as there should be measures of performance and progress for the instructional capabilities of educational institutions, measures should also be developed for assessing the value of and returns to the nation’s investments in research (especially the publicly funded portion). As outlined in the next chapter, we believe it is useful to assess and track changes in instructional productivity as a separate output.
4In his comparative study of national higher education systems, Burton Clark describes U.S. graduate education as a “tower of strength,” adding: “This advanced tier has made American higher education the world’s leading magnet system, drawing advanced students from around the world who seek high-quality training and attracting faculty who want to work at the forefront of their fields” (Clark, 1995:116). Jonathan Cole (2009) cites a host of inventions that have fundamentally altered the way Americans live, contributed to U.S. economic competitiveness, and raised the U.S. standard of living. He describes the United States as being “blessed with an abundance of first-rate research universities, institutions that are envied around the world,” further calling them “national treasures, the jewels in our nation’s crown, and worthy of our continued and expanded support” (Cole, 2009:x-xi).
5Estimates are from National Science Board (2010:2ff-7ff).
6Less well understood is how the coupling of instruction and research serves to attract high-performing researchers to faculty positions and to then provide incentives to undertake high-risk, frontier research.
The inputs and outputs of higher education display widely varying characteristics. The talents of students and teachers vary, as do their levels of preparedness and effectiveness in teaching and learning. At community colleges, for example, the student mix and to some extent instructor qualifications are typically quite unlike those for four-year research universities. In the composition of a student body, the following characteristics are widely acknowledged to affect educational outcomes and thus the relationship between inputs and outputs:
- Economic inequality and mix of low-income and minority students.7
- Student preparedness. College preparedness affects the efficiency with which graduates can be produced. The link between academic preparation and performance in college is extremely strong (Astin, 1993; Horn and Kojaku, 2001; Martinez and Klopott, 2003).8 Remedial courses (those not part of the required total for graduation) also add to the cost of degree completion.
- Student engagement. Education is a service where the recipient must be an active partner in the process of creating value (“coproduction”).9 Variation in student motivation as well as ability strongly affects the learning process and, therefore, productivity.
- Peer effects. Student interaction affects both higher education outputs and inputs, and is difficult to measure. If the performance of a less prepared student is raised by being surrounded by better prepared students, this enhances learning and is part of the value of the higher education experience.10
The composition of an institution’s student body will influence how that institution will score in a performance metric. If the measure of interest is graduation rates, lower levels of student preparation will likely translate into lower productivity. If the metric is value added or marginal benefit, lower levels of student
7Two perennial policy goals are the promotion of productivity and equity, which, in different situations, can be complementary or conflicting. See Immerwahr et al. (2008) to get a sense of the views of college presidents regarding costs and equity.
8Adelman (1999) found completing high-level mathematics classes such as algebra II, trigonometry, and calculus in high school to be the best single predictor of academic success in college.
9Coproduction, introduced in Chapter 2, is recognized as a defining feature of service operations including education. See, for example, Sampson (2010:112). The complexity introduced by coproduction should be taken into account when developing productivity models. Notice, however, that issues of coproduction arise in the handling of input heterogeneity, and that there is no suggestion that student time should be priced into the productivity formula.
10Zimmerman (2003) shows students’ grades being modestly but significantly affected by living with high, medium, or low SAT score roommates.
preparation may lead to higher measured gains because the learning gap that can be closed is larger.11
In the same vein, faculty characteristics, skills, and sets of responsibilities will affect the quality of outputs produced by an institution. At the simplest level, college faculty can be categorized into two groups: tenure-track faculty and adjunct faculty. Tenure-track faculty are involved in teaching, research, and public service, with the allocation of time to each depending on the type of institution with which they are associated. At research universities, considerable time is directed toward research, while at community colleges efforts are concentrated almost exclusively on teaching courses. Adjunct (nontenure track) faculty at all types of institutions are assigned to teach specific courses and may not have a long-term affiliation with an institution. In the current economic downturn, with universities facing budget cuts, the use of adjunct faculty has become increasingly prominent.12 This situation raises the need for analyses of the quality of instruction adjunct faculty provide. In the section on inputs, below, and again in Chapter 5, we return to the topic of variable student and instructor quality, and its implications for productivity measurement.
On the output side, the mix of degrees by level and subject varies across institutions. These differences affect both the production process and the labor market value of graduates. Institutions serve diverse student communities and pursue very different missions. While all aim to produce better educated and credentialed citizens, some institutions produce two-year degrees, certificates, and students equipped to transfer to four-year schools, while others produce bachelor’s and graduate degrees in a wide range of disciplines. Some of these outputs are inherently more expensive to produce than others. This heterogeneity means that production functions for institutions with different output mixes will display different characteristics.
Adjusting for the distribution of degrees requires data on the course-taking patterns of majors in different fields. The cost of a degree in chemistry, for example, depends on the number of math classes, laboratory classes, and general studies classes that such majors must take and the average cost of each type of class. Regression analyses using data at the state level have been used to produce estimates of the cost of degrees in different majors. In their models, Blose, Porter, and Kokkelenberg (2006) found that carefully adjusting per-student expenditures to account for the distribution of majors and the average costs to produce each
11See Carey (2011) on the relationship between student quality and the cost of obtaining educational outcomes.
12Even in the period before the recession, the trend was well established: According to data from the American Federation of Teachers, in the mid-1970s, adjuncts—both part-timers and full-timers not on a tenure track—represented just over 40 percent of professors; 30 years later, they accounted for nearly 70 percent of professors at colleges and universities, both public and private (see http://www.nytimes.com/2007/11/20/education/20adjunct.html?pagewanted=all [June 2012]).
major improved estimates of the impact of measured instructional expenditures on graduation and persistence rates.
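The course-mix adjustment just described can be illustrated with a minimal sketch: the cost of a degree is computed as a weighted sum of the credits its majors take in each type of class. The course categories, credit counts, and per-credit costs below are hypothetical and are not drawn from Blose, Porter, and Kokkelenberg (2006).

```python
# Sketch: estimating the instructional cost of a degree from the
# course-taking pattern of its majors. All numbers are hypothetical.

# Average cost per credit hour for each type of class (assumed values).
cost_per_credit = {
    "math": 250.0,
    "laboratory": 400.0,
    "general_studies": 180.0,
}

# Credit hours a typical chemistry major takes in each category (assumed).
chemistry_credits = {
    "math": 18,
    "laboratory": 40,
    "general_studies": 62,
}

def degree_cost(credits, cost_per_credit):
    """Weighted sum: credits taken times the average cost per credit."""
    return sum(credits[c] * cost_per_credit[c] for c in credits)

print(degree_cost(chemistry_credits, cost_per_credit))
```

Per-student expenditure figures adjusted this way reflect the actual mix of majors at an institution rather than a single campus-wide average.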
Another approach to determining the cost of degrees in different fields involves working back from the level and field coefficients in the funding models used by various jurisdictions to adjust an institution’s total instructional cost on the basis of its data in the Integrated Postsecondary Education Data System (IPEDS). Data on degrees may not be robust enough for regression analysis, but they ought to be sufficient for this approach based on an assumed set of coefficients. In our data recommendations in Chapter 6, we advise that credit-hour data for productivity analyses be collected in a way that follows students in order to better control for differences in degree level and field. This manner of collecting data, discussed in Chapter 4, would be a major step forward in productivity and cost analysis.
The problem of heterogeneity can be at least partially addressed by cross-classifying institutions that enroll different kinds of students and offer various degree levels and subjects. One cell in the classification might be chemistry Ph.D. programs in research universities, for example, while another might be undergraduate business majors in comprehensive or two-year institutions. While measuring the relation between inputs and outputs for each cell separately would significantly limit variations in the educational production function, and also would help control for differences due to the joint production of research and public service, it would not eliminate the problem.13 Variation in student inputs will still be present to some extent since no institution caters specifically to only one kind of student, and students do not always enroll in school with a definite idea of what their major will be. Students also frequently change majors.
Such a multi-fold classification is most likely impractical for any kind of nationally based productivity measure. Two strategies exist for overcoming this problem. First, certain cells could be combined in the cross-classification by aggregating to the campus level and then creating categories for the standard institutional type classifications used elsewhere in higher education (e.g., research, master’s, bachelor’s, and two-year institutions). In addition to reducing the number of cells, aggregation to the campus level subsumes course-level issues that occur, for example, when engineering majors enroll in English courses. While compiling data at the campus level introduces a significant degree of approximation, this is no worse than would likely occur in many if not most industries elsewhere in the economy. Individual institutions can and should analyze productivity at the level of degree and subject, just as manufacturers should analyze productivity at the level of individual production processes. The techniques required to do so are beyond the panel’s purview.
An alternative is to control for key variations within the productivity model
13Of course, too much disaggregation could also be harmful, by reducing sample size too much to be useful for some analyses, for example.
itself. This might entail tracking the number of degrees and credits separately by level and subject for each institutional category. Cross-classification is carried as far as practical, and formulas are then constructed to control for the variation in key remaining variables. The two approaches are analogous to poverty measures that combine cross-classification and formulaic adjustment to allow for differences in wealth, in-kind benefits, or cost of living. The baseline model described in Chapter 4 employs both of these strategies.
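A minimal sketch of the two combined strategies—cross-classification of institutions by category plus a formulaic weight for remaining variation in degree level—might look as follows. The degree weights and institutional records are illustrative assumptions, not values endorsed by the panel.

```python
# Sketch: cross-classify institutions by type, then apply a formulaic
# weight for remaining variation in degree level. All values are
# hypothetical.

# Assumed weights: advanced degrees are credited more heavily.
degree_weights = {"associate": 1.0, "bachelor": 1.5, "master": 2.0}

institutions = [
    {"name": "A", "type": "two-year", "degrees": {"associate": 800}},
    {"name": "B", "type": "research", "degrees": {"bachelor": 500, "master": 200}},
    {"name": "C", "type": "research", "degrees": {"bachelor": 600, "master": 100}},
]

def weighted_output(inst):
    """Formulaic adjustment: weight degree counts by level."""
    return sum(degree_weights[lvl] * n for lvl, n in inst["degrees"].items())

# Cross-classification: aggregate weighted output within each cell.
cells = {}
for inst in institutions:
    cells[inst["type"]] = cells.get(inst["type"], 0.0) + weighted_output(inst)

print(cells)
```

In practice the cells would follow standard institutional classifications (research, master’s, bachelor’s, two-year) and the weights would be estimated rather than assumed.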
Further complicating the accounting of inputs and outputs of higher education is that some, such as student time and nonpecuniary benefits of schooling, are nonmarket in nature—these factors are not bought and sold, do not have prices, and are not easily monetized. Additionally, not all of the benefits of an educated citizenry accrue to those paying for education. Such characteristics make these factors difficult to measure and, as a result, they are often ignored in productivity analyses. In this sense, higher education (and education in general) is analogous to activities in other areas, such as health care, home production, and volunteerism.14
Policy makers concerned with, say, a state’s returns on its investment in education should be interested in the full private and social benefits generated by their institutions. A truly comprehensive accounting would include the sector’s impact on outcomes related to social capital, crime, population health, and other correlates of education that society values. Much of this social value is intangible and highly variable; for example, social capital creation attributable to higher education may be greater at residential colleges and universities than at commuter colleges due to peer effects. These kinds of nonmarket quality dimensions are no doubt important parts of the production function, although they cannot yet be measured well. The policy implication is that the fullest possible accounting of higher education should be pursued if it is to be used for prioritizing public spending.15
That positive externalities are created by higher education is implicitly acknowledged as college tuition (public and private) is deliberately set below the
14See National Research Council (2005) for difficulties and approaches to measuring nonmarket inputs and outputs in an accounting framework.
15According to Brady et al. (2005), Texas generates $4.00 of economic output per each dollar put into higher education; California generates $3.85 for each dollar invested. The issue of states’ returns on education is complex. Bound et al. (2004) have shown that the amount that states spend on their public education systems is only very weakly related to the share of workers in the state with college degrees. Intuitively, this is because educated labor is mobile and can move to where the jobs are. Because of mobility, some social benefits of higher education accrue to the nation as a whole, not just to individual states. This may create an incentive for states to underinvest in their public higher education systems.
equilibrium market-clearing price, and institutions engage in various forms of rationing and subsidies to manage demand. Financial aid and other forms of cross-subsidization provide mechanisms to increase enrollment. Thus, because the resulting marginal cost does not align with price, total revenues do not equate with the value of an institution’s output; the distribution of revenues across different activities also cannot be used to construct a combined index of output that reflects relative consumer value.16
At the most aggregate level, Jorgenson and Fraumeni (1992) circumvented the problem of defining a measure of output by assuming that the value of education can be equated with the discounted value of students’ future incomes. They assessed the contribution of education to human capital based on the lifetime earning streams of graduates. Without additional assumptions, however, such a measure cannot be related back to a specific educational episode. The focus on outcomes may also subsume the role of education as a sorting mechanism to distinguish individuals of differing abilities, potentially overstating the contribution of education alone.17
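The Jorgenson-Fraumeni idea can be sketched as a simple present-value calculation. The earnings premium, working life, and discount rate below are illustrative assumptions for exposition, not estimates from their work.

```python
# Sketch: valuing education output as the discounted value of graduates'
# future incremental earnings. All parameter values are hypothetical.

def present_value(annual_premium, years, discount_rate):
    """Discounted sum of a constant annual earnings increment."""
    return sum(annual_premium / (1 + discount_rate) ** t
               for t in range(1, years + 1))

# Hypothetical: a $15,000 annual earnings premium over a 40-year
# working life, discounted at 3 percent.
value_of_degree = present_value(15_000, 40, 0.03)
print(round(value_of_degree))
```

Note that the calculation attributes the entire earnings premium to education; as the text observes, part of that premium may reflect sorting rather than learning.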
A fully specified measure of higher education productivity would account for quality changes over time and quality variation across inputs and outputs by individual and institution. However, the kind of sophisticated data and analysis that would be required for accurate and sensitive quality measurement is very much in the long-term research phase. Nonetheless, it is important to conceptualize what is needed in order to make progress in the future.
Many sectors of the economy are characterized by wide variation in output quality. Computers vary in processing speed, reliability, and data storage capacity; luxury cars are built to higher standards than economy models; and the local hardware store may have superior customer service relative to superstores. Quality also changes over time—computers become faster, cars become safer, and power tools become more powerful. What is unique about higher education is the lack of generally accepted measures of quality change or variation. And indeed, consumers may not be aware of the measures that do exist. This reinforces the conclusion of the previous section: variations in the demand for higher education cannot be taken as reflecting quality.
Many aspects of measuring quality change have been explored for other difficult-to-measure service sectors and progress has been made. In its price
16Within the framework of the national accounts, nonmarket activities such as education have been valued on the basis of the cost of their inputs. That approach rules out the possibility of productivity change.
17Spence (1973) developed models to identify these kinds of sorting effects, such as those whereby employers use credentials to identify workers with desirable, but not directly observable, traits.
measurement program, the Bureau of Labor Statistics (BLS) employs a number of methods for separating pure price and quality effects as it monitors products in its market basket over time. Methods for addressing some of the more generic issues (e.g., defining output in service sectors, adjusting for changing product characteristics) may translate to the education case.18 As described in Box 3.1, lessons from work on productivity and accounting in the medical care sector may exhibit the closest parallels to the education sector.19
Quality variations exist for nearly the full range of higher education inputs: students, faculty, staff, library, and physical facilities. Some dimensions of student quality can potentially be adjusted for by using standardized test scores, high school grade point averages (GPAs), parents’ education, socioeconomic status, or other metrics. For comparing institutions, additional adjustments may be made to reflect variation in student population characteristics such as full-time or part-time status, type of degrees pursued, and preparation levels, as well as the differing missions of institutions. Institutions with a high percentage of remedial or disadvantaged students need longer time horizons to bring students to a given level of competency. They are often further burdened by smaller endowments, lower subsidies, and fewer support resources, all of which can lengthen time to degree for their students. Students select into institutions with different missions, according to their objectives. Institutional mission and character of student body should be considered when interpreting graduation rates, cost statistics, or productivity measures as part of a policy analysis.
Measures of student engagement generated from major student surveys such as the National Survey of Student Engagement (NSSE), the Community College Survey of Student Engagement (CCSSE), the Student Experience in the Research University (SERU), and the Cooperative Institutional Research Program (CIRP) can provide additional insight about the experiences of students enrolled in a given institution. This is important because the extent to which students devote effort to educationally purposeful activities is a critical element in the learning process. However, engagement statistics require careful interpretation because—beyond student attributes—they may also reflect actions by an institution and its faculty. For example, effective educational approaches or inspiring teachers can sometimes induce less well-prepared or -motivated students to achieve at higher levels. Thus, measures of student engagement can be instructive in understanding an institution’s capacity to enhance learning. Limitations of student surveys, such
18See National Research Council (2002) for a full description of the statistical techniques developed by BLS, the Bureau of Economic Analysis, and others for adjusting price indexes to reflect quality change.
19For more information, see National Research Council (2005, 2010a).
BOX 3.1 Higher Education and Medical Care
An analogy exists between higher education—specifically the production of individuals with degrees—and health care—specifically the production of completed medical treatments. Lessons for measuring the former can possibly be gleaned from the latter. Nearly all the complications making productivity measurement difficult can be found in both sectors:
- In both cases, additional outputs beyond degrees and treatments are produced.
- Product categories exhibit a wide variety—different kinds of degrees and different kinds of treatments are produced. Some of the products have more value than others, depending on how value is calculated. For example, an engineering degree may generate more income than a philosophy degree, and cardiovascular surgery produces greater health benefits (in terms of quality-adjusted life years, for instance) than does cosmetic surgery.
- Outcomes vary substantially such that some students or patients enjoy more successful outcomes than others. Some students get a great education and find worthwhile employment while others do not; some patients recover fully, while others die.
- Inputs are heterogeneous. Some students are better prepared and therefore enter college with a higher chance of graduation. Some patients are more fit than others and therefore have a greater probability of successful outcomes from medical treatment.
- Institutional missions also vary. Institutions of higher education range from small locally oriented colleges to large universities with national and international influence. Similarly, medical care treatments are administered in a variety of institutions with different missions, ranging from doctors’ offices to small local hospitals to large regional hospitals (which also jointly produce medical students).
as those noted above, are debated by Arum and Roksa (2010), who point out that, while data focused on “social engagement” are important for questions related to student retention and satisfaction outcomes, data on academic engagement are also needed if the goal is improved information about learning and academic performance. While NSSE does include a few questions related to social engagement (e.g., nonacademic interactions outside the classroom with peers), many more questions address areas of academic engagement such as writing, discussing ideas or doing research with faculty, integrative learning activities, and so forth.
In looking at any measure of student characteristics, it must be remembered that between-institution variance is almost always smaller than within-institution
- Varying production technologies are possible in both sectors. Educational institutions vary in their student/faculty ratios, their reliance on graduate student instructors and adjunct faculty, and their use of technology. Similarly, hospitals vary in their doctor-, nurse-, and staff-to-patient ratios, in their reliance on interns and residents, and in their use of technology.
- Pricing and payment schemes also vary in the education and health sectors. Students can pay very different prices for apparently similar English degrees, just as patients can pay very different prices for apparently equivalent medical treatments. Also, in both sectors, payments are made not only by the primary purchaser but by a variety of third-party payers, such as the government or insurance companies. This complicates price estimation and costing exercises.
National Research Council (2010a) provides guidance on how to deal with the complexities associated with estimating inputs, outputs, and prices for medical care. Essentially, outputs are defined so as to reflect completed treatments for which outcomes can be quantified and thus quality of the output adjusted.
For performance assessment purposes, hospital mortality rates have been adjusted to reflect the complexity of the case mix that each deals with. For example, a tertiary care hospital with relatively high numbers of deaths may receive “credit” for the fact that its patients are sicker and hence at a greater likelihood of death. An analogous risk adjustment approach exists for higher education wherein schools that enroll less well-prepared students would be assigned additional points for producing graduates because the job is more difficult. One effect of such an adjustment is that the highly selective schools would be adjusted downward because more of their students are expected to graduate. Regression models have been used to attempt to make these adjustments using institutional resources and student characteristics to estimate relative performance. For example, the ranking system of the U.S. News & World Report takes into account characteristics of both the institution (wealth) and of the students (both entry test scores and socioeconomic background). In this system, SAT results essentially predict the rank order of the top 50 schools.
variance on proxy measures of quality (Kuh, 2003). This is because individual student performance typically varies much more within institutions than average performance does between institutions. This appears to be true at every level of education. Results from the NSSE reveal that for all but 1 of the 14 NSSE scales for both first-year and senior students, less than 10 percent of the total variance in student engagement is between institutions. The remaining variance—in several instances more than 95 percent—exists at the student level within a college or university. Thus, using only an institutional level measure of quality when estimating productivity can be misleading.
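The risk-adjustment approach described in Box 3.1 can be sketched as a one-variable regression in which the residual serves as a measure of relative performance: a school graduating more students than its intake would predict receives "credit" in the same way a tertiary care hospital does for sicker patients. The schools, test scores, and graduation rates below are hypothetical.

```python
# Sketch: risk-adjusting graduation rates. Regress each school's
# graduation rate on entering-student preparation; the residual is
# relative performance. All data are hypothetical.

# (mean entry test score, observed graduation rate) for six schools.
schools = {
    "S1": (1400, 0.92), "S2": (1300, 0.80), "S3": (1150, 0.62),
    "S4": (1100, 0.58), "S5": (1000, 0.50), "S6": (950, 0.40),
}

xs = [x for x, _ in schools.values()]
ys = [y for _, y in schools.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Ordinary least squares slope and intercept.
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

# Residual = observed minus expected: positive means the school
# graduates more students than its intake would predict.
residuals = {name: y - (intercept + slope * x)
             for name, (x, y) in schools.items()}
print(residuals)
```

As the text notes, such an adjustment tends to move highly selective schools downward, since more of their students are expected to graduate in any case.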
Beyond students—the “raw material” for learning20—many of the variations in inputs are no different in principle from those encountered in other industries. However, teacher quality bears special mention. The central role of teachers has long been reflected in the separation of faculty from other kinds of staff in the human resource policies of educational institutions and in government data collection. The distinction between tenure-track and nontenure-track faculty is also common. A recent trend has been to use inexpensive adjunct teachers who receive no benefits and lack job stability. Adjunct teachers may be of the same quality as tenure-track faculty members in terms of their ability to teach the material for a given course, but they tend to be less well integrated into the institution’s departmental structure. This, plus the fact that adjuncts do less (or no) research and do not participate comparably in departmental administration, means that productivity improvements arising from shifts toward greater use of them may be more apparent than real.
Whether adjuncts are better or worse in motivation and ability than tenure-track faculty is an empirical question; indeed, it is these kinds of questions that highlight the need for better productivity measurement. And the answer is likely to differ depending on circumstances: in some settings, research and administrative responsibilities may improve teaching quality; in others, publication pressures may be such that tenure-track faculty are not selected for teaching quality and have strong incentives to neglect their teaching. These possibly significant factors are one reason why the model presented in Chapter 4 includes adjunct usage as an explicit factor.
A survey of the chief financial officers (CFOs) of 500 colleges by The Chronicle of Higher Education revealed their view that the most effective cost-cutting or revenue-raising strategies are to raise teaching loads and increase tuition.21 Another favored strategy is to reallocate tenure-track and adjunct faculty positions, and, as a result, universities and colleges are increasingly scrutinizing faculty productivity. Examples of this are the recent initiatives by the University of Texas and Texas A&M University, which resulted in the release of performance data identifying faculty teaching loads versus the cost of keeping faculty members employed. The basic idea was to examine the number of students taught by an individual faculty member relative to the cost borne by the university in faculty salaries, benefits, and overhead. In the current atmosphere of accountability for public funds, this kind of measure of faculty performance will be used synonymously with faculty productivity, even though William Powers, president of the
20See Rothschild and White (1995) for a discussion of higher education and other services in which customers are inputs.
21See “Economic Conditions in Higher Education: A Survey of College CFOs” at http://chronicle.com/article/Economic-Conditions-in-Higher/128131/ [June 2012].
University of Texas at Austin, said that “there is no attempt to measure the quality, and therefore the true productivity, of the learning experience.”22
Faculty quality is often measured by grades and student evaluations. However, these outcomes can be heavily influenced by external factors, which makes it difficult for institutions to ascertain the contribution of faculty quality to student success. In a controlled study by Carrell and West (2010), U.S. Air Force Academy students were randomly assigned to a permanent or a part-time instructor. Student grades over a course sequence were analyzed to evaluate teacher quality. The study found that students taught by part-time instructors (more specifically, less experienced instructors who did not possess terminal degrees) received better grades in the lower-level course (Calculus I). However, “the pattern reversed” for higher-division courses (e.g., Calculus II), where the same students performed worse relative to those taught by the experienced professors in the introductory course. The study concluded that part-time instructors were more likely to teach in ways that improved students’ test performance in the introductory course, while the permanent instructors were more likely to teach in ways that improved students’ knowledge of the subject.23 Even though the study provides a useful direction for measuring faculty quality, it is not possible for all universities and colleges to conduct such controlled studies—though more could do so than actually do—and therefore such rich data may not typically be available for analysis.
The OECD and Institutional Management in Higher Education (IMHE) conducted a joint study on quality teaching practices in institutions of higher education around the world.24 The study pointed out that to understand fully the causal link between teaching and the quality of learning, pioneering and in-depth evaluation methods and instruments are necessary. The National Research Council (2010b) report on measuring the quality of research doctoral programs also outlined assessment methods for the quality of faculty involved in Ph.D. programs. The assessment utilized a broad data-based methodology that included more specific items than previous assessments, such as number of publications, citations, receipt of extramural grants for research, involvement in interdisciplinary work, demographic information, and number of awards and honors. Even though the NRC report addressed a complicated issue, it emphasized measuring faculty quality as it pertains to research-doctoral programs in four-year research universities. The absence of guidelines on measuring the quality of instructional faculty in four-year universities and community colleges was attributed to the trend of relying on wages earned as a proxy for faculty quality. The
22Powers (2011), see http://www.statesman.com/opinion/powers-how-to-measure-the-learningexperience-at-1525080.html [June 2012].
23The Carrell and West finding is consistent with research by Bettinger and Long (2006), who found that the use of adjunct professors has a positive effect on subsequent course interest, and Ehrenberg and Zhang (2005), who found a negative effect on student graduation.
series of models presented in the next chapter uses faculty salaries to distinguish across labor categories.
We have made the point that higher education produces multiple outputs. Even for those concerned primarily with the instructional component, looking narrowly at the production of four-year degrees may be inadequate because degrees are far from homogeneous.25 Ideally, for valuing outputs, it would be possible to identify quality dimensions and make adjustments integrating relevant indicators of learning, preparation for subsequent course work, job readiness, and income effects. Even with full information, weights that would be applied to these characteristics would still entail subjective assessments. We emphasize, explicitly and unapologetically, that adjusting degrees or otherwise defined units of higher education output by a quantitative quality index is not feasible at the present time. However, it is possible to begin dealing with the problem through classification of institutions by type and mission and then, as described in Chapter 4, by seeking to assure that quality within each segment is being regularly assessed and at least roughly maintained.
When considering a productivity metric focusing on instruction, objectively measurable outputs such as credit hours earned and number of degrees granted represent the logical starting point; however, the quality problem arises almost immediately since these can be expected to differ across courses, programs, and institutions. While universities often use credit hours as a measure of the importance and difficulty of each course, these quality adjustments are incomplete because they do not reflect the full course value to students (and to society).
25Ehrenberg (2012) finds the presence of differential tuition by major or year in a program to be quite widespread in American public higher education, reflecting differences in the cost of providing education in different fields (or levels) or the expected private return to education in the field or year in the program. For example, among four-year public institutions offering primarily bachelor’s degrees, 23 percent have differential tuition by college or major. The University of Toronto has a differential tuition price policy that makes the expected relative public to private benefit of the degree one of the criteria for determining the level of public subsidy vs. private tuition (see http://www.governingcouncil.utoronto.ca/policies/tuitfee.htm [June 2012]). From an economic perspective, it makes sense to base tuition on the expected value of the major and the costs. One practical problem is that lower income students may be excluded from majors with high costs but high return. In addition, the needs of states may call for training students in areas that are high cost but provide limited economic return to students. Another problem is that the policy could lead to the production of the wrong kind of degrees over different time frames (for example, swings in demand for nurses). It is understandable why states may do this, but the policy may emphasize costs of degrees over students’ interests. On the other hand, if infinite cross-subsidization is not endorsed or feasible, and if cross subsidies have gone beyond what is seen as a reasonable level, then students may be required to bear a larger portion of costs.
Market-oriented assessments of educational output, with attention to how salary effects vary by area of study and by institutional quality, have been explored in the economics literature.26 Some studies attempt to assess evidence on the relationship between college cost and quality in terms of student and institutional performance. Beyond wage outcomes, indicators have reflected number of graduates who find a job within a given period after graduation; surveys of alumni satisfaction with their education; surveys of local business communities’ satisfaction with the university’s role in providing skilled workers; percentage of students taking classes that require advanced work; and number of graduates going on to receive advanced degrees.
Despite work in this area, many tough issues remain even if the goal is to estimate only the economic returns to education. How can wage data best be used for such calculations, and what is most important: first job, salary five years out, or discounted lifetime earnings?27 Furthermore, intergenerational and business cycle effects and changing labor market conditions cause relative wages to be in constant flux. Perhaps most importantly, student characteristics, demographic heterogeneity, accessibility and opportunities, and other factors affecting earnings must be controlled for in these kinds of economic studies. Full quality adjustment of the output measure remains out of reach because much research is still needed to make headway on these issues. The literature certainly offers evidence of the effects of these variables, but using precise coefficients in a productivity measure requires a higher level of confidence than can be gleaned from this research.
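As a simple illustration of why the choice among these wage measures matters, discounted lifetime earnings can be sketched as a present-value calculation (the premium amount, horizon, and discount rate below are assumptions for illustration only):

```python
# Present value of a hypothetical annual earnings premium attributable
# to a degree, discounted over a 40-year working life. The $20,000
# premium and 3 percent discount rate are illustrative assumptions.
def discounted_premium(annual_premium, years=40, rate=0.03):
    return sum(annual_premium / (1 + rate) ** t for t in range(1, years + 1))

pv = discounted_premium(20_000)
print(round(pv))
```

Because distant years are discounted, the measure is sensitive to the chosen rate and to when the wage effects materialize, which is one reason first-job salary and lifetime earnings can rank institutions differently.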
Beyond measures of credits and degrees produced, and their associated wage effects, is the goal of measuring the value added of student learning.28 The motivation to measure learning is that the number of degrees or credit hours completed is not, by itself, a complete indicator of what higher education produces. That is, earning a baccalaureate degree without acquiring the knowledge, skills, and competencies required to function effectively in the labor market and in society is a hollow accomplishment. Indicators are thus needed of the quality of the degree represented by, for example, the amount of learning that has taken
26See, for example, Pascarella and Terenzini (2005), Shavelson (2010), and Zhang (2005).
27Estimating lifetime earnings would introduce long lags in the assessments, as some evidence suggests that the most quantitatively significant wage effects do not appear until 8 to 10 years after completion of the undergraduate degree.
28Not only is learning difficult to measure in its own right, it would be challenging to avoid double counting with wage effects (assuming those who learn most do best in the job market).
place and of its post-college value (represented by income, occupational status, or other measure) beyond that attributable to the certificate or degree itself.
Ignoring measures of learning outcomes or student engagement (while, perhaps, emphasizing graduation rates) may result in misleading conclusions about institutional performance and ill-informed policy prescriptions. Is it acceptable for a school to have a high graduation rate but low engagement and outcomes scores? Or are individual and public interests both better served by institutions where students are academically challenged and demonstrate skills and competencies at a high level, even if fewer graduate? Strong performance in the areas of engagement, achievement, and graduation are certainly not mutually exclusive, but each says something different about institutional performance and student development. One conclusion from Pascarella and Terenzini’s (1991, 2005) syntheses is that the impact of college is largely determined by individual student effort and involvement in the academic, interpersonal, and extracurricular offerings on a campus. That is, students bear a major responsibility for any gains derived from their postsecondary experience. Motivation is also a nontrivial factor in accounting for post-college differences in income once institutional variables such as selectivity are controlled (Pascarella and Terenzini, 2005).
A number of value-added tests have been developed over the years: the Measure of Academic Proficiency and Progress (MAPP) produced by the Educational Testing Service, the Collegiate Assessment of Academic Proficiency (CAAP) produced by the ACT Corporation, and the Collegiate Learning Assessment (CLA) designed by RAND and the Council for Aid to Education. The CLA is most specifically designed to measure value added at the institutional level between the freshman and senior years.29 This kind of quality adjustment is desirable at the level of the institution or campus for purposes of course and program improvement, but is unlikely to be practical anytime soon for the national measurement of productivity in higher education. It is beyond the scope of this panel’s charge to resolve various longstanding controversies, such as using degrees and grades as proxies for student learning versus direct measures of learning as represented by MAPP, CAAP, and CLA. Nonetheless, it is important to work through the logic of which kinds of measures are relevant to which kinds of questions.30
The above kinds of assessments show that even identical degrees may represent different quantities of education produced if, for example, one engineering graduate started having already completed Advanced Placement calculus and physics while another entered with a remedial math placement. Modeling approaches have been developed to estimate time to degree and other potentially
29The Voluntary System of Accountability (VSA), which has been put forth by a group of public universities, is a complementary program aimed at supplying a range of comparable information about university performance, but it is less explicitly linked to a notion of value added by the institution. Useful discussions of the merits of assessment tests are provided in Carpenter and Bach (2010) and Ewell (2009a).
30See Feller (2009) and Gates et al. (2002).
relevant indicators of value-added learning outcomes and student engagement (Kuh et al., 2008). These take into account entering student ability as represented by pre-college achievement scores (ACT, SAT) and prior academic performance, other student characteristics such as enrollment status (full- or part-time), transfer status, and financial need (Wellman, 2010).
Popular proxies for institutional quality such as rankings are flawed for the purpose of estimating educational productivity. The major limitation of most rankings, and especially that of U.S. News & World Report, is that they say almost nothing about what students do during college or what happens to them as a result of their attendance. As an illustration of the limitations of most ranking systems, only one number is needed to accurately predict where an institution ranks in U.S. News & World Report: the average SAT/ACT score of its enrolled students (Webster, 2001). The correlation between U.S. News & World Report’s rankings (1 = highest and 50 = lowest) and the institutional average SAT/ACT scores of the top 50 ranked national universities was –0.89 (Kuh and Pascarella, 2004). After taking into account the average SAT/ACT score, the other indices included in its algorithm have little meaningful influence on where an institution appears on the list.
This is not to say that selectivity is unrelated to college quality. Peers substantially influence students’ attitudes, values, and other dimensions of personal and social development. Being in the company of highly able people has salutary direct effects on how students spend their time and what they talk about. Hoxby (1997, 2000, 2009) has quantified the returns to education and shown that the setting of highly selective schools contributes to the undergraduate education of at least some subsets of students. More recently, Bowen, Chingos, and McPherson (2009) present evidence that institutional selectivity is strongly correlated with completion rates, controlling for differences in the quality and demographics of enrolled students as well as factors such as per student educational expenditures. The authors argue that students do best, in terms of completion rates, when they attend the most selective schools that will accept them, due in part to peer effects. A related point, also documented in Bowen, Chingos, and McPherson (2009:198ff), is that productivity is harmed greatly by “undermatching”—the frequent failure of well-prepared students, especially those from poor families, to go to institutions that will challenge them properly. Hoxby (1997, 2009) also shows that improved communications and other factors creating national markets for undergraduate education have improved the “matching” of students to institutions and thereby improved outcomes.31
At the same time, research shows that other factors are important to desired outcomes of college. These include working collaboratively with peers
31López Turley, Santos, and Ceja (2007) have also studied “neighborhood effects,” such as the impact on education outcomes of low-income Hispanics locked into local areas due to family or work concerns.
to solve problems, study abroad opportunities, service learning, doing research with a faculty member, and participating in learning communities (Pascarella and Terenzini, 2005). Longitudinal data from the National Study of Student Learning and cross-sectional results from the NSSE show that institutional selectivity is a weak indicator of student exposure to good practices in undergraduate education—practices such as whether faculty members clearly articulate course objectives, use relevant examples, identify key points, and provide class outlines (Kuh and Pascarella, 2004). These kinds of practices and experiences are arguably much more important to college quality than enrolled student ability alone.
In other words, selectivity and effective educational practices are largely independent, given that between 80 and 100 percent of the institution-level variance and 95 to 100 percent of the student-level variance in engagement in the effective educational practices measured by NSSE and other tools cannot be explained by an institution’s selectivity. This is consistent with the substantial body of evidence showing that the selectivity of the institution contributes minimally to learning and cognitive growth during college (Pascarella and Terenzini, 2005). As Pascarella (2001:21) concluded,
Since their measures of what constitutes “the best” in undergraduate education are based primarily on resources and reputation, and not on the within-college experiences that we know really make a difference, a more accurate, if less marketable, title for [the national magazine rankings] enterprise might be “America’s Most Advantaged Colleges.”
Other measures of educational quality are worth considering, given the increasing diversity of college students and their multiple, winding pathways to a baccalaureate degree. These could include goal attainment, course retention, transfer rates and success, success in subsequent course work, year-to-year persistence, degree or certificate completion, student and alumni satisfaction with the college experience, student personal and professional development, student involvement and citizenship, and postcollegiate outcomes, such as graduate school participation, employment, and a capacity for lifelong learning. Measures of success in subsequent coursework are especially important for students who have been historically underrepresented in specific majors and for institutions that provide remedial education. Participation in high-impact activities—such as first-year seminars, learning communities, writing-intensive courses, common intellectual experiences, service learning, diversity experiences, student-faculty research, study abroad, internships and other field placements, and senior capstone experiences—might also be useful indicators of quality, as they tend to be associated with high levels of student effort and deep learning (Kuh, 2008; Swaner and Brownell, 2009).
The two most relevant points for thinking about how to introduce explicit quality adjustment into higher education output measures may be summarized as follows:
- Research on student engagement and learning outcomes is promising. This area of research has established a number of high-impact educational practices and experiences. Even where direct measures of student learning are not available, the existence of these practices could be used as proxies in evaluating the quality of educational experience reflected in a given set of degrees or credit hours. This kind of evidence, even if it cannot currently be directly included in a productivity measure (such as that developed in Chapter 4) due to data or conceptual limitations, can be considered in a comprehensive performance evaluation of an institution, department, or system.
- Even the best statistical models that show institutional differences in the quality and quantity of education produced rarely allow for meaningful discrimination between one institution and another. Studies by Astin (1993), Kuh and Pascarella (2004), and Pascarella and Terenzini (2005) show that institutions do matter, but individual student differences matter more. Once student characteristics are taken into account, significant effects for institutions still exist, though the difference between any two given institutions, except for those at the extreme ends of the distribution, will often be small.
Adding to the complexity of productivity measurement is the fact that various policy and administrative actions require information aggregated at a number of different levels. Institution and state level measures are frequently needed for policy and are relevant to the development of administrative strategies. A major motivation for analyzing performance at these levels is that policy makers and the public want to know which institutions and which systems are performing better and how their processes can be replicated. Prospective students (and their parents) also want to know which institutions are good values. As we have repeatedly pointed out, for many purposes, it is best to compare institutions of the same type.
A course can be envisioned as the atomistic element of learning production, and the basic building block of productivity measurement at the micro level. For example, productivity may be expressed as the number of semester credits produced from a given number of faculty hours spent teaching. However, courses themselves can increasingly be broken down further to examine quantitative and qualitative aspects within the course or classroom unit (Twigg, 2005). Classroom technology is changing rapidly, and the introduction of scalable technologies is important, as are the effects of class size. The technology of how education is delivered varies widely across and within categories (disciplines, institutions, etc.).
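Such a micro-level metric can be sketched as a simple ratio; the enrollment, credit, and contact-hour figures below are illustrative, not drawn from any actual course:

```python
# Illustrative course-level productivity: semester credit hours produced
# per faculty hour spent teaching.
def credits_per_teaching_hour(enrollment, credits_per_student, faculty_hours):
    return (enrollment * credits_per_student) / faculty_hours

# A 3-credit course enrolling 60 students, taught over 45 contact hours:
ratio = credits_per_teaching_hour(60, 3, 45.0)
print(ratio)  # 180 credit hours / 45 teaching hours = 4.0
```

The ratio by itself says nothing about quality, which is precisely why the quality-adjustment issues discussed throughout this chapter matter.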
Flagship state universities often have large classes while private colleges often have smaller ones. The latter are almost certainly more expensive on a per-unit-of-output basis; less is known about the quality of the outcome. Students who are able to do so make college choices based on tradeoffs between price and the perceived quality offered by the range of options. Adding to the complexity is the faculty mix, including the use of graduate student instructors or adjunct faculty, which may also affect the cost and quality of delivering credit hours.
A growing body of research and an increasing number of programs assess efficiencies at the course level, seeking cost-quality tradeoffs that can be exploited. For example, the National Center for Academic Transformation (NCAT) develops programs for institutions to improve efficiency in the production of higher education through course redesign.32 In the NCAT model, the redesign addresses whole courses (rather than individual classes or sections) to achieve better learning outcomes at a lower cost by taking advantage of information technologies. Course redesign is not just about putting courses online, but rather rethinking the way instruction is delivered in light of the possibilities that technology offers. NCAT reports that, on average, costs were reduced by 37 percent in redesigned courses, with a range of 9 to 77 percent. Meanwhile, learning outcomes improved in 72 percent of the redesigned courses, with the remaining 28 percent producing learning equivalent to traditional formats. Appendix B to this volume provides a description of how NCAT measures comparative quality and cost of competing course design models.
For some purposes, an academic department or program is a more appropriate unit of analysis.33 This is because input costs as well as output valuations that markets, societies, and individuals place on various degrees vary by majors or academic field.34 Collecting physical input and output data that can be associated with specific departments or fields of study within an institution provides maximum flexibility as to how the production function will actually be organized, and also provides the data needed for productivity measurement.
Despite these advantages, department-based analysis is inappropriate for determining sector-based productivity statistics. One difficulty is that it is not easy to compare institutions based on their departmental structures. What counts
32NCAT is an independent, not-for-profit organization dedicated to the effective use of information technology to improve student learning outcomes and reduce costs in higher education. Since 1999, NCAT has conducted four national programs and five state-based course redesign programs, producing about 120 large-scale redesigns. In each program, colleges and universities redesigned large-enrollment courses using technology to achieve quality enhancements as well as cost savings. Participating institutions include research universities, comprehensive universities, private colleges, and community colleges in all regions of the United States.
33Massy (2010) presents one effort to systematize the course substructure (using physical rather than simply financial quantities) for purposes of aggregation.
34See DeGroot et al. (1991) and Hare and Wyatt (1988) for estimates of cost/production functions for university research-graduate education.
as a department in one institution may be two departments in another or simply a program in a third. This is one reason why IPEDS does not require institutions to specify faculty inputs and other expenditures by department.
A framework that tracks students through fields of study has an analytical advantage over a framework that uses department as the unit of analysis when the concern is interactive effects. For example, the quality (and possibly quantity) of output associated with a labor economics class (in which students must write empirical research papers) clearly depends upon what they learn in their introductory statistics classes. Thus, one department’s productivity is inherently linked to another’s. While institutions allocate resources at the department level, productivity analysis can be enhanced by creating a representative student in various majors that captures all coursework in all departments. As discussed in Chapter 6, such an approach would require an extension of data collection capabilities, which possibly could be integrated with the IPEDS data system.35
A major source of demand for performance measures is to inform rankings and provide accountability, generally at the campus level. This aggregation level is analogous to productive units, such as automobile plants or hospitals, frequently monitored in other sectors. It is a logical starting place for many key applications of productivity measurement as it is easier to think of the practical value (ideas for improving efficiency) at this level than at the state or higher levels of aggregation—at least in terms of production processes. Certainly there is value for a university to track its productivity over time.
Of course, campus level productivity measurement invites inter-institution comparisons as well. We discussed earlier how heterogeneity of inputs and outputs requires segmentation by institutional type. It is not obvious exactly how many categories are needed to make groups of institutions sufficiently homogeneous so that productivity calculations are meaningful. As a starting point for defining and classifying institutional types, we can use basic categories consistent with the Carnegie Classification of Academic Institutions:
- credit hours not resulting in a degree (continuing education);
- community colleges providing associate’s degrees, certificates, and the possibility of transferring to a four-year college;
- colleges granting bachelor’s degrees;
- colleges and universities granting master’s degrees; and
- universities granting doctorates.
35The UK’s Higher Education Funding Council for England collects cost and output data by field of study.
Within an institutional category, it makes more sense to compare costs and outcomes across campuses. Measurement problems associated with heterogeneity of inputs and outputs are dampened when factors such as the percentages of students in particular programs remain constant over time. However, even within categories of higher education institutions, characteristics vary and multiple functions are performed. For example, a university system (such as Florida’s) may enjoy high four-year graduation rates in part due to a strict requirement that less well-prepared students attend two-year institutions. Even so, segmenting the analysis by institutional type seems to be a prerequisite to accurate interpretation of various performance and cost metrics.
It is also worth noting that data collection at the campus level is simpler than it is at the course or department level. Aggregation at the campus level consolidates the effects of out-of-major courses and does not require allocating central services and overheads among departments. Reporting data at the campus level that is potentially useful for productivity measurement does not require weighting departmental inputs and outputs. Estimating total labor hours for the campus as a whole is equivalent to summing the hours for the individual departments, but the data collection process is much simpler. Summing student credit hours and awards also is straightforward although, as discussed in Chapter 4, a complication arises when linking enrollments to degrees by field.
For some purposes, it is useful to have productivity statistics at state, multi-campus system, or even national levels (see Box 3.2). For example, there have been efforts to develop state-by-state “report cards” for tracking higher education outcomes, such as those reflected in student learning or skills assessments (Ewell, 2009). Additionally, as we discuss in the recommendations chapters, sometimes it makes sense to follow students at the state level so that events such as inter-institution transfers and measures such as system-wide completion rates can be tracked.
One approach for generating state-level data is to aggregate the campus-level productivity measures described earlier. For example, if a system has a research-university campus, several baccalaureate campuses, and a two-year campus, productivity statistics could be calculated for each campus and compared with the averages for the segment into which the campus falls. An overall figure for the system or state could be obtained by aggregating the campus statistics. Thought will need to be given to the weights used in the aggregation but, in principle, the problem does not appear to be unsolvable.
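The aggregation step described above can be sketched in a few lines. The figures and the output-share weighting scheme below are purely hypothetical illustrations; the report does not prescribe a particular weighting scheme, and choosing appropriate weights is precisely the open question noted in the text.

```python
# Hypothetical campus-level productivity indices for a state system.
# Each entry: (campus, productivity index, share of system output used
# as the aggregation weight -- e.g., share of total credit hours).
campuses = [
    ("research university",     1.08, 0.50),
    ("baccalaureate campus A",  0.97, 0.20),
    ("baccalaureate campus B",  1.02, 0.20),
    ("two-year campus",         0.95, 0.10),
]

def system_index(campuses):
    """Weighted average of campus productivity indices.

    Uses output shares as weights; other weightings (enrollment,
    expenditures) would yield different system-level figures.
    """
    total_weight = sum(w for _, _, w in campuses)
    return sum(p * w for _, p, w in campuses) / total_weight

print(round(system_index(campuses), 3))
```

The same routine could compare each campus's index against the average for its segment before aggregating, as the text suggests.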
Macro- or Sector-Level Accounting
The U.S. statistical agencies do not currently produce a measure of education sector productivity, although some components of such a measure are available. The Bureau of Economic Analysis (BEA) produces several nominal and real higher education consumption measures. The National Income and Product Accounts (from which the nation’s gross domestic product statistics are estimated) include real and nominal measures for education personal consumption expenditures (PCE) and for education consumption expenditures across all government levels. The PCE tables include expenditures for books, higher education school lunches, and two other expenditure categories: (1) nonprofit private higher education services to households and (2) proprietary and public education. The nominal value of these two components is deflated by the BLS CPI-U college tuition and fees price index to produce an inflation-adjusted measure. The nominal value for gross output of nonprofit private higher education services to households is deflated by an input cost-based measure, which is a fixed-weight index. This input cost-based deflator is constructed from BLS Quarterly Census of Employment and Wages, PPI, and CPI data. Although BEA measures the nominal value of education national income components such as wages and salaries and gross operating surplus (profits, rents, net interest, etc.), it does not produce real measures of these education input income components. Accordingly, BEA data would have to be supplemented with other data to create a measure of education productivity.
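The deflation step described above follows the standard formula real = nominal / (index / 100). The numbers below are hypothetical illustrations, not actual BEA or BLS data:

```python
# Converting a nominal expenditure series to real (inflation-adjusted)
# terms with a price index. Values are hypothetical, not BEA/BLS data.
nominal = [100.0, 106.0, 113.0]       # nominal expenditures, $ billions
price_index = [100.0, 104.0, 109.0]   # e.g., a tuition price index, base year = 100

# Real expenditures expressed in base-year dollars:
real = [n / (p / 100.0) for n, p in zip(nominal, price_index)]
print([round(r, 2) for r in real])
```

In this illustration, nominal spending rises 13 percent over the period, but roughly 9 percentage points of that increase is price inflation, so real output grows far less.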
Beyond the United States, a mandate from Eurostat motivated European Union members and others to undertake research on how to measure education output and inputs. In the United States, most of this kind of research has focused on elementary and secondary education. In the United Kingdom, debate about how to measure government output, including education, resulted in the formation of the Atkinson Commission. However, despite repeated calls to do so, no consensus has been reached on how to measure the real output of education independently of inputs.
A different approach to understanding higher education productivity would be to look at more indirect measures. One possibility is to use information such as that slated to be released in 2013 by the Programme for the International Assessment of Adult Competencies (PIAAC) of the OECD. PIAAC will assess adults’ literacy and numeracy skills and their ability to solve problems in technology-rich environments. It will also collect a broad range of information from the adults taking the survey, including how their skills are used at work and in other contexts such as in the home and the community. Ideally, in addition to educational attainment, information on college major, previous work experience, and the dates and types of higher education institutions attended would be needed to estimate higher education productivity from PIAAC-collected data. Without such detail, PIAAC and other skill-based surveys are better indicators of human capital than of higher education output or productivity.
In this chapter, we have described how measuring productivity in higher education is especially challenging relative to the simple textbook model. Joint production of multiple outputs, heterogeneous inputs and outputs, quality change over time, and quality variation across institutions and systems all conspire to add complexity to the task. In order to advance productivity measurement beyond its current nascent state, it is necessary to recognize that not all of the complexities we have catalogued can be adequately accounted for, at least at the present time. The panel recognizes the difficulties of moving from the conceptual level of analysis (Chapters 1-3), which is surely the place to start, to empirical measurement recommendations. Like other economic measures in their incipient stages—such as GDP estimates and the national economic accounts on which they rest (particularly early on in their development)—new measures of higher education productivity will be flawed.
Because the performance of the sector cannot be fully organized and summarized in a single measure, it becomes all the more important to bear the complexities in mind and to monitor supporting information, especially regarding the quality of output (e.g., student outcomes). Without this awareness, measures will surely be misused and improper incentives established. For example, the danger of incentivizing a “diploma mill,” pointed out earlier, is real. Measuring performance is a precursor to developing reward structures that, in turn, incentivize particular behavior.
Here, we can only reiterate that the productivity measure proposed in Chapter 4—or any single performance metric for that matter—if used in isolation, will be insufficient for most purposes, particularly those linked to accountability demands. For the most part, a productivity measure will not be of great use for improving performance at the institutional level. What is relevant is the question of whether being able to measure higher education productivity in the aggregate will produce a better policy environment, which may in turn lead to indirect productivity improvements over time.