3
Why Measurement of Higher Education Productivity Is Difficult

Productivity measurement involves a conceptually simple framework. However, for the case of higher education, complexities are created by a number of factors, among them the following:

- Institutions of higher education are multi-product firms (that is, they produce multiple kinds of services);
- Inputs and outputs of the productive process are heterogeneous, involve nonmarket variables, and are subject to quality variation and temporal change; and
- Measurement is impeded by gaps in needed data.

None of these complexities is unique to higher education, but their severity and number may be.1 In this chapter, we examine each of these complexities because it is essential to be aware of their existence, even while recognizing that practical first steps toward measurement of productivity cannot fully account for them.

__________
1 A wise tempering of this assertion is offered in the Carnegie Foundation report (Cooke, 1910:5): "It is usual in the industrial world to find manufacturers and business men who look upon their own undertakings as being essentially different from every other seemingly like undertaking. This could not be otherwise, because every one knows the difficulties of his own work better than those of his neighbor. So I was not surprised to learn that every college feels that it has problems unlike, and of greater difficulty of solution than, those to be encountered at other colleges. As a matter of fact, from the standpoint of organization, uniformity in collegiate management is a much easier problem than it is in most industries, because in any industry which I know about, the individual plants vary considerably more than do the colleges."

3.1. BEYOND THE DEGREE FACTORY--MULTIPLE OUTPUTS AND JOINT PRODUCTION

The greatest barriers to estimating the output of higher education derive from the fact that most institutions are multi-product firms.2 Large research universities produce undergraduate, professional, and graduate degrees, research (including patents and pharmaceutical development), medical care, public service activities (especially at land grant universities), entertainment (such as cultural and athletic events), and other goods and services from a vector of capital, labor, and other inputs. Community colleges produce remedial education, degree and certificate programs designed for graduates entering directly into careers, academic degree programs that create opportunities for transfer to four-year institutions, and programs designed to meet the needs of the local labor market and specific employers. It is admittedly extremely difficult to develop accounting structures that capture the full value of these outputs, which accrue to both private and public entities.3

Firms and sectors in other areas of the economy produce multiple goods and services as well. An automobile manufacturer, for example, may produce cars, trucks, and airplane parts; a bank may offer loans as well as automatic teller machines, checking accounts, and a range of other services. While it can be difficult to specify a functional form that represents the technological input-output relationships that exist for multi-product producers, it has been done (Christensen, Jorgenson, and Lau, 1973; Diewert, 1971). The range and nature of outputs produced by higher education, however, make such estimation much more complex than for most other industries.

Though the panel's recommendations in Chapters 5 and 6 focus on improving measurement of instructional inputs and outputs, research and other scholarly and creative activities should be acknowledged in a comprehensive accounting because they are part of the joint product generated by universities. Among the difficult analytical problems created by joint production are how to separate research and development (R&D) production costs from degree production costs; how to compare the relative value of research and degree output; and how to assign faculty and staff time inputs to each (which raises the problem of separating different kinds of research, whether done at a faculty member's initiative or with outside sponsorship). Judgments must be made in the process of separating the instructional and noninstructional components of the higher education production function.

__________
2 Triplett (2009:9) writes: "Measuring medical care output is difficult. Measuring the output of education is really hard.... The fundamental difficulty in education has little to do with test scores, class sizes and similar attributes that have figured so intensively in the discussion so far, though those measurement problems deserve the attention they are getting. More crucially, the output of educational establishments is difficult to measure because they are multi-product firms. They do not produce only education, they produce other things as well."
3 McPherson and Shulenburger (2010) provide an excellent description of the multi-product nature of higher education institutions, plus a sensible first attempt to separate these into educational and other components. On the regional impact of universities, see Lester (2005).
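To make the multi-product formulation concrete, the following is a minimal sketch of the kind of flexible functional form used in the literature cited above: a translog cost function in which an institution produces several outputs (e.g., undergraduate credit hours, graduate credit hours, research) from several priced inputs. The notation is illustrative, not taken from the report:

$$\ln C = \alpha_0 + \sum_i \alpha_i \ln y_i + \sum_j \beta_j \ln w_j + \tfrac{1}{2}\sum_i \sum_k \gamma_{ik} \ln y_i \ln y_k + \tfrac{1}{2}\sum_j \sum_l \delta_{jl} \ln w_j \ln w_l + \sum_i \sum_j \rho_{ij} \ln y_i \ln w_j$$

where $C$ is total cost, the $y_i$ are output quantities, and the $w_j$ are input prices. Joint production shows up in the cross-output terms $\gamma_{ik}$: when producing two outputs together is cheaper than producing them separately (economies of scope), costs cannot be cleanly attributed to instruction alone, which is precisely the difficulty described in this section.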

Additionally, the linkage of research and research training, coupled with responsibility for baccalaureate and professional education, is relevant to the instructional component of output, and is a defining and internationally distinctive characteristic of the U.S. system of higher education.4 Statistics on degrees and research activity document the central place of research universities in the generation of advanced degrees in scientific and engineering fields. Research universities--defined here, using the Carnegie Classification of Academic Institutions, as doctorate-granting institutions--are few in number (approximately 283) relative to the total number of U.S. colleges and universities (estimated at 4,200). Nonetheless, they awarded 70 percent of doctorates, 40 percent of master's degrees, and 36 percent of bachelor's degrees in science and engineering in 2007.5 The connection between research and graduate instruction in America's universities is well understood and indeed is a core rationale for their substantial role in the national R&D system.6

While fully appreciating the value and variety of higher education outputs, the panel decided to focus on instruction. This decision entails analytical consequences. Specifically, a productivity measure of instruction can provide only a partial assessment of the sector's aggregate contributions to national and regional objectives. In particular, the omission of some kinds of research creates a truncated view not only of what colleges and universities do but also of their critical role in national research, innovation, and postbaccalaureate educational systems. And, just as there should be measures of performance and progress for the instructional capabilities of educational institutions, measures should also be developed for assessing the value of and returns to the nation's investments in research (especially the publicly funded portion). As outlined in the next chapter, we believe it is useful to assess and track changes in instructional productivity as a separate output.

__________
4 In his comparative study of national higher education systems, Burton Clark describes U.S. graduate education as a "tower of strength," adding: "This advanced tier has made American higher education the world's leading magnet system, drawing advanced students from around the world who seek high-quality training and attracting faculty who want to work at the forefront of their fields" (Clark, 1995:116). Jonathan Cole (2009) cites a host of inventions that have fundamentally altered the way Americans live, contributed to U.S. economic competitiveness, and raised the U.S. standard of living. He describes the United States as being "blessed with an abundance of first-rate research universities, institutions that are envied around the world," further calling them "national treasures, the jewels in our nation's crown, and worthy of our continued and expanded support" (Cole, 2009:x-xi).
5 Estimates are from National Science Board (2010:2ff-7ff).
6 Less well understood is how the coupling of instruction and research serves to attract high-performing researchers to faculty positions and to then provide incentives to undertake high-risk, frontier research.

3.2. HETEROGENEITY OF INPUTS AND OUTPUTS

The inputs and outputs of higher education display widely varying characteristics. The talents of students and teachers vary, as do their levels of preparedness and effectiveness in teaching and learning. At community colleges, for example, the student mix and, to some extent, instructor qualifications are typically quite unlike those at four-year research universities. In the composition of a student body, the following characteristics are widely acknowledged to affect educational outcomes and thus the relationship between inputs and outputs:

- Economic inequality and mix of low-income and minority students.7
- Student preparedness. College preparedness affects the efficiency with which graduates can be produced. The link between academic preparation and performance in college is extremely strong (Astin, 1993; Horn and Kojaku, 2001; Martinez and Klopott, 2003).8 Remedial courses (those not part of the required total for graduation) also add to the cost of degree completion.
- Student engagement. Education is a service where the recipient must be an active partner in the process of creating value ("coproduction").9 Variation in student motivation as well as ability strongly affects the learning process and, therefore, productivity.
- Peer effects. Student interaction affects both higher education outputs and inputs, and is difficult to measure. If the performance of a less prepared student is raised by being surrounded by better prepared students, this enhances learning and is part of the value of the higher education experience.10

The composition of an institution's student body will influence how that institution will score on a performance metric. If the measure of interest is graduation rates, lower levels of student preparation will likely translate into lower productivity. If the metric is value added or marginal benefit, lower levels of student preparation may lead to higher measured gains because the learning gap that can be closed is larger.11

__________
7 Two perennial policy goals are the promotion of productivity and equity, which, in different situations, can be complementary or conflicting. See Immerwahr et al. (2008) to get a sense of the views of college presidents regarding costs and equity.
8 Adelman (1999) found completing high-level mathematics classes such as algebra II, trigonometry, and calculus in high school to be the best single predictor of academic success in college.
9 Coproduction, introduced in Chapter 2, is recognized as a defining feature of service operations, including education. See, for example, Sampson (2010:112). The complexity introduced by coproduction should be taken into account when developing productivity models. Notice, however, that issues of coproduction arise in the handling of input heterogeneity, and that there is no suggestion that student time should be priced into the productivity formula.
10 Zimmerman (2003) shows students' grades being modestly but significantly affected by living with high, medium, or low SAT score roommates.
11 See Carey (2011) on the relationship between student quality and the cost of obtaining educational outcomes.
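To illustrate the contrast just described between a graduation-rate metric and a value-added metric, here is a minimal sketch using invented numbers for two hypothetical institutions; none of these figures come from the report.

```python
# Two hypothetical institutions with different entering-student preparation.
# All numbers are invented for illustration only.

institutions = {
    "Selective U":   {"entry_score": 80, "exit_score": 90, "grad_rate": 0.85},
    "Open-Access C": {"entry_score": 50, "exit_score": 68, "grad_rate": 0.45},
}

for name, d in institutions.items():
    value_added = d["exit_score"] - d["entry_score"]  # learning gain
    print(f"{name:14s} graduation rate: {d['grad_rate']:.0%}  "
          f"value added: {value_added} points")

# Selective U "wins" on graduation rate (85% vs. 45%), but Open-Access C
# "wins" on value added (18 vs. 10 points): the ranking depends on the metric.
```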

In the same vein, faculty characteristics, skills, and sets of responsibilities will affect the quality of outputs produced by an institution. At the simplest level, college faculty can be categorized into two groups: tenure-track faculty and adjunct faculty. Tenure-track faculty are involved in teaching, research, and public service, with time allocation to each dependent on the type of institution with which they are associated. At research universities, some time is obviously directed toward research, while at community colleges efforts are concentrated almost exclusively on teaching courses. Adjunct (nontenure track) faculty at all types of institutions are assigned to teach specific courses and may not have a long-term affiliation with an institution. In the current economic downturn, with universities facing budget cuts, the utilization of adjunct faculty has become increasingly prominent.12 This situation raises the need for analyses of the quality of instruction adjunct faculty provide. In the section on inputs, below, and again in Chapter 5, we return to the topic of variable student and instructor quality and its implications for productivity measurement.

On the output side, the mix of degrees by level and subject varies across institutions. These differences affect both the production process and the labor market value of graduates. Institutions serve diverse student communities and pursue very different missions. While all aim to produce better educated and credentialed citizens, some institutions produce two-year degrees, certificates, and students equipped to transfer to four-year schools, while others produce bachelor's and graduate degrees in a wide range of disciplines. Some of these outputs are inherently more expensive to produce than others. This heterogeneity means that production functions for institutions with different output mixes will display different characteristics.

Adjusting for the distribution of degrees requires data on the course-taking patterns of majors in different fields. The cost of a degree in chemistry, for example, depends on the number of math classes, laboratory classes, and general studies classes that such majors must take and the average cost of each type of class. Regression analyses using data at the state level have been used to produce estimates of the cost of degrees in different majors. In their models, Blose, Porter, and Kokkelenberg (2006) found that carefully adjusting per-student expenditures to account for the distribution of majors and the average costs to produce each major improved estimates of the impact of measured instructional expenditures on graduation and persistence rates.

__________
12 Even in the period before the recession, the trend was well established: According to data from the American Federation of Teachers, in the mid-1970s, adjuncts--both part-timers and full-timers not on a tenure track--represented just over 40 percent of professors; 30 years later, they accounted for nearly 70 percent of professors at colleges and universities, both public and private (see http://www.nytimes.com/2007/11/20/education/20adjunct.html?pagewanted=all [June 2012]).
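As a concrete illustration of how course-taking patterns drive the cost of a degree, the sketch below prices a hypothetical chemistry-like major's credit distribution against assumed average costs per credit hour by class type. All figures are invented for illustration; they are not estimates from Blose, Porter, and Kokkelenberg (2006) or from this report.

```python
# Hypothetical credit-hour distribution for one major and assumed average
# instructional cost per credit hour by class type (invented numbers).

credits_by_type = {"math": 18, "laboratory": 24, "general_studies": 48, "major_lecture": 30}
cost_per_credit = {"math": 280, "laboratory": 450, "general_studies": 200, "major_lecture": 350}

degree_cost = sum(credits_by_type[t] * cost_per_credit[t] for t in credits_by_type)
total_credits = sum(credits_by_type.values())

print(f"Total credits: {total_credits}")                     # 120
print(f"Estimated instructional cost: ${degree_cost:,}")     # $35,940
print(f"Average cost per credit: ${degree_cost / total_credits:,.2f}")
```

Laboratory-intensive majors raise the average because laboratory credits cost more to deliver; the same arithmetic, run across majors, is what a degree-mix adjustment formalizes.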

42 IMPROVING MEASUREMENT OF PRODUCTIVITY IN HIGHER EDUCATION major improved estimates of the impact of measured instructional expenditures on graduation and persistence rates. Another approach to determining the cost of degrees in different fields in- volves working back from the level and field coefficients in the funding models used by various jurisdictions to adjust an institution's total instructional cost on the basis of its data in the Integrated Postsecondary Education Data System (IPEDS). Data on degrees may not be robust enough for regression analysis, but they ought to be sufficient for this approach based on an assumed set of coeffi- cients. In our data recommendations in Chapter 6, we advise that credit-hour data for productivity analyses be collected in a way that follows students in order to better control for differences in degree level and field. This manner of collecting data, discussed in Chapter 4, will be a big step forward in productivity and cost analysis. The problem of heterogeneity can be at least partially addressed by cross- classifying institutions that enroll different kinds of students and offer various degree levels and subjects. One cell in the classification might be chemistry Ph.D. programs in research universities, for example, while another might be undergraduate business majors in comprehensive or two-year institutions. While measuring the relation between inputs and outputs for each cell separately would significantly limit variations in the educational production function, and also would help control for differences due to the joint production of research and public service, it would not eliminate the problem.13 Variation in student inputs will still be present to some extent since no institution caters specifically to only one kind of student, and students do not always enroll in school with a definite idea of what their major will be. Students also frequently change majors. Such a multi-fold classification is most likely impractical for any kind of nationally based productivity measure. Two strategies exist for overcoming this problem. First, certain cells could be combined in the cross-classification by aggregating to the campus level and then creating categories for the standard in- stitutional type classifications used elsewhere in higher education (e.g., research, master's, bachelor's, and two-year institutions). In addition to reducing the num- ber of cells, aggregation to the campus level subsumes course-level issues that occur, for example, when engineering majors enroll in English courses. While compiling data at the campus level introduces a significant degree of approxi- mation, this is no worse than would likely occur in many if not most industries elsewhere in the economy. Individual institutions can and should analyze pro- ductivity at the level of degree and subject, just as manufacturers should analyze productivity at the level of individual production processes. The techniques required to do so are beyond the panel's purview. An alternative is to control for key variations within the productivity model 13Of course, too much disaggregation could also be harmful, by reducing sample size too much to be useful for some analyses, for example.

An alternative is to control for key variations within the productivity model itself. This might entail tracking the number of degrees and credits separately by level and subject for each institutional category. Cross-classification is carried as far as practical, and then formulas are constructed to control for the variation in key remaining variables. The two approaches are analogous to poverty measures that combine cross-classification and formulaic adjustment to allow for differences in wealth, in-kind benefits, or cost of living. The baseline model described in Chapter 4 employs both of these strategies.

3.3. NONMARKET VARIABLES AND EXTERNALITIES

Further complicating the accounting of inputs and outputs of higher education is that some, such as student time and nonpecuniary benefits of schooling, are nonmarket in nature--these factors are not bought and sold, do not have prices, and are not easily monetized. Additionally, not all of the benefits of an educated citizenry accrue to those paying for education. Such characteristics make these factors difficult to measure and, as a result, they are often ignored in productivity analyses. In this sense, higher education (and education in general) is analogous to activities in other areas, such as health care, home production, and volunteerism.14

Policy makers concerned with, say, a state's returns on its investment in education should be interested in the full private and social benefits generated by their institutions. A truly comprehensive accounting would include the sector's impact on outcomes related to social capital, crime, population health, and other correlates of education that society values. Much of this social value is intangible and highly variable; for example, social capital creation attributable to higher education may be greater at residential colleges and universities than at commuter colleges due to peer effects. These kinds of nonmarket quality dimensions are no doubt important parts of the production function, although they cannot yet be measured well. The policy implication is that the fullest possible accounting of higher education should be pursued if it is to be used for prioritizing public spending.15

That positive externalities are created by higher education is implicitly acknowledged in the fact that college tuition (public and private) is deliberately set below the equilibrium market-clearing price, and institutions engage in various forms of rationing and subsidies to manage demand.

__________
14 See National Research Council (2005) for difficulties and approaches to measuring nonmarket inputs and outputs in an accounting framework.
15 According to Brady et al. (2005), Texas generates $4.00 of economic output for each dollar put into higher education; California generates $3.85 for each dollar invested. The issue of states' returns on education is complex. Bound et al. (2004) have shown that the amount that states spend on their public education systems is only very weakly related to the share of workers in the state with college degrees. Intuitively, this is because educated labor is mobile and can move to where the jobs are. Because of mobility, some social benefits of higher education accrue to the nation as a whole, not just to individual states. This may create an incentive for states to underinvest in their public higher education systems.

Financial aid and other forms of cross-subsidization provide mechanisms to increase enrollment. Thus, because the resulting marginal cost does not align with price, total revenues do not equate with the value of an institution's output; the distribution of revenues across different activities also cannot be used to construct a combined index of output that reflects relative consumer value.16

At the most aggregate level, Jorgenson and Fraumeni (1992) circumvented the problem of defining a measure of output by assuming that the value of education can be equated with the discounted value of students' future incomes. They assess the contribution of education to human capital based on the lifetime earning streams of graduates. Without additional assumptions, however, such a measure cannot be related back to a specific educational episode. The focus on outcomes may also subsume the role of education as a sorting mechanism that distinguishes individuals of differing abilities, potentially overstating the contribution of education alone.17

__________
16 Within the framework of the national accounts, nonmarket activities such as education have been valued on the basis of the cost of their inputs. That approach rules out the possibility of productivity change.
17 Spence (1973) developed models to identify these kinds of sorting effects, such as those whereby employers use credentials to identify workers with desirable, but not directly observable, traits.
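The income-based valuation can be written compactly. As a sketch consistent with the lifetime-earnings idea (the notation here is illustrative, not Jorgenson and Fraumeni's own), the human capital value assigned to a level of educational attainment at age $a$ is the discounted stream of expected future earnings:

$$V_a = \sum_{t=0}^{T-a} \frac{s_{a+t}\, E_{a+t}}{(1+r)^{t}}$$

where $E_{a+t}$ is expected earnings at age $a+t$, $s_{a+t}$ is the probability of surviving and participating in the labor force at that age, $r$ is the discount rate, and $T$ is the retirement horizon. The value attributed to education is the difference in $V_a$ across attainment levels, which is why, without further assumptions, the measure cannot isolate the contribution of any specific educational episode.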

3.4. QUALITY CHANGE AND VARIATION

A fully specified measure of higher education productivity would account for quality changes over time and quality variation across inputs and outputs, by individual and institution. However, the kind of sophisticated data and analysis that would be required for accurate and sensitive quality measurement is very much in the long-term research phase. Nonetheless, it is important to conceptualize what is needed in order to make progress in the future.

Many sectors of the economy are characterized by wide variation in the quality of outputs. Computers vary in processing speed, reliability, and data storage capacity; luxury cars are built to higher standards than economy models; and the local hardware store may have superior customer service relative to superstores. Quality also changes over time--computers become faster, cars safer, and power tools more powerful. What is unique about higher education is the lack of generally accepted measures of quality change or variation. And indeed, consumers may not be aware of the measures that do exist. This reinforces the conclusion of the previous section: variations in the demand for higher education cannot be taken as reflecting quality.

Many aspects of measuring quality change have been explored for other difficult-to-measure service sectors, and progress has been made. In its price measurement program, the Bureau of Labor Statistics (BLS) employs a number of methods for separating pure price and quality effects as it monitors products in its market basket over time. Methods for addressing some of the more generic issues (e.g., defining output in service sectors, adjusting for changing product characteristics) may translate to the education case.18 As described in Box 3.1, lessons from work on productivity and accounting in the medical care sector may exhibit the closest parallels to the education sector.19

3.4.1. Inputs

Quality variations exist for nearly the full range of higher education inputs: students, faculty, staff, library, and physical facilities. Some dimensions of student quality can potentially be adjusted for using standardized test scores, high school grade point averages (GPAs), parents' education, socioeconomic status, or other metrics. For comparing institutions, additional adjustments may be made to reflect variation in student population characteristics, such as full-time or part-time status, type of degrees pursued, and preparation levels, as well as the differing missions of institutions. Institutions with a high percentage of remedial or disadvantaged students need longer time horizons to bring students to a given level of competency. They are often further burdened by smaller endowments, lower subsidies, and fewer support resources, all of which can lengthen time to degree for their students. Students select into institutions with different missions, according to their objectives. Institutional mission and the character of the student body should be considered when interpreting graduation rates, cost statistics, or productivity measures as part of a policy analysis.

Measures of student engagement generated from major student surveys such as the National Survey of Student Engagement (NSSE), the Community College Survey of Student Engagement (CCSSE), the Student Experience in the Research University (SERU), and the Cooperative Institutional Research Program (CIRP) can provide additional insight about the experiences of students enrolled in a given institution. This is important because the extent to which students devote effort to educationally purposeful activities is a critical element in the learning process. However, engagement statistics require careful interpretation because--beyond student attributes--they may also reflect actions by an institution and its faculty. For example, effective educational approaches or inspiring teachers can sometimes induce less well-prepared or less motivated students to achieve at higher levels. Thus, measures of student engagement can be instructive in understanding an institution's capacity to enhance learning. Limitations of student surveys, such as those noted above, are debated in Arum and Roksa (2010), who point out that, while data focused on "social engagement" are important for questions related to student retention and satisfaction outcomes, data on academic engagement are also needed if the goal is improved information about learning and academic performance. While NSSE does include a few questions related to social engagement (e.g., nonacademic interactions outside the classroom with peers), many more questions address areas of academic engagement such as writing, discussing ideas or doing research with faculty, integrative learning activities, and so forth.

__________
18 See National Research Council (2002) for a full description of the statistical techniques developed by BLS, the Bureau of Economic Analysis, and others for adjusting price indexes to reflect quality change.
19 For more information, see National Research Council (2005, 2010a).

BOX 3.1
Higher Education and Medical Care

An analogy exists between higher education--specifically the production of individuals with degrees--and health care--specifically the production of completed medical treatments. Lessons for measuring the former can possibly be gleaned from the latter. Nearly all the complications making productivity measurement difficult can be found in both sectors:

- In both cases, additional outputs beyond degrees and treatments are produced.
- Product categories exhibit a wide variety--different kinds of degrees and different kinds of treatments are produced. Some of the products have more value than others, depending on how value is calculated. For example, an engineering degree may generate more income than a philosophy degree, and cardiovascular surgery produces greater health benefits (in terms of quality-adjusted life years, for instance) than does cosmetic surgery.
- Outcomes vary substantially, such that some students or patients enjoy more successful outcomes than others. Some students get a great education and find worthwhile employment while others do not; some patients recover fully, while others die.
- Inputs are heterogeneous. Some students are better prepared and therefore enter college with a higher chance of graduation. Some patients are more fit than others and therefore have a greater probability of successful outcomes from medical treatment.
- Institutional missions also vary. Institutions of higher education range from small locally oriented colleges to large universities with national and international influence. Similarly, medical care treatments are administered in a variety of institutions with different missions, ranging from doctors' offices to small local hospitals to large regional hospitals (which also jointly produce medical students).
- Varying production technologies are possible in both sectors. Educational institutions vary in their student/faculty ratios, their reliance on graduate student instructors and adjunct faculty, and their use of technology. Similarly, hospitals vary in doctor-, nurse-, and staff-to-patient ratios, in their reliance on interns and residents, and in their use of technology.
- Pricing and payment schemes also vary in the education and health sectors. Students can pay very different prices for apparently similar English degrees, just as patients can pay very different prices for apparently equivalent medical treatments. Also, in both sectors, payments are made not only by the primary purchaser but by a variety of third-party payers, such as the government or insurance companies. This complicates price estimation and costing exercises.

National Research Council (2010a) provides guidance on how to deal with the complexities associated with estimating inputs, outputs, and prices for medical care. Essentially, outputs are defined so as to reflect completed treatments for which outcomes can be quantified and thus the quality of the output adjusted. For performance assessment purposes, hospital mortality rates have been adjusted to reflect the complexity of the case mix each hospital deals with. For example, a tertiary care hospital with relatively high numbers of deaths may receive "credit" for the fact that its patients are sicker and hence at a greater likelihood of death. An analogous risk adjustment approach exists for higher education, wherein schools that enroll less well-prepared students would be assigned additional points for producing graduates because the job is more difficult. One effect of such an adjustment is that highly selective schools would be adjusted downward because more of their students are expected to graduate. Regression models have been used to attempt to make these adjustments, using institutional resources and student characteristics to estimate relative performance. For example, the ranking system of U.S. News & World Report takes into account characteristics of both the institution (wealth) and of the students (both entry test scores and socioeconomic background). In this system, SAT results essentially predict the rank order of the top 50 schools.
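The risk adjustment idea in Box 3.1 can be sketched in a few lines: fit a simple model of graduation rates on entering-student characteristics, then score institutions on actual minus expected performance. The data and the single predictor below are invented; a real adjustment model, as the box notes, would use multiple institutional and student characteristics.

```python
import numpy as np

# Invented data: mean entering test score (percentile) and observed graduation rate.
entry_score = np.array([85, 72, 60, 55, 40], dtype=float)
grad_rate   = np.array([0.90, 0.74, 0.58, 0.60, 0.42])

# Expected graduation rate from a one-predictor least-squares fit.
X = np.column_stack([np.ones_like(entry_score), entry_score])
beta, *_ = np.linalg.lstsq(X, grad_rate, rcond=None)
expected = X @ beta

# Risk-adjusted performance: a positive residual means graduating more
# students than the entering class would predict.
for i, (actual, exp) in enumerate(zip(grad_rate, expected)):
    print(f"institution {i}: actual {actual:.2f}, expected {exp:.2f}, "
          f"adjusted {actual - exp:+.2f}")
```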

In looking at any measure of student characteristics, it must be remembered that between-institution variance is almost always smaller than within-institution variance on proxy measures of quality (Kuh, 2003). This is because individual student performance typically varies much more within institutions than average performance does between institutions. This appears to be true at every level of education. Results from the NSSE reveal that, for all but 1 of the 14 NSSE scales for both first-year and senior students, less than 10 percent of the total variance in student engagement is between institutions. The remaining variance--in several instances more than 95 percent--exists at the student level within a college or university. Thus, using only an institution-level measure of quality when estimating productivity can be misleading.
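A minimal sketch of the variance decomposition behind this point, using invented engagement scores for three small hypothetical institutions; the between-institution share of total variance is computed exactly as in a one-way analysis of variance.

```python
import numpy as np

# Invented engagement scores for students grouped by institution.
scores = {
    "A": np.array([52.0, 61, 45, 70, 58]),
    "B": np.array([55.0, 64, 48, 73, 60]),
    "C": np.array([50.0, 59, 44, 68, 57]),
}

all_scores = np.concatenate(list(scores.values()))
grand_mean = all_scores.mean()

# Between-institution sum of squares: institution means around the grand mean.
between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in scores.values())
# Within-institution sum of squares: students around their own institution's mean.
within = sum(((s - s.mean()) ** 2).sum() for s in scores.values())

share = between / (between + within)
print(f"between-institution share of total variance: {share:.1%}")
# With data like these the share is under 5 percent: most variation is
# within institutions, so an institution-level average hides most of it.
```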

... series of models presented in the next chapter uses faculty salaries to distinguish across labor categories.

3.4.2. Outputs (and Outcomes)

We have made the point that higher education produces multiple outputs. Even for those concerned primarily with the instructional component, looking narrowly at the production of four-year degrees may be inadequate because degrees are far from homogeneous.25 Ideally, for valuing outputs, it would be possible to identify quality dimensions and make adjustments integrating relevant indicators of learning, preparation for subsequent course work, job readiness, and income effects. Even with full information, the weights applied to these characteristics would still entail subjective assessments. We emphasize, explicitly and unapologetically, that adjusting degrees or otherwise defined units of higher education output by a quantitative quality index is not feasible at the present time. However, it is possible to begin dealing with the problem through classification of institutions by type and mission and then, as described in Chapter 4, by seeking to assure that quality within each segment is being regularly assessed and at least roughly maintained.

When considering a productivity metric focusing on instruction, objectively measurable outputs such as credit hours earned and number of degrees granted represent the logical starting point; however, the quality problem arises almost immediately, since these can be expected to differ across courses, programs, and institutions. While universities often use credit hours as a measure of the importance and difficulty of each course, these quality adjustments are incomplete because they do not reflect the full course value to students (and to society).

__________
25 Ehrenberg (2012) finds the presence of differential tuition by major or year in a program to be quite widespread in American public higher education, reflecting differences in the cost of providing education in different fields (or levels) or the expected private return to education in the field or year in the program. For example, among four-year public institutions offering primarily bachelor's degrees, 23 percent have differential tuition by college or major. The University of Toronto has a differential tuition price policy that makes the expected relative public-to-private benefit of the degree one of the criteria for determining the level of public subsidy vs. private tuition (see http://www.governingcouncil.utoronto.ca/policies/tuitfee.htm [June 2012]). From an economic perspective, it makes sense to base tuition on the expected value of the major and the costs. One practical problem is that lower income students may be excluded from majors with high costs but high return. In addition, the needs of states may call for training students in areas that are high cost but provide limited economic return to students. Another problem is that the policy could lead to the production of the wrong kind of degrees over different time frames (for example, swings in demand for nurses). It is understandable why states may do this, but the policy may emphasize costs of degrees over students' interests. On the other hand, if infinite cross-subsidization is not endorsed or feasible, and if cross-subsidies have gone beyond what is seen as a reasonable level, then students may be required to bear a larger portion of costs.
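As one hedged illustration of a "logical starting point" output measure built from credit hours and degrees (a sketch of the general idea, not the specific formula the report develops in Chapter 4), instructional output for segment $s$ in year $t$ might be written:

$$O_{st} = C_{st} + \gamma\, D_{st}$$

where $C_{st}$ is total credit hours, $D_{st}$ is degrees awarded, and $\gamma$ is a weight reflecting the additional value of a completed credential beyond the credits it comprises. The entire quality problem discussed above is hidden in the assumption that a credit hour (or a degree) is comparable across courses, programs, and institutions within the segment.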

Economic Returns

Market-oriented assessments of educational output, with attention to how salary effects vary by area of study and by institutional quality, have been explored in the economics literature.26 Some studies attempt to assess evidence on the relationship between college cost and quality in terms of student and institutional performance. Beyond wage outcomes, indicators have reflected the number of graduates who find a job within a given period after graduation; surveys of alumni satisfaction with their education; surveys of local business communities' satisfaction with the university's role in providing skilled workers; the percentage of students taking classes that require advanced work; and the number of graduates going on to receive advanced degrees.

Despite work in this area, many tough issues remain even if the goal is to estimate only the economic returns to education. How can wage data best be used for such calculations, and what is most important: first job, salary five years out, or discounted lifetime earnings?27 Furthermore, intergenerational and business cycle effects and changing labor market conditions cause relative wages to be in constant flux. Perhaps most importantly, student characteristics, demographic heterogeneity, accessibility and opportunities, and other factors affecting earnings must be controlled for in these kinds of economic studies. The reason full quality adjustment of the output measure is still a futuristic idea is that much research is still needed to make headway on these issues. The literature certainly offers evidence of the effects of these variables, but using precise coefficients in a productivity measure requires a higher level of confidence than can be gleaned from this research.

__________
26 See, for example, Pascarella and Terenzini (2005), Shavelson (2010), and Zhang (2005).
27 Estimating lifetime earnings would introduce long lags in the assessments, as some evidence suggests that the most quantitatively significant wage effects do not appear until 8 to 10 years after the undergraduate degree.
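The choice among wage horizons matters because the metrics can disagree. The sketch below, with invented earnings paths for graduates of two hypothetical programs, compares first-year salary, salary five years out, and discounted lifetime earnings using a standard present-value calculation.

```python
# Invented annual earnings paths (first 40 working years) for graduates of
# two hypothetical programs; all numbers are for illustration only.

def earnings_path(start, growth, years=40):
    return [start * (1 + growth) ** t for t in range(years)]

paths = {
    "program X": earnings_path(55_000, 0.015),  # higher start, slower growth
    "program Y": earnings_path(42_000, 0.045),  # lower start, faster growth
}

def present_value(stream, r=0.03):
    # Discounted sum of the earnings stream at annual rate r.
    return sum(e / (1 + r) ** t for t, e in enumerate(stream))

for name, stream in paths.items():
    print(f"{name}: first job ${stream[0]:,.0f}, "
          f"year 5 ${stream[4]:,.0f}, "
          f"lifetime PV ${present_value(stream):,.0f}")

# Program X wins on first-job and year-5 salary; program Y wins on
# discounted lifetime earnings once growth compounds. The ranking of
# programs therefore depends on the horizon chosen.
```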

Student Learning

Beyond measures of credits and degrees produced, and their associated wage effects, is the goal of measuring the value added of student learning.28 The motivation to measure learning is that the number of degrees or credit hours completed is not, by itself, a complete indicator of what higher education produces. That is, earning a baccalaureate degree without acquiring the knowledge, skills, and competencies required to function effectively in the labor market and in society is a hollow accomplishment. Indicators are thus needed of the quality of the degree, represented by, for example, the amount of learning that has taken place and its post-college value (represented by income, occupational status, or another measure) beyond that attributable to the certificate or degree itself.

Ignoring measures of learning outcomes or student engagement (while, perhaps, emphasizing graduation rates) may result in misleading conclusions about institutional performance and ill-informed policy prescriptions. Is it acceptable for a school to have a high graduation rate but low engagement and outcomes scores? Or are individual and public interests both better served by institutions where students are academically challenged and demonstrate skills and competencies at a high level, even if fewer graduate? Strong performances in the areas of engagement, achievement, and graduation are certainly not mutually exclusive, but each says something different about institutional performance and student development. One conclusion from Pascarella and Terenzini's (1991, 2005) syntheses is that the impact of college is largely determined by individual student effort and involvement in the academic, interpersonal, and extracurricular offerings on a campus. That is, students bear a major responsibility for any gains derived from their postsecondary experience. Motivation is also a nontrivial factor in accounting for post-college differences in income once institutional variables such as selectivity are controlled (Pascarella and Terenzini, 2005).

A number of value-added tests have been developed over the years: the Measure of Academic Proficiency and Progress (MAPP) produced by the Educational Testing Service, the Collegiate Assessment of Academic Proficiency (CAAP) produced by the ACT Corporation, and the Collegiate Learning Assessment (CLA) designed by RAND and the Council for Aid to Education. The CLA is most specifically designed to measure value added at the institutional level between the freshman and senior years.29 This kind of quality adjustment is desirable at the level of the institution or campus for purposes of course and program improvement, but is unlikely to be practical anytime soon for the national measurement of productivity in higher education. It is beyond the scope of this panel's charge to resolve various longstanding controversies, such as using degrees and grades as proxies for student learning versus direct measures of learning as represented by MAPP, CAAP, and CLA. Nonetheless, it is important to work through the logic of which kinds of measures are relevant to which kinds of questions.30

The above kinds of assessments show that even identical degrees may represent different quantities of education produced if, for example, one engineering graduate started having already completed Advanced Placement calculus and physics while another entered with a remedial math placement. Modeling approaches have been developed to estimate time to degree and other potentially relevant indicators of value-added learning outcomes and student engagement (Kuh et al., 2008).

__________
28 Not only is learning difficult to measure in its own right; it would also be challenging to avoid double counting with wage effects (assuming those who learn most do best in the job market).
29 The Voluntary System of Accountability (VSA), which has been put forth by a group of public universities, is a complementary program aimed at supplying a range of comparable information about university performance, but it is less explicitly linked to a notion of value added by the institution. Useful discussions of the merits of assessment tests are provided in Carpenter and Bach (2010) and Ewell (2009a).
30 See Feller (2009) and Gates et al. (2002).

These take into account entering student ability as represented by pre-college achievement scores (ACT, SAT) and prior academic performance, as well as other student characteristics such as enrollment status (full- or part-time), transfer status, and financial need (Wellman, 2010).

Popular proxies for institutional quality such as rankings are flawed for the purpose of estimating educational productivity. The major limitation of most rankings, and especially that of U.S. News & World Report, is that they say almost nothing about what students do during college or what happens to them as a result of their attendance. As an illustration of the limitations of most ranking systems, only one number is needed to accurately predict where an institution ranks in U.S. News & World Report: the average SAT/ACT score of its enrolled students (Webster, 2001). The correlation between U.S. News & World Report's rankings (1 = highest and 50 = lowest) and the institutional average SAT/ACT score of the top 50 ranked national universities was 0.89 (Kuh and Pascarella, 2004). After taking into account the average SAT/ACT score, the other indices included in its algorithm have little meaningful influence on where an institution appears on the list.

This is not to say that selectivity is unrelated to college quality. Peers substantially influence students' attitudes, values, and other dimensions of personal and social development. Being in the company of highly able people has salutary direct effects on how students spend their time and what they talk about. Hoxby (1997, 2000, 2009) has quantified the returns to education and shown that the setting of highly selective schools contributes to the undergraduate education of at least some subsets of students. More recently, Bowen, Chingos, and McPherson (2009) present evidence that institutional selectivity is strongly correlated with completion rates, controlling for differences in the quality and demographics of enrolled students as well as factors such as per-student educational expenditures. The authors argue that students do best, in terms of completion rates, when they attend the most selective schools that will accept them, due in part to peer effects. A related point, also documented in Bowen, Chingos, and McPherson (2009:198ff), is that productivity is harmed greatly by "undermatching"--the frequent failure of well-prepared students, especially those from poor families, to go to institutions that will challenge them properly. Hoxby (1997, 2009) also shows that improved communications and other factors creating national markets for undergraduate education have improved the "matching" of students to institutions and thereby improved outcomes.31

At the same time, research shows that other factors are important to desired outcomes of college.

__________
31 López Turley, Santos, and Ceja (2007) have also studied "neighborhood effects," such as the impact on education outcomes of low-income Hispanics locked into local areas due to family or work concerns.

These include working collaboratively with peers to solve problems, study abroad opportunities, service learning, doing research with a faculty member, and participating in learning communities (Pascarella and Terenzini, 2005). Longitudinal data from the National Study of Student Learning and cross-sectional results from the NSSE show that institutional selectivity is a weak indicator of student exposure to good practices in undergraduate education--practices such as whether faculty members clearly articulate course objectives, use relevant examples, identify key points, and provide class outlines (Kuh and Pascarella, 2004). These kinds of practices and experiences are arguably much more important to college quality than enrolled student ability alone.

In other words, selectivity and effective educational practices are largely independent, given that between 80 and 100 percent of the institution-level variance and 95 to 100 percent of the student-level variance in engagement in the effective educational practices measured by NSSE and other tools cannot be explained by an institution's selectivity. This is consistent with the substantial body of evidence showing that the selectivity of an institution contributes minimally to learning and cognitive growth during college (Pascarella and Terenzini, 2005). As Pascarella (2001:21) concluded,

  Since their measures of what constitutes "the best" in undergraduate education are based primarily on resources and reputation, and not on the within-college experiences that we know really make a difference, a more accurate, if less marketable, title for [the national magazine rankings] enterprise might be "America's Most Advantaged Colleges."

Other measures of educational quality are worth considering, given the increasing diversity of college students and their multiple, winding pathways to a baccalaureate degree. These could include goal attainment, course retention, transfer rates and success, success in subsequent course work, year-to-year persistence, degree or certificate completion, student and alumni satisfaction with the college experience, student personal and professional development, student involvement and citizenship, and postcollegiate outcomes, such as graduate school participation, employment, and a capacity for lifelong learning. Measures of success in subsequent coursework are especially important for students who have been historically underrepresented in specific majors and for institutions that provide remedial education. Participation in high-impact activities--such as first-year seminars, learning communities, writing-intensive courses, common intellectual experiences, service learning, diversity experiences, student-faculty research, study abroad, internships and other field placements, and senior capstone experiences--might also be useful indicators of quality, as they tend to be associated with high levels of student effort and deep learning (Kuh, 2008; Swaner and Brownell, 2009).

The two most relevant points for thinking about how to introduce explicit quality adjustment into higher education output measures may be summarized as follows:

1. Research on student engagement and learning outcomes is promising. This area of research has established a number of high-impact educational practices and experiences. Even where direct measures of student learning are not available, the existence of these practices could be used as proxies in evaluating the quality of the educational experience reflected in a given set of degrees or credit hours. This kind of evidence, even if it cannot currently be directly included in a productivity measure (such as that developed in Chapter 4) due to data or conceptual limitations, can be considered in a comprehensive performance evaluation of an institution, department, or system.

2. Even the best statistical models that show institutional differences in the quality and quantity of education produced rarely allow for meaningful discrimination between one institution and another. Studies by Astin (1993), Kuh and Pascarella (2004), and Pascarella and Terenzini (2005) show that institutions do matter, but individual student differences matter more. Once student characteristics are taken into account, significant effects for institutions still exist, though the difference between any two given institutions, except for those at the extreme ends of the distribution, will often be small.

3.5. MEASUREMENT AT DIFFERENT LEVELS OF AGGREGATION

Adding to the complexity of productivity measurement is the fact that various policy and administrative actions require information aggregated at a number of different levels. Institution- and state-level measures are frequently needed for policy and are relevant to the development of administrative strategies. A major motivation for analyzing performance at these levels is that policy makers and the public want to know which institutions and which systems are performing better and how their processes can be replicated. Prospective students (and their parents) also want to know which institutions are good values. As we have repeatedly pointed out, for many purposes it is best to compare institutions of the same type.

3.5.1. Course and Department Level

A course can be envisioned as the atomistic element of learning production, and the basic building block of productivity measurement at the micro level. For example, productivity at this level may be expressed as the number of semester credits produced from a given number of faculty hours of teaching (a simple sketch follows below). However, courses themselves can increasingly be broken down further to examine quantitative and qualitative aspects within the course or classroom unit (Twigg, 2005). Classroom technology is changing rapidly. The introduction of scalable technologies is important, as are the effects of class size and technology. The technology of how education is delivered across and within categories (disciplines, institutions, etc.) varies widely.
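As a minimal sketch of such a course-level ratio (illustrative notation, not a formula from the report):

$$p_{\text{course}} = \frac{\text{student credit hours produced}}{\text{faculty teaching hours used}} = \frac{n_{\text{students}} \times \text{credits}}{\text{contact hours} + \text{preparation and grading hours}}$$

For example, under these assumptions a 3-credit course with 30 students (90 student credit hours) taught with 45 contact hours and 90 preparation and grading hours yields 90/135, or about 0.67 student credit hours per faculty hour. The ratio rises mechanically with class size, which is exactly why quality must be watched whenever this kind of measure is used alone.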

56 IMPROVING MEASUREMENT OF PRODUCTIVITY IN HIGHER EDUCATION Flagship state universities often have big classes while private colleges often have smaller ones. The latter is almost certainly more expensive on a per unit of output basis; less is known about the quality of the outcome. Those students that can make college choices based on tradeoffs in price and perceived quality offered by the range of options. Adding to the complexity is the faculty mix, including the use of graduate student instructors or adjunct faculty. This may also affect cost and quality of delivering credit hours. A growing body of research and an increasing number of programs assess efficiencies at the course level seeking cost, quality tradeoffs that can be ex- ploited. For example, the National Center for Academic Transformation (NCAT) develops programs for institutions to improve efficiency in production of higher education through course redesign.32 In the NCAT model, the redesign addresses whole courses (rather than individual classes or sections) to achieve better learn- ing outcomes at a lower cost by taking advantage of information technologies. Course redesign is not just about putting courses online, but rather rethinking the way instruction is delivered in light of the possibilities that technology offers. NCAT reports that, on average, costs were reduced by 37 percent in redesigned courses with a range of 9 to 77 percent. Meanwhile, learning outcomes improved in 72 percent of the redesigned courses, with the remaining 28 percent producing learning equivalent to traditional formats. Appendix B to this volume provides a description of how NCAT measures comparative quality and cost of competing course design models. For some purposes, an academic department or program is a more appropri- ate unit of analysis.33 This is because input costs as well as output valuations that markets, societies, and individuals place on various degrees vary by majors or academic field.34 Collecting physical input and output data that can be associ- ated with specific departments or fields of study within an institution provides maximum flexibility as to how the production function will actually be organized, and also provides the data needed for productivity measurement. Despite these advantages, department-based analysis is inappropriate for determining sector-based productivity statistics. One difficulty is that it is not easy to compare institutions based on their departmental structures. What counts 32NCAT is an independent, not-for-profit organization dedicated to the effective use of informa- tion technology to improve student learning outcomes and reduce costs in higher education. Since 1999, NCAT has conducted four national programs and five state-based course redesign programs, producing about 120 large-scale redesigns. In each program, colleges and universities redesigned large-enrollment courses using technology to achieve quality enhancements as well as cost savings. Participating institutions include research universities, comprehensive universities, private colleges, and community colleges in all regions of the United States. 33Massy (2010) presents one effort to systematize the course substructure (using physical rather than simply financial quantities) for purposes of aggregation. 34See DeGroot et al. (1991) and Hare and Wyatt (1988) for estimates of cost/production functions for university research-graduate education.

What counts as a department in one institution may be two departments in another, or simply a program in a third. This is one reason why IPEDS does not require institutions to specify faculty inputs and other expenditures by department.

A framework that tracks students through fields of study has an analytical advantage over a framework that uses the department as the unit of analysis when the concern is interactive effects. For example, the quality (and possibly quantity) of output associated with a labor economics class (in which students must write empirical research papers) clearly depends on what students learn in their introductory statistics classes. Thus, one department's productivity is inherently linked to another's. While institutions allocate resources at the department level, productivity analysis can be enhanced by constructing, for each major, a representative student whose record captures all coursework in all departments. As discussed in Chapter 6, such an approach would require an extension of data collection capabilities, which could possibly be integrated with the IPEDS data system.35

3.5.2. Campus Level

A major source of demand for performance measures is to inform rankings and provide accountability, generally at the campus level. This level of aggregation is analogous to the productive units, such as automobile plants or hospitals, frequently monitored in other sectors. It is a logical starting place for many key applications of productivity measurement, as it is easier to see the practical value (ideas for improving efficiency) at this level than at the state or higher levels of aggregation--at least in terms of production processes. Certainly there is value for a university in tracking its own productivity over time.

Of course, campus-level productivity measurement invites inter-institution comparisons as well. We discussed earlier how heterogeneity of inputs and outputs requires segmentation by institutional type. It is not obvious exactly how many categories are needed to make groups of institutions sufficiently homogeneous so that productivity calculations are meaningful. As a starting point for defining and classifying institutional types, we can use basic categories consistent with the Carnegie Classification of Academic Institutions: credit hours not resulting in a degree (continuing education); community colleges providing associate's degrees, certificates, and the possibility of transferring to a four-year college; colleges granting bachelor's degrees; colleges and universities granting master's degrees; and universities granting doctorates.

35 The UK's Higher Education Funding Council for England collects cost and output data by field of study.

Within an institutional category, it makes more sense to compare costs and outcomes across campuses. Measurement problems associated with heterogeneity of inputs and outputs are dampened when factors such as the percentages of students in particular programs remain constant over time. However, even within categories of higher education institutions, characteristics vary and multiple functions are performed. For example, a university system (such as Florida's) may enjoy high four-year graduation rates in part due to a strict requirement that less well-prepared students attend two-year institutions. Even so, segmenting the analysis by institutional type seems to be a prerequisite to accurate interpretation of various performance and cost metrics.

It is also worth noting that data collection at the campus level is simpler than it is at the course or department level. Aggregation at the campus level consolidates the effects of out-of-major courses and does not require allocating central services and overheads among departments. Reporting data at the campus level that are potentially useful for productivity measurement does not require weighting departmental inputs and outputs. Estimating total labor hours for the campus as a whole is equivalent to summing the hours for the individual departments, but the data collection process is much simpler. Summing student credit hours and awards is also straightforward, although, as discussed in Chapter 4, a complication arises when linking enrollments to degrees by field.

3.5.3. State or System Level

For some purposes, it is useful to have productivity statistics at the state, multi-campus system, or even national level (see Box 3.2). For example, there have been efforts to develop state-by-state "report cards" for tracking higher education outcomes, such as those reflected in student learning or skills assessments (Ewell, 2009). Additionally, as we discuss in the recommendations chapters, it sometimes makes sense to follow students at the state level so that events such as inter-institution transfers and measures such as system-wide completion rates can be tracked.

One approach for generating state-level data is to aggregate the campus-level productivity measures described earlier. For example, if a system has a research-university campus, several baccalaureate campuses, and a two-year campus, productivity statistics could be calculated for each campus and compared with the averages for the segment into which the campus falls. An overall figure for the system or state could then be obtained by aggregating the campus statistics, as the sketch below illustrates. Thought will need to be given to the weights used in the aggregation but, in principle, the problem does not appear to be unsolvable.
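As an illustration of this aggregation step, the sketch below combines hypothetical campus-level figures into a single system-level ratio. The campus names, all numbers, and the particular weighting scheme (labor-input shares) are assumptions made for illustration, not recommendations:

```python
# A minimal sketch of aggregating campus-level productivity to the system
# level. Campus names, figures, and the weighting scheme are hypothetical.

campuses = {
    # campus: (student credit hours produced, labor input in FTE hours)
    "research_university": (900_000, 600_000),
    "baccalaureate":       (400_000, 320_000),
    "two_year":            (300_000, 200_000),
}

total_output = sum(out for out, _ in campuses.values())
total_input = sum(inp for _, inp in campuses.values())

# Campus-level ratios, best compared against averages for the same segment.
for name, (out, inp) in campuses.items():
    print(f"{name}: {out / inp:.2f} credit hours per FTE labor hour")

# One simple choice of weights: each campus's share of total labor input.
# Weighting the campus ratios by input shares reproduces the ratio of
# system totals; other weightings would give different system figures.
system_ratio = sum((inp / total_input) * (out / inp)
                   for out, inp in campuses.values())
print(f"system: {system_ratio:.2f}")                # -> 1.43
print(f"check:  {total_output / total_input:.2f}")  # -> 1.43
```

The substantive work, of course, lies in choosing defensible weights and in adjusting inputs and outputs for quality, which no mechanical aggregation can settle.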

BOX 3.2
Macro or Sector Level Accounting

The U.S. statistical agencies do not currently produce a measure of education sector productivity, although some components of such a measure are available. The Bureau of Economic Analysis (BEA) produces several nominal and real higher education consumption measures. The National Income and Product Accounts (from which the nation's gross domestic product statistics are estimated) include real and nominal measures for education personal consumption expenditures (PCE) and for education consumption expenditures across all government levels. The PCE tables include expenditures for books, higher education school lunches, and two other expenditure categories: (1) nonprofit private higher education services to households and (2) proprietary and public education. The nominal value of these two components is deflated by the BLS CPI-U college tuition and fees price index to produce an inflation-adjusted measure. The nominal value for gross output of nonprofit private higher education services to households is deflated by an input cost-based measure, which is a fixed-weight index. This input cost-based deflator is constructed from BLS Quarterly Census of Employment and Wages, PPI, and CPI data. Although BEA measures the nominal value of education national income components, such as wages and salaries and gross operating surplus (profits, rents, net interest, etc.), it does not produce real measures of these education input income components. Accordingly, BEA data would have to be supplemented with other data to create a measure of education productivity.

Beyond the United States, a mandate from Eurostat motivated European Union members and others to undertake research on how to measure education output and inputs. In the United States, most of this kind of research has focused on elementary and secondary education. In the United Kingdom, debate about how to measure government output, including education, resulted in the formation of the Atkinson Commission. However, though there have been calls to do so, no consensus has been reached about how to measure the real output of education independently from inputs.

A different approach to understanding higher education productivity would be to look at more indirect measures. One possibility is to use information such as that slated to be released in 2013 by the Programme for the International Assessment of Adult Competencies (PIAAC) of the OECD. PIAAC will assess adults' literacy and numeracy skills and their ability to solve problems in technology-rich environments. It will also collect a broad range of information from the adults taking the survey, including how their skills are used at work and in other contexts, such as in the home and the community. Ideally, in addition to educational attainment, information on college major, previous work experience, and the dates and types of higher education institutions attended would be needed to estimate higher education productivity from PIAAC-collected data. Accordingly, PIAAC and other skill-based surveys may be better indicators of human capital than of higher education output or productivity.
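To make the deflation step described in Box 3.2 concrete, the following is a minimal sketch of dividing a nominal expenditure series by a price index to obtain a real (inflation-adjusted) series. The dollar amounts and index values are hypothetical, not actual BEA or BLS data:

```python
# A minimal sketch of price-index deflation as described in Box 3.2.
# Nominal values and index levels below are hypothetical, not BEA/BLS data.

nominal = {2009: 100.0, 2010: 104.0, 2011: 109.2}       # $ billions
price_index = {2009: 100.0, 2010: 105.0, 2011: 110.3}   # 2009 = 100

BASE_YEAR = 2009
real = {
    year: nominal[year] / (price_index[year] / price_index[BASE_YEAR])
    for year in nominal
}

for year, value in sorted(real.items()):
    print(f"{year}: {value:.1f} ($ billions, {BASE_YEAR} dollars)")
# Nominal spending rises about 9 percent over the period, but real
# spending is essentially flat once price inflation is removed.
```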

3.6. CONCLUSION

In this chapter, we have described how measuring productivity in higher education is especially challenging relative to the simple textbook model. Joint production of multiple outputs, heterogeneous inputs and outputs, quality change over time, and quality variation across institutions and systems all conspire to add complexity to the task. In order to advance productivity measurement beyond its current nascent state, it is necessary to recognize that not all of the complexities we have catalogued can be adequately accounted for, at least at the present time. The panel recognizes the difficulties of moving from the conceptual level of analysis (Chapters 1-3), which is surely the place to start, to empirical measurement recommendations. Like other economic measures in their incipient stages--such as GDP estimates and the national economic accounts on which they rest (particularly early in their development)--new measures of higher education productivity will be flawed.

Because the performance of the sector cannot be fully organized and summarized in a single measure, it becomes all the more important to bear the complexities in mind and to monitor supporting information, especially regarding the quality of output (e.g., student outcomes). Without this awareness, measures will surely be misused and improper incentives established. For example, the danger of incentivizing a "diploma mill," pointed out earlier, is real. Measuring performance is a precursor to developing reward structures that, in turn, incentivize particular behavior.

Here, we can only reiterate that the productivity measure proposed in Chapter 4--or any single performance metric, for that matter--will, if used in isolation, be insufficient for most purposes, particularly those linked to accountability demands. For the most part, a productivity measure will not be of great use for improving performance at the institutional level. What is relevant is the question of whether being able to measure higher education productivity in the aggregate will produce a better policy environment, which may in turn lead to indirect productivity improvements over time.