The importance of productivity growth to an economy is widely recognized because the extent to which living standards can be improved over time depends almost entirely on an economy's ability to raise the output of its workers.1 From the perspective of individual industries and enterprises, gains in productivity are a primary means of offsetting increases in the costs of inputs, such as hourly wages or raw materials. Likewise, in higher education, productivity improvement is seen as the most promising strategy for containing costs in the continuing effort to keep college education as affordable as possible. Without technology-driven and other production process improvements in the delivery of services, either the price of a college degree will be beyond the reach of a growing proportion of potential students or the quality of education will erode under pressures to reduce costs.
In this environment, such concepts as productivity, efficiency, and accountability are central to discussions of the sustainability, costs, and quality of higher education. The discussion should begin with a clear understanding of productivity measures and their appropriate application, while recognizing that other related concepts (such as unit cost, efficiency measures, and the like) are also important and inform key policy questions.
At the most basic level, productivity is defined as the quantity of outputs delivered per unit of input utilized (labor, capital services, and purchased inputs). The number of pints of blueberries picked or boxes assembled using an hour of labor are simple examples. Productivity, used as a physical concept, inherently adjusts for differences in prices of inputs and outputs across space and over time. While productivity measures are cast in terms of physical units that vary over time and across situations, efficiency connotes maximizing outputs for a given set of fixed resources.2 Maximizing efficiency should be the same as maximizing productivity if prices are set by the market (which is not the case for all aspects of higher education). Accountability is a managerial or political term addressing the need for responsibility and transparency to stakeholders, constituents, or to the public generally.

1Average annual GDP growth for the United States, 1995-2005, was 3.3 percent. Estimates of the contributions of the various components of this growth (Jorgenson and Vu, 2009) are as follows: labor quantity, 0.63; labor quality, 0.19; non-information and communications technology (non-ICT) capital, 1.37; ICT capital, 0.48; total factor productivity (TFP) growth, 0.63. These figures indicate the importance of input quality and technology in per capita productivity gains.
Application of a productivity metric to a specific industry or enterprise can be complex, particularly for education and certain other service sectors of the economy. Applied to higher education, a productivity metric might track the various kinds of worker-hours that go into producing a student credit hour or degree. The limitation of this approach is that, because higher education uses a wide variety of operational approaches, which in turn depend on an even wider variety of inputs (many of them not routinely measured), it may not be practical to build a model based explicitly and exclusively on physical quantities. Of even greater significance is the fact that the quality of inputs (students, teachers, facilities) and outputs (degrees) varies greatly across contexts.
A primary objective of industries, enterprises, or institutions is to optimize the efficiency of production processes: that is, to maximize the amount of output that is physically achievable with a fixed amount of inputs. Productivity improvements are frequently identified with technological change, but may also be associated with a movement toward best practice or the elimination of inefficiencies. The measurement of productivity presumes an ability to construct reliable and valid measures of the volume of an industry’s (or firm’s) output and the different inputs. Though productivity improvements have a close affinity to cost savings, the concepts are not the same. Cost savings can occur as a result of reduction in input prices, so that the same physical quantity of inputs can be purchased at a lower total cost; they are also attainable by reducing the quantity or quality of output produced. But, by focusing on output and input volumes alone, it becomes difficult to distinguish efficiency gains from quality changes. To illustrate, consider homework and studying. Babcock and Marks (2011) report that college students currently study less than previously. Assuming studying is an input to learning, does this mean that students have become more productive or now shirk more? Arum and Roksa (2010) argue that college students are learning less, implying the latter. But, without robust time series data on test results to verify student learning, the question remains unanswered.

2Kokkelenberg et al. (2008:2) write that: “Economists describe efficiency to have three aspects; allocative efficiency which means the use of inputs in the correct proportions reflecting their marginal costs; scale efficiency which considers the optimal size of the establishment to minimize long-run costs; and technical efficiency which means that given the establishment size and the proper mix of inputs, the maximal output for given inputs under the current technology is achieved.” It should be noted that the productivity index approach, on its own, is unlikely to say much about optimal size and scale efficiency.
Several different productivity measures are used to evaluate the performance or efficiency of an industry, firm, or institution. These can be classified as single-factor productivity measures, such as labor productivity (the ratio of output per labor-hour), or multi-factor productivity, which relates output to a bundle of inputs (e.g., labor, capital, and purchased materials). In addition, productivity can be evaluated on the basis of either gross output or value added. Gross output is closest to the concept of total revenue and is the simplest to calculate because it does not attempt to adjust for purchased inputs. Value added subtracts the purchased inputs to focus on the roles of labor, capital, and technology within the entity itself.
For most goods, labor is the single largest factor of production as measured by relative expenditures. Labor productivity is thus a commonly used measure. Labor productivity, however, is a partial productivity measure that does not distinguish between improvements in technology and the contributions of other productive factors. Thus, a measure of labor productivity based on gross output might rise due to the outsourcing of some activities or the improvement of capital used in the production process. In this instance, labor productivity would rise at the expense of additional purchased services or other inputs.
Conceptually, a multi-factor productivity measure based on gross output and changes in the volumes of all individual inputs provides greater insight into the drivers of output growth. It shows how much of an industry’s or firm’s output growth can be explained by the combined changes in all its inputs. Relative to labor productivity, construction of a multi-factor productivity measure imposes substantially greater requirements on data and estimation methods.
The construction of productivity measures requires quantitative estimates of the volume of outputs and inputs, excluding the effects of pure price changes while capturing improvements in quality. As a simple illustration, total revenues provide a measure of the value to consumers of an industry’s production, and the revenues of the individual types of good or services produced by the industry are deflated by price indexes and weighted together by their shares in total revenues to construct an index of output volume. Similar indexes are constructed for the volumes of the inputs. The volume indexes of the inputs are combined using as weights their shares in total income or costs. Alternatively, when feasible, quantities of outputs and inputs may be estimated without deflating expenditure totals when the physical units can be counted directly. The productivity measure is then obtained by dividing the index of output by the composite index of inputs.
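The deflate-and-weight procedure just described can be sketched in a few lines of code. The fragment below uses entirely hypothetical two-period revenue, cost, and price-index figures, and simple base-period share weights (one of several possible weighting schemes); it illustrates the mechanics only, not any official methodology.

```python
# Illustrative two-period data for an entity with two products and two
# inputs. All figures are hypothetical.

def volume_index(values_base, values_curr, price_indexes):
    """Share-weighted volume index: deflate each category's current-period
    value by its price index, express it relative to the base period, and
    weight the volume relatives by base-period value shares."""
    total_base = sum(values_base.values())
    index = 0.0
    for k in values_base:
        share = values_base[k] / total_base
        volume_relative = (values_curr[k] / price_indexes[k]) / values_base[k]
        index += share * volume_relative
    return index

# Output revenues: prices of product B rose 10 percent between periods.
out_base = {"A": 100.0, "B": 50.0}
out_curr = {"A": 110.0, "B": 66.0}
out_prices = {"A": 1.00, "B": 1.10}

# Input costs: the wage rate rose 5 percent; materials prices were flat.
in_base = {"labor": 90.0, "materials": 30.0}
in_curr = {"labor": 99.75, "materials": 33.0}
in_prices = {"labor": 1.05, "materials": 1.00}

output_idx = volume_index(out_base, out_curr, out_prices)   # ~1.133
input_idx = volume_index(in_base, in_curr, in_prices)       # ~1.067
productivity_idx = output_idx / input_idx                   # ~1.063
```

Here output volume grew about 13.3 percent against input volume growth of about 6.7 percent, so dividing the output index by the composite input index yields a productivity gain of roughly 6.3 percent.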
After decades of discussion, research and debate, the concepts and methods used to compute productivity within the market-based private economy have achieved widespread agreement and acceptance among economists, policy analysts, and industry specialists. However, comparable progress has not been made with respect to the measurement of productivity in education, and higher education in particular. Progress also has been slow—although perhaps not quite as slow—in a few other service sector industries, such as finance and health care, where outputs are also difficult to define and measure (Triplett and Bosworth, 2002). It is possible to count and assign value to goods such as cars and carrots because they are tangible and sold in markets; it is harder to tabulate abstractions like knowledge and health because they are neither tangible nor sold in markets.
Standard methods for measuring productivity were developed for profit- or shareholder value-maximizing firms engaged in the production of tangible goods. These methods may not be applicable, valid, or accurate for higher education, which is a very different enterprise. Traditional private and public colleges and universities are not motivated or rewarded by a profit margin. Neither their output nor their prices are determined within a fully competitive market, and thus their revenues or prices (essentially tuition) are not indicative of the value of the industry’s output to society.3 The inputs to education are substantially similar to those of other productive sectors: labor, capital, and purchased inputs. Higher education is distinct, however, in the nature of its outputs and their prices. First, the student arrives at a university with some knowledge and capacities that are enhanced on the way to graduation; in this instance, the consumer collaborates in producing the product.
Second, institutions of higher education are typically multi-product firms, producing a mixture of instructional programs and research as well as entertainment, medical care, community services, and so on. For market-based enterprises, the production of multiple products raises manageable estimation problems. Outputs are combined on the basis of their relative contributions to revenue shares. This is a common feature of productivity analysis.
However, because research and classroom instruction are both nonmarket activities, there is no equivalent concept of revenue shares to combine the two functions. This greatly complicates the analysis and the possibility of deriving an overall valuation of an institution’s output. We have chosen to separate instruction from research (and other outputs), acknowledging the practical reality that an institution’s allocation of resources among its multiple functions is in part the result of forces and influences that are quite complex. For example, the value of research universities cannot be fully measured by their instructional contribution alone. Important interactions exist, both positive and negative, between research activities and the productivity of undergraduate instruction. On the positive side, there is the opportunity for promising undergraduates to work alongside experienced faculty. On the negative side, there is the possibility that the growth of graduate programs detracts from commitments to undergraduate education. While these difficulties in allocating some inputs and outputs by function are very real, and clearly warrant investigation, a separate analysis of instruction and research seems most practical for now and is the approach pursued in Chapter 4.

3Discounts for financial aid also complicate an institution’s value function. The interaction between institutional and consumer value upsets the association of price with value to consumers. Other price distortions are discussed throughout the report.
In Chapter 3, we explore in more detail this and other complexities of measuring productivity in higher education.
Estimating productivity presumes an ability to define and measure an industry’s (or institution’s, or nation’s) output. Most output measures are constructed by deflating the revenues of individual product categories by indexes of price change. As noted above, if the products are sold in open competitive markets, producers will expand output to the point where the marginal revenues of individual products are roughly equal to their marginal costs. Thus, their revenue shares can be used as measures of relative consumer value or weights to combine the various product categories to yield an index of overall output. By focusing on the price trends for identical models, or by making adjustments to account for changing characteristics of products, price indexes can differentiate the price and quality change components of observed changes in overall prices.4
In some cases, the output of an industry might be based on physical or volume indicators such as ton miles moved by trucks or, in the case of education, the number of students in a course. Physical measures of output are difficult to aggregate if they differ in their basic measurement units, but methods have been devised to mitigate this problem.5 A greater challenge is that physical indicators generally miss quality change. While explicit quality adjustments can be included in the construction of a physical output index, it is difficult to know the weight to place on changes in quantities versus changes in quality. The role of quality change and other complications in defining and measuring output of higher education are addressed in detail in Chapters 3 and 4.
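One device for aggregating outputs measured in incommensurable physical units is the Törnqvist index mentioned in footnote 5. The sketch below uses hypothetical quantities and value shares; because only percentage changes in quantities enter the formula, the choice of measurement units cancels out.

```python
import math

def tornqvist_index(q0, q1, shares0, shares1):
    """Tornqvist quantity index: a weighted geometric mean of quantity
    relatives, weighted by the average of each category's value share
    in the two periods."""
    log_index = 0.0
    for k in q0:
        avg_share = 0.5 * (shares0[k] + shares1[k])
        log_index += avg_share * math.log(q1[k] / q0[k])
    return math.exp(log_index)

# Hypothetical outputs in different units: credit hours and degrees.
q0 = {"credit_hours": 100_000, "degrees": 2_000}
q1 = {"credit_hours": 104_000, "degrees": 2_100}
s0 = {"credit_hours": 0.70, "degrees": 0.30}
s1 = {"credit_hours": 0.68, "degrees": 0.32}

idx = tornqvist_index(q0, q1, s0, s1)   # ~1.043

# Rescaling a category's units (thousands of credit hours instead of
# credit hours) leaves the index unchanged -- the dimensionality point.
q0_scaled = dict(q0, credit_hours=100.0)
q1_scaled = dict(q1, credit_hours=104.0)
assert abs(idx - tornqvist_index(q0_scaled, q1_scaled, s0, s1)) < 1e-12
```

The quality-change problem noted in the text remains: an index of this kind registers more credit hours or degrees, not whether each one represents more or less learning.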
Higher education qualifies graduates for jobs or additional training as well as increasing their knowledge and analytic capacities. These benefits of undergraduate, graduate, and professional education manifest as direct income effects, increased social mobility, and health and other indirect effects. Measures have been created to monitor changes in these outputs, narrowly defined: numbers of degrees, time to degree, degree mix, and the like. Attempts have also been made to estimate the benefits of education using broader concepts such as the accumulation of human capital. For estimating the economic returns to education, a starting point is to examine income differentials across educational attainment categories and institution types, attempting to correct for other student characteristics. Researchers since at least Griliches (1977), Griliches and Mason (1972), and Weisbrod and Karpoff (1968) have estimated the returns to education, controlling for students’ cognitive ability by including test score variables in their wage regressions.

4The complete separation of price and quality change continues to be a major challenge for the creation of price and output indexes. It is difficult to incorporate new products into the price indexes in a timely fashion, and price and quality changes are often intermingled in the introduction of new models. See National Research Council (2002).

5The Törnqvist index used in Chapter 4 is based on percentage changes, which takes care of the dimensionality problem.
Researchers have also examined the impact of socioeconomic status (SES) variables on the returns to education, but the results are somewhat ambiguous. Carneiro, Heckman, and Vytlacil (2010) show that marginal returns to additional college degrees are lower than average returns due to selection bias. That is, returns are higher for individuals with characteristics making them more likely to attend college than for those for whom the decision is less clear or predictable. This carries obvious implications for policies designed to increase the number of college degrees produced by a system, region, or nation. On the other hand, Brand and Xie (2010) predict that the marginal earnings increase attributable to holding a degree is actually higher for those born into low socioeconomic status (relative to the higher SES group more likely to select into college) because of their lower initial earnings potential with or without a degree. Dale and Krueger (2002:1491) also found that the “payoff to attending an elite college appears to be greater for students from more disadvantaged family backgrounds.” Davies and Guppy (1997) found that socioeconomic factors do not affect chances of entry into lucrative fields net of other background factors, but SES predicts entry into selective colleges and lucrative fields within selective colleges. Establishing the values of degrees generally or of degrees in specific fields—as done by Carnevale, Smith, and Strohl (2010) and Trent and Medsker (1968)—involves estimating the discounted career cost (controlling for selection effects) of not attending college at all. To some extent this line of research has been stunted by the characteristics of available data; many cohort studies have been flawed in not properly including aging effects, not asking about attainment, or not extending for a long enough time period.
Such features are important for estimating returns.6 As a result, the evidence for evaluating the magnitude of differences in outcomes of those who attain higher education and those who do not is surprisingly mixed.
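The reason wage regressions include test-score controls can be seen with simulated data: when unobserved ability raises both schooling and wages, omitting an ability measure inflates the estimated return to schooling. All coefficients and data below are synthetic, chosen only to illustrate the selection problem, and are not estimates from the literature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic model: ability raises both years of schooling and wages.
ability = rng.normal(size=n)
schooling = 12 + 2 * ability + rng.normal(size=n)
log_wage = 1.0 + 0.08 * schooling + 0.10 * ability + rng.normal(scale=0.3, size=n)

def first_coef(X, y):
    """OLS coefficient on the first regressor, with an intercept included."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

naive = first_coef(schooling, log_wage)            # ~0.12, biased upward
controlled = first_coef(np.column_stack([schooling, ability]),
                        log_wage)                  # ~0.08, the true coefficient
```

With the ability control omitted, the schooling coefficient absorbs part of ability's effect; adding the control recovers the true return. The same logic motivates the selection corrections discussed above, where the confounding traits are not directly observed.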
One limitation of the above-described approaches is that the rate of return on various degrees, and college in general, varies over time with labor market conditions, independent of the quality of the degree or credits earned. Adding to the supply of graduates tends to lead to a reduction in the wage gap between more and less educated workers, an effect that may be strengthened if the expansion causes educational quality to fall. Another modeling consideration is that, due to market pressures, students may enroll in majors with high projected returns. An increased supply of graduates in those fields should lead to downward pressure on wages. The ebb and flow in the demand for and starting wages paid to nurses is a good example.7 Nonetheless, wages do provide at least one quantifiable measure, but one that needs regular updating.

6For example, the National Center for Education Statistics (NCES) “High School and Beyond” study looked at a large number of students (more than 30,000 sophomores and 28,000 seniors enrolled in 1,015 public and private high schools across the country participated in the base year survey), but did not follow individuals long enough. The NCES National Education Longitudinal Study (NELS) repeated the error by not asking about final degree attainment; the Education Longitudinal Study (ELS) is still following cohorts and may offer a very useful data source.
Even when research on wages relative to educational attainment is conducted properly, it cannot tell the whole story. The overall returns to instruction (learning) and production of degrees are broader than just the pecuniary benefits that accrue to the degreed individuals. It is a mistake to view the purpose of higher education as solely to increase gross domestic product (GDP) and individual incomes.8 Some of the nation’s most important social needs (e.g., teaching, nursing) are in fields that are relatively low paying. When the focus is on incomes after graduation, a system or institution that produces more credentialed individuals in these socially important but low-paying fields will appear less productive than an institution that produces many highly paid business majors. This would be a false conclusion. Moreover, using lifetime earnings as a measure of productivity and then tying public support for institutions to this measure in effect restricts the educational and career choices of individuals who, capable of entering either, knowingly choose lower paying over higher paying occupations. In Chapter 3, we examine the implications of looking more broadly at the benefits—private and public, market and nonmarket, productive and consumption—produced by higher education.
Having established that productivity relates the quantity of output to the inputs required to produce it, it is evident that correct measurement requires identifying all inputs and outputs in the production process. Economists frequently categorize inputs into the factors of production:
- Labor (e.g., professors, administrators)
- Physical and financial capital (e.g., university buildings, endowments)
- Energy (utilities)
- Materials (e.g., paper, pens, computers if not capitalized)
7Data on graduates’ wages do allow students to make informed decisions, so would be useful to students as a resource, and to administrators for resource allocation.
8That said, one of the most carefully studied “externalities” of higher education is its role in economic growth—see Card (1999) and Hanushek and Kimko (2000).
- Service inputs (e.g., use of outside payroll, accounting, or information technology [IT] firms)
Here, we review the role of each of these inputs.
In most simple measures of labor productivity, the quantity of labor is defined by the number of hours or full-time equivalent workers. Left at this, the measure rests on the assumption that all workers have the same skills and are paid equivalent wages. This assumption is clearly not true, and can be maintained only in situations where changes and variation in the skill level of the workforce are known to be small.
One means of adjusting for quality is to disaggregate the workforce by various characteristics, such as age or experience, education, occupation, and gender. In competitive labor markets, it is assumed that workers of each skill characteristic will be hired up to the point where their wage equals their contribution to marginal revenue. The price of labor is measured by compensation per hour; hence, labor inputs of different quality are aggregated using as weights their relative wage rates or, alternatively, using the share of each type of labor in total labor compensation. In this respect, the aggregation of the labor input is comparable to the aggregation of individual product lines to arrive at an estimate of total output.
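The wage-share aggregation described above can be sketched with hypothetical hours and wage rates for three labor categories. When hours growth is concentrated in lower-paid categories, aggregate labor input rises by less than raw hours would suggest:

```python
# Hypothetical hours and wage rates; base-period compensation shares
# serve as the aggregation weights.
labor = {
    # category: (hours_period0, hours_period1, wage_rate)
    "senior_faculty":  (10_000, 10_000, 90.0),
    "adjunct_faculty": ( 5_000,  6_000, 40.0),
    "staff":           ( 8_000,  8_400, 30.0),
}

total_comp = sum(h0 * w for h0, _, w in labor.values())

# Share-weighted growth of labor input (quality-adjusted hours).
labor_input_growth = sum(
    (h0 * w / total_comp) * (h1 / h0 - 1.0)
    for h0, h1, w in labor.values()
)                                            # ~3.9 percent

# Unweighted hours grow faster, because all of the growth here occurs
# in the lower-paid categories.
hours0 = sum(h0 for h0, _, _ in labor.values())
hours1 = sum(h1 for _, h1, _ in labor.values())
raw_hours_growth = hours1 / hours0 - 1.0     # ~6.1 percent
```

The gap between the two growth rates is exactly the composition effect that the compensation-share weights are designed to capture.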
Relative to other sectors, the problem of measuring labor inputs differs only marginally for higher education since, even if higher education is largely a nonmarket activity, its workforce must be drawn from a competitive market in which faculty and other employees have a range of alternatives. Some faculty members are protected by tenure; however, similar issues of seniority and job protection arise in other industries and the differences are generally ones of degree.9 Despite these similarities, however, it may be desirable to differentiate among the labor categories of teachers as discussed in Chapters 3 and 4.
Another complication arises at research-based institutions. For these institutions, the time and cost of faculty and administrative personnel must be divided between research and instruction.10 One approach might rely on time-use studies to develop general guidelines on the number of instructional hours that accompany an hour of classroom time, although time required for student consultations and grading may vary with class size. Furthermore, there are so many different kinds of classes and teaching methods that it is not practical to associate hours for specific labor categories with credit hours, even if variation in class size could be handled. As discussed below, we therefore believe the best approach is to allocate inputs among output categories at a more aggregate level.

9The role of tenure in education is often comparable to various union pressures for seniority and other forms of job protection.

10Some problems of using wage rates to adjust for the quality of faculty teaching may arise in research-based institutions where the primary criteria for promotion and tenure reflect research rather than teaching skills.

BOX 2.1
A Note on Student Time Inputs

A fully specified production function for higher education might include student time as an input. Given the panel’s charge, this area of measurement is not a high priority; however, the student time input, if defined as the number of hours spent in school-related activities multiplied by an opportunity cost wage rate, would be substantial (see National Research Council, 2005, for a discussion of how to deal with nonmarket time valuations within an economic accounting framework). It can be difficult to establish opportunity cost wages when students are subsidized. For example, during periods or in places characterized by high unemployment, a federal Pell grant is a good substitute for a job.

For our purposes, we acknowledge that unpaid student time is a relevant input to the production function (though Babcock and Marks, 2011, find students are studying less). Nonetheless, little would be gained for policy purposes by including it in productivity measures. For applications where this kind of information is important, researchers can turn to the Bureau of Labor Statistics’ American Time Use Survey, which includes data on study time by students.
Finally, the student’s own time and effort is a significant input to the educational process (see Box 2.1). While there has been debate about whether student effort should be treated as an input or an output, the emergent field of service science moots the question by recognizing that the process of consuming any service (including education) requires the recipient to interact with the provider during the production process and not only after the process has been completed as in the production of goods.11 This phenomenon is called coproduction. As applied to higher education, it means student effort is both an input and an output. This is consistent with the view that a primary objective of a university is to encourage strong engagement of students in their own education. Equally fundamental, institutions of higher education service a highly diverse student population, and many institutions and programs within those institutions have devoted great effort to sorting students by ability. In the absence of information about the aptitude levels of incoming students, comparing outcomes across institutions and programs may not provide a useful indication of performance.
11See, for example, Sampson in Maglio, Kieliszewski, and Spohrer (2010:112).
The major feature of capital is that it is durable and generates a stream or flow of services over an extended period. Thus, the contribution of capital to production is best measured as a service or rental flow (the cost of using it for one period) and not by its purchase price. Because many forms of capital cannot be rented for a single production period, the rental or service price must be imputed. This is done by assuming that a unit of capital must earn enough to cover its depreciation and a real rate of return comparable to similar investments. The depreciation rate is inversely proportionate to the asset’s expected useful life, and the rate of return is normally constant across different types of capital.12 Short-lived capital assets can be expected to have a higher rental or service price because their cost must be recovered in a shorter period. These rental rates are comparable to a wage rate and can be used in the same way to aggregate across different types of capital services and as a measure of capital income in aggregating the various inputs to production.
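As a numerical sketch of the rental-price formula in footnote 12, consider two hypothetical assets with the same replacement cost but very different service lives; the rate of return and depreciation rates below are illustrative only.

```python
def rental_price(replacement_cost, real_return, depreciation_rate):
    """Imputed one-period service price of capital: Pk * (r + d)."""
    return replacement_cost * (real_return + depreciation_rate)

r = 0.03  # a common real rate of return applied across asset types

# Depreciation is taken as roughly the inverse of expected useful life.
computers = rental_price(1_000.0, r, 1 / 4)    # ~4-year life  -> 280.0
building = rental_price(1_000.0, r, 1 / 50)    # ~50-year life ->  50.0
```

The short-lived asset commands a much higher service price per period, reflecting the need to recover its cost over fewer years, which is the point made in the text.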
The role of capital in the measurement of productivity in higher education is virtually identical to that for a profit-making enterprise. Assets are either purchased in markets or valued in a fashion similar to that in the for-profit sector. Thus, the standard measurement of capital services should be appropriate for higher education. The education sector may exhibit a particular emphasis on information and communications capital because of the potential to use such tools to redesign the education process and by doing so to achieve significant productivity gains. The more significant problem at the industry level is that there is very little information on the purchases and use of capital in higher education. The sector is exempt from the economic census of the U.S. Census Bureau, which is the primary source of information for other industries. However, the Internal Revenue Service Form 990 returns filed by nonprofit organizations do contain substantial financial information for these organizations, including data on capital expenditures and depreciation.
Energy, Materials, and Other Purchased Inputs
Productivity measures require information on intermediate inputs, either as one of the inputs to the calculation of multi-factor productivity or as a building block in the measurement of value added. In some measures, energy, materials, and services are identified separately. Such a disaggregation is particularly useful in the calculation of meaningful price indexes for purchased materials. In the past, the lack of information on the composition of intermediate inputs was a significant barrier to the calculation of productivity measures for many service industries. Lack of relevant information on purchased inputs continues to be a major shortfall for estimating productivity in higher education. Such data are particularly important for analyses attempting to control for the effects of the outsourcing of some service activities. As with capital, the primary problem in measuring the role of purchased inputs in higher education is the lack of a consistent reporting system. The information is known at the level of individual institutions, but there is no system for collecting and aggregating the data at the national level for the purpose of establishing performance norms.

12The rental rate is measured as a proportion of the replacement cost of a unit of capital, or Pk (r + d), where Pk is the replacement cost, r is the real rate of return, and d is the depreciation rate.
For the purposes of this report, it is essential to distinguish inputs and outputs along functional lines. In particular, an effort should be made to identify the inputs that go into each of the multiple outputs produced by the sector. These inputs can be designated:
- Instructional, including regular faculty, adjunct faculty, and graduate student instructors.
- Noninstructional and indirect costs including, for example, administration, athletics, entertainment, student amenities, services, hospital operation, research and development, student housing, transportation, etc.13 Some of these are budgeted separately.
- Mixed, including other capital such as instructional facilities, laboratory space and equipment, and IT. The best way to distribute the cost of such inputs across instructional, administrative, and research categories is often unclear.
In the model presented in Chapter 4, we attempt to identify all the inputs associated with the instruction function, while recognizing the difficulty of separating instructional and noninstructional costs or inputs. The main concern is to distinguish inputs associated with instruction from those designated for research. As faculty are involved in a range of activities, it is difficult to assign their wages to one category or another.
Instructional costs can also vary greatly. On the faculty side, per-unit (e.g., course taught) instructional costs vary by field, institution, and type of instructor. On the student side, per-unit instructional costs vary by student level—undergraduate, taught postgraduate, and research students; mode of attendance—full- versus part-time students (the cost of student services varies by mode of attendance even if the teaching cost per credit hour does not14); and field of study, with business and the humanities costing less than science and engineering, which in turn cost less than medicine. At the institutional level, costs can be subject to large-scale activity-based costing studies. Costs can also be disaggregated to the department level. Because the panel’s interests and charge focus primarily on groups of institutions of different types and within different states, our recommendations do not emphasize detailed breakdowns of costs at the student level. Nevertheless, some way of controlling for these variations will be essential to avoid significant distortions and the criticisms they would invite.

13See Webber and Ehrenberg (2010).
For administrative and other purposes, universities typically track inputs along other dimensions, such as by revenue source. For our purposes, the only reason for classifying inputs according to revenue source is to separate the inputs associated with organized research and public service as described in Chapter 4. University accounting systems assign costs to funds. This practice tends to differentiate among payers, but obfuscates productivity unless specific outputs also are assigned to the fund. Differentiating inputs among payers departs from the idea of productivity as an engineering concept relating physical inputs and outputs. Further, not all revenues are fungible; they cannot all be used to increase production of undergraduate degrees (Nerlove, 1972).
Higher education costs may also be identified and categorized according to their source:
- institutional funds such as gifts and endowments;
- public-sector appropriations, including state, local, and federal government subsidies and financial aid;
- tuition and fees from students and their families (note that some factors affect costs to specific payers but not overall cost, and that cost to the university may differ from total cost); and
- sponsored research.
For some policy purposes it is important to distinguish between trends in tuition and trends in cost per full-time equivalent (FTE) student. Some analyses dispute the common notion that the cost of higher education is rising faster than consumer prices broadly; rather, they argue, the composition of who pays is changing. Even when the total cost of a college education is relatively stable, shifts occur in the proportions paid by different payers and in the activities the revenues support.
14Mode of attendance may affect the relationship between transcript and catalog cost measures. For example, part-time students may take more courses or repeat courses because of scheduling problems or less efficient sequencing and thus learning (Nerlove, 1972).
McPherson and Shulenburger (2010) highlight the important difference between cost and price. In simple economic terms, the cost (supply) schedule is based on an underlying production function. Productivity improvements shift the cost schedule downward, with (other things being equal) attendant reductions in price and increases in quantity demanded for a given demand schedule. The full price of undergraduate education (determined by both the demand and supply functions) is the sum of tuition charges, campus subsidies, and state subsidies. Affordability and access thus depend on state appropriations as much as they depend on changes in productivity. For example, if an increase in productivity occurs simultaneously with a reduction in state appropriations, the price to the student (tuition) may not fall; it may even rise, depending on relative magnitudes.
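The last point can be illustrated with stylized arithmetic. All dollar figures below are hypothetical, and the additive tuition-plus-subsidies accounting is a simplification of the price relationship described above:

```python
# Stylized arithmetic (all dollar figures hypothetical): tuition as the
# residual of cost per student minus campus and state subsidies.
def tuition_per_student(cost_per_student, campus_subsidy, state_subsidy):
    """The full price is covered by tuition plus subsidies; tuition is the residual."""
    return cost_per_student - campus_subsidy - state_subsidy

# Baseline: $15,000 cost per student, $2,000 campus subsidy, $6,000 state subsidy.
baseline = tuition_per_student(15_000, 2_000, 6_000)   # $7,000 tuition

# A productivity gain cuts cost per student by 10 percent, but the state
# simultaneously reduces its appropriation to $4,000: tuition rises
# despite the efficiency improvement.
after_cut = tuition_per_student(13_500, 2_000, 4_000)  # $7,500 tuition

print(baseline, after_cut)  # 7000 7500
```

Under these assumed magnitudes, the appropriation cut more than absorbs the productivity gain, which is exactly the scenario sketched in the text.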
In the same vein, it is important to highlight differences between public and private higher education, as has been done by the Delta Cost Project (2009). Tuition increases in private higher education invariably are associated with increased expenditures per student.15 In marked contrast, tuition increases in public higher education often are associated with decreases in expenditures per student as the tuition increases often only partially offset cutbacks in state support.
Dozens of metrics have been created to serve as proxies for productivity or as indicators to inform accountability programs and to track costs and outcomes.16 Beyond productivity as defined above, measures of efficiency and cost are other performance metrics with policy value. While there are certainly appropriate uses for a variety of measures, there are also dangers of misuse, such as the creation of perverse incentives. For example, if degrees granted per freshman enrolled were used to track performance, institutions could enroll large numbers of transfer students to improve their standing. Our review of various measures below informs our recommendations for developing new measures and for modifying existing ones. New, improved, and properly applied performance measures will begin filling information gaps and allow multiple stakeholders to better understand performance trends in higher education.
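The transfer-student incentive can be made concrete with invented numbers; the figures below are purely illustrative:

```python
# Hypothetical illustration of a perverse incentive: the ratio of degrees
# granted to freshmen enrolled improves when intake shifts toward transfer
# students, with no underlying gain in educational effectiveness.
def degrees_per_freshman(degrees_granted, freshmen_enrolled):
    return degrees_granted / freshmen_enrolled

# Year 1: 1,000 freshmen; the institution eventually grants 600 degrees.
year1 = degrees_per_freshman(600, 1_000)   # 0.60

# Year 2: 400 freshman seats are given to transfer students instead.
# Transfers arrive close to graduation, so degrees granted barely change,
# while the freshman denominator shrinks sharply.
year2 = degrees_per_freshman(580, 600)     # roughly 0.97

print(round(year1, 2), round(year2, 2))
```

The apparent near-doubling of "performance" reflects a change in the denominator's composition, not in anything the institution does for its students.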
15Pell grants, state need-based scholarships, and other sources of student aid can, in principle, offset tuition hikes.
16See Measuring Quality in Higher Education (http://applications.airweb.org/surveys/Organization.aspx [February 2012]), a database developed by the National Institute for Learning Outcomes Assessment, which describes four categories: assessment instruments; software tools and platforms; benchmarking systems and other extant data resources; and assessment initiatives, collaborations, and custom services. The database can be searched by unit of analysis and aggregation level. These categories are not very different from those of our matrix (i.e., student, course, institution, and state or system).
An alternative approach to measuring productivity—one typically used in cost studies—is to estimate the expenditures incurred for instructional activity (including allocated overheads), then divide by a volume measure of output to produce a ratio such as cost per degree. Under tightly specified conditions, this would produce the same result as a productivity measure. These conditions, however, are rarely if ever realized. The problem is that simple ratios like cost per student or cost per degree do not take into account quality or the multiple outputs produced by higher education institutions. Hence, this approach conveys too little information to attribute differences (over time or between institutions) to price, quality, or productivity.
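A minimal numeric sketch (all figures hypothetical) of why a raw cost-per-degree ratio misleads when an institution produces multiple outputs: unless research expenditures are separated out, instruction looks far more expensive than it is.

```python
# Hypothetical figures: total expenditure covers both instruction and research.
total_spending = 50_000_000        # all expenditures, dollars per year
research_spending = 20_000_000     # separately identifiable research activity
degrees_awarded = 2_000

# Naive ratio: all spending attributed to degree production.
naive_cost_per_degree = total_spending / degrees_awarded

# Ratio after removing the research output's share of spending.
instructional_cost_per_degree = (total_spending - research_spending) / degrees_awarded

print(naive_cost_per_degree)          # 25000.0
print(instructional_cost_per_degree)  # 15000.0
```

Even this corrected ratio says nothing about quality; two institutions with identical instructional cost per degree may deliver very different educational experiences.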
Efficiency is improved when cheaper inputs are substituted for more expensive ones without damaging quality proportionately. For example, it has become common for institutions to substitute adjunct instructors for tenure-track faculty. Whether this move toward lower-priced inputs has a proportionately negative impact on output quantity and quality (e.g., numbers of degrees and amount learned) is not yet fully known, and surely varies from situation to situation (e.g., introductory and survey classes versus advanced seminars). In reviewing evidence from the emerging literature, Ehrenberg (2012:200-201) concludes that, in a wide variety of circumstances, the substitution of adjuncts and full-time nontenure-track faculty for tenure-track faculty has resulted in a decline in persistence and graduation rates.
Without data tying changes in faculty composition to student outcomes, efforts to implement accountability systems will proceed with only partial information and may lead to problematic policy conclusions. For example, in 2010 the office of the chancellor of Texas A&M University published what amounted to “a profit-and-loss statement for each faculty member, weighing annual salary against students taught, tuition generated, and research grants obtained” (Wall Street Journal, October 22, 2010). When a metric as simple as faculty salary divided by the number of students taught is used, many relevant factors are omitted. An instructor teaching large survey courses will always come out ahead of instructors who must teach small upper-level courses or who are using a year to establish a laboratory and apply for grants, as is the case in many scientific disciplines.17 These metrics do not account for systematic and sometimes necessary variations in the way courses at different levels and in different disciplines are taught; and they certainly do not account for differences in the educational experience across faculty members and across different course designs.
17In recognition of these limitations, administrators did pull the report from a public website to review the data, and the university president promised faculty that the data would not be used to “assess the overall productivity” of individual faculty members (see http://online.wsj.com/article/SB10001424052748703735804575536322093520994.html [June 2012]).
The value of productivity and efficiency analysis for planning purposes is that it keeps a focus on both the input and output sides of the process in a way that potentially creates a more thorough and balanced accounting framework. If costs were the only concern, the obvious solution would be to substitute cheap teachers for expensive ones, to increase class sizes, and to eliminate departments that serve small numbers of students unless they offset their boutique major with a substantial grant-generating enterprise.18 Valid productivity and efficiency measures needed for accountability require integration of additional information—for example, the extent to which use of nontenure-track faculty affects learning, pass rates, and preparation for later courses relative to the use of expensive tenured professors. The implication is that analysts should be concerned about quality when analyzing statistics that purport to measure productivity and efficiency. Different input-output ratios and unit costs at differing quality levels simply are not comparable.
Finally, it is important to remember that valid measures of cost and valid measures of productivity are designed to answer different questions. A productivity metric, for example, is needed to assess whether changes in production methods are enabling more quality-adjusted output to be generated per quality-adjusted unit of input. That this is an important question can be seen by asking whether higher education is indeed subject to Baumol’s cost disease (see Chapter 1)—that is, whether, in the long run, it is a “stagnant industry” in which new technologies cannot be substituted for increasingly expensive labor inputs to gain efficiencies. Unit cost data cannot answer this question directly, but they are needed for other purposes, such as when legislatures attempt to invest incremental resources in different types of institutions to get the most return in terms of numbers of degrees or graduation rates. This kind of resource-based short-run decision making responds to funding issues and institutional accountability, but addresses productivity only indirectly and inadequately.
A critical asymmetry also exists in the way productivity and cost-based measures are constructed. Current period price data can be combined with the physical (quantity) data to calculate unit costs, but it is impossible to unpack the unit cost data to obtain productivity measures. The fact that most measurement effort in higher education is aimed at the generation of unit cost data has inhibited the sector’s ability to assess and improve its productivity.
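The asymmetry can be shown with a hypothetical two-institution comparison. The wage and labor-hour figures are invented, and a single labor input stands in for the full input bundle:

```python
# Hypothetical comparison: identical unit cost per degree, very different
# physical productivity. Because unit cost bundles input prices with input
# quantities, productivity cannot be recovered from unit cost alone.
def unit_cost_per_degree(wage, labor_hours, degrees):
    return wage * labor_hours / degrees

def labor_productivity(degrees, labor_hours):
    return degrees / labor_hours

# Institution A: low wages, many labor hours per degree.
a_cost = unit_cost_per_degree(wage=25, labor_hours=40_000, degrees=100)   # 10000.0
# Institution B: wages twice as high, half the labor hours.
b_cost = unit_cost_per_degree(wage=50, labor_hours=20_000, degrees=100)   # 10000.0

a_prod = labor_productivity(100, 40_000)   # 0.0025 degrees per labor hour
b_prod = labor_productivity(100, 20_000)   # 0.0050: twice as productive
```

Both institutions report the same $10,000 cost per degree, yet B produces twice as many degrees per hour of labor. Going from quantities and prices to unit cost loses information that cannot be regained from the unit cost figure.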
18To the credit of Texas A&M University, it did not respond to the findings of its faculty assessment in any of the above-mentioned ways.
Many other performance measures have been proposed for higher education. The most prominent of these are graduation rates, completion and enrollment ratios, time to degree, costs per credit or degree, and student-faculty ratios. These kinds of metrics are undeniably useful for certain purposes if applied correctly. For example, Turner (2004) uses time-to-degree data to compare the impact on student outcomes of changing incoming student credentials with that of more effective resource allocation within public higher education; she finds that the former has a smaller impact than the latter. Similarly, studies have usefully shown how tuition and aid policies affect student performance as measured partially by these statistics. The range of performance metrics, including a discussion of the meaning of graduation rates as calculated by the Integrated Postsecondary Education Data System (IPEDS), is described in detail in Appendix A.
While their role is accepted, the measures identified above should not be confused with productivity as defined in this report. Used as accountability tools, one-dimensional measures such as graduation rates and time-to-degree statistics can be abused to support misleading conclusions (e.g., in making comparisons between institutions with very different missions). Also, because graduation rates are strongly affected by incoming student ability, using them in a high-stakes context may induce institutions to abandon an assigned and appropriate mission of broad access. Use of these kinds of ratio measures may similarly induce institutions to enroll large numbers of transfer students who are much closer to earning a degree than are students entering college for the first time, whether that is the supposed mission or not.
To illustrate the ambiguity created by various metrics, student-faculty ratio levels can be linked to any combination of the following outcomes:
[Table: possible outcomes associated with a low student-faculty ratio versus a high student-faculty ratio]
The ability to distinguish among these outcomes is crucial both for interpreting student-faculty ratios and for policy making (both inside and outside an institution).
Time-to-degree, graduation-rate, and similar statistics can be improved and their misuse reduced when institutional heterogeneity—the mix of full- and part-time students, the number of students who enter at times other than the fall semester, and the proportion of transfer students—is taken into account. Additional refinements include adjusting for systematic time-frame differences among classes of institutions or students; a lagged graduation rate (allowing for longer time periods to completion) is an example. To avoid the kinds of overly simple comparisons that lead to misguided conclusions—or, worse, actions—performance metrics (including productivity measures used for such purposes) should at the very least be used only to compare outcomes among like types of institutions, or to compare a given institution’s actual performance with expected levels. The institutional segmentation approach has been used by College Results Online,19 a Web site that allows users to view graduation rates for peer institutions with similar characteristics and student profiles. The second method is exemplified by Oklahoma’s “Brain Gain”20 performance funding approach, which rewards institutions for exceeding expected graduation rates. These existing measures and programs with good track records could serve as models or pilots for other institutions, systems, or states.
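The actual-versus-expected method can be sketched as follows. The linear prediction, its coefficients, and the institutional characteristics chosen are invented for illustration; they are not Oklahoma's (or anyone's) actual formula.

```python
# Hypothetical expected-graduation-rate adjustment: predict a rate from a few
# institutional characteristics, then judge performance by the residual.
def expected_grad_rate(pct_pell, avg_test_pctile, pct_part_time):
    # Invented linear model; coefficients are illustrative only.
    return 0.70 - 0.20 * pct_pell + 0.15 * avg_test_pctile - 0.10 * pct_part_time

def performance_residual(actual_rate, **traits):
    # Positive residual: the institution exceeds its expected graduation rate.
    return actual_rate - expected_grad_rate(**traits)

# A broad-access institution can outperform on this metric even with a lower
# raw graduation rate than a selective peer.
broad_access = performance_residual(0.65, pct_pell=0.6, avg_test_pctile=0.4,
                                    pct_part_time=0.3)    # about +0.04
selective = performance_residual(0.80, pct_pell=0.2, avg_test_pctile=0.9,
                                 pct_part_time=0.05)      # about +0.01
```

Under these assumed inputs, the broad-access institution's 65 percent raw rate exceeds its predicted rate by more than the selective institution's 80 percent rate exceeds its own prediction, so the residual metric rewards the mission-appropriate performer rather than the selective one.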
While the panel is in no way attempting to design an accountability system, it is still important to think about the incentives that measures create. Because institutional behavior is dynamic and directly related to the incentives embedded within the measurement system, it is important to (1) ensure that the incentives in the measurement system genuinely support the behaviors that society wants from higher education institutions, and (2) attempt to maximize the likelihood that measured performance is the result of authentic success rather than manipulative behaviors.
Evidence of both the distortionary and the productive roles of school accountability is fairly extensive in K-12 education research, and there may be parallel lessons for higher education.21 Numerous studies have found that the incentives introduced by the No Child Left Behind Act of 2001 (P.L. 107-110) led to substantial gains in at least some subjects (Ballou and Springer, 2008; Ladd and Lauen, 2010; Reback, Rockoff, and Schwartz, 2011; Wong, Cook, and Steiner, 2010), and others have found that accountability systems implemented by states and localities also improve average student test performance (Chakrabarti, 2007; Chiang, 2009; Figlio and Rouse, 2006; Hanushek and Raymond, 2004; Neal and Schanzenbach, 2010; Rockoff and Turner, 2010; Rouse et al., 2007). However, these findings have been treated with some skepticism: while Rouse and colleagues (2007) show that schools respond to accountability pressures in productive ways, there is also evidence that schools respond in ways that do not lead to generalized improvements. For example, many quantitative and qualitative studies indicate that schools respond to accountability systems by differentially allocating resources to the subjects and students most central to their accountability ratings. These studies (e.g., Booher-Jennings, 2005; Hamilton et al., 2007; Haney, 2000; Krieg, 2008; Neal and Schanzenbach, 2010; Ozek, 2010; Reback, Rockoff, and Schwartz, 2011; White and Rosenbaum, 2008) indicate that schools under accountability pressure focus their attention on high-stakes subjects, teach skills that are valuable for the high-stakes test but less so for other assessments, and concentrate on the students most likely to help them satisfy the accountability requirements.
20See http://www.okhighered.org/studies-reports/brain-gain/ [June 2012].
21The panel thanks an anonymous reviewer for the following discussion of incentive effects associated with accountability initiatives in the K-12 context.
Schools may attempt to artificially boost standardized test scores (Figlio and Winicki, 2005) or even manipulate test scores through outright cheating (Jacob and Levitt, 2003). These types of behaviors may be the reason that the recent National Research Council (2011) panel on school accountability expressed a skeptical view about accountability while recognizing the positive gains associated with these policies.
One potential solution emerging from the K-12 literature is that “value-added” measures of outcomes tend to be less manipulable than measures based on average performance levels or proficiency counts. The rationale is that when schools are evaluated on their gains from year to year, any behaviors generating artificial improvements would need to be accelerated for the school to continue showing gains the next year. In higher education, however, this year’s post-test is not next year’s pre-test, so there remains the very real possibility that institutions could manipulate their outcomes (or their inputs) in order to look better according to the accountability system; and while value-added measures might allow for more apples-to-apples comparisons among institutions, they will not reduce the strategic-behavior problem by as much as they might in K-12 education.
One example of how higher education institutions respond strategically to the incentives embedded within an evaluation system is observable in relation to the U.S. News & World Report rankings. Grewal, Dearden, and Lilien (2008) document ways in which universities strategically deploy resources in an attempt to maximize their rankings. Avery, Fairbanks, and Zeckhauser (2003) and Ehrenberg and Monks (1999) find that the ranking system distorts university admissions and financial aid decisions.
If institutions make failure more difficult by implementing systems of support to help struggling students improve, this is a desired outcome of the accountability system. If instead they act in ways that dilute a curriculum, or select students who are likely to help improve the institution’s ranking, this is a counterproductive consequence of the system. The more background characteristics are used to predict graduation rates, the harder such manipulation becomes; on the other hand, only a small number of background factors are currently available on a large scale.
To sum up, many proxy measures of productivity have been constructed over the years. They have some utility in comparing institutions and programs, if used cautiously and with knowledge of their drawbacks. But experience has shown that they can result in major misunderstandings and the creation of perverse incentives if applied indiscriminately. As with productivity measurement itself, these proxies are significantly affected by context. Among the most important contextual variables that must be controlled for are institutional selectivity, program mix, size, and student demographics. The model outlined in Chapter 4 suggests approaches for dealing with some of the shortcomings of traditionally used performance measures. Part-time students are treated as partial FTEs; semester of entry does not create distortions; and successful transfers are accounted for through awarding bonus points analogous to the sheepskin effect for bachelor’s degrees.
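The adjustments just summarized can be sketched in miniature. The function, the use of credit hours as the base output, and the bonus weights are hypothetical choices made only to keep the bookkeeping concrete; they are not the parameters of the Chapter 4 model.

```python
# Hypothetical sketch of an adjusted output measure: output is credit hours
# plus "sheepskin"-style bonuses for completed degrees and, at a smaller
# weight, for successful transfers out. Bonus sizes are illustrative only.
def adjusted_output(credit_hours, degrees, transfers_out,
                    degree_bonus=30.0, transfer_bonus=15.0):
    # Part-time students need no separate correction on the output side:
    # a half-time student simply generates fewer credit hours.
    return credit_hours + degree_bonus * degrees + transfer_bonus * transfers_out

# 60,000 credit hours, 1,500 bachelor's degrees, 400 successful transfers out.
output = adjusted_output(60_000, 1_500, 400)
print(output)  # 111000.0
```

Counting transfers at a partial weight means an institution that prepares students to finish elsewhere receives partial credit rather than being scored as if those students simply disappeared.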