No method of measuring a societal phenomenon satisfying certain minimal conditions exists that can’t be second-guessed, deconstructed, cheated, rejected or replaced. This doesn’t mean we shouldn’t be counting—but it does mean we should do so with as much care and wisdom as we can muster.
—John Paulos, New York Times Sunday Magazine, May 16, 2010
Educational leaders and critics around the country are increasingly calling for greater accountability in higher education. This call creates new demands for reliable performance metrics and for the data infrastructure to support them. This report attempts to advance the conversation about how best to respond to these demands by identifying promising approaches to productivity measurement. Such approaches would supplement the statistical information that policy makers and administrators need to guide resource allocation decisions and to assess the value of higher education against other compelling demands on scarce resources. In the process, insights may also be generated that, at least indirectly, lead to improved performance of higher education over the long run.
In sorting through the wide variety of potential applications and contexts—differentiated by characteristics of student populations, institution types and missions, and relevant level of aggregation—it is immediately clear that no single metric can suit all purposes. An appropriately constructed productivity measure for open-admission, locally based colleges will differ from that for major research universities that compete for students, faculty, funds, and recognition on an international scale. Regarding mission, operations such as university hospitals, museums, or athletic programs—for which costs can often be separated out—are typically less central to the pressing policy and public information needs with
which this report is concerned.1 Regarding aggregation, productivity can be measured at increments from the micro level (for example, course and departmental units of analysis, which are important for assessing institutional efficiency) to the macro level (relevant in itself for policy, and also because national accounting offers principles for assembling data used in productivity analysis).
Even as we attempt to advance the discussion of measurement in higher education, we recognize that methodologies and data collection will continue to develop. The measure proposed in Chapter 4, while more promising than simpler metrics, still represents only a partial accounting of the output of the higher education sector. As a first approximation for estimating the value of degrees in various fields, researchers have looked at the associated earnings profiles of those who have earned them. The members of this panel have a range of views on the validity and usefulness of this approach. Some argue that there are legitimate reasons for policy makers to look at market wages, such as when assessing the contribution of college graduates or institutions to economic growth in their regions. Others on the panel argue that such analyses unjustifiably devalue liberal arts education (and fields such as education and social work), and may be particularly harmful when undertaken under pressures created by tightening state budgets.
The panel does agree that policy makers should be concerned with social value, not just market value generated by higher education, and that, for many purposes, emphasis on the latter is a mistake. Earlier chapters include discussion of why current salary differentials (by degree and field) are not necessarily good predictors of future differentials and, thus, why valuing degrees by salaries that graduates earn may be misleading. Some socially important fields pay relatively low salaries that do not reflect their full social value. Furthermore, investment in citizens’ careers is not the only objective, from a societal perspective, of supporting and participating in higher education. The nonpecuniary components of higher education output, such as research and other public goods, are also important; even the consumption component of college, including student enjoyment of the experience, is quite clearly significant.
While acknowledging inherent limitations in our ability to comprehensively and coherently account for such a multi-dimensional and diverse productive activity, our view is that simplified, practical approaches that can be implemented by administrators and policy makers can add analytic value. Our recommendations provide guidance for establishing and building on the model of productivity measurement presented in Chapter 4, predicated on the notion that acknowledging the quantitative aspects of productivity is reasonable as a first step.
1Of course, just because a cost can be separated does not mean that it always should be. A museum, for example, may provide classes and a cultural center. A portion of the museum’s cost would then be attributed to the educational mission and a portion to student services.
Measuring productivity for higher education begins with the collection of data for quantifying the relationships between units of inputs and outputs, expressed either as volume measures or inflation-adjusted monetary measures. This is the same approach used in productivity measurement for other categories of services and goods produced in the economy. As with higher education, other kinds of firms and sectors produce multiple products and services at quality levels that are difficult or impossible to measure objectively. What is different about higher education is the scale and interconnectedness of these complexities, as examined in Chapter 3. Nevertheless, the basic principles of productivity measurement, appropriately adjusted (or qualified to acknowledge lack of adjustment) for the sector’s heterogeneities, can be applied in higher education.
Applying these basic principles leads to Recommendation (1), below. We emphasize that this is only a starting point—that additional research, such as that on the measurement of both the quantity and quality of inputs and outputs, is urgently needed.2 Nonetheless, it is not premature to propose a basic form as a starting point for productivity measurement. Indeed, having such a form in view will help set the research agenda.
Recommendation (1): The baseline productivity measure for the instructional component of higher education—baseline because it does not capture important quality dimensions of all inputs and outputs—should be estimated as the ratio of (a) the quantity of output, expressed to capture both degrees or completions and passed credit hours, to (b) the quantity of inputs, expressed to capture both labor and nonlabor factors of production.
The model is experimental and should not be adopted without scrutiny, modifications, and a trial period. As we describe in Chapter 6, the model presented will also help motivate and guide data collection and survey design by institutions and by the federal statistical agencies such as the National Center for Education Statistics (NCES), which produces the Integrated Postsecondary Education Data System (IPEDS). Data collection should also be organized so that measures can be aggregated for different purposes. Ideally, estimation of the productivity measure would proceed by first compiling institution-level data for creation of measures for institutional segments, which can then be further aggregated to state and national levels.
2Dealing with the output of two-year institutions—specifically the production of transfers and degrees other than four-year degrees—is another complication, which we discuss later in the chapter.
To accurately and meaningfully assess the higher education sector’s contributions to society and individuals, graduation rates and other unidimensional statistics are insufficient on their own and should not be relied upon exclusively. Even a well-specified measure, while a considerable step in the right direction, does not offer a full portrait of all that is important, even if a practical way to adjust outputs and inputs for quality were at hand.3 With this caveat, the panel recommends—for constructing the baseline measurement of instructional output—a formula that combines student credit hours with a bonus term that captures the added benefit of achieving academic credentials (degrees or certificates).
Recommendation (2): The metric used for instructional output (numerator of the productivity ratio) should be a weighted mix of total credits plus additional points for graduation (degree or equivalent) such that:
Adjusted credit hours = Credit hours + Sheepskin effect × Completions.
The underlying theme here, consistent with the economics literature on human capital accumulation, is that even education that does not result in a degree adds to a student’s knowledge and skill base, and thus has value. Bailey, Kienzl, and Marcotte (2004), Barro and Lee (2010a), and others have concluded that the appropriate unit of measurement for estimating costs and benefits is neither per credit hour nor cost per completion exclusively—and that a hybrid scheme is required. Appropriate data on enrollments and completions are obtainable from IPEDS. The two positive outputs can be combined into a single quantity by weighting the student credit hours with the added value of the degree or certificate over and above the equivalent years of schooling (the “sheepskin effect”).
Labor market studies indicate that, due to a combination of credentialing and other effects, starting salaries are consistently higher for those with a bachelor’s degree relative to those with similar characteristics and an equivalent number of credit hours but lacking a degree. Estimates of the size of the sheepskin effect to be applied in the model must be empirically based, as in the work of Card (1999) and Jaeger and Page (1996). At this point, a single national weighting scheme for the value of credits versus completions could be applied for all institutions; in Chapter 4, we suggest a degree bonus (for four-year institutions) equivalent to an additional year of credits, pending further empirical investigation. In time, with sufficient information, a more granular adjustment factor could be devised that allows for variation by academic field, institution, and possibly region.4 Estimates of the market value of degrees should be adjusted as warranted by ongoing research, and regularly updated as the values change with labor market trends. Longitudinal surveys (e.g., the Wisconsin Longitudinal Study) designed to determine the average salary for degree and nondegree earners at various points after graduation (starting, after five years, etc.) are essential for this kind of research and may serve as a model.5
3The criticism that a measure is not useful when incomplete is analogous to the criticism of GDP as a measure of societal welfare or progress. GDP is important, but statistics on poverty, income/wealth distribution, mental and physical health, well-being, etc. are also needed to form a full picture of whether or not social and economic conditions are improving. Likewise, a productivity measure is one element of a complete picture of the sector’s performance.
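As an illustration, the adjusted-credit-hours formula can be computed directly. In this sketch the sheepskin effect is expressed as bonus credit hours per completion, set to 30 to stand in for the suggested "additional year of credits"; all institutional figures are hypothetical.

```python
# Sketch of the baseline instructional output metric (Recommendation 2).
# The sheepskin effect is expressed as bonus credit hours per completion;
# 30 stands in for "an additional year of credits."
# All institutional figures are hypothetical.

def adjusted_credit_hours(credit_hours, completions, sheepskin_credits=30):
    """Adjusted credit hours = credit hours + sheepskin effect x completions."""
    return credit_hours + sheepskin_credits * completions

# A hypothetical four-year institution producing 300,000 passed credit
# hours and 5,000 completions in an academic year:
print(adjusted_credit_hours(credit_hours=300_000, completions=5_000))  # 450000
```

Under a more granular scheme, the `sheepskin_credits` parameter would simply vary by field, institution type, or region rather than being a single national constant.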
Adjusting Output Measures to Reflect Institution Type and Mission
Appropriate specification of a productivity measure varies by context and by intended use. For many purposes, it is misleading to compare performance measures across institutional types unless adjustments are made.6 Community colleges, which are likely to continue to contribute to any increases in the nation’s production of postsecondary education, provide a clear example of the need for flexibility in the measurement concept. In this case, the unit of output should be defined to reflect a student body mix dominated by individuals pursuing (at least as a first step) an objective other than a four-year degree. The method of establishing point totals in the numerator of the productivity measure for community colleges can use the baseline framework presented in Chapter 4; however, it will have to be modified to reflect different mission objectives. Specifically, certificates, associate’s degrees, or successful transfers will have to enter the equation. This will create new data needs (see next chapter).7
4At this point, it may be asking too much to provide the differentials by field, but the question warrants further research along the lines of Arum and Roksa (2010). Credit hours for students who have not declared a major would be prorated over the declared majors for that degree level. There may be credit hours that cannot be assigned to specific majors. If a residual category of “nonmatriculated students” is needed, a weighting procedure could be devised. Also, in some fields, such as nursing, the credentialing effect is strong and credits themselves less so; in other cases, such as the student with two years in liberal arts at a prestigious institution, credits themselves may be quite valuable. One possible benefit of applying a single degree bonus to all degrees is that it may avoid unhelpful attacks on academic areas (such as the humanities) for which social benefits are less fully captured by current salary data.
5In addition to the longitudinal data, modeling techniques must control for student characteristics that may affect both the probability of graduating and subsequent earnings levels. Thus far, the literature has mainly addressed earnings differentials between those who attend or do not attend college (see Dale and Krueger, 2002), but the methods would be similar. Techniques should also ensure that the marginal earnings effect of an advanced degree is not attributed to the undergraduate degree. Eliminating those with advanced degrees from the study may introduce a selection bias in either direction. That is, those with the highest earning potential may have a tendency either to enter the labor market directly with a bachelor’s degree or to pursue a graduate degree.
6A good example is the “Brain Gain” initiative of the Oklahoma Board of Regents, which employs a statistical methodology that estimates how much an institution deviates from graduation rates predicted by a model that takes into account such variables as average admissions test scores, gender, race, and enrollment factors such as full- versus part-time status.
Recommendation (3): Definitions should be established for outcomes at institutions other than traditional four-year colleges and universities with low transfer-out rates, and appropriate bonus figures estimated and assigned to those outcomes. This is especially important for community colleges where, in contrast to B.A. and B.S. degrees, outcomes might be successful transfers to four-year colleges, completion of certificates, or acquisition of specific skills by students with no intention of pursuing a degree.
If supported by empirical research on the added value of these milestones, the same methodology that is used for the four-year institutions (analogous to the sheepskin effect) could be applied.8 In the short run, a proxy could be developed using data from the Beginning Postsecondary Study (BPS, which includes information on transfers) to determine profiles of community college students whereby an outcome measure indicating a given number of credit hours earned counts as a success (see Bailey and Kienzl, 1999).
Additional empirical work will be needed since the salary premium—that is, the salary bump for someone with two years of postsecondary schooling and an associate’s degree relative to someone with similar schooling but no degree—is likely to be quite different from the four-year counterpart. The bonus will also vary by degree type. An associate of arts degree may lead to only a small sheepskin effect; a technical associate’s degree, on the other hand, may be quite valuable in the labor market.9 These are empirical questions.
To study salary effects for transfers, these students must be distinguished from nontransfers (transfers typically can be identified only by the receiving institutions). For transfers, in cases where the degree is ultimately earned, the degree bonus could be allocated by percentage of time spent—or, probably better, credits earned—at each institution. For institutions at which a significant proportion of new students enter as transfers, additional measures of incoming class level may be needed.10
7This more flexible kind of accounting creates virtuous incentives, such as encouraging community colleges to produce student transfers to four-year institutions.
8It is noteworthy that the Gates Foundation has stated as a program objective establishing these kinds of metrics for grantees. A great deal of data is being generated through these projects, but the definitions and methods are still in flux.
9Additionally, state systems have different policies that affect the need for an associate’s degree. In some, a student may automatically matriculate to a four-year institution with full credits; a degree is not needed to transfer. It is also unclear whether there is any significant sheepskin effect associated with a two-year degree for individuals who go on to get a four-year degree.
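The credit-based allocation of the degree bonus across a transfer student’s institutions might look like the following sketch (institution labels, credit counts, and the 30-credit bonus are all hypothetical):

```python
# Sketch: allocating a completion's degree bonus across institutions
# in proportion to credits earned at each (all figures hypothetical).

def allocate_degree_bonus(credits_by_institution, sheepskin_credits=30):
    """Split the sheepskin bonus by each institution's share of total credits."""
    total = sum(credits_by_institution.values())
    return {inst: sheepskin_credits * c / total
            for inst, c in credits_by_institution.items()}

# A student earns 60 credits at a community college and 60 at a
# four-year institution before completing the bachelor's degree:
print(allocate_degree_bonus({"community_college": 60, "four_year": 60}))
# {'community_college': 15.0, 'four_year': 15.0}
```

Allocation by percentage of time spent would be the same computation with enrollment terms substituted for credits in the shares.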
The multiplicity of institutions again points to the need to develop data tools designed to follow students rather than institutions so as to accurately assign output quantities in the accounting system. Longitudinal data also would add considerable analytic capacity for estimating education outcomes of transfer students. Without this kind of data, it will be impossible to differentiate between transfer students who ultimately receive degrees and students who begin at four-year institutions and receive the same degree. Additionally, much of the discussion about tracking student outcomes along with student characteristics suggests the need for student unit record systems that differ from cohort-based graduation-rate datasets. We discuss these data implications in more detail in Chapter 6.
Ideally, for many monitoring and assessment purposes, separate productivity measures should be developed for direct instructional activities and for the various noninstructional activities that take place at higher education institutions. As with outputs, expenditures on inputs attributable to some kinds of research, public service, hospitals and the like should be excluded from the instructional production function. Proportionate shares of various overhead functions such as academic support, institutional support, operations and maintenance, and other core expenses should be included.
As detailed in Chapter 4, the most quantitatively significant input to higher education instruction is labor, represented as units of FTE employees. Some portion of nonlabor inputs—specifically, the cost of materials and other inputs acquired through purchase or outsourcing and the rental value of physical capital—must also be accounted for. The Bureau of Labor Statistics (BLS) recommends a methodology based on percentage changes for the various categories of physical inputs, with weightings based on total category expenditures, for combining the various inputs.11 Following this approach:
Recommendation (4): The metric used for instructional input (denominator of the productivity ratio) should be a composite index based on the instructional components of (a) full-time equivalent labor hours, perhaps broken down by labor category, (b) intermediate inputs, and (c) the rental value of capital.
10Another question is how to weight the outputs of institutions at which students begin accumulating credit for transfer (e.g., a community college that produces a significant percentage of students who eventually attend a four-year college) against those with a terminal degree focus, such as community colleges that specialize in technical training and that produce a small percentage of transfers to four-year schools.
11The BLS method assumes that the entities for which productivity is being measured are profit-maximizers, an assumption that does not strictly apply to traditional colleges and universities. Massy (2012) extends the BLS methodology to the nonprofit case.
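A minimal sketch of the expenditure-weighted aggregation behind Recommendation (4): quantity growth rates for each input category are combined using average expenditure shares as weights, in the spirit of (though not necessarily identical to) the BLS methodology. All categories and figures below are hypothetical.

```python
import math

# Sketch of a composite input index: category quantity growth rates are
# combined with average expenditure-share weights (a Tornqvist-style
# aggregation). All figures are hypothetical.

def composite_input_growth(q0, q1, exp0, exp1):
    """Share-weighted log growth of the aggregate input, returned as a rate."""
    total0, total1 = sum(exp0.values()), sum(exp1.values())
    growth = 0.0
    for cat in q0:
        # Average expenditure share of this category across the two periods.
        share = 0.5 * (exp0[cat] / total0 + exp1[cat] / total1)
        growth += share * math.log(q1[cat] / q0[cat])
    return math.exp(growth) - 1

rate = composite_input_growth(
    q0={"labor_fte": 1000, "intermediate": 500, "capital": 200},
    q1={"labor_fte": 1020, "intermediate": 525, "capital": 200},
    exp0={"labor_fte": 80e6, "intermediate": 15e6, "capital": 5e6},
    exp1={"labor_fte": 83e6, "intermediate": 16e6, "capital": 5e6},
)
print(f"aggregate input growth: {rate:.3%}")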
The first challenge here is to correctly allocate the quantities of various inputs to the different categories of outputs produced. Accounting systems vary in their classification and treatment of costs (which is more likely to affect disaggregated than aggregated data). Any proposed approach should assess the available options in terms of their capability to capture the desired information.12
A second challenge involves differentiating among the various categories of labor inputs. Chapter 4 suggests grouping labor categories that have similar marginal productivities and operate in similar markets. The starting point is to allocate labor FTEs into categories differentiating tenure-track (mainly but not exclusively full-time) and adjunct (mostly part-time) faculty. Ultimately, if empirical evidence justifies treating them as fundamentally different types of labor, it may be desirable to differentiate full-time faculty by seniority, permanent/temporary status, tenure status, or teaching load assignment. This basic approach can be modified as necessary pending more thorough study:
Recommendation (5): The National Center for Education Statistics (NCES) or a designee should examine the feasibility of (a) modifying university accounting systems and IPEDS submissions to identify FTEs by labor category, as ultimately specified for the model, according to the function to which they are charged; and (b) calculating total compensation for each category and function.
As discussed in the next chapter, the eventual scheme may involve modest changes to IPEDS to fully account for the small number of tenure-track faculty members with part-time status. The change to IPEDS in regard to part-time faculty suggested in Chapter 4 is one such modest change. If additional elements are added to IPEDS, NCES should consider ways to limit the total reporting burden, such as eliminating little-used or inconsistently reported disaggregations of personnel data.
To complete the baseline model, it is necessary to account for nonlabor expenditures such as the cost of materials and other inputs acquired through purchasing and outsourcing. This can be done by summing expenditures on “operations & maintenance” and “all other” categories (most of these data can be obtained from IPEDS). Deflated (real) values are used to represent the physical quantities (N) and nominal values as the weights (NE). Additionally, a portion of the rental value of capital attributable to instruction—estimated as the opportunity cost for the use of physical capital—must be included in the aggregation of inputs. The rental value of capital is estimated as the book value of capital stock multiplied by an estimated national rate of return on assets, where capital stock equals the sum of land, buildings, and equipment. In the Base Model (Table 4.2) developed in Chapter 4, rental value of capital is calculated as the instructional capital stock times the rate of return.
12The cost allocation algorithm developed for the Delta Cost Project is an example of a logical and well-considered basis for allocating costs.
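The rental-value calculation just described reduces to simple arithmetic. In this sketch the book values, rate of return, and instructional share are hypothetical placeholders:

```python
# Sketch of the rental-value-of-capital calculation: capital stock is the
# sum of land, buildings, and equipment book values, multiplied by a
# national rate of return and by the share attributable to instruction.
# All figures are hypothetical.

def rental_value_of_capital(land, buildings, equipment, rate_of_return,
                            instructional_share=1.0):
    """Rental value = (land + buildings + equipment) x rate of return,
    scaled by the share of capital attributed to instruction."""
    capital_stock = land + buildings + equipment
    return capital_stock * rate_of_return * instructional_share

# Hypothetical institution: $400M capital stock, 6% return on assets,
# 70% of capital attributed to instruction -> roughly $16.8M rental value.
rv = rental_value_of_capital(50e6, 300e6, 50e6, 0.06, instructional_share=0.70)
print(rv)
```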
The most methodologically complicated “mixed” input, at least for one large subset of institutions, is faculty time spent in research and service, including public service. In Chapter 2, we describe the complications associated with distinguishing faculty labor associated with research and service from that for other functions. Accounting for institutions’ intermediate inputs and capital usage is more straightforward, and standard allocation techniques from cost accounting can be used to good advantage. Even here, however, there are differences in methodology that complicate our ability to achieve comparable productivity measures. These methodological assumptions need to be reviewed and perhaps revised.
Recommendation (6): The National Center for Education Statistics or a designee should develop an algorithm for adjusting labor and other inputs to account for joint production of research and service. Faculty labor hours associated with instruction should exclude hours spent on sponsored research and public service, for example, and the algorithm should provide an operational basis for adjusting other inputs on the basis of expenditures.
Commentators such as Jonathan Cole have argued that research capacity is in fact the primary factor distinguishing U.S. universities from those in the rest of the world (Cole, 2010). Cole argues convincingly that America’s future depends strongly on the continued nurturing of its research-intensive universities. Indeed, that is why the federal government and state governments have invested and continue to invest billions of dollars in university-based research. Thus, in not fully accounting for research activities, the baseline model omits a central mission of research-oriented universities.
The decision to limit the report’s focus to measurement of undergraduate instruction was made for pragmatic reasons and is not intended as a comment on the relative importance of teaching, research, and public service for institutions with multiple missions. Even if the sole objective were measuring instruction, understanding of faculty time spent in research should be improved because it is substantial, not fully separable, and may affect the quality of teaching. For example, Webber and Ehrenberg (2010) show that increases in sponsored research expenditures per student were associated with lower graduation rates after holding instructional expenditures per student constant—perhaps because regular faculty spend less time on optional tasks and rely more on adjuncts. The authors hypothesized that institutions with high levels of sponsored research probably also had higher levels of departmental research (discussed below), but this could not be demonstrated. On the positive side, what a professor does in the classroom may depend in part on whether he or she stays current with skills and the issues in the field.13 Without an active research program, professors risk falling behind the state of knowledge in their fields and thus being unable to teach material at the frontier. Additionally, active researchers may be versed in new knowledge that will not reach journals for a year or two and textbooks for much longer.

BOX 5.1 The Case Study Afforded by For-Profit Universities

Though separating the cost of departmental research and instructional costs is clearly important, it involves a fundamental issue that relates to the growing for-profit sector. The for-profit sector is focused exclusively on providing instruction, and the curriculum is based on the knowledge produced in the rest of the higher education sector. This means that for-profit higher education, which does not conduct departmental research, operates on a lower cost schedule than the rest of higher education. If state policy makers reduce funding of departmental research, the growth of the knowledge base will slow and thus, in the long run, gains from attending college will diminish. Trying to minimize the current costs of producing degrees for all types of institutions may not be in the best interest of either research or teaching efficiency and quality. Conceptually, there is a great distance between this perspective and that underlying state legislators’ preoccupation with performance indicators for undergraduates. It is the distance between the largely autonomous operating environment of an internationally prominent private university and the resource-dependent environments of the larger number of public institutions.
In earlier chapters, it was argued that instructional program data should exclude sponsored research and organized public service activities because these outputs are distinct from the undergraduate educational mission. When and where these activities are separately budgeted and involve little joint production (i.e., are far removed from the instructional mission), they are easy to parse. For sponsored research, the practice of being released from teaching duties by grants puts a natural price on the time spent in the activity. Other kinds of research, such as unsponsored faculty research, are more difficult to account for. The question of how to account for research is even relevant to the for-profit university and community college cases—where the mission is almost exclusively instruction—because, in the long run, these institutions ultimately rely on the knowledge base developed by the research (Box 5.1).
13A related issue is the need to deal with undergraduate and other taught programs when graduate-research programs play a role as inputs (i.e., supplying teachers).
The recommendations in this report about the appropriate treatment of departmental research differ in important respects from those contained in the costing study conducted in 2002 by the National Association of College and University Business Officers (NACUBO). The NACUBO costing study asserts that all departmental research should be combined with instructional costs because, “The integration of research and education is a major strength of the nation’s colleges and universities and directly benefits undergraduates” (National Association of College and University Business Officers, 2002:28). However, the tendency to report cost study results in terms of the difference between unit costs and the tuition and fees paid by students has led some commentators to argue that at least some of these costs should be separated (assuming that a means of separation can be found). Massy (2010) asks why the cost of faculty research—pursued for its own sake and often with only tenuous links to the educational mission—should be described as a subsidy to students. NACUBO’s approach carries the perverse implication that the lower a department’s apparent productivity (for example, because it pursues objectives not desired by students or stakeholders and/or fails to manage teaching loads effectively), the greater is the apparent subsidy to students—an incentive system that fails to drive improvement and undermines the credibility of higher education.
Departmental research (DR) falls into two categories: project-driven and discretionary. Each has different implications for the measurement of instructional productivity. Project-driven departmental research involves work required to fulfill the terms of a sponsored research project, and may involve some cost sharing. For example, principal investigators on a grant from the National Science Foundation (NSF) may not receive academic-year salary offsets, yet the considerable amounts of time they spend on their projects often necessitates reduced teaching loads (reductions that count as departmental research). Arrangements made between colleges or departments and faculty members to seed proposal writing or provide bridge support between sponsored projects also fall comfortably under project-driven DR even if they are not separately budgeted.
The direct link between project-driven DR and sponsored research provides a strong argument for excluding the former from instructional costs. Only the idiosyncrasies of university accounting and the market power of sponsoring agencies enable those agencies to enforce cost-sharing on academic-year effort in order to spread their funds further. Arguing on principle for inclusion of research costs in instructional cost is tantamount to arguing that the sponsored research itself be included—which, in addition to being intrinsically illogical, would hugely distort the productivity measures.14
14A study by Webber and Ehrenberg (2010) supports the idea that project-driven DR may lower graduation rates and lengthen time to degree (presumably because of its demands on faculty effort).
Fortunately, the association of project-driven DR with sponsored research opens the way to a statistical adjustment for instructional cost based on the amount of sponsored research extant in the department. Objection to basing the adjustment on sponsored research because it is unequally distributed across fields will be mooted by the partition of DR into project-driven and discretionary components.
Recommendation (7): An estimate of the departmental research (DR) directly associated with sponsored research projects (project-driven DR, now typically considered part of the instructional cost in universities’ accounts) should be excluded from faculty instructional labor. The algorithm for doing this will have to be developed through a special study, since it appears impractical to capture the data directly in university accounting systems.
Some ideas about how the special study might be conducted are presented in Appendix D.
Once an empirically based aggregate estimate (the default for each institutional type) is established, institutions might be allowed to report their own percentage of DR, either higher or lower than the default, based on local data (the same percentage would have to be reported to all audiences). This would have a balanced effect on incentives: selecting a higher percentage would boost the university's reported productivity, but it would also raise research expectations and expose the institution to criticism from stakeholders. Regardless, the percentage figure (and the evidence put forward to justify the difference from the default) will stimulate a needed dialogue that will improve the understanding of departmental research as well as the calculation itself.
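As an illustration of the default-with-override mechanism described above, the following sketch computes instructional labor cost net of project-driven DR. The default percentages, institution types, and dollar figures are entirely hypothetical.

```python
# Sketch of the project-driven DR exclusion (Recommendation 7), assuming a
# default DR percentage per institutional type with an institution-reported
# override. All figures and category names are hypothetical.

DEFAULT_DR_SHARE = {"research": 0.20, "masters": 0.10, "baccalaureate": 0.05}

def instructional_labor_cost(total_faculty_cost, inst_type, reported_dr_share=None):
    """Exclude project-driven departmental research from instructional labor.

    If the institution reports its own DR share (backed by local data),
    that figure overrides the empirically based default for its type.
    """
    share = reported_dr_share if reported_dr_share is not None else DEFAULT_DR_SHARE[inst_type]
    return total_faculty_cost * (1.0 - share)

# A research university with $50M in faculty cost, using the default 20% share:
default_cost = instructional_labor_cost(50_000_000, "research")
# The same university reporting a locally documented 25% share:
override_cost = instructional_labor_cost(50_000_000, "research", reported_dr_share=0.25)
```

Under this scheme, reporting a higher-than-default share lowers the instructional cost base (improving measured productivity) while publicly committing the institution to a higher research expectation, which is the balanced incentive discussed above.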
Discretionary departmental research refers to work initiated by a faculty member without outside support. Like all departmental research, discretionary research and scholarship is done on the departmental budget—that is, without being separately accounted for. Separately budgeted research may be sponsored or unsponsored, but departmental research is paid for by the university. The largest element of departmental research expense arises from reduced teaching loads for faculty. Teaching load reductions may be defended on the grounds that they enable: (a) research and scholarship for fields where sponsored research is not typically available—for example, the humanities; (b) research and scholarship for faculty who are less successful in obtaining sponsored projects than their peers or who are at departments or institutions that typically do not attract sponsored research funds but who, for various reasons, are deserving of dedicated research time; (c) time to be directed toward seed efforts to reorient research agendas or
to lay the foundation for large, complex proposals; or (d) “education research” to spur teaching improvement and course and curriculum development.
Good arguments exist for including at least a part of discretionary departmental research in the cost base for instruction, especially if one can escape the subsidy notion discussed in the previous section. For one thing, it is difficult or impossible to separate the effects of educational research and development (R&D) from the other motivators of low teaching loads (other than those associated with sponsored research projects), and there is no doubt that educational R&D should be included in the instructional cost base. Meaningful educational R&D expenses and work that sustains the life of disciplines (and that is not sponsored research) should be defensible to stakeholders. Additionally, some allocation of faculty time entails service that is required to keep institutions running. Other time commitments, such as those to public service and administrative work related to sponsored research, do not contribute directly to instruction.
Recommendation (8): Discretionary departmental research and service should be included in the instructional cost base for purposes of measuring productivity.
Because the intensity of departmental research varies across institutions, productivity comparisons across institutional categories should be avoided. For example, departmental research that enhances the knowledge base of disciplines provides much of the intellectual content for classes taught at both four-year research universities and for-profit institutions. However, since instructors at for-profit institutions spend little time doing this research, those institutions will appear more efficient: faculty at other institutions spend the time producing the new knowledge that the for-profits eventually use.
Ideally, input and output quantities would be adjusted for quality, but this is difficult; developing data and methods for doing so is a very long-term project, especially if the goal is systematic quality adjustment such as that routinely performed by BLS for product categories like computers and automobiles. Incomplete accounting of quality variation or change in a given measure makes it all the more essential to monitor when apparent increases in measurable output arise as a result of quality reduction. This may have to be done through parallel tracking of other kinds of information generated independently from the productivity metric.15 In general, until adjustments can be made to productivity metrics to account for quality differences, it will be inappropriate to rely exclusively on them when making funding and resource reallocation decisions. A productivity metric is only one piece of information to weigh in most decision-making environments. Uncritical use could create the wrong incentives, such as penalizing institutions that accept economically disadvantaged (less prepared) students. This is also why we emphasize the need to segment and array institutions for purposes of comparison. At the national level, quality adjustment is a concern only if there are changes over time in the preparedness of students and the value of education, or in the percentages attending various kinds of institutions—for example, a big shift into community colleges. The problem of quality is more acute when attempting to compare institutions, departments, and systems.
15In practice, we see quality measures deployed to monitor whether learning quality changes with changes in production methodology, as, for example, in course redesign projects such as those conducted by the National Center for Academic Transformation. This requires that institutions take steps to ensure that departments maintain a robust internal quality control system.
As emphasized throughout this report, full measurement of inputs and outputs is more complicated than simply counting hours, dollars, or degrees. The hours for a given task are not fixed, and students, instructors, institutions, and degrees are not homogeneous. Among the host of factors that change over time or vary across institutions and departments—and which in turn affect measured productivity—is the mix of students. Students may vary along dimensions such as preparedness and socioeconomic background. Prior research shows that standard outcome measures (graduation and persistence rates, for example) are related to variables such as entry test scores and family income (Webber and Ehrenberg, 2010). Heterogeneity of student characteristics surely must be taken into consideration when interpreting measures of institutional productivity.
The heterogeneity issue regularly arises when interpreting the performance implications of simple metrics such as graduation rates. For purposes of making comparisons across institutions (or states, or nations), it is essential to take into account incoming student ability and preparation. Highly selective institutions typically have higher completion rates than open-access institutions. This may reflect more on the prior learning, preparation, and motivation of the entrants than on the productivity of the institutions they enter. Therefore, in the context of resource allocation, performance assessment, or other high-stakes decisions, the portion of measured success attributable to input quality should be taken into consideration. A conventional approach, widely applied in the empirical literature, is to use SAT scores and other indicators of student quality (e.g., high school rank, ethnicity, socioeconomic variables such as educational status and income of parents) as explanatory variables in regression analyses that make the needed adjustments.
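As a sketch of this regression-adjustment approach, the following example fits a simple regression of graduation rates on mean SAT scores and treats the residual as performance net of input quality. The data are invented for illustration; a real analysis would include the additional controls noted above.

```python
# Sketch of regression-adjusting graduation rates for incoming student
# quality. The institution-level data (mean SAT, graduation rate) are
# invented; a real analysis would add controls such as high school rank
# and family income.

sat = [1050.0, 1150.0, 1250.0, 1350.0, 1450.0]
grad_rate = [0.45, 0.55, 0.62, 0.74, 0.82]

n = len(sat)
mean_x = sum(sat) / n
mean_y = sum(grad_rate) / n

# Simple OLS: grad_rate = a + b * sat + residual
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, grad_rate)) \
    / sum((x - mean_x) ** 2 for x in sat)
a = mean_y - b * mean_x

# The residual is performance net of input quality: a positive value means
# the school graduates more students than its intake alone would predict.
adjusted = [y - (a + b * x) for x, y in zip(sat, grad_rate)]
```

Ranking institutions on `adjusted` rather than on raw graduation rates rewards a school for outperforming the expectation set by its entering class, rather than for selectivity itself.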
Legal constraints may prevent reporting of data at the individual student level, but much could still be learned from a more complete school-wide census of incoming freshmen on demographic and preparedness measures. Beyond the data offered by IPEDS on SAT/ACT scores and enrollment by gender and
race for first-time freshmen, more empirical traction could be gained by making available the following data at the school level: distributions of family income by entering freshman cohort (some of which could be gleaned from federal student loan applications); distributions of Advanced Placement test taking and scores by cohort; distribution of parents’ education by cohort; distribution of SAT/ACT scores by gender, race, and cohort; distributions of need-based aid (both in-house and government) by income categories and cohort; and other relevant information that schools commonly collect but do not report. It may be possible for schools to report a census of applicants or matriculants that would include variables such as acceptance rates by race, gender, and household income.
In the spirit of monitoring quality (in this case, of the student input) in parallel with the proposed productivity statistic, student distributions could be reported at the quartile or quintile level, rather than as simple averages, so as not to make reporting excessively costly. Dispersion is likely to be an informative metric in quality-adjusted productivity.16 Schools commonly track at least some of these data, but this advance would require an IPEDS mandate to elicit widespread cooperation. This kind of partial quality adjustment is better than nothing and would give researchers, practitioners, and policy makers a more precise and informative picture to learn from.
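The value of reporting dispersion rather than averages alone can be illustrated with two hypothetical schools that share a mean SAT score but differ in spread. Assuming normally distributed scores and a fixed preparation threshold below which attrition concentrates:

```python
# Two hypothetical entering classes with the same mean SAT but different
# spread. If students scoring below a preparation threshold are more likely
# to drop out, the share of at-risk students differs even though the means
# are identical. All parameter values are invented for illustration.
from math import erf, sqrt

def share_below(threshold, mean, sd):
    """P(score < threshold) for a Normal(mean, sd) distribution."""
    return 0.5 * (1.0 + erf((threshold - mean) / (sd * sqrt(2.0))))

MEAN, THRESHOLD = 1200.0, 1000.0
low_variance_share = share_below(THRESHOLD, MEAN, sd=100.0)   # tight class
high_variance_share = share_below(THRESHOLD, MEAN, sd=200.0)  # dispersed class
```

The dispersed class has roughly seven times the share of students below the threshold, so mean-only reporting would understate its attrition risk; this is why quartile- or quintile-level reporting carries real information.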
Segmenting institutions is the alternative (or in some cases complementary) approach to controlling for variation in student characteristics and other kinds of educational inputs and outputs. As outlined in Section 4.3, institutions are assigned to different groups and productivity estimates are made for each group. Representatives of the segments themselves would have to make the case that any increments to resource utilization per unit of output, as compared to other segments, are justified in terms of quality, research emphasis, or other factors.
Even within an institutional segment, it is still desirable to account for differences in the mix of degree levels and majors. Institution-level cost data indicate that the resources required to produce an undergraduate degree vary, sometimes significantly, by major. Variation in degree cost is linked to, among other things, systematic differences in the amount of time and technology (e.g., engineering) needed to complete a degree. Some majors take an average of four years while others may take four-and-a-half or five years (engineering tracks, for example, typically entail eight or more additional credits due to course requirements and accreditation rules).17 Similarly, associate degree program lengths vary: a nursing A.S., for example, is often more than 60 credit hours, with 72 hours typical. In addition, some majors are associated with larger class sizes than others, and some are associated with lower graduation rates.18 If this variation were not taken into account—that is, if only degrees and credits were counted—schools with an expensive field mix would fare poorly on a cost-per-degree metric; and, if such a performance measure were tied to high-stakes decisions, institutional policy makers might have an incentive to gravitate toward less costly majors. If an institution's students increasingly enrolled in high-cost majors, expenditures per student would have to rise or, if they remained constant, one would expect a decline in graduation and persistence rates or in quality.
16For a simple example, suppose two schools have an entering class with the same average SAT score, but the first school has a higher variance. If students below a certain preparation threshold are more likely to drop out, then, all else equal, the school with the fatter lower tail will have a larger attrition rate and hence a lower measured productivity if means alone were considered. Even though the two schools' student inputs are comparable as measured by the first moment of the distribution, the second moment (and higher moments) may still contain meaningful information on quality differentials.
Faculty salaries also vary widely across fields of study (Turner, 2001). Institutions that educate a greater share of their students and employ a greater share of their faculty in fields with rising relative salaries will see greater cost pressures on their faculty resources, other factors held constant. Institutions may respond by reducing faculty size as salary costs rise or by shifting to lower-paid nontenure-track teachers, which, as noted earlier in the chapter, may affect student learning and graduation outcomes. This suggests that productivity measures must control for changes in the distribution of faculty types and salaries over time.19 For these reasons, changes in the mix of majors must be accounted for when estimating the denominator of the productivity measure.
Recommendation (9): The productivity model should include an adjustment for field of study. This adjustment should reflect different course requirements, pass rates, and labor input costs associated with various fields.
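One simple form such an adjustment could take is a mix-expected cost benchmark: weight national per-degree costs by each institution's distribution of majors, and compare actual spending against that benchmark rather than against a raw average. The field names and cost figures below are hypothetical.

```python
# Hypothetical national average cost per completed degree, by field,
# reflecting differences in program length, class size, and faculty
# salaries. All figures are invented for illustration.
NATIONAL_COST_PER_DEGREE = {"engineering": 95_000, "nursing": 80_000,
                            "business": 55_000, "humanities": 50_000}

def mix_expected_cost(degree_counts):
    """Expected cost per degree given an institution's mix of majors."""
    total = sum(degree_counts.values())
    return sum(NATIONAL_COST_PER_DEGREE[f] * n
               for f, n in degree_counts.items()) / total

# An engineering-heavy school should be judged against a higher benchmark
# than a humanities-heavy school, not against a common raw average.
tech_school = {"engineering": 600, "business": 200, "humanities": 200}
liberal_arts = {"engineering": 50, "business": 250, "humanities": 700}
benchmark_tech = mix_expected_cost(tech_school)
benchmark_la = mix_expected_cost(liberal_arts)
```

Comparing each school's actual cost per degree to its own mix-expected benchmark removes the incentive, noted above, to drift toward inexpensive majors in order to look efficient.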
Even if the initial baseline model does not adjust for field of study, research to create future versions that do so should be encouraged. In our data recommendations (next chapter), we advise that credit-hour data for productivity analyses be collected in a way that follows students in order to estimate the role of the various departments associated with a given type of degree.
17A gradual increase in credit-hour requirements has occurred since the 1970s, possibly in response to credit-hour funding incentives. Before that, almost all bachelor's programs were 120 credit hours. Requirements are now gradually decreasing again under pressure from states. Productivity metrics will help avoid incentives to inflate requirements and extend students' time to degree.
18Blose, Porter, and Kokkelenberg (2006) examine the distribution of majors and the cost of producing a degree in each field of the State University of New York system.
19At the national level, using a national average of faculty salaries by field is appropriate. When considering changes in productivity at individual institutions, the problem becomes more complicated. Not only do faculty salaries vary across fields, but the ratio of faculty salaries in different fields (e.g., average salary in economics to average salary in English) also varies widely across institutions. Ehrenberg et al. (2006) suggest that if, say, the economics department at a university receives higher salaries relative to the English department, the relative salary advantage of economists will increase, leading average faculty salaries at the institution to rise. In measuring changes in productivity at that institution, it is unclear whether this institution-specific change in relative faculty salaries should be factored in.
The major types of information ideally needed to adjust an output measure relate to learning outcomes and to the value of degrees and credits. If salaries of graduates are used as a quality metric for outputs, which we do not advise doing uncritically, controlling for field in the model is essential. Even with field-of-study granularity, acting on the recommendations in this report will produce nothing like a complete quality-adjusted productivity measure. The goal here is to initiate a methodology that provides a real and necessary step toward better measures that can be further developed as data and research advance.
Because of its complexity, the prospect of quality adjustment calculations seems at first blush daunting. However, some information is already available. For example, in thinking about outputs (educated and credentialed graduates), a wide range of degrees already includes some kind of external certification that involves quality assessment, at least in terms of student competency outcomes.20 These include engineering, accounting, nursing, and a range of technical fields.
Recommendation (10): Where they already exist, externally validated assessment tools offer one basis for assessing student learning outcomes. For fields where external professional exams are taken, data should be systematically collected at the department level. This could subsequently become a part of IPEDS reporting requirements.
While this kind of information cannot be included in the productivity metric as specified above, it can be evaluated alongside it to help determine whether instructional quality is holding constant as input and output variables change.21 Similarly, empirical power could be gained if something like the College and Beyond (C&B) survey conducted by Bowen and Bok (1998) were fielded on a nationally representative sample of schools. A limitation of the C&B survey is that it covers only highly selective schools, which are not representative of the overall college market.
Student learning is inherently complex and therefore difficult to measure, particularly when it comes to the higher-order thinking presumably needed to pursue a college education. But steps have been taken, and a number of states produce data on learning results. All University of Texas system schools use the Collegiate Learning Assessment (CLA) test as a means of assessing the value added by their general education curricula. South Dakota uses the Collegiate Assessment of Academic Proficiency (CAAP) to, among other things, “compute student gains between taking the ACT as high school students and the CAAP as rising juniors. Gains are computed by comparing student scores on the two exams and then categorizing students as making lower than expected progress, expected progress, or higher than expected progress.”22 If learning assessments such as these are to be included as an adjustment factor in a productivity measure, the selected test needs to be one with national norms. Accreditation is moving in this direction: the Accreditation Board for Engineering and Technology accredits over 3,100 programs at more than 600 colleges and universities worldwide.
20On the other hand, many of the fields taught within colleges of liberal arts do not have learning assessments based on field competency tests.
21Additional information, such as students' college entrance test scores, would make the professional exam data even more useful. On their own, the latter data do not indicate the value added by a school, but rather students' absolute level of achievement relative to a passing threshold.
A complete monitoring of the value added to students by degrees earned would also require an initial baseline to capture each incoming student's level of preparation. One approach to assessing student learning would be to encourage institutions to implement data systems for cataloguing tests and test results of their graduates. To some extent, these test-based assessments presume that development of cognitive skills is the goal. However, returns to noncognitive skills—for example, improved networking, social skills, and awareness of civic responsibility—may also be a goal, yet they do not appear in many efforts to assess learning. Much more thought will have to be given to the appropriate characteristics of tests and how they should be applied and interpreted. Existing high-stakes exams should obviously be used to the extent possible. Pilot tests could be developed to analyze validated credit hours at meaningful levels; some departments have information that would be useful in learning assessments. To accurately reflect learning patterns, these tests must carry high enough stakes to motivate students to take them seriously; such a testing system would be difficult to create.
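The expected-progress categorization that South Dakota applies to ACT-to-CAAP gains could be sketched as a residual-based classification. The linear expected-score model, the score values, and the two-point band below are assumptions for illustration, not the state's actual method.

```python
# Hypothetical sketch of classifying gains between a baseline exam (ACT)
# and a follow-up exam (CAAP) as lower than expected, expected, or higher
# than expected. The linear expected-score mapping and the +/-2 point band
# are invented assumptions.

def expected_caap(act_score):
    """Assumed linear mapping from ACT score to expected CAAP score."""
    return 40.0 + 1.0 * act_score

def classify_progress(act_score, caap_score, band=2.0):
    gap = caap_score - expected_caap(act_score)
    if gap < -band:
        return "lower than expected"
    if gap > band:
        return "higher than expected"
    return "expected"

# Three hypothetical students sharing the same ACT baseline of 24:
labels = [classify_progress(24, caap) for caap in (60.0, 64.0, 68.0)]
```

The essential design point is that classification is relative to a student-specific expectation derived from the baseline test, so it measures gain rather than absolute achievement.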
Given that quality assessment in higher education remains very much a work in progress, the panel is not prepared to recommend a single measure of quality for use in adjusting the productivity model’s output metric. Likewise, we are not in a position to recommend specific methods for adjusting the quality of student and teacher inputs to higher education. As discussed in Chapter 4, there are promising approaches to measuring student readiness (e.g., testing) and faculty effectiveness (e.g., course sequence grade analysis). Results from this research could provide context for interpreting more narrowly defined productivity metrics and other performance measures, particularly when policy choices are being weighed.
As put forward in Chapter 4, however, we do propose that effective quality assurance systems be maintained to ensure that output quality does not decline as a result of quantitative productivity measurement. These could be based on extant accreditation systems, on the methods of academic audit being used effectively in Tennessee and certain overseas venues, and on the other quality-review initiatives now being conducted at the state level (see Massy, Graham, and Short, 2007). All such methods can and should make use of the growing body of quality measures now being developed. Whatever the approach, however, its usefulness for achieving the goals of this report will depend on full transparency, which is not always maintained in existing systems.
22For details, go to http://www.educationsector.org/sites/default/files/publications/Subcategories_49.pdf, page 1 [June 2012].
We believe the groundwork exists for implementing effective quality monitoring, and that a committee (that, beyond accreditors, might include representation from governors and state legislators, Congress, state governing boards, consumer advocacy groups, and college guidebook publishers) could usefully review external quality assessment, with specific reference to the contents of this report, and make recommendations about the way forward.
Recommendation (11): A neutral entity, with representation from but not dominated or controlled by the country’s existing higher education quality assurance bodies, should be charged with reviewing the state of education quality assessment in the United States and recommending an approach to assure that quantitative productivity measurement does not result in quality erosion.
This is an important recommendation: the time is right for an overarching impartial review. Failing such a review, the development of badly needed improvements in the quantitative measurement of productivity could lead to unintended negative outcomes.