This report reviews the research on performance appraisal and on its use in linking pay to performance. It was written to assist federal policy makers as they undertake a revision of the federal government's system of performance appraisal and merit pay for mid-level managers, called the Performance Management and Recognition System. Specifically, the Committee on Performance Appraisal for Merit Pay was asked by the Office of Personnel Management to review current research on performance appraisal and merit pay and to supplement the research findings with an examination of the practices of private-sector employers. Our investigation expanded beyond a restricted examination of merit pay plans to include pay for performance plans more generally, as well as the organizational and institutional conditions under which such plans are believed to operate best.
THE NATURE OF THE EVIDENCE
This study draws on diverse bodies of evidence and information, from research as well as private-sector practice. Because the issues of interest intersect different theories, disciplines, and levels of analysis, it was necessary for us to compare, contrast, and synthesize very different kinds of evidence, which do not address precisely the same issues or even apply the same standards of proof. Not all the evidence therefore meets the same rigorous standards of scientific proof, and we have been careful to identify the type of evidence and the level of confidence we feel it merits.
On balance, we believe that a careful piecing together of the many fragmentary kinds of evidence and experiential data gives federal policy makers the best available scientific understanding of performance appraisal as a basis for making personnel decisions and of the effectiveness of using pay to improve performance.
Performance appraisal has two ostensible goals: to create a measure that accurately assesses the level of a person's performance in a job, and to create an evaluation system that will advance one or more operational functions in an organization. These two goals are represented in the literature by two distinct, yet overlapping, approaches to theory and research. The measurement tradition emphasizes standardization, objective measurement, and psychometric properties. The applied tradition emphasizes the organizational context and the usefulness of performance appraisal for promoting communication, clarifying organizational goals, informing pay-based decisions, and motivating employees.
The Measurement Tradition: Findings
Prior to 1980, most research on performance appraisal was generated from the field of psychometrics. Performance appraisals were viewed in much the same way as tests: they were evaluated against criteria for validity and reliability and freedom from bias; a primary goal of the research was to reduce rating errors. On the basis of evidence in the measurement tradition, the committee presents five major findings:
Organizations cannot use job analysis and the specification of performance standards to replace managerial judgment; at best such procedures can inform managers and help focus the appraisal process.
The evidence supports the premise that supervisors are capable of forming reasonably reliable estimates of their employees' overall performance levels. Consistency among raters, however, is not proof of the accuracy of performance appraisal procedures; it can cloak systematic error or systematic bias in valuing performance.
The accretion of evidence from many types of studies suggests that supervisors, when using appraisal instruments based on well-chosen and clearly defined performance dimensions, can make modestly valid evaluations of employee performance within the terms of psychometric analysis.
A wide variety of rating scale types (traits, behaviors) and formats (behaviorally anchored, graphic), with varying levels of specificity, exist. Recent reviews of the relevant research suggest that scale types and formats have relatively little impact on psychometric quality, as long as the dimensions to be rated are well chosen and the scale anchors are clearly defined.
The weight of evidence suggests that the reliability of ratings drops if there are fewer than 3, or more than 9, rating categories. Recent work indicates that there is little to be gained from having more than 5 response categories.
The Applied Tradition: Findings
Researchers in the applied tradition concentrate on the appraisal system and how it functions to serve organizational ends. On the basis of the evidence in the applied tradition, the committee presents two major findings:
There is some evidence that performance appraisals can motivate employees when the supervisor is trusted and perceived as knowledgeable by the employee.
There is evidence from both laboratory and field studies to support the assumption that the intended use of performance ratings influences results. The most consistent finding is that ratings used to make decisions on pay and promotion are more lenient than ratings used for research purposes or for feedback.
The search for a high degree of precision in measurement does not appear to be economically viable in most applied settings; many believe that there is little to be gained from such a level of precision.
The committee concludes that federal policy makers would not be well served by a commitment of vast human and financial resources to job analyses and the development of performance appraisal instruments and systems that can meet the strictest challenges of measurement science.
The committee further concludes that, for most personnel management decisions, including annual pay decisions, the goal of a performance appraisal system should be to support and encourage informed managerial judgment, and not to aspire to the degree of standardization, precision, and empirical support that would be required of, for example, selection tests.
PERFORMANCE-BASED PAY SYSTEMS
The label pay for performance covers a broad spectrum of compensation systems that can be clustered under two general categories: merit pay plans and variable pay plans, which include both individual and group incentive plans. Although we set out to examine merit pay plans, we found virtually no research on the effects of merit pay systems, and so extended the scope of our review to include pay for performance and compensation research generally. We also realized that the effects of performance-based pay plans on individual and organizational performance cannot be easily disentangled from the broader context of an organization's structures, management strategies, and personnel systems. We present below our major finding and conclusion:
The evidence on the effects of pay for performance, pieced together from research, theory, clinical studies, and surveys of practice, suggests that, in certain circumstances, variable pay plans produce positive effects on individual job performance. The evidence is insufficient, however, to determine conclusively whether merit pay can enhance individual performance or to allow us to make comparative statements about merit and variable pay plans.
On the basis of analogy from the research and theory on variable pay plans, the committee concludes that merit pay can have positive effects on individual job performance. These effects may be attenuated by the facts that, in many merit plans, increases are not always clearly linked to employee performance, agreement on the evaluation of performance does not always exist, and increases are not always viewed as meaningful. However, we believe the direction of effects is nonetheless toward enhanced performance.
THE IMPORTANCE OF ORGANIZATIONAL CONTEXT
Our reviews of performance appraisal and merit pay research and practice indicate that their success or failure is substantially influenced by the organizational context in which they are embedded. Research on performance appraisal now encompasses a broader set of organizational factors; research on pay now stresses the importance of the firm's personnel system, its structure and managerial styles, and its strategic goals. Both researchers and managers acknowledge the influence of environmental conditions on organizational decisions about adopting and implementing performance appraisal, merit pay, and variable pay plans.
Three kinds of contextual factors are important. First, the strongest evidence on context has to do with the fit between a firm's appraisal and pay systems and the nature of the work it does. A firm's technologies and their pace of change influence the way the firm defines its jobs and people's performance in them. Second, a growing body of case studies suggests the need for congruence between an organization's structure and culture and its appraisal and pay policies. Third, factors external to the firm, such as the economic climate, the presence of unions, and legal or political forces exerted by external constituencies, can affect the success of its evaluation and pay systems.
IMPLICATIONS FOR FEDERAL POLICY
What the committee has learned in its wide-ranging study of performance appraisal and pay for performance in the private sector does not provide a blueprint for linking pay to performance in the federal sector, or even any specific remedy for what ails the federal system. The study does, however, offer some key considerations for the director of the Office of Personnel Management and other federal policy makers as they rethink the Performance Management and Recognition System.
Although performance appraisal ratings can influence many personnel decisions, and thus care in developing and using performance appraisal systems is warranted, there is no obvious technical solution to the performance management problems facing the federal government. The pursuit of further psychometric sophistication in the federal performance appraisal system is unlikely to contribute to enhanced individual or organizational performance.
Where performance appraisal is viewed as most successful in the private sector, it is firmly embedded in a context that provides incentives to managers to use the ratings as the organization intends. These incentives include managerial flexibility or discretion in rewarding top performers and in dismissing those who continually perform below standards. When performance ratings are used to distribute pay, as in a merit plan, the size of the merit pay offered allows managers to differentiate outstanding performers from good and poor performers—providing them with incentives to differentiate.
In order to motivate employees and provide incentives for them to perform, a merit plan (or any pay for performance plan) must communicate performance goals that employees understand and consider "doable," link pay and performance consistently, and provide payouts that employees see as meaningful. Although we cannot generalize about which pay for performance plans work best, especially for the federal government with its considerable organizational and work force diversity, we do suggest that federal policy makers consider decentralizing the design and implementation of many personnel programs, including appraisal and merit pay programs, and supporting careful, controlled pilot studies of a variety of pay for performance systems in a variety of agencies.
Although ensuring fair and equitable treatment for all employees is an important objective in any personnel system, the heavily legalistic environment surrounding the federal civil service has led to dependence on formal procedures that ultimately provide powerful disincentives for managers to use the system as the organization intends. Such safeguards are meant to ensure equity, but it is not clear that their proliferation provides federal employees with a greater sense of equity than their private-sector counterparts. Effective reform may well need to be part of a more fundamental rethinking of past notions of political neutrality, merit, and their protection in the civil service.
Our entire review has stressed the importance of viewing performance appraisal and merit pay as embedded within broader pay, personnel, management, and organizational contexts. The larger changes suggested by an analysis of context can be costly, but we suggest that making programmatic changes to the Performance Management and Recognition System in isolation is unlikely to enhance employee acceptance of the system or improve individual and organizational effectiveness and, in the long run, may prove no less costly.
The final chapter of the report summarizes in greater detail the committee's findings and conclusions.