8
Findings and Conclusions

The Office of Personnel Management (OPM) requested this study in preparation for reauthorization hearings, scheduled for 1991, on the troubled Performance Management and Recognition System (PMRS). Our charge was to review the research on performance appraisal and on its use in linking compensation to performance. To supplement the research findings, we were asked to look at private-sector practice as well, to see if there are successful compensation systems based on performance appraisal that might provide guidance for policy makers in reforming PMRS. We construed this charge as requiring an investigation of whether and under what conditions performance appraisal in the context of merit pay systems could assist the federal government in managing performance, fostering employee equity, improving individual and organizational effectiveness, providing consistent and predictable personnel costs, and—not least—enhancing the legitimacy of public service.

The Civil Service Reform Act (CSRA) of 1978 provides the backdrop for this study. That act required the development of job-related and objective performance appraisal systems, the results of which were to be used as a basis for training, promotion, reduction in grade, removal, and other personnel decisions. The act also created performance-based compensation systems for middle and senior managers. Designed to revitalize the civil service, in part by bringing private-sector management strategies to the federal bureaucracy, the reforms have by most measures fallen short of expectations, despite fairly substantial midcourse corrections. Yet the belief in merit principles remains strong, as does the expectation that performance appraisal and linking compensation to performance can provide incentives for excellence.




Pay for Performance: Evaluating Performance Appraisal and Merit Pay

Policy makers already have extensive documentation of the problems and employee dissatisfactions with the Merit Pay System (MPS) and the successor PMRS: consistent underfunding of the merit pool, the lag of merit salaries behind the salaries of employees still under the General Schedule, the widely held and annually reinforced belief that federal salaries have fallen far behind their private-sector equivalents, and the perceived politicization of the civil service and the merit pay system that seemed to be an outgrowth of the Civil Service Reform Act. This study is intended to supplement that knowledge and experience with information drawn from the private sector, beginning with a systematic investigation of the research on performance appraisal and pay for performance systems and including an assessment of private-sector practices in the years since the passage of the Civil Service Reform Act.

We began the report with a cautionary note about the difficulties inherent in trying to measure social phenomena in general, and about the particular evidentiary obstacles presented by the subject at hand (Chapter 3). Our research has taken us into the literature of a variety of disciplines as we tried to piece together from fragmentary evidence the best possible scientific understanding of the adequacy of performance appraisal as a basis for making personnel decisions and of the effectiveness of using pay to improve performance. Investigation of the effects of linking compensation to performance led us from the question of individual effectiveness to organizational effectiveness and required an examination of both merit and variable pay plans. Recent research trends also broadened the scope of the study beyond measurement instruments and appraisal processes to an examination of context and the attempt to identify conditions under which performance appraisal and merit plans operate best.

In the course of our investigations it became clear that the theoretical and empirical literatures have posited at least four different types of benefits in discussing performance-based pay systems: (1) positive effects on the work behaviors of individual employees (including decisions to join an organization, attend, perform, and remain); (2) increased organization-level effectiveness; (3) facilitating socialization and communication; and (4) enhancing the perceived legitimacy of an organization to important internal and external constituencies. We have been ecumenical in pulling together evidence and information that speak to these criteria for gauging the effectiveness of an organization's performance appraisal and pay systems. The preceding pages have taken account of theory, empirical research, and clinical studies not only from many disciplines, but also from any research topics that seemed relevant. The formal evidence has been supplemented with information about current practices in private-sector firms.

The study's findings and conclusions are presented in this chapter as follows. The first section deals with the science and practice of performance appraisal, focusing first on measurement research, then on applied research, and ending with overall findings and conclusions. The second section covers performance-based pay systems, focusing first on evidence from research, then on findings from practice, and again ending with overall findings and conclusions. The third section deals with the influence of context on performance appraisal and merit pay systems. The fourth section deals with the implications of the study's findings and conclusions for federal policy making.

I. THE SCIENCE AND PRACTICE OF PERFORMANCE APPRAISAL

The evaluation of workers' performance is directed toward two fundamental goals. The first of these is to create a measure that accurately assesses the level of an individual's performance on something called the job. The second is to create a performance measurement system that will advance one or more operational functions in an organization: personnel decisions, compensation policy, communication of organizational objectives, and facilitation of employee performance.

Although all performance appraisal systems encompass both goals, the two are represented in the literature by two distinct, albeit overlapping, lines of development in theory and research. In part the difference in approach to performance appraisal reflects disciplinary orientation, in part historical development. One approach grows out of psychometrics and the measurement tradition, with its emphasis on standardization, objective measurement, psychometric properties (validity, reliability, bias, etc.).
The other comes from the more applied fields (human resource management, industrial and organizational psychology, organization science, sociology) and focuses on the organizational context and the usefulness of performance appraisal for such things as promoting communication between managers and employees; clarifying organizational goals and performance expectations; providing information for managers to guide retention, dismissal, and promotion decisions; informing performance-based pay decisions; and motivating employees. Both research fields are interested in the use of rating scales to evaluate job performance, although they have tended to focus on different questions and have different expectations of performance appraisal. At the risk of overemphasizing the distinctions, we have presented our discussion in this report in two parts, one focused on the measurement research, the second on the applied research. It is, however, a matter of general orientation, not unrelated polarities.

Of the two goals, accuracy and organizational utility, most of the research in the measurement tradition has concentrated on aspects of accuracy, the implicit assumption being that if the measures are accurate, the functional goals will be met. Research in the more applied fields tends to focus not on the measurement instrument and the accuracy of inferences drawn from the measurement, but on the whole operational system of which it is a part. The applied or management perspective tends to evaluate the performance measurement component by how well the whole operates, e.g., whether the system distributes pay as it was designed to, whether the system is accepted by all players. Accuracy of performance measurement tends to be ignored, not because it is considered unimportant, but because it is assumed, at least implicitly, that if the system-level criteria are met, then the measurement component must be sufficiently accurate.

Apart from our own convenience in presenting findings from the measurement and applied traditions separately, it is important that federal policy makers, managers' groups, and employees understand these differences and tailor their language and expectations appropriately.

Current federal policy is couched in the language of the measurement tradition. In the manner of the 1978 Uniform Guidelines on Employee Selection Procedures, which elaborates the requirements of Title VII of the Civil Rights Act of 1964, Office of Personnel Management regulations implementing the Civil Service Reform Act of 1978 called on federal agencies to develop job-related and objective performance appraisal systems. The regulations required that performance standards and critical job elements be specified consistent with the duties and responsibilities outlined in an employee's position description. OPM suggested that performance standards be based on a job analysis to identify the critical elements of a job, and that each agency develop a method for evaluating its system to ensure its validity. Although courts have not demanded of performance appraisal systems the degree of rigor required of tests and other selection instruments, the terms validity, objectivity, and job-relatedness are all drawn from the context of psychological testing and performance measurement.

The Measurement Tradition

Psychometrics grows out of the theory of individual differences, namely, that humans possess characteristics and traits (e.g., height, verbal ability, upper-body strength); that each possesses these characteristics in some amount; and that the amounts can be measured. Drawing on findings in the biological sciences about the distribution of characteristics in a given plant or animal population, the founders of psychological measurement developed statistical techniques for expressing human mental characteristics and for relating the standing of one individual to that of a population of individuals. From the beginning, these theories and measurement techniques were thought to hold great promise for matching people to jobs and for measuring job performance. They were also particularly compatible with the concept of meritocracy and the particularly American idea that jobs ought to be allocated on the basis of talent or ability and not as a function of family connection, social class, religious persuasion, or other criteria that are irrelevant to job performance.

In the realm of psychometrics, the scientific imperative is accuracy of measurement. Standardized multiple-choice tests, the most familiar type of instrument in this mode, are a product of that drive for precise measurement. Just as test administration can be controlled to provide a high degree of consistency and uniformity in the conditions of testing, so does the format of the tests constrain response possibilities to allow direct comparison of the performance of all test takers. Over the years a variety of sophisticated statistical analytics have been developed to evaluate the consistency of measurement (reliability analyses) and the accuracy and relevance of inferences drawn from the measurement results (validity analyses).

Prior to 1980, most research on performance appraisal was generated from within the psychometric tradition. Performance appraisals were viewed in much the same way as tests: they were evaluated against criteria for validity and reliability and freedom from bias, and a primary goal of the research was to reduce rating errors. Our findings on how closely performance appraisal has been found to conform to these aspirations of measurement science follow.

Research on Job Analysis

Findings: Job Analysis

Applied psychologists have used job analysis as a primary means for understanding and describing job performance. There have been a number of approaches to job analysis over the years, including the job element method, the critical incident method, the Air Force task inventory approach, and methods that rely on structured questionnaires to describe managerial-level jobs in large organizations. All of these methods share certain assumptions about good job analysis practices, and all are based on a variety of empirical sources of information.

There is an enormous body of job analysis research, the preponderance of which has been conducted for relatively simple, concrete jobs: military enlisted jobs, auto mechanics, sales, and other jobs characterized by observable behaviors or tangible products.
The literature on complex, interactive, cognitively loaded jobs, and specifically on managerial jobs, is comparatively sparse and less conclusive. With few exceptions, the analysis of managerial performance is cast at a high level of abstraction; far less attention has been given to the sort of detailed, task-centered definition typical of simpler, more concrete jobs. This global focus is reflected in managerial appraisal instruments, which typically present very broad performance dimensions for evaluation.

A job may be more or less routinized, structured, and constrained by the requirements of machinery or defined by training, but the evaluation of job performance will always depend in the final analysis on external judgments about what is most important (number of units produced or quality of the units produced; everyday performance or response to the infrequent emergency; single-minded pursuit of profits or avoidance of environmental damage). As a consequence, describing job performance is not a straightforward or obvious process. Even for simple jobs, it involves judgment and inference combined with careful study of the job by such means as interviews, observation, and collection of data on tasks performed and skills required. For managerial jobs, the task of adequate description becomes even more difficult, because much of what a manager does is fragmented, amorphous, and involves unobservable cognitive activities.

Job descriptions and the appraisal systems based on them reflect organizational values and judgments as well as some independent constellation of job tasks and performance requirements. To speak of objectivity with regard to job analysis and performance appraisal does not imply the absence of human judgment, but rather the absence of irrelevant or inappropriate judgments.

Conclusions: Job Analysis

The commonly made dichotomy between objective and subjective measurement is more misleading than useful in the field of performance appraisal. Organizations cannot use job analyses or other methods of specifying critical elements and performance standards as replacements for managerial judgment; at best such procedures can inform the manager and help focus the appraisal process.

The abstract character of the behaviors (e.g., leadership, oral communications, overall performance) that typifies much of the research on managerial job performance conveys a message from the research community about the nature of managerial performance and about the infeasibility of capturing its essence through lists of tasks, duties, and standards that can be objectively counted or quantified.
Reliance on global measures guarantees that evaluation of a manager's performance is of necessity based on a substantial degree of judgment. An overly literal interpretation of the requirements of the Civil Service Reform Act (taking job-related to mean job-specific, or treating objective as the opposite of judgment) would be particularly destructive for managerial appraisal.

Research on Psychometric Properties

Reliability

Reliability analysis provides an index of the consistency of measurement, from occasion to occasion, from form to form (if there are several versions of a test or measure that are all intended to measure the same thing), or from rater to rater. The first- and last-mentioned types of reliability analysis are particularly pertinent to performance appraisal. If the measurements are to have any meaning, one would expect the rater to reach the same judgment from one week to the next (assuming the employee's performance did not change significantly), just as one would hope that several raters would reach substantially the same decision about a single individual's performance.

Data on reliability derive in part from operational settings and in part from laboratory experiments or from research projects undertaken in field settings, using special rating instruments developed for the purpose and administered with the proviso that no operational decisions will be based on the results.

Findings: Reliability

There is substantial evidence in the research literature to support the premise that supervisors are capable of forming reasonably reliable estimates of their employees' overall performance levels. For the mostly nonmanagerial jobs studied over the years, raters show substantial agreement in rating workers' performance. There is also some data showing interrater agreement on managerial performance.

It is important to remember, however, that consistency among raters cannot be taken simply at face value as proof of the accuracy of performance appraisal procedures; it can also cloak systematic bias or systematic error in valuing performance. Systematic bias is difficult to detect, the more so if it is the product of unexamined views and conventional assumptions. There is evidence of such bias, fragmentary but suggestive, in a small number of studies showing that white supervisors tend to rate white employees as a group somewhat higher than black employees and, conversely, that black supervisors rate black employees higher on average. The studies have not been able to distinguish between real performance differences and rater bias but suggest the presence of both, although the variance accounted for by bias appears to be quite small.
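The point that rater consistency is not proof of accuracy can be made concrete with a small numerical sketch. The example below is illustrative only: the ratings, the "true" performance values, and the helper names (pearson_r, mean_error) are hypothetical and are not drawn from the studies reviewed here. It assumes interrater agreement is indexed by a Pearson correlation, which is one common choice among several. Two raters who share the same systematic error for one subgroup agree closely with each other while both understate that subgroup's performance.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def mean_error(ratings, truth, members):
    """Average of (rating - truth) over the employees indexed by `members`."""
    return sum(ratings[i] - truth[i] for i in members) / len(members)

# Hypothetical "true" performance of eight employees on a 5-point scale.
true_perf = [2, 3, 4, 5, 2, 3, 4, 5]

# Hypothetical ratings: both raters mark employees 4-7 one point too low,
# i.e. they share a systematic error for that subgroup.
rater_a = [2, 3, 4, 5, 1, 2, 3, 4]
rater_b = [2, 4, 4, 5, 1, 2, 3, 4]
subgroup = [4, 5, 6, 7]

agreement = pearson_r(rater_a, rater_b)      # consistency between raters
accuracy_a = pearson_r(rater_a, true_perf)   # rater A against true performance
bias_a = mean_error(rater_a, true_perf, subgroup)
bias_b = mean_error(rater_b, true_perf, subgroup)

print(f"interrater agreement: {agreement:.2f}")
print(f"rater A vs. true performance: {accuracy_a:.2f}")
print(f"mean error for subgroup: A={bias_a:+.1f}, B={bias_b:+.1f}")
```

In this constructed case the agreement between raters is higher than either rater's correspondence with true performance, and the shared one-point shortfall for the subgroup cannot be seen in the agreement figure at all. In operational settings true performance is of course not observable, which is why such bias is difficult to detect.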

Validity

From the psychometric perspective, the central question posed by any measurement system is whether it produces an accurate assessment of relevant performance. Validity is the technical term used to refer to the degree of accuracy and relevance that characterizes a measurement procedure. It is not meant to imply a static characteristic of a test or rating scale; rather, the term has to do with the structure of meaning that can be built up to support the assessment results. Validity, therefore, is an accretion of evidence from many sources; it describes a research process that gradually lends confidence to the interpretations or judgments made on the basis of the measure.

In the realm of job performance, validation begins in an important sense with an analysis of the job or category of jobs for which performance measures are to be developed. If an employment test or appraisal system can be linked to important aspects of the job (say, typing accuracy and speed or a sonar technician's skill at recognizing patterns), then one building block is in place. The evidence of interrater reliabilities described above can provide another sort of clue to the accuracy of measurement systems like performance ratings, hands-on job sample tests, and other procedures that depend on an observer to judge the performance. Statisticians and psychometricians have developed an array of sophisticated statistical methods to explore the relationships between the test or measure under study and other relevant variables (correlational and regression analysis, multivariate analysis and ANOVA techniques).

Findings: Validity

Performance appraisal does not lend itself to the full complement of validation strategies that have been found useful for standardized tests. Criterion-related validity, for example, is rarely as useful for evaluating performance appraisals as it is with selection tests. The strength of the approach lies in showing that a healthy relationship exists between, say, test results and some independent, operational performance measure (e.g., college admissions test and grade-point average). When the measure being validated is itself a behavioral measure, it is difficult to find relevant operational measures for comparison that have the essential independence. As a consequence, what is frequently considered a compelling type of evidence in validation research is usually not possible for performance appraisals. Furthermore, in those limited conditions in which independent criteria do exist, the jobs themselves tend to be much more simple and straightforward than those for which appraisals are typically used.

It is, however, possible to compare performance appraisals to other measures of job performance using the conventional statistical methods of psychometric analysis.
Recent military job performance measurement research, for example, demonstrated moderate correlations between supervisor ratings and each of the other types of criterion measures developed (hands-on test scores, training grades, written job knowledge tests), which lends credibility to the claim that carefully developed performance appraisals can bear a meaningful degree of relationship to actual job performance. Supervisor ratings have been used in thousands of studies designed to examine the power of cognitive and other ability tests to predict job performance—in other words, they have been used to validate employment tests. These studies consistently show a low to moderate observed correlation between employment tests and supervisor ratings; job incumbents who score well on the test tend also to receive good ratings and those with low test scores tend to be rated as mediocre performers. While admittedly circular, this relationship provides further indirect evidence that supervisors can rate their employees with some degree of (but by no means perfect) accuracy; whether they will do so in an operational setting is another matter.

Scale Characteristics

A wide variety of rating scale formats, defining performance dimensions at varying levels of specificity, exist. Commonly used rating dimensions include personal traits (e.g., initiative, leadership, perseverance), job behaviors (e.g., follows safety procedures in engine room, financial management, interpersonal relations), and performance results (e.g., quality of work, quantity of work). The number of scale points has ranged as high as 11, but most appraisal scales have between 3 and 5.

In terms of scale format, a general distinction can be made between scales that include specific behavioral examples of good, average, and inadequate performance and those that do not. The latter, called graphic scales, simply list the dimension of interest and present a number of scale points along a continuum. The scale points, or anchors, can be numerical or adjectival (e.g., consistently superior, average, consistently unsatisfactory).

Behaviorally anchored rating scales (BARS) were developed to reduce some of the rating error typical of graphic scales. Proponents thought that BARS would help to clarify the meaning of the performance dimensions used and would help calibrate various raters' definition of what constitutes superior, average, and unsatisfactory performance on the dimension. It was also felt that the behavioral descriptions would discourage the tendency to rate on broad, general traits by focusing attention on specific work behaviors.

Mixed standard scales, also behaviorally based, went one step further in trying to control rater error, particularly bias and leniency. These scales present the behavioral descriptions in random order and not in conjunction with a particular performance dimension. The rater's responses are computed by someone else into a performance score for each dimension measured.
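How mixed standard scale responses might be turned into dimension scores can be sketched as follows. The text above does not specify a scoring rule, so everything in this example is an assumption: it uses a seven-point scheme often described in the rating-scale literature, in which each dimension has a good, an average, and a poor behavioral statement and the rater marks each statement "+" (the employee is better than the statement), "0" (the statement fits), or "-" (the employee is worse). The statement key, dimension names, and the function score_dimensions are hypothetical.

```python
# Assumed scoring table, keyed by the responses to the
# (good, average, poor) statements of one dimension.
SCORE_TABLE = {
    ("+", "+", "+"): 7,
    ("0", "+", "+"): 6,
    ("-", "+", "+"): 5,
    ("-", "0", "+"): 4,
    ("-", "-", "+"): 3,
    ("-", "-", "0"): 2,
    ("-", "-", "-"): 1,
}

def score_dimensions(responses, key):
    """Convert a rater's responses into one score per dimension.

    responses: {statement_id: "+", "0" or "-"}
    key: {statement_id: (dimension, level)}, level in "good"/"average"/"poor";
         the key is held by the scorer, not shown to the rater.
    Returns {dimension: score}; a score of None flags a response pattern
    that is logically inconsistent under the assumed table.
    """
    by_dimension = {}
    for statement_id, (dimension, level) in key.items():
        by_dimension.setdefault(dimension, {})[level] = responses[statement_id]

    scores = {}
    for dimension, levels in by_dimension.items():
        pattern = (levels["good"], levels["average"], levels["poor"])
        scores[dimension] = SCORE_TABLE.get(pattern)
    return scores

# Hypothetical form: six statements in shuffled order, two dimensions.
key = {
    1: ("initiative", "average"),
    2: ("quality", "poor"),
    3: ("initiative", "good"),
    4: ("quality", "good"),
    5: ("initiative", "poor"),
    6: ("quality", "average"),
}
responses = {1: "+", 2: "+", 3: "0", 4: "-", 5: "+", 6: "0"}

scores = score_dimensions(responses, key)
print(scores)
```

Because the rater sees only the shuffled statements and never the key, the dimension scores are assembled afterward by whoever holds the key, which is the feature intended to limit leniency. A pattern that is absent from the table, such as rating an employee better than the good statement but worse than the poor one, comes back as None and can be reviewed as a possible rating error.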

Findings: Rating Format

Reviews of the relevant research suggest that behaviorally based scales have not met early expectations. Although the research findings are not entirely consistent, the consensus seems to be that scale formats have relatively little impact on psychometric quality, when impact is indexed by interrater agreement, rater errors, and convergent and discriminant validity of ratings. In other words, the use of behavioral versus nonbehavioral language and the physical arrangement of the scale do not appear to be critical in terms of the validity of the overall judgments about performance.[1]

This proposition is given support by the research on the cognitive processes involved in performance appraisal done in the 1980s. This body of research suggests that the distinction between behaviors and traits is not as salient as once thought. Raters appear to rely less on specific behaviors than on their general evaluation of each employee when they make ratings, regardless of the focus of the rating scale. These general evaluations substantially affect raters' memory for and evaluation of actual work behaviors.

[1] A weakness in the comparative research on rating approaches and formats, however, was noted by Landy and Farr (1983). It is, namely, that in many studies the scales compared were actually developed in the same way. The performance dimensions and behavioral examples were developed according to BARS methodology. This means that only the presentation modes were actually compared. Many authors have also pointed to the lack of rigor in the selection and scaling of anchors, which suggests that the final word has not been spoken on the merits of behavioral approaches to rating scales. It is also the case that the choice of approach (traits or behaviors) and format (BARS or graphic format) may make a difference in the usefulness, if not the accuracy, of the ratings. Scales containing specific behavioral examples may be more useful for providing feedback to employees; trait scales may be more useful for ranking those rated.

Finding: Job-Specific Versus Global Ratings

In litigation dealing with performance appraisal, the courts have shown a clear preference for job-specific dimensions. There is little research that directly addresses the validity of ratings obtained on job-specific, general, or global dimensions. Indirect evidence suggests that raters may work at the global level in any case. First, there is the evidence from the research on cognitive processes mentioned in finding number 2 above. In addition, there is a substantial body of research on halo error in ratings that shows that raters do not, for the most part, distinguish between conceptually distinct aspects of performance in rating their workers. This suggests that similar outcomes can be expected from rating scales that use global or job-specific performance dimensions.

Finding: Number of Scale Points or Anchors

The weight of the evidence suggests that the reliability of ratings drops if there are fewer than 3 or more than 9 rating categories. Recent work indicates that there is little to be gained from having more than 5 response categories. Within that range (3 to 5), there is no evidence that there is one best number of scale points in terms of scale quality.

Conclusion: Psychometric Properties

The combination of research on job analysis, research on the reliability of appraisal results, and the direct and indirect evidence of a modest relationship between performance ratings and other sorts of measures (employment tests, other measures of job performance) leads us to conclude that the performance appraisal process, while by no means high-precision measurement, can achieve moderate levels of accuracy within the assumptions of the measurement tradition.

The Applied Tradition

The focus of psychometric theory and research tends to be on the rating instrument, its measurement properties, and standardization of raters to reduce error. Researchers in the organizational sciences and human resource management tradition, which is more attuned to applied settings and operational systems, concentrate more on the appraisal system and how it functions to serve organizational ends. From this point of view, performance ratings are not the equivalent of testing technology, and the concentration of research energies on questions of job analysis, scale development, scale format, and measurement precision is misguided.

There are others closer to the measurement tradition who also have begun to feel that the psychometric lines of inquiry have become arid and are unlikely to bring about large additional improvements in the way performance appraisals are used in organizations (Banks and Murphy, 1985; Ilgen et al., 1989). A number of industrial psychologists in the last decade have begun to move away from the traditional view of performance appraisal as a measurement problem; rather than treating it as a measurement tool, they have begun to look on performance appraisal as a social and communication process (Murphy and Cleveland, 1991). Although such scholars do not reject the idea of accuracy, they tend to take a more commonsense approach, talking of the "relevance" of the appraisal to job performance, and to concentrate much more on the contextual factors that support or distort appraisal systems.
From this perspective, the interesting research questions about performance appraisal systems are whether they enrich managerial judgment and improve employee understanding of organizational goals and standards of performance; encourage more communication between managers and employees; communicate a sense of equity and fair play in the distribution of rewards and penalties by making visible the grounds of these decisions; and enhance employee trust and acceptance. While none of these questions can be divorced from the accuracy-validity issues, the answers tend to be sought in evidence of system-level outcomes. Research on the effectiveness of performance appraisal looks at such questions as employee attitudes toward the system, the degree to which it serves individual needs (feedback, employee development) or organizational needs (communication of mission, meritocratic principles), and the degree to which it enhances (or destroys) cohesion in the work unit or organization. And, as many of these points of emphasis indicate, there is a great deal of emergent interest in the organizational context in which appraisals occur. Although this reorientation is quite recent among applied psychologists, our review of the literature included several bodies of research in organizational psychology and management science that contribute to an understanding of how appraisal systems function as part of an organization's performance management system. These include: (1) performance appraisal and motivation, (2)

Findings: Regulating Labor Costs

Although economic models provide a conceptual basis for understanding the potential trade-offs between cost and performance and some of the contextual factors that might be presumed to favor one pay policy over another, the research on cost regulation and the cost-benefit trade-offs associated with pay for performance plans is sparse and limited to production jobs and manufacturing settings. We have no evidence that any particular pay for performance plan is superior to another in regulating labor costs.

FINDINGS FROM PRACTICE

Our review of private-sector practices revealed that pay for performance is an important part of compensation philosophy and the overwhelming choice of U.S. private-sector firms. Merit plans are almost universally used for managerial and professional employees (95 percent); variable pay plans are much less frequently used (between 16 and 40 percent, depending on the type of plan), but increased competition worldwide appears to be kindling interest in them.

Our interviews with personnel managers of five Fortune 100 companies indicated that merit plans are viewed primarily as a means of guiding managers' decisions about pay increases in a way that is consistent with a meritocratic personnel philosophy; that is, it ensures that pay increases are, at least in part, tied to individual contributions, and that the increases are consistently distributed to employees in a way that is fair and predictable. This strong attachment to a meritocratic ethos explains the predominance of merit pay plans in the private sector. Merit plans are the only pay for performance plans currently used that base pay increase decisions on the combination of individual contributions (skills, experience, and performance) that are the foundation of a meritocratic philosophy.
The personnel managers interviewed noted that a major benefit of performance appraisal and merit pay was the identification of top and bottom performers. They emphasized the flexibility of private-sector managers to bring top performers into a job at any position in the pay range, and the comparative ease of dismissing those who cannot meet company performance standards. Surveys indicate that organizations do not evaluate the effect of merit plans on performance, but rather focus on employee perceptions of plan fairness and workability and of the link between pay and performance. The personnel managers interviewed also emphasized the importance of communicating merit pay increases as part of an overall pay system and a meritocratic personnel philosophy. For example, most of these managers emphasized the competitiveness of base pay and benefits and the general excellence of the company and work force in their pay communications to employees. Notable, also, is that most of these managers said that their organizations did not share specific pay information (such as average annual increase percentages, market competitors and wage survey methods, the organization spectrum of pay ranges) with employees. This is in contrast to the federal meritocracy in which employees appear to have information about their pay from many different (and conflicting) sources.

In contrast to the nearly universal presence of merit pay plans, our survey reviews revealed that less than 40 percent of private-sector firms have bonus plans for middle managers; less than 20 percent have gainsharing or profit-sharing plans in place. Baseline data for the frequency and distribution of specific plans are difficult to obtain, but there appears to be some increase in interest in these plans and in their application to groups of employees not traditionally covered.

There are a limited number of surveys on the use of group incentive plans. They report that most organizations adopt these plans to improve productivity and financial outcomes and, more generally, to "revitalize the organization consistent with business strategy." These same surveys report that organizations that have adopted these plans believe that they have achieved the desired effects, but also acknowledge the importance of contextual factors such as employee involvement, information sharing, and ongoing marketing and communication to the employees covered. One survey acknowledged that design and implementation costs were high. None of these surveys reported employee perceptions about the equity or efficacy of variable pay plans.
PERFORMANCE-BASED PAY SYSTEMS: OVERALL FINDINGS AND CONCLUSIONS

Taken together, the evidence from research and practice suggests the following findings and conclusions about the effects on individual and organizational performance of pay for performance plans.

Findings: Individual Performance

The evidence on the effects of pay for performance, pieced together from research, theory, clinical studies, and surveys of practice, suggests that, in certain circumstances, variable pay plans produce positive effects on individual job performance. There is insufficient research to determine conclusively whether merit pay can enhance individual performance or to allow us to make comparative statements about merit and variable pay plans.

Conclusion: Individual Performance

We nevertheless infer that merit pay can have positive effects on individual job performance, on the basis of analogy from the research and theory on variable pay plans. These effects might be attenuated by the facts that, in many merit plans, increases are not always clearly linked to employee performance, agreement on the evaluation of performance does not always exist, and increases are not always viewed as meaningful. However, we believe the direction of effects is nonetheless toward enhanced performance.

Finding: Organizational Performance

There is some evidence from the private sector suggesting that gainsharing plans are associated with improved organizational performance. However, it is not possible from existing research to conclude that these plans cause performance changes, to specify how they do so, or to understand how the behavior of individuals under these plans aggregates to the organization level.

III: THE IMPORTANCE OF CONTEXT

Our reviews of performance appraisal and merit pay research and practice indicate that their success or failure will be substantially influenced by the broader features of the context in which they are embedded. Research on performance appraisal has recently turned to organizational factors that might support or hinder the appraisal system from functioning as intended. Research on pay plans stresses the context of the organization's personnel system, technological systems, and strategic goals.

Overall Findings

There is a broad consensus among practitioners, as well as some research evidence, that personnel systems in general and performance appraisal and pay systems in particular must exhibit "fit" or congruence to be effective.
Three categories of contextual factors of particular relevance to performance appraisal and pay for performance emerged from our reviews of research and practice: (a) the nature of the organization's work, or what might be called technological fit; (b) the broad features of the organization's structure and culture; and (c) external factors such as economic climate, the presence of unions, and legal or political forces exerted by external constituents.

Technological Fit

The strongest evidence on congruence has to do with the fit between appraisal and pay systems and the nature of work. The literature on the links between pay and individual motivation, for example, demonstrates the importance of job independence, concrete and easily measured products, and production standards that are perceived as fair (doable) to effective individual incentive pay plans. Only a limited number of jobs, mainly in some executive, sales, and manufacturing work, have proved to be amenable to this sort of performance measurement and incentive pay. Conversely, it has been shown that using highly specific individual performance appraisals and incentives with jobs that are complex, interdependent, and have multiple and amorphous goals can result in employees' ignoring important aspects of their jobs or distorting performance in order to meet the appraisal goals. This sort of gaming is a particular danger with objectives-based appraisal systems. Group incentives avoid some of the problem. They recognize the interdependent nature of work and focus on organization-level performance. However, they suffer from unclear links between individual actions and organization-level results.

Organizational Structure and Culture

Although there is little systematic evidence to suggest precisely what the congruence of pay system and organizational culture looks like, there is a growing body of case studies that look at organizational structure and culture, particularly studies of high-commitment organizations and of organizational innovation. The business policy literature, for example, describes two archetypal strategic postures (the dynamic firm and the steady-state firm) and the performance appraisal and pay systems that appear to go along with each. Firms pursuing innovation and growth tend to offer their employees a higher proportion of their pay in the form of incentives than do firms in steady state.
The more entrepreneurial firms tend to evaluate their managers and professionals on quantitative, organization-level performance goals and to offer high payouts if strategic goals are met. Studies of organizational structure confirm this pattern. They describe the entrepreneurial firm as emphasizing general skill, higher investment in recruiting than training, and performance measures tied to market outcomes. Retention is not a primary management goal. Firms pursuing a maintenance strategy tend to evaluate managers on more qualitative, individual behaviors. Their personnel practices emphasize internal skill development, the importance of work force norms, and the employee's long-term contribution. Such firms would seem to be well served by traditional performance appraisal and merit pay plans.

There are also theoretical literatures that suggest that organizations in highly institutionalized sectors or that rely greatly on public trust may be more likely to adopt very formal, precise performance appraisal systems. In such organizations, personnel and pay systems can have an important legitimizing function.

There is a considerable literature that supports these general patterns of association between performance appraisal and pay systems on the one hand and organizational strategy and structure on the other. However, all of this work is theoretical or descriptive and should be viewed as suggestive, but not necessarily generalizable.

External Forces

The final dimension of congruence has to do with external factors that constrain an organization's choice of evaluation and pay systems. One of the most relevant to federal policy makers is the widespread resistance of unions in the private sector to performance appraisal and pay for performance systems. Most surveys show that unionized employees are far less likely than nonunionized employees to be covered by incentive systems (including merit plans). To the extent that this changed in the 1980s, the incentive pay arrangements accepted by unions (e.g., profit-sharing) were not ones that differentiate among individual employees.

Also of particular salience to the issue of pay for performance is the role of external laws and regulations. Fair labor standards, occupational health and safety, and equal employment opportunity are a few of the areas of law that prescribe internal structures, policies, and procedures that may be more or less compatible with an organization's chosen evaluation and pay systems. Federal equal employment opportunity policy has had an enormous impact on personnel management in every organization of any size in the nation. In addition to these requirements, the federal government as an employer faces a set of constraints imposed by the laws and regulations surrounding its merit system. The desire to shield civil servants from the exigencies of politics has placed serious constraints on the managerial flexibility needed to make pay for performance work.

IV. IMPLICATIONS FOR FEDERAL POLICY

Since its formal adoption by the federal government, performance appraisal for merit pay has been a matter of continuing controversy and periodic amendment. One view of this experience is an explicit criticism of the federal government and its inability to "get right" what is now widely used in the private sector with (at least) less criticism. While there are many features of the merit pay system that could be improved, we do not attribute these failings to mismanagement or stupidity in implementation. Instead, we would emphasize the constraints, many of which derive from features unique to the federal sector.

The federal government faces special, if not entirely intractable, problems that work against any easy transferability of private-sector experience. The very term merit pay carries far more meaning in the context of a public civil service than in the private sector: above all, the absence of partisan political considerations in the determination of pay levels of career employees. Where private-sector practice relatively easily accepts manager-employee exchanges about performance objectives, both individual and organizational, such a practice in the public sector could be perceived as opening the civil service to partisan manipulation. Hence, one of the most difficult questions facing federal policy makers is whether and how the experience of private-sector organizations with performance appraisal and pay for performance plans is applicable to civil service organizations.

The portrait of high-commitment organizations that emerges from case studies highlights some fundamental differences between private firms in which performance-based pay seems to work well and the typical government agency. In high-commitment organizations, the following conditions appear to obtain:

Pay for performance would be one part of a total management system, which provides full financial and organizational support for effective administration of the plan;
The organization would be characterized by an emphasis on managerial discretion and flexibility and by the recognition that individual managerial authority is critical to effective performance appraisal;
The climate would be characterized by shared values and high levels of trust throughout the organization;
On the basis of those values, the ability to link individual performance and activities to organizational goals and objectives would be strong;
There would be widespread agreement about individual and organizational standards of success; and
There would be low turnover at the managerial levels.

Most of these conditions pose a problem for public-sector organizations because of the division of leadership between the political and career employees; the lack of managerial control over personnel and resource systems; the ambiguity of goals and performance criteria; and multiple authority centers for employee accountability.
The very publicness of government creates organizations that are at once more open to external influences and less able to respond to them. These conditions have led to a working environment in which managers are frustrated in their ability to make personnel decisions and employees are distrustful of the performance appraisal and pay allocation systems; most do not see a link between their performance and their pay.

The issue of divided leadership provides a particularly salient example of the inherent difficulties of creating a successful merit pay system in the federal context. A continuing theme in modern government has been the need to make the bureaucracy more responsive to the chief executive. One tool available to presidents is appointing employees to positions outside the career civil service. But if the presence of political executives in leadership positions in federal agencies institutionalizes the continuing mandate for change, the authority and communication structures within those agencies often create obstacles to change (Ingraham, 1987). For example, the "dual executive" characteristic of many public agencies tends to create a system in which decisions are made according to short-term policy goals at the upper levels of the organization and according to longer-term program goals elsewhere. In many ways federal agencies function as two loosely coupled organizations with authority, control, and communication between them much more tenuous than prescribed by the classic paradigm. Even if the policy goals were not so often diffuse, unclear, and contradictory (Heclo, 1978; Ingraham, 1987), the ability to communicate them to the career bureaucracy is attenuated by the lack of experience and short tenure of many political executives (Heclo, 1978). All too often, in the judgment of experts in federal management, organization-wide goals are either not articulated or are not communicated down through the organization to the career employees responsible for their implementation. Functioning with two sets of managers makes congruence and coherence hard to achieve. In most models of organizational fit, there is a single leadership that creates a coherent culture and shared values that are necessary conditions to enable a successful performance appraisal system.

The issue of organizational boundary (at which the controlling influences shift from internal to external actors), particularly as it relates to the ability to control or direct organizational resources, is also a central concern. Many have observed that public organizations are notable for the porosity of their boundary (Waldo, 1971; Kaufman, 1978; Gawthrop, 1984).
The federal government has been structured deliberately to disperse authority among competing institutions (Allison, 1983); members of Congress, administration officials, interest groups, concerned citizens, and others can, and do, influence bureaucratic actors. This further obfuscates goals and objectives within the organization. Of equal significance is the fact that many of these external influences, but most notably the Congress, have a controlling influence on the resources available to the organization, thus further complicating the authority issue.

Other institutional influences that profoundly shape federal agencies and their activities include civil service laws and regulations that impose great complexity and rigidity on the system. Recruiting, testing, hiring, firing and rewarding are all constrained in the federal government (National Academy of Public Administration, 1983). As a result of these externally imposed constraints, managerial discretion has traditionally been limited and has, in fact, been discouraged by the provisions of the merit system (Ingraham and Rosenbloom, 1990). Although there is emerging evidence that some federal managers do use whatever flexibilities are available, including those provided by existing performance appraisal systems, there is also strong evidence that procedural constraints deter all but the strongest of heart (unpublished document, U.S. General Accounting Office, 1990).

A frequently cited example of the boundary problem is demonstrated by the fact that Congress retained statutory control over development of the federal government's performance appraisal system, rather than delegating both the development and implementation components to the Office of Personnel Management. The rationale was to balance managerial discretion with employee rights in the context of a system that made it easier for agencies to fire incompetent employees; the result was to hobble the decision making of managers. On one hand, Civil Service Reform Act legislation provided the requirement for detailed performance appraisal standards that could be used by managers as proof of unsatisfactory performance. On the other hand, the managers' ability to act regarding unsatisfactory performance was limited in the statute by providing employees with strong substantive rights, such as the opportunity to improve before an unacceptable performance action can be taken and the ability to appeal performance appraisal ratings both within the agency and externally to the Merit Systems Protection Board. This has led to situations in which, at best, a number of years are required to release an inadequate employee, and the costs borne by managers serve as a strong disincentive against appraising mediocre performance accurately.

Another feature of the federal context that warrants consideration is whether the dominant motivations among employees are comparable to those of private-sector workers who work where pay for performance has been implemented. Although there has been a long tradition of simply applying private-sector motivation theory and techniques to the public sector, some recent studies are finding different sources for motivation and different motivational patterns among public employees.
Perry and Wise (1990) explore the role of public service as a motivator; Rainey (1990) documents a fairly consistent pattern of differences in public and private managers in relation to money, job satisfaction and security, and organizational commitment. In a 1982 review article, Perry and Porter noted that public-sector employees had higher achievement needs and tended to value economic wealth less than do entrants into the private sector. Furthermore, there is some evidence that public managers, particularly those at the highest levels of the organization, are keenly attuned to public perceptions of their effectiveness and the overall usefulness of the policies and programs they administer (Ingraham and Barrilleaux, 1983). Federal Employee Attitude Surveys in 1979 and 1980 demonstrated that upper-level managers perceived generalized "bureaucrat bashing" as a personalized attack. More recent studies by the Merit Systems Protection Board (1989) and the U.S. General Accounting Office (1987) indicate that managers continue to tie their overall job satisfaction to their perceptions of "appreciation" by the public. These findings suggest that policy makers would do well to give their attention to nonmonetary motivators in concert with their plan to strengthen the ties of pay to performance.

Finally, one of the most important contextual factors that governs how any new performance appraisal or pay for performance system is likely to function is the less than satisfactory experience of federal employees with the merit pay systems implemented during the last 12 years.

CONCLUSIONS

We have conducted a wide-ranging study of performance appraisal and pay for performance in the private sector to help the director of the Office of Personnel Management and other federal policy makers as they rethink the Performance Management and Recognition System. What we have learned does not provide a blueprint for linking pay to performance in the federal sector or even any specific remedy for what ails PMRS. Instead, we conclude with some general suggestions about priorities.

Performance appraisal ratings can influence many personnel decisions, and thus care in the development and use of performance appraisal systems is warranted. There is, however, no obvious technical (psychometric) solution to the performance management issues facing the federal government. Further refinements in the technology of performance appraisal (e.g., extensive new job analysis, modifications of existing rating scales or rater training programs) are unlikely to provide substantially more valid and accurate appraisals than those currently in force, particularly for managerial and professional jobs. There is also no evidence that one particular appraisal format is clearly superior to all others. For example, we do not know that the objective-based format for managerial appraisal, so popular in the private sector, yields more (or less) valid appraisals than the supervisory ratings used in the government. There appears to be at least as much effort expended on performance appraisal in the federal government as elsewhere. More generally, the pursuit of further psychometric sophistication in the performance appraisal system used in the federal government is unlikely to contribute to enhanced individual or organizational performance.
Where performance appraisal is viewed as most successful in the private sector, it is firmly embedded in the context of management and personnel systems that provide incentives for managers to use performance appraisal ratings as the organization intends. These incentives include managerial flexibility or discretion in rewarding top performers and in dismissing those who continually perform below standards. When performance appraisal ratings are used to distribute pay (as in a merit plan), the size of the merit pay offered allows managers to differentiate outstanding performers from good and poor performers, and thus provides them with incentives to differentiate. For example, top performers may receive 10 percent of their base salary in merit pay; good performers, 5 percent; and poor performers, no merit increase. Finally, managers are themselves assessed on the results of their performance appraisal activities.

We have been struck by the apparent contrast between incentives for private and federal managers to use performance appraisal and merit plans effectively. Whatever incentives there are for federal managers seem currently dwarfed by the disincentives.

In order to motivate employees and provide them incentives to perform, a merit plan or any pay for performance plan must theoretically (a) define and communicate performance goals that employees understand and view as doable; (b) consistently link pay and performance; and (c) provide payouts that employees see as meaningful. These conditions seem straightforward, and the notion of pay for performance thus becomes deceptively simple. Our reviews of research and practice indicate, however, that selecting the best pay for performance plan and implementing it in an organizational context so that these conditions are met is currently as much an art as a science. We cannot generalize about which pay for performance plans work best, especially for the federal government, with its considerable organizational and work force diversity. We can suggest that, given this diversity and the importance of matching pay for performance plans to organization context, federal policy makers consider:

Decentralizing the design and implementation of many personnel programs, including appraisal and merit pay programs, within the framework of central policy guidelines and to the extent possible given the government's legitimate concerns about facilitating interagency mobility, standardization and comparability, and equity.

Supporting careful, controlled pilot studies of a variety of pay for performance systems in a variety of agencies. These studies would serve to identify important design, implementation, and evaluation issues for users, policy makers, and the research community, along with incentives to investigate these issues. They could take a variety of forms, but to be useful must provide careful measures of pre- and postintervention conditions.

Ensuring fair and equitable treatment for all employees is an important objective of any personnel system. Yet the heavily legalistic environment surrounding the federal civil service has led to dependence on formal procedures and an elaboration of protections, requirements, and procedures that ultimately provide powerful disincentives for managers to use personnel systems as the organization intends. Although these protections are meant to ensure employee equity, it is not clear that their proliferation provides federal employees with a greater sense of equity than seen in many private-sector organizations. Effective reform of personnel management and pay systems in the federal government may well need to be part of a more fundamental rethinking of past notions of political neutrality, merit, and their protection in the civil service.

Our entire review has stressed the importance of viewing performance appraisal and merit pay as embedded in broader pay, personnel, management, and organizational contexts. For example, while by no means the only relevant contextual factor, the issue of comparability of federal base salaries with pay for equivalent private-sector jobs may pose severe problems for the acceptance of merit pay or any other pay for performance system if the promise of recently enacted legislation proves illusory. We realize that the broader changes suggested by an analysis of context can be costly, but we suggest that making programmatic changes to the Performance Management and Recognition System in isolation is unlikely to enhance employee acceptance of the system or improve individual and organizational effectiveness significantly and, in the long run, may prove no less costly.