Findings and Conclusions
The Office of Personnel Management (OPM) requested this study in preparation for reauthorization hearings, scheduled for 1991, on the troubled Performance Management and Recognition System (PMRS). Our charge was to review the research on performance appraisal and on its use in linking compensation to performance. To supplement the research findings, we were asked to look at private-sector practice as well, to see if there are successful compensation systems based on performance appraisal that might provide guidance for policy makers in reforming PMRS. We construed this charge as requiring an investigation of whether and under what conditions performance appraisal in the context of merit pay systems could assist the federal government in managing performance, fostering employee equity, improving individual and organizational effectiveness, providing consistent and predictable personnel costs, and—not least—enhancing the legitimacy of public service.
The Civil Service Reform Act (CSRA) of 1978 provides the backdrop for this study. That act required the development of job-related and objective performance appraisal systems, the results of which were to be used as a basis for training, promotion, reduction in grade, removal, and other personnel decisions. The act also created performance-based compensation systems for middle and senior managers. Designed to revitalize the civil service, in part by bringing private-sector management strategies to the federal bureaucracy, the reforms have by most measures fallen short of expectations, despite fairly substantial midcourse corrections. Yet the belief in merit principles remains strong, as does the expectation that performance appraisal and linking compensation to performance can provide incentives for excellence.
Policy makers already have extensive documentation of the problems and employee dissatisfactions with the Merit Pay System (MPS) and the successor PMRS: consistent underfunding of the merit pool, the lag of merit salaries behind the salaries of employees still under the General Schedule, the widely held and annually reinforced belief that federal salaries have fallen far behind their private-sector equivalents, and the perceived politicization of the civil service and the merit pay system that seemed to be an outgrowth of the Civil Service Reform Act. This study is intended to supplement that knowledge and experience with information drawn from the private sector, beginning with a systematic investigation of the research on performance appraisal and pay for performance systems and including an assessment of private-sector practices in the years since the passage of the Civil Service Reform Act.
We began the report with a cautionary note about the difficulties inherent in trying to measure social phenomena in general, and about the particular evidentiary obstacles presented by the subject at hand (Chapter 3). Our research has taken us into the literature of a variety of disciplines as we tried to piece together from fragmentary evidence the best possible scientific understanding of the adequacy of performance appraisal as a basis for making personnel decisions and of the effectiveness of using pay to improve performance. Investigation of the effects of linking compensation to performance led us from the question of individual effectiveness to organizational effectiveness and required an examination of both merit and variable pay plans. Recent research trends also broadened the scope of the study beyond measurement instruments and appraisal processes to an examination of context and the attempt to identify conditions under which performance appraisal and merit plans operate best.
In the course of our investigations it became clear that the theoretical and empirical literatures have posited at least four different types of benefits in discussing performance-based pay systems: (1) positive effects on the work behaviors of individual employees (including decisions to join an organization, attend, perform, and remain); (2) increased organization-level effectiveness; (3) facilitating socialization and communication; and (4) enhancing the perceived legitimacy of an organization to important internal and external constituencies.
We have been ecumenical in pulling together evidence and information that speak to these criteria for gauging the effectiveness of an organization's performance appraisal and pay systems. The preceding pages have taken account of theory, empirical research, and clinical studies not only from many disciplines, but also from any research topics that seemed relevant. The formal evidence has been supplemented with information about current practices in private-sector firms.
The study's findings and conclusions are presented in this chapter as follows. The first section deals with the science and practice of performance appraisal, focusing first on measurement research, then on applied research, and ending with overall findings and conclusions. The second section covers performance-based pay systems, focusing first on evidence from research, then on findings from practice, and again ending with overall findings and conclusions. The third section deals with the influence of context on performance appraisal and merit pay systems. The fourth section deals with the implications of the study's findings and conclusions for federal policy making.
I. THE SCIENCE AND PRACTICE OF PERFORMANCE APPRAISAL
The evaluation of workers' performance is directed toward two fundamental goals. The first of these is to create a measure that accurately assesses the level of an individual's performance on something called the job. The second is to create a performance measurement system that will advance one or more operational functions in an organization: personnel decisions, compensation policy, communication of organizational objectives, and facilitation of employee performance.
Although all performance appraisal systems encompass both goals, the two are represented in the literature by two distinct, albeit overlapping, lines of development in theory and research. In part the difference in approach to performance appraisal reflects disciplinary orientation, in part historical development. One approach grows out of psychometrics and the measurement tradition, with its emphasis on standardization, objective measurement, psychometric properties (validity, reliability, bias, etc.). The other comes from the more applied fields—human resource management, industrial and organizational psychology, organization science, sociology—and focuses on the organizational context and the usefulness of performance appraisal for such things as promoting communication between managers and employees; clarifying organizational goals and performance expectations; providing information for managers to guide retention, dismissal, and promotion decisions; informing performance-based pay decisions; and motivating employees.
Both research fields are interested in the use of rating scales to evaluate job performance, although they have tended to focus on different questions and have different expectations of performance appraisal. At the risk of overemphasizing the distinctions, we have presented our discussion in this report in two parts, one focused on the measurement research, the second on the applied research. The difference between them, however, is a matter of general orientation, not of unrelated polarities.
Of the two goals, accuracy and organizational utility, most of the research in the measurement tradition has concentrated on aspects of accuracy, the implicit assumption being that if the measures are accurate, the functional goals will be met. Research in the more applied fields tends to focus not on the measurement instrument and the accuracy of inferences drawn from the measurement, but on the whole operational system of which it is a part. The applied or management perspective tends to evaluate the performance measurement component by how well the whole operates, e.g., whether the system distributes pay as it was designed to, whether the system is accepted by all players. Accuracy of performance measurement tends to be ignored, not because it is considered unimportant, but because it is assumed, at least implicitly, that if the system-level criteria are met, then the measurement component must be sufficiently accurate.
Apart from our own convenience in presenting findings from the measurement and applied traditions separately, it is important that federal policy makers, managers' groups, and employees understand these differences and tailor their language and expectations appropriately. Current federal policy is couched in the language of the measurement tradition. In the manner of the 1978 Uniform Guidelines on Employee Selection Procedures, which elaborates the requirements of Title VII of the Civil Rights Act of 1964, Office of Personnel Management regulations implementing the Civil Service Reform Act of 1978 called on federal agencies to develop job-related and objective performance appraisal systems. The regulations required that performance standards and critical job elements be specified consistent with the duties and responsibilities outlined in an employee's position description. OPM suggested that performance standards be based on a job analysis to identify the critical elements of a job, and that each agency develop a method for evaluating its system to ensure its validity. Although courts have not demanded of performance appraisal systems the degree of rigor required of tests and other selection instruments, the terms validity, objectivity, and job-relatedness are all drawn from the context of psychological testing and performance measurement.
The Measurement Tradition
Psychometrics grows out of the theory of individual differences, namely, that humans possess characteristics and traits (e.g., height, verbal ability, upper-body strength); that each possesses these characteristics in some amount; and that the amounts can be measured. Drawing on findings in the biological sciences about the distribution of characteristics in a given plant or animal population, the founders of psychological measurement developed statistical techniques for expressing human mental characteristics and for relating the standing of one individual to that of a population of individuals. From the beginning, these theories and measurement techniques were thought to hold great promise for matching people to jobs and for measuring job performance. They were also especially compatible with the concept of meritocracy and the particularly American idea that jobs ought to be allocated on the basis of talent or ability and not as a function of family connection, social class, religious persuasion, or other criteria that are irrelevant to job performance.
In the realm of psychometrics, the scientific imperative is accuracy of measurement. Standardized multiple-choice tests, the most familiar type of instrument in this mode, are a product of that drive for precise measurement.
Just as test administration can be controlled to provide a high degree of consistency and uniformity in the conditions of testing, so does the format of the tests constrain response possibilities to allow direct comparison of the performance of all test takers. Over the years a variety of sophisticated statistical analytics have been developed to evaluate the consistency of measurement (reliability analyses) and the accuracy and relevance of inferences drawn from the measurement results (validity analyses).
Prior to 1980, most research on performance appraisal was generated from within the psychometric tradition. Performance appraisals were viewed in much the same way as tests: they were evaluated against criteria for validity and reliability and freedom from bias, and a primary goal of the research was to reduce rating errors.
Our findings on how closely performance appraisal has been found to conform to these aspirations of measurement science follow.
Research on Job Analysis
Findings: Job Analysis
Applied psychologists have used job analysis as a primary means for understanding and describing job performance. There have been a number of approaches to job analysis over the years, including the job element method, the critical incident method, the Air Force task inventory approach, and methods that rely on structured questionnaires to describe managerial-level jobs in large organizations. All of these methods share certain assumptions about good job analysis practices, and all are based on a variety of empirical sources of information.
There is an enormous body of job analysis research, the preponderance of which has been conducted for relatively simple, concrete jobs—military enlisted jobs, auto mechanics, sales, and other jobs characterized by observable behaviors or tangible products. The literature on complex, interactive, cognitively loaded jobs, and specifically on managerial jobs, is comparatively sparse and less conclusive.
With few exceptions, the analysis of managerial performance is cast at a high level of abstraction; far less attention has been given to the sort of detailed, task-centered definition typical of simpler, more concrete jobs. This global focus is reflected in managerial appraisal instruments, which typically present very broad performance dimensions for evaluation.
A job may be more or less routinized, structured, and constrained by the requirements of machinery or defined by training, but the evaluation of job performance will always depend in the final analysis on external judgments about what is most important (number of units produced or quality of the units produced; everyday performance or response to the infrequent emergency; single-minded pursuit of profits or avoidance of environmental damage).
As a consequence, describing job performance is not a straightforward or obvious process. Even for simple jobs, it involves judgment and inference combined with careful study of the job by such means as interviews, observation, and collection of data on tasks performed and skills required. For managerial jobs, the task of adequate description becomes even more difficult, because much of what a manager does is fragmented, amorphous, and involves unobservable cognitive activities.
Job descriptions and the appraisal systems based on them reflect organizational values and judgments as well as some independent constellation of job tasks and performance requirements. To speak of objectivity with regard to job analysis and performance appraisal does not imply the absence of human judgment, but rather the absence of irrelevant or inappropriate judgments.
Conclusions: Job Analysis
The commonly made dichotomy between objective and subjective measurement is more misleading than useful in the field of performance appraisal.
Organizations cannot use job analyses or other methods of specifying critical elements and performance standards as replacements for managerial judgment; at best such procedures can inform the manager and help focus the appraisal process.
The abstract character of the behaviors (e.g., leadership, oral communications, overall performance) that typifies much of the research on managerial job performance conveys a message from the research community about the nature of managerial performance and about the infeasibility of capturing its essence through lists of tasks, duties, and standards that can be objectively counted or quantified. Reliance on global measures guarantees that evaluation of a manager's performance is of necessity based on a substantial degree of judgment. An overly literal interpretation of the requirements of the Civil Service Reform Act, one that takes job-related to mean job-specific or treats objective as the opposite of judgment, would be particularly destructive for managerial appraisal.
Research on Psychometric Properties
Reliability analysis provides an index of the consistency of measurement, from occasion to occasion, from form to form (if there are several versions of a test or measure that are all intended to measure the same thing), or from rater to rater. The first- and last-mentioned types of reliability analysis are particularly pertinent to performance appraisal. If the measurements are to have any meaning, one would expect the rater to reach the same judgment from one week to the next (assuming the employee's performance did not change significantly), just as one would hope that several raters would reach substantially the same decision about a single individual's performance. Data on reliability derive in part from operational settings and in part from laboratory experiments or from research projects undertaken in field settings, using special rating instruments developed for the purpose and administered with the proviso that no operational decisions will be based on the results.
There is substantial evidence in the research literature to support the premise that supervisors are capable of forming reasonably reliable estimates of their employees' overall performance levels. For the mostly nonmanagerial jobs studied over the years, raters show substantial agreement in rating workers' performance. There is also some data showing interrater agreement on managerial performance.
It is important to remember, however, that consistency among raters cannot be taken simply at face value as proof of the accuracy of performance appraisal procedures; it can also cloak systematic bias or systematic error in valuing performance. Systematic bias is difficult to detect, the more so if it is the product of unexamined views and conventional assumptions. There is evidence of such bias, fragmentary but suggestive, in a small number of studies showing that white supervisors tend to rate white employees as a group somewhat higher than black employees and, conversely, that black supervisors rate black employees higher on average. The studies have not been able to distinguish between real performance differences and rater bias but suggest the presence of both, although the variance accounted for by bias appears to be quite small.
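The distinction between rater agreement and accuracy can be made concrete with a small numerical sketch. Everything in it is invented for illustration: the "true" performance scores (which are never directly observable in practice), the two raters, and the assumption that both raters mark down one group of employees by a full scale point. Interrater reliability is indexed here by a simple Pearson correlation, which is only one of several indices in use.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: eight employees, the first four in group A and the
# last four in group B. The two groups have identical true performance.
true_perf = [4, 5, 3, 4, 4, 5, 3, 4]

# Assumption for illustration: both raters undervalue group B by one point.
rater_1 = [4, 5, 3, 4, 3, 4, 2, 3]
rater_2 = [4, 4, 3, 5, 3, 4, 2, 3]

# Agreement between the raters (interrater reliability)
agreement = pearson(rater_1, rater_2)

def mean_error(ratings, truth, indices):
    """Average of (rating - true score) over the selected employees."""
    return sum(ratings[i] - truth[i] for i in indices) / len(indices)

group_b = range(4, 8)
bias_b_rater_1 = mean_error(rater_1, true_perf, group_b)
bias_b_rater_2 = mean_error(rater_2, true_perf, group_b)

print(f"interrater correlation: {agreement:.2f}")
print(f"mean error for group B: {bias_b_rater_1:+.1f} and {bias_b_rater_2:+.1f}")
```

In this constructed case the two raters agree closely with each other even though both understate the performance of group B, which is the sense in which high agreement can cloak a shared systematic error.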
From the psychometric perspective, the central question posed by any measurement system is whether it produces an accurate assessment of relevant performance. Validity is the technical term used to refer to the degree of accuracy and relevance that characterizes a measurement procedure. It is not meant to imply a static characteristic of a test or rating scale; rather, the term has to do with the structure of meaning that can be built up to support the assessment results. Validity, therefore, is an accretion of evidence from many sources; it describes a research process that gradually lends confidence to the interpretations or judgments made on the basis of the measure.
In the realm of job performance, validation begins in an important sense with an analysis of the job or category of jobs for which performance measures are to be developed. If an employment test or appraisal system can be linked to important aspects of the job, say typing accuracy and speed or a sonar technician's skill at recognizing patterns, then one building block is in place. The evidence of interrater reliabilities described above can provide another sort of clue to the accuracy of measurement systems like performance ratings, hands-on job sample tests, and other procedures that depend on an observer to judge the performance. Statisticians and psychometricians have developed an array of sophisticated statistical methods to explore the relationships between the test or measure under study and other relevant variables (correlational and regression analysis, multivariate analysis and ANOVA techniques).
Performance appraisal does not lend itself to the full complement of validation strategies that have been found useful for standardized tests. Criterion-related validity, for example, is rarely as useful for evaluating performance appraisals as it is with selection tests. The strength of the approach lies in showing that a healthy relationship exists between, say, test results and some independent, operational performance measure (e.g., college admissions test and grade-point average). When the measure being validated is itself a behavioral measure, it is difficult to find relevant operational measures for comparison that have the essential independence. As a consequence, what is frequently considered a compelling type of evidence in validation research is usually not possible for performance appraisals. Furthermore, in those limited conditions in which independent criteria do exist, the jobs themselves tend to be much more simple and straightforward than those for which appraisals are typically used.
It is, however, possible to compare performance appraisals to other measures of job performance using the conventional statistical methods of psychometric analysis. Recent military job performance measurement research, for example, demonstrated moderate correlations between supervisor ratings and each of the other types of criterion measures developed (hands-on test scores, training grades, written job knowledge tests), which lends credibility to the claim that carefully developed performance appraisals can bear a meaningful degree of relationship to actual job performance.
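As a rough illustration of the kind of comparison described above, the following sketch correlates supervisor ratings with three other criterion measures for the same job incumbents. All scores are invented, the variable names are hypothetical, and the resulting coefficients say nothing about the magnitudes found in the military research; the sketch shows only the mechanics of the comparison.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical 5-point supervisor ratings for ten job incumbents
supervisor_rating = [3, 4, 2, 5, 3, 4, 2, 4, 3, 5]

# Hypothetical scores for the same ten people on three other measures
criteria = {
    "hands_on_test": [62, 70, 58, 75, 71, 66, 60, 64, 59, 80],
    "training_grade": [78, 85, 80, 88, 74, 90, 72, 79, 83, 86],
    "job_knowledge_test": [55, 60, 52, 58, 61, 57, 50, 63, 54, 66],
}

# Correlate the ratings with each criterion measure in turn
correlations = {
    name: pearson(supervisor_rating, scores)
    for name, scores in criteria.items()
}

for name, r in correlations.items():
    print(f"supervisor rating vs {name}: r = {r:.2f}")
```

Positive coefficients across several independent measures are the pattern that lends credibility to ratings; a single coefficient, taken alone, would be much weaker evidence.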
Supervisor ratings have been used in thousands of studies designed to examine the power of cognitive and other ability tests to predict job performance—in other words, they have been used to validate employment tests. These studies consistently show a low to moderate observed correlation between employment tests and supervisor ratings; job incumbents who score well on the test tend also to receive good ratings and those with low test scores tend to be rated as mediocre performers. While admittedly circular, this relationship provides further indirect evidence that supervisors can rate their employees with some degree of (but by no means perfect) accuracy; whether they will do so in an operational setting is another matter.
Rating scale formats vary widely and define performance dimensions at varying levels of specificity. Commonly used rating dimensions include personal traits (e.g., initiative, leadership, perseverance), job behaviors (e.g., follows safety procedures in engine room, financial management, interpersonal relations), and performance results (e.g., quality of work, quantity of work). The number of scale points has ranged as high as 11, but most appraisal scales have between 3 and 5.
In terms of scale format, a general distinction can be made between scales that include specific behavioral examples of good, average, and inadequate performance and those that do not. The latter, called graphic scales, simply list the dimension of interest and present a number of scale points along a continuum. The scale points, or anchors, can be numerical or adjectival (e.g., consistently superior, average, consistently unsatisfactory).
Behaviorally anchored rating scales (BARS) were developed to reduce some of the rating error typical of graphic scales. Proponents thought that BARS would help to clarify the meaning of the performance dimensions used and would help calibrate various raters' definition of what constitutes superior, average, and unsatisfactory performance on the dimension. It was also felt that the behavioral descriptions would discourage the tendency to rate on broad, general traits by focusing attention on specific work behaviors. Mixed standard scales, also behaviorally based, went one step further in trying to control rater error, particularly bias and leniency. These scales present the behavioral descriptions in random order and not in conjunction with a particular performance dimension. The rater's responses are computed by someone else into a performance score for each dimension measured.
Findings: Rating Format
Reviews of the relevant research suggest that behaviorally based scales have not met early expectations. Although the research findings are not entirely consistent, the consensus seems to be that scale formats have relatively little impact on psychometric quality, when impact is indexed by interrater agreement, rater errors, and convergent and discriminant validity of ratings. In other words, the use of behavioral versus nonbehavioral language and the physical arrangement of the scale do not appear to be critical in terms of the validity of the overall judgments about performance.1
This proposition is given support by the research on the cognitive processes involved in performance appraisal done in the 1980s. This body of research suggests that the distinction between behaviors and traits is not as salient as once thought. Raters appear to rely less on specific behaviors than on their general evaluation of each employee when they make ratings, regardless of the focus of the rating scale. These general evaluations substantially affect raters' memory for and evaluation of actual work behaviors.
Finding: Job-Specific Versus Global Ratings
In litigation dealing with performance appraisal, the courts have shown a clear preference for job-specific dimensions. There is little research that directly addresses the validity of ratings obtained on job-specific, general, or global dimensions. Indirect evidence suggests that raters may work at the global level in any case. First, there is the evidence from the research on cognitive processes described in the rating format findings above. In addition, there is a substantial body of research on halo error in ratings that shows that raters do not, for the most part, distinguish between conceptually distinct aspects of performance in rating their workers. This suggests that similar outcomes can be expected from rating scales that use global or job-specific performance dimensions.
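One simple way to see what halo looks like in rating data is to check how strongly ratings on supposedly distinct dimensions move together. The sketch below uses the average pairwise correlation among dimensions as a rough halo indicator; that choice of index is an assumption made here for illustration, and the ratings are invented. A high value may reflect halo, but it may also reflect dimensions that are truly related, so the number cannot settle the question by itself.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings by one supervisor of six employees on three
# conceptually distinct dimensions (5-point scale)
ratings = {
    "quality_of_work": [4, 3, 5, 2, 4, 3],
    "quantity_of_work": [4, 3, 5, 2, 3, 3],
    "interpersonal_relations": [5, 3, 5, 2, 4, 3],
}

# Correlate every pair of dimensions, then average the coefficients
pairwise = {
    (a, b): pearson(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}
mean_intercorrelation = sum(pairwise.values()) / len(pairwise)

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
print(f"mean inter-dimension correlation: {mean_intercorrelation:.2f}")
```

When the dimensions correlate this strongly, the separate dimension scores carry little more information than a single global rating would, which is consistent with the inference drawn in the finding.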
Finding: Number of Scale Points or Anchors
The weight of the evidence suggests that the reliability of ratings drops if there are fewer than 3 or more than 9 rating categories. Recent work indicates that there is little to be gained from having more than 5 response categories. Within that range (3 to 5), there is no evidence that there is one best number of scale points in terms of scale quality.
Conclusion: Psychometric Properties
The combination of research on job analysis, research on the reliability of appraisal results, and the direct and indirect evidence of a modest relationship between performance ratings and other sorts of measures (employment tests, other measures of job performance) leads us to conclude that the performance appraisal process, while by no means high-precision measurement, can achieve moderate levels of accuracy within the assumptions of the measurement tradition.
The Applied Tradition
The focus of psychometric theory and research tends to be on the rating instrument, its measurement properties, and standardization of raters to reduce error. Researchers in the organizational sciences and human resource management tradition, which is more attuned to applied settings and operational systems, concentrate more on the appraisal system and how it functions to serve organizational ends. From this point of view, performance ratings are not the equivalent of testing technology, and the concentration of research energies on questions of job analysis, scale development, scale format, and measurement precision is misguided.
There are others closer to the measurement tradition who also have begun to feel that the psychometric lines of inquiry have become arid and are unlikely to bring about large additional improvements in the way performance appraisals are used in organizations (Banks and Murphy, 1985; Ilgen et al., 1989). A number of industrial psychologists in the last decade have begun to move away from the traditional view of performance appraisal as a measurement problem; rather than treating it as a measurement tool, they have begun to look on performance appraisal as a social and communication process (Murphy and Cleveland, 1991). Although such scholars do not reject the idea of accuracy, they tend to take a more commonsense approach, talking of the "relevance" of the appraisal to job performance, and to concentrate much more on the contextual factors that support or distort appraisal systems.
From this perspective, the interesting research questions about performance appraisal systems are whether they enrich managerial judgment and improve employee understanding of organizational goals and standards of performance; encourage more communication between managers and employees; communicate a sense of equity and fair play in the distribution of rewards and penalties by making visible the grounds of these decisions; and enhance employee trust and acceptance. While none of these questions can be divorced from the accuracy-validity issues, the answers tend to be sought in evidence of system-level outcomes. Research on the effectiveness of performance appraisal looks at such questions as employee attitudes toward the system, the degree to which it serves individual needs (feedback, employee development) or organizational needs (communication of mission, meritocratic principles), and the degree to which it enhances (or destroys) cohesion in the work unit or organization. And, as many of these points of emphasis indicate, there is a great deal of emergent interest in the organizational context in which appraisals occur.
Although this reorientation is quite recent among applied psychologists, our review of the literature included several bodies of research in organizational psychology and management science that contribute to an understanding of how appraisal systems function as part of an organization's performance management system. These include: (1) performance appraisal and motivation, (2) approaches to assisting supervisors in making high-quality ratings, and (3) the types and sources of rating distortion that can be anticipated in an organizational context, particularly when the results of the performance appraisal are linked to decisions about employees' pay increases.
Performance Appraisal and Motivation
Information about performance is believed to influence work motivation in three ways. First, in expectancy theory, performance information is thought to provide the basis for the employee to form beliefs about the causal connection between performance and pay. Second, performance information is believed to affect motivation by creating a sense of accomplishment; this sense of accomplishment provides an incentive to maintain high performance. Third, it is proposed that performance information provides cues to the employee about which behaviors should be continued and which should be dropped or modified.
Findings: Performance Appraisal and Motivation
The empirical research needed to support these motivational models is ambiguous as well as spotty. There is some survey data, including data on the federal Performance Management and Recognition System, that indicates that the feedback from performance appraisal helps some employees understand the job and performance expectations better. Whether that translates into better performance is unclear. At the same time, there is survey evidence indicating that appraisal information is less likely to be an accurate source of information than informal interactions with the supervisor, talking with coworkers, specific indicators provided by the job itself, and personal feelings.
The performance feedback literature, which also draws heavily on survey data, indicates that the credibility of the supervisor is crucial to acceptance of appraisal information. That credibility appears to depend heavily on the supervisor's perceived degree of knowledge about the employee's job and degree of interest in the employee's welfare.
A frequent research finding is that employees rate their own performance higher than do their supervisors. This is supported by evidence that people are likely to accept positive information about themselves and to reject negative information. Both of these inclinations would tend to dilute the motivational influence of any critical performance appraisals.
Approaches to Increasing Rating Quality
Several approaches have been used to increase the quality of performance ratings. These have included developing training programs for supervisors responsible for providing performance appraisals and developing appraisal scales that explicitly guide the rater through both performance observation and performance assessment.
Finding: Increasing Rating Quality
The research results on rater training are mixed. A number of recent research reviews have concluded that rater training has not been highly effective in increasing the accuracy of ratings. However, there is some contrary evidence suggesting that training can lead to more accurate ratings—particularly training that focuses on the rating process and on the use of specific rating tools. Thus training seems indicated if the performance appraisal system involves scales that require complicated procedures or calculations.
Sources of Rating Distortion
Performance ratings are subject to distortion from many quarters, no matter how carefully designed the appraisal instrument. The measurement research has concentrated on statistical analysis to detect rater bias and rater errors such as halo and leniency. The organizational context adds greatly to our understanding of likely sources of distortion. It is widely assumed, for example, that the uses of the rating data in an organization will influence the appraisal process and outcomes. There are also strains in the motivational literature suggesting that supervisors distort ratings, among other reasons, to achieve outcomes they value, to bolster feelings of fairness in the work group, or to avoid demotivating employees with brutal ratings.
Findings: Sources of Rating Distortion
There is evidence from both laboratory and field studies to support the assumption that the intended use of performance ratings influences results. The most consistent finding is that ratings used to make operational decisions (e.g., pay, promotion) are more lenient than ratings used for research purposes or for feedback.
While the predictions from the motivational literature seem reasonable, empirical research on motivational factors in rating distortion is understandably rare. Little is known about the factors actually considered by raters when they decide how to fill out their rating forms. There is some revealing clinical evidence, however. A number of researchers have reported, based on interview data, that supervisors consciously manipulate appraisals to achieve desired outcomes, such as maximizing the chances that deserving employees get promoted.
Whatever the exact nature of the environmental sources of rating distortion, organizations have adopted a number of devices to deal with it. Some private-sector firms deal with rating inflation by requiring a forced distribution in which the majority of ratings are allocated to the middle two or three categories; this provides for only a few outstanding ratings and encourages a few less-than-satisfactory ratings. Some companies decouple the performance rating from pay decisions by interposing a negotiation among relevant supervisors to rank all employees with similar jobs, thereby hoping to combat inflation and lessen the negative consequences of disappointing pay outcomes on the relationship of supervisor and employee.
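The forced-distribution device described above can be illustrated with a minimal sketch. The five-point scale, the choice of middle categories, and the 10 percent cap on top ratings are all hypothetical assumptions for illustration; the firms reviewed did not report specific thresholds.

```python
from collections import Counter


def meets_forced_distribution(ratings, middle_categories, top_category, max_top_share):
    """Check a set of ratings against a hypothetical forced-distribution rule.

    The rule assumed here: a majority of ratings must fall in the middle
    categories, and the share of top ratings must not exceed max_top_share.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    counts = Counter(ratings)
    total = len(ratings)
    middle_share = sum(counts[c] for c in middle_categories) / total
    top_share = counts[top_category] / total
    return middle_share > 0.5 and top_share <= max_top_share


# Hypothetical 1-5 scale (5 = outstanding); the middle categories are 2, 3, and 4.
inflated = [5, 5, 5, 5, 4, 4, 3, 3, 2, 1]
conforming = [5, 4, 4, 4, 3, 3, 3, 3, 2, 1]
print(meets_forced_distribution(inflated, {2, 3, 4}, 5, 0.10))
print(meets_forced_distribution(conforming, {2, 3, 4}, 5, 0.10))
```

A check of this kind only constrains the shape of the rating distribution; it says nothing about whether the individual ratings are accurate.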
Our review of performance appraisal practices in the private sector suggests that most organizations focus on the process, rather than the design aspects, of performance appraisal. For example, few organizations conduct regular updates to job analyses and job descriptions or fund validation studies. Indeed validity and reliability do not seem to enter the vocabulary of private-sector human resource managers as a rule, a finding of no great surprise since only a few of the larger companies (Sears, AT&T) have an in-house personnel testing and measurement research capability. In contrast, there is nearly universal use of objective-based formats for managers and professionals; this format allows for joint manager-employee participation in defining performance objectives and, in some organizations, interim changes to objectives according to organization or individual needs.
In addition, some organizations use joint management meetings for ranking employees after initial performance ratings are completed; these meetings provide a forum for negotiating the basic norms of "acceptable" individual performance for similar jobs or job areas. Such meetings recognize the process aspects of performance appraisal—that norms change, that raters change, that context is important, that individual judgments need to be calibrated against group norms. Our interviews with personnel managers suggested that their process emphasis also includes communications to managers and other employees about the role of performance appraisal in the context of the organization's other meritocratic practices and culture, and the insistence that performance appraisal is an important, ongoing part of a manager's job. These companies tend to assess the effectiveness of performance appraisal via its influence on employee perceptions of equity and job satisfaction, rather than with measures of performance improvements or cost reductions.
All of this emphasis on process and the use of performance appraisal systems to reinforce the idea of a meritocratic personnel context is consistent with the current research interest in performance appraisal as a social and communication process rather than a measurement tool. However, it does not address the question of the accuracy of the rating decisions or the effects of using an appraisal system on individual or corporate performance.
PERFORMANCE APPRAISAL: OVERALL FINDINGS
We have to some extent caricatured two different approaches to performance appraisal—the one preoccupied with psychometrics and precision measurement, the other focused on the utility and acceptance of performance appraisal. Clearly, both sets of considerations are important. The appropriate balance in devoting resources to measurement issues versus process issues will obviously depend on the specifics of the situation.
However, we wish to call attention to two sets of findings that suggest that there may be diminishing returns to focusing on the measurement properties of appraisal scales in the federal context.
Findings: Quality of the Instrument
There is no compelling evidence that one appraisal format is significantly better than another. The improvements in accuracy and precision that were at one time anticipated from the use of behaviorally anchored rating scales have not been convincingly demonstrated as yet—not in a way that would justify the very expensive and labor-intensive development of such scales for federal jobs generally. Although there is far less evidence on the subject, global ratings do not appear to produce very different results from job-specific ratings.
Assuming that reasonable care has been taken in the development of scales and the training of raters, the reliability and validity of performance appraisal systems do not appear to be improved by fine-tuning the format of the appraisal instrument or the number of rating anchors used.
The reliability and validity of performance appraisal systems established in the context of research or laboratory settings cannot necessarily be expected to translate directly into operational settings. We know, for example, that when performance ratings are used in the context of merit pay allocations, managers tend to inflate ratings. We know too that specifying behaviors of interest in the appraisal format (e.g., BARS or management-by-objective systems) can lead managers to ignore other aspects of job performance, particularly those that are difficult to reduce to concrete terms, that may be equally important to successful performance.
There is virtually no research establishing the predictive validity of performance appraisal measures, tools, and approaches for measures of organizational effectiveness aggregated to the level of the office, division, or firm. (This statement says more about the state of the analytical tools available to social scientists than perhaps about performance appraisal.)
Findings: Costs of Psychometric Sophistication
Psychometrically sound performance measures based on job analysis and supported by a substantial empirical research base are both difficult and costly to generate and to maintain.
One could infer from current practice that the payoffs of trying to maximize and demonstrate the scientific validity of measures of job performance are not perceived to justify the costs—or that there is simply little felt need to do so. Few organizations attempt to establish the scientific validity of performance appraisal using typical psychometric procedures. The focus in applied settings appears to be on performance appraisal as a means of supporting an ethos of meritocratic personnel decisions, and on the development and administration of performance appraisal in ways that foster employee perceptions of equity and fairness—using goal setting formats, using joint management negotiations to define job performance norms, and measuring employee perceptions of performance appraisal fairness. There is virtually no measurement of the effects of performance appraisal on ongoing organization-level performance or cost reduction measures.
PERFORMANCE APPRAISAL: OVERALL CONCLUSION
Given the expense and difficulty of developing appraisal systems that conform to the exacting requirements of the measurement tradition; given the very modest returns to that investment that have been documented empirically; given the widespread lack of concern with this level of precision among firms using performance appraisal; given the absence of convincing evidence linking performance appraisal to organization-level outcomes—we find it impossible to conclude that federal policy makers should commit vast new human and financial resources to job analyses and the development of performance appraisal instruments and systems that can meet the strict constructionist challenge of measurement science.
Many applied psychologists and management experts feel that the search for such a high degree of precision in measurement is not economically viable in most applied settings—some believe that there is little to be gained from this level of precision over currently accepted sound practices.
Policy makers need to consider carefully where on the spectrum, between psychometric measurement and impressionistic measurement, performance appraisal for the civil service should be aimed. The purposes of the appraisal system should enter into the decision. There seems little doubt that for purposes of communication and feedback, the demands for scientific precision will not overwhelm cost considerations. For controversial decisions such as dismissal or pay, the question becomes more difficult.
However, it is important to remember that line supervisors are usually in a position to know their employees well and to have far more information available to them than the consumers of standardized test results (say, a college admissions committee).
These considerations lead us to conclude that for most personnel management decisions, including annual pay decisions, the goal of a performance appraisal system should be to support and encourage informed managerial judgment and not to aspire to a degree of standardization, precision, and empirical support that would be required of, for example, selection tests.
In this context, informed judgment means that there are demonstrable and credible links between the performance of the individuals being rated and the supervisor's evaluation of that performance.
II. Performance-Based Pay Systems
The label pay for performance covers a broad spectrum of compensation systems that can be clustered under two general categories: merit pay plans and variable pay plans. The latter category can be further divided in two, namely, individual incentive plans and the currently popular group incentive plans. Although the charge to the committee was couched in terms of merit pay plans, we extended the scope of our review to include pay for performance and compensation research more generally. This was in part for the sake of expedience: we found virtually no research on the effects of merit pay systems on the performance of individuals or organizations, and so were forced to turn elsewhere to explore the question. But we also rapidly realized that the effects of performance-based pay plans on individual and organizational performance cannot be easily disentangled from the broader context of an organization's structures, management strategies, and personnel systems.
We have distinguished performance-based pay plans along two dimensions. The first represents design variation in the level of performance measurement—individual or group—to which payouts are tied. The second represents design variation in the plan's contribution to base pay—some are added into base pay, some are not.
In merit pay plans, the locus of attention is individual performance. As an important element in a meritocratic personnel system, merit pay plans link annual pay increases, at least in part, to how well the incumbent has performed on the job. As a consequence, performance appraisal is at the heart of most merit plans. Payouts allocated under merit plans are commonly added into the individual's base salary. The payouts are typically not large (on average 5 percent, with a range of 2 to 12 percent), but their addition to base pay offers the potential for significant long-term salary growth.
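The long-term effect of adding modest increases into base pay can be shown with a short arithmetic sketch. The starting salary and the ten-year horizon are hypothetical; the 5 percent figure is the average payout cited above. The function name and its `added_to_base` parameter are illustrative only.

```python
def base_salary_path(starting_salary, annual_increase, years, added_to_base=True):
    """Return base salary for year 0 through `years`.

    If added_to_base is True, each year's increase is folded into base pay
    and compounds. If False, the payout is treated as a one-time award and
    base pay stays flat.
    """
    path = [starting_salary]
    for _ in range(years):
        current = path[-1]
        path.append(current * (1 + annual_increase) if added_to_base else current)
    return path


# Hypothetical employee starting at 40,000 with a 5 percent payout every year.
merit = base_salary_path(40_000.0, 0.05, 10)
flat = base_salary_path(40_000.0, 0.05, 10, added_to_base=False)
print(round(merit[-1], 2), flat[-1])
```

At 5 percent a year, base pay grows by roughly 63 percent over ten years, whereas a payout of the same size that is not added to base leaves base pay unchanged.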
In the most common individual incentive plans (piece rate plans and sales on commission), payouts are not added to base salary. Although the payouts can be large, they also carry the risk to the individual of no payout if performance thresholds are not met.
Group incentive plans differ from the two preceding types in basing compensation decisions on unit or system performance rather than individual performance. Thus profit-sharing plans or equity plans link employees' payouts to the overall fortunes of the firm as measured by some indicator of its financial health. Although payouts can be large in good times, they are not usually added to base pay—hence the designation variable pay plan.
All pay for performance plans are designed to deliver pay increases to employees based, at least in part, on some measure of performance. In theory, such plans offer several potential benefits:
They can support the organization's personnel philosophy by helping to communicate the organization's goals to its employees. For example, if financial goals are paramount, then a pay for performance plan tied to the achievement of financial goals (e.g., a profit-sharing plan) helps reinforce their importance for employees.
Goal theory also suggests that performance-based pay plans can support a certain level of performance that is consistent with the organization's mission. For example, a plan that pays out when financial goals are almost met (80 percent) sends a different message to employees than one that pays out only when goals are completely met (100 percent). Likewise, if employees receive no pay increase when their performance appraisal is below some work force norm, then they are more likely to attend to that norm.
They can help ensure consistency in the distribution of pay increases. For example, under a plan that ties pay increases to a specific financial goal, payouts are distributed only when that goal is met. Under a merit plan, pay increases are distributed consistently to employees who are in the same pay grade, who are in the same position in grade, and who have the same performance appraisal ratings. This helps the organization predict and regulate the price tag for merit increases.
Motivation theory suggests that pay for performance can positively influence individuals to achieve goals that are rewarded. To the extent that these goals contribute to organizational effectiveness, we can infer that pay for performance can influence individual and organizational effectiveness.
Before turning to the research findings, it is important to note that performance-based pay is only one dimension of employee compensation; other dimensions include competitiveness of salaries with the marketplace, benefits packages, cost-of-living considerations, and others. The effects of merit or variable pay plans will depend in good measure on this larger compensation context.
EVIDENCE FROM RESEARCH
Organizations design pay systems to accomplish three objectives: attracting, retaining, and motivating employees to perform; advancing the fair and equitable treatment of employees; and regulating labor costs. We have reviewed the research literature to see how pay for performance plans, and particularly merit pay plans, influence an organization's ability to meet these objectives.
The research most directly related to questions about the impact of performance-based pay plans on individual and organizational performance comes from theory and empirical study of work motivation. Motivation theories that have been well tested empirically predict that employee motivation is enhanced, and the likelihood of desired performance increased, under pay for performance plans when: (1) employees understand performance goals and view them as "doable" given their own abilities and skills and the restrictions posed by organization context; (2) there is a clear link between performance and pay increases, consistently communicated and followed; and (3) the pay increase is viewed as meaningful.
Findings: Employee Motivation
Most of the research examining the relationship between pay for performance plans and performance is focused on individual incentive plans such as piece rates. By design, these plans most closely approximate the ideal motivational conditions prescribed by expectancy and goal-setting theory.
Empirical research indicates that individual incentive plans can motivate employees and improve individual performance.
Individual incentive plans are most likely to improve performance in (a) simple, structured jobs in which employees are relatively autonomous; (b) work settings in which employees trust management to set fair performance goals; and (c) a stable economic environment.
Merit pay plans do not conform as closely as individual incentive plans to the theoretical conditions thought to be conducive to improved performance. Although merit plans also focus on individual performance, the link between performance and pay increase is less concrete; pay increase guidelines typically consider position and time in grade as well as performance rating; and pay increases tend to be small and therefore do not clearly differentiate outstanding from average or even poor performance. These characteristics may dilute their potential to motivate employees.
There is very little empirical research on merit pay plans. What exists is mixed and defies firm conclusions about the relationship between such plans and either individual or group performance. There are a number of field studies suggesting that managers and professionals under a merit pay system (as opposed to a straight seniority system or no formal system) express more job satisfaction and perceive a stronger tie between pay and performance. Other studies suggest that these effects may be tenuous.
Some group incentive plans retain many of the motivational features of individual incentive plans (quantitative performance goals, relatively large and frequent payouts), but it is not easy for individuals to see how their performance contributes to group- or organizational-level measures, so the motivational link is weakened. More to the point, payouts may occur only in good times and are dependent on larger environmental and economic forces beyond the control of the individual employee.
There is a modest body of research evidence drawn from private-sector experience that suggests that gainsharing and profit-sharing plans are associated with improved group- or organizational-level productivity and financial performance. This research does not, however, allow us to disentangle the effects of the pay plans on performance from many other contextual conditions. We cannot say that group plans cause performance changes or specify how they do.
Finding: Attraction and Retention
The empirical research examining the relationship of pay to an employer's ability to attract and retain high-performing employees is limited, and there is almost no research on the impact of pay for performance plans on these objectives. We have found but one experimental study (involving white-collar workers in Navy labs) that relates retention to the adoption of a merit pay system. The study reported considerable reduction in turnover among superior performers. One study, however, is not sufficient to support a general finding.
Fairness and Equity
Organizations want their pay systems to be viewed as fair by multiple stakeholders: employees, managers, owners, and top managers; those at one remove, such as unions, associations, and regulatory agencies; and the public. Theories of organizational justice distinguish between distributive and procedural justice. The former predicts that the employee judges the fairness of pay level or pay raises in comparison with other people or groups considered similar in terms of contribution. Theories of procedural justice link employees' job satisfaction to their perceptions about the fairness of procedures used to design or administer pay, for example, the fairness of performance appraisals or the availability of mechanisms for appealing pay decisions.
Findings: Fairness and Equity
Research examining distributive and procedural fairness theories in real-world pay contexts is scarce; there are no studies that can directly answer questions about the perceived fairness of different types of pay for performance plans.
The existing research does suggest that employee perceptions of fairness with regard to pay distributions and the design and administration of pay systems do affect their job satisfaction, their trust of management, and their commitment to the organization. The research suggests at least three groups against which employees may assess the fairness of their pay: people in a similar job outside the organization; people in similar jobs inside the organization; and others in the same job or work group.
The research shows that there are different beliefs about how pay increases should be allocated (performance, seniority, equal percentage of base, etc.). Several studies suggest that private-sector managers believe that pay increases should be tied to performance. Surveys of federal managers have shown support of the concept of performance-based pay increases in principle, but there is also a tradition, stemming from the concern to protect the bureaucracy from political manipulation, that equates equity with equal pay for all people in the same grade and step.
Regulating Labor Costs
All organizations have to regulate labor costs. An organization's choice of pay system by definition involves trade-offs among performance, equity, and costs. The various performance-based pay systems studied in this report approach these trade-offs differently. The design of merit pay plans appears to emphasize predictability and stability over time. Pay increases are administered via a merit grid that uses performance rating and position in the pay grade to determine a prespecified percentage increase. The increases are typically modest, but since they are added to base pay, the gradual accumulation over years becomes significant.
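A merit grid of the kind described above can be sketched as a simple lookup table. Every value below is a hypothetical assumption: the rating labels, the division of the pay range into thirds, and the percentages are invented for illustration and are not drawn from any plan reviewed in this study.

```python
# Hypothetical merit grid: rows are performance ratings, columns are the
# employee's position in the pay range. All percentages are invented.
MERIT_GRID = {
    "outstanding": {"lower": 0.08, "middle": 0.06, "upper": 0.04},
    "satisfactory": {"lower": 0.05, "middle": 0.04, "upper": 0.02},
    "unsatisfactory": {"lower": 0.00, "middle": 0.00, "upper": 0.00},
}


def position_in_range(salary, range_min, range_max):
    """Classify a salary into the lower, middle, or upper third of its pay range."""
    if not range_min < range_max:
        raise ValueError("range_min must be below range_max")
    if not range_min <= salary <= range_max:
        raise ValueError("salary must lie within the pay range")
    fraction = (salary - range_min) / (range_max - range_min)
    if fraction < 1 / 3:
        return "lower"
    if fraction < 2 / 3:
        return "middle"
    return "upper"


def merit_increase(salary, rating, range_min, range_max):
    """Return the increase in currency units prescribed by the grid."""
    percentage = MERIT_GRID[rating][position_in_range(salary, range_min, range_max)]
    return salary * percentage


# An outstanding performer low in a hypothetical 40,000-60,000 range.
print(merit_increase(45_000, "outstanding", 40_000, 60_000))
```

Because every cell is fixed in advance, the total cost of a year's increases can be projected from the expected distribution of ratings and range positions, which is the predictability the design is meant to provide.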
Variable pay plans are intended to be more immediately market sensitive. Many of the group incentive plans, for example, are tied to clearly defined measures of organizational productivity or financial performance. Generally, improvements in these performance measures generate the bulk of the pay increase pool. Since the increases are not added to base pay, employee pay is tied closely to the fortunes of the firm. In good times, the payouts are relatively large; in bad times, the employee has more at risk than under a merit system.
Findings: Regulating Labor Costs
Although economic models provide a conceptual basis for understanding the potential trade-offs between cost and performance and some of the contextual factors that might be presumed to favor one pay policy over another, the research on cost regulation and the cost-benefit trade-offs associated with pay for performance plans is sparse and limited to production jobs and manufacturing settings.
We have no evidence that any particular pay for performance plan is superior to another in regulating labor costs.
FINDINGS FROM PRACTICE
Our review of private-sector practices revealed that pay for performance is an important part of compensation philosophy and the overwhelming choice of U.S. private-sector firms. Merit plans are almost universally used for managerial and professional employees (95 percent); variable pay plans are much less frequently used (between 16 and 40 percent, depending on the type of plan), but increased competition worldwide appears to be kindling interest in them.
Our interviews with personnel managers of five Fortune 100 companies indicated that merit plans are viewed primarily as a means of guiding managers' decisions about pay increases in a way that is consistent with a meritocratic personnel philosophy—that is, it ensures that pay increases are, at least in part, tied to individual contributions, and that the increases are consistently distributed to employees in a way that is fair and predictable.
This strong attachment to a meritocratic ethos explains the predominance of merit pay plans in the private sector. Merit plans are the only pay for performance plans currently used that base pay increase decisions on the combination of individual contributions (skills, experience, and performance) that are the foundation of a meritocratic philosophy.
The personnel managers interviewed noted that a major benefit of performance appraisal and merit pay was the identification of top and bottom performers. They emphasized the flexibility of private-sector managers to bring top performers into a job at any position in the pay range, and the comparative ease of dismissing those who cannot meet company performance standards.
Surveys indicate that organizations do not evaluate the effect of merit plans on performance, but rather focus on employee perceptions of plan fairness and workability and of the link between pay and performance.
The personnel managers interviewed also emphasized the importance of communicating merit pay increases as part of an overall pay system and a meritocratic personnel philosophy. For example, most of these managers emphasized the competitiveness of base pay and benefits and the general excellence of the company and work force in their pay communications to employees. Notable, also, is that most of these managers said that their organizations did not share specific pay information (such as average annual increase percentages, market competitors and wage survey methods, the organization spectrum of pay ranges) with employees. This is in contrast to the federal meritocracy in which employees appear to have information about their pay from many different (and conflicting) sources.
In contrast to the nearly universal presence of merit pay plans, our survey reviews revealed that less than 40 percent of private-sector firms have bonus plans for middle managers; less than 20 percent have gainsharing or profit-sharing plans in place. Baseline data for the frequency and distribution of specific plans is difficult to obtain, but there appears to be some increase in interest in these plans and in their application to groups of employees not traditionally covered.
There are a limited number of surveys on the use of group incentive plans. They report that most organizations adopt these plans to improve productivity and financial outcomes and, more generally, to "revitalize the organization consistent with business strategy." These same surveys report that organizations that have adopted these plans believe that they have achieved the desired effects, but also acknowledge the importance of contextual factors such as employee involvement, information sharing, and ongoing marketing and communication to the employees covered. One survey acknowledged that design and implementation costs were high. None of these surveys reported employee perceptions about the equity or efficacy of variable pay plans.
PERFORMANCE-BASED PAY SYSTEMS: OVERALL FINDINGS AND CONCLUSIONS
Taken together, the evidence from research and practice suggests the following findings and conclusions about the effects on individual and organizational performance of pay for performance plans.
Findings: Individual Performance
The evidence on the effects of pay for performance, pieced together from research, theory, clinical studies, and surveys of practice, suggests that, in certain circumstances, variable pay plans produce positive effects on individual job performance.
There is insufficient research to determine conclusively whether merit pay can enhance individual performance or to allow us to make comparative statements about merit and variable pay plans.
Conclusion: Individual Performance
We nevertheless infer that merit pay can have positive effects on individual job performance, on the basis of analogy from the research and theory on variable pay plans. These effects might be attenuated by the facts that, in many merit plans, increases are not always clearly linked to employee performance, agreement on the evaluation of performance does not always exist, and increases are not always viewed as meaningful. However, we believe the direction of effects is nonetheless toward enhanced performance.
Finding: Organizational Performance
There is some evidence from the private sector suggesting that gainsharing plans are associated with improved organizational performance. However, it is not possible from existing research to conclude that these plans cause performance changes, to specify how they do so, or to understand how the behavior of individuals under these plans aggregates to the organization level.
III. The Importance of Context
Our reviews of performance appraisal and merit pay research and practice indicate that their success or failure will be substantially influenced by the broader features of the context in which they are embedded. Research on performance appraisal has recently turned to organizational factors that might support or hinder the appraisal system from functioning as intended. Research on pay plans stresses the context of the organization's personnel system, technological systems, and strategic goals.
There is a broad consensus among practitioners—as well as some research evidence—that personnel systems in general and performance appraisal and pay systems in particular must exhibit "fit" or congruence to be effective.
Three categories of contextual factors of particular relevance to performance appraisal and pay for performance emerged from our reviews of research and practice: (a) the nature of the organization's work, or what might be called technological fit; (b) the broad features of the organization's structure and culture; and (c) external factors such as economic climate, the presence of unions, and legal or political forces exerted by external constituents.
The strongest evidence on congruence has to do with the fit between appraisal and pay systems and the nature of work. The literature on the links between pay and individual motivation, for example, demonstrates the importance of job independence, concrete and easily measured products, and production standards that are perceived as fair (doable) to effective individual incentive pay plans. Only a limited number of jobs, mainly in some executive, sales, and manufacturing work, have proved to be amenable to this sort of performance measurement and incentive pay. Conversely, it has been shown that using highly specific individual performance appraisals and incentives with jobs that are complex, interdependent, and have multiple and amorphous goals can result in employees' ignoring important aspects of their jobs or distorting performance in order to meet the appraisal goals. This sort of gaming is a particular danger with objectives-based appraisal systems. Group incentives avoid some of the problem. They recognize the interdependent nature of work and focus on organization-level performance. However, they suffer from unclear links between individual actions and organization-level results.
Organizational Structure and Culture
Although there is little systematic evidence to suggest precisely what the congruence of pay system and organizational culture looks like, there is a growing body of case studies that look at organizational structure and culture, particularly studies of high-commitment organizations and of organizational innovation. The business policy literature, for example, describes two archetypal strategic postures—the dynamic firm and the steady-state firm—and the performance appraisal and pay systems that appear to go along with each. Firms pursuing innovation and growth tend to offer their employees a higher proportion of their pay in the form of incentives than do firms in steady state. The more entrepreneurial firms tend to evaluate their managers and professionals on quantitative, organization-level performance goals and to offer high payouts if strategic goals are met. Studies of organizational structure confirm this pattern. They describe the entrepreneurial firm as emphasizing general skill, higher investment in recruiting than training, and performance measures tied to market outcomes. Retention is not a primary management goal.
Firms pursuing a maintenance strategy tend to evaluate managers on more qualitative, individual behaviors. Their personnel practices emphasize internal skill development, the importance of work force norms, and the employee's long-term contribution. Such firms would seem to be well served by traditional performance appraisal and merit pay plans.
There are also theoretical literatures that suggest that organizations in highly institutionalized sectors or that rely greatly on public trust may be more likely to adopt very formal, precise performance appraisal systems. In such organizations, personnel and pay systems can have an important legitimizing function.
There is a considerable literature that supports these general patterns of association between performance appraisal and pay systems on the one hand and organizational strategy and structure on the other. However, all of this work is theoretical or descriptive and should be viewed as suggestive, but not necessarily generalizable.
The final dimension of congruence has to do with external factors that constrain an organization's choice of evaluation and pay systems. One of the most relevant to federal policy makers is the widespread resistance of unions in the private sector to performance appraisal and pay for performance systems. Most surveys show that unionized employees are far less likely than nonunionized employees to be covered by incentive systems (including merit plans). To the extent that this changed in the 1980s, the incentive pay arrangements accepted by unions (e.g., profit-sharing) were not ones that differentiate among individual employees.
Also of particular salience to the issue of pay for performance is the role of external laws and regulations. Fair labor standards, occupational health and safety, and equal employment opportunity are a few of the areas of law that prescribe internal structures, policies, and procedures that may be more or less compatible with an organization's chosen evaluation and pay systems. Federal equal employment opportunity policy has had an enormous impact on personnel management in every organization of any size in the nation.
In addition to these requirements, the federal government as an employer faces a set of constraints imposed by the laws and regulations surrounding its merit system. The desire to shield civil servants from the exigencies of politics has placed serious constraints on the managerial flexibility needed to make pay for performance work.
IV. IMPLICATIONS FOR FEDERAL POLICY
Since its formal adoption by the federal government, performance appraisal for merit pay has been a matter of continuing controversy and periodic amendment. One view of this experience is an explicit criticism of the federal government and its inability to "get right" what is now widely used in the private sector with (at least) less criticism. While there are many features of the merit pay system that could be improved, we do not attribute these failings to mismanagement or stupidity in implementation. Instead, we would emphasize the constraints, many of which derive from features unique to the federal sector.
The federal government faces special, if not entirely intractable, problems that work against any easy transferability of private-sector experience. The very term merit pay carries far more meaning in the context of a public civil service than in the private sector; above all, it implies the absence of partisan political considerations in the determination of pay levels of career employees. Where private-sector practice relatively easily accepts manager-employee exchanges about performance objectives, both individual and organizational, such a practice in the public sector could be perceived as opening the civil service to partisan manipulation.
Hence, one of the most difficult questions facing federal policy makers is whether and how the experience of private-sector organizations with performance appraisal and pay for performance plans is applicable to civil service organizations. The portrait of high-commitment organizations that emerges from case studies highlights some fundamental differences between private firms in which performance-based pay seems to work well and the typical government agency. In high-commitment organizations, the following conditions appear to obtain:
Pay for performance would be one part of a total management system, which provides full financial and organizational support for effective administration of the plan;
The organization would be characterized by an emphasis on managerial discretion and flexibility and by the recognition that individual managerial authority is critical to effective performance appraisal;
The climate would be characterized by shared values and high levels of trust throughout the organization;
On the basis of those values, the ability to link individual performance and activities to organizational goals and objectives would be strong;
There would be widespread agreement about individual and organizational standards of success; and
There would be low turnover at the managerial levels.
Most of these conditions pose a problem for public-sector organizations because of the division of leadership between the political and career employees; the lack of managerial control over personnel and resource systems; the ambiguity of goals and performance criteria; and multiple authority centers for employee accountability. The very publicness of government creates organizations that are at once more open to external influences and less able to respond to them. These conditions have led to a working environment in which managers are frustrated in their ability to make personnel decisions and employees are distrustful of the performance appraisal and pay allocation systems—most do not see a link between their performance and their pay.
The issue of divided leadership provides a particularly salient example of the inherent difficulties of creating a successful merit pay system in the federal context. A continuing theme in modern government has been the need to make the bureaucracy more responsive to the chief executive. One tool available to presidents is appointing employees to positions outside the career civil service. But if the presence of political executives in leadership positions in federal agencies institutionalizes the continuing mandate for change, the authority and communication structures within those agencies often create obstacles to change (Ingraham, 1987). For example, the "dual executive" characteristic of many public agencies tends to create a system in which decisions are made according to short-term policy goals at the upper levels of the organization and according to longer-term program goals elsewhere.
In many ways federal agencies function as two loosely coupled organizations with authority, control, and communication between them much more tenuous than prescribed by the classic paradigm. Even if the policy goals were not so often diffuse, unclear, and contradictory (Heclo, 1978; Ingraham, 1987), the ability to communicate them to the career bureaucracy is attenuated by the lack of experience and short tenure of many political executives (Heclo, 1978). All too often, in the judgment of experts in federal management, organization-wide goals are either not articulated or are not communicated down through the organization to the career employees responsible for their implementation. Functioning with two sets of managers makes congruence and coherence hard to achieve. In most models of organizational fit, there is a single leadership that creates a coherent culture and shared values that are necessary conditions to enable a successful performance appraisal system.
The issue of organizational boundary (at which the controlling influences shift from internal to external actors), particularly as it relates to the ability to control or direct organizational resources, is also a central concern. Many have observed that public organizations are notable for the porosity of their boundary (Waldo, 1971; Kaufman, 1978; Gawthrop, 1984). The federal government has been structured deliberately to disperse authority among competing institutions (Allison, 1983); members of Congress, administration officials, interest groups, concerned citizens, and others can, and do, influence bureaucratic actors. This further obfuscates goals and objectives within the organization. Of equal significance is the fact that many of these external influences, but most notably the Congress, have a controlling influence on the resources available to the organization, thus further complicating the authority issue.
Other institutional influences that profoundly shape federal agencies and their activities include civil service laws and regulations that impose great complexity and rigidity on the system. Recruiting, testing, hiring, firing, and rewarding are all constrained in the federal government (National Academy of Public Administration, 1983). As a result of these externally imposed constraints, managerial discretion has traditionally been limited and has, in fact, been discouraged by the provisions of the merit system (Ingraham and Rosenbloom, 1990). Although there is emerging evidence that some federal managers do use whatever flexibilities are available, including those provided by existing performance appraisal systems, there is also strong evidence that procedural constraints deter all but the strongest of heart (unpublished document, U.S. General Accounting Office, 1990).
A frequently cited example of the boundary problem is demonstrated by the fact that Congress retained statutory control over development of the federal government's performance appraisal system, rather than delegating both the development and implementation components to the Office of Personnel Management. The rationale was to balance managerial discretion with employee rights in the context of a system that made it easier for agencies to fire incompetent employees; the result was to hobble the decision making of managers. On one hand, Civil Service Reform Act legislation provided the requirement for detailed performance appraisal standards that could be used by managers as proof of unsatisfactory performance. On the other hand, the managers' ability to act regarding unsatisfactory performance was limited in the statute by providing employees with strong substantive rights, such as the opportunity to improve before an unacceptable performance action can be taken and the ability to appeal performance appraisal ratings both within the agency and externally to the Merit Systems Protection Board. This has led to situations in which, at best, a number of years are required to release an inadequate employee, and the costs borne by managers serve as a strong disincentive against appraising mediocre performance accurately.
Another feature of the federal context that warrants consideration is whether the dominant motivations among employees are comparable to those of private-sector workers who work where pay for performance has been implemented. Although there has been a long tradition of simply applying private-sector motivation theory and techniques to the public sector, some recent studies are finding different sources for motivation and different motivational patterns among public employees. Perry and Wise (1990) explore the role of public service as a motivator; Rainey (1990) documents a fairly consistent pattern of differences in public and private managers in relation to money, job satisfaction and security, and organizational commitment. In a 1982 review article, Perry and Porter noted that public-sector employees had higher achievement needs and tended to value economic wealth less than did entrants into the private sector.
Furthermore, there is some evidence that public managers, particularly those at the highest levels of the organization, are keenly attuned to public perceptions of their effectiveness and the overall usefulness of the policies and programs they administer (Ingraham and Barrilleaux, 1983). Federal Employee Attitude Surveys in 1979 and 1980 demonstrated that upper-level managers perceived generalized "bureaucrat bashing" as a personalized attack. More recent studies by the Merit Systems Protection Board (1989) and the U.S. General Accounting Office (1987) indicate that managers continue to tie their overall job satisfaction to their perceptions of "appreciation" by the public. These findings suggest that policy makers would do well to give their attention to nonmonetary motivators in concert with their plan to strengthen the ties of pay to performance.
Finally, one of the most important contextual factors that governs how any new performance appraisal or pay for performance system is likely to function is the less than satisfactory experience of federal employees with the merit pay systems implemented during the last 12 years.
We have conducted a wide-ranging study of performance appraisal and pay for performance in the private sector to help the director of the Office of Personnel Management and other federal policy makers as they rethink the Performance Management and Recognition System. What we have learned does not provide a blueprint for linking pay to performance in the federal sector or even any specific remedy for what ails PMRS. Instead, we conclude with some general suggestions about priorities.
Performance appraisal ratings can influence many personnel decisions, and thus care in the development and use of performance appraisal systems is warranted. There is, however, no obvious technical (psychometric) solution to the performance management issues facing the federal government. Further refinements in the technology of performance appraisal (e.g., extensive new job analysis, modifications of existing rating scales or rater training programs) are unlikely to provide substantially more valid and accurate appraisals than those currently in force, particularly for managerial and professional jobs. There is also no evidence that one particular appraisal format is clearly superior to all others. For example, we do not know that the objective-based format for managerial appraisal, so popular in the private sector, yields more (or less) valid appraisals than the supervisory ratings used in the government.
There appears to be at least as much effort expended on performance appraisal in the federal government as elsewhere. More generally, the pursuit of further psychometric sophistication in the performance appraisal system used in the federal government is unlikely to contribute to enhanced individual or organizational performance.
Where performance appraisal is viewed as most successful in the private sector, it is firmly embedded in the context of management and personnel systems that provide incentives for managers to use performance appraisal ratings as the organization intends. These incentives include managerial flexibility or discretion in rewarding top performers and in dismissing those who continually perform below standards. When performance appraisal ratings are used to distribute pay (as in a merit plan) the size of the merit pay offered allows managers to differentiate outstanding performers from good and poor performers, and thus provides them with incentives to differentiate. For example, top performers may receive 10 percent of their base salary in merit pay, good performers, 5 percent, and poor performers, no merit increase. Finally, managers are themselves assessed on the results of their performance appraisal activities.
We have been struck by the apparent contrast between incentives for private and federal managers to use performance appraisal and merit plans effectively. Whatever incentives there are for federal managers seem currently dwarfed by the disincentives.
In order to motivate employees and provide them incentives to perform, a merit plan or any pay for performance plan must theoretically (a) define and communicate performance goals that employees understand and view as doable; (b) consistently link pay and performance; and (c) provide payouts that employees see as meaningful. These conditions seem straightforward, and the notion of pay for performance thus becomes deceptively simple. Our reviews of research and practice indicate, however, that selecting the best pay for performance plan and implementing it in an organizational context so that these conditions are met is currently as much an art as a science. We cannot generalize about which pay for performance plans work best—especially for the federal government, with its considerable organizational and work force diversity.
We can suggest that, given this diversity and the importance of matching pay for performance plans to organization context, federal policy makers consider:
Decentralizing the design and implementation of many personnel programs, including appraisal and merit pay programs, within the framework of central policy guidelines and to the extent possible given the government's legitimate concerns about facilitating interagency mobility, standardization and comparability, and equity.
Supporting careful, controlled pilot studies of a variety of pay for performance systems in a variety of agencies. These studies would serve to identify important design, implementation, and evaluation issues for users, policy makers, and the research community, along with incentives to investigate these issues. They could take a variety of forms, but to be useful must provide careful measures of pre- and postintervention conditions.
Ensuring fair and equitable treatment for all employees is an important objective of any personnel system. Yet the heavily legalistic environment surrounding the federal civil service has led to dependence on formal procedures and an elaboration of protections, requirements, and procedures that ultimately provide powerful disincentives for managers to use personnel systems as the organization intends. Although these protections are meant to ensure employee equity, it is not clear that their proliferation provides federal employees with a greater sense of equity than seen in many private-sector organizations. Effective reform of personnel management and pay systems in the federal government may well need to be part of a more fundamental rethinking of past notions of political neutrality, merit, and their protection in the civil service.
Our entire review has stressed the importance of viewing performance appraisal and merit pay as embedded in broader pay, personnel, management, and organizational contexts. For example, while by no means the only relevant contextual factor, the issue of comparability of federal base salaries with pay for equivalent private-sector jobs may pose severe problems for the acceptance of merit pay or any other pay for performance system if the promise of recently enacted legislation proves illusory. We realize that the broader changes suggested by an analysis of context can be costly, but we suggest that making programmatic changes to the Performance Management and Recognition System in isolation is unlikely to enhance employee acceptance of the system or improve individual and organizational effectiveness significantly and, in the long run, may prove no less costly.