
Journal of Business and Psychology, Vol. 19, No. 3, Spring 2005 (2005) DOI: 10.1007/s10869-004-2232-0

IMAGE THEORY AND THE APPRAISAL OF EMPLOYEE PERFORMANCE: TO SCREEN OR NOT TO SCREEN?

Bryan J. Pesta
Cleveland State University

Darrin S. Kass
Bloomsburg University

Kenneth J. Dunegan
Cleveland State University

ABSTRACT: The authors tested the predictions of image theory [Beach, 1990, Image theory: Decision making in personal and organizational contexts. Chichester, England: Wiley] by examining the decision-making processes underlying performance evaluation. Across three experiments, over 400 participants evaluated the performance of a book store employee with varying degrees of good and bad performance behaviors. Results indicated that: (1) performance judgments were linearly related to the number of good and bad behaviors present in the scenarios, (2) promotion decisions initially followed image theory's screening process, as participants focused only on the employee's bad behaviors, and (3) the introduction of a contrast manipulation (Experiment 2) resulted in participants abandoning the screening process for the promotion decision, until we included instructions against comparing employees (Experiment 3). Consistent with image theory, but moderated by contrast effects, promotion decisions relied on screening based solely on the employee's bad behaviors, whereas performance judgments involved compensatory use of both the employee's good and bad behaviors. We argue that how participants perceive the decision-making scenario influences whether or not they will screen decisions.

Address correspondence to Bryan J. Pesta, 14724 Grapeland, Cleveland, OH 44115, USA. E-mail: bpesta22@cs.com.
0889-3268/05/0300-0341/0 © 2005 Springer Science+Business Media, Inc.

Organizational members often face scenarios requiring them to make either a decision or a judgment. A decision requires commitment to a particular binary course of action, while a judgment involves evaluating or categorizing a target object along some dimension (Gilliland,
Benson, & Schepers, 1998). Examples of the former include deciding whether or not an employee should be promoted, or should receive a merit raise. Examples of the latter include rating employee performance from poor to excellent, or rating job satisfaction from low to high. Decisions and judgments therefore differ along two dimensions: (1) decisions require commitment to action, while judgments do not, and (2) decisions are typically a dichotomy (e.g., promote or do not promote) whereas judgments are made along a continuum (e.g., from poor to excellent).

Image theory (Beach, 1990, 1998) attempts to explain decision making in organizational settings (see, e.g., Beach, 1996; Beach & Strom, 1989; Dunegan, 1995), and assumes that people use different strategies when making decisions, versus when making judgments. According to the theory, decisions supposedly rely on a screening process, which focuses on weeding out unacceptable options. Here, people consider only the negative information that exists about each option, and only those options that survive screening enter the second phase of decision making, choice. In the choice stage, people attempt to select the best option from among the survivors of screening. To do this, people consider both the good and the bad about each option, in a compensatory fashion (Beach, 1990; Beach & Mitchell, 1987; van Zee, Paluchowski, & Beach, 1992). Image theory therefore assumes that people first focus on what is wrong with a given option (termed a violation), and then focus on what is right (termed a nonviolation; Beach & Mitchell, 1996, p. 2). People screen options in image theory by adopting a rejection threshold, which is the level of violations that will be tolerated before an alternative is no longer considered viable.
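The two stages described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a model taken from the paper: the `Option` class, the function names, the example candidates, and the equal unit weighting in the choice stage are all hypothetical, and the threshold is assumed to reject any option whose violation count reaches it.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """A candidate under evaluation (hypothetical structure for illustration)."""
    name: str
    violations: int      # count of bad behaviors
    nonviolations: int   # count of good behaviors


def screen(options, rejection_threshold):
    """Non-compensatory screening: only violations are considered.

    Assumption: an option is rejected once its violations reach the threshold.
    """
    return [o for o in options if o.violations < rejection_threshold]


def choose(survivors):
    """Compensatory choice: nonviolations can offset violations.

    Assumption: every behavior carries equal weight (unit weights).
    """
    if not survivors:
        return None
    return max(survivors, key=lambda o: o.nonviolations - o.violations)


candidates = [
    Option("A", violations=3, nonviolations=5),  # rejected despite 5 good behaviors
    Option("B", violations=1, nonviolations=1),
    Option("C", violations=2, nonviolations=4),
]
survivors = screen(candidates, rejection_threshold=3)
best = choose(survivors)
```

Note that candidate A never reaches the choice stage, no matter how many nonviolations it has; that is what makes screening non-compensatory in this sketch.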
For example, a rejection threshold of three would mean that any employee with three or more violations would not be considered viable for promotion, no matter how many nonviolations (i.e., good behaviors) also existed for that employee. Employees with fewer than three violations survive screening, and then merit further consideration in the choice phase of the process. The choice phase involves weighing both the violations and nonviolations that exist for each option. Unlike screening, the process is compensatory, as nonviolations can offset violations (see, e.g., Gilliland et al., 1998). Thus, in the choice phase, decision-makers weigh both the good and the bad about each alternative, to arrive at a final decision about all alternatives. For example, a manager would now consider both the productive and counterproductive behaviors of any employees that survived screening, and from this comparison, decide whether or not those employees deserve promotion.

In a study on justice evaluations (i.e., deciding whether an organization acted ethically), Gilliland et al. (1998) examined strategy differences between making decisions and making judgments. The
authors found evidence of screening under certain conditions but not others. Specifically, when participants judged the organization, they considered both violations and nonviolations, and did not show evidence of a rejection threshold. When participants decided whether the organization should receive an award, however, clear evidence of a rejection threshold existed. Once the number of violations exceeded the threshold (i.e., two violations, in this case), the decision was not affected by additional nonviolations. In other words, beyond the rejection threshold, the number of nonviolations had no impact on the final decision. Gilliland et al. (1998) illustrated that justice-evaluation decisions are non-compensatory, involving screening based mainly on consideration of violations. Justice-evaluation judgments, however, are compensatory and involve weighing both violations and nonviolations.

In the present study, we sought to expand on Gilliland et al.'s (1998) research, by replicating and extending their results to a more common organizational decision-making activity: performance evaluation. Additionally, by including a contrast manipulation in later experiments (where people rated a truly poor employee before rating the target employee), we explored whether use of the screening process depends on how people interpret the decision-making situation they face.

We chose performance evaluation as our decision-making domain, because outcomes of this process relating to pay increases, promotion/demotion decisions, etc., are obviously critical for all employees. As such, a rich literature exists on the underlying cognitive processes used in this activity (see Murphy & Cleveland, 1995, for a review).
To our knowledge, however, no one has explored whether people use different strategies when making performance appraisal judgments (i.e., rating an employee's overall level of performance) versus when making performance appraisal decisions (e.g., deciding on a promotion, or a merit pay increase). Given Gilliland et al.'s (1998) research, we would expect performance appraisal judgments to be compensatory, relying on both violations and nonviolations. On the other hand, performance appraisal decisions should use the screening process, and should focus only on violations.

Further, Gilliland et al. presented their participants with only a single target (i.e., organization) to evaluate. Real-world evaluations, especially in the realm of performance appraisal, typically involve rating multiple targets (i.e., employees) at once. Hence, it is possible that rating more than one employee at a time, versus rating a single employee in isolation, affects whether one adopts compensatory or non-compensatory decision-making strategies. Rating multiple employees might change how participants perceive their decision-making situation. Participants, for example, might be tempted to compare employees
to other employees, which in turn might lead them to abandon the screening process.

In sum, we tested the predictions of image theory within the experimental context of performance evaluation. Our goal was to show that image theory accounts for the different types of information performance appraisers consider when (1) judging an employee's overall level of performance, versus deciding whether or not to promote that employee, and (2) evaluating a single employee in isolation, versus evaluating multiple employees at once.

To preview our study, in Experiment 1 we asked participants to judge the performance of a single employee, and then decide whether or not to promote him. In Experiment 2, we introduced a positive contrast manipulation (i.e., where participants first rated a truly poor employee before rating the target employee) to see if the context of rating multiple employees would change the type of information that subjects considered. Finally, based on the results of Experiment 2, we included a simple warning against comparing employees in Experiment 3, in an attempt to again alter the type of information that participants used when evaluating the target employee.

EXPERIMENT 1

Experiment 1 participants read fictitious scenarios describing the performance of a book store employee named John Snyder. In the scenarios, the number of violations (e.g., "When customers are walking around the store, John often fails to ask if they need any help"), and nonviolations (e.g., "When John stocks the shelves, he makes sure the books in the section are in the proper order and neatly arranged on the shelves") varied factorially at one or three. Participants then made three judgments about the employee's overall performance, and a binary decision on whether or not to promote him. Based on Gilliland et al.
(1998), we predicted that: (1) both violations and nonviolations will be linearly related to judgments, such that increasing nonviolations will increase performance appraisal judgments, but increasing violations will decrease performance appraisal judgments, and (2) nonviolations will affect decisions only when the number of violations is below the rejection threshold.

Finally, we also conducted a pilot study as part of Experiment 1. Its purpose was to gather baseline data on the relative importance of each scenario violation and nonviolation (used in the main experiments) to people's overall evaluation of an employee's performance.


PILOT STUDY

Method

Participants. The participants for the pilot study were 69 undergraduate business students (38 males and 31 females) with a mean age of 25.99 years (range 20–50). All participants completed the pilot study during regularly scheduled class periods, and received extra credit for their involvement. No student in the pilot study also participated in any experiment below.

Materials and Procedure. We compiled a one-page survey which separately listed each violation and nonviolation used in Experiment 1. Participants rated how much weight they would place on each violation and nonviolation (although we did not label these as such in the surveys) when evaluating an employee's overall level of performance. Ratings were made on a Likert scale, ranging from 1 (no weight) to 7 (highest weight).

Results of the pilot study, together with the scenario descriptions used in Experiment 1, appear in the Appendix. The pilot study produced significant differences between the weights that participants placed on each violation and nonviolation. We expected this result, as some work behaviors are more important than others when evaluating an employee's overall performance. In real-world decision making, however, both the severity and frequency of violations would likely affect the screening process (e.g., the rejection threshold is probably set by considering both the number of violations and their severity). As an initial study in this area, though, we looked at just the raw frequency of violations, controlling for their severity by counterbalancing how often each specific violation/nonviolation occurred in the scenarios for the main experiment (see below).

MAIN EXPERIMENT

Method

Participants and Design. For the main Experiment 1, the participants were 107 undergraduate business students (45 males and 62 females) with a mean age of 26.51 years (range 19–50). They, too, completed the experiment during regularly scheduled class periods. Each student received extra credit for his or her involvement, and no student participated in more than one of our three experiments. The design was a 2 (one or three violations) × 2 (one or three nonviolations) between-subjects factorial, with sample sizes per cell ranging from n = 26 to n = 28. We picked these values because Gilliland et al.
(1998) consistently found the rejection threshold to be two violations. Because carryover effects prohibited within-subject manipulation of the independent variables, we adopted Gilliland et al.'s approach of manipulating them between subjects. Moreover, we opted for the extreme-groups design (excluding cells with only two violations or two nonviolations) to maximize statistical power. Finally, the dependent variables were the participants' judgments on three questions relating to the employee's performance, and a yes/no decision on whether or not to promote the employee.

Materials and Procedure. We compiled a three-page packet describing a fictitious scenario wherein the participants acted as managers of a local bookstore. The entire scenario appeared on the first page, together with blank spaces used to record each participant's age and gender. The participant's task was to evaluate the performance of an employee named John Snyder. Depending on condition, the scenario described either one or three violations, and either one or three nonviolations. We counterbalanced the scenarios so that each of the three violations (and each of the three nonviolations) appeared equally often in cells that contained only one violation (or nonviolation). Following the procedure Gilliland et al. (1998) used, the nonviolations appeared first in each scenario, followed by the violations.

For half the participants, page 2 of the packet contained the judgment questions, and page 3 contained the decision question. We reversed this order for the other half of participants (a later analysis revealed that how we ordered the judgment/decision questions had no effect on people's ratings). The judgment questions were: (1) "John Snyder is a good employee," (2) "John Snyder is a below average employee," and (3) "Overall, how would you rate the performance of John Snyder?"
We focused the judgment questions on overall performance levels, and made them redundant so we could check whether participants were paying attention (as indicated by consistent ratings across the three questions), and so we could compute reliabilities. Participants made their judgments on a seven-point Likert scale, ranging from "strongly disagree" (or "poor" for the third judgment question) to "strongly agree" (or "very excellent" for the third judgment question). Finally, for the decision question, participants answered yes or no to the following: "The owner of the bookstore has asked you to recommend an employee for promotion to the position of Assistant Manager. Would you be willing to recommend that John Snyder be promoted to Assistant Manager?"

We collected the data in groups of about 20 students each, randomly distributing the scenarios within each group. Participants were told that they were in an experiment on decision making in performance appraisals, and that they should read the entire scenario and
then work through the performance appraisal questions at their own pace. When the last person in each group had finished, we collected the scenarios and debriefed the participants. The experimental sessions lasted approximately 20 min.

Results and Discussion

Judgment Data. We used p < .05 as the level of significance for all analyses. Responses to the three judgment questions correlated strongly, with values ranging from r(106) = .76 to r(106) = .85. Principal components factor analysis conducted on these questions produced a single factor explaining 88% of the judgment variance. We next calculated a composite score (after first reverse coding the second judgment question) by averaging across the three judgment questions for each participant. The resulting composite score had an alpha reliability of .92.

Table 1 lists the mean judgment values for our composite score by condition for Experiment 1. A 2 (one or three violations) × 2 (one or three nonviolations) analysis of variance (ANOVA) on these data produced significant main effects for the number of violations, F(1,103) = 52.1, MSe = .748, and the number of nonviolations, F(1,103) = 25.5, MSe = .748. The Violations × Nonviolations interaction was also significant, F(1,103) = 4.88, MSe = .748. Consistent with image theory's first prediction, both violations and nonviolations affected judgments, as participants rated the employee higher when three nonviolations were present (versus one), and rated him lower when three violations were present (versus one).
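The composite-score procedure described above (reverse coding the negatively worded second question, averaging the three ratings, and checking internal consistency) can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the ratings are invented, the function names are hypothetical, and the reliability coefficient is computed with the standard Cronbach's alpha formula, which we assume is what "alpha reliability" refers to.

```python
from statistics import mean, variance


def reverse_code(rating, scale_min=1, scale_max=7):
    """Flip a rating on a 1-7 scale so that 7 becomes 1, 6 becomes 2, and so on."""
    return scale_min + scale_max - rating


def recode(rows, reversed_items=(1,)):
    """Reverse code the negatively worded items (index 1 = second question)."""
    return [
        tuple(reverse_code(r) if i in reversed_items else r for i, r in enumerate(row))
        for row in rows
    ]


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical ratings from four participants on the three judgment questions.
raw = [(6, 2, 6), (4, 4, 4), (2, 6, 3), (5, 3, 5)]
recoded = recode(raw)
composites = [mean(row) for row in recoded]  # one composite score per participant
alpha = cronbach_alpha(recoded)
```

With real data, each participant's composite would then serve as the dependent variable in the analysis of variance.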
Table 1
Mean Performance Appraisal Judgments and Decisions by the Number of Nonviolations and Violations Present in the Experiment 1 Scenarios

                          Number of Nonviolations
Number of Violations      One            Three          (Difference)

Judgment Data
  One                     5.26 (.87)     5.73 (.85)     −0.47 (.55)
  Three                   3.68 (.93)     4.89 (.81)     −1.21* (1.3)
  (Difference)            1.58* (1.8)    0.84* (1.0)

Decision Data
  One                     .385 (.50)     .808 (.40)     −.423* (.94)
  Three                   .111 (.32)     .107 (.32)     .004 (.01)
  (Difference)            .274* (.67)    .701* (2.0)

Notes. Each difference with an asterisk is significant at p < .05, using Tukey LSD. Values in parentheses are standard deviations for the means, and effect sizes (Cohen's d) for the mean differences.
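For readers who want to see how an effect size of this kind relates to the tabled means and standard deviations, the sketch below computes Cohen's d with a simple pooled standard deviation. The exact formula the authors used is not stated, so the equal-weight pooling here is an assumption and the function name is hypothetical; the tabled values are rounded, so results will only approximate them.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference using an equal-weight pooled SD.

    Assumption: the two cells have (roughly) equal sample sizes.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Judgment data, one nonviolation: one violation (5.26, SD .87)
# versus three violations (3.68, SD .93).
d = cohens_d(5.26, 0.87, 3.68, 0.93)
```

Under this assumption the result is close to the 1.8 reported for that difference in Table 1.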


The interaction, however, was not predicted. From the means in Table 1, the effect of nonviolations on judgment was stronger in the three (versus one) violations condition. At present, we have no explanation for this effect, but note that Gilliland et al. (1998) did not find this interaction, nor did it replicate in Experiments 2 and 3, discussed below.

Decision Data. Table 1 also lists the mean promotion decision values by condition for Experiment 1. Because the decision data are binary, we used logistic regressions to test whether the number of nonviolations affects promotion decisions only when the number of violations is below the rejection threshold. In our first analysis, we looked at just the cells containing one violation (which is presumably below all participants' rejection thresholds). We then regressed the promotion decision on the number of nonviolations. The number of nonviolations significantly predicted promotion rates (B = .953), with an R² value of 24%. In contrast, with just the cells containing three violations (which is presumably above participants' rejection thresholds) nonviolations did not predict promotion rates (B = .020), nor did they explain any of the variance in the promotion decision question (R² = .00).

The interaction revealed by the logistic regression is consistent with image theory's second prediction. When only one violation was present, the three nonviolations group was much more likely to promote the employee (i.e., M = .808), compared with the one nonviolation group (i.e., M = .385). In contrast, with three violations present, the means for these groups were nearly identical (i.e., Ms = .111 and .107, in the one and three nonviolations groups, respectively).
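The link between the cell promotion rates and the logistic regression slope can be made concrete. When a logistic regression has a single predictor that takes only two values, the fitted model reproduces the two group proportions exactly, so the slope equals the difference in log-odds divided by the difference in predictor values. The sketch below applies this to the one-violation cells; it assumes the number of nonviolations was entered as the raw counts 1 and 3, which the text does not state, and the function names are hypothetical.

```python
from math import log


def log_odds(p):
    """Convert a proportion to log-odds (the logit)."""
    return log(p / (1 - p))


def two_group_logit_slope(p_low, p_high, x_low, x_high):
    """Slope of a logistic regression with one two-valued predictor.

    Assumption: the predictor is coded with the raw values x_low and x_high.
    """
    return (log_odds(p_high) - log_odds(p_low)) / (x_high - x_low)


# One-violation cells: promotion rate .385 with one nonviolation,
# .808 with three nonviolations.
b_one_violation = two_group_logit_slope(0.385, 0.808, 1, 3)
```

Under this coding assumption the slope comes out at about .95, in line with the reported B for the one-violation cells.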
Summarizing the Experiment 1 results, participants seemed to use a rejection threshold when making yes/no promotion decisions, as once the number of violations surpassed the threshold (i.e., in the three violations cells) the promotion decision was not affected by additional nonviolations. Further, participants did not seem to use the rejection threshold when making performance appraisal judgments. Even in groups with three violations present (which appears to be above the rejection threshold for the decision data), participants judged the employee more favorably in the three nonviolations cell than they did in the one nonviolation cell.

A limitation of the Experiment 1 data is that participants evaluated only a single employee. Real-world performance appraisals, however, typically take place in settings that involve rating multiple employees. In such a context, decision-makers often commit an error known as the contrast effect (see Kravitz & Balzer, 1992). This effect occurs when a manager's appraisal of the target employee is influenced by how the manager perceives the performance of another employee. An average employee, for example, may receive a higher than deserved performance appraisal if he/she is evaluated in contrast with a truly poor employee
(i.e., a positive contrast effect). Conversely, the average employee may be rated lower than deserved if contrasted with a truly excellent employee (i.e., a negative contrast effect).

The literature on contrast effects is extensive, comprising both laboratory and field research (see Kravitz & Balzer, 1992; Maurer & Alexander, 1991). The origin of the effect (see Maurer & Alexander, 1991), and how it can be avoided (e.g., Maurer, Palmer, & Ashe, 1993) are well known. Our present interest, however, is on how contrast effects might alter the decision-making strategies that participants use when evaluating employees, and whether image theory could still account for these strategies. For example, with the binary promotion decision, image theory could explain positive and negative contrast effects by assuming that the rejection threshold moves higher after evaluating the performance of a really poor employee, or lower after evaluating the performance of a really excellent employee (see Beach & Mitchell, 1987, p. 216, for a discussion of factors hypothesized to influence the rejection threshold).

Alternatively, perhaps the context of rating multiple employees causes the participant to abandon the screening process altogether. This would be expected if the person no longer perceives the promotion question to be a binary decision, but as one which instead involves comparing multiple employees with an eye toward selecting the best one. Comparing employees to other employees is the nature of the contrast effect; however, once participants begin mentally ranking employees, the screening process may be abandoned for a decision-making strategy that focuses on both violations and nonviolations. Indeed, Gilliland et al. (1998) note that the distinction between judgment and decision is sometimes ambiguous, and that any decision task incorporating judgment-like comparisons would probably not involve use of the screening process.
Ranking is a clear example of a decision-judgment hybrid, as it involves both the identification of a best alternative (the decision part of ranking) and comparisons across alternatives (the judgment part of ranking, which is inherent in the contrast effect). With ranking, it is likely that people will rely less on the screening process and instead focus on both violations and nonviolations when making their decisions (Gilliland et al., 1998; see also Beach & Mitchell, 1987, 1990). Applied to the present study, if the contrast effect results in participants comparing employees, then participants are likely to abandon the screening process, and move directly to the choice phase of the process.

EXPERIMENT 2

To explore these possibilities, we modified the Experiment 1 scenarios to include a positive contrast manipulation. Before evaluating the target employee (as the Experiment 1 participants above did), the
Experiment 2 participants first rated the performance of a truly poor employee, one with five violations and zero nonviolations. As in Experiment 1, image theory predicts that judgments should be linearly related to violations and to nonviolations. Two outcomes, however, are possible with the decision data. First, if participants are still using the screening process, we should see a rejection threshold for the promotion decision, similar to the one found in Experiment 1. Alternatively, the contrast effect might change the nature of the decision task, such that participants no longer use the screening process. If this possibility holds, then we should not see evidence of a rejection threshold for the decision question. Instead, as with the judgment data, both violations and nonviolations would be linearly related to the promotion decision.

Method

Participants and Design. The participants were 100 undergraduate business students (36 males and 64 females) with a mean age of 25.99 years (range 18–52). As in Experiment 1, participants were run in groups, they completed the experiment during regularly scheduled class periods, and they received extra credit for their involvement. The design was also identical to that used in Experiment 1 (with sample sizes per cell equaling n = 25).

Materials and Procedure. We modified the Experiment 1 scenarios by first including a description of the performance of a second employee, named Susan Jones. All participants received the same description of Susan's performance, which listed five violations and zero nonviolations. The violations included: "Susan has often been late for work; fails to reorder popular books; sometimes dresses in an inappropriate manner; is mouthy with the assistant manager, and often makes personal telephone calls while at work." On page 2 of the scenarios, participants rated Susan's performance with the same judgment questions we used for John Snyder in Experiment 1.
Also, rather than counterbalance the order of the judgment and decision questions (as we did in Experiment 1, where the order effect was nonsignificant), we simply listed the promotion decision question last on the page, immediately following the judgment questions. Page 3 of the scenario described the performance of John Snyder. As in Experiment 1, the number of violations and nonviolations varied factorially at one or three. Page 4 of the scenarios contained the judgment questions for John, followed by the promotion decision question. Finally, the scenarios did not ask participants to compare the two employees, nor did we mention the contrast effect in our instructions. In fact, all other aspects of this experiment were identical to those of Experiment 1.


Results and Discussion

Judgment Data. Once again, responses to the three judgment questions correlated strongly, with values ranging from r(98) = .78 to r(98) = .90. Principal components factor analysis produced a single factor, explaining 88% of the variance in the judgment questions. We next created a composite score (with a resulting alpha reliability of .93), by averaging across the three judgment questions for each participant.

Table 2 presents the mean composite-score judgments by condition for Experiment 2. A 2 (one or three violations) × 2 (one or three nonviolations) ANOVA produced main effects of violations, F(1,96) = 32.1, MSe = .821, and nonviolations, F(1,96) = 17.0, MSe = .821. The Violations × Nonviolations interaction, however, was not significant, F < 1.00. Hence, consistent with image theory's predictions, participants rated John higher when three nonviolations were present (versus one), and rated John lower when three violations were present (versus one).

Decision Data. Table 2 also lists the mean promotion decisions by condition for Experiment 2. We again tested for the predicted interaction via logistic regression. In the first analysis, we looked at just the cells containing one violation, and regressed the promotion decision on the number of nonviolations. As in Experiment 1, nonviolations here significantly predicted promotion rates (B = .592, R² = 11%). However, in contrast with the Experiment 1 results, the logistic regression on just the three-violation cells here was significant. That is, nonviolations in Experiment 2 still predicted promotion rates, even when three violations were present (B = .76, R² = 16%). Hence, participants in Experiment 2 did not seem to use the screening process for the promotion decision. The means in Table 2 clearly reveal this pattern. With three violations and only one nonviolation present, few people promoted John (M = .04). The promotion rate, however, was six times higher in the cell containing three violations and three nonviolations (M = .24, t = 2.09, p = .042).¹

Table 2
Mean Performance Appraisal Judgments and Decisions by the Number of Nonviolations and Violations Present in the Experiment 2 Scenarios

                          Number of Nonviolations
Number of Violations      One            Three          (Difference)

Judgment Data
  One                     4.69 (1.0)     5.52 (.74)     −0.83* (.94)
  Three                   3.75 (.78)     4.41 (.91)     −0.66* (.78)
  (Difference)            0.94* (1.1)    1.11* (1.3)

Decision Data
  One                     .440 (.51)     .720 (.46)     −.280* (.58)
  Three                   .040 (.20)     .240 (.44)     −.200* (.63)
  (Difference)            .400* (1.1)    .480* (1.1)

Notes. Each difference with an asterisk is significant at p < .05, using Tukey LSD. Values in parentheses are standard deviations for the means, and effect sizes (Cohen's d) for the mean differences. The new cell containing five violations and five nonviolations produced a mean judgment value of 4.65 (SD = .75), and a mean decision value of .231 (SD = .43).

As noted earlier, it is possible that contrast effects move the rejection threshold for the target employee higher (after evaluating a poor employee) or lower (after evaluating an excellent employee). It is also possible, then, that participants still used the screening process for the promotion decision in Experiment 2, but that their rejection thresholds were shifted from under three violations (i.e., where they were in Experiment 1) to some number greater than three. In other words, the contrast effect might have caused participants to tolerate more violations for the target employee before rejecting him, as evidenced by a higher rejection threshold here (relative to Experiment 1, which did not have the contrast manipulation). Our design, however, included at most three violations in any scenario, which would not allow us to detect an increase in the rejection threshold.

To address this issue, we created a new scenario for the target employee which contained five violations and five nonviolations. The two new violations for this scenario included: "John socializes with other employees when customers are present," and "John tends to be slow ringing up sales." The two new nonviolations were: "John is willing to assist in training new employees," and "customers react positively to the book displays John sets up for the store." This new scenario provides a liberal test for screening effects.
Even though five nonviolations (i.e., good things) exist for the target employee, use of the screening process should lead to his rejection, since five violations are also present. We distributed these scenarios (complete with the contrast manipulation described above) to 26 other participants (12 males and 14 females, with a mean age of 24.54 years), using the same procedure as used in the main experiment above. The mean promotion rate for this group (M = .23), however, was still significantly above the baseline found in the three violations/one nonviolation cell (M = .04; t = 2.02, p = .049) above. Hence, the relatively high promotion rate in the new five/five cell, combined with the non-significant Violations × Nonviolations interaction above, suggests that nonviolations indeed compensated for violations in determining whether John should get the promotion. Presumably, the contrast effect here caused participants to abandon the screening process in favor of a strategy that weighed both violations and nonviolations.²

Contrast effects obviously involve comparing employees to other employees. Experiment 2, however, shows that contrast effects also change how participants perceive the decision-making task itself. When these effects are present, participants seem to abandon the screening process for a strategy where nonviolations compensate for violations. As alluded to previously, the comparison process inherent in the contrast effect likely changes the promotion question from one involving a binary, yes/no decision to one involving a decision-judgment hybrid, where comparisons across alternatives are made. This change in the nature of the task seems to lead people away from the screening process.

If the above logic is correct, then it should be possible to reverse Experiment 2's data pattern back to that found in Experiment 1, where clear evidence of the screening process existed. We conducted Experiment 3 to test this possibility.

¹ The mean in the three violations/one nonviolation cell here (M = .04) is lower than it was in Experiment 1 (M = .11), which actually helps our case. If we assume that the true mean is indeed .11 for this cell, then our analyses in text overestimate the degree to which subjects abandoned the screening process (i.e., the significant difference between the 3/1 and 3/3 cells in Experiment 2 was in part due to the lower 3/1 mean found here, relative to Experiment 1). Nonetheless, even considering the unexpected help from the 3/1 cell, the effect size (.63, Table 2) is still quite large. Further, the across-experiment mean differences for this cell resulted from 3 of 27 participants promoting the employee in Experiment 1, compared with only 1 of 24 in Experiment 2. Hence, the across-experiment mean difference was caused by just two more people promoting in Experiment 1, than did in Experiment 2.

EXPERIMENT 3

Many organizations are aware of contrast effects, and train their performance appraisers on how to avoid them. Often, rater training that involves a definition of the effect, and a reminder to appraisers to evaluate performance only against existing job standards, can help to reduce the bias (see Murphy & Cleveland, 1995).
To the extent that a warning leads participants away from comparing employees, it should also lead them back to use of the screening process for the promotion decision question. In other words, if the comparisons inherent in the contrast effect are the primary reason the Experiment 2 participants abandoned the screening process, then a warning against such comparisons should result in the Experiment 3 participants once again using this process. We tested this idea by adding a simple warning about the contrast effect to the Experiment 3 scenarios.

² An alternative explanation exists for the difference in results between Experiments 1 and 2. Perhaps the low promotion rate found in the 3/3 cell of Experiment 1 was a fluke, and would not replicate. Such a possibility would mean that participants did not use the screening process in either experiment. To rule out this possibility, we reran just the 3/3 cell of Experiment 1 (i.e., without the contrast effect manipulation) on 27 other undergraduates (16 females and 11 males with a mean age of 24.51 years). The mean promotion rate for this group was only .074, which further supports the conclusion that without the contrast manipulation, participants use the screening process to decide on the target employee's promotion.

Method

Participants and Design. The participants were 98 undergraduate business students (49 males and 49 females) with a mean age of 25.22 (range = 19–57) years. Sample sizes per cell ranged from n = 24 to n = 26. Except for excluding the cell with five violations and five nonviolations here (which would not have added to the interpretation of the Experiment 3 results), the design for Experiment 3 was identical to that used in Experiment 2.

Materials and Procedure. We made a single modification to the materials used in the previous experiment. On the first page of each participant's scenarios, the following instructions appeared:

    Sometimes, managers evaluate employees by comparing them with other employees. This, however, is a mistake. For example, if the evaluation of an excellent employee comes before the evaluation of an average employee, managers sometimes give the average employee lower performance ratings than he/she actually deserves. This is an error which leads to mistakes in the performance appraisal process. Please remember when doing your performance appraisals that the performance ratings of one employee should not affect the performance ratings of another employee. So, as you complete the following questionnaire, please evaluate each employee without regard to the other employees' performance.

All other aspects of this experiment relating to materials and procedure were identical to those used in Experiment 2.

Results and Discussion

Judgment Data. A principal components factor analysis on the three judgment questions produced a single factor explaining 84% of the variance. We calculated a composite score by averaging across the judgment questions for each participant (with a resulting alpha reliability of .90). Means of the composite scores appear in Table 3.
A 2 (one or three violations) × 2 (one or three nonviolations) ANOVA produced main effects of violations, F(1,94) = 28.1, MSe = .821, and nonviolations, F(1,94) = 13.9, MSe = .821. The Violations × Nonviolations interaction, however, was not significant, F < 1.00. As in the previous experiments, we found evidence of compensatory information use for the judgment items. Namely, participants rated John higher when three nonviolations were present (versus one), and rated John lower when three violations were present (versus one).

Table 3
Mean Performance Appraisal Judgments and Decisions by the Number of Nonviolations and Violations Present in the Experiment 3 Scenarios

                                  Number of Nonviolations
Number of Violations      One             Three           (Difference)

Judgment Data
  One                     4.68 (.85)      5.47 (.72)      -0.79* (1.0)
  Three                   3.76 (1.1)      4.38 (1.0)      -0.62* (.59)
  (Difference)            0.92* (.94)     1.09* (1.3)

Decision Data
  One                     .375 (.50)      .667 (.48)      -.292* (.60)
  Three                   .125 (.34)      .077 (.27)      .048 (.16)
  (Difference)            .250* (.60)     .590* (1.6)

Notes. Each difference with an asterisk is significant at p < .05, using Tukey LSD. Values in parentheses are standard deviations for the means, and effect sizes (Cohen's d) for the mean differences.
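As a rough check on how the effect sizes in Table 3 relate to the reported means and standard deviations, the sketch below computes Cohen's d for two of the judgment differences. It assumes d is the mean difference divided by the root mean square of the two cell standard deviations; the article does not state which pooling formula was used, so this is an assumption, and the function and variable names are illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference with equal-weight pooling of the SDs.

    Assumption: pooled SD = sqrt((sd_a^2 + sd_b^2) / 2). The article does
    not say which formula it used.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Judgment means and SDs from Table 3, keyed by (violations, nonviolations).
judgment = {
    (1, 1): (4.68, 0.85),
    (1, 3): (5.47, 0.72),
    (3, 1): (3.76, 1.1),
    (3, 3): (4.38, 1.0),
}

# Effect of violations (one vs. three) when one nonviolation is present.
d_violations = cohens_d(*judgment[(1, 1)], *judgment[(3, 1)])

# Effect of nonviolations (three vs. one) when three violations are present.
d_nonviolations = cohens_d(*judgment[(3, 3)], *judgment[(3, 1)])

print(round(d_violations, 2), round(d_nonviolations, 2))
```

Under this pooling assumption, both values land close to the magnitudes reported in the table for those two differences (.94 and .59).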

Decision Data. Table 3 also lists the mean promotion values by condition for Experiment 3. The logistic regression using the cells containing only one violation again revealed that the number of nonviolations predicted promotion rates (B = .602, R² = 11%). In contrast with Experiment 2, however, nonviolations here did not significantly predict promotion rates in the cells containing three violations (B = -.27, R² = 1.3%). The interaction revealed by the logistic regressions suggests that the Experiment 3 warning caused participants to once again screen the promotion decision. Specifically, in cells with one violation present, the promotion difference between the one nonviolation group (M = .375) and the three nonviolations group (M = .667) was large. This difference was trivial, however, in cells with three violations present (Ms = .125 and .077, respectively). Hence, once the number of violations exceeded the rejection threshold, the number of nonviolations had no effect on the promotion decision.

The results of Experiment 3 are similar to those found in Experiment 1, and are consistent with the predictions of image theory. First, judgments here were linearly related to both violations and nonviolations. Second, participants once again used the screening process for the promotion decision question. The warning included here seemed to discourage participants from comparing the two book store employees, which in turn led them back to the screening process.
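The logistic slopes reported for the decision data can be related to the cell proportions in Table 3 with a short calculation. This is a sketch under stated assumptions, not the authors' analysis: with a single predictor that takes only two values, the maximum-likelihood logistic slope equals the log odds ratio between the two groups divided by the distance between the predictor values. The sketch assumes nonviolations were coded 1 and 3, and it assumes cell counts (9 of 24, 16 of 24, 3 of 24, and 2 of 26) inferred from the reported proportions and the total of 98 participants. Neither the coding nor the counts are given in the article.

```python
import math


def two_group_logistic_slope(yes_low, n_low, yes_high, n_high,
                             x_low=1, x_high=3):
    """Logistic slope for a predictor that takes only two values.

    With an intercept and one two-valued predictor the model is saturated,
    so the slope is the log odds ratio divided by the predictor distance.
    """
    odds_low = yes_low / (n_low - yes_low)
    odds_high = yes_high / (n_high - yes_high)
    return math.log(odds_high / odds_low) / (x_high - x_low)


# Assumed promotion counts (inferred, not reported in the article).
# One-violation cells: 9 of 24 (one nonviolation), 16 of 24 (three).
slope_one_violation = two_group_logistic_slope(9, 24, 16, 24)

# Three-violation cells: 3 of 24 (one nonviolation), 2 of 26 (three).
slope_three_violations = two_group_logistic_slope(3, 24, 2, 26)

print(round(slope_one_violation, 3))     # 0.602
print(round(slope_three_violations, 2))  # -0.27
```

Under these assumptions the two slopes agree with the reported B values: a clearly positive effect of nonviolations when one violation is present, and a slightly negative, near-zero effect when three violations are present.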

GENERAL DISCUSSION

The present study replicated Gilliland et al. (1998), by showing that people also use a rejection threshold when making performance-evaluation decisions, but not when making performance-evaluation judgments. Hence, at least two contexts exist (justice evaluations and performance evaluations) where people seem to screen decisions but not judgments. Additionally, we discovered here that the screening process itself can be controlled. That is, we were able to decrease, and then increase, participants' use of screening via experimental manipulations (i.e., the contrast manipulation in Experiment 2, and the warning in Experiment 3, respectively). We believe this finding has important theoretical and practical consequences, as discussed below.

Summary of Findings

Experiment 1 replicated the findings of Gilliland et al. (1998), by having participants judge the performance of a single employee, and then decide whether or not to promote him. Both the decision and judgment data were consistent with the predictions of image theory. First, participants used the screening process for the promotion decision, focusing only on the violations that existed for the target employee. When making performance judgments, however, participants did not rely on screening, and did not use a rejection threshold. Rather, participants made their judgments by considering both the violations and nonviolations of the target's performance, in a compensatory fashion.

In Experiment 2, we added a positive contrast manipulation to the scenarios, which caused participants to abandon the screening process for their promotion decisions. We also added a new scenario containing five violations and five nonviolations for the target employee, to rule out the possibility that the contrast manipulation simply increased the rejection threshold for each participant. Experiment 2 showed that when a contrast is introduced, the nature of the promotion decision question changes.
Specifically, participants no longer treated the promotion question as one involving a binary yes/no decision, but instead as one involving comparisons across employees. These across-employee comparisons led participants to bypass screening. As support for this argument, a simple warning against comparing employees in Experiment 3 resulted in participants once again using the screening process for the promotion decision.

Implications

We believe one key implication of the present study is that image theory seems to offer a general account of organizational decision making. Both the present study and Gilliland et al. (1998) found support for image theory's predictions with regard to use of the screening process. People seem to screen decisions, but not judgments, when making both performance and justice evaluations.

A second implication of our study is that experimental manipulations can influence whether participants screen decisions. The contrast manipulation in Experiment 2 seemed to frame the promotion question as something other than a yes/no decision. Instead, participants seemed to make comparisons across employees, which led them to abandon screening. In Experiment 3, however, the warning apparently redirected participants' focus to the yes/no aspect of the promotion question. In this setting, participants once again used the screening process. Hence, the present study illustrates, in a novel fashion, how framing a problem (see, e.g., Dunegan, 1993) influences participants' use of decision-making strategies. Future research might focus on how more direct framing manipulations affect whether or not participants screen decisions.

Murphy and Cleveland (1995) stressed the importance of identifying factors that influence how managers react to information when doing performance evaluation. From the perspective of image theory, the results of the current study revealed that the decision context has a direct impact on the type of information that people will rely on in this process. When presented with a performance appraisal judgment, participants considered all the performance-related behaviors of the target, and used both violations and nonviolations to reach their conclusions. When making promotion decisions, however, people initially considered only examples of poor performance.
Further, whether participants treated the promotion question as a decision or a judgment also depended on the presence of the contrast manipulation in Experiment 2 and of the warning in Experiment 3. In many of our experimental conditions, good performance was taken into consideration only if the rejection threshold had not been surpassed. It follows, then, that if a given employee has enough instances of poor performance-related behavior, his/her instances of positive behavior might not even be considered. This consequence may have practical implications relating to perceptions of fairness for the evaluation process. Employees may perceive that the evaluative decision processes in organizations are unfair, if decision makers indeed give preferential weight to instances of poor performance. At the very least, explaining to employees how managers weigh good and bad performance behaviors when deciding on promotions might help to alleviate any potential perceptions of unfairness. Fairness is considered an essential tenet of an accepted and effective performance evaluation system (Gilliland & Langdon, 1998).

The ability to understand this evaluative process provides the opportunity to take steps toward improvement. A possible avenue for improvement stems from the finding that contextual factors, such as a positive contrast, can make the processes that underlie judgment and decision tasks nearly identical. Thus, there are aspects of the decision situation that lead people to consider both the violations and nonviolations of an employee. Arguably, this would lead to a better decision, as people would weigh all aspects of an individual's performance. Obviously, we are not suggesting that a positive contrast be used to achieve this end. Rather, future research needs to explore what types of contextual factors might bring about the same outcome, without introducing the possibility of bias. For example, one possibility is to make decision-makers accountable for their decisions. London, Smither, and Adsit (1997) argue that accountability actually motivates a rater. Having decision-makers justify their ratings to others causes them to more carefully consider performance behaviors (Williams, DeNisi, & Cafferty, 1985). Thus, accountability may cause decision-makers to abandon the screening process, and instead consider both violations and nonviolations when making evaluative decisions.

Limitations of the Present Study

The present study is not strong in external validity, as we collected the data in classroom settings, using college students as participants. Also, our scenarios probably did not possess the realism of an actual performance evaluation.
However, our primary interest was in testing the predictions of image theory. As such, we opted to design our experiments with an eye toward maximizing internal validity (see, e.g., Mook, 1983). As a first step, we found that image theory seems to capture the differences in strategy use between making performance appraisal judgments and performance appraisal decisions, albeit in the laboratory. The next step is to see if these findings hold when more realistic stimuli are used. Hence, although we think the processes discovered here (as opposed to the specific data points in each experiment; see, e.g., Mook, 1983) would generalize to the real world, this is an empirical question that awaits further study.


APPENDIX

MEAN JUDGMENT WEIGHTS (AND STANDARD DEVIATIONS) FOR THE VIOLATIONS AND NONVIOLATIONS PRESENT IN THE EXPERIMENTAL SCENARIOS

General Scenario Description Used in All Conditions

Once again it is time for the annual performance appraisal process at the bookstore that you manage. John Snyder is one of your employees. He is a sales clerk in the store. It is his job to sell books, stock shelves, answer customer questions, order books and maintain a neat and orderly store appearance. For the last six months, you have had the opportunity to observe all of your employees as they go about their various tasks each work day.

Violations

1) The cash register balance for shifts has not always been accurate, and there have been a few times when John was over or under the correct balance by as much as $10.00. (M = 5.78c, SD = 1.41)
2) When customers are walking around the store, John often fails to ask if they need help. (M = 5.26b, SD = 1.37)
3) Employees are expected to shelve a full cart of books each day. However, John usually falls short of this requirement, as he shelves just under a stack of books a day. (M = 4.75a, SD = 1.35)

Nonviolations

1) When John is at the counter, he is quick to answer phones, and he is courteous and helpful to customers that call. (M = 6.00c, SD = 1.02)
2) When John stocks the shelves, he makes sure the books in the section are in the proper order and are neatly arranged on the shelves. (M = 5.55b,c, SD = 0.98)
3) John is very helpful when customers approach him. He answers their questions and checks to see if the store has the books they are interested in. (M = 6.26d, SD = 0.90)

Notes. These data are from the pilot study, with N = 69. The letters following each mean are superscripts in the original table. Means not sharing superscripts differ at p < .05, using Tukey LSD.

REFERENCES
Beach, L. R. (1990). Image theory: Decision making in personal and organizational contexts. Chichester, England: Wiley.
Beach, L. R. (1996). Decision making in the workplace: A unified perspective. Newark, New Jersey: Lawrence Erlbaum Associates.
Beach, L. R. (1998). Image theory: Theoretical and empirical foundations. Newark, New Jersey: Lawrence Erlbaum Associates.
Beach, L. R., & Mitchell, T. R. (1987). Image theory: Principles, goals, and plans in decision making. Acta Psychologica, 66, 201–220.
Beach, L. R., & Mitchell, T. R. (1990). Image theory: A behavioral theory of decision making in organizations. Research in Organizational Behavior, 12, 1–41.
Beach, L. R., & Mitchell, T. R. (1996). Image theory, the unifying perspective. In L. R. Beach (Ed.), Decision-making in the workplace: A unified perspective (pp. 1–20). Mahwah, NJ: Lawrence Erlbaum Associates.
Beach, L. R., & Strom, E. (1989). A toadstool among the mushrooms: Screening decisions and image theory's compatibility test. Acta Psychologica, 72, 1–12.
Dunegan, K. J. (1993). Framing, cognitive modes and image theory: Toward an understanding of a glass half full. Journal of Applied Psychology, 78, 491–499.
Dunegan, K. J. (1995). Image theory: Testing the role of image compatibility in progress decisions. Organizational Behavior and Human Decision Processes, 62, 79–86.
Gilliland, S. W., Benson, L. III, & Schepers, D. H. (1998). A rejection threshold in justice evaluations: Effects on judgment and decision-making. Organizational Behavior and Human Decision Processes, 76, 113–131.
Gilliland, S. W., & Langdon, J. C. (1998). Creating performance management systems that promote perceptions of fairness. In J. Smither (Ed.), Performance appraisal: State of the art in practice (pp. 209–243). San Francisco: Jossey-Bass.
Kravitz, D. A., & Balzer, W. K. (1992). Context effects in performance appraisal: A methodological critique and empirical study. Journal of Applied Psychology, 77, 24–31.
London, M., Smither, J. W., & Adsit, D. J. (1997). Accountability: The Achilles' heel of multisource feedback. Group and Organizational Management, 22, 162–184.
Maurer, T. J., & Alexander, R. A. (1991). Contrast effects in behavioral management: An investigation of alternative process explanations. Journal of Applied Psychology, 76, 3–10.
Maurer, T. J., Palmer, J. K., & Ashe, D. K. (1993). Diaries, checklists, and contrast effects in measurement of behavior. Journal of Applied Psychology, 78, 226–231.
Mook, D. (1983). In defense of external invalidity. American Psychologist, 38, 379–387.
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. California: Sage Publications.
van Zee, E. H., Paluchowski, T. F., & Beach, L. R. (1992). The effects of screening and task partitioning upon evaluations of decision options. Journal of Behavioral Decision Making, 5, 1–23.
Williams, K. J., DeNisi, A. S., & Cafferty, T. P. (1985). The role of appraisal purpose: Effects of purpose on information acquisition and utilization. Organizational Behavior and Human Performance, 35, 314–339.