
Behaviorally Anchored Rating Scales:

Some Theoretical Issues


ROBERT S. ATKIN
Carnegie-Mellon University
EDWARD J. CONLON
Georgia Institute of Technology

A number of theoretical problems exist which underlie the development and implementation of behaviorally anchored rating scales in particular and all performance evaluation procedures in general. Several specific areas need additional study. It is concluded that future research should concentrate on the process of performance evaluation in the framework of a cognitive, information-processing model of the rater.

Robert S. Atkin (Ph.D. candidate, University of Illinois) is Assistant Professor of Behavioral Science at the Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, Pennsylvania.

Edward J. Conlon (Ph.D. candidate, Carnegie-Mellon University) is Assistant Professor of Industrial Management at Georgia Institute of Technology, Atlanta, Georgia.

Received 11/22/76; Revised 2/3/77; Accepted 3/30/77; Revised 5/20/77.

The authors would like to thank Rio Sussmann, Hans Pennings, Herb Heneman, Myron Washil and an anonymous reviewer for their comments on earlier versions of this paper.

A recent review by Schwab, Heneman, and De Cotiis (11) concluded that behaviorally anchored rating scales (BARS) have not been demonstrated as superior to alternate evaluation instruments; they showed that a good empirical case has not yet been made to justify the incremental cost incurred during the development of BARS. Exactly why BARS have not lived up to early expectations is not clear; further empirical investigation may demonstrate the hypothesized relative superiority. However, another possibility is that insufficient theoretical work has been focused on BARS, with a resulting deficiency in the basic understanding of these instruments.

The purpose of this paper is to examine several theoretical problems inherent in BARS, including: (a) scaling considerations; (b) relative frequency of critical incidents; (c) model of the


rater; (d) relationship between performance dimensions, job analysis, and multiple raters; and (e) BARS in light of a more general theory of performance evaluation.

Scaling Considerations

The development of a BARS requires the identification of a set of "performance dimensions" and a set of "incidents" which are representative of a wide range of actual behaviors which job incumbents have displayed in the past. The procedure introduced by Smith and Kendall (13) and modified by subsequent researchers usually has the following developmental phases:

1. Supervisors of a group of employees, performing similar jobs, are asked to identify those broad sets of job activities that comprise the job. For programmer/analysts, examples of such "performance dimensions" might be coding and documentation.
2. The same supervisors generate a set of critical incidents that represent actual examples of very good and very poor performance on each performance dimension.
3. Each member of a second, independent group of supervisors then rates each incident on a "good-bad" continuum and also identifies the dimension to which each incident belongs.
4. The instrument-developer then identifies that set of incidents which was systematically associated by the second group with the original performance dimension. The criterion for association varies by study and frequently results in the "dropping" of dimensions due to lack of retained items.
5. The variance of each remaining incident is then examined; if it exceeds some criterion, the incident is not retained, and certain dimensions may be dropped. The remaining incidents whose mean scores ("scale values") represent nearly equally spaced intervals on the "good-bad" continuum are chosen.
6. The resultant set of performance dimensions, each with a set of ordered and scaled incidents, is referred to as BARS, a retranslation scale or a behavioral expectation scale.

Although several variants of this procedure exist, underlying all of them are the following assumptions: (a) more reliable and more valid ratings of performance will be obtained if actual incidents of behavior, rather than arbitrary adjectives, are used as anchors or "benchmarks" on the performance dimensions; (b) use of two sets of supervisors will improve the validity of the behavioral incidents and the reliability of the incident scale values; (c) use of two sets of supervisors will improve the validity of performance dimensions; and (d) reliability of incident scale values and the validity of the incidents and of the dimensions will be improved if the incidents are written in the jargon of the job. The references in the above sentence to "improve reliability or validity" are to improvement relative to non-BARS type scales.

In practice, the rater (who usually participated in either phases 1 and 2, or 3 above) is told to evaluate one employee at a time by: (a) reading a description of the performance dimension; (b) reading each incident in a particular order (usually from "poorest" to "best") and deciding whether or not the incident just read represents behavior "most typically expected" of the employee; and (c) if the incident does not represent that "most typically expected," then the rater should proceed to the next incident and repeat steps 1-3; if the incident does represent the "most typically expected" behavior, it should be so marked, and the rater should proceed to the next dimension. The scale value of the incident is then the "score" of the employee on the dimension.
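The screening in developmental phases 3 to 5 can be illustrated with a minimal sketch. It is not the procedure of any particular study: the function name, the 1-to-9 rating continuum, and both cutoffs (`min_agreement`, `max_variance`) are assumptions chosen for illustration, since the criteria are described above only as varying by study.

```python
from collections import Counter
from statistics import mean, pvariance


def screen_incident(ratings, assigned_dimensions, intended_dimension,
                    min_agreement=0.6, max_variance=2.0):
    """Apply phases 4 and 5 to one incident judged by the second group.

    ratings: one "good-bad" rating per judge (phase 3).
    assigned_dimensions: the dimension each judge sorted the incident into.
    min_agreement, max_variance: hypothetical cutoffs, not from the source.
    Returns (retained, scale_value); scale_value is None when dropped.
    """
    # Phase 4: share of judges who sorted the incident back onto the
    # dimension it was written for.
    agreement = Counter(assigned_dimensions)[intended_dimension] / len(assigned_dimensions)
    # Phase 5: spread of the good-bad ratings across judges.
    variance = pvariance(ratings)
    retained = agreement >= min_agreement and variance <= max_variance
    # The mean rating of a retained incident serves as its scale value.
    return retained, (mean(ratings) if retained else None)


# Hypothetical data: five judges rating on a 1-to-9 good-bad continuum.
consistent = screen_incident([6, 7, 6, 7, 6], ["coding"] * 4 + ["documentation"], "coding")
disputed = screen_incident([1, 7, 2, 6, 4], ["coding"] * 5, "coding")
print(consistent, disputed)
```

In this sketch the first incident is retained because most judges return it to its intended dimension and their ratings cluster tightly. The second is dropped on the variance criterion alone, even though every judge agreed on its dimension.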
Academy of Management Review - January 1978

Very Positive
    Incident I
    Incident H
    Incident G
    Incident F
Neutral
    Incident E
    Incident D
    Incident C
    Incident B
    Incident A
Very Negative

Figure 1. A Hypothetical Scaling of Incidents on a Particular Dimension of a BARS.

Let us examine this task for a moment from the rater's point of view. Assume that a particular performance dimension has 9 incidents, of which 4 are distinctly negative, 1 is slightly "below" neutral and 4 are distinctly positive (see Figure 1). Following steps b and c in the previous paragraph, assume that the rater reads each of the first 5 incidents (A, B, C, D, and E on Figure 1) and concludes with the implicit statement that "the employee would never be expected to behave in this manner." Hence, the employee's most typically expected level of performance on this dimension must be in the positive range. Reading the next incident (F), assume the rater thinks "yes, this employee would be expected to behave this way, but let's see what the next incident says." Assume the rater makes a similar judgment for incidents G and H. Finally, the rater decides that the employee would probably never be expected to behave as described in incident I. What incident should be rated as "most typical"?

The point being made here is: a "good" ratee (i.e., an employee ultimately rated above the neutral point on the dimension of interest) would be expected to perform all of the behaviors described in the incidents at or above the neutral point up to some maximum (in the example above, incidents F, G, and H), and would not be expected to perform the behaviors "above" this point (incident I) or "below" the neutral point (incidents A-E). This point can be made more meaningful by examining Figure 2, which depicts the "meeting day-to-day deadlines" dimension for department managers presented in Campbell, Dunnette, Lawler, and Weick (2). Although these authors did not give the scale values for each incident, their content suggests that item A is the most negative, item E is about neutral and item I is the most positive. As with the above example, let us assume that the rater decides that the ratee would not be expected to behave in the manner described in incidents A-E, would be expected to behave in the manner described in F, G, and H, and would not be expected to behave in the manner described in I. Which item should be rated as "most typical"? The manager who is rated as "could be expected to meet deadlines comfortably . . ." (H) would also be expected ". . . to get his (her) associates' work schedules made out on time" and ". . . to meet seasonal ordering deadlines" (F and G). However, this manager would not necessarily be expected ". . . never to be late" (I) or ". . . to offer to do the orders at home after failing to get them out on deadline day" (E) or to behave in the manner described in any of the other negative items. Does "most typical" refer to the most positively scaled incident that would be expected (H in the example) or does it refer to one of the other expected incidents?
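The ambiguity in this question can be made concrete with a small sketch. The scale values below are invented for illustration (the source gives none for either figure), and the three readings of "most typical" are only candidate interpretations. None of them is claimed to be the one raters actually use.

```python
from statistics import median

# Hypothetical scale values on a 1-to-9 continuum with neutral at 5.0;
# incident E sits slightly below neutral, as in the Figure 1 description.
scale_values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 4.8,
                "F": 6.0, "G": 7.0, "H": 8.0, "I": 9.0}

# Incidents the rater judges the employee "would be expected" to show.
expected = {"F", "G", "H"}


def candidate_scores(scale_values, expected):
    """Return the score implied by three readings of "most typical"."""
    values = sorted(scale_values[name] for name in expected)
    return {
        "highest_expected": values[-1],     # most positive expected incident
        "lowest_expected": values[0],       # expected incident nearest neutral
        "middle_expected": median(values),  # centre of the expected set
    }


print(candidate_scores(scale_values, expected))
```

With these assumed values the same set of judgments yields three different dimension scores, which is why the meaning of "most typical" needs to be settled before the scale value of the marked incident is treated as the employee's score.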

Incident I: Could be expected never to be late in meeting deadlines, no matter how unusual the circumstances.

Incident H: Could be expected to meet deadlines comfortably by delegating the writing of an unusually high number of orders to two highly rated selling associates.

Incident G: Could be expected always to get his/her associates' work schedules made out on time.

Incident F: Could be expected to meet seasonal ordering deadlines within a reasonable length of time.

Incident E: Could be expected to offer to do the orders at home after failing to get them out on the deadline day.

Incident D: Could be expected to fail to schedule additional help to complete orders on time.

Incident C: Could be expected to be late all the time on weekly buys for his/her department.

Incident B: Could be expected to disregard due dates in ordering and run out of a major line in his/her department.

Incident A: Could be expected to leave order forms in his/her desk drawer for several weeks even when they had been given to him/her by the buyer after calling his/her attention to short supplies and due dates for orders.

Figure 2. Behaviorally Anchored Rating Scale for the Dimension "Meeting Day-to-Day Deadlines" for Department Managers.

Source: Modified slightly from Managerial Behavior, Performance and Effectiveness by J. Campbell, M. Dunnette, E. Lawler, and K. Weick (N.Y.: McGraw-Hill, 1970), 122.

In addressing this question, consider first (a) the interval or ordinal quality of the BARS, and (b) the trace line (or operating characteristic) of the BARS incident.

Confusion about whether the BARS procedure results in an interval or an ordinal scale stems from the fact that the assignment of scale values assumes that the categories on the "good-bad" continuum (in phase 3 above) have interval properties. Such an assumption may not be particularly valid (6); if the assumption is violated, the resultant scale is most likely ordinal in nature. The importance of this issue resides in the ultimate use of the instrument. If the goal of the performance evaluation is a ranking of individuals on each job dimension, then ordinal scales are sufficient. But if the intent is to identify some absolute level of performance (which might map onto a specific salary increment), then the non-interval aspect becomes very important.

The second issue concerns the trace line or operating characteristic of the BARS incident. In reexamining the example presented earlier, recall that explicit endorsement of an incident above the neutral point (x_i) implied endorsement of all other incidents (x_j's) between the incident in question and the neutral point (n). Endorsement of x_i also implied that incidents with scale values less than n or greater than x_i have a decreasing probability of being implicitly endorsed. Hence, for explicit endorsement of an incident above the neutral point, the trace line is non-monotonic across the entire range, but "flat" in the range n to x_i. Close examination of what is happening in the range reveals that explicit endorsement of incident H in the example implies that the rater expects the employee to exhibit the behavior described in incidents F and G and H.

The critical point is that this cumulative property is characteristic of Guttman scales (as is the flat trace line). But the basic scaling procedure most frequently used (in phase 3 above) is the Thurstone method of equal-appearing intervals. Items scaled by this method would be expected to have nonmonotonic trace lines, but they would not be expected to have "flat" ranges.

This situation is confounded by considering

the case of the individual receiving a rating below the neutral point. Returning to Figure 1, assume that the individual in question has just been rated at level B. Presumably, the "distant" incidents at the positive end of the scale have essentially a zero probability of occurrence, and the more negative incident (A) has a lower probability than does B. However, what can be said about those incidents "between" B and the neutral point? In the case of the positive rating, those between neutral and the rated level presumably have a probability of occurrence roughly equivalent to that of the rated level. But in this case, that is most likely not so. It seems reasonable to hypothesize that the probability of being rated as performing at a higher level would monotonically decrease as the items became more distant. If this is the case, the item trace line for a person rated as below neutral point probably would be nonmonotonic and would resemble that expected for a Thurstone scale.

Returning to the earlier question (what is the implication of checking an incident as most typical?), the following hypotheses are offered. If "most typical" refers to an incident above the neutral point on a performance dimension, then explicit endorsement of that incident implies endorsement of all other incidents between neutral and the explicit endorsement (the "endorsement range"). It also implies an increasing probability of nonendorsement of all other incidents as a function of their distance from the extremes of the "endorsement range." Finally, if "most typical" refers to an incident below the neutral point, we would hypothesize the probability of endorsement of a specific other incident varies as a function of the difference between the scale score of the other incident and the explicitly endorsed one.

What implications does this have for the use of BARS? The answer, if one exists, requires empirical investigation. Perhaps at that time, the issues raised here will be found to be moot. But this analysis strongly suggests that current practice concerning the rater's task be modified. Specifically, the rater should judge explicitly whether each incident on every dimension would or would not "be expected" of the employee being evaluated. (Heneman [personal communication] suggested that some raters may treat each incident as a "mini-dimension" and hence would like to make an evaluation of each ratee on each incident. The approach discussed in this article would alleviate this as a potential "problem" by making such evaluations explicit.) This would allow the identification of a range of expected behaviors, or a pair of thresholds, "above" and "below" which the rater would not expect the employee's behavior to occur. The median (or perhaps some other central tendency measure) of the scale value of those incidents checked as "expected" also might be operationalized as the employee's rating on the dimension. But such a range may be incomplete in the following sense: not all behaviors need be noted as "expected" within a range defined by the "expected" incidents with the highest and lowest scale values. Furthermore, the use of any measure of central tendency is inconsistent with the "Guttman part" of the scale. However, the use of a two parameter rating (central tendency and range) appears to capture much of the information in this "part" of the scale, while also being representative of the "non-Guttman part" of the scale.

Relative Frequency of Critical Incidents

The use of critical incidents (referring not to routine behaviors, ". . . but rather (to) those essentials in job performance which make the difference between success and failure" (9, p. 56)) as "behavioral benchmarks" presents an intriguing dilemma. Theoretically it is doubtful that an incident describing mid-range or neutral behaviors is really critical. Yet because most behavior occurs in the mid range, it is there that accurate appraisal is particularly necessary. Extremely good and extremely poor performers could probably be identified by much coarser evaluation systems.

The problem is complicated somewhat by

the fact that certain critical incidents may have very low or very high probabilities of occurrence. Some users of BARS attempt to account for this by screening incidents on a frequency of occurrence criterion, retaining only those with a base rate that is neither high nor low. But this seems to introduce another dilemma: does use of such a criterion artificially distort the performance domain?

It would seem that the criterion of "critical" should not enter into the "generation of incidents" step in BARS development (phase 2 above). Rather, emphasis should be placed on developing an inventory of possible or historical behaviors pertinent to the dimension in question. These would then be "scaled" by judges in terms of the degree of association with poor or excellent performance in the defined area. The rater would indicate all of the behaviors expected of the employee, using a central tendency measure to determine the employee's rating.

A very different view of the frequency question suggests that only the infrequent incidents would provide reliable ratings. Specifically, the observer (rater) can make reliable appraisals of the actor (ratee) only to the extent that the behavior of the actor provides a moderate amount of unequivocal information. The attribution model of Jones and Davis (8) suggests that unique inferences may be made only to the extent that an actor's behavior deviates from that expected in the sample (i.e., only when a relatively low base rate behavior occurs or a relatively high base rate behavior does not occur). One interpretation of this model is simply that "standard" behaviors do not contain much unique information about the actor. An alternative interpretation, especially important in this context, is that "standard behavior" may simply not be perceived, or even if perceived, may not be processed and stored in the same way as "non-standard" behavior. Hence at the time of rating, raters may not have enough information about the performance of standard behaviors to use them in the BARS context.

Thus, incidents in the mid-range of BARS may pose particular problems for the rater. Although we have proposed a means to circumvent the problem of the "critical" mid-range incident, this area requires future empirical effort.

Model of the Rater

What does the rater actually do when confronted with a BARS task? Does the rater actually read and evaluate each incident vis a vis a given employee and then use that evaluation to infer "how good" the employee is (the "intended" strategy), or does the rater use some alternate, "unintended" strategy?

Although feasible alternative strategies are easy to identify, the actual strategies used by raters have not been subjected to direct empirical examination. Consider the situation in which the rater sorts through the incidents seeking one that matches some a priori estimate of "how good" the employee is on that particular dimension. In such a case, both the employee and the behavioral incidents are mapped onto the rater's subjective scale of "good-poor" performance. This compares with the intended strategy of mapping the employee onto a previously scaled incident.

Perhaps the difference in strategy seems trivial, and perhaps it is. Before drawing such a conclusion, two points must be considered. First, the claim of BARS advocates is that a priori evaluations do not enter into the rater's selection of the "most typical" behavior since the method presumes to eliminate such biases by focusing the rater's task on actual or expected job-related behavior without prior evaluations of the individual or the behavior. The validity of this presumption is a critical issue. Assume that a rater has two employees, E1 and E2, and that the rater has an a priori evaluation that E1 is "better than" E2 on dimension X. Further assume that on the basis of "objective frequency," the "most typical" incident is the same for both employees. Will the rater be more likely to rate E2 below E1? This is an empirical question which deserves attention.

If in the above discussion, E1 did receive more positive ratings than E2, the second issue emerges. Specifically, what might contribute to the rater's a priori evaluation? Raters may construct non-veridical, prior evaluations based on a number of factors. Strickland (14) found that role-playing supervisors appear to extend greater "trust" to subordinates whom they have had less opportunity to supervise. In Strickland's study, two subordinates (A and B) had equivalent objective levels of performance. On an initial set of trials, the experimental treatment required the supervisor to monitor A more frequently than B. During subsequent trials, the supervisor was allowed to allocate his or her monitoring of the two parties. The supervisor continued to monitor A more than B and also attributed greater trust to B. In substantially replicating these findings, Kruglanski (10) also suggested that monitoring may be affected by a number of supervisor motivations. Deci, Benware, and Landy (3) also demonstrated that observers attribute internal or external motivation to actors with identical levels of performance as a function of different environmental conditions.

Although these studies do not directly examine the relationship between attribution of the cause of behavior and actual, on-the-job performance evaluations, they raise some provocative questions. In most situations, behavior perceived as "internally motivated" would receive a relatively higher performance rating than behavior perceived as "externally motivated", even if both sets of behavior were equivalent. Yet, by definition, such "non-behavioral" information is not admissible as a behavioral incident in the BARS methodology.

As a second example, the supervisor's evaluation of the subordinate actually provides a tacit evaluation of the supervisor. Specifically, consider the problem of supervisor A providing extensive training for employee B during some time period and then subsequently evaluating B's performance. A's ability to train subordinates and A's subsequent evaluation of one such subordinate are not independent. Instead they share a relationship that Jones and Davis (8) have called "hedonic relevancy," described as ". . . the extent that a person's action proves rewarding or costly to the perceiver" (7, p. 69).

Arguably, people do not make attributions unless forced to do so by the experimenter (and the ubiquitous questionnaire). While individuals may not and probably do not constantly attend to attributional issues, there do appear to be naturally occurring situations that are equivalent to the experimenter's questionnaire. Most obvious of these is the performance evaluation. Typically, performance evaluation requires that a supervisor use a rather imperfect set of data to make inferences about aspects of employee performances. Frequently these inferences concern predictions about future behavior in new (not-yet-experienced) conditions or postdictions about what it was that "got" the employee to his/her present situation. Inferences about antecedents or consequents, required by the evaluation system (given a relatively imperfect and usually static set of information), are examples of attributions that are probably not artifacts of the instrument as much as requisites of the task.

A problem very similar to that of hedonic relevancy is the relationship of rater "ego-involvement" with the object of the evaluation and the rater's subsequent evaluation. This problem dates to early criticisms of Thurstone's use of judges to obtain scale values for statements about attitude objects, the primary interest being to determine the effect that judges' attitudes toward the object had on their scaling statements about that object. Sherif and Sherif (12) argued that ". . . the distributions of judgments by a highly involved person are bimodal . . . (resulting in) categories with which he strongly agrees and disagrees occurring at the expense of categories intermediate to them" (p. 342-343).

In the BARS context, this phenomenon may have an impact in two places: scale development, which may affect the decision to retain or drop certain incidents and dimensions, and scale use, which may result in certain supervisors using the scales on certain dimensions. Consider

the following hypothesis about the latter: to the degree to which a particular rater believed that a particular dimension is substantially more important than others, he/she would tend to define a relatively narrow range of acceptable behaviors, a relatively broad set of unacceptable behaviors, and a virtually null set of neutral behaviors. As an illustration, consider the situation of the rater who strongly believes that "documentation" is a more critical aspect of the programmer/analyst job than is "interaction with users." We would hypothesize that such a rater would: (a) tend to classify employees as "good" or "poor" on documentation as opposed to "in-between"; (b) tend to use the "in-between" range on "interaction . . ." more than either extreme; and (c) tend to have a very "narrow" definition of good behavior and a very "broad" definition of poor behavior on documentation. Furthermore, a priori evaluations probably would tend to be more likely for documentation than for interaction.

Performance Dimensions, Job Analysis and Multiple Raters

The development of a BARS assumes that the behavioral incidents selected for inclusion have a relatively equal "environmental probability" (objective opportunity) for each employee to actually demonstrate the activities included in the scale. Parameters such as seniority, "territory," or the age of ancillary equipment might seriously affect the environmental probabilities of certain behaviors. For example, consider the Fogli, Hulin and Blood (5) study of grocery checkers. If one checker had been "randomly" and permanently assigned to an "express" register, the resultant mix or quantity of groceries packed might be altered relative to other checkers, thus altering the environmental probabilities of the incidents. Two serious issues concerning the applicability of BARS in many jobs, especially those that are complex and/or performed in diverse environments, follow from the phenomenon of unequal environmental probabilities: (a) the relationship between the environmental probability of incidents and the definition of a job, and (b) the relationship between the incidents and performance dimensions developed by the "standard" procedure, and the organizational position of the individuals involved in the development.

In regard to (a) above, we may ask: what is the role of job analysis in the development of a BARS? At the naive conceptual level, the dimensions identified by a job analysis should at least be roughly congruent with those derived in phase 1 of the BARS development, and Dunnette noted that "behavior-based job analyses tell us which behaviors are desired on the job and which are necessary for getting the job done properly" (4, p. 85). Yet job analysis appears rarely to precede the BARS development. The first two steps of the developmental procedure are apparently used as a surrogate for the job analysis. Although these steps do provide the broad behavioral dimensions that underlie a job (a typical aspect of most job analyses), there are differences between the two.

First, the BARS procedure uses presumed performance dimensions to generate behavioral incidents, while job analytic procedures usually reverse this sequence. Second, the components or factors (i.e., job dimensions) that emerge from the job analysis are usually orthogonal; those from the BARS procedure generally are not. One possible outcome of a job analysis is a "clustering" of employees in terms of "the amount" of each job component or factor that is inherent in the work actually performed by the individual employee with each cluster considered as a specific job (4).

This implies that a class of individuals with a common job title might actually consist of several clusters of individuals representing very different activity patterns, and hence, different jobs. The BARS approach assumes that some class of individuals all have a common job and hence, a single activity pattern. Disagreement among the raters involved in phases 1-5 is treated as unreliable data about a presumably singular job. It

seems that raters' lack of agreement on which dimensions underlie the "job" in question (phase 1), or which incidents are sorted back onto which dimensions (phase 4) represents a confounding of rater unreliability and job multiplicity. Whether raters are unreliable in their reporting of the characteristics of a single job or whether they are reliable in their reporting of the characteristics of more than one job is unknown. The assumption of equal environmental probability of incidents across employees is equivalent to the assumption of a single common job. Although the effect of dropping dimensions and incidents during the BARS development may be useful in identifying a common core of dimensions across jobs, it should not be confused with the identification of the complete set of dimensions for a single job.

The second issue raised by the problem of unequal environmental probabilities is that even if all members of the target class do hold a common job, the BARS procedure may not adequately reflect the entire domain of dimensions or incidents. One approach to this issue is to examine the dimensions and incidents that would be produced by different groups of potential raters, the strategy adopted in a recent study by Borman (1) in which supervisors and incumbents separately developed a BARS for the incumbents' job (university secretaries). Four dimensions were developed by the incumbents (secretaries), and three were developed by supervisors (faculty members). Although Borman did not explicitly attempt to measure the degree to which each set of dimensions sampled a different "part" of the job domain, he stated that the

    . . . content of the two sets of scales seemed to reflect each rater group's 'bias' in terms of position to observe ratee behavior . . . (and that the instructions) resulted in two sets of dimensions at least partially independent from each other conceptually (1, p. 112).

Presumably all seven of the dimensions should ultimately be employed in a performance evaluation. But Borman's findings suggested that each group of raters tended to demonstrate higher interjudge reliability within group on those dimensions developed by the group. Also observed were superior within-group convergent and discriminant validity coefficients on those dimensions developed by the group. This suggests that different groups of raters not only focus on different aspects of the job domain, but also that they may have differential expertise in rating these aspects.

Discussion

To see the theoretical problems inherent in behaviorally anchored rating scales in the proper perspective, we need to examine them in terms of a more general theory of performance evaluation. First, although the workplace tends to constrain individual behavior, behavior which is emitted may be quite varied. As noted by Campbell, et al. (2), only a portion of the individual's total set of behaviors directly impacts on performance. Presumably it is this subset of performance relevant behaviors that should be evaluated during performance appraisal and which the BARS technique attempts to isolate. But it is not at all clear that this subset is the only input used by the evaluator or that it contains all the behaviors relevant to performance.

Since performance relevant behavior is an input, it is necessary to develop techniques capable of measuring it. The considerable effort expended in this area for some time has taken two slightly different paths. The first has focused on identification of the proper behavioral dimensions to evaluate, while the second has been concerned with issues of reliability and validity, resulting in a search for instruments which are internally reliable, stable over time, content valid, and free of rater bias. From this perspective, BARS represent an advanced type, although they, too, appear to suffer from the several aforementioned technical maladies.

But, since factors other than performance relevant behaviors do enter the evaluation process, the traditional strategy of instrument development does not provide us with the tools necessary to precisely investigate the variance contributed by these "other factors". Examination and explication of these factors as bona fide

information-processing model of the rater. This framework would allow us to deal directly with such issues as the impact of attribution on eval-
phenomena, rather than as sources of error, may uation, the degree to which performance rele-
ultimately result in the construction of improved vant behaviors actually enter the formal evalua-
rating devices. The identification of what these tion, and the degree to which non-performance
"other factors" may be moves beyond the scope relevant behaviors enter the evaluation.
of this article and may be beyond the scope of Ultimately, we need to examine the per-
the present state of development of our knowl- formance evaluation for what it really is: a com-
edge of the evaluation process. plex decision-making task, one which might be
This article stresses the need to break from better studied by concentrating on the process
our myopic concern with instrumentation and of this particular form of decision-making than
to examine the rater, by focusing on a cognitive. by continuing to focus only on instrumentation.

REFERENCES
1. Borman, W. C. "The Rating of Individuals in Organizations: An Alternative Approach," Organizational Behavior and Human Performance, Vol. 12 (1974), 105-124.
2. Campbell, J., M. Dunnette, E. Lawler, and K. Weick. Managerial Behavior, Performance, and Effectiveness (New York: McGraw-Hill, 1970).
3. Deci, E. L., C. Benware, and D. Landy. "The Attribution of Motivation as a Function of Output, Rewards and the Contingency of Pay," Journal of Personality, Vol. 42 (1974), 652-667.
4. Dunnette, M. Personnel Selection and Placement (Belmont, Calif.: Wadsworth, 1966).
5. Fogli, L., C. Hulin, and M. Blood. "Development of First-Level Behavioral Job Criteria," Journal of Applied Psychology, Vol. 55 (1971), 3-8.
6. Green, B. "Attitude Measurement," in G. Lindzey (Ed.), Handbook of Social Psychology, 1st ed. (Reading, Mass.: Addison-Wesley, 1954).
7. Hastorf, A., D. Schneider, and J. Polefka. Person Perception (Reading, Mass.: Addison-Wesley, 1970).
8. Jones, E., and K. Davis. "From Acts to Dispositions: The Attributional Process in Person Perception," in L. Berkowitz (Ed.), Advances in Experimental Social Psychology, Vol. 2 (New York: Academic Press, 1965).
9. Kirchner, W., and M. Dunnette. "Using Critical Incidents to Measure Job Proficiency Factors," Personnel, Vol. 34 (1957), 54-59.
10. Kruglanski, A. "Attributing Trustworthiness in Supervisor-Worker Relations," Journal of Experimental Social Psychology, Vol. 6 (1970), 233-247.
11. Schwab, D., H. Heneman, and T. DeCotiis. "Behaviorally Anchored Rating Scales: A Review of the Literature," Personnel Psychology, Vol. 28 (1975), 549-562.
12. Sherif, M., and C. Sherif. Social Psychology (New York: Harper and Row, 1969).
13. Smith, P. C., and L. M. Kendall. "Retranslation of Expectations: An Approach to the Construction of Unambiguous Anchors for Rating Scales," Journal of Applied Psychology, Vol. 47 (1963), 149-155.
14. Strickland, L. H. "Surveillance and Trust," Journal of Personality, Vol. 26 (1958), 200-215.
