Jennifer A. Stillman
School of Psychology
Massey University Albany
William Kirkley
Department of Management & International Business
Massey University Albany
Interest in exercise effects commonly observed in assessment centers (ACs) has resurfaced with Lance, Lambert, Gewin, Lievens, and Conway's (2004) study. The study
presented here addressed the construct validity puzzle associated with ACs by investigating whether traditional trait-based overall assessment ratings (OARs) could be
explained by behavioral performance on exercises. In a sample of 208 job applicants
from a real-world AC, it was found that the multivariate combination of scores from
three behavioral checklists explained around 90% (p < .001) of the variance in supposedly trait-based OARs. This study adds to the AC literature by suggesting that
traditional OARs are predictive of work outcomes because they reflect exercise-specific behavioral performance rather than trait-based assessments. If this is the case,
validity and efficiency are best served by abandoning redundant trait ratings (dimensions) in favor of more direct behavioral ratings.
Correspondence should be sent to Duncan J. R. Jackson, Department of Management & International Business, Massey University Albany, Private Bag 102904, North Shore MSC, Auckland, New
Zealand. E-mail: d.j.r.jackson@massey.ac.nz
JACKSON ET AL.
used as the basis for employment decisions. A reasonable take on the literature
suggests that there have been few analyses of these exercise-specific behaviors,
and the focus has been primarily on dimensions (see Jackson et al., 2005; Russell
& Domm, 1995).
BACKGROUND
Lance et al. (2004) reemphasized long-held concerns associated with the measurement properties of ACs. They also aimed to minimize various biases that can result
from specifying particular parameter constraints in structural models. In a
reanalysis of Lievens and Conway's (2001) large-scale review of AC measurement
properties, and contrary to the suggestions in that article, Lance et al. concluded
that exercise effects are prominent in ACs. In many ways, the arguments presented
by Lance et al. have taken the AC measurement debate full circle, bringing attention back to concerns raised more than 20 years ago about method-driven
variance outweighing any dimension variance that was intended for assessment in
an AC (Sackett & Dreher, 1982).
The legacy of AC research over the last 2 decades, with respect to construct validity, has been an unsatisfactory and unresolved position for both academics and
practitioners. Construct validity issues hold clear implications for practice in, for
example, developmental ACs, where feedback may be provided on the basis of
constructs for which there is no psychometric justification. Moreover, selecting
people on the basis of nondefensible constructs has clear implications for fairness,
accuracy, and efforts to both understand and improve an organization's assessment
procedures (Gatewood & Feild, 2001).
FIGURE 1 The three steps traditionally followed in deriving an overall assessment rating (OAR): observing and recording behavior within exercises, inferring trait-based dimensions, and adopting an OAR.
It is recommended (e.g., International Task Force on Assessment Center Guidelines, 2000) that data from an
AC be pooled in some way, either mechanically (e.g., calculating an average) or
through a group discussion.
The process that is traditionally followed to arrive at an OAR is intricate and
commonly involves a series of inferences and aggregations. The approach that is
typically prescribed for dimension-specific ACs (e.g., see Ballantyne & Povah,
2004) can be summarized into three main steps. These are depicted in Figure 1.
The first step is to observe and record behavior within a set of simulation exercises.
Researchers from both the task-specific and dimension-specific schools of thought
recommend using behavioral checklists at this stage (Ahmed, Payne, & Whiddett,
1997; Lievens, 1998; Lowry, 1997). The second step involves inferring trait-based
dimensions from behavior that has been observed and recorded. This information,
and that derived from the following step, is often used for making employment decisions (Spychalski, Quiñones, Gaugler, & Pohley, 1997). The third, and usually
final, step is the adoption of an OAR. This is achieved through either mechanical
integration or a discussion held by assessors. Research suggests that no meaningful difference is observed between the predictive validity of OARs that are derived
either mechanically or by way of discussion (Pynes & Bernardin, 1992; Pynes,
Bernardin, Benton, & McEvoy, 1988).
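The mechanical route to an OAR can be illustrated with a short sketch. The dimension names come from this study, but the ratings and the equal-weight averaging are hypothetical stand-ins for whatever integration rule an AC might adopt:

```python
# Hypothetical sketch of mechanically integrating dimension-level ratings
# (1-6 scale) into an overall assessment rating (OAR) by simple averaging.

def mechanical_oar(dimension_ratings):
    """Average dimension-level ratings into a single OAR."""
    values = list(dimension_ratings.values())
    return sum(values) / len(values)

# One candidate's pooled dimension ratings (illustrative values).
ratings = {
    "teamwork": 5.0,
    "customer_focus": 4.5,
    "oral_expression": 4.0,
    "tolerance": 5.5,
    "comprehension": 5.0,
}
oar = mechanical_oar(ratings)
print(round(oar, 2))  # 4.8
```

A consensus discussion replaces this arithmetic with assessor judgment, but as noted above, the two routes appear to yield comparable predictive validity.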
Research over several decades has shown that ACs, and more specifically
OARs, display evidence of predictive validity (Arthur et al., 2003; Gaugler et al.,
1987; Schmidt & Hunter, 1998). The reason why they hold predictive properties
remains unresolved in the light of Lance et al.'s (2004) study. We hypothesize that if
a task-specific interpretation is substantiated, behavioral indicators of performance on exercises, observed at Step 1 (see Figure 1), should account for a meaningful amount of variance in Step 3, the derivation of the trait-based OAR. The extent of this explained variation between behavioral performance and the OAR will
inform on the necessity and importance of the inferences at Step 2, that is, the necessity to infer traits from observed behaviors in ACs.
METHOD
The AC designed for Jackson et al. (2005) was used with a different pool of job applicants (i.e., new data were used in this study). Details of the AC are publicly
available and are, therefore, described only briefly next. The major point of difference in the study presented here was the focus on OARs and what defines these
variables.
Participants
A total of 208 job applicants for a private sector retail chain participated. Jobs applied for were in customer service and general sales. Around 87% of the sample
was female, and the mean age across all participants was 30.1 years (SD = 15.2).
Most participants described themselves as Caucasian (77%) and as having some
high school education (68%). Nine assessors participated in this AC. All participating assessors were managers, as is usually the case in ACs (Lowry, 1996;
Spychalski et al., 1997); they had a minimum of 2 years' experience in the positions
under scrutiny and were not formally trained in psychology.
AC Construction
The AC was designed according to a two-tiered process. Tier 1 involved the development of task-specific rating criteria, and Tier 2 involved the development of dimension-specific rating criteria.
Tier 1. Task-specific behavioral checklists. Using task-analysis procedures described in Lowry (1997), inductive job analysis interviews were used to
determine job-relevant tasks, which were, in turn, incorporated into a deductive
task-analysis questionnaire. The tasks rated as most important for selection were
retained for behavioral checklists used in the AC exercises. Performance on exercises was rated on a scale ranging from 1 (certainly below standard) to 6 (certainly
above standard). An example of an item on a behavioral checklist was "Speaks
clearly and enunciates appropriately." These behavioral checklist items differ
from the trait-based assessment that follows for two fundamental reasons. First,
these judgments were only considered within exercises, and as such no behavioral
patterning across exercises is sought at this level of assessment. Second, and related to the first point, trait-level inferences were not drawn in the checklists. They
were intended as job-relevant situationally specific descriptors only, and no inferences were made beyond this stance.
Tier 2. Dimension-specific rating criteria. A deductive job analysis instrument known as the threshold traits analysis (Lopez, Kesselman, & Lopez,
1981) was used to guide the development of dimensions used in this AC. Assessors
were instructed to use behavioral evidence, including any notes and behavioral
checklist information, to guide their assessment and inference of global dimensions (as suggested by Ahmed et al., 1997; see Lievens, 1998, for dimension-specific ACs). Note that, contrary to the Tier 1 behavioral assessment, trait inferences
were drawn at this level, including the inference of consistency of behavior across
exercises. The parsimonious number of five dimensions (as suggested by Lievens
& Klimoski, 2001) assessed were the following:
1. Teamwork. The extent to which the individual works effectively and harmoniously with other team members.
2. Customer focus. The extent to which the individual is concerned with customer needs, describes products accurately, matches presentations to the
customer's interests, and attempts to assist customers to make satisfactory
purchases.
3. Oral expression. The extent to which the individual speaks grammatically
and clearly using appropriate language and appropriate gestures.
4. Tolerance. The extent to which the individual interacts effectively with
people despite delicate, frustrating, or tense situations that demand understanding, patience, and empathy.
5. Comprehension. The extent to which the individual understands spoken
and written, verbal, or behavioral language.
Assessor training. Frame-of-reference training was used as a standard-setting procedure for assessors, as suggested by numerous researchers in the field
(Bernardin & Buckley, 1981; Lievens, 1998; Schleicher, Day, Mayes, & Riggio,
2002). Training was focused on key issues suggested in the literature as maximizing conditions for dimension measurement (e.g., Lievens, 1998). Assessors were
also trained on the task-specific component of the AC, that is, the behavioral
checklists, in a similar manner.
Derivation of overall assessment ratings. The OAR used in this applied
AC was based on assessor panel judgment across dimensions, whereby assessors
gathered subsequent to the AC to discuss overall performance of participants with
respect to traits. Overall ratings are commonly pooled in this manner (Ballantyne
& Povah, 2004; International Task Force on Assessment Center Guidelines, 2000).
Procedure
The literature suggests that rating performance just after exercises (within-exercise
approach) versus waiting until the end of the AC and rating across dimensions
(within-dimension approach) adds little to the facilitation of construct validity
(Harris, Becker, & Smith, 1993; Silverman, Dalessio, Woods, & Johnson, 1986).
Thus, a within-exercise approach to rating was followed in the interests of maintaining assessor recollections of the performance of candidates on exercises. The
task-specific rating on behavioral checklists naturally preceded the rating on dimensions, as behavioral assessment is logically prior to the evaluation of
dimensions.
The intention in this study was to retain a data-driven approach to the analysis
of the ratings obtained. As such, oblique factor analyses were used to identify general patterns in the task and dimension-specific assessment steps. For the task-specific step, scores on behavior within exercises were developed by aggregating
scores on each behavioral checklist associated with each exercise (a total of three
exercise scores). The focus of this study was to investigate the relationship between behavioral checklist scores and trait-based OARs. Multiple regression coefficients were used to explore this relationship.
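The two analysis steps just described (aggregating each exercise's checklist items into an exercise score, then regressing the OAR on the three exercise scores) can be sketched as follows. The data are synthetic stand-ins for the 208 applicants, the weights are hypothetical, and numpy's least-squares routine stands in for whatever statistical package was actually used:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 208  # sample size in the study

# Synthetic checklist ratings: 3 exercises x 10 items each, on the 1-6 scale,
# driven by a latent stand-in for applicant performance.
ability = rng.normal(0.0, 1.0, n)
exercise_scores = np.column_stack([
    np.clip(4.5 + 0.6 * ability[:, None] + rng.normal(0.0, 0.5, (n, 10)), 1, 6).mean(axis=1)
    for _ in range(3)
])  # one aggregated behavioral score per exercise

# A synthetic OAR influenced by the three exercise scores (weights hypothetical).
oar = exercise_scores @ np.array([0.4, 0.3, 0.3]) + rng.normal(0.0, 0.2, n)

# Regress the OAR on the three exercise scores; report R^2 and adjusted R^2.
X = np.column_stack([np.ones(n), exercise_scores])
beta, *_ = np.linalg.lstsq(X, oar, rcond=None)
resid = oar - X @ beta
r2 = 1.0 - resid.var() / oar.var()
k = 3  # number of predictors
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
print(round(float(r2), 2), round(float(adj_r2), 2))
```

Because the synthetic OAR is built from the exercise scores, the regression recovers a large R²; with real data, the size of that coefficient is an empirical question, which is precisely what this study tested.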
RESULTS
Two sets of results were pertinent to this study. First, as background, the underlying structure of the data was explored. Second, and representing the primary aim of
this study, the relationship between behavioral checklist scores and trait-based
OARs was investigated.
TABLE 1
Factor Analysis of Behavioral Checklists

Item       F1     F2     F3     h2      M      SD
Ex1b1     .76                  .60    4.99    0.84
Ex1b2     .81                  .60    4.92    0.86
Ex1b3     .76                  .52    4.68    0.90
Ex1b4     .65                  .45    5.02    0.88
Ex1b5     .77                  .68    4.55    1.01
Ex1b6     .78                  .64    4.78    1.00
Ex1b7     .80                  .63    4.33    1.04
Ex1b8     .78                  .65    4.90    1.07
Ex1b9     .72                  .60    4.77    1.02
Ex1b10    .63                  .44    5.03    0.85
Ex3b1            .60           .50    4.70    0.95
Ex3b2            .71           .58    4.52    1.02
Ex3b3            .66           .52    4.78    0.87
Ex3b4            .74           .65    4.86    0.99
Ex3b5            .75           .62    4.44    0.98
Ex3b6            .68           .58    4.91    0.95
Ex3b7            .79           .61    4.81    0.94
Ex3b8            .84           .59    4.89    0.87
Ex3b9            .60           .42    5.08    0.86
Ex3b10           .82           .60    5.03    0.84
Ex2b1                   .91    .81    4.42    1.11
Ex2b2                   .92    .82    4.33    1.10
Ex2b3                   .79    .62    4.32    1.13
Ex2b4                   .86    .79    4.38    1.12
Ex2b5                   .87    .71    4.33    1.04
Ex2b6                   .93    .79    4.21    1.21
Ex2b7                   .91    .80    3.96    1.13
Ex2b8                   .81    .72    4.07    1.20
Ex2b9                   .70    .73    4.55    1.22
Ex2b10                  .77    .72    4.51    1.13
SS       8.24   9.47   8.46
Cum. %  40.40  54.12  63.33

Note. Principal axis factor analysis with direct oblimin rotation and a scree criterion. Factor correlations between 1 and 2 = .36, 1 and 3 = .46, 2 and 3 = .43. Ex1 through Ex3 indicate the simulation exercises used in this study; only each item's loading on its primary factor is shown. h2 = communality estimates upon extraction; M = item mean; SS = rotated sums of squared loadings; Cum. % = cumulative percentage of variance explained.
TABLE 2
Multitrait–Multimethod Matrix for Assessment Dimensions and Exercises

        Ex1D1 Ex1D2 Ex1D3 Ex1D4 Ex1D5 Ex2D1 Ex2D2 Ex2D3 Ex2D4 Ex2D5 Ex3D1 Ex3D2 Ex3D3 Ex3D4
Ex1D2    .64
Ex1D3    .49   .65
Ex1D4    .57   .71   .81
Ex1D5    .55   .62   .61   .63
Ex2D1    .38   .42   .38   .38   .31
Ex2D2    .32   .41   .38   .34   .26   .62
Ex2D3    .17   .25   .33   .28   .18   .48   .54
Ex2D4    .32   .38   .45   .43   .33   .55   .65   .71
Ex2D5    .25   .30   .39   .37   .34   .56   .55   .53   .71
Ex3D1    .20   .29   .24   .29   .26   .27   .35   .29   .38   .42
Ex3D2    .30   .40   .34   .40   .34   .39   .45   .34   .43   .44   .78
Ex3D3    .12   .24   .13   .21   .19   .23   .34   .34   .38   .41   .77   .69
Ex3D4    .22   .34   .29   .35   .32   .35   .38   .33   .41   .45   .88   .81   .79
Ex3D5    .20   .29   .23   .29   .28   .29   .36   .35   .45   .46   .84   .73   .80   .84

Note. All correlations were significant (p < .05). Monotrait–heteromethod correlations (same dimension, different exercise) had a median of .34 (interquartile range [IQR] = .13); heterotrait–monomethod correlations (same exercise, different dimension) had a median of .61 (IQR = .09). D = dimension; D1 = comprehension; D2 = oral expression; D3 = tolerance; D4 = teamwork; D5 = customer focus; Ex = exercise.
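The monotrait–heteromethod and heterotrait–monomethod medians reported in the table note can be computed from any such correlation matrix. A minimal sketch with a hypothetical 2-dimension, 2-exercise matrix (values illustrative, not from the study):

```python
import numpy as np

# Hypothetical MTMM correlation matrix for 2 dimensions (D1, D2)
# measured in 2 exercises (Ex1, Ex2); order: Ex1D1, Ex1D2, Ex2D1, Ex2D2.
labels = [("Ex1", "D1"), ("Ex1", "D2"), ("Ex2", "D1"), ("Ex2", "D2")]
R = np.array([
    [1.00, 0.60, 0.30, 0.20],
    [0.60, 1.00, 0.25, 0.35],
    [0.30, 0.25, 1.00, 0.65],
    [0.20, 0.35, 0.65, 1.00],
])

mthm, htmm = [], []  # monotrait-heteromethod, heterotrait-monomethod
for i in range(len(labels)):
    for j in range(i):
        same_exercise = labels[i][0] == labels[j][0]
        same_dimension = labels[i][1] == labels[j][1]
        if same_dimension and not same_exercise:
            mthm.append(R[i, j])  # same trait, different exercise
        elif same_exercise and not same_dimension:
            htmm.append(R[i, j])  # same exercise, different trait

print(float(np.median(mthm)), float(np.median(htmm)))  # 0.325 0.625
```

In this toy matrix, as in the study, the heterotrait–monomethod (same-exercise) median exceeds the monotrait–heteromethod (same-dimension) median, which is the signature of an exercise effect.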
TABLE 3
Factor Analysis of Trait-Based Assessment

Item      F1     F2     F3     h2      M      SD
Ex1D1    .63                  .50    4.99    0.79
Ex1D2    .72                  .58    4.76    1.06
Ex1D3    .81                  .57    4.74    0.89
Ex1D4    .88                  .79    4.57    1.02
Ex1D5    .68                  .58    5.05    0.84
Ex2D1           .48           .70    4.93    0.90
Ex2D2           .70           .82    4.78    0.98
Ex2D3           .68           .77    4.82    0.88
Ex2D4           .77           .86    4.63    0.95
Ex2D5           .56           .76    4.82    0.94
Ex3D1                  .86    .95    4.43    1.15
Ex3D2                  .73    .76    4.41    1.18
Ex3D3                  .74    .87    4.45    1.02
Ex3D4                  .90    .93    4.12    1.17
Ex3D5                  .82    .89    4.50    1.11
SS      5.22   4.57   4.91
Cum. % 44.69  59.71  68.37

Note. Principal axis factor analysis with direct oblimin rotation and a scree criterion. Factor correlations between 1 and 2 = .34, 1 and 3 = .50, 2 and 3 = .50. Only each item's loading on its primary factor is shown. h2 = communality estimates upon extraction; M = item mean; D = dimension; D1 = comprehension; D2 = oral expression; D3 = tolerance; D4 = teamwork; D5 = customer focus; Ex = exercise; SS = rotated sums of squared loadings; Cum. % = cumulative percentage of variance explained.
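In an obliquely rotated solution like those in Tables 1 and 3, an item's communality depends on its full pattern-loading row and the factor correlations (h² = pᵢ Φ pᵢᵀ under the common factor model), which is why communalities can exceed the square of the primary loading shown. A sketch with hypothetical loadings and factor correlations:

```python
import numpy as np

# Hypothetical pattern loadings for one item on three correlated factors.
p = np.array([0.75, 0.20, 0.10])

# Illustrative factor correlation matrix (values near those reported).
phi = np.array([
    [1.00, 0.34, 0.50],
    [0.34, 1.00, 0.50],
    [0.50, 0.50, 1.00],
])

# Communality under the common factor model: h2 = p' Phi p.
h2 = p @ phi @ p
print(float(h2))  # 0.8095, versus 0.75**2 = 0.5625 for the primary loading alone
```

The modest cross-loadings and factor correlations lift the communality well above the squared primary loading, as seen for several items in the tables.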
DISCUSSION
In this AC study scores on behavioral checklists explained most of the variance in
OARs that were supposedly based on trait inferences (adjusted R2 = .90). Taken
with the exercise-based structure of ratings suggested in Tables 1, 2, and 3, the results of this study suggest that OARs may have held predictive validity all along
because they reflect cross-situationally specific performance. If OARs actually reflect situationally specific performance, then drawing trait inferences in AC exercises may be pointless and misleading. In the overview of this article, it was explained that exercise effects have traditionally been viewed as error, confounding
the supposed true variance among the measured dimensions. However, two decades of AC development have not eliminated presumed method variance from
this technique. Exercise-based variance was recently shown to underpin ratings
across a wide spectrum of ACs (Lance et al., 2004). Nevertheless, OARs derived
from ACs have repeatedly been shown to be predictive of future work performance (Arthur et al., 2003). It seems paradoxical (at least) that such an apparently error-ridden instrument should yield any sort of predictive utility.
Defining or Redefining Assessment Centers
One solution to this puzzle may be that exercise effects are not error as conventionally believed but are simply indicative of situationally driven behavioral performance.
Given the research into situational effects on human behavior (Mischel, 1984), this
supposed measurement error may, from an alternative perspective, contribute to the
predictive qualities of OARs. To test this hypothesis, we analyzed the relationship
between situationally specific behavioral performance, as measured by behavioral
checklists, and traditional OARs, based on trait inferences. Note that behavioral
checklists represented within-exercise behavioral performance on separate checklists, whereas, in line with the traditional approach to ACs, OARs were intended
to summarize trait levels across exercises. The behavioral and trait information was
set up to represent distinct steps in the rating process (see Figure 1). In Step 1, performance was observed and recorded by exercise. In Step 2, and on a separate rating instrument, traits were inferred from behavioral observations. In turn, OARs were developed by discussing trait-based dimensions. Thus, under strict trait theory, the
inferences drawn at Steps 2 and 3 represent a search for cross-situationally stable behavior. In this study, the exercises were quite similar in format in an effort to minimize method variance, yet heterotrait–monomethod (HTMM) correlations were still stronger than monotrait–heteromethod (MTHM) correlations in the dimension assessment. In the traditional approach to ACs, Step 1, the
behavioral information, is secondary to the trait inferences. We argue that this should
not be the case as OARs may only ever reflect behavioral responses anyway. As such,
Steps 2 and 3 in Figure 1 may be redundant.
The finding that supposedly trait-based OARs may actually be based on
cross-situationally specific performance raises two questions. First, can measurements of behavior lead to accurate predictors of future work performance, and second, what value is added by making conventional trait inferences based on these
behaviors?
The first of these questions is easiest to answer. As previously discussed,
work-sample testing has been repeatedly shown to be highly predictive of future
work performance (Schmidt & Hunter, 1998) through consistency of behavioral
responses in similar situations (Wernimont & Campbell, 1968). If ACs are purposely designed as collections of work samples, then they may well share the high
predictive validity associated with work samples. The second question is harder to
answer. Conventionally, the measurement of dimensions is seen as more commercially desirable than the measurement of situation-specific behavior because it,
supposedly, allows behavioral generalization (see Cook's, 2004, description of the
specificity of work samples). If, as the findings of this study suggest, ACs are collections of situationally specific exercises, then their generality has been estab-
ples) and not performance dimensions should be used as the basis for assessor
scores (p. 54). The results of this study suggest that average checklist scores
(OERs) are so closely related to OARs anyway (r = .95, p < .001) that trait-based
inferences drawn from exercise performance may be redundant.
A Step Toward Solving an Old Puzzle
The exercise effect has possibly been one of the most enduring empirical findings
throughout the AC literature, dating back to when it was first identified in Sackett
and Dreher's (1982) study. However, the issue has never reached an acceptable
conclusion, despite numerous attempts to resolve it and to encourage ACs and assessors to rate people on trait-based dimensions. Although Lievens and Conway's
(2001) review provided initial hope for dimension-specific ACs, Lance et al.'s
(2004) reanalysis of their data reaffirmed the prominence of exercise effects.
The perplexing question surrounding the AC measurement debate has centered
on why ACs are predictive of important work-related criteria. If evidence suggests
that ACs do not measure dimensions, then what is it about the process that makes it
useful for making employment decisions? OARs are the most commonly used
scores from ACs for the purpose of investigating predictive validity (Arthur et al.,
2003; Gaugler et al., 1987; Schmidt & Hunter, 1998). This study makes a significant contribution to the literature by informing on this apparent paradox. It was
found, in this study, that OARs most likely reflect situationally specific behavioral
performance rather than the commonly intended trait measures. The implication
here is that traditional OARs have been found to be predictive in the past because
they are likely to reflect behavioral performance on specific exercises.
Implications for Practice and Conclusions
The results of this study suggest that task-specific ACs (defined by Step 1 in Figure
1) returned a conceptually acceptable factor solution (see Table 1). The reliability
of ratings was high, with coefficients alpha in excess of .90 for each behavioral
checklist. Fewer inferences were drawn in the task-specific step of the assessment,
thus lowering the potential for error and saving time. For human resource professionals, the task-specific design may present an effective alternative to more traditional AC architecture.
The results of this study provide new directions, not only for research but also
for practice. For any company wishing to use ACs, the focus may be on using a
technique that is able to predict work outcomes. Although we have known for
some time that ACs are useful predictors of such outcomes, the reasons for this
have remained unclear. The suggestion of this study is that the usual predictive properties of the traditional OAR are likely to be retained if a summary is derived of behavioral performance on exercises, as opposed to dimensional performance. A
construct-valid direction for the development of overall ratings could be to average
behavioral performance across exercises and use OERs instead.
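Such an overall exercise rating (OER) amounts to a simple average of behavioral checklist scores across exercises. A minimal sketch with hypothetical checklist values:

```python
# Hypothetical sketch of deriving an overall exercise rating (OER) by
# averaging mean behavioral checklist scores across exercises.

def overall_exercise_rating(exercise_checklist_scores):
    """Average each exercise's mean checklist score into a single OER."""
    per_exercise = [sum(items) / len(items) for items in exercise_checklist_scores]
    return sum(per_exercise) / len(per_exercise)

# One applicant's checklist item ratings (1-6 scale) for three exercises
# (illustrative values; the study's checklists had 10 items per exercise).
scores = [
    [5, 4, 5, 4],  # Exercise 1
    [4, 4, 3, 5],  # Exercise 2
    [5, 5, 4, 4],  # Exercise 3
]
print(round(overall_exercise_rating(scores), 2))  # 4.33
```

No trait inference is made at any point: each number summarizes observed behavior within one exercise, and the OER simply pools those summaries.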
Organizations could design ACs in such a way that each exercise is scored individually with behavioral checklists that are specific to particular exercises. The traditional approach is to score trait-like variables across exercises, which has been
shown, more often than not, to return construct-invalid results. With a task-specific framework, human resource practitioners could expect construct-valid ratings coupled
with the usual predictive properties associated with ACs. Such an approach would
carry fewer inferences and would therefore be less complex, less time consuming,
and easier to train assessors on. This study also highlights the importance of the exercises included in ACs. Despite the similarity in format of each of the exercises used in
this study, clear exercise effects resulted. Given the link between behaviors and situations (exercises) in ACs, it is important that exercises bear job relevance.
The analyses reported in this article provide evidence for the idea, as suggested
previously by Lowry (1997), that behavior is inextricably linked to exercises in
ACs. For this reason, a shift from a dimension-specific to a task-specific approach
may lead to AC practices that are justifiable both conceptually and legally (Lowry,
1996; Norton, 1977). Task-specific ACs present a greenfield in terms of potential
research. As Lance et al. (2004) and Jackson et al. (2005) commented, more information is needed on a host of issues surrounding their use. The results of this study,
in line with others, add further suggestion that dimension labels should be dropped
in ACs. If OARs actually reflect cross-situationally specific performance, then the
use of dimensions as stable, trait-based measures is potentially misleading, inaccurate, and redundant. This issue creates an ethical dilemma, particularly in developmental ACs, where inaccurate trait-based measurement and feedback may have a
detrimental effect on participants. Assessment on the basis of exercise-specific behavioral output may be the key to resolving such issues.
REFERENCES
Ahmed, Y., Payne, T., & Whiddett, S. (1997). A process for assessment exercise design: A model of best practice. International Journal of Selection and Assessment, 5, 62–68.
Arthur, W., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity of assessment center dimensions. Personnel Psychology, 56, 125–154.
Ballantyne, I., & Povah, N. (2004). Assessment and development centres (2nd ed.). Hampshire, UK: Gower.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6, 205–212.
Cook, M. (2004). Personnel selection: Adding value through people (4th ed.). Chichester, UK: Wiley.
Gatewood, R. D., & Feild, H. S. (2001). Human resource selection (5th ed.). Fort Worth, TX: Harcourt College.
Gaugler, B., Rosenthal, D., Thornton, G., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493–511.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis (5th ed.). Upper Saddle River, NJ: Prentice-Hall.
Harris, M. M., Becker, A. S., & Smith, D. E. (1993). Does the assessment center scoring method affect the cross-situational consistency of ratings? Journal of Applied Psychology, 78, 675–678.
International Task Force on Assessment Center Guidelines. (2000). Guidelines and ethical considerations for assessment center operations. Public Personnel Management, 29, 315–331.
Jackson, D. J. R., Stillman, J. A., & Atkins, S. G. (2005). Rating tasks versus dimensions in assessment centers: A psychometric comparison. Human Performance, 18, 213–241.
Joyce, L. W., Thayer, P. W., & Pond, S. B. (1994). Managerial functions: An alternative to traditional assessment center dimensions. Personnel Psychology, 47, 109–121.
Lance, C. E., Lambert, T. A., Gewin, A. G., Lievens, F., & Conway, J. M. (2004). Revised estimates of dimension and exercise variance components in assessment center postexercise dimension ratings. Journal of Applied Psychology, 89, 377–385.
Lance, C. E., Newbolt, W. H., Gatewood, R. D., Foster, M. R., French, N. R., & Smith, D. (2000). Assessment center exercise factors represent cross-situational specificity, not method bias. Human Performance, 13, 323–353.
Lance, C. E., Noble, C. L., & Scullen, S. E. (2002). A critique of the correlated trait–correlated method and correlated uniqueness models for multitrait–multimethod data. Psychological Methods, 7, 228–244.
Lievens, F. (1998). Factors which improve the construct validity of assessment centers: A review. International Journal of Selection and Assessment, 6, 141–152.
Lievens, F., & Conway, J. M. (2001). Dimension and exercise variance in assessment center scores: A large-scale evaluation of multitrait–multimethod studies. Journal of Applied Psychology, 86, 1202–1222.
Lievens, F., & Klimoski, R. J. (2001). Understanding the assessment center process: Where are we now? In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 16, pp. 245–286). Chichester, UK: Wiley.
Lopez, F. M., Kesselman, G. A., & Lopez, F. E. (1981). An empirical test of a trait-oriented job analysis technique. Personnel Psychology, 34, 479–502.
Lowry, P. E. (1995). The assessment center process: Assessing leadership in the public sector. Public Personnel Management, 24, 443–450.
Lowry, P. E. (1996). A survey of the assessment center process in the public sector. Public Personnel Management, 25, 307–321.
Lowry, P. E. (1997). The assessment center process: New directions. Journal of Social Behavior and Personality, 12, 53–62.
Mischel, W. (1984). Convergences and challenges in the search for consistency. American Psychologist, 39, 351–364.
Norton, S. D. (1977). The empirical and content validity of assessment centers vs. traditional methods for predicting managerial success. Academy of Management Review, 2, 442–445.
Pynes, J., & Bernardin, H. J. (1992). Mechanical vs. consensus-derived assessment center ratings: A comparison of job performance validities. Public Personnel Management, 21, 17–28.
Pynes, J., Bernardin, H. J., Benton, A. L., & McEvoy, G. M. (1988). Should assessment center dimension ratings be mechanically-derived? Journal of Business and Psychology, 2, 217–227.
Robertson, I. T., & Kandola, R. S. (1982). Work sample tests: Validity, adverse impact and applicant reaction. Journal of Occupational Psychology, 55, 171–183.
Russell, C. J., & Domm, D. R. (1995). Two field tests of an explanation of assessment centre validity. Journal of Occupational and Organizational Psychology, 68, 25–47.
Sackett, P. R., & Dreher, G. F. (1982). Constructs and assessment center dimensions: Some troubling empirical findings. Journal of Applied Psychology, 67, 401–410.
Schleicher, D. J., Day, D. V., Mayes, B. T., & Riggio, R. E. (2002). A new frame for frame-of-reference training: Enhancing the construct validity of assessment centers. Journal of Applied Psychology, 87, 735–746.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Silverman, W. H., Dalessio, A., Woods, S. B., & Johnson, R. L. (1986). Influence of assessment center methods on assessors' ratings. Personnel Psychology, 39, 565–578.
Spychalski, A. C., Quiñones, M. A., Gaugler, B. B., & Pohley, J. (1997). A survey of assessment center practices in organizations in the United States. Personnel Psychology, 50, 71–90.
Thornton, G. C., III. (1992). Assessment centers in human resource management. New York: Addison-Wesley.
Wernimont, P., & Campbell, J. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52, 372–376.
Woehr, D. J., & Arthur, W. (2003). The construct-related validity of assessment center ratings: A review and meta-analysis of the role of methodological factors. Journal of Management, 29, 231–258.
Woodruffe, C. (1993). Assessment centres: Identifying and developing competence (2nd ed.). London: Institute of Personnel Development.