
APPLIED PSYCHOLOGY: AN INTERNATIONAL REVIEW, 2003, 52 (4), 515–532


Attitudes Towards Personnel Selection Methods: A Partial Replication and Extension in a German Sample
Bernd Marcus*
Chemnitz University of Technology, Germany

Cette recherche qui fait appel à un échantillon de 213 étudiants allemands porte sur les attitudes envers un ensemble de méthodes utilisées dans la sélection professionnelle. Son but premier était d'apporter un nouvel éclairage sur les différences culturelles qui marquent les réactions des candidats devant les techniques de sélection en reconstituant partiellement une étude de Steiner & Gilliland (1996) qui recueillirent des évaluations de l'acceptation du processus pour dix procédures différentes auprès d'étudiants français et américains. Des divergences significatives sont apparues au niveau des moyennes, mais aucune structure sous-jacente ne put rendre compte de ces différences. En général, les sujets des trois nations ont noté les plus favorablement les méthodes répandues (l'entretien et le C.V.), ainsi que les procédures en rapport évident avec le travail (les tests d'échantillon de travail), puis les tests papier-crayon, tandis que les contacts personnels et la graphologie étaient négativement appréciés. Autre objectif important: éprouver la validité des courtes descriptions des instruments de sélection généralement utilisées dans les études comparatives portant sur ce thème. On a évalué deux fois les attitudes envers quatre types de tests imprimés, une première fois après la présentation de la description et une seconde fois à l'issue de la passation du test. La convergence prétest–posttest, de basse à moyenne, met en évidence de sérieux problèmes en ce qui concerne ces descriptions des tests papier-crayon. On aborde aussi les leçons à en tirer quant aux jugements sur les pratiques de sélection du point de vue des candidats et pour les recherches à venir.

* Address for correspondence: Bernd Marcus, Department of Psychology, Chemnitz University of Technology, Wilhelm-Raabe-Str. 43, D-09107 Chemnitz, Germany. Email: bernd.marcus@phil.tu-chemnitz.de. Formerly at the University of Tübingen, Germany.
This research was supported by a grant from the Deutsche Forschungsgemeinschaft (SCHU 422/9-2; granted to Heinz Schuler) and by a doctoral scholarship granted by the Bundesland Baden-Württemberg. I am grateful to Dick Weissert and Stefan Höft for many helpful comments on an earlier draft of this paper, and to Dirk Steiner and Stephen Gilliland for their permission to reprint parts of their original results. I also wish to thank the editor and the anonymous reviewers for many insightful suggestions.
© International Association for Applied Psychology, 2003. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.


This research examined attitudes towards a variety of personnel selection methods in a German student sample (N = 213). Its first objective was to shed further light on cultural differences in applicant reactions to selection techniques by partially replicating a study by Steiner and Gilliland (1996), who obtained ratings of process favorability for ten different procedures from two groups of French and American students. Results indicated a number of significant mean discrepancies, but no systematic pattern appeared to underlie these differences. In general, subjects in all three nations rated widespread methods (e.g. interview, résumés) or obviously job-related procedures (work sample tests) most favorably, followed by paper-and-pencil tests, whereas personal contacts and graphology appeared in the negative range. A second major objective was to examine the validity of the brief descriptions of selection instruments often used in comparative studies on this topic. Attitudes towards four different types of written tests were assessed twice for this purpose, once after presenting descriptive information, and a second time after actual test administration. Low to moderate pretest–posttest convergence pointed to serious problems with these descriptions for paper-and-pencil tests. Implications for current evaluations of selection practices from the applicant's perspective and for future research are discussed.

INTRODUCTION
Applicant reactions to various selection procedures have received considerable attention in I/O psychology in recent years. One reason for this emerging trend is that the first personal contact between an employer and a prospective employee is usually established through the selection process, which might affect an applicant's attitudes towards the organisation and influence his or her decision to accept a job offer (Rynes, 1993). Another cause is the growing interest in the applicant's perspective on selection situations relative to that of the employer, a perspective Schuler (1993; Schuler & Stehle, 1983) labeled social validity in contrast to the more organisation-centered criterion-related validity of selection devices. According to Schuler and Stehle, applicants evaluate the selection process and the instruments applied therein on the basis of four distinguishable aspects: (1) how informative they are with respect to job requirements, (2) the degree to which it is possible to participate in the selection process and control its outcomes, (3) how transparent the methods are, and (4) whether acceptable feedback is provided. A more recent yet highly influential contribution is Gilliland's (1993) application of justice theory to the selection process, in which he distinguished between the dimensions of procedural and distributive justice to develop a formal model of the antecedents, rules, and outcomes of applicants' perceptions of the fairness of selection systems.

As a consequence of these and other developments, dozens of empirical studies were conducted in the past decade to investigate the favorability of attitudes towards specific instruments, compare their relative acceptability,


and explore the bases of these evaluations (e.g. Gilliland, 1994; Harland, Rauzi, & Biasotto, 1995; Kravitz, Stinson, & Chavez, 1996; Macan, Avedon, Paese, & Smith, 1994; Ryan, Greguras, & Ployhart, 1996; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993; Whitney, Diaz, Mineghino, & Powers, 1999, to quote only a few). Among the more generalisable results from these studies are the finding that certain types of selection procedures (e.g. interviews, work sample tests) are viewed most favorably while others (e.g. graphology, polygraphs) are almost uniformly rejected, and the observation that theoretically different facets of fairness are often highly intercorrelated empirically, with face validity or perceived job-relatedness playing a particularly crucial role in overall evaluations.

The purpose of the present study is twofold. First, it is intended to add another piece of evidence on the relative fairness of selection procedures, as perceived by test takers, with the emphasis on cultural differences. Cross-national surveys (e.g. Lévy-Leboyer, 1994; Ryan, McFarland, Baron, & Page, 1999; Schuler, Frier, & Kauffmann, 1993; Shackleton & Newell, 1994) consistently demonstrated that the extensiveness of method use differs substantially across nations. For example, written ability and personality tests are much more extensively employed in North America than in Germany (e.g. Schuler et al., 1993, found that intelligence tests are almost exclusively used for selecting apprentices in Germany, and that usage rates of personality tests are 10 per cent or less for all job categories). Graphology as a selection device seems to be extensively used in France and the French-speaking part of Belgium (Lévy-Leboyer, 1994; Schuler et al., 1993; Shackleton & Newell, 1994), and is also employed by many Spanish companies (Schuler et al., 1993), but is very rarely used in other nations (see the above-cited sources).
Such differences may affect favorability, for example via mere exposure effects (Zajonc, 1968). Findings from previous investigations of test takers' attitudes, mostly conducted with US samples, may therefore not generalise to other countries. The only study to date that directly compared test takers' attitudes from two different nations was that by Steiner and Gilliland (1996), who had two groups of French and American students rate ten different selection methods. In the present research, one part of their study, examining relative process favorability, is replicated with a German sample, using the same measures and procedures as Steiner and Gilliland to provide directly comparable results.

The second purpose of this paper is to highlight some methodological problems with current research on test takers' attitudes. Surveys in general can only be reliable when participants are familiar with the object of attitude, that is, when they know what they are talking about. Because laypeople often have very limited knowledge of many selection procedures, this suggests the importance of administering the tests in question first and then assessing favorability. On the other hand, it is most informative to compare ratings for a wide variety of selection methods collected from


the same sample. There is an obvious tradeoff between these two goals, since it is usually not possible in a cost-effective way to administer a large number of instruments in one study. As a consequence, most comparative studies on this topic (e.g. Fruhner, Schuler, Funke, & Moser, 1991; Kravitz et al., 1996; Smither et al., 1993), including that by Steiner and Gilliland, had to rely on introducing the various procedures by brief descriptions and, in part, on controlling for differential familiarity by asking about prior experience. It is not self-evident that this procedure can serve as a sufficient substitute for real test applications. If the descriptions were valid, they should lead to essentially the same results, both at the group and the individual level, as actually administering the tests in question. Put differently, a group of laypeople who evaluate a number of tests on the basis of brief descriptions should provide ratings highly similar to those given on the basis of actual experience with the same kinds of tests. If this were true, it would be expected that (a) the group means of favorability ratings for a single category of selection procedures do not differ substantially between the two modes of presentation (group level), and (b) high correlations between ratings of merely described tests and actually experienced tests are obtained (individual level). Two recent studies (Bauer, Maertz, Dolen, & Campion, 1998; Chan, Schmitt, Sacco, & DeShon, 1998) addressed this issue by examining test attitudes before and after actually taking cognitive ability measures; Chan et al. also administered a personality test. In neither investigation did group means change notably after test administration. Despite some differences in attitude measurement, however, both studies indicated only moderate pre–posttest stability of reactions for the intelligence tests (correlations ranged from .34 to .60), whereas Chan et al. found somewhat higher values (.60 to .66) for their personality test.
While these findings suggest that actual experience with a test changes attitudes to a limited yet not negligible extent, it is noteworthy that in neither study were procedures introduced by the descriptions typically used for multi-method comparisons. The highly diverse content of the latter type of investigation precludes use of general test-fairness perceptions (as in Bauer et al.) or sample items (as in Chan et al.). The present study addresses the question whether such descriptions are valid proxies for real experience more directly, by providing subjects with the cues used by Steiner and Gilliland before attitude measurement and test administration. It is also more comprehensive in that four different types of selection instruments are assessed thereafter: a cognitive ability measure, a general personality inventory, an integrity test, and a biographical questionnaire. In addition, the research design permits examination of the extent to which test scores and attitudes are related for different instruments (cf. Bauer et al., 1998; Chan et al., 1998; Jones & Joy, 1991; Ryan & Sackett, 1987; Whitney et al., 1999, with mixed results).
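The two evaluation criteria outlined above, stable group means across presentation modes (a) and high individual-level convergence (b), can be sketched as follows. This is an illustrative computation with simulated rating vectors, not the study's data; all variable names are my own:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical favorability ratings (1-7 scale) for one test category,
# given by the same respondents after reading a brief description (pre)
# and again after actually taking the test (post).
pre = rng.integers(1, 8, size=200).astype(float)
post = np.clip(pre + np.round(rng.normal(0.0, 1.5, size=200)), 1, 7)

# (a) Group level: do mean ratings differ between the two modes?
t, p = stats.ttest_rel(pre, post)

# (b) Individual level: do respondents keep their relative positions?
r, _ = stats.pearsonr(pre, post)

print(f"paired t = {t:.2f} (p = {p:.3f}), pretest-posttest r = {r:.2f}")
```

A valid description would thus show a non-significant paired t alongside a high correlation; a significant t or a low r each flags a different kind of divergence between description and experience.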


METHOD

Sample
Two hundred and thirteen undergraduate students from a German university, majoring in diverse subjects (e.g. economics, business and administration, biology and other natural sciences, agriculture), participated in this study. In all, 90.4 per cent of the sample indicated having at least one month of prior job experience, and 54.8 per cent had an employment record of more than one year. Of the participants, 89 (41.8%) were female. Mean age was 23.7 years, with a standard deviation of 2.9 and a range from 20 to 41. The corresponding figures from the Steiner and Gilliland study are 68 per cent job experience, 83 per cent women, and a mean age of 20.4 years for the French sample, and 99 per cent job experience, 75 per cent female participants, and 20.4 years mean age for the US sample, respectively. In contrast to the German sample, almost all of Steiner and Gilliland's French subjects and one-third of their American participants were majoring in psychology. At least with respect to age and gender, the demographic composition of the present sample differs from both of Steiner and Gilliland's groups more than the latter two samples differ from one another. Whether these differences translate into divergent ratings will be addressed in the Results section.

Procedure
Subjects were invited via campus advertising to participate in a psychological research project. A reward of DM 30 in cash was offered in order to reduce any bias due to voluntary participation (see Ullman & Newcomb, 1998, for evidence that a monetary incentive effectively enhances the willingness of otherwise reluctant subgroups to participate in time-consuming research). This is approximately the wage the university pays to student employees for two working hours. Administration of all materials took about 90 to 120 minutes for each participant. The study was conducted in 20 group sessions of 10 to 12 subjects each, under the supervision of either a male or a female test proctor. At the beginning of each session, participants were instructed to read the brief descriptions of ten selection procedures and then provide their initial ratings. They next took a battery of eight psychological tests, including those described in the Measures section (three specific personality tests are omitted from the following analyses, as well as one social desirability scale for which no ratings were collected). The sequence of administration was counterbalanced for all tests, except for the intelligence measure, which was always presented first because it was the only test with a fixed time limit. Immediately after test administration, examinees rated the instruments they had just completed


on the same form used to measure pretest reactions and then received their rewards in a separate room.

Measures
Attitudes and Test Descriptions. Descriptions of the ten selection procedures and items on process favorability were adapted from Steiner and Gilliland (1996). Method descriptions were in part slightly revised in order to make them more compatible with German selection practices, thereby leaving content unchanged as far as possible. The most notable example is the personal references, where the American practice, as expressed in the Steiner and Gilliland description (". . . you must request letters of reference or provide the names of your prior employers so that the employer can obtain information about your suitability on the job", p. 136), contrasts sharply with a much more formalised process in Germany (present description: "references provided by your prior employers in which they evaluate your behavior and performance in the prior job"; note that employers in Germany have a legal obligation to write such references, and their content and wording is highly formalised). Another change occurred for the honesty or integrity test, which Steiner and Gilliland had introduced as a typical exemplar of the overt category of these instruments ("Tests that ask you about your thoughts on theft and experiences related to your personal honesty", p. 136), while, in the present study, both an overt and a personality-based test are presented with items in a mixed order (see Sackett, Burris, & Callahan, 1989, for this distinction). The present description was more neutral with respect to this categorisation, introducing the method as a specific written personality test which "asks you questions related to your trustworthiness, reliability, and honesty" (the exact wording of all descriptions is available upon request). For assessing process favorability, the two items of Steiner and Gilliland were translated rather literally by the present author.
The items examine, on a 7-point Likert-type scale, the perceived predictive validity of the method ("How would you rate the effectiveness of this method for identifying qualified people for a job you were likely to apply for after graduation?") and an evaluation of test fairness ("If you did not get the job based on this selection method, what would you think of the fairness of this procedure?"). All translations were checked by an independent bilingual reviewer.

Tests Administered. A German version of the Wonderlic Personnel Test (Wonderlic, 1996), a brief measure of g widely used for personnel selection in the US, represents cognitive ability tests in the present study. As a general personality inventory, the 240-item NEO-PI-R (Costa & McCrae, 1992; German version by Angleitner & Ostendorf, 1993) was used, a comprehensive


measure of the non-cognitive traits comprising the five-factor model of personality. As already mentioned, the integrity test covers both the overt (58 items) and the personality-based (53 items) type, which were presented jointly. It was newly constructed because no German-language integrity test existed at the time of data collection, but it had already gone successfully through a series of construct and criterion-related validation studies (Marcus, 2000; Marcus, Höft, Riediger, & Schuler, 2000; Marcus, Schuler, Quell, & Hümpfner, 2002). The exemplar of biographical questionnaires, however, is somewhat less prototypical for its entire class of selection procedures. It is a newly developed (Marcus, in press) 67-item measure tapping the specific construct of self-control, as defined by Gottfredson and Hirschi (1990). It is therefore not a typical biodata scale as used for personnel selection, but it relies exclusively on reports of overt behavior in the past and thus meets the definition of a biographical questionnaire as specified by Mael (1991). Results for this instrument should nevertheless be interpreted with caution.
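For two-item scales such as the favorability measures described above, internal consistency is conventionally summarised with Cronbach's alpha, which for two items is equivalent to the Spearman-Brown-stepped-up inter-item correlation (when item variances are equal). A minimal sketch with simulated 7-point ratings; the data and names are illustrative, not the study's:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(42)
# Simulated ratings of one selection method on the two items
# (perceived predictive validity, perceived fairness), sharing a
# common attitude component.
latent = rng.normal(0.0, 1.0, 213)
validity = np.clip(np.round(4 + latent + rng.normal(0, 1, 213)), 1, 7)
fairness = np.clip(np.round(4 + latent + rng.normal(0, 1, 213)), 1, 7)

alpha = cronbach_alpha(np.column_stack([validity, fairness]))
# For two items, alpha ~ 2r / (1 + r), with r the inter-item correlation.
r = np.corrcoef(validity, fairness)[0, 1]
print(f"alpha = {alpha:.2f}, Spearman-Brown from r = {2 * r / (1 + r):.2f}")
```

The near-identity of the two figures illustrates why, with only two items per method, alpha is essentially a rescaled inter-item correlation.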

RESULTS
Means and standard deviations for favorability ratings from Steiner and Gilliland and from the present study, assessed before and after test administration in the latter case, are presented in Table 1. Significant differences between methods within one sample, as well as those between different samples for the same selection device, are also indicated in the table. ANOVAs indicated a significant main effect for selection method (within-subject factor; SS = 2692.16, df = 9, MS = 299.13, F = 4.00, p < .01), as found by Steiner and Gilliland, and a significant interaction with gender (SS = 47.75, df = 9, MS = 5.31, F = 4.00, p < .01), whereby women rated intelligence tests (d = .54) and personal contacts (d = .40) less favorably than men but slightly preferred interviews (d = .27) and résumés (d = .29). If the smaller proportion of women in the present sample compared to Steiner and Gilliland's study had translated into mean differences for the entire samples (i.e. had these been due to a gender effect rather than an effect for country), precisely the opposite pattern of single contrasts to those indicated in Table 1 would have been expected for the four respective procedures. The main effect for gender was non-significant (between-subjects factor; SS = 7.21, df = 1, MS = 7.21, F = 3.31, p > .05).

The single contrasts shown in Table 1 indicate that the present subjects in general evaluated interviews and work sample tests most favorably, followed by references and résumés. Paper-and-pencil tests appear in the neutral range, with the intelligence test most and the biodata inventory least preferred, especially after test taking. In general, however, test application had only minor influence on mean ratings (none of the t-tests on pre–posttest


TABLE 1
Mean Process Favorability Ratings from Steiner and Gilliland (1996) and the Present Study (Standard Deviations in Parentheses)

                                    Steiner & Gilliland (1996)                    Present Study
Selection Method                    USA              France                  Pretest                Posttest

1.  Résumés                         5.37a (1.19)  >  4.54b (1.19)      U>     4.85b (1.13)          –
2.  Written ability test            4.50b (1.25)     4.21b,c (1.36)    U>     4.10c (1.17)          4.30a (1.43)
3.  Work-sample test                5.26a (1.49)     5.26a (1.19)             5.34a (1.20)          –
4.  Personal contacts               3.29c (1.64)     2.92d,e (1.67)    U>     2.62f (1.51)          –
5.  (General) personality test      3.50c (1.30)  <  3.96c (1.35)      U<     4.18c (1.09)          3.96b (1.20)
6.  Honesty (or integrity) test     3.41c (1.62)  >  2.54e (1.24)      F<     3.64d (1.28)          3.83b (1.19)
7.  Personal references             4.38b (1.30)     4.12b,c (1.10)    U< F<  4.91b (1.21)          –
8.  Interview                       5.39a (1.26)  >  4.56b (1.19)      F<     5.67a (.99)           –
9.  Graphology                      1.95d (1.18)  <  3.23d (1.62)      F>     1.90g (1.10)          –
10. Biographical questionnaire      4.59b (1.31)  >  3.91c (1.31)      U> F>  3.20e (1.27)          3.02c (1.15)

Note: Same subscripts within columns indicate that means are not significantly different at p < .05 (Tukey post-hoc tests). Greater-than or less-than signs stand for significant differences (t-tests; p < .01) between adjacent columns (same study), or between German pretest reactions (third column) and the US (U, first column) or French (F, second column) sample. Ns are 142 for the American sample, 117 for the French sample, and 209 to 213 for the present study. Data for the US and French samples from Table 3, p. 137, in "Fairness reactions to personnel selection techniques in France and the United States" by D.D. Steiner and S.W. Gilliland, 1996, Journal of Applied Psychology, 81, 134–141. Copyright 1996 by the American Psychological Association. Adapted with permission.

differences was significant). Personal contacts and, in particular, graphology received clearly negative reactions on average. This tendency to prefer face-valid and commonplace procedures over written tests, and to reject nontransparent methods like graphology, generally confirms findings from all the comparative studies cited above. In contrast to this rough correspondence in the general pattern, a more detailed inspection of Table 1 reveals a large number of significant differences between the present sample and both the French and the American participants of the Steiner and Gilliland study for single tests. However, no


systematic tendency appears to underlie these differences. The most marked distance between the US and the German sample occurred for a paper-and-pencil test (biodata), but a null finding (integrity) and the opposite direction (intelligence) may be found within the same category. Relative to the French examinees, the German sample showed the largest negative difference for graphology, an expected result given the widespread use of this procedure in France, but the largest positive contrasts occurred for interviews, a standard procedure in both countries, and honesty tests, which are virtually nonexistent in either nation's selection practices. Thus, evidence for systematic cultural differences is not readily revealed from the present data. This conclusion is even strengthened if one concentrates only on the differences between the rank orders of procedures across countries, as proposed by one anonymous reviewer. Here, the only remarkable differences appear to remain for integrity tests between Germany and France, and for biodata and references between Germany and the USA. To the best of my knowledge, the current literature provides no firm basis to explain these particular discrepancies, so that it would be mere speculation to interpret them as more than unsystematic deviations from a general pattern of similarity in rankings.

Table 2 presents the intercorrelations and, in parentheses, internal consistencies (Cronbach's α) for all study variables. (The focus of the present study is mainly on the convergent relationships given in bold in the table. But it has to be mentioned that a principal components analysis on the ten two-item pretest ratings indicated no evidence for a general favorability factor across all instruments (four components extracted; eigenvalues: 1.9, 1.6, 1.2, 1.0). After varimax rotation, the first two factors were interpretable as comprising non-cognitive tests and standard procedures (résumé, references, interview), respectively, while the last two factors had substantial loadings only for specific instruments. Hence, it appears as if subjects in the present investigation sharply distinguished between most selection procedures. This certainly does not mean that they also distinguished between different aspects of evaluation.) As in almost any study on this topic before, where the aspects of face validity, fairness, transparency, etc. were often treated separately, the relatively high internal consistencies of the present two-item measures seem to point to an overall evaluative factor. This topic surely merits attention in future studies.

With respect to pretest–posttest correlations, it is striking that ratings for the same instruments were only modestly related between the two points of measurement. The correlations are even lower than those found by Bauer et al. and Chan et al., particularly for the personality tests examined here. This would point to the conclusion that the present examinees, despite the very modest differences in mean ratings (see Table 1), individually changed their minds in many cases after they had actually experienced a

TABLE 2
Intercorrelations and Internal Consistency Reliabilities of Study Variables

[Table 2 is an 18 x 18 correlation matrix (pretest favorability ratings for the ten methods of Table 1, variables 1 to 10; posttest favorability ratings for the WPT, NEO-PI-R, integrity test, and biographical questionnaire, variables 11 to 14; and the four test scores, variables 15 to 18, with internal consistencies in parentheses on the diagonal) that could not be legibly recovered from this scan.]

Note: Variables (1) through (10) are the same as in Table 1. Integr. test = mixed overt and personality-based integrity test; Biogr. questionn. = Retrospective Behavioral Self-Control Scale (RBS); Conscient. = NEO-PI-R Conscientiousness. Convergent correlations between procedures of the same kind (pretest–posttest attitudes, or test score–favorability relationships) are given in boldface. * = p < .05; ** = p < .01; N = 209 to 213.


test situation. The information provided by descriptions of two to three lines' length, as presented here and in other studies, does not seem to represent the objects of attitude sufficiently for a firm evaluation. One final research question of this investigation applied to the relationships between attitudes and test scores. Conscientiousness was chosen out of the five NEO domains because this factor has more generalisable relevance for job performance across job categories than the other four (Barrick & Mount, 1991) and was also investigated by Chan and colleagues (1998). In summary, the findings presented in Table 2 show that only performance on the cognitive ability measure was associated with ratings of the same instrument's validity and fairness, as assessed in the favorability scores, and even this correlation was not overly substantial. Thus, from the present data, it has to be concluded that test performance and evaluation are largely independent of each other.
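The score-attitude analysis summarised above amounts to correlating each test score with the posttest favorability rating of the same instrument. The following sketch reproduces the logic with simulated data (only the ability test is given a built-in score-attitude link); instrument names and effect sizes are illustrative, not taken from the study:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 213

# Simulated standardised test scores for the four instruments.
scores = {
    "ability":     rng.normal(0, 1, n),
    "personality": rng.normal(0, 1, n),
    "integrity":   rng.normal(0, 1, n),
    "biodata":     rng.normal(0, 1, n),
}
# Simulated 7-point posttest favorability ratings, independent of the
# scores except for the ability test, which gets a modest true link.
ratings = {k: np.clip(np.round(4 + rng.normal(0, 1, n)), 1, 7) for k in scores}
ratings["ability"] = np.clip(
    np.round(4 + 0.3 * scores["ability"] + rng.normal(0, 1, n)), 1, 7)

for name, s in scores.items():
    r, p = stats.pearsonr(s, ratings[name])
    print(f"{name:12s} r = {r:+.2f} (p = {p:.3f})")
```

Under this setup only the ability correlation should emerge reliably positive, mirroring the pattern of one modest convergent correlation amid otherwise near-zero score-attitude relationships.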

DISCUSSION
Two major goals were pursued in this study. The first was to enhance knowledge on cultural differences in applicant reactions to various selection procedures by replicating the only previous cross-national comparison, of France and the United States (Steiner & Gilliland, 1996), with a German sample. In agreement with that study, the present results in general suggest that these differences are less pronounced than might be expected from the great variation in method use. In all three countries, subjects evaluated interviews, work sample tests, and résumés most favorably, held a neutral attitude towards most written tests, and expressed reservations about personal contacts and graphology. Generalisability of this tripartite rank order of evaluations is further corroborated by findings from past research in which a somewhat different methodology was applied (Fruhner et al., 1991; Kravitz et al., 1996; Rynes & Connerley, 1993; Smither et al., 1993). Thus, when investigated with student samples and brief descriptive stimuli, mean

1 One anonymous reviewer, providing hypothetical data in support, raised the concern that low correlations may be found even when changes from t1 to t2 are only trivial. While this could certainly happen under extreme circumstances, the actual data of the present study do not support this argument. The absolute values of the difference scores for the four procedures evaluated twice have means between 1.09 and 1.23, a quite substantial effect size of about one standard deviation. Between 13 and 16 per cent of the sample gave identical ratings at both times of measurement, whereas between 21 and 26 per cent changed their initial ratings by two scale points or more on a 7-point scale. The most extreme values of absolute difference scores were between 4.5 and 5.5. Thus, there is evidence from both the correlational analysis and the examination of absolute difference scores that a non-trivial proportion of the sample evaluated these procedures substantially differently before and after test administration.
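The descriptive checks reported in this footnote (mean absolute change, per cent of identical ratings, per cent changing by two or more scale points, most extreme change) are straightforward to compute; a sketch with simulated integer ratings, whereas the study's ratings, as two-item averages, could also change in half-point steps:

```python
import numpy as np

def change_profile(pre, post):
    """Summarise individual pre/post rating changes on a Likert scale."""
    diff = np.abs(post - pre)
    return {
        "mean_abs_diff": float(diff.mean()),
        "pct_identical": float((diff == 0).mean() * 100),
        "pct_changed_2plus": float((diff >= 2).mean() * 100),
        "max_abs_diff": float(diff.max()),
    }

rng = np.random.default_rng(3)
pre = rng.integers(1, 8, 213).astype(float)
post = np.clip(pre + np.round(rng.normal(0.0, 1.2, 213)), 1, 7)

profile = change_profile(pre, post)
print(profile)
```

Reporting such a profile alongside the pretest-posttest correlation guards against the concern raised by the reviewer, since a low correlation accompanied by near-zero absolute differences would indeed be an artifact, while large absolute differences confirm genuine individual change.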


526

MARCUS

ratings of selection procedures appear to correspond roughly across different countries. Although, unlike Steiner and Gilliland, the present study did not explicitly address the bases for these evaluations, the robustness of ndings across comparative investigations permits us to draw some conclusions on features of selection procedures that applicants consider for their ratings. First, there appears to be little reason for being concerned with a justice dilemma (Cropanzano & Konovsky, 1995), the notion that higher social validity may come at the price of lower criterion-related validity and vice versa. Instead, the least preferred selection techniques are also characterised by a lack of scientic evidence for their ability to predict important aspects of job performance, and the most valid methods (e.g. ability tests, biodata, work samples, integrity tests; cf. Schmidt & Hunter, 1998) received quite variable ratings. This seems to indicate that social and criterion-related validity are largely unrelated rather than negatively related. Thus, there is no evidence for a dilemma; both important goals in personnel selection may well be achieved at the same time. Apart from the fact that there are some differences in test takers attitudes, and in their emphasis on various justice dimensions across countries and studies, the repeated ndings of a relatively robust rank order for selection methods as well as highly intercorrelated facets of fairness or justice (e.g. Gilliland, 1994; Kravitz et al., 1996; Macan et al., 1994; Ployhart & Ryan, 1997; Smither et al., 1993; Thorsteinson & Ryan, 1997) raise the question whether differences are overemphasised relative to commonalities in current research on applicant reactions. Certain selection procedures may be simply liked or disliked, as indicated by a potential overall evaluative factor. 
That is, a person may feel comfortable or not with one kind of test and then project this somewhat diffuse emotion onto rational judgments of fairness, job-relatedness, or invasiveness. The actual reasons for these evaluations may well differ across individuals, depending on their perceived strengths and weaknesses. More in-depth, perhaps qualitative, research methods could provide additional insights into the exact mechanisms underlying attitudinal ratings. With respect to the mean effects replicated herein, it appears as if some features of selection procedures are particularly valued, on average, in every culture investigated so far. Some of these are obvious. For instance, interviews, résumés, and references from prior employers are almost ubiquitous in personnel selection throughout Western culture and may therefore raise few objections (that is, they may be seen as a natural part of the selection process not to be called into question). Other methods do not possess this advantage and will therefore have to persuade by different means. For example, work sample tests by definition have an obvious relationship to the position one is applying for. As has been shown in prior research (e.g. Smither et al., 1993; Whitney et al., 1999), a more obviously job-related content of written tests may also improve evaluations of this kind of instrument. Graphology, on the other hand, may have a "logical appeal" (Steiner & Gilliland, 1996) but certainly provides no information with which to evaluate either actual test performance or job-relatedness. As Steiner and Gilliland pointed out, it is striking that this technique received negative evaluations even in France, where it is widely used in practice. More research is needed to investigate how different features of selection devices interact, or perhaps cancel each other out, in evaluations. Future studies may also examine the effects of combining several instruments into batteries (see Rosse, Miller, & Stecher, 1994, for a small-scale study on this topic), as is usually the case in real-world applications.

The second major objective of the present investigation was to examine how well real test experience is approximated by the brief descriptions often used in comparative studies on applicant reactions. While the mean ratings remained relatively stable after actual test administration, the findings of low to moderate correlations between pretest and posttest attitudes cast doubt on the validity of some conclusions drawn from such investigations. Past experience was also assessed prior to test administration, indicating that between 1.4 (integrity test) and 22.5 (ability test) per cent of the sample believed they had taken one of the paper-and-pencil procedures before. That is, the vast majority of subjects had their first personal experience with these kinds of tests during the present study.
The very limited convergence of pretest and posttest evaluations suggests that, for many people, the image of these instruments conveyed by brief descriptions changes considerably after actual experience, whereas the directions of these changes are almost equally distributed (additional analyses indicated that pretest-posttest differences [see footnote 1] are not substantially related to any of the personality traits measured in this study). Low bivariate correlations are usually taken as evidence that the two variables carry different meanings. If pretest ratings based on descriptions measure experience-based attitudes only to a limited extent, they would appear to measure something else. With respect to the paper-and-pencil tests examined here, two explanations for this finding seem plausible. First, given that many myths surrounding psychological testing (ranging from "complete nonsense" to "Big Brother is watching you") are discussed in Germany, and perhaps elsewhere, it seems likely that initial ratings were largely based on prejudice (note that no students majoring in psychology participated in this study). Second, in light of the result that all these tests were most often rated neutrally, it is possible that many subjects were simply unsure how to evaluate unfamiliar instruments and therefore provided an indifferent rating. Both explanations would shed an unfavorable light on the validity of short descriptions, a problem that would have been concealed by the apparent stability of mean ratings; the sample items used by Chan et al. may serve this purpose much better.

But how would one introduce non-standardised techniques by sample items? For instance, in the case of the interview, the generally favorable evaluation of this procedure as a category may change substantially, depending on how an actual interview is conducted. It is now agreed that applicants in general accept being interviewed, but it is not so clear that they would accept being asked invasive questions, or being confronted with aggressive or uninterested interviewers. Such topics surely merit attention in future research.2

Finally, except for the measure of intelligence, test scores were largely unrelated to attitudes in the present investigation. If this were a generalisable result, it would further corroborate the aforementioned notion that social and criterion-related validity are roughly orthogonal dimensions for the evaluation of these procedures. For human resources practitioners, this would mean that they could apply a decision rule for selecting a method in which both factors are weighted with respect to their relative importance and then simply combined algebraically. Utility analysts of selection procedures may incorporate this in their formulas.

This discussion would be incomplete if some shortcomings of the present study were not mentioned. As pointed out in many of the articles cited throughout this paper, especially those which employed applicant samples, laboratory settings with university students may lead to different conclusions than actual selection situations. However, provided one is interested in an honest evaluation of selection procedures, it is not self-evident that field settings really yield more reliable results. Actual applicants may be inclined to disguise their real opinion of whatever they experience during the selection process, particularly when this opinion is negative, in order not to offend a prospective employer.
A problem more specific to the replication part of this study arises from the fact that all three samples differ with respect to their demographic composition and prior job experience. Steiner and Gilliland's French and American participants were more similar in age and gender than the present subjects, whereas both the German and American students had considerably more job experience than the French subjects. Further, the three samples contained substantially different proportions of psychology majors. It is not clear from the data how this complex pattern of discrepancies may have affected the results across countries. However, given that in all three countries all participants were university students of predominantly young age, and that the majority had prior job experience, the similarities in sample composition may be seen as outweighing the differences. On the other hand, the more balanced gender composition of the present sample compared to Steiner and Gilliland's mostly female subjects is one possible source of variation, particularly for those selection methods for which gender differences were revealed (i.e. intelligence tests, personal contacts, interviews, and résumés). The present results indicate that the single contrasts reported in Table 1 are probably conservative estimates of the true differences for these procedures.

2 An anonymous reviewer pointed out that similar problems of generalising results for a category as a whole to exemplars of that category may also apply to more standardised selection procedures, such as tests or even items within one test. Although I believe that much more variation is to be expected within a category of non-standardised techniques, I subscribe to the notion that test categories should be more sharply distinguished from single tests or items. I would add that tests other than those chosen here might have received different evaluations, although this statement remains to be empirically examined.

A third potential drawback of the present investigation is that the administration of several personality tests in addition to those examined in this paper may have affected the ratings in an unknown way. However, at least one possible bias, namely that participants may have mixed up the numerous tests in recollection, was controlled for by obtaining ratings for two of the more specific scales. These ratings were both significantly lower (M = 3.40, SD = 1.12, and M = 3.34, SD = 1.20, respectively) and only moderately correlated (mean r = .36) with those of the personality inventory and the integrity test examined here, indicating that the subjects of this study were able to distinguish between the different tests.

The applicants' perspective on personnel selection procedures has long been overlooked by I/O psychologists, and a number of still under-researched topics remain, some of which are mentioned above. The present study was conducted to fill one of these gaps. However, the considerable overlap between different studies may eventually be summarised in one or several meta-analyses.
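Purely as an illustration of the compensatory decision rule sketched above (and not part of the original study), the following fragment weights criterion-related validity and social validity by their relative importance and combines them algebraically to rank selection methods. All method names, validity values, and weights below are hypothetical.

```python
# Sketch of a compensatory decision rule: each selection method gets a
# single score that is a weighted sum of its criterion-related validity
# and its social validity (applicant acceptance). All numbers here are
# hypothetical and serve only to illustrate the mechanics.

def rank_methods(methods, w_criterion=0.7, w_social=0.3):
    """Return (name, score) pairs sorted by the weighted sum, best first."""
    scored = {
        name: w_criterion * crit + w_social * social
        for name, (crit, social) in methods.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical inputs: (criterion-related validity, social validity),
# both rescaled to the 0-1 interval for comparability.
candidates = {
    "work sample":  (0.54, 0.80),
    "interview":    (0.40, 0.85),
    "ability test": (0.51, 0.55),
    "graphology":   (0.02, 0.30),
}

for name, score in rank_methods(candidates):
    print(f"{name}: {score:.2f}")
```

Because the rule is compensatory, a method weak on one dimension can still rank well if it is strong on the other; shifting the weights expresses how a practitioner trades off the two goals.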

REFERENCES
Angleitner, A., & Ostendorf, F. (1993). Deutsche Version des NEO-PI-R (Form S) [German version of the NEO-PI-R]. Unpublished test manuscript. Bielefeld, Germany: Author.
Barrick, M.R., & Mount, M.K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Bauer, T.N., Maertz, C.P., Dolen, M.R., & Campion, M.A. (1998). Longitudinal assessment of applicant reactions to employment testing and test outcome feedback. Journal of Applied Psychology, 83, 892–903.
Chan, D., Schmitt, N., Sacco, J.M., & DeShon, R.P. (1998). Understanding pretest and posttest reactions to cognitive ability and personality tests. Journal of Applied Psychology, 83, 471–485.
Costa, P.T., & McCrae, R.R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
Cropanzano, R., & Konovsky, M.A. (1995). Resolving the justice dilemma by improving the outcomes: The case of employee drug screening. Journal of Business and Psychology, 10, 221–243.
Fruhner, R., Schuler, H., Funke, U., & Moser, K. (1991). Einige Determinanten der Bewertung von Personalauswahlverfahren [Some determinants of the evaluation of personnel selection methods]. Zeitschrift für Arbeits- und Organisationspsychologie, 37, 119–178.
Gilliland, S.W. (1993). The perceived fairness of selection systems: An organizational justice perspective. Academy of Management Review, 18, 694–734.
Gilliland, S.W. (1994). Effects of procedural and distributive justice on reactions to a selection system. Journal of Applied Psychology, 79, 691–701.
Gottfredson, M.R., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press.
Harland, L.K., Rauzi, T., & Biasotto, M.M. (1995). Perceived fairness of personality tests and the impact of explanations for their use. Employee Responsibilities and Rights Journal, 8, 183–192.
Jones, J.W., & Joy, D.S. (1991). Empirical investigation of job applicants' reactions to taking a preemployment honesty test. In J.W. Jones (Ed.), Preemployment honesty testing: Current research and future directions (pp. 121–131). Westport, CT: Quorum Books.
Kravitz, D.A., Stinson, V., & Chavez, T.L. (1996). Evaluations of tests used for making selection and promotion decisions. International Journal of Selection and Assessment, 4, 24–34.
Lévy-Leboyer, C. (1994). Selection and assessment in Europe. In H.C. Triandis, M.D. Dunnette, & L.M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 4, pp. 173–190). Palo Alto, CA: Consulting Psychologists Press.
Macan, T.H., Avedon, M.J., Paese, M., & Smith, D.E. (1994). The effects of applicants' reactions to cognitive ability tests and an assessment center. Personnel Psychology, 47, 715–738.
Mael, F.A. (1991). A conceptual rationale for domains and attributes of biodata items. Personnel Psychology, 44, 763–792.
Marcus, B. (2000). Kontraproduktives Verhalten im Betrieb: Eine individuumsbezogene Perspektive [Counterproductive behavior in organizations: An individual differences perspective]. Göttingen, Germany: Verlag für Angewandte Psychologie.
Marcus, B. (in press). An empirical examination of the construct validity of two alternative self-control measures. Educational and Psychological Measurement.
Marcus, B., Höft, S., Riediger, M., & Schuler, H. (2000). What do integrity tests measure? Two competing views examined. Paper presented at the 108th Annual Convention of the American Psychological Association, Washington, DC, August.
Marcus, B., Schuler, H., Quell, P., & Hümpfner, G. (2002). Measuring counterproductivity: Development and initial validation of a German self-report questionnaire. International Journal of Selection and Assessment, 10, 18–35.
Ployhart, R.E., & Ryan, A.M. (1997). Toward an explanation of applicant reactions: An examination of organizational justice and attribution frameworks. Organizational Behavior and Human Decision Processes, 72, 308–335.
Rosse, J.G., Miller, J.L., & Stecher, M.D. (1994). A field study of job applicants' reactions to personality and cognitive ability testing. Journal of Applied Psychology, 79, 987–992.
Ryan, A.M., Greguras, G.J., & Ployhart, R.E. (1996). Perceived job relatedness of physical ability testing for firefighters: Exploring variations in reactions. Human Performance, 9, 219–240.
Ryan, A.M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Ryan, A.M., & Sackett, P.R. (1987). Pre-employment honesty testing: Fakability, reactions of test takers, and company image. Journal of Business and Psychology, 2, 248–256.
Rynes, S.L. (1993). Who's selecting whom? Effects of selection practices on applicant attitudes and behavior. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations (pp. 203–239). San Francisco, CA: Jossey-Bass.
Rynes, S.L., & Connerley, M.L. (1993). Applicant reactions to alternative selection procedures. Journal of Business and Psychology, 7, 261–277.
Sackett, P.R., Burris, L.R., & Callahan, C. (1989). Integrity testing for personnel selection: An update. Personnel Psychology, 42, 491–529.
Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schuler, H. (1993). Social validity of selection situations: A concept and some empirical results. In H. Schuler, J.L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 11–26). Hillsdale, NJ: Lawrence Erlbaum.
Schuler, H., Frier, D., & Kauffmann, M. (1993). Personalauswahl im europäischen Vergleich [Personnel selection in European comparison]. Göttingen, Germany: Hogrefe/Verlag für Angewandte Psychologie.
Schuler, H., & Stehle, W. (1983). Neuere Entwicklungen des Assessment-Center-Ansatzes, beurteilt unter dem Aspekt der sozialen Validität [New developments in the assessment center approach, evaluated with the focus on social validity]. Psychologie und Praxis. Zeitschrift für Arbeits- und Organisationspsychologie, 27, 33–44.
Shackleton, V., & Newell, S. (1994). European management selection methods: A comparison of five countries. International Journal of Selection and Assessment, 2, 91–102.
Smither, J.W., Reilly, R.R., Millsap, R.E., Pearlman, K., & Stoffey, R.W. (1993). Applicant reactions to selection procedures. Personnel Psychology, 46, 49–76.
Steiner, D.D., & Gilliland, S.W. (1996). Fairness reactions to personnel selection techniques in France and the United States. Journal of Applied Psychology, 81, 134–141.
Thorsteinson, T.J., & Ryan, A.M. (1997). The effect of selection ratio on perceptions of the fairness of a selection test battery. International Journal of Selection and Assessment, 5, 159–168.
Ullman, J.B., & Newcomb, M.D. (1998). Eager, reluctant, and nonresponders to a mailed longitudinal survey: Attitudinal and substance use characteristics differentiate responses. Journal of Applied Social Psychology, 28, 357–375.
Whitney, D.J., Diaz, J., Mineghino, M.A.E., & Powers, K. (1999). Perceptions of overt and personality-based integrity tests. International Journal of Selection and Assessment, 7, 35–45.
Wonderlic, Inc. (1996). Wonderlic Personnel Test (WPT; German version, Form A and B). Libertyville, IL: Wonderlic Personnel Test, Inc.
Zajonc, R.B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, 9(2, Pt. 2), 1–27.
