
GENERALISABILITY OF THE PSYCHOMETRIC PROPERTIES OF A PILOT SELECTION BATTERY

Agnès Kokorian, People Technologies, UK


agneskokorian@ntlworld.com

& Colin Valsler, Psytech Ltd, UK


colin@psytech.demon.co.uk

The Pilot Aptitude (PILAPT) system

The development of the PILAPT computer-based system grew out of the meta-analysis reported by Hunter and Burke (1994), and its design principles have been described by Burke, Kitching and Valsler (1994). In summary, these design principles were as follows:

- The test designs should be based on clearly understood measures of individual differences that research has shown to be relevant to pilot performance, either in training or in operations. PILAPT therefore had to cover both handling skills (as required in ab initio training) and CRM (crew resource management) competencies such as situational awareness and capacity.
- The test designs should assume no prior knowledge of flying, but should have links to key pilot performance factors that are intuitive to both candidates and users.
- The test designs should allow for practice, to offset the influence of prior experience of video games and give all candidates a level playing field on which to demonstrate their potential.
- The overall battery should be efficient and avoid redundancy and nugatory assessments.

Design work on PILAPT began in 1994 and has continued, with new tests and new scoring algorithms, over the nine years since. Beginning with ab initio selection for the Royal Air Force (RAF) University Air Squadrons (UAS), PILAPT has been evaluated through data provided by air forces in Chile, Denmark, Portugal, Sweden, Norway and Italy, as well as by civilian airlines and training schools in the UK, Europe and Asia.

PILAPT is a fully automated test delivery system built on the TEKS technology developed by Psytech Ltd. The system covers every stage of the testing process: candidate log-on (including the capture of biographical data), instructions, test administration, test scoring, analysis of candidate performance, reporting, and data transfer to other systems. It also provides crash recovery and networking capabilities.

The PILAPT battery of tests developed to date includes:

- Hands (10 minutes): the ability to process oral (verbal) rules in order to execute a visual task quickly and accurately; related to absorbing and using oral information (e.g. radio information) under pressure.

- Patterns (10 minutes): the ability to ignore distracting information in order to make quick and accurate decisions under time pressure; related to maintaining focus on critical information when confronted with ambiguous situations and pressure.
- Concentration (8 minutes): the ability to maintain focus on a primary task when the conditions for that task are constantly changing; related to maintaining situational awareness.
- Deviation Indicator (DI) (7 minutes): the ability to compensate for deviations in flight parameters, using a look and feel based on the flight path deviation indicator (FPDI); related to basic handling skills.
- Trax (5 minutes): a pursuit tracking task requiring the candidate to work in a three-dimensional environment; related to advanced aircraft control.

In addition to the tests above, whose design was driven primarily by ab initio requirements and which take around 40 minutes in total, the PILAPT battery has been extended to include a mini-battery named Capacity, designed to assess performance under increasing workload. Capacity takes around 15 minutes to complete and comprises a primary handling task and two secondary tasks involving visual and auditory information. Tasks are administered and measured under a combination of single, dual and triple task load conditions, and the impact of increased workload on the candidate's performance is then analysed and reported using a display similar to that shown in Figure 1 below. The data shown are average performance figures for Swedish fighter pilot applicants.
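The workload comparison described above can be illustrated with a minimal Python sketch. The paper does not specify how PILAPT quantifies retained capacity, so this sketch assumes, purely for illustration, that retention is the score under dual or triple task load divided by the score under single task load. The function names `capacity_retained` and `workload_profile` and the candidate scores are hypothetical, not PILAPT's scoring algorithm or data.

```python
from __future__ import annotations


# Hypothetical illustration only: "capacity retained" is assumed to be the
# score under a higher task load divided by the score under single task load.
def capacity_retained(single: float, loaded: float) -> float:
    """Proportion of single-task performance retained under a heavier load."""
    if single <= 0:
        raise ValueError("single-task score must be positive")
    return loaded / single


def workload_profile(single: float, dual: float, triple: float) -> dict[str, float]:
    """Retention ratios for the dual and triple task load conditions."""
    return {
        "dual": capacity_retained(single, dual),
        "triple": capacity_retained(single, triple),
    }


# Made-up scores for one candidate (not data from this paper).
profile = workload_profile(single=400.0, dual=300.0, triple=200.0)
print(profile)
```

With these made-up scores the candidate retains 75% of single-task performance under dual load and 50% under triple load. A ratio is used here because it puts each load condition on a common scale relative to the candidate's own single-task baseline.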
[Figure 1: chart of mean Capacity scores (axis from 100 to 500) for the DI4 task under single, dual and triple task load, annotated "Capacity under single task load", "Capacity under triple task load" and "How much capacity does the candidate retain as workload increases?"]

Figure 1: Overview of what the PILAPT Capacity mini-battery measures

Reliability and construct validity evidence supporting PILAPT

This section of the paper summarises the data collected on the PILAPT tests to date in military contexts. Because different tests are at different stages of the development cycle, the evidence available varies across PILAPT tests, reflecting the iterative cycle of development since 1994. In line with recommendations from professional bodies such as the American Psychological Association (APA), the British Psychological Society (BPS) and the International Test Commission (ITC), the evidence is presented in three parts. First, evidence of test reliability (the accuracy and stability of scores) is presented, followed by results from studies involving other marker tests of pilot aptitude (construct validity). The second paper (Calanna and Serusi) and the third paper (Kokorian, Valsler and Cabrera) in this symposium present criterion validity data.

Reliability

The standard recommendation for tests used in selection is a minimum reliability coefficient of 0.7; in effect, this states that 70% of the variation in test scores is true variation, as intended in the test's design. Table 1 summarises the results of reliability (internal consistency) analyses across various national and organisational sites, using the Schmidt-Hunter meta-analysis model (Hunter and Schmidt, 1990). DI and Trax are not included in Table 1 because internal consistency estimates of reliability are not suitable for these tests; data on their test-retest reliability are given below. Table 1 contains two versions of Hands, a longer 40-item version and a shorter 25-item version.

Source          Chile  Denmark  Italy  Norway  Portugal  Sweden  UK    Total N  Weighted mean  90% credibility
Local sample N  370    1,212    108    232     1,218     762     585
Hands           0.89   0.90     0.87   0.92    0.93      0.94    0.91  4,487    0.92           0.89
Patterns        0.6    0.69     0.72   0.71    0.73      0.71    -     3,902    0.70           0.66
Concentration   two sites: 0.76 and 0.83 (N=430)                       538      0.82           0.79

Table 1: Reliability results for PILAPT tests across various national sites

In addition to these results, a test-retest (stability) study was conducted in 1995 for the RAF UAS (N=109). With a four-month interval between test administrations, this study yielded reliabilities of 0.80 for DI, 0.84 for Trax and 0.77 for Hands, and an overall test-retest reliability of 0.91 for the sum of these three PILAPT test scores. Taken together, these data show that the PILAPT tests meet or exceed the minimum reliability of 0.7 required for use in pilot selection; the exception is Patterns, whose sample-weighted mean (0.70) only just reaches the threshold and whose 90% credibility value (0.66) falls slightly below it. As an overall composite score for use in selection decisions, the PILAPT battery offers a reliability of 0.9 and above.
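The Schmidt-Hunter summary statistics in Table 1 can be illustrated with a short Python sketch. It recomputes the sample-weighted mean reliability for Hands from the per-site values and sample sizes in the table, and forms an approximate 90% credibility value as the weighted mean minus 1.28 weighted standard deviations. As a simplifying assumption, the sketch uses the observed standard deviation without removing sampling-error variance, so it is a bare-bones approximation rather than the full Hunter and Schmidt (1990) procedure. The function names are illustrative.

```python
from math import sqrt

# Per-site sample sizes and Hands internal-consistency estimates, copied from
# Table 1 (order: Chile, Denmark, Italy, Norway, Portugal, Sweden, UK).
sizes = [370, 1212, 108, 232, 1218, 762, 585]
hands = [0.89, 0.90, 0.87, 0.92, 0.93, 0.94, 0.91]


def weighted_mean(values, weights):
    """Sample-size-weighted mean of the site coefficients."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_sd(values, weights):
    """Sample-size-weighted SD of the observed coefficients.

    Assumption: no sampling-error variance is subtracted, so this is the
    observed spread, not the corrected SD of a full Hunter-Schmidt analysis.
    """
    m = weighted_mean(values, weights)
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return sqrt(var)


def credibility_90(values, weights):
    """Approximate 90% credibility value: mean minus 1.28 SD (lower 10th percentile)."""
    return weighted_mean(values, weights) - 1.28 * weighted_sd(values, weights)


print(f"Total N = {sum(sizes)}")
print(f"Sample-weighted mean = {weighted_mean(hands, sizes):.2f}")
print(f"Approx. 90% credibility value = {credibility_90(hands, sizes):.2f}")
```

For Hands the sketch gives a total N of 4,487, a weighted mean of about 0.92 and an approximate credibility value of about 0.89, the same rounded values as Table 1. Subtracting the expected sampling-error variance, as the full procedure does, would shrink the SD and could therefore raise the credibility value for other tests.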

Construct validity

This section presents the results of a study conducted in Denmark involving four PILAPT tests (DI, Hands, Patterns and Trax) and a 15-test battery used to assess both aircrew and air traffic control (ATC) aptitudes. Data were available across all 19 tests for a sample of 632 applicants. The tests in the 15-test battery were grouped by content in line with the classifications used by Hunter and Burke in their meta-analysis. This classification provides a direct test of the extent to which PILAPT measures pilot-relevant predictor constructs. The results are shown in Table 2.

Test group                  DI    Hands  Patterns  Trax  Overall
Mathematical Reasoning      .06   .12    .31       .37   0.44
Numerical Speed & Accuracy  .03   .11    .29       .25   0.35
Language                    .08   .18    .14       .20   0.29
General Reasoning           .13   .18    .33       .51   0.57
Spatial                     .24   .38    .38       .17   0.53
Mechanical                  .27   .35    .40       .29   0.55
Memory                      .05   -.09   .23       .13   0.27

Notes: The Overall column gives the multiple correlation (R) from regressing each test group onto the four PILAPT tests. Correlations in bold italics are significant at the 0.01 level.

Table 2: Results for 632 Danish military applicants

Hunter and Burke identified the following predictor constructs as the most consistent and substantial predictors of pilot training success: perceptual speed, mechanical reasoning, spatial reasoning, psychomotor tests and simulation-based tests. The Danish data set did not contain psychomotor or simulation-based tests, but the results show that PILAPT taps the other predictor constructs that Hunter and Burke identified as critical to predicting pilot training success. Corrected for the average reliability of the Danish tests (0.8), the overall correlations (the right-hand column of Table 2) would range from 0.34 to 0.71, which corresponds to dividing each Overall value by 0.8 (e.g. 0.27/0.8 ≈ 0.34 and 0.57/0.8 ≈ 0.71).

Conclusion

This paper has summarised the R&D objectives of the PILAPT battery, together with the reliability and construct validity evidence supporting it. The consistency of the reliability estimates across national sites with different selection processes and different applicant populations suggests that scores are generalisable across settings. Evidence of criterion validity and transportability of validity is presented in the second and third papers of this symposium.

References
Burke, E., Kitching, A., and Valsler, C. (1994). Computer-based assessment and the construction of valid aviator selection tests. In N. Johnston, R. Fuller, and N. McDonald (Eds.), Aviation Psychology: Training and Selection. Cambridge: Avery.

Hunter, D. R., and Burke, E. (1994). Predicting aircraft pilot-training success: A meta-analysis of published research. International Journal of Aviation Psychology, 4, 297-313.

Hunter, D. R., and Burke, E. (1995). Handbook of Pilot Selection. Cambridge: Ashgate.

Hunter, J. E., and Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
