Robert J. McCaffrey, Albert Ortega, Susan M. Orsillo, Wendy B. Nelles, and Richard F. Haase
The University at Albany, State University of New York
To cite this article: McCaffrey, Robert J., Ortega, Albert, Orsillo, Susan M., Nelles, Wendy B., and Haase, Richard F. (1992)
0920-1637/92/0601-0032$3.00 © Swets & Zeitlinger
CLINICAL ISSUES
ABSTRACT
The magnitude of the practice effects associated with repeated administration of the
same neuropsychological assessment instruments was examined. In two separate research
protocols, subjects were administered their respective battery of neuropsychological
instruments twice within 7 to 10 days prior to the initiation of any experimental
manipulations. Factors of interest included the test-retest reliability correlation coefficients, the magnitude of practice effects, and the intercorrelation matrices among the
instruments. In general, the test-retest reliabilities of the neuropsychological instruments
were consistent with those found in the general psychological assessment literature.
The magnitude of practice effects was greatest on the Logical and Figural Memory
Subtests of the Wechsler Memory Scale in both protocols. The intercorrelation matrices
may be useful in planning sample sizes for future studies since estimates of statistical
power will require the consideration of the intercorrelations among groups of dependent
variables.
Preparation of the article was supported in part by National Institutes of Health Grants
HL-35112 and NS-25006 to Robert J. McCaffrey. Reprint requests to: Robert J. McCaffrey,
Department of Psychology, The University at Albany, State University of New York,
1400 Washington Avenue, Albany, NY 12222, USA.
Accepted for publication: April 2, 1991.
PRACTICE EFFECTS
practiced response, or that have a single, easily conceptualized solution are likely
to result in significant practice effects (Dodrill & Troupin, 1975). Among brain-injured patients, significant practice effects are reported to occur frequently,
while large test to retest changes are reported not to be common among
neurologically intact subjects (Lezak, 1982). The majority of the neuropsychological
literature on practice effects in normals has focused on the subtests of the Wechsler
Adult Intelligence Scale (Matarazzo, 1972; Matarazzo, Carmody, & Jacobs, 1980;
Shatz, 1981). There is little normative information on practice effects associated
with most neuropsychological instruments for the general population (Gill, Reddon,
Stefanyk, & Hans, 1986; Maxwell & Niemann, 1984; Maxwell, Wise, Pepping,
& Townes, 1984; Wilson, Wilson, Iacoviello, & Risucci, 1982) and very little for
actual patient populations (desRosiers & Kavanagh, 1987).
The psychological assessment literature contains data on the test-retest reliability
of virtually all of the instruments used to assess cognitive/behavioral performance.
Almost without exception, however, only reliability coefficients are reported (e.g.,
Brown, Rourke, & Cicchetti, 1989; Matarazzo, Wiens, Matarazzo, & Manaugh,
1973; Su & Yerxa, 1984). Psychometrically, this is useful information; however,
it is meaningless in terms of evaluating practice effects. For example, a reliability coefficient of .98 could be obtained if the subjects systematically made a
mild, moderate, or substantial increase (or decrease) in performance at the retest
compared to the initial test but maintained their same relative rank order on the
two administrations of the instrument. Statistically, Shatz (1981) has suggested
that the standard error of measurement be used to set up confidence intervals
around an individual patient's score in order to partial out practice effects from
other factors related to improvement in the patient's performance across assessments
(e.g., recovery of function following a CVA). Another technique involves the
use of equated alternate forms of the same instruments. For many
neuropsychological assessments, however, these are not available. In addition,
Anastasi (1988) points out that gains may also occur at retesting using parallel
forms (i.e., the test sophistication effect). In general, these gains are reported to
be smaller than those obtained using the same form of an instrument.
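The distinction between a reliability coefficient and a practice effect can be illustrated numerically. The following Python sketch uses hypothetical scores (not data from either research protocol) to show that adding a constant gain to every retest score leaves the Pearson correlation unchanged. It also computes a confidence band from the standard error of measurement, assuming the conventional formula SEM = SD x sqrt(1 - r); the exact procedure recommended by Shatz (1981) may differ in detail. All function and variable names are illustrative.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sem_interval(score, sd, reliability, z=1.96):
    """Confidence band around one observed score.

    Assumes SEM = SD * sqrt(1 - reliability); z = 1.96 gives an
    approximate 95% band under a normal error model.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    return score - z * sem, score + z * sem


# Hypothetical test and retest scores for seven subjects.
test = [8, 10, 11, 13, 15, 9, 12]
retest_no_gain = [9, 10, 12, 13, 14, 9, 13]
# Same retest scores with a uniform 3-point practice gain added.
retest_with_gain = [s + 3 for s in retest_no_gain]

r_no_gain = pearson_r(test, retest_no_gain)
r_with_gain = pearson_r(test, retest_with_gain)

# The correlation is identical, yet the mean retest score differs by 3 points.
extra_gain = mean(retest_with_gain) - mean(retest_no_gain)

# Band for an observed score of 10 on a test with SD = 3 and r = .75.
low, high = sem_interval(10, 3, 0.75)
```

Under these assumptions the band for a score of 10 runs from 7.06 to 12.94, so a retest score inside that range would not, by itself, indicate change beyond measurement error.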
Another alternative is to administer the entire neuropsychological battery
twice, in order to assess for the degree of practice effects and to obtain a baseline
level of performance. The second administration of the neuropsychological battery is used as a baseline for comparison against subsequent assessments. The
initial administration of the neuropsychological battery then serves as a methodological procedure to reduce the influence of practice effects.
The present study reports on the general issue of practice effects based on two
separate ongoing research projects. The selection of the instruments contained in
each of the neuropsychological batteries was based on specific research questions.
Despite differences between the two subject populations and the neuropsychological
batteries, important information regarding test-retest reliability, the degree of
practice effects and the intercorrelations among the neuropsychological instruments
is presented.
METHOD
Subjects
The first project involved an examination of the neuropsychological and physical side effects of the beta-adrenergic blocker, metoprolol, using a double-blind, placebo-controlled crossover experimental design (McCaffrey, McCoy, Haase, Ortega, & Orsillo, 1990).
Twenty-five newly diagnosed, untreated, mild, essential hypertensives (DBP = 90-105
mm Hg) volunteered as subjects. There were 17 males and 8 females with an average age
of 50.1 years (SD = 14.0) and 14.1 (SD = 2.9) years of education. The second project was
designed to evaluate the neuropsychological sequelae of prophylactic cranial irradiation
therapy in patients with small cell lung cancer who presented with no evidence of CNS
metastases (McCaffrey et al., 1990). As a control for the influence of pulmonary disease
in the cancer patients, the control group was comprised of chronic cigarette smokers.
Thirty-three chronic cigarette smokers with a mean consumption of 1.3 packs per day (SD
= 0.5) and a mean smoking history of 38.6 years (SD = 12.2) were recruited from the
community and paid $50 for their participation. In the smoker group, there were 15 males
and 18 females with a mean age and educational level of 59.1 (SD = 9.3) and 14.9 (SD =
3.4) years, respectively.
Apparatus and Procedure
For both groups, the subjects were administered their respective neuropsychological battery twice, within 7 to 10 days. The same assessor was used to evaluate the same subject at
both the test and retest assessments. This procedure was employed in order to reduce the
confounding influence of practice effects.
The rationale for the selection of the neuropsychological instruments was determined
by the requirements of the separate research questions. The instruments in each of the
neuropsychological batteries are presented in Tables 1 and 2 and are described in Lezak
(1983), except for the following tests. The Span of Attention Test (Kay, 1982) deals with
the ability to sustain focused alertness on a task which makes minimal demands on higher
cognitive processes. The test consists of a single 8 1/2 x 11 inch page filled with 500 Xs
arranged in 25 rows of 20 Xs, each separated by a dash (e.g., X-X-X ...). The instructions
are to circle as many Xs as possible in 400 seconds. The Math and Reading tests were
obtained from the University of the State of New York, Regents High School examination,
Competency Tests. The versions used were from prior forms of the Regents Competency
Test. The Math test involved 20 untimed arithmetic problems. The Reading test involves
a passage that is read aloud by the teacher to the students followed by 10 multiple choice
questions. The test was modified by having the subjects read the passage themselves and
then answer the questions without the aid of the passage. The Static Motor Steadiness Test
manufactured by Lafayette Instruments Company (#30211) consisted of having the subject
hold a metal stylus in each of four holes of decreasing diameter (0.312, 0.187, and 0.125
in.) for separate 15-second trials. The score was the total number of times the stylus came
in contact with the diameter of each hole. Simple Auditory Reaction Time was obtained
using a Lafayette Instruments Company Choice Reaction Time Apparatus (#63035) and a
Clock/Counter (#54035) set at .001 seconds. The reaction time was the average of 30
trials following 5 practice trials.
The neuropsychological instruments were administered and scored using their standardized instructions, except for the following. In order to decrease the amount of uncontrolled variance attributable to assessor differences in the administration and the scoring
of the Logical Memory portions of the Wechsler Memory Scale (Wechsler, 1945), the
paragraphs were tape recorded and the instructions and recorded paragraphs played to
each subject. For research purposes, subjects were informed that there would be a delayed
recall (Russell, 1975) of the Logical and Figural Memory Subtests of the Wechsler Memory
Scale at each of the two assessments. The subjects' recall of the Logical Memory, immediate and delayed, was tape recorded for subsequent verbatim transcription and scoring
using Prigatano's (1978) criteria. The Digit Span Subtest of the WAIS and the Paired-Associate Learning Subtest of the Wechsler Memory Scale were also presented via tape
recordings. Since the responses to these two subtests are not as complex as those to the
Logical Memory component, the responses were recorded on prepared assessment forms.
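Table 1. Test-retest reliability and practice effects associated with select neuropsychological
instruments in essential hypertensives at the practice effects (PE) and baseline
(BL) assessments.

Instrument              n     r      PE M (SD)       BL M (SD)       t       p
Logical Memory
  Immediate Recall            .61    10.23 (3.14)    12.36 (2.75)     4.03
  Delayed Recall              .74     8.55 (2.65)    11.34 (3.39)    -6.12
Figural Memory
  Immediate Recall      24    .63    11.17 (3.14)    12.17 (2.18)    -2.00
  Delayed Recall        24    .74    10.08 (3.51)    11.70 (2.37)    -3.3
Paired Associates       25    .53    10.88 (2.73)    12.21 (2.39)    -2.7    (<.01)
Span of Attention       24
WAIS-R Digit Span
  Forward               24    .78     8.44 (2.10)     8.70 (2.69)    -0.85   (<.202)
  Backward              24    .85     7.44 (2.58)     8.13 (3.06)    -1.80   (<.042)
Math Test               24    3 9    14.08 (4.31)    15.17 (4.05)    -2.10   (<.025)
Reading Test            25    .87     5.12 (2.91)     5.56 (3.11)    -1.41   (<.085)

Note: Column assignments follow the extraction order of the source. One p value survives for the Logical Memory block (<.005) and one for the Figural Memory block (<.05), without a clear row assignment. The r entry for the Math Test is reproduced as extracted. The rows between Paired Associates and Span of Attention could not be realigned; their surviving values, in source order, are: 25; 24; 33.86 (14.98); 87.08 (53.03); 1.05; 1.31; (<.152); (<.10); 78.91 (26.92); 81.59 (15.79); 1.56; 3.47; (<.06); (<.005); 5 8 54.67 (11.55); .82 50.35 (10.95); 55.94 (10.50); 49.78 (10.22); 4.64; 0.44; (<.263); (<.332); 0.57; (<.286); (<.0005); (<.005).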
RESULTS
The test-retest reliability (r) and the statistical magnitude of the practice effects
are presented in Tables 1 and 2 for the essential hypertensive and chronic smoker
groups, respectively. The test-retest reliability (r) of each of the neuropsychological
instruments was evaluated by calculating Pearson product-moment correlations.
The correlations ranged from 0.53 to 0.92 among the sample of essential
hypertensives and 0.47 to 0.89 for the chronic smokers.
The magnitude of practice effects was examined by computing one-tailed,
dependent-groups t tests for each of the instruments. The t tests were used to
document the magnitude of the practice effects. The p values presented in Tables
1 and 2 indicate the magnitude of the obtained practice effects.
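The link between the test-retest correlation and the dependent-groups t test can be made explicit. The Python sketch below recomputes a paired t statistic from summary statistics alone, assuming an ordinary paired t test on the PE minus BL difference and sample standard deviations. The input values are those that appear for Logical Memory, Immediate Recall, in the chronic smoker table (n = 33, r = .47, PE 9.68 (3.38), BL 11.12 (3.26)); the function name is illustrative and this is not necessarily the exact computation the authors performed.

```python
import math


def paired_t_from_summary(mean_1, sd_1, mean_2, sd_2, r, n):
    """Paired t statistic and degrees of freedom from summary statistics.

    The variance of the difference scores is
    sd_1^2 + sd_2^2 - 2 * r * sd_1 * sd_2, where r is the correlation
    between the two administrations.
    """
    var_diff = sd_1 ** 2 + sd_2 ** 2 - 2.0 * r * sd_1 * sd_2
    se_diff = math.sqrt(var_diff / n)
    return (mean_1 - mean_2) / se_diff, n - 1


# PE (first administration) versus BL (second administration).
t_stat, df = paired_t_from_summary(9.68, 3.38, 11.12, 3.26, 0.47, 33)
```

With these inputs the sketch yields a t of about -2.42 on 32 degrees of freedom, in line with the tabled value. The formula also shows why a higher test-retest correlation produces a larger t for the same mean gain: it shrinks the variance of the difference scores.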
Table 2. Test-retest reliability and practice effects associated with select neuropsychological
instruments in chronic smokers at the practice effects (PE) and baseline (BL)
assessments.

Chronic Smokers
Instrument              n     r      PE M (SD)       BL M (SD)       t       p
Logical Memory
  Immediate Recall      33    .47     9.68 (3.38)    11.12 (3.26)    -2.42   (<.025)
  Delayed Recall        32    .68     8.09 (2.93)    10.55 (2.98)    -5.93   (<.0005)
Figural Memory
  Immediate Recall      32    .53     9.70 (2.94)    11.19 (2.87)    -2.98   (<.001)
  Delayed Recall        32    .69     8.94 (3.50)    11.03 (2.95)    -4.43   (<.0005)

Note: Column assignments follow the extraction order of the source. The remaining rows could not be realigned; their surviving labels and values, in source order, are: Preferred Hand; Nonpreferred Hand; 33 .70 46.88; 33 .59 8.55; 33 3 9 10.71; (6.63) 7.94; (9.31) 10.30; (6.90); (9.68); (3.03) 55.06; 33 .64 26.42; (3.14) 26.24; (3.25); 33 .62; (.08); Nonpreferred Hand; .37; .35; .59 (<.28); .54 (<.30); .39 (<.35).
Table 3. Intercorrelation matrix of the neuropsychological instruments used in the essential hypertensive population at the practice effects/baseline assessments.
[Matrix cells could not be realigned from the source. Instruments listed: LMI, LMD, FMI, FMD, Paired Associates (PA), TMTA, TMTB, GPP, GPN, FOTP, FOTN, Span of Attention (SOA), DSF, DSB, Math Test (MT), Reading Test (RT).]
Note: The first correlation is the practice effects correlation for both instruments and the second correlation is the correlation for both instruments at baseline.
Table 4. Intercorrelation matrix of the neuropsychological instruments used in the chronic smoker population at the practice effects/baseline assessments.
[Matrix cells could not be realigned from the source. Instruments listed: Wechsler Memory Scale Logical Memory, Immediate Recall (LMI) and Delayed Recall (LMD); Figural Memory, Immediate Recall (FMI) and Delayed Recall (FMD); TMTB; GPP; GPN; SDMT; MSP; MSN; SSPT; SRT; RT.]
Note: The first correlation is the practice effects correlation for both instruments and the second correlation is the correlation for both instruments at baseline.
[Table of test-retest reliability coefficients. Column headings: Test, Subjects, r, Test-retest Interval, Source. Row alignment could not be recovered from the source; the surviving entries are grouped by column.]
Test: Delayed Recall; Figural Memory (Immediate Recall, Delayed Recall); Paired Associates; Halstead-Reitan Battery, Trail Making Test (Part A, Part B); WAIS-R Digit Span.
Subjects: 51 elderly males; 26 elderly females; 30 normals and 75 brain-damaged subjects; 40-80 normals; 80 adults; 24 (unlabeled).
r: .93, .77, .90, .51, .90, .93, .67, .83.
Test-retest interval: 1 day; 12.4 weeks; 20 weeks; 1 to 7 weeks; 4 weeks.
Source: Russell (1975); Wechsler (1981); Smith (1982).
DISCUSSION
The values of the test-retest reliability correlation coefficients for the neuropsychological instruments obtained with the essential hypertensive and chronic
smoker samples are consistent with those reported in the general psychological
assessment literature (Anastasi, 1988). In both samples, the magnitude of practice
effects, as indexed by one-tailed t tests, was greatest for the Logical and Figural
Memory subtests of the Wechsler Memory Scale. Among the essential hypertensives,
considerable practice effects were also obtained on the Paired-Associate Subtest
of the Wechsler Memory Scale, the Grooved Pegboard Test-nonpreferred hand,
the WAIS-R Digit Span-Backwards and the Math Test. Practice effects for the
chronic smokers were observed on the Trail Making Test - Part B, the Grooved
Pegboard Test-nonpreferred hand, and the Simple Auditory Reaction Time Test.
In both samples, the magnitude of practice effects among the remaining instruments
was minimal.
The magnitudes of the test-retest correlation coefficients among the chronic
smokers were generally lower than those obtained on the same instruments in the
sample of essential hypertensives. There are at least three factors which may
have contributed to these discrepant findings despite comparable methodologies.
First, the chronic smokers were paid volunteers whose primary motivation was
financial while the essential hypertensives were unpaid subjects whose motivation
focused on knowing the potential impact of their medication on cognitive functioning. The second factor is that the chronic smokers' pulmonary functioning
may have been compromised relative to the essential hypertensives. Third, the
chronic smokers were significantly older (t(60) = 9.27, p < .01) than the essential
hypertensives. Thus, the differences obtained between the two samples may have
been due to any one or a combination of these factors.
The test-retest reliability measures of both research samples and those reported from various normative data sources in the neuropsychological literature
vary considerably, reflecting different populations, sampling procedures, and test-retest intervals. As such, clinical neuropsychologists must take these factors into
consideration when attempting to interpret the data from individual patients.
The primary goal of this report was to acknowledge the differences in test-retest reliability across different populations, evaluate the extent of practice effects, and report on the intercorrelation matrices of a select group of
neuropsychological instruments. The latter goal was to provide data that would
be useful to investigators planning multivariate studies. Clearly, clinical
neuropsychologists need to expand the existing data base on the factors noted
above. In the interim, both the scientist and practitioner should exercise caution
when attempting to interpret the significance of practice effects and test-retest
reliability coefficients in repeated neuropsychological assessments.
REFERENCES
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Brown, S.J., Rourke, B.P., & Cicchetti, D.V. (1989). Reliabilities of tests and measures
used in the neuropsychological assessment of children. The Clinical Neuropsychologist, 3, 353-368.
desRosiers, G., & Kavanagh, D. (1987). Cognitive assessment in closed head injury:
Stability, validity and parallel forms for two neuropsychological measures of
recovery. The International Journal of Clinical Neuropsychology, 9, 162-173.
Dodrill, C.B., & Troupin, A.S. (1975). Effects of repeated administration of a comprehensive
neuropsychological battery among chronic epileptics. Journal of Nervous and Mental
Disease, 161, 185-190.
Gill, D.M., Reddon, J.R., Stefanyk, W.O., & Hans, S.H. (1986). Finger tapping: Effects of
trials and sessions. Perceptual and Motor Skills, 62, 675-678.
Kay, S.R. (1982). The cognitive diagnostic battery. Odessa, FL: Psychological Assessment
Resources.
Lezak, M.D. (1982, June). The test-retest stability and reliability of some tests commonly
used in neuropsychological assessment. Paper presented at the fifth European
conference of the International Neuropsychological Society, Deauville, France.
Lezak, M.D. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford.
Matarazzo, J.D. (1972). Wechsler's measurement and appraisal of adult intelligence. New
York: Oxford University Press.
Matarazzo, J.D., Wiens, A.N., Matarazzo, R.G., & Goldstein, S.G. (1974). Psychometric
and clinical test-retest reliability of the Halstead Impairment Index in a sample of
healthy, young, normal men. Journal of Nervous and Mental Disease, 158, 37-49.
Matarazzo, J.D., Carmody, T.P., & Jacobs, L.D. (1980). Test-retest reliability and stability of the WAIS: A literature review of implications for clinical practice. Journal
of Clinical Neuropsychology, 2, 89-105.
Matarazzo, R.G., Wiens, A.N., Matarazzo, J.D., & Manaugh, T.S. (1973). Test-retest
reliability of the WAIS in a normal population. Journal of Clinical Psychology, 29,
194-197.
Maxwell, J.K., & Niemann, H. (1984). The Finger-Tip Numberwriting Test: Practice
effects versus lateral asymmetry. Perceptual and Motor Skills, 59, 343-351.
Maxwell, J.K., Wise, F., Pepping, M., & Townes, B.D. (1984). Fingertip number-writing
errors by psychiatric patients. Perceptual and Motor Skills, 59, 933-934.
McCaffrey, R.J., McCoy, G.C., Haase, R.F., Ortega, A., & Orsillo, S.M. (1990, November). Neuropsychological side effects of metoprolol. Presented at the Annual
Meeting of the National Academy of Neuropsychology, Reno, NV.
McCaffrey, R.J., Orsillo, S.M., Lefkowicz, D.P., Ortega, A., Haase, R.F., Wagner, H., &
Ruckdeschel, J.C. (1990, November). Neuropsychological sequelae of chemotherapy and prophylactic cranial irradiation: An extension of earlier findings.