PUSH UP - Norm Referenced and Criterion-Referenced Reliability

Article in Measurement in Physical Education and Exercise Science · June 2001
DOI: 10.1207/S15327841MPEE0502_1

MEASUREMENT IN PHYSICAL EDUCATION AND EXERCISE SCIENCE, 5(2), 67–80
Copyright © 2001, Lawrence Erlbaum Associates, Inc.
Norm-Referenced and
Criterion-Referenced Reliability of the
Push-Up and Modified Pull-Up
Benny Saint Romain and Matthew T. Mahar
Department of Exercise and Sport Science
East Carolina University
This study determines the test–retest reliability and equivalence reliability (i.e., the
consistency of 2 tests designed to measure the same construct) of the push-up
and the modified pull-up tests from both norm-referenced and criterion-referenced
frameworks. Sixty-two (30 boys and 32 girls) 5th- and 6th-grade students (mean age =
11.4 ± 0.9 years) were administered 2 test trials for both the push-up and modified
pull-up tests following a 4-week fitness unit. The norm-referenced test–retest reliabil-
ity estimates, using intraclass correlations from a one-way analysis of variance
(ANOVA), were consistently high for 2 trials of both the push-up and modified
pull-up (R = .99) and also high for 1 trial (R = .98 for push-up, R = .97 for modified
pull-up). Moderately high correlations between the push-up and modified pull-up
were obtained (r ≥ .64). Criterion-referenced reliability was estimated with proportion
of agreement (Pa), modified kappa (Kq), and Phi coefficient using the
FITNESSGRAM® standards (Cooper Institute for Aerobics Research, 1999). Crite-
rion-referenced test–retest reliability estimates were high for the push-up (Pa = .97,
Kq = .94) and the modified pull-up (Pa = .95, Kq = .90). Equivalence reliability esti-
mates were considerably lower between push-up and modified pull-up test Trial 1 (Pa
= .69, Kq = .38, and Phi = .50) and test Trial 2 (Pa = .71, Kq = .42, and Phi = .52).
Norm-referenced and criterion-referenced test–retest reliability estimates in this
study were acceptable. However, criterion-referenced equivalence reliability find-
ings were not acceptable. Equating of standards between fitness tests designed to
measure the same component of fitness should be examined.
Requests for reprints should be sent to Matthew T. Mahar, Department of Exercise and Sport Sci-
ence, East Carolina University, Greenville, NC 27858. E-mail: maharm@mail.ecu.edu
Key words: criterion-referenced reliability, norm-referenced reliability, push-up,
modified pull-up, fitness testing, upper body strength
The U.S. Surgeon General (U.S. Department of Health and Human Services,
1996) reported that about one half of U.S. students are not vigorously active on a
regular basis and nearly one fourth of those students perform no vigorous physi-
cal activity at all. The U.S. Public Health Service (1990) reported health objec-
tives that encouraged people to engage in physical activity that promotes
cardiorespiratory fitness, strength, endurance, and flexibility. Regular physical
activity reduces the risk of premature death, diabetes, high blood pressure, and
colon cancer. Therefore, developing long-term fitness habits is important for
children and youth.
Physical activity directors can choose from several national youth fitness test
programs, including the President’s Challenge (President’s Council on Physical
Fitness and Sports, 1993) and the FITNESSGRAM® (Cooper Institute for Aero-
bics Research, 1999). Both fitness test programs include criterion-referenced
standards. FITNESSGRAM® standards were developed to represent minimal
levels of fitness consistent with health and reduced risk of disease (Morrow,
Falls, & Kohl, 1994).
The FITNESSGRAM® tests three components of fitness: aerobic capacity;
body composition; and muscular strength, endurance, and flexibility. Five tests
are recommended and alternative tests are available to measure each component
of youth fitness (Cooper Institute for Aerobics Research, 1999). This article fo-
cuses on tests of upper body muscular strength and endurance. Physical activity
directors can choose from among four upper body strength tests, including the
push-up, pull-up, modified pull-up, and flexed-arm hang. The recommended test
for upper body strength is the push-up because the test requires little equipment,
many students may be tested at one time, and most students can perform at least
one push-up (Cooper Institute for Aerobics Research, 1999).
Zhu (1998) noted that tests for upper body strength have different levels of
difficulty and different methods of scoring. For example, the pull-up test is mea-
sured by the number of pull-ups completed, and the flexed-arm hang test is mea-
sured by the number of seconds a student can hang with flexed arms. Pate,
Burgess, Woods, Ross, and Baumgartner (1993) reported moderate
intercorrelations among field tests of upper body strength and endurance. The
correlations between push-up and modified pull-up performance ranged between
.40 and .71 for 9- and 10-year-old children; however, principal components fac-
tor analyses indicated that all five field tests of upper body strength and endur-
ance loaded on the same factor.
Pate, Ross, Baumgartner, and Sparks (1987) noted that one problem with
pull-up tests and flexed-arm hang tests is that they are affected by body weight. A
participant must overcome his or her entire body weight to perform each test, which
hinders performance of heavier individuals. In the first National Children and Youth
Fitness Study, Ross, Dotson, Gilbert, and Katz (1985) confirmed that between 10%
and 30% of boys ages 10 to 14 and more than 60% of girls ages 10 to 18 had zero
scores on the pull-up test. Reiff et al. (1986) revealed similar results where 45% of
boys ages 6 to 14 and 55% of girls ages 6 to 17 scored zero on the flexed-arm hang
test. Engelman and Morrow (1991) reported that 53% of girls and 31% of boys in
Grades 3 through 5 had zero scores on the pull-up test. These field tests do not have
the ability to detect individual differences among students with lower levels of mus-
cular strength and endurance. Lower reliability can be expected when a test lacks the
ability to discriminate among ability levels.
Although pull-up and flexed-arm hang tests are used as measures for upper
body strength, Ross, Pate, Delpy, Gold, and Svilar (1987) argued that the mod-
ified pull-up and the push-up tests are more acceptable field tests for upper
body strength than the chin-up, pull-up, and flexed-arm hang field tests. The
modified pull-up and push-up tests provide a better range of scores because
body weight is supported and few zero scores occur. Thus, these tests have
better discrimination among ability levels yielding more heterogeneous scores.
Only 5% of both boys and girls older than age 8 scored zero on the push-up
test. Similar results were also recorded for the modified pull-up test, where
only 5% of both boys and girls ages 6 to 9 had zero scores (Ross et al., 1987).
Engelman and Morrow (1991) reported that only about 2% of boys and 3% of
girls received zero scores on the modified pull-up. Cotten (1990) found similar
results.
Reliability studies have been conducted for both the push-up and modified
pull-up tests. McManis, Baumgartner, and Wuest (2000) reported intraclass reli-
ability estimates of a single trial of the 90° push-up of .64 for 73 girls and .71 for 83
boys in Grades 3 through 5. Students tested each other in this study. Cotten (1990) re-
ported moderate single-trial reliability estimates (.56 < R < .82 for boys and .71 < R <
.90 for girls) for children in Grades K through 6 for the modified pull-up. Sample
sizes for this study ranged between 8 and 33 for boys and between 11 and 33 for girls.
Engelman and Morrow (1991) found acceptable intraclass reliability estimates for a
single trial of the modified pull-up for boys (.68 < R < .83) with sample sizes ranging
between 70 and 89 and for girls (.77 < R < .83) with sample sizes ranging between 67
and 87 in Grades 3 through 5. Kollath, Safrit, Zhu, and Gao (1991) reported an
intraclass reliability estimate for a single trial of the modified pull-up of .91 for 61
boys and .72 for 44 girls in Grade 9.
Although the push-up and modified pull-up are considered acceptable fitness
tests for upper body strength, nearly all studies of reliability are reported using a
norm-referenced framework. Kollath et al. (1991) examined the crite-
rion-referenced test–retest reliability of the modified pull-up in ninth graders (age
not provided) with various criterion-referenced standards. The criterion-referenced
standard that corresponds to the FITNESSGRAM® standard for girls (i.e., 4 modified
pull-ups) resulted in high consistency of classification (Proportion of agreement
or Pa = .91; Kappa or K = .66). None of the criterion-referenced standards used
by Kollath et al. corresponds to the FITNESSGRAM® standard for boys; however,
high criterion-referenced reliability estimates were reported for the standards that
were closest to FITNESSGRAM® standards for this age group (Pa > .90; K > .64).
The modified pull-up and push-up are used to assess the same aspect of
musculoskeletal development (i.e., upper body strength and endurance). However,
equivalence reliability between these two measures of upper body strength has not
been reported previously. Thus, the purpose of this study was to determine the
test–retest reliability and equivalence reliability of the push-up and the modified
pull-up tests from both norm-referenced and criterion-referenced frameworks.
Equivalence reliability is used in this case to represent the agreement between two
tests designed to measure the same construct (Morrow, Jackson, Disch, & Mood,
1995).
METHOD
Participants
This study was conducted at an elementary school in rural eastern North Carolina
with an enrollment of 472 students. All grades participated in the
FITNESSGRAM® assessment measures of physical fitness (Cooper Institute for
Aerobics Research, 1999). All sixth-grade classes and one fifth-grade class (N = 68)
participated once a week for 4 weeks on push-up and modified pull-up stations dur-
ing their physical education class. Six participants were excluded from this analysis
because of physical disabilities and absences, leaving a final sample of 62 partici-
pants. Participants included ten 10-year-olds, twenty-five 11-year-olds, twenty-two
12-year-olds, and five 13-year-olds. Boys (n = 30) averaged 11.6 ± 0.8 years of age
and girls (n = 32) averaged 11.2 ± 0.9 years of age. The participants were tested twice
on push-ups and modified pull-ups following the 4 weeks of training. The tests were
administered one week apart. The test administrators remained the same for both test
dates. Informed consent was obtained for the participants prior to this study.
Procedures
Height and body weight were measured with a physician’s scale. Body mass index
was calculated as body mass (kg) divided by the square of height in meters (m).
Fitness training. The entire school population participated once a week for 4
weeks in a physical education unit on fitness training. During this unit each class
was divided into groups of four and assigned to different training stations. The
training stations were used to counterbalance the order of the tests. Two of the six
stations were a push-up station and a modified pull-up station where each student
was instructed on the proper technique and execution of both the push-up and modi-
fied pull-up. To ensure that each student performed these upper body strength and
endurance tests correctly, two fitness instructors monitored the push-up and modi-
fied pull-up stations. The fitness instructors followed the procedures outlined in the
FITNESSGRAM® manual (Cooper Institute for Aerobics Research, 1999) to pro-
vide feedback on correct form. Stations were positioned so that the participants had
at least 20 min between push-up and modified pull-up tests. This allowed the stu-
dents to have an opportunity to recover between upper body stations. The sample of
fifth- and sixth-grade participants in this study were informed that they were in-
volved in a research project and that their best effort was requested each day. The
fitness-training unit provided practice for the participants as recommended in the
FITNESSGRAM® manual (Cooper Institute for Aerobics Research, 1999).
Push-up station. The fitness instructor had each participant assume the
push-up position. The instructor emphasized hand placement under the shoulders,
arms straight, fingers stretched out, and legs together and straight with the toes
tucked downward. Participants would then lower the body by bending their elbows
to a 90° angle and continue the movement until the arms were straight again (the
back and legs were kept in a straight line throughout the execution). Completion of
this movement was counted as one successful push-up. The entire process was per-
formed as many times as possible. A cassette tape with the recorded push-up ca-
dence (one push-up every 3 sec) was played while each participant performed the
push-ups. The test ended when the participant stopped or rested, did not maintain
correct body position, did not extend the arms fully, or did not achieve a 90° bend at
the elbow on at least two push-ups.
Modified pull-up station. The fitness instructor had each participant lie on
his or her back with shoulders under the bar. The bar was set 1–2 in. above the reach
of the participant. An elastic band was placed 7–8 in. below the bar. Each partici-
pant would grasp the bar using an overhand grip while keeping his or her arms and
legs straight, heels on the floor, and buttocks off the floor. Participants would then
pull themselves up until their chin was above the elastic band using only the arms
and keeping the body straight. Participants were required to perform the modified
pull-ups as a continuous movement. Completion of this movement was counted as
one successful modified pull-up. The entire process was performed as many times
as possible without stopping or resting. The test was terminated when the partici-
pant stopped or experienced pain or discomfort.
Statistical Analysis
Criterion-referenced reliability. The criterion-referenced standards from
the FITNESSGRAM® (Cooper Institute for Aerobics Research, 1999) for 10- to
13-year-old boys and girls for the push-up and modified pull-up tests were used to
classify participants as either passing or failing a test. A passing classification was
recorded when participants met or exceeded the criterion-referenced standards for
age and gender. A failing classification was recorded when participants did not
meet criterion-referenced standards for age and gender. Table 1 presents the
FITNESSGRAM® criterion-referenced standards for the push-up and modified
pull-up. Pa and modified kappa (Kq) statistics were used to provide criterion-refer-
enced reliability estimates (Looney, 1989). Pa represents the proportion of partici-
pants that are classified consistently (pass/pass or fail/fail) on both trials of a test.
For equivalence reliability, Pa is the proportion of participants classified the same
by both the push-up and modified pull-up. The Kq is used to adjust for chance
agreement. Phi coefficient is a Pearson correlation between two dichotomous vari-
ables and was used to provide additional information about the agreement between
the push-up and modified pull-up. These estimates of equivalence reliability de-
scribe the degree to which different fitness tests of the same construct consistently
classify participants in terms of meeting or not meeting the criterion-referenced
standards.
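The classification and agreement statistics described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example pass/fail data are hypothetical, the minimum standards are copied from Table 1, and Kq is computed as (Pa - 1/q) / (1 - 1/q) with q = 2 categories, which is assumed here to be the form of modified kappa intended rather than quoted from Looney (1989).

```python
from math import sqrt

# Minimum passing scores from Table 1, keyed by (test, sex), then by age.
MIN_STANDARD = {
    ("push_up", "boy"): {10: 7, 11: 8, 12: 10, 13: 12},
    ("push_up", "girl"): {10: 7, 11: 7, 12: 7, 13: 7},
    ("modified_pull_up", "boy"): {10: 5, 11: 6, 12: 7, 13: 8},
    ("modified_pull_up", "girl"): {10: 4, 11: 4, 12: 4, 13: 4},
}


def passes(test, sex, age, score):
    """Return True when the score meets or exceeds the minimum standard."""
    return score >= MIN_STANDARD[(test, sex)][age]


def agreement_stats(first, second, q=2):
    """Compute Pa, Kq, and Phi from two equal-length lists of pass/fail booleans.

    first and second hold classifications from two trials (test-retest)
    or from two different tests (equivalence).
    """
    pairs = list(zip(first, second))
    n = len(pairs)
    a = sum(1 for x, y in pairs if x and y)          # pass / pass
    b = sum(1 for x, y in pairs if x and not y)      # pass / fail
    c = sum(1 for x, y in pairs if not x and y)      # fail / pass
    d = sum(1 for x, y in pairs if not x and not y)  # fail / fail

    pa = (a + d) / n
    # Modified kappa: chance agreement assumed to be 1/q (assumption, q = 2).
    kq = (pa - 1 / q) / (1 - 1 / q)
    # Phi: Pearson correlation between two dichotomous variables.
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom if denom else float("nan")
    return pa, kq, phi


if __name__ == "__main__":
    # Hypothetical classifications for 10 participants (not study data).
    trial_1 = [True, True, True, True, False, False, False, False, True, False]
    trial_2 = [True, True, True, False, False, False, False, True, True, False]
    pa, kq, phi = agreement_stats(trial_1, trial_2)
    print(f"Pa = {pa:.2f}, Kq = {kq:.2f}, Phi = {phi:.2f}")
```

With two categories, Kq reduces to 2Pa - 1, so a Pa of .97 corresponds to a Kq of .94.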
Norm-referenced reliability. Norm-referenced test–retest reliability for the
push-up and modified pull-up tests was estimated using an intraclass correlation
coefficient (R) from a one-way analysis of variance model (ANOVA; Baumgartner
& Jackson, 1999). Reliability for one trial for each test was estimated with the
Spearman–Brown prophecy formula. Pearson product–moment (r) correlations
were used to determine the relations between push-up and modified pull-up tests.
TABLE 1
FITNESSGRAM® Standards for Boys and Girls for the Push-Up and Modified Pull-Up
Variable                                 Age    Boys        Girls
Push-ups (number completed)              10     7 to 20     7 to 15
                                         11     8 to 20     7 to 15
                                         12     10 to 20    7 to 15
                                         13     12 to 25    7 to 15
Modified pull-ups (number completed)     10     5 to 15     4 to 13
                                         11     6 to 17     4 to 13
                                         12     7 to 20     4 to 13
                                         13     8 to 22     4 to 13
Note. The number on the left under the Boys column and under the Girls column is the minimum
score needed to pass for each test and the number on the right is the more desirable standard for each test.
Ninety-five percent confidence intervals (CI) were calculated for each estimate
(Morrow & Jackson, 1993).
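The norm-referenced estimates described above can be illustrated with a short Python sketch. It assumes the common one-way ANOVA form of the intraclass correlation for the mean of k trials, R = (MS between - MS within) / MS between, and the general Spearman–Brown formula with a length factor of 1/2 to step down from two trials to one. The function names and the example scores are hypothetical, and the confidence intervals of Morrow and Jackson (1993) are not reproduced here.

```python
def icc_one_way(scores):
    """Intraclass R for the mean of k trials from a one-way ANOVA.

    scores: list of per-participant lists, each holding k trial scores.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-participants and within-participants sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / ms_between


def spearman_brown(r, factor):
    """Spearman-Brown prophecy formula.

    factor is the ratio of new test length to old test length,
    e.g. 0.5 to estimate one-trial reliability from two trials.
    """
    return factor * r / (1 + (factor - 1) * r)


if __name__ == "__main__":
    # Hypothetical two-trial scores for three participants (not study data).
    example = [[4, 5], [7, 7], [10, 9]]
    r_two_trials = icc_one_way(example)
    r_one_trial = spearman_brown(r_two_trials, 0.5)
    print(f"R (2 trials) = {r_two_trials:.3f}, R (1 trial) = {r_one_trial:.3f}")
```

Applied to a two-trial estimate of R = .99, the step-down gives approximately .98, and R = .97 gives approximately .94.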
Where mean differences were important to examine, the size of the difference
between means (effect size) was estimated with Cohen’s (1988) delta (ES). ES was
calculated as the difference between means divided by the pooled standard devia-
tion so that ES equals the difference between means in standard deviation units.
Cohen suggested the following rule of thumb for interpretation of delta: ES ≥ 0.80
represents a large effect, ES ~ 0.50 represents a medium effect, and ES ≤ 0.20 rep-
resents a small effect.
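As a worked illustration of this effect size, the sketch below computes ES from summary statistics using a pooled standard deviation weighted by degrees of freedom. The function name is hypothetical, and the weighting scheme is an assumption because the source does not spell out how the standard deviations were pooled. The example call uses the body weight means, standard deviations, and group sizes reported in Table 2.

```python
from math import sqrt


def effect_size(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between means in pooled standard deviation units."""
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_variance)


if __name__ == "__main__":
    # Body weight (kg) from Table 2: boys 51.8 +/- 13.4 (n = 30),
    # girls 49.7 +/- 15.2 (n = 32).
    es = effect_size(51.8, 13.4, 30, 49.7, 15.2, 32)
    print(f"ES = {es:.2f}")
```

Because the tabled means and standard deviations are rounded, a value computed this way can differ slightly in the second decimal place from the effect size reported in the text.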
RESULTS
Descriptive statistics are presented in Table 2. The boys performed better than the
girls on the push-up and on the modified pull-up tests (p < .001; ES ≥ 0.96). The
mean body mass index was similar for boys and girls (p = .99; ES = 0.00). Although
boys weighed approximately 2 kg more than the girls, this difference was small (ES
= 0.14) and not statistically significant (p = .58).
The percentages of students who passed the first trial of the push-up and modi-
fied pull-up tests are presented in Figure 1. A higher percentage of boys (80%)
passed the modified pull-up than girls (53%). Similarly, a higher percentage of
boys (57%) passed the push-up test than girls (22%). Results for Trial 2 were simi-
lar with a greater percentage of boys passing than girls.
Criterion-referenced test–retest reliability estimates are presented in Table 3.
The Pa statistics for the total sample, boys, and girls revealed that 97% of the
participants were classified similarly on both trials of the push-up. This high level of
classification agreement was found with 57% of boys and 22% of girls passing both
push-up trials. A high level of classification agreement was also found for the modified
pull-up test with 95% of participants (93% of boys and 97% of girls) classified
similarly on both trials. Eighty percent of the boys and 50% of the girls passed both
trials of the modified pull-up.
TABLE 2
Descriptive Statistics for the Sample (Mean and Standard Deviation)
                              Total Sample (N = 62)    Boys (n = 30)      Girls (n = 32)
Variable                      M        SD              M        SD        M        SD
Age (years)                   11.4     0.9             11.6     0.8       11.2     0.9
Weight (kg)                   50.6     14.3            51.8     13.4      49.7     15.2
Height (m)                    1.53     0.10            1.54     0.09      1.51     0.11
BMI (kg · m⁻²)                21.5     4.7             21.5     4.6       21.5     5.0
Push-up (Trial 1)             7.4      6.2             10.4     7.1       4.6      3.5
Push-up (Trial 2)             7.7      6.4             10.5     7.2       5.1      4.2
Modified pull-up (Trial 1)    6.9      4.8             9.5      4.7       4.4      3.6
Modified pull-up (Trial 2)    7.3      4.9             10.0     4.5       4.7      3.7
FIGURE 1 Percentage of participants who passed Trial 1 of the push-up and modified
pull-up tests.
Criterion-referenced equivalence reliability estimates between the push-up and
modified pull-up are also presented in Table 3. Values for Kq are low (≤.50), indi-
cating low classification agreement between the push-up and modified pull-up.
Only about 70% of the sample was classified the same on the two tests. These re-
sults were similar for boys and girls.
Norm-referenced test–retest reliability estimates are presented in Table 4.
Reliability estimates for the push-up were high for the total sample R = .99 (95%
CI = .99 to .99), boys R = .99 (95% CI = .99 to .99), and girls R = .97 (95% CI = .94
to .99). Table 2 presents means and standard deviations for the two push-up test
trials. The size of the difference between the two trials was small (ES = 0.05 for
the total sample). Test–retest reliability estimates for the modified pull-up were
also high for the total sample R = .99 (95% CI = .98 to .99), boys R = .98 (95% CI =
.96 to .99), and girls R = .98 (95% CI = .95 to .99). The difference between means
for the two modified pull-up test trials was small (ES = 0.08 for the total sample).
Typically, only one trial is used for fitness testing; therefore, reliability estimates
are provided for one trial of the push-up and one trial of the modified pull-up.
TABLE 3
Criterion-Referenced Test–Retest Reliability and Equivalence Reliability of the Push-Up and Modified Pull-Up
                                          Total Sample         Boys                 Girls
Test–Retest                               Pa    Kq             Pa    Kq             Pa    Kq
Push-up (Trials 1 and 2)                  .97   .94            .97   .94            .97   .94
Modified pull-up (Trials 1 and 2)         .95   .90            .93   .87            .97   .94
                                          Total Sample         Boys                 Girls
Equivalence                               Pa    Kq    Phi      Pa    Kq    Phi      Pa    Kq    Phi
Push-up vs. modified pull-up (Trial 1)    .69   .39   .50      .70   .40   .40      .69   .38   .50
Push-up vs. modified pull-up (Trial 2)    .71   .42   .52      .67   .33   .28      .75   .50   .58
Note. Pa = proportion of agreement; Kq = modified Kappa; Phi = Phi coefficient.
TABLE 4
Norm-Referenced Reliability of the Push-Up and Modified Pull-Up
                                          Total Sample          Boys                  Girls
                                          (Intraclass R)        (Intraclass R)        (Intraclass R)
Test–Retest                               2 Trials   1 Trial    2 Trials   1 Trial    2 Trials   1 Trial
Push-up (Trial 1 vs. Trial 2)             .99        .98        .99        .99        .97        .94
Modified pull-up (Trial 1 vs. Trial 2)    .99        .97        .98        .96        .98        .95
                                          Total Sample          Boys                  Girls
Equivalence Reliability                   (Pearson r)           (Pearson r)           (Pearson r)
Push-up vs. modified pull-up (Trial 1)    .73                   .64                   .70
Push-up vs. modified pull-up (Trial 2)    .77                   .69                   .79
Note. Intraclass R was calculated from a one-way analysis of variance model.
These estimates were also consistently high. Reliability estimates for one trial of
the push-up were R = .98 (95% CI = .97 to .99) for the total sample, R = .99 (95%
CI = .98 to .99) for boys, and R = .94 (95% CI = .89 to .97) for girls. Reliability es-
timates for one trial of the modified pull-up were R = .97 (95% CI = .95 to .98) for
the total sample, R = .96 (95% CI = .92 to .98) for boys, and R = .95 (95% CI = .91
to .98) for girls.
Pearson correlations between the push-up and modified pull-up were moder-
ately high. Correlations between Trial 1 of the push-up and modified pull-up were
r = .73 (95% CI = .58 to .83) for the total sample, r = .64 (95% CI = .36 to .81) for
boys, and r = .70 (95% CI = .46 to .84) for girls. Correlations between Trial 2 of the
push-up and modified pull-up were r = .77 (95% CI = .65 to .81) for the total sam-
ple, r = .69 (95% CI = .43 to .84) for boys, and r = .79 (95% CI = .60 to .89) for
girls.
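For readers who want to check intervals of this kind, the following Python sketch computes an approximate 95% confidence interval for a Pearson correlation with the Fisher z transformation. The source cites Morrow and Jackson (1993) for its intervals but does not state the formula used for Pearson r, so the Fisher z method and the function name here are assumptions.

```python
from math import atanh, sqrt, tanh


def pearson_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson r via Fisher's z.

    r: sample correlation; n: sample size; z_crit: normal critical value
    (1.96 for a 95% interval).
    """
    z = atanh(r)                 # Fisher z transformation of r
    se = 1 / sqrt(n - 3)         # standard error of z
    lower = tanh(z - z_crit * se)
    upper = tanh(z + z_crit * se)
    return lower, upper


if __name__ == "__main__":
    # Trial 1 correlation for the total sample: r = .73, N = 62.
    lower, upper = pearson_ci(0.73, 62)
    print(f"95% CI = {lower:.2f} to {upper:.2f}")
```

For r = .73 and N = 62 this gives an interval of roughly .59 to .83, close to the interval reported for Trial 1 of the total sample; small discrepancies are expected because the reported r is rounded.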
DISCUSSION
This study estimated the test–retest reliability and equivalence reliability of the
push-up and the modified pull-up tests from both criterion-referenced and
norm-referenced frameworks. Criterion-referenced reliability was examined using
standards provided by the FITNESSGRAM® (Cooper Institute for Aerobics Re-
search, 1999). Criterion-referenced test–retest reliability was high for both the
push-up and modified pull-up tests. For the push-up, 97% of boys and 97% of girls
were consistently classified from trial to trial. For the modified pull-up, 93% of
boys and 97% of girls were consistently classified from trial to trial. These high es-
timates of criterion-referenced test–retest reliability were found despite the fact
that the percentage of participants passing each test varied widely (ranging from
80% of boys passing the modified pull-up to only 22% of girls passing the push-up).
One reason for the high proportion of agreement from trial to trial is that the partici-
pants were instructed on the proper form for each test and were allowed adequate
practice before being tested.
Physical activity directors who adopt the FITNESSGRAM® can choose from
among four tests of upper body strength and endurance. Thus, the classification
agreement between the tests should be high. Few studies have examined the crite-
rion-referenced reliability of youth fitness tests of upper body strength and endur-
ance (Kollath et al., 1991; Rutherford & Corbin, 1994). In addition, our search of
the literature found no studies that examined the criterion-referenced equivalence
reliability of various tests of upper body strength and endurance. For equivalence
reliability in this study, about 70% of both the boys and the girls were classified
similarly by the push-up and modified pull-up. Low Kq (≤ .50) and phi coefficients
(≤ .58) suggest a low level of classification agreement between these two tests. Because
these two tests may not classify participants similarly, feedback
provided to participants regarding their upper body strength and endurance might
be a function of the test used and not a function of the participants’ upper body
strength and endurance.
For both boys and girls, a higher percentage of participants passed the modified
pull-up test than the push-up test. For Trial 1, 80% of boys passed the modified
pull-up test and 57% passed the push-up test. For Trial 1 for girls, 53% passed
the modified pull-up test and 22% passed the push-up test. Similar differences in
passing rates were found for Trial 2 of each test. Thus, the standards for the
push-up apparently are more difficult than the standards for the modified pull-up
for both boys and girls. This is not to suggest that one test is more appropriate than
the other, but rather to emphasize the need to examine closely and adjust passing
standards for these tests. Rutherford and Corbin (1994) examined the validity of
criterion-referenced standards of the push-up, pull-up, and flexed-arm hang tests
in college-age women using the contrasting groups method. They developed crite-
rion-referenced standards based on identified groups of trained and untrained par-
ticipants. They demonstrated the criterion-referenced standards were stable when
cross-validated and were able to differentiate between trained and untrained
groups relatively well. More work of this type needs to be conducted on children of
various ages.
Norm-referenced test–retest reliability estimates for both the push-up and modi-
fied pull-up were consistently high. The intraclass reliability coefficients of R ≥ .94
for both boys and girls were higher than previously reported (Cotten, 1990;
Engelman & Morrow, 1991; Kollath et al., 1991). In addition, the Kq value of .94 for
the push-up and .90 for the modified pull-up for the total sample suggests high levels
of consistency. These high relationships may be the result of the 4-week fit-
ness-training unit prior to testing. Baumgartner, Espinosa, and Montgomery (1995)
found that students had significantly better pull-up scores on the Baumgartner
pull-up test on retest days. Their study also included 4 weeks of training or practice
between pretests and posttests. They suggested that improvement occurred because
the students had more experience taking the test and had more practice. Participants
in this study had 4 weeks of practice before participating in the two upper body
strength and endurance tests. Payne and Isaacs (1999) encouraged the development
of physical fitness units of instruction, and they support the idea of physical fitness
units geared toward the value of engaging in healthy and active lifestyles. Many
physical activity directors only implement one of these upper body strength and en-
durance tests with limited practice. The results of our study suggest that practice
from a fitness unit may help improve reliability of push-up and modified pull-up
scores.
Literature on equivalence reliability between field tests of upper body strength
is scarce (Plowman & Corbin, 1994). Norm-referenced equivalence reliability es-
timates were moderately high in our study. The correlations between the push-up
and modified pull-up were r = .64 for boys and r = .70 for girls for Trial 1.
The criterion-referenced equivalence reliability estimates for Trial 1 (Pa = .70,
Kq = .40 for boys, and Pa = .69, Kq = .38 for girls) suggest a low and unacceptable
level of consistency between the push-up and modified pull-up test trials
for fifth- and sixth-grade students. Mahar et al. (1997) emphasized that if two tests
are designed to measure the same physical ability, participants should have similar
classifications on both tests. Zhu (1998) stated that if different tests are to be ex-
changeable, they must first be equated. If test standards are not equivalent to each
other across tests, then classification of participants as passing or failing may be
due to the test used instead of the underlying ability being assessed.
This is the first study to report criterion-referenced equivalence reliability esti-
mates between the push-up and modified pull-up tests in young children. Results
demonstrated that classification agreement between these two tests was low. Be-
cause the FITNESSGRAM® youth fitness test allows physical activity directors to
choose among four tests to measure upper body strength and endurance, the crite-
rion-referenced equivalence reliability of these tests should be examined. Test
equating (Zhu, 1998) should be used and the criterion-referenced standards should
be adjusted to provide for a high degree of classification agreement. Norm-refer-
enced and criterion-referenced test–retest reliability of the push-up and the modi-
fied pull-up were high. The moderate correlations between the push-up and
modified pull-up suggest that these tests are measuring different aspects of upper
body strength and endurance, which may limit the ability to find high levels of
classification agreement between the tests. Additional research on participants of
various ages and with other tests of upper body strength and endurance is needed.
ACKNOWLEDGMENTS
We would like to express our appreciation to Beverly Blalock for her assistance in
collecting data and Ben J. Saint Romain for constructing the modified pull-up
equipment. This study could not have been conducted without their dedication and
support. Thank you.
Data were collected at Middlesex Elementary School, Middlesex, North
Carolina.
REFERENCES
Baumgartner, T. A., Espinosa, D., & Montgomery, J. (1995). Improving pull-up scores. Journal of
Physical Education, Recreation and Dance, 66(6), 68–71.
Baumgartner, T. A., & Jackson, A. S. (1999). Measurement for evaluation in physical education and ex-
ercise science (6th ed.). Boston: WCB/McGraw-Hill.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Law-
rence Erlbaum Associates, Inc.
Cooper Institute for Aerobics Research. (1999). FITNESSGRAM test administration manual (2nd ed.).
Champaign, IL: Human Kinetics.
Cotten, D. J. (1990). An analysis of the NCYFS II modified pull-up test. Research Quarterly for Exer-
cise and Sport, 61, 272–274.
Engelman, M. E., & Morrow, J. R., Jr. (1991). Reliability and skinfold correlates for traditional and
modified pull-ups in children grades 3–5. Research Quarterly for Exercise and Sport, 62, 88–91.
Kollath, J. A., Safrit, M. J., Zhu, W., & Gao, L. (1991). Measurement errors in modified pull-ups testing.
Research Quarterly for Exercise and Sport, 62, 432–435.
Looney, M. A. (1989). Criterion-referenced measurement: Reliability. In M. J. Safrit & T. M. Wood
(Eds.), Measurement concepts in physical education and exercise science (pp. 137–152).
Champaign, IL: Human Kinetics.
Mahar, M. T., Rowe, D. A., Parker, C. R., Mahar, F. J., Dawson, D. M., & Holt, J. E. (1997).
Criterion-referenced and norm-referenced agreement between the mile run/walk and PACER.
Measurement in Physical Education and Exercise Science, 1, 245–258.
McManis, B. G., Baumgartner, T. A., & Wuest, D. A. (2000). Objectivity and reliability of the 90°
push-up test. Measurement in Physical Education and Exercise Science, 4, 57–67.
Morrow, J. R., Jr., Falls, H. B., & Kohl, H. W., III. (Eds.). (1994). The Prudential FITNESSGRAM
technical reference manual. Dallas, TX: Cooper Institute for Aerobics Research.
Morrow, J. R., Jr., & Jackson, A. W. (1993). How “significant” is your reliability? Research Quarterly
for Exercise and Sport, 64, 352–355.
Morrow, J. R., Jr., Jackson, A. W., Disch, J. G., & Mood, D. P. (1995). Measurement and evaluation in
human performance. Champaign, IL: Human Kinetics.
Pate, R. R., Burgess, M. L., Woods, J. A., Ross, J. G., & Baumgartner, T. A. (1993). Validity of field tests
of upper body muscular strength. Research Quarterly for Exercise and Sport, 64, 17–24.
Pate, R. R., Ross, J. G., Baumgartner, T. A., & Sparks, R. E. (1987). The modified pull-up test. Journal
of Physical Education, Recreation and Dance, 58(9), 71–73.
Payne, V. G., & Isaacs, L. D. (1999). Human motor development: A lifespan approach (4th ed.).
Mountain View, CA: Mayfield.
Plowman, S. A., & Corbin, C. B. (1994). Muscular strength, endurance, and flexibility. In J. R. Morrow,
Jr., H. B. Falls, & H. W. Kohl, III. (Eds.), The Prudential FITNESSGRAM technical reference
manual. Dallas, TX: Cooper Institute for Aerobics Research.
President’s Council on Physical Fitness and Sports. (1993). The President’s Challenge Physical Fitness
Program packet. Washington, DC: Author.
Reiff, G. G., Dixon, W. R., Jacoby, D., Ye, G. X., Spain, C. G., & Hunsicker, P. A. (1986). The
President’s Council on Physical Fitness and Sports 1985 National School Population Fitness Survey.
Ann Arbor: University of Michigan.
Ross, J. G., Dotson, C. O., Gilbert, G. G., & Katz, S. J. (1985). New standards for fitness measurement.
Journal of Physical Education, Recreation and Dance, 56(1), 62–66.
Ross, J. G., Pate, R. R., Delpy, L. A., Gold, R. S., & Svilar, M. (1987). New health-related fitness norms.
Journal of Physical Education, Recreation and Dance, 58(9), 66–70.
Rutherford, W. J., & Corbin, C. B. (1994). Validation of criterion-referenced standards for tests of arm
and shoulder girdle strength and endurance. Research Quarterly for Exercise and Sport, 65,
110–119.
U.S. Department of Health and Human Services. (1996). Physical activity and health: A report of the
Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for
Disease Control and Prevention, National Center for Chronic Disease Prevention and Health
Promotion.
U.S. Public Health Service. (1990). Healthy people 2000: National health promotion and disease
objectives. Washington, DC: U.S. Government Printing Office.
Zhu, W. (1998). Test equating: What, why, how? Research Quarterly for Exercise and Sport, 69, 11–23.