Address for correspondence: Valerie Malabonga, Center for Applied Linguistics, 4646 40th Street
NW, Suite 200, Washington, DC 20016, USA; email: valerie@cal.org
© 2008 SAGE Publications (Los Angeles, London, New Delhi and Singapore) DOI:10.1177/0265532208094274
496 Development of a cognate awareness measure
Table 1 List of cognates, noncognates, and easy words and their frequencies* for
the operational version of the CAT
II Pilot study
Before using the CAT in full-scale studies, we piloted it with 100
Spanish-speaking ELLs in order to gather preliminary information
about its reliability and validity. We also collected feedback on the
test from the children and their teachers.
1 Participants
The pilot study participants were fourth and fifth graders from four
schools in low-income, predominantly Spanish-speaking neighbor-
hoods in a large mid-Atlantic city in the USA. Table 2 provides
demographic information on the children.
2 Measure
The pilot version of the CAT had three practice items and 61 test
items (30 cognates and 31 noncognates). The following samples
indicate the format of the pilot test:
1. initiate a) clean b) balance c) begin d) gain
2. strife a) plane b) choice c) king d) fight
3. infirm a) honest b) afraid c) confused d) sick
During test administration, the researcher wrote the practice items on
the board and reviewed them with the children. The children were
then instructed to work on their own using their test booklets.
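The sample items above can be represented as simple records and scored by subtest. The following Python sketch is an illustration only, not the authors' scoring procedure. The `Item` class, the `score` function, and the example responses are hypothetical. The answer keys are assumptions inferred from the ordinary English meanings of the target words, and the cognate (C) and noncognate (NC) labels follow the item labels shown on the pilot Rasch map.

```python
from dataclasses import dataclass


@dataclass
class Item:
    word: str         # target word shown to the child
    options: dict     # option letter -> option text
    key: str          # keyed option letter (assumed, not given in the source)
    is_cognate: bool  # True for a Spanish-English cognate


SAMPLE_ITEMS = [
    Item("initiate", {"a": "clean", "b": "balance", "c": "begin", "d": "gain"}, "c", True),
    Item("strife", {"a": "plane", "b": "choice", "c": "king", "d": "fight"}, "d", False),
    Item("infirm", {"a": "honest", "b": "afraid", "c": "confused", "d": "sick"}, "d", True),
]


def score(items, responses):
    """Return (cognates correct, noncognates correct) for one child.

    `responses` maps each target word to the option letter the child chose.
    """
    cognate = sum(1 for it in items if it.is_cognate and responses.get(it.word) == it.key)
    noncognate = sum(1 for it in items if not it.is_cognate and responses.get(it.word) == it.key)
    return cognate, noncognate


# Hypothetical child: both cognates right, the noncognate wrong.
print(score(SAMPLE_ITEMS, {"initiate": "c", "strife": "a", "infirm": "d"}))
```

Keeping the cognate flag on each item makes it straightforward to report the two subtest scores separately, as the analyses in this paper do.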
                                              N      %
Ethnicity
  Latino/Hispanic                           100    100
Grade
  Fourth (9-year-olds)                       69     69
  Fifth (10-year-olds)                       31     31
Language spoken by child at home
  Only Spanish                                8      8
  Mostly Spanish                              5      5
  Spanish and English                        28     28
  Mostly English                             13     13
  Only English                                1      1
  Missing                                    45     45
Language program in school
  Spanish Dominant Transitional Bilingual    56     56
  English Dominant 80–20 Bilingual           16     16
  English Dominant Regular                   28     28
3 Analysis
Analysis focused on evaluating the items in the pilot version of the
CAT in order to determine the final pool of items to be included in
the operational version. Following the definition of validity as ‘an
integrated evaluative judgment of the degree to which empirical evi-
dence and theoretical rationales support the adequacy and appropri-
ateness of inferences and actions based on test scores or other modes
of assessment’ (Messick, 1989, p. 13; italics in original), we exam-
ined two types of empirical evidence.
First, we looked at the set of all test items together, using the
Rasch model to determine whether the CAT was measuring a single
construct of vocabulary knowledge. Our assumption was that if all
the items fit the Rasch model, we could infer that both cognates and
noncognates were measuring a single construct related to English
vocabulary knowledge. Second, we examined the test’s construct
validity by investigating performance on the cognates and noncog-
nates separately.
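For reference, the dichotomous Rasch model expresses the probability of a correct response as a logistic function of the difference between a child's ability and an item's difficulty, both measured in logits. The Python sketch below is a generic illustration of that formula, not the estimation procedure used in this study. The function name `rasch_probability` is an assumption introduced here.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Probability of success for several ability-minus-difficulty gaps.
for gap in (-1.0, -0.5, 0.0, 0.5, 1.0):
    p = rasch_probability(gap, 0.0)
    print(f"ability - difficulty = {gap:+.1f} logits -> P(correct) = {p:.3f}")
```

A child whose ability equals an item's difficulty has a 50% chance of answering correctly, and the probability falls as the item becomes harder relative to the child.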
We used WINSTEPS software (Linacre & Wright, 2000) to cali-
brate the difficulty of the items and the ability of test takers on a
common interval scale and to provide information about the test’s
properties, especially its reliability, scalability, and fit to the Rasch
model. We performed three separate calibrations. In the first, all 61
words were calibrated on a single logit scale. This calibration
Valerie Malabonga and Dorry M. Kenyon et al. 501
4 Results
a Map of children and items: Figure 1 shows the Rasch map of the
pilot study children and the pilot test items on a single scale. The
children’s ability covers a range of 3.88 logits, which is wider than
the range of 2.62 logits for item difficulty. The map also shows an
even spread of cognates and noncognates, with no major gaps except
at the lower end.
The map shows that, in general, the items on the CAT were spread
evenly along the scale, but the mean difficulty for test items (marked
‘M’ by the item names) was well above the mean of the children’s
abilities (marked ‘M’ by the #s). This finding suggested that the
items were very difficult for the children and that a few less
challenging items should be added to expand the range of the test and
motivate the children. The separate analyses performed on the
cognates and the noncognates also showed that the spread of both
children and items was generally even along the scale; however, both
subtests were quite difficult for the children.
Cognates (30 items) Noncognates (31 items) All words (61 items)
Figure 1 Rasch map for the pilot study (pilot version of the CAT)
LOGITS
  1.5             +
                  |
                  |
                  |  fiend–NC
                  |T
    1             +
                  |
               .  |  stodgy–NC wily–NC forlorn–NC brevity–C casualty–NC
                  |  malevolent–C epoch–C leery–NC
                  |S calumny–C kosher–NC allot–NC quagmire–NC
                  |  frenzied–NC undermine–NC
                  |  discard–NC infirm–C
               . T|  flee–NC
               .  |  brittle–NC anterior–C
                  |  pithy–NC hoist–NC strife–NC pun–NC drought–NC clutch–NC impede–C edifice–C
    0         .#  +M nocturnal–C curative–C initiate–C
              .#  |  detain–C augment–C terminus–C amicable–C
              .#  |  obligated–C faze–NC jest–NC haul–NC maladroit–NC snug–NC navigate–C imitate–C pallid–C valor–C
              .# S|  tattered–NC profundity–C pensive–C rehearse–NC
              ##  |  jocose–C tranquil–C gritty–NC feasibility–NC
          ######  |  odious–C
              .#  |S
              .#  |
        .#######  |  adorn–C castigate–C
             .## M|  trustworthy–NC
   −1     .#####  +
            .###  |T drowsy–NC
             .##  |  accompany–C
              ##  |  matrimonial–C
             ### S|  converse–C
               #  |
              .#  |
               .  |
                  |
               #  |
   −2            T+
                  |
                  |
                  |
                  |
                  |
                  |
                  |
               .  |
                  |
   −3             +
               .  |
                  |
                  |
                  |
 −3.5             +
Less Able Children   Easy Items
Each ‘#’ is two persons and ‘.’ is one person.
c Analysis of fit to model: Fit of the items to the Rasch model was
examined through the infit and outfit mean square statistics.
Although Linacre (2007, p. 2) suggests that items with mean square
values between .5 and 1.5 are ‘productive of measurement,’ we
chose a more conservative approach, flagging as misfitting items
with mean square values greater than 1.3.
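As a rough illustration of how these statistics behave, the Python sketch below computes infit and outfit mean squares for a single item from observed responses and model-expected probabilities. It uses the standard definitions: outfit is the mean of the squared standardized residuals, and infit is the information-weighted version. The response and probability values are invented, the function names are hypothetical, and treating an item as flagged when either statistic exceeds 1.3 is an assumption about how the threshold was applied.

```python
def fit_mean_squares(responses, expected):
    """Return (infit, outfit) mean squares for one dichotomous item.

    responses: observed 0/1 scores, one per child
    expected:  Rasch-model probabilities of success for the same children
    """
    sq_resid = [(x - p) ** 2 for x, p in zip(responses, expected)]
    variances = [p * (1.0 - p) for p in expected]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variances)) / len(responses)
    # Infit: squared residuals weighted by model variance (information).
    infit = sum(sq_resid) / sum(variances)
    return infit, outfit


def is_misfitting(infit, outfit, threshold=1.3):
    """Flag an item if either mean square exceeds the threshold (assumed rule)."""
    return infit > threshold or outfit > threshold


# Invented data: responses that match expectations on average ...
well_behaved = fit_mean_squares([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
# ... and responses that contradict the model (likely successes failed, unlikely ones passed).
surprising = fit_mean_squares([0, 0, 1, 1], [0.9, 0.9, 0.1, 0.1])
print(well_behaved, is_misfitting(*well_behaved))
print(surprising, is_misfitting(*surprising))
```

Mean squares near 1 indicate responses about as variable as the model predicts; values well above 1 signal unexpected response patterns.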
Our analysis indicated that three items in the pilot version of the
CAT were misfitting, 10 were problematic in terms of answer
options, and two were mischaracterized as noncognates. Likewise,
the children’s and the teachers’ feedback, as well as our observations
of the children, indicated that the test’s layout was difficult to follow.
Lastly, the teachers recommended clearer test instructions.
Based on the item analyses conducted in the pilot study, the fol-
lowing revisions were made to the CAT:
• Three misfitting items, ten items which had problematic options,
and two items that we erroneously classified as noncognates
(discard and frenzied) and their two cognate counterparts (aug-
ment and detain) were deleted from the original 61 words.
• For clarity, the wording of the responses for three items was
changed.
• Eight new cognate words with high frequencies in both English
and Spanish were added to the test: construction, idea, simple,
literature, modern, poet, production, and permit. These less chal-
lenging words, randomly inserted, were included to improve the
children’s motivation to complete the entire test.
• The instructions and the test layout were modified to make the
test more user-friendly.
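The deletions and additions listed above can be tallied to check that they are consistent with the 61 pilot items and with the 52 items (44 scored plus eight less challenging) reported for the operational version. The short Python check below only restates figures given in the text; the variable names are illustrative.

```python
pilot_items = 61  # 30 cognates + 31 noncognates in the pilot version

deleted = {
    "misfitting": 3,
    "problematic answer options": 10,
    "misclassified as noncognates": 2,       # discard, frenzied
    "cognate counterparts of those two": 2,  # augment, detain
}
added_easy_items = 8  # high-frequency cognates added for motivation

scored_items = pilot_items - sum(deleted.values())
operational_items = scored_items + added_easy_items

print(f"deleted: {sum(deleted.values())}")
print(f"scored items remaining: {scored_items}")
print(f"operational test length: {operational_items}")
```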
1 Participants
The study participants were 173 Spanish-speaking ELLs who also took
part in the larger transfer study. The students were fourth
graders in Success for All (SFA) reading programs in three urban
schools in predominantly Spanish-speaking neighborhoods in three
states in the USA. Some were being instructed only in English by
fourth grade, while others were still receiving some instruction in
Spanish. The first section of Table 4 presents demographic informa-
tion on the children.
2 Measures
The operational version of the CAT consisted of 52 items: 22 cognates
and 22 noncognates that were scored, and eight less challenging
items that were added as a result of the pilot study (see Table 1).
Figure 2 illustrates the new test format.
The WLPB-R/PV was used to measure the children’s English and
Spanish vocabulary knowledge. In the WLPB-R/PV, a child sees a
picture and is asked to name the object(s) or action(s) in the picture.
The WLPB-R is one of very few vocabulary tests that have both
English and Spanish versions.
Table 4 Student demographic information for Study 1 (fourth graders) and Study
2 (fifth graders)
“For each item, you will read the bolded word and think about what it
means. After you have thought about the bolded word and what it means, you
are supposed to pick the one word that is most closely related to the meaning
of the bolded word.”
Cognate                     Noncognate
converse                    jest
O speak with someone        O defend
O fight with someone        O bend
O include someone           O joke
O leave out someone         O observe
eight less challenging items were excluded from the second and
third calibration. The displacement values were within the normal
range: −.11 to .10 for cognates and −.02 to .02 for noncognates.
The first section of Table 5 shows the means, standard deviations
and ranges of the fourth graders’ scaled scores for cognates,
noncognates and the entire test. (Note: The table shows N = 170 because
tests with all correct answers and no correct answers were discarded
following standard Rasch analysis procedure.)
506 Development of a cognate awareness measure
Table 5 Means, standard deviations and ranges of cognates, noncognates and all
words (Year 1 [fourth graders] and Year 2 [fifth graders])*
Cognates (22 items) Noncognates (22 items) All words (52 items)
Cognates (22 items) Noncognates (22 items) All words (52 items)
b Map of children and items: Figure 3 shows the Rasch map for the
operational CAT, with children and items on a single scale. The map
shows an even spread of cognates and noncognates, similar to the
pilot results. The map also shows that the eight less challenging
words were all below the mean difficulty for the items as a whole.
The difference between mean item difficulty and mean student abil-
ity was reduced from one logit in the pilot to just half a logit in the
first study, demonstrating that the test had become ‘easier’.
The children’s ability range of 2.98 logits was narrower than the
range of 3.60 logits for item difficulty. This implies that the test can
be used for children with a wider range of abilities than this fourth
grade group. The mean difficulty of the items was also slightly higher
than the mean ability of the children, indicating that some items were
still difficult for some fourth graders, but that such items would be
appropriate for older children or fourth graders with higher abilities.
This finding was important since the CAT was intended to be used
with both fourth and fifth graders.
c Reliability: The estimated reliability was a moderate .70 for the
entire test and .65 for the cognates, about the same as on the pilot.
Reliability for the noncognates was .37, a decrease from the pilot.
Since reliability is influenced by the distribution of ability in the group
The cut-off score used for the WLPB-R/PV was 80. That is,
a child with a standard score lower than 80 on the WLPB-R/PV
measure was categorized as low for that language, whereas a child
with a standard score equal to or greater than 80 was categorized as
high. The cut-off score was chosen based on a scatter plot in order to
have a meaningful number of students in each of the four quadrants.
(Note: for the WLPB-R/PV, the mean obtained for the norming sample
of fourth graders was 100 and the standard deviation was 15. Thus,
our cut-off score is one and a third standard deviations below
the mean of a typical monolingual fourth grader.)
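The grouping rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code. The function names are hypothetical, and the four labels (HSHE, HSLE, LSHE, LSLE) are read here as high or low Spanish combined with high or low English, following the subgroup headings used in Table 7.

```python
def classify(spanish_score, english_score, cutoff=80):
    """Assign a child to one of four subgroups from two standard scores.

    Scores at or above the cut-off count as 'high'; scores below it as 'low'.
    """
    spanish = "HS" if spanish_score >= cutoff else "LS"
    english = "HE" if english_score >= cutoff else "LE"
    return spanish + english


def sd_below_mean(cutoff=80, norm_mean=100, norm_sd=15):
    """Distance of the cut-off below the norming mean, in standard deviations."""
    return (norm_mean - cutoff) / norm_sd


# Hypothetical standard scores (Spanish, English) for three children.
for spanish, english in [(95, 72), (80, 80), (64, 101)]:
    print(spanish, english, classify(spanish, english))

print(f"cut-off is {sd_below_mean():.2f} SD below the norming mean")
```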
To compare children’s cognate and noncognate vocabulary meas-
ures on the CAT within each study, and from one study to the next, the
logit measures were converted to scaled scores with a mean of 100 and
20 units to a logit. We then compared the mean performance on the
cognate and noncognate items by the four groups. We hypothesized
that if the CAT was tapping into the construct of cognate awareness,
then for cognates, knowledge of Spanish vocabulary would play an
important role, but not necessarily knowledge of English vocabulary.
We further hypothesized that for noncognates, the opposite would be
true: knowledge of English vocabulary would play an important role,
but not necessarily knowledge of Spanish vocabulary.
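The rescaling mentioned above is a linear transformation of the logit measures. The Python sketch below assumes that a measure of 0 logits maps to a scaled score of 100; the text gives the scaling constants (a mean of 100 and 20 units per logit) but not the exact anchoring, so the `center` parameter and the function name are assumptions.

```python
def logit_to_scaled(logit, center=100.0, units_per_logit=20.0):
    """Convert a Rasch logit measure to a scaled score (assumed anchoring at 0 logits)."""
    return center + units_per_logit * logit


# Under this scaling, half a logit corresponds to 10 scaled-score units.
for logit in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"{logit:+.1f} logits -> {logit_to_scaled(logit):.1f}")
```

Because the transformation is linear, differences between groups are preserved: a gap of one logit is always a gap of 20 scaled-score units, wherever the origin is placed.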
Table 6 shows the means and standard deviations of each sub-
group’s scores on cognates and noncognates.
To check for statistical differences between the mean perform-
ances of the four subgroups on the cognates, we conducted a
nonparametric Kruskal Wallis test for independent samples because
the number of children in each subgroup was unequal. The overall
Kruskal Wallis chi square for cognates was significant (χ² (3) =
25.82, p < .01). Results of individual tests are presented in Table 7.
The upper portion of the table shows that the subgroups with high
Spanish consistently outperformed the subgroups with low Spanish
on cognates, thus demonstrating their cognate vocabulary knowl-
edge, whereas subgroups with high English did not necessarily
perform statistically significantly better than subgroups with low
English.
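To make the omnibus test concrete, the following pure-Python sketch computes the Kruskal Wallis H statistic for several independent groups of unequal size. It is a minimal illustration, not the analysis run for this study: the scores are invented, the function name is hypothetical, and the sketch assigns average ranks to tied values but omits the tie correction that statistical packages usually apply. With k groups, H is compared with a chi-square distribution on k − 1 degrees of freedom (the critical value for 3 degrees of freedom at the .01 level is 11.345).

```python
def kruskal_wallis_h(*groups):
    """Kruskal Wallis H statistic (no tie correction) for two or more groups."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, group in enumerate(groups) for value in group)
    n = len(pooled)

    # Assign 1-based ranks, averaging the ranks of tied values.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Sum the ranks within each group.
    rank_sums = [0.0] * len(groups)
    for (_, gi), rank in zip(pooled, ranks):
        rank_sums[gi] += rank

    weighted = sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)


# Invented scaled scores for four subgroups of unequal size.
hypothetical_scores = {
    "HSHE": [118, 109, 121, 104, 113],
    "HSLE": [106, 99, 111, 102, 95, 108, 100],
    "LSHE": [92, 88, 97, 90],
    "LSLE": [85, 91, 79],
}
h = kruskal_wallis_h(*hypothetical_scores.values())
print(f"H = {h:.2f}; exceeds 11.345 (df = 3, alpha = .01): {h > 11.345}")
```

The same function applied to two groups at a time gives the pairwise comparisons with one degree of freedom.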
The lower section of Table 7 shows the mean and standard deviations
of the four subgroups’ scores on the noncognates. The overall
Kruskal Wallis chi square was again significant (χ² (3) = 15.69,
p < .01). The table shows that subgroups with high English
consistently outperformed subgroups with low English on noncognates.
Furthermore, for both cognates and noncognates, the high Spanish,
high English subgroup consistently performed better than other
subgroups, and this difference was usually (but not always)
statistically significant. This finding confirms the CAT as a measure of
vocabulary knowledge.
Table 7 Means and Kruskal Wallis chi square for paired subgroups (Study 1, fourth
graders)
Paired subgroups (N)          First     Second    Kruskal Wallis chi square
                              mean      mean
Cognates
 High Spanish vs. Low Spanish subgroups
  HSHE (18) vs. LSHE (29)     33.50     18.10     χ² (1) = 14.08, p < .01
  HSHE (18) vs. LSLE (12)     19.97      8.79     χ² (1) = 11.69, p < .01
  HSLE (55) vs. LSHE (29)     49.45     29.31     χ² (1) = 13.04, p < .01
  HSLE (55) vs. LSLE (12)     37.37     18.54     χ² (1) = 9.27, p < .01
 High Spanish vs. High Spanish subgroup
  HSHE (18) vs. HSLE (55)     43.67     34.82     χ² (1) = 2.38, NS
 Low Spanish vs. Low Spanish subgroup
  LSHE (29) vs. LSLE (12)     21.97     18.67     χ² (1) = .65, NS
Noncognates
 High English vs. Low English subgroups
  HSHE (18) vs. LSLE (12)     17.72     12.17     χ² (1) = 2.89, NS
  LSHE (29) vs. HSLE (55)     52.97     36.98     χ² (1) = 8.28, p < .01
  HSHE (18) vs. HSLE (55)     51.31     32.32     χ² (1) = 11.02, p < .01
  LSHE (29) vs. LSLE (12)     22.29     17.88     χ² (1) = 1.17, NS
 High English vs. High English subgroup
  HSHE (18) vs. LSHE (29)     27.47     21.84     χ² (1) = 1.89, NS
 Low English vs. Low English subgroup
  HSLE (55) vs. LSLE (12)     32.89     39.08     χ² (1) = 1.01, NS
In summary, our results consistently show that high Spanish
vocabulary knowledge, as measured by the WLPB-R/PV, was help-
ful in predicting high vocabulary scores on the CAT’s cognate items,
but high English knowledge was not. On the other hand, high English
vocabulary knowledge, as measured by the WLPB-R/PV, was a good
predictor of high vocabulary scores on noncognate items, whereas
high Spanish vocabulary knowledge was not. These findings provide
support for the claim that cognate items on the CAT appear to tap
into some level of cognate awareness for students with high Spanish
vocabulary knowledge. Lastly, children with high scores on both the
Spanish and English WLPB-R/PV consistently performed at the high-
est levels on both cognates and noncognates, providing support for the
CAT as a vocabulary measure.
1 Results
a Scalability: Twelve items (seven cognates and five noncognates)
showed noticeable displacement when their difficulty values were
anchored to the item difficulty values from the first calibration. The
remaining 32 items did not show any major displacements.
The second section of Table 5 shows the means, standard deviations
and ranges of the fifth graders’ scaled scores for cognates,
noncognates and the entire test. Table 5 indicates that the
children’s knowledge of English vocabulary had increased, and that
the increase was especially marked for cognates (from a mean of
93.89 as fourth graders to 102.09 as fifth graders).
Figure 4 Rasch map for Study 2 (operational version with fifth graders)
b Map of children and items: Figure 4 shows the Rasch map for
the CAT administered to the fifth graders. Unlike Study 1, in which the
mean difficulty of the items was higher than the mean ability of the
children, in this study the mean examinee ability was slightly above the
mean item difficulty. Likewise, the range of 3.75 logits for the ability
scores is about the same as the range of 3.60 logits for the item
difficulty scores. This finding indicates that the items in the operational
version of the CAT are also appropriate for these fifth graders. It also
indicates that the average vocabulary ability of the children
improved from fourth to fifth grade.
Means and standard deviations of fifth graders’ scaled scores on cognates
                              Low Spanish            High Spanish
High English                  M = 90.48 (N = 24)     M = 115.42 (N = 28)
(Picture Vocabulary ≥ 85)     SD = 13.98             SD = 17.50
Low English                   M = 98.99 (N = 26)     M = 105.77 (N = 54)
(Picture Vocabulary < 85)     SD = 17.71             SD = 24.36
Means and standard deviations of fifth graders’ scaled scores on noncognates
                              Low Spanish            High Spanish
High English                  M = 99.47 (N = 24)     M = 103.90 (N = 28)
(Picture Vocabulary ≥ 85)     SD = 18.86             SD = 14.45
Low English                   M = 93.38 (N = 26)     M = 93.17 (N = 54)
(Picture Vocabulary < 85)     SD = 19.50             SD = 13.25
Table 9 Means and Kruskal Wallis chi-square for paired subgroups (Study 2, fifth
graders)
Paired subgroups (N)          First     Second    Kruskal Wallis chi-square
                              mean      mean
Cognates
 High Spanish vs. Low Spanish subgroups
  HSHE (28) vs. LSHE (24)     35.55     15.94     χ² (1) = 21.74, p < .01
  HSHE (28) vs. LSLE (26)     33.88     20.63     χ² (1) = 9.62, p < .01
  HSLE (54) vs. LSHE (24)     43.87     29.67     χ² (1) = 6.56, p < .01
  HSLE (54) vs. LSLE (26)     42.28     36.81     χ² (1) = .98, NS
 High Spanish vs. High Spanish subgroup
  HSHE (28) vs. HSLE (54)     49.71     37.24     χ² (1) = 5.07, NS
 Low Spanish vs. Low Spanish subgroup
  LSHE (24) vs. LSLE (26)     21.77     28.94     χ² (1) = 3.06, NS
Noncognates
 High English vs. Low English subgroups
  HSHE (28) vs. LSLE (26)     31.27     23.44     χ² (1) = 3.36, NS
  LSHE (24) vs. HSLE (54)     44.58     37.24     χ² (1) = 1.76, NS
  HSHE (28) vs. HSLE (54)     52.50     35.80     χ² (1) = 9.13, p < .01
  LSHE (24) vs. LSLE (26)     26.63     24.46     χ² (1) = .28, NS
 High English vs. High English subgroup
  HSHE (28) vs. LSHE (24)     28.23     24.48     χ² (1) = .80, NS
 Low English vs. Low English subgroup
  HSLE (54) vs. LSLE (26)     38.66     44.33     χ² (1) = 1.05, NS
V Discussion
The purpose of this study was to provide empirical evidence (Messick,
1989) for the claim that scores on the cognate subtest of the CAT are
sensitive to awareness of cognates in Spanish-speaking ELL children
and that scores on the test as a whole are related to first- and
second-language vocabulary knowledge. Our findings appear to support
this claim.
The reliability of both the cognate subtest and the entire test improved
from the pilot to the operational version, and the internal reliabilities
of the cognate subtest and the entire test were consistent on two test-
ing occasions. The internal reliability of the noncognate subtest also
improved from one testing occasion to the next.
In comparing the children’s scores on the cognate and noncognate
subtests, we found that the CAT cognate items appear to tap into a
construct of cognate awareness. Higher scores on the cognate items
Acknowledgements
The Transfer of reading skills in bilingual children study was funded
by Grant No. 5-P01-HD39530 from the National Institute for Child
Health and Human Development and the Institute of Education
Sciences of the US Department of Education to the Center for Applied
Linguistics (CAL). We thank the CAL staff and the Language Testing
reviewers for their helpful comments.
VI References
Anderson, R.C., & Freebody, P. (1983). Reading comprehension and the
assessment and acquisition of word knowledge. In B. Hudson (Ed.),
Advances in reading/language research (pp. 231–256). Greenwich, CT:
JAI Press.
Anthony, E. (1954). The teaching of cognates. Language Learning, 79–82.
August, D., Carlo, M., & Calderón, M. (2005). Development of literacy in
Spanish-speaking English-language learners: Findings from longitudinal
study of elementary school children. Perspectives, 31(2), 17–19.
August, D., Carlo, M., Dressler, C., & Snow, C. (2005). The critical role of
vocabulary development for English language learners. Learning
Disabilities Research & Practice, 20(1), 50–57.
August, D., & Hakuta, K. (1997). Improving schooling for language-minority
children: A research agenda. Washington, DC: National Academy Press.
August, D., & Shanahan, T. (Eds.). (2006). Developing literacy in second-
language learners. Report of the National Literacy Panel on Language-
Minority Children and Youth. Mahwah, NJ: Lawrence Erlbaum.