
System 28 (2000) 19–30

www.elsevier.com/locate/system

Lex30: an improved method of assessing productive vocabulary in an L2
Paul Meara*, Tess Fitzpatrick
Centre for Applied Language Studies, University of Wales Swansea, Singleton Park, Swansea SA2 8PP, UK

Received 28 January 1999; received in revised form 3 June 1999; accepted 21 June 1999

Abstract
This paper describes an easy-to-administer test of productive vocabulary. The test requires
Ss to produce a set of word association responses to a small set of stimulus words. The
stimulus words are chosen so that with native speaker Ss they typically generate a wide range
of responses, and a high proportion of low frequency responses. The paper argues that a test
of this sort might tap the extent of non-native speakers' productive vocabulary more effectively
than some other tests in current use. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Vocabulary acquisition; Second language acquisition; Word associations

1. Introduction

This paper describes a tool which we believe can be used to make straightforward
assessments of the productive vocabulary of non-native speakers of English. The
data reported here are preliminary in the sense that we are not putting forward a
well-developed and properly validated testing instrument. Rather, we are trying to
address a complex `chicken and egg' situation which is causing something of a
blockage in the field of vocabulary research. This blockage arises from the fact that
there are no well-established and easy-to-use tests of productive lexical skills.
The nearest thing we have to a useful tool in this area is Laufer and Nation's tests
(Laufer and Nation, 1995, 1999). We think these are problematical for reasons which
are explained in more detail below. However, until we have some kind of test
which might be interpreted, however loosely, as an index of productive vocabulary,

* Corresponding author. Tel.: +44-1792-295-391; fax: +44-1792-295-641.


E-mail address: p.m.meara@swan.ac.uk (P. Meara).

0346-251X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0346-251X(99)00058-5

it is unlikely that we will be able to make very much headway in this area. The tools
described here, then, are intended as a first step in this direction. Our aim has been
to develop a methodology which we think might be honed into something more
formal. This paper first describes the methodology that we have developed, and then
shows how the methodology could be used to make interesting comparisons between
productive and receptive vocabulary in L2 learners.
Successful L2 language learners are avid collectors of words, and tend to measure
their own success according to the number of words they know. Current teaching
materials and methodologies exploit and encourage this. The New Cambridge English
Course, for example, proudly claims that ``Students will learn 900 or more
common words and expressions during level 1 of the course'' (Swan and Walter,
1990, p. 5). The communicative language teaching techniques and comprehension-
based teaching methodologies of the last two decades also attach more importance
to vocabulary acquisition than did, for example, the grammar translation and
audio-lingual approaches which dominated pre-1970 language teaching (Nunan,
1995; Lightbown et al., 1998).
In most practical contexts it is clear that communicative effectiveness is achieved
more successfully by learners with a larger vocabulary than by learners with a more
detailed command of a smaller one. It is not surprising, then, that measurements of
vocabulary size have been shown to correlate positively with proficiency levels in
reading (Anderson and Freebody, 1981) and writing (Engber, 1995), and in general
language proficiency (Meara and Jones, 1988). In practice, however, most claims of
this sort have relied on measures of passive, receptive vocabulary knowledge, since
it has been difficult to measure control of productive vocabulary effectively. The
implicit assumption here is that active vocabulary knowledge can reasonably be
extrapolated from measures of receptive knowledge. This assumption is not an
implausible one. Few researchers would dispute that receptive vocabulary is probably
larger than productive vocabulary, and that some level of receptive knowledge
of a word must exist in order for the word to be produced. Nonetheless, one could
imagine situations where the relationship between active and passive vocabulary
knowledge might not be straightforward, and for this reason it would be useful to
have an independent measure of active vocabulary.
Unfortunately, it is much more difficult to assess productive vocabulary knowledge
than it is to assess receptive vocabulary knowledge. The main reason for this is that
the vocabulary produced by a learner, whether in written or spoken form, tends to be
so context-specific that it is difficult to calculate from any small sample the true size
or range of the learner's productive vocabulary. It is also difficult to devise simple
tasks which produce the large quantities of vocabulary that are necessary to make
reasonable estimates. There are two principal methods of estimating productive
vocabulary currently in use, but neither of these has fully resolved these problems.
Controlled productive vocabulary tests prompt subjects to produce predetermined
target words. Testees are given a sentence context, a definition, and/or the beginning
of the target word, e.g.:

The book covers a series of isolated epis____ from history.



and are required to complete the missing word, in this case episodes (Nation,
1983; Laufer and Nation, 1999). Free productive vocabulary tests such as Laufer
and Nation's (1995) lexical frequency profiling tests analyse a written or spoken
discourse by the subject and categorise the vocabulary used in terms of frequent, less
frequent, and infrequent words. The higher the percentage count of infrequent
words, the larger the subject's productive vocabulary is estimated to be.
There are problems inherent in these two types of test. The controlled productive
vocabulary tests are effective mainly at low levels; when, for example, testees are
expected to have a limited vocabulary size, then a high proportion of these words
can be tested. The controlled productive vocabulary test used in Laufer and Nation
(1999) attempts to elicit 18 target words from each of five word frequency bands:
2000, 3000, 5000, University Word List and 10,000. Although this approach seems
to be effective at lower levels, it must be difficult to extrapolate about the size of a
testee's productive lexicon beyond a relatively small vocabulary. At the 10,000 level,
we are in effect testing 18 words from a pool of several thousand words, and using
this to draw conclusions about the testee's knowledge of all the other words in this
pool. Suppose, for example, that we test the word fragrance with an item like:

The fra____ of the flowers filled the room.

This item treads a fine line between receptive and productive skills: the production
of the target word is dependent on receptive understanding of the surrounding
context words. Additionally, it is possible that our subjects do not know fragrance, but
do know scent, aroma, perfume..., similarly infrequent words that could easily
have fitted the slot but for the (helpful?) hint. The point is that this kind of test item
can easily identify what the testees do not know, but it is rather less successful at
identifying the full extent of what they do know. In any case, if we are testing a
vocabulary of any size, say three or four thousand words, it would be impossibly
difficult in practice to devise a comprehensive set of items large enough to provide
the sort of coverage that we would need to get reliable estimates of productive
vocabulary.
The free productive vocabulary tests are problematic too. They are context-
limited, although in many cases the effects of this are minimised by using a broad
subject base (e.g. essays discussing a moral dilemma; cf. Laufer and Nation, 1995).
In most cases, however, it is unclear that the material these tasks elicit genuinely
encourages testees to `display' their vocabulary in the way that a test of productive
vocabulary would require. In addition to this, free productive vocabulary tests are
not a cost-effective way of eliciting vocabulary: most text, even text generated by
fluent native speakers, is predominantly made up from a small set of highly frequent
words. A huge amount of text is needed to generate more than a handful of
infrequent words, and it is often difficult to elicit texts of this length from non-native
speakers. Laufer and Nation (1995), for example, reported that they needed to elicit
two 300-word essays from their testees in order to obtain stable vocabulary size
estimates. This required two hours of class time, a figure which would be
prohibitive, except in special circumstances.

One superficially attractive alternative to continuous text as a source of productive
vocabulary is the spew test (Palmberg, 1987; Waring, 1999). In spew tests, subjects
are simply asked to produce words which share a common feature, e.g. words
beginning with B. In our view, research using spew tests has not lived up to its promise,
however. There are major problems over standardisation of scoring which have not
been addressed, and although we think there is some potential in the method, we think
spew tests need a lot more developmental work before they can be used reliably.
Clearly, then, there is a need for a cost-effective and efficient way of eliciting data
from testees which can give us enough material to make a rough estimate about their
productive vocabulary skills. The rest of this paper describes a new productive
vocabulary test which has been designed with these criteria in mind, and addresses
the practical problems we have discussed. The test generates rich vocabulary output
from the testees, but it is easily administered, takes only a short time to complete,
and can be scored automatically using a computer program. We believe that it
therefore has the potential to be developed into a practical and effective research tool.

2. Lex30

This section describes Lex30, and discusses the sorts of data it generates and the
analysis that we apply to these data to make estimates about the productive
vocabulary of the testees.

2.1. The test format

The Lex30 task is basically a word association task, in which testees are presented
with a list of stimulus words, and required to produce responses to these stimuli.
There is no predetermined set of response target words for the subject to produce,
and in this way, Lex30 resembles a free productive task. However, the stimulus
words tend to impose some constraints on the responses, and Lex30 thus shares
some of the advantages of context-limited productive tests. Word association tasks
typically elicit vocabulary which is more varied and less constrained by context than
free production tasks.
The test consists of 30 stimulus words, which meet the following criteria:

1. All the stimulus words are highly frequent. In our experiment, the words
were taken from Nation's first 1000 wordlist (Nation, 1984), i.e. they are words
which even a fairly low-level learner would be expected to recognise. This is a
deliberate choice, in order to make it possible to use the test with learners
across a wide range of proficiency levels.
2. None of the stimulus words typically elicits a single, dominant primary
response. The formal criterion that we adopted here was that the most frequent
response to the stimulus words, as reported in the Edinburgh Associative
Thesaurus (Kiss et al., 1973), should not exceed 25% of the reported responses.
In this way, we avoided stimulus words like BLACK or DOG, which typically
elicit a very narrow range of responses, and selected stimulus words which
typically generate a wide variety of different responses.
3. Each of the stimulus words typically generates responses which are not
common words. The formal criterion here was that at least half of the most
common responses given by native speakers were not included in Nation's first
1000 word list (Nation, 1984). In this way, the stimulus words give the testee a
reasonable opportunity to generate a wide range of response words.
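The two formal criteria above can be expressed as a simple filter. This is an illustrative sketch, not the authors' procedure: `eat_norms` stands in for the Edinburgh Associative Thesaurus response counts and `first_1000` for Nation's wordlist, both filled with invented toy values.

```python
# Sketch of the Lex30 stimulus-selection criteria (2) and (3), using
# hypothetical association norms and wordlist membership.

def passes_criteria(stimulus, eat_norms, first_1000):
    """Return True if the stimulus meets criteria 2 and 3."""
    responses = eat_norms[stimulus]          # e.g. {"war": 8, "castle": 8, ...}
    total = sum(responses.values())

    # Criterion 2: no single dominant primary response (max <= 25%).
    if max(responses.values()) / total > 0.25:
        return False

    # Criterion 3: at least half of the most common responses fall
    # outside the first 1000 words.
    common = sorted(responses, key=responses.get, reverse=True)[:10]
    outside = [w for w in common if w not in first_1000]
    return len(outside) >= len(common) / 2

# Toy illustration (invented numbers):
eat_norms = {
    "dog": {"cat": 60, "bone": 20, "bark": 20},  # dominant primary response
    "attack": {"war": 8, "castle": 8, "defend": 8, "sudden": 8, "fight": 8},
}
first_1000 = {"cat", "bone", "war", "dog"}
print(passes_criteria("dog", eat_norms, first_1000))     # fails criterion 2
print(passes_criteria("attack", eat_norms, first_1000))  # passes both
```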

2.2. Subjects

A group of 46 adult learners of English as a foreign language were used as test
subjects. These people were from a variety of L1 backgrounds ranging from Arabic
to Icelandic. Their class teachers rated them from high-elementary level to
proficiency level.

2.3. Method

The testees were asked to write a series of response words (at least three if possible)
for each stimulus word, using free word association (an example was worked through
with each class before the test). Stimulus words were presented one at a time, and
testees had 30 s to respond to each cue, after which the administrator called the next
number; the entire test therefore took 15 min to complete. For an example of a com-
pleted test, see Appendix A. The testees also completed a standard yes/no test
(Meara and Jones, 1990). Both tests were completed within the same week.

2.4. Scoring

In order to score the test, each testee's responses (approximately 90 per subject)
were typed into a machine-readable text file. The stimulus words were discarded for
the purpose of the analysis. Each of the responses was lemmatised so that inflectional
suffixes (plural forms, past tenses, comparatives, etc.) and frequent regular
derivational affixes (-able, -ly, etc.) were counted as examples of base-forms of
these words. Words falling outside these levels were not lemmatised and were
treated as separate words. For a full account of the criteria used, see Appendix B. This
suffix list corresponds to levels 2 and 3 of Bauer and Nation's ``Word Families''
(Bauer and Nation, 1993). Once the stimulus words have been discarded, we are left
with a short text generated by each testee, which typically consists of about 90
different words.
Each testee's text is then processed using a program similar to Nation's VocabProfile
(Heatley and Nation, 1998). The program reports the frequency level of each
word in the list, and produces a report profile for that testee. Table 1 illustrates a
typical results profile. Level 0 words (high frequency structure words, proper names
and numbers) and Level 1 words (the 1000 most frequent content words in English)
score zero points. Any response which falls outside these two categories scores one
point, up to a maximum of 90. In the example given, the score was (10+40)=50.

Table 1
Typical profile generated by Lex30

Level 0  Level 1  Level 2  Level 3 and beyond
S no. a1:  4  49  10  40
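The scoring rule illustrated in Table 1 amounts to counting every response that falls outside Levels 0 and 1. Below is a minimal sketch, assuming a hypothetical `frequency_level` lookup in place of the actual VocabProfile-style wordlists.

```python
# Sketch of Lex30 scoring: build a frequency-level profile of the
# lemmatised responses and award one point per word beyond Level 1.

from collections import Counter

def lex30_score(responses, frequency_level):
    """responses: lemmatised response words; frequency_level: word -> 0/1/2.
    Words absent from the lookup are treated as Level 3 and beyond."""
    profile = Counter(frequency_level.get(w, 3) for w in responses)
    # Levels 0 and 1 score zero; everything else scores one point each.
    score = sum(n for level, n in profile.items() if level >= 2)
    return profile, score

# Toy illustration matching the shape of Table 1 (invented words):
frequency_level = {"the": 0, "war": 1, "castle": 2}
profile, score = lex30_score(["the", "war", "castle", "armour"], frequency_level)
print(score)  # castle (Level 2) + armour (unlisted, Level 3+) = 2
```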

3. Results

The results of the productive vocabulary test, Lex30, can be seen in Table 2. Not
surprisingly, the number of structure words produced is low. Native speaker word
association tests (Postman and Keppel, 1970; Kiss et al., 1973) also produce mostly
content words. Most of the words produced by subjects fall into Nation's ``first
thousand'' category (Nation, 1984). Analysis of completed tests shows that the first
response to a prompt word was usually a frequent word; the second, third and
fourth responses were more likely to be less frequent words. About a third of the
responses, on average, fell outside this highly frequent set of words, and some testees
produced very large numbers of words outside this category (Fig. 1).
The Lex30 scores were also compared with the results of the receptive yes/no
vocabulary test. The maximum score on this test is 10,000 words: two subjects
scored this maximum. Scores on the yes/no test were: mean, 5089;
SD, 2803.
Fig. 2 shows the relationship between testees' scores on the two tests. The
correlation between these two scores was 0.841 (p<0.01). This indicates that subjects
with a large receptive vocabulary also tended to produce a relatively high number of
infrequent words in the Lex30 test, and that scores on one of the tests can largely be
predicted from the other. The high correlation suggests that testees' productive
vocabulary is at least partly predictable from their receptive vocabulary as measured
by EVST. However, a closer examination of the data suggests that this
interpretation may not be the best way of approaching these data.
Fig. 2 seems to suggest that some testees have scores which lie relatively far from
the regression line. Testees whose scores lie below the regression line are those with a
relatively higher receptive vocabulary in relation to their productive vocabulary, and
those whose scores lie above the regression line have a relatively higher productive
vocabulary. The graph suggests that the more proficient subjects become, the larger
their receptive vocabulary is in relation to their productive vocabulary. This appears
to lend support to Laufer's (1998, p. 267) observation that ``an increase in one's
passive vocabulary will, on the one hand, lead to an increase in one's controlled
active vocabulary, but at the same time lead to a larger gap between the two.''
Although Laufer's comment was mainly concerned with the vocabularies of 10th
and 11th grade learners of English, it might be more generally applicable.

Table 2
Mean profile for Lex30

Level 0  Level 1  Level 2  Level 3+  Total words  Lex30
Mean  3.7  59.3  7.8  20.8  91.6  28.9
SD  3.6  13.9  3.6  11.4  24.2  13.9

Fig. 1. Distribution of Lex30 scores.

Fig. 2. Comparison of yes/no test scores and Lex30 scores.

4. Discussion

The main purpose of the work described in this paper has been to examine the
performance of a simple task that might serve as a practical index of productive
vocabulary. The results reported above suggest that Lex30 might be modestly
successful in this regard. The fact that the Lex30 scores relate closely to scores on a test
of passive recognition vocabulary suggests that Lex30 is sensitive to gross differences
in vocabulary knowledge. More importantly, however, where the Lex30 scores
deviate from a close fit with the EVST scores, these misfitting cases seem to fall into
plausible patterns. Taken together, these findings suggest that Lex30 might form the
basis of a useful index of productive vocabulary.
The test results were also surprisingly stable. In order to rule out the possibility
that testees were producing infrequent words randomly (i.e. for some cue words but
not others), we used a split-half test to check results for internal consistency. This
produced a correlation of 0.84 (p<0.01, n=46), indicating that the test has a high
level of internal consistency. This indicates that testees produced infrequent words if
they were able to, regardless of the cue word used.
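The split-half check described above can be sketched as follows. The data layout and scores here are invented, and splitting by odd versus even cue numbers is one plausible choice rather than the authors' documented procedure.

```python
# Sketch of a split-half consistency check: score each testee's odd- and
# even-numbered cues separately, then correlate the half-scores.

import statistics

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half(per_cue_scores):
    """per_cue_scores: points earned on each cue word, in cue order."""
    return sum(per_cue_scores[0::2]), sum(per_cue_scores[1::2])

# Invented per-cue scores for three testees (30 cues each in practice):
testees = [[1, 0, 2, 1], [0, 0, 1, 0], [2, 1, 3, 2]]
halves = [split_half(t) for t in testees]
r = pearson([h[0] for h in halves], [h[1] for h in halves])
print(round(r, 2))
```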
The main practical advantage of Lex30 is that it is extremely easy to administer,
and requires very little time to complete. The version reported here takes a mere 15
min for the testee, which means that Lex30 can easily be administered as part of a
larger test battery. We are currently developing a computerised version of Lex30,
where this timing element will be fully automated.
Earlier, we criticised other attempts to measure productive vocabulary on the
grounds that they often constrained the testee's vocabulary choice very tightly, and
on the grounds that they generated large amounts of data that threw very little
direct light on the extent of the testee's productive vocabulary. Lex30 seems to
address both of these issues effectively. The use of single word stimuli means that a
large number of vocabulary areas are opened up very economically. However, the
`texts' that the testees generate tend to be lexically rich compared to texts generated
by more traditional elicitation methods, and this means that almost every word in
the text gives us some useful information. The lenient scoring method adopted for
Lex30 (basically, any slightly unusual word produced by the testee counts towards
their score) means that testees are given credit at every possible opportunity. This
contrasts sharply with the scoring practices typically used in more strictly controlled
productive tests, where only the `correct' response is counted. In Lex30, the
stimulus word POTATO might cause a medical student to respond with
CARBOHYDRATE, and a waiter to respond with MASHED. Both responses are `unusual'
in the sense that we are using that term, and so both are awarded a point. In this
way, we do not penalise students whose experience of words is influenced by special
circumstances or special experience.
Lex30 also appears to have some potential as a diagnostic tool. Our results suggest
that for most testees, the size of their productive vocabulary is broadly proportionate
to the size of their receptive vocabulary. However, Fig. 2 suggests that some testees
do not fit this expected pattern. Particularly interesting are four outlying cases in Fig.
3 (cases 7, 18, 29 and 31) who appear to have productive vocabulary scores that are
considerably larger than we would expect from their receptive scores. Conversely,
cases 23, 33, 36, 40 and 45 have a larger receptive vocabulary than might be expected
from their productive vocabulary score. Interestingly, case 23 stated during the test
that she could recognise many more words than she could use, ``because I read a lot
of scientific journals, but I don't often get to speak English.'' Whatever the cause of
these imbalances, it ought to be possible to develop specific training programs
designed to make up deficiencies of this sort, once they have been identified.
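One way such misfitting cases might be flagged automatically is by regressing Lex30 scores on receptive scores and inspecting the large residuals. The sketch below uses invented scores and a hypothetical 1.5-SD threshold; it is an illustration of the idea, not the authors' analysis.

```python
# Sketch: fit a least-squares line to (receptive, productive) score pairs
# and flag testees whose residual exceeds a chosen multiple of the
# residual standard deviation.

import statistics

def flag_outliers(receptive, productive, threshold=1.5):
    """Return indices of testees lying far from the regression line."""
    mx, my = statistics.fmean(receptive), statistics.fmean(productive)
    slope = (sum((x - mx) * (y - my) for x, y in zip(receptive, productive))
             / sum((x - mx) ** 2 for x in receptive))
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(receptive, productive)]
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]

# Invented scores; the last testee sits well above the overall trend:
receptive = [2000, 4000, 6000, 8000, 5000]
productive = [15, 25, 35, 45, 60]
print(flag_outliers(receptive, productive))  # [4]
```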
Clearly, this line of argument is not straightforward, and the assumptions behind
it need to be examined in rather more detail. It is easy to imagine cases where a
severe imbalance between receptive and productive vocabulary might be quite
acceptable, or even normal: L2 learners have diverse aims and needs, and it would
be wrong to expect all learners to fit into a single, oversimplified model. Nonetheless,
the data reported here suggest that there might be some basic patterns in the
development of L2 vocabulary, and that a tool like Lex30 could help to tease these
patterns out.

Fig. 3. Comparison of yes/no test scores and Lex30 scores: cases deserving special discussion.

5. Conclusion

This study has examined a test of productive vocabulary which has a number of
practical administrative advantages over the tests currently in use. The data reported
suggested that our test has considerable potential as a `quick and dirty' productive
test, that might be used alongside other tests as part of a vocabulary test battery. It
seems to correlate highly with a test of receptive vocabulary, but also to have
considerable potential as a diagnostic tool for identifying cases where vocabulary
development appears to be abnormal or skewed. We are currently working on a
computerised version of Lex30, and a preliminary version of this test is available
from the authors on request. Future versions of Lex30 will be normed against native
speaker behaviour, and we also intend to examine whether the performance of the
test can be improved by a more careful choice of stimulus words. There are, of
course, a number of outstanding issues concerning the reliability and validity of the
Lex30 methodology, but we hope that this preliminary account of our current
research will stimulate further debate in this important area of research.

Appendix A. Sample data: completed Lex30 test

1 attack war, castle, guns, armour
2 board plane, wood, airport, boarding pass
3 close lock, avenue, finish, end
4 cloth material, table, design
5 dig bury, spade, garden, soil, earth, digger
6 dirty disgusting, clean, grubby, soiled
7 disease infection, hospital, doctor, health
8 experience adventure, travel, terrible
9 fruit apple, vegetable, pie
10 furniture table, chair, bed
11 habit smoking, singing, nagging
12 hold grip, hang on, cling
13 hope expect, optimistic, pessimistic
14 kick football, ground, goal, footballer
15 map country, roads, way, location
16 obey disobey, children, mum and dad, school rules
17 pot kitchen, vegetables, cook, roast
18 potato salad, roast, boiled, baked, chips
19 real true, sincere, really
20 rest pause, sleep, music
21 rice pudding, fried, pasta
22 science technical, physics, chemistry
23 seat bench, sit, sofa
24 spell grammar, test, bell
25 substance material, chemical, poisonous
26 stupid dumb, silly, brains
27 television tv, cupboard, video, armchair, relax
28 tooth ache, dentist, drill, filling, injection
29 trade commerce, bank, exchange, money
30 window house, glass, broken, pane

Appendix B. Lemmatisation criteria

Words were lemmatised according to the criteria for level 2 and 3 affixes
described in Bauer and Nation (1993). Words with affixes included in the lists below
were treated as instances of their base lemmas, and scored accordingly. Words with
affixes that do not appear in the lists were not lemmatised, and were treated as
separate words. Thus, UNHAPPINESS contains two level 3 affixes, UN- and
-NESS, and is lemmatised as HAPPY. HAPPY is a level 1 word, and therefore
UNHAPPINESS scores zero points. In contrast, LAUGHABLE contains an affix,
-ABLE, which is not included in the level 2 or level 3 lists. LAUGHABLE is
therefore not lemmatised as LAUGH. Although LAUGH is a level 1 word,
LAUGHABLE is not, and it therefore scores one point for the testee.

Level 2

Inflectional suffixes:

* Plural
* 3rd person singular present tense
* past tense
* past participle
* -ing
* comparative
* superlative
* possessive

Level 3

Most frequent and regular derivational affixes:

* -able (not when added to nouns)
* -er
* -ish
* -less
* -ly
* -ness
* -th (cardinal to ordinal only)

* -y (adjectives from nouns)
* non-
* un-
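The lemmatisation rules above can be sketched as a simple affix stripper. The orthographic heuristics here (restoring a dropped final -e, mapping i back to y) are simplified guesses, and -able is omitted because its `not when added to nouns' restriction would require part-of-speech information the sketch does not have.

```python
# Sketch of a level 2/3 lemmatiser: strip listed affixes until a known
# base lemma is reached; words that never reach a known lemma are kept
# as-is, mirroring the appendix's treatment of LAUGHABLE.

LEVEL_3_PREFIXES = ("non", "un")
# Rough orthographic forms of the suffixes listed above
# (-able omitted: its noun restriction needs POS information):
SUFFIXES = ("ness", "less", "ish", "est", "ing", "ed", "er",
            "ly", "th", "s", "y")

def lemmatise(word, known_lemmas):
    """Return the base lemma of `word` if reachable, else `word` itself."""
    word = word.lower()
    candidates = {word}
    changed = True
    while changed:                       # expand until no new forms appear
        changed = False
        for w in list(candidates):
            for p in LEVEL_3_PREFIXES:
                if w.startswith(p) and w[len(p):] not in candidates:
                    candidates.add(w[len(p):])
                    changed = True
            for s in SUFFIXES:
                if w.endswith(s):
                    stem = w[: -len(s)]
                    # try the bare stem, a restored -e, and i -> y
                    forms = (stem, stem + "e",
                             stem[:-1] + "y" if stem.endswith("i") else stem)
                    for form in forms:
                        if form and form not in candidates:
                            candidates.add(form)
                            changed = True
    for c in candidates:
        if c in known_lemmas:
            return c
    return word

print(lemmatise("unhappiness", {"happy"}))  # happy
```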

References

Anderson, R.C., Freebody, P., 1981. Vocabulary knowledge. In: Guthrie, J. (Ed.), Comprehension and
Teaching: Research Reviews. International Reading Association, Newark, DE, pp. 77–117.
Bauer, L., Nation, I.S.P., 1993. Word families. International Journal of Lexicography 6, 253–279.
Engber, C.A., 1995. The relationship of lexical proficiency to the quality of ESL compositions. Journal
of Second Language Writing 4, 139–155.
Heatley, A., Nation, I.S.P., 1998. VocabProfile and Range. School of Linguistics and Applied Language
Studies, Victoria University of Wellington, Wellington, New Zealand.
Kiss, G.R., Armstrong, C., Milroy, R., 1973. An Associative Thesaurus of English. EP Microfilms,
Wakefield.
Laufer, B., 1998. The development of passive and active vocabulary in a second language: same or
different? Applied Linguistics 19, 255–271.
Laufer, B., Nation, I.S.P., 1995. Vocabulary size and use: lexical richness in L2 written production.
Applied Linguistics 16, 307–322.
Laufer, B., Nation, I.S.P., 1999. A vocabulary-size test of controlled productive ability. Language Testing
16, 33–51.
Lightbown, P.M., Meara, P.M., Halter, R., 1998. Contrasting patterns in classroom lexical environments.
In: Albrechtsen, D., Henriksen, B., Mees, I., Poulsen, E. (Eds.), Perspectives on Foreign and Second
Language Learning. Odense University Press, Odense, pp. 221–238.
Meara, P., Jones, G., 1988. Vocabulary size as a placement indicator. In: Grunwell, P. (Ed.), Applied
Linguistics in Society. CILT, London, pp. 80–87.
Meara, P., Jones, G., 1990. The Eurocentres' 10K Vocabulary Size Test. Eurocentres Learning Service,
Zurich.
Nation, I.S.P., 1983. Testing and teaching vocabulary. Guidelines 5 (1), 12–25.
Nation, I.S.P., 1984. Vocabulary Lists. Victoria University of Wellington, English Language Institute,
Wellington, New Zealand.
Nunan, D., 1995. Language Teaching Methodology. Prentice Hall International, Hemel Hempstead.
Palmberg, R., 1987. Patterns of vocabulary development in foreign language learners. Studies in Second
Language Acquisition 9, 201–220.
Postman, L., Keppel, G., 1970. Norms of Word Association. Academic Press, New York.
Swan, M., Walter, C., 1990. The New Cambridge English Course. Cambridge University Press,
Cambridge.
Waring, R., 1999. The measurement of receptive and productive vocabulary. PhD thesis, University of
Wales Swansea.
