
Measuring vocabulary size in an uncommonly taught language

Paul Nation
Victoria University of Wellington, New Zealand
Paul.Nation@vuw.ac.nz

Abstract

This paper looks at what research needs to be done in order to test learners' vocabulary size. Testing vocabulary size has a long history, yet it remains one of the most poorly researched areas in applied linguistics. Several steps need to be carefully followed in order to make sure that the testing is valid. One of these involves corpus-based research to develop a word list from which a properly representative sample can be drawn. This paper outlines the steps and suggests a research agenda for developing a vocabulary size test.

Vocabulary size is the worst researched area in applied linguistics. This is not because of a lack of interest in vocabulary size but because the methodology involved in measuring vocabulary size was so badly worked out. In this paper, I will look at why it is important for teaching and research to have good measures of vocabulary size, the methodology of measuring vocabulary size, and a possible research agenda for measuring vocabulary size in an uncommonly taught language.

The importance of measuring vocabulary size

Vocabulary size measurement is important for planning, diagnosis and research. It is not easy to plan a sensible vocabulary development program without knowing where learners are now in their vocabulary growth. Research on the amount of vocabulary needed for receptive use indicates that learners need around 6,000 word families to read novels written for teenagers, to watch movies, and to participate in friendly conversation. Around 8,000 to 9,000 words are needed to read newspapers, novels, and some academic texts (Nation, 2006). These figures assume 98% coverage of the input texts, which still leaves 1 word in every 50, or around six words on every page, as unknown vocabulary. This coverage research suggests that it is useful to see vocabulary as divided into three main levels: a high-frequency vocabulary of around 2,000 words; a mid-frequency vocabulary of a further 7,000 words, making a total of 9,000; and the remaining low-frequency vocabulary of at least another 10,000 words, and potentially more. Adult native speakers seem to have a vocabulary size of around 20,000 words, but this would be a very long-term goal for most foreign language learners. More sensible goals are to aim initially at a high-frequency vocabulary of 2,000 words, and then to give attention largely to the most useful parts of the mid-frequency vocabulary for particular purposes. These can include academic vocabulary (currently represented by the Academic Word List (Coxhead, 2000)) and technical vocabulary relevant to a particular area of study, work, or pastime interest. Vocabulary size data is also very useful in planning extensive reading, particularly now that there are free adapted mid-frequency readers for learners at the 4,000, 6,000 and 8,000 word levels (http://www.victoria.ac.nz/lals/staff/paul-nation.aspx).

A vocabulary size test is also very useful for diagnostic purposes, particularly where learners have reading problems. Such problems can be caused by a lack of vocabulary knowledge, a lack of grammatical knowledge, poor reading skill, inadequate background knowledge, vision or hearing problems, or cognitive processing issues. A carefully administered vocabulary size test can work out whether vocabulary knowledge is an issue or not. Studies with native speakers of English in New Zealand schools indicate that for almost all learners, general vocabulary knowledge is unlikely to be a major source of reading difficulty.

A vocabulary size test can also be a very useful contributor to research on language proficiency and the effect of experimental interventions on language learning. It can provide an independent measure to help in equating groups in controlled studies. Current research on text coverage and comprehension suggests that overall vocabulary size is a better predictor of comprehension than text coverage, although the two are clearly related. Vocabulary size measures are not so useful in measuring vocabulary increase as a result of some short-term intervention, because each word in a vocabulary size test typically represents at least 100 words and perhaps more, and most vocabulary interventions do not result in vocabulary increases of this size.
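To make the coverage arithmetic and the three-level division concrete, here is a minimal sketch in Python. The 300-words-per-page figure and the function names are my own illustrative assumptions, not figures from the studies cited above:

```python
def unknown_words_per_page(coverage: float, words_per_page: int = 300) -> float:
    """At a given coverage (proportion of running words known),
    how many running words per page are unknown?"""
    return (1 - coverage) * words_per_page

# 98% coverage leaves 1 word in 50 unknown: about 6 words on a 300-word page.
print(unknown_words_per_page(0.98))

def frequency_level(rank: int) -> str:
    """Classify a word-family rank into the three levels suggested by the
    coverage research: high (1-2,000), mid (2,001-9,000), low (beyond 9,000)."""
    if rank <= 2000:
        return "high-frequency"
    if rank <= 9000:
        return "mid-frequency"
    return "low-frequency"

print(frequency_level(1500))   # high-frequency
print(frequency_level(5000))   # mid-frequency
print(frequency_level(12000))  # low-frequency
```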

The methodology of measuring vocabulary size

Published research on vocabulary size measurement in English goes back at least to the 1890s. In a single-subject study, Kirkpatrick (1891) read through a dictionary to work out how many words he knew. In the early 1900s several researchers recorded the vocabulary size growth of their young children. These studies typically lasted until the children were around three years old, when their growth in vocabulary size and the amount of their spoken output accelerated so much that observational measurement was no longer feasible. These studies relied on output as the source of data for measuring vocabulary size, based on the shaky assumption that we produce everything we know.

Dictionary-based sampling

Around the 1920s, the great educational psychologist Edward Thorndike became interested in vocabulary size measurement. It is likely that this coincided with his interest in developing frequency-based word lists of the English language for educational purposes. At this time, the methodology involved in measuring vocabulary size relied on dictionaries. Put simply, the methodology involved these steps. The researcher worked out how many words there were in the dictionary. Then a representative sample of these words was made, so that the ratio between the sample and the number of words in the dictionary was known. For example, the sample might be one two-hundredth the size of the total dictionary. Thus, if the dictionary contained 30,000 words, a sample of 150 words might be taken. The sample was then turned into test items, learners were tested, and their scores were multiplied by the ratio to work out their vocabulary size. So, if a learner got half of the items correct on the 150-word test, their vocabulary size was calculated as 15,000 words (75 × 200).

Thorndike (1924) saw several problems with this methodology. Firstly, the size of the dictionary used would have a strong effect on the calculation. It is also worth noting that the accuracy with which the number of words in the dictionary was determined would have a major effect on the calculation. Some researchers simply accepted the dictionary makers' claims about the number of words their dictionaries contained. These claims were often wildly inflated and gave no indication of the unit of counting (the word type, the lemma, or the word family) that was used. The figures were often simply created by the people involved in publicising the dictionary. Secondly, the method of sampling the words to go into the test was heavily biased towards high-frequency words.

The typical sampling procedure involved working out how many words needed to be sampled, seeing how many pages there were in the dictionary, and then taking the first word on every nth page. The major problem with this method of sampling is that it is a space-based sample, and high-frequency words have more entries and occupy more space in a dictionary than low-frequency words (Lorge & Chall, 1963). This meant that there were more high-frequency words in the sample than there should have been. High-frequency words are typically more likely to be known than low-frequency words, so the biased sample resulted in test-takers getting more words correct than they should have, and their vocabulary sizes were thus greatly overestimated. Diller (1978) is a good example of this: his heavily biased sample resulted in estimates of vocabulary size of close to 200,000 words for university students. This figure is probably ten times the size it should be.

So far we have looked at two of Thorndike's criticisms, dictionary size and sampling method. His third criticism involved the unit of counting. When we talk about words, what do we mean? Are pen and pens two different words or one? Are frequent and frequency two different words or one? What are the rules for deciding which forms are counted as part of the same word and which are counted as different words? Most early researchers simply accepted the decisions of the dictionary makers, and the dictionary makers' rules were typically not made explicit. This meant that the researchers in fact did not know what they were counting. Thorndike's fourth criticism was that researchers did not check the representativeness of their sample. Thorndike found that none of the nine studies published between 1907 and 1919 that he reviewed was methodologically sound; they failed on three and sometimes all four of the procedures he suggested.

Unfortunately, Thorndike published his critique of the vocabulary studies in a little-known collection of papers. This meant that his critique was not widely read and was not easily available for researchers. As a result, the fundamental errors in sampling, deciding on the unit of counting, and checking continued to be made for the next 60 years. Looking back, it seems that the dictionary-based method of vocabulary testing was doomed to failure. There are just too many things that can go wrong in the procedure, and the amount of work involved in avoiding these errors is so great that it makes such a procedure unmanageable.
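Both the size-estimation arithmetic and the space-based sampling bias can be illustrated with a small simulation. This is a sketch under invented assumptions (a 30,000-word dictionary, a learner who knows the 15,000 most frequent words, page space shrinking with frequency rank); it reconstructs the logic of Thorndike's criticism, not any actual study:

```python
import random

random.seed(42)  # reproducible toy example

DICTIONARY_SIZE = 30_000
SAMPLE_SIZE = 150
RATIO = DICTIONARY_SIZE // SAMPLE_SIZE  # each tested word stands for 200 words

def knows(rank: int) -> bool:
    # Toy learner: knows the 15,000 most frequent words.
    return rank < 15_000

def page_space(rank: int) -> float:
    # Toy dictionary: higher-frequency entries occupy more page space.
    return 1.0 / (rank + 1)

words = list(range(DICTIONARY_SIZE))  # rank 0 = most frequent

# Unbiased: a simple random sample of headwords.
fair = random.sample(words, SAMPLE_SIZE)
fair_estimate = sum(knows(w) for w in fair) * RATIO

# Space-biased: words drawn with probability proportional to page space,
# mimicking "take the first word on every nth page".
biased = random.choices(words, weights=[page_space(w) for w in words],
                        k=SAMPLE_SIZE)
biased_estimate = sum(knows(w) for w in biased) * RATIO

print(f"true size 15000 | fair estimate {fair_estimate} "
      f"| space-biased estimate {biased_estimate}")
# The space-biased estimate comes out far too high, as Thorndike predicted.
```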

Frequency-based sampling

There is an alternative procedure which avoids most of these problems, and this can be called frequency-based sampling. It would be nice to think that the introduction of computers made such a sampling process feasible. This is partly true, but it is even nicer to know that years before any computer appeared, Thorndike developed word frequency lists which he then used as the basis for creating vocabulary size tests. It seems that these frequency lists were given a boost during the economic depression of the 1920s and 1930s, when unemployed clerical workers and teachers were used as manual word counters to develop a series of word frequency lists (Thorndike & Lorge, 1944). We can only imagine the amount of labour that this involved, which can now be done in a few minutes on a modern computer. The frequency-based method can suffer from similar problems to the dictionary-based method, but nowadays these problems are more readily solved (Nation & Webb, 2011). The steps involved in the frequency-based method are as follows.

1. Decide on the kind of vocabulary knowledge that you want to measure, and write a description of this kind of knowledge and why it is important to measure it. This description should consider whether the knowledge is receptive or productive, partial or precise, spoken or written, and recognition or recall (Read, 2000). It is also worth considering whether responses to the test items can be in the learners' first language if they are not native speakers of the language being tested. Begin to develop the test specifications.

2. Bearing in mind your decision in step 1, choose or develop a corpus that represents the kind of language use that is the goal of the learners you want to test. This corpus needs to be large enough and representative enough to make sure that all the words that are part of the test-takers' knowledge occur within it.

3. Divide the corpus into sections so that it is possible to measure the range of occurrence, frequency of occurrence, and dispersion of the words in the corpus (Leech, Rayson, & Wilson, 2001).

4. Bearing in mind your decisions at step 1, decide on the unit of counting that you are going to use when making a word list from the corpus. If you are interested in receptive knowledge, the unit of counting should be the word family. If you are interested in productive knowledge, it should be the word type or lemma; researchers interested in productive knowledge tend to favour the word type (Vermeer, 2000).

5. Make a ranked word list from the corpus. The ranking should take account of range, frequency and dispersion. The freely available word-counting program AntWordProfiler (http://www.antlab.sci.waseda.ac.jp/antwordprofiler_index.html) can do this for a very wide range of languages using Unicode files.

6. Check your word list against other corpora to make sure that it does not contain serious omissions and errors.

7. Divide your word list into levels based on range, frequency and dispersion.

8. Randomly select a suitable number of words from each level, so that the total number of words in your test is large enough to obtain a good level of reliability and small enough to be tested within a sensible testing time. In the case of the Vocabulary Size Test (Beglar, 2010; Nation & Beglar, 2007), which is a multiple-choice test, 70 items were sufficient to obtain a good level of reliability for the whole test, and a test of 100 items could be administered in a computer-based form in just under half an hour. (A code sketch after step 10 below illustrates steps 3 to 8.)

9. Bearing in mind your decisions at step 1, decide on a test-item type. Typical item types testing receptive knowledge have included yes/no items (Diack, 1975), yes/no items including some non-words (Eyckmans, 2004; Meara & Buxton, 1987; Meara & Jones, 1987), multiple-choice items, multiple-choice items with a non-defining context (Nation & Beglar, 2007), and definition or translation items administered as an oral interview (Biemiller & Slonim, 2001) or in a written form (Barnard, 1961). Barnard's translation test of the 2,000-word General Service List included non-defining contexts. Multiple-choice tests can make use of bilingual items, where the choices are given in the first language of the test-takers but the test word and its non-defining context are in the foreign language. Such bilingual items can make a positive contribution to the validity of a test, because they avoid the grammar and reading comprehension difficulties of second language definitions in multiple-choice items. Table 1 gives examples.

Table 1: Examples of vocabulary size test items

Yes/No
Could you give a meaning for each word?
caddy   prehistoric   compassionate   ploy

Yes/No with non-words
Could you give a meaning for each word?
caddy   feldinate   compassionate   ploy

Multiple-choice
26 estuary
a. home of a religious brotherhood
b. resting place of dead people
c. place of safety
d. mouth of a tidal river

Multiple-choice with context
26 estuary: It's beside the <estuary>.
a. home of a religious brotherhood
b. resting place of dead people
c. place of safety
d. mouth of a tidal river

Translation
We went along the road. I stopped by the estuary.

Oral interview
caddy   estuary

10. Thoroughly check, trial, and gather data on the performance of the test and the test items. The checking and trialling should include: getting experienced native speakers to examine the test items carefully and comment on them; getting highly proficient native speakers and foreign language learners to sit the test items; replacing the test words in multiple-choice items and items with contexts with nonsense words, and getting native speakers and experienced learners to sit the test; and getting a few learners to explain how they were able to correctly answer the lower-frequency items (the Slumdog Millionaire phenomenon). Make sure that loanwords and cognates are not over-represented or under-represented in the test (Elgort, in press). When loanwords and cognates are tested in multiple-choice tests using first language choices, make sure that the choices do not contain the first language loanword or cognate, but make use of a definition. Appropriate item analysis procedures should also be used, but because you do not want to interfere with the representative selection of items, it is best to improve poorly performing items rather than omit or replace them.
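As signalled at step 8, here is a minimal sketch of steps 3 to 8 for a toy corpus. It counts word types rather than word families, uses a crude range-then-frequency ranking, and ignores dispersion for brevity; the function names and the toy ranking rule are my own illustrative choices, not a published procedure:

```python
from collections import Counter
import random

def build_ranked_list(sections: list[list[str]]) -> list[str]:
    """Rank word types by range, then frequency, across corpus sections
    (steps 3-5). A real list would also use a dispersion measure
    (Leech, Rayson, & Wilson, 2001)."""
    freq, rng = Counter(), Counter()
    for section in sections:
        for w in section:
            freq[w] += 1
        for w in set(section):
            rng[w] += 1  # number of sections the word occurs in
    # Sort by range first, then frequency, so narrow-range words drop down.
    return sorted(freq, key=lambda w: (-rng[w], -freq[w]))

def sample_test_words(ranked: list[str], level_size: int = 1000,
                      items_per_level: int = 10) -> list[str]:
    """Divide the ranked list into levels and randomly sample items
    from each level (steps 7-8)."""
    items = []
    for start in range(0, len(ranked), level_size):
        level = ranked[start:start + level_size]
        items.extend(random.sample(level, min(items_per_level, len(level))))
    return items

# Usage with a toy corpus of three 'sections':
sections = [
    "the cat sat on the mat".split(),
    "the dog sat by the door".split(),
    "a cat and a dog met the man".split(),
]
ranked = build_ranked_list(sections)
print(ranked[:5])
print(sample_test_words(ranked, level_size=5, items_per_level=2))
```

A real implementation would group the types into word families or lemmas according to the decision made at step 4 before ranking and sampling.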

Multiword units

Let us now come back to step 4 of the procedure described above: the unit of counting. There is now considerable research on multiword units. In English these include items like as well, at all, by and large, get the green light, kill two birds with one stone, look like, on the other hand, shake one's head, and look forward to. There is evidence that many of these are stored as whole units in the brain (formulaic sequences) and that in language use they are treated as single choices (Schmitt, 2004; Wray, 2002; Wray, 2008). Some of them are also very frequent (Martinez & Schmitt, 2012; Shin & Nation, 2008). It is thus tempting to want to include them in vocabulary size tests.

However, I think that there is danger in mixing the testing of single words and multiword units, except in the very few cases where the multiword units are core idioms (Grant & Bauer, 2004; Grant & Nation, 2006). In core idioms, there is no obvious relationship between the parts and the whole: knowing the word as and the word well does not help us understand the core idiom as well. The vast majority of multiword units are not arbitrary combinations (Liu, 2010). Most multiword units (figuratives and literals, to use Grant and Bauer's terminology) are made up of words that occur together because there is a relationship between the meaning of the parts and the meaning of the whole multiword unit, and this relationship is in line with the core grammar and meaning of these words. Kill two birds with one stone has both a figurative and a literal meaning, and these have an obvious connection with each other. Catfish in chilli sauce can be considered a multiword unit (it is also one of my favourite Thai dishes and is thus very frequent in my Thai vocabulary), but the parts of this literal clearly relate to the meaning of the whole. If we include figurative and literal multiword units in vocabulary size tests, we are effectively double-dipping, that is, potentially testing some words twice and at least giving some words two or more chances to appear in the test. Testing multiword units needs to be separated from testing vocabulary size.

I have spent some time on the issue of multiword units because it has very important implications for the testing of the vocabulary of most languages, and particularly for the testing of Asian languages.

If we are testing Thai, should we consider items like rot fai (train, literally "fire vehicle"), me nam (river, literally "mother water"), and pi saw (older sister) to be single words or multiword units? At the very least, when creating a word list for making a vocabulary test, we need a stated policy for dealing with compound words. Languages like Chinese and Japanese face exactly the same problem, and in English the problem is not lessened by having spaces between words, because many compound words appear in a variety of forms (whiteboard, white-board, white board) with exactly the same meaning. Those involved in testing uncommonly taught languages need to be careful that what they classify as a word is not being influenced by parallels in other languages. Just because train is clearly a single word in English does not mean that its equivalent in another language is necessarily a single word or needs to be treated as a single word.

A similar issue occurs with homonyms, homographs and homophones. Homonyms are words which have the same spoken and written form but unrelated meanings, for example ear as part of the body and ear as in an ear of corn. Homographs have the same written form but different spoken forms and unrelated meanings (a row of houses, make a row). Homophones have the same spoken form but different written forms and unrelated meanings (so, sew). Such words are different words and need to be tested as different words. Around 10% of the words in English have homoforms, but in most cases the frequency of the second member makes up less than 5% of the combined total occurrences of the two words (Parent, 2012).

It is important to distinguish homoforms from polysemes. Polysemes are related meanings of the same word form. In English, the word sweet has many polysemes or senses: we can talk of a sweet taste, sweet music, a sweet face, the sweet smell of success, and a sweet person. In New Zealand, when we are very satisfied with something (and we are young), we can say "sweet as!". All of these uses of sweet are uses of the same word, and in a vocabulary size test it would not be wise to test different senses of items unless we had very clear criteria for reliably distinguishing such senses and applied those criteria to all the words we were selecting. Such criteria are extremely hard to develop and apply. It is likely that in most language use only the very high-frequency polysemes are stored separately in the brain.

There is one other issue that I wish to focus on before moving on to the research agenda, and that is the administration of vocabulary size tests. In New Zealand we have just completed the testing of the vocabulary size of over 1,000 secondary school students, most of whom are native speakers of English.

We used a multiple-choice vocabulary size test going up to the 20th 1,000-word level of English. What is a little surprising about the study is that all of the learners involved were tested individually, with a test administrator sitting next to each learner during the test, encouraging them to focus on the test and to do their best while they sat it. This involves saying things like "Good. You got that one right. Now try the next one". When asked, and sometimes when not asked, the administrator provided the pronunciation of the words being tested. This very labour-intensive administration was used because in previous piloting we had found that when tests were given in a group setting, some of the learners (particularly those who were switched off school work) did not try very hard when sitting the test. In some cases we found we could double their score by sitting next to them as they sat the test. I no longer have any faith in the results of group-administered tests given to students who are not doing very well at school.

Thus, the reason for this very labour-intensive method of testing was to make sure that the results meant something. It is better to have results from a few students where you know that they were doing their best to show what they knew than results from a very large number of students where many were just racing through the test to get it out of the way. This issue may not be a problem in the testing of uncommonly taught languages with highly motivated learners, but it is an issue to be aware of.

The results of our testing indicated that native speakers of English increase their vocabulary size by around 1,000 word families a year, starting at roughly the age of three. So, a 13-year-old learner typically has a vocabulary size of around 10,500 word families and a 17-year-old around 14,000 or 15,000 word families. Where English is taught as a foreign language, I consider that learners should be aiming at close to 1,000 words a year in order to reach the 8,000 to 9,000 words needed for unassisted reading of unsimplified texts such as novels and newspapers.

A research agenda for testing vocabulary size in an uncommonly taught language

The progress that we have been able to make in the testing of English vocabulary size has been greatly assisted by the availability of large corpora and the creation of extensive word family lists to analyse such corpora. There is now a computer program freely available to do such analysis in a wide range of languages (AntWordProfiler), including Thai. Corpora in a wide range of languages are also becoming available; see for example Sketch Engine (www.sketchengine.co.uk).

Developing a suitable corpus

The first item on the research agenda should be the development of a suitable corpus. With so much language available in electronic form on the World Wide Web, such development is an easier task than it once was. However, the greatest limiting factor is likely to be the amount of spoken data available in electronic form. To make word lists which are suitable for most language teaching, we need substantial amounts of spoken data. For the very high-frequency words, it is likely that a corpus of one or two million words would be sufficient (Brysbaert & New, 2009). Movie and television scripts are a reasonable source, but it is obviously desirable to have as much unscripted colloquial talk as possible.

The primary principle of corpus-based research is that the nature of the corpus determines the nature of the data taken from it. Having very large biased corpora does not negate this principle. If there is a lack of suitable corpora, then using what is available is an acceptable temporary compromise, as long as the resulting data is carefully screened manually to make sure that there are no serious omissions or inappropriate inclusions. It is also useful to check such data against texts which were not in the corpus.

Making graded lists

Although there is a high correlation between frequency and range measures, it is useful to include both measures when developing word lists. This is because some words can be highly frequent in just one text or a very limited range of texts and are rarely if ever used in other texts. Range is a measure of how many different texts or collections of texts a word occurs in, and thus it is a good indicator of the general usefulness of words. Dispersion measures are a combination of range and frequency data, and there are several competing dispersion measures, each with its advocates and detractors. It is possible to use dispersion as a substitute for range, although the more data there is, the finer the grading that can be made.
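As an illustration of one such measure, here is a sketch of Juilland's D, one widely used dispersion statistic. This is an example only; the discussion above does not single out any particular measure:

```python
import statistics

def juilland_d(subfreqs: list[int]) -> float:
    """Juilland's D for a word's frequencies across n equal-sized corpus
    parts: D = 1 - (CV / sqrt(n - 1)), where CV is the coefficient of
    variation of the subfrequencies. D near 1 means the word is evenly
    dispersed; near 0 means it is concentrated in a few parts."""
    n = len(subfreqs)
    mean = statistics.mean(subfreqs)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(subfreqs) / mean
    return 1 - cv / (n - 1) ** 0.5

print(juilland_d([10, 11, 9, 10]))  # ~0.96: evenly spread across parts
print(juilland_d([40, 0, 0, 0]))    # 0.0: frequent in one text only
```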

Frequency-based word lists have their weaknesses. They are of course highly dependent on the corpus from which they were made, and if that corpus is biased for or against particular kinds of texts, that will be reflected in the word lists. Possibly the second most important principle of corpus research is that electronic analysis needs to be accompanied by human analysis. In the jargon of the trade, we need hardware (computers), software (programs), and wetware (our brains). Such human analysis introduces elements of unreliability, but this is overwhelmingly repaid by the commonsense usefulness of the resulting lists.

Traditionally, word lists have been divided into 1,000-word sections. There is no particular reason why this should be so, and there is certainly value in considering smaller sections, particularly where lists are being made for beginner and intermediate students. Ideally, the size of the lists should bear some relationship to learning goals, so that particular stages in a course could be represented by different lists.

Careful thought needs to be given to the unit of counting, as this can affect the sequencing of vocabulary in a frequency-based list and the difficulty of the tests. If an inappropriate unit of counting is used, learners may score highly on low-frequency words which are actually members of higher-frequency word families. If word families are used as the unit of counting, it is desirable to set up a series of word family levels so that there is some consistency in what is included in a word family (Bauer & Nation, 1993).

Making vocabulary size tests

If the aim of test making is to measure total vocabulary size, then the tests need to cover a wide span of frequency levels. Research on the Vocabulary Size Test of English (Nguyen & Nation, 2011) shows that even learners with a moderately low proficiency in the language know some words from the very low-frequency levels. A major reason for this is the pervasive influence of English on other languages through the borrowing of vocabulary. Daulton (2008), for example, provides evidence that at least half of the most frequent 3,000 words of English are present in Japanese in some form or other, and that these loanwords are known and used by young native speakers of Japanese. Japanese is not unique in this respect. If learners are to get credit for the vocabulary that they already know, or which will have a very low learning burden for them, then they must have an opportunity to show this knowledge at the high-, mid-, and low-frequency levels of the language.

The choice of the item format to use in the vocabulary size test deserves careful thought and justification.

Many teachers are reluctant to give learners credit for partial knowledge and favour vocabulary test items that require strong knowledge of each word. In my opinion, when measuring for receptive purposes it is better to have items that encourage the retrieval of knowledge and that allow partial knowledge to be enough to get the correct answer. Thus, in a receptive vocabulary size test, I favour multiple-choice items in non-defining contexts, so that learners get the orienting effect of the context and the cuing effect of the choices. My justification for this is that when reading and listening there is substantial support from background knowledge and context clues, allowing learners to make use of partial vocabulary knowledge to assist comprehension. There is also a possible argument that the use of multiple-choice items makes it much more likely that different learners will respond to the test items in a somewhat similar way, provided they take the test seriously and make full use of their test-taking skills. With yes/no tests and definition or translation recall tests, there is likely to be much more variability between test-takers in how they interpret the test instructions.

In spite of the widespread use of multiple-choice items, there is still a lot that we don't know about them. Surprisingly, research indicates that for most multiple-choice items the optimum number of choices is three (Rodriguez, 2005). This is largely because, when making multiple-choice items for comprehension tests, it is usually very difficult to construct a plausible fourth choice. I think this is not a problem for vocabulary size tests, as the incorrect choices for such items can be the meanings of other words at the same frequency level, and there are plenty of those to choose from. There is also debate about whether an "I don't know" option should be included in the choices. Unpublished research indicates that when an "I don't know" option is included, learners complete the test more quickly than when it is not. This suggests that most learners do not make random guesses when doing a vocabulary size test, but carefully consider their answers. Thus an "I don't know" option, especially when accompanied by a penalty, discourages learners from drawing on partial knowledge.

We need to see vocabulary growth as a cumulative process, not only in the number of words learnt, but also in the strength and richness with which each word is known. Much of our vocabulary knowledge is subconscious, and the use of context and choices allows learners to draw on this knowledge. I support the use of partial knowledge when measuring vocabulary size, largely because I am interested in teaching and learning. However, there are other positions, and testers who are interested in using vocabulary size tests for proficiency-related decision-making are probably well justified in having a tougher item type than a contextualised multiple-choice format.
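Whatever item type is chosen, the final size estimate is simply the test score scaled up by the sampling ratio, as in the dictionary-based method but with a frequency-based sample. A minimal sketch, assuming equal sampling across levels and using illustrative numbers rather than those of any published test:

```python
def estimate_vocab_size(correct: int, total_items: int,
                        words_covered: int) -> float:
    """Scale a test score up by the sampling ratio. For example, 100 items
    sampled evenly from the first 20,000 word families means each item
    represents 200 families."""
    words_per_item = words_covered / total_items
    return correct * words_per_item

# A learner scoring 62/100 on a test covering 20,000 word families:
print(estimate_vocab_size(62, 100, 20_000))  # 12400.0 word families
```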

Conclusion

It is important that we learn from the history of vocabulary size testing. Well over 60 years of research was very poorly done because researchers did not learn from the mistakes of others. This poor research is by no means a relic of the past (see, for example, my critique of Hart and Risley on my website). The relatively recent advances in technology have made the testing of vocabulary size a much more feasible activity. It is important when designing and using vocabulary size tests, however, that the use of technology is accompanied by the use of wetware.

References
Barnard, H. (1961). A test of P.U.C. students' vocabulary in Chotanagpur. Bulletin of the Central Institute of English, 1, 90-100.
Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(4), 253-279.
Beglar, D. (2010). A Rasch-based validation of the Vocabulary Size Test. Language Testing, 27(1), 101-118.
Biemiller, A., & Slonim, N. (2001). Estimating root word vocabulary growth in normative and advantaged populations: Evidence for a common sequence of vocabulary acquisition. Journal of Educational Psychology, 93(3), 498-520.
Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Daulton, F. E. (2008). Japan's Built-in Lexicon of English-based Loanwords. Clevedon: Multilingual Matters.
Diack, H. (1975). Test Your Own Wordpower. St Albans: Paladin.
Diller, K. C. (1978). The Language Teaching Controversy. Rowley, Mass.: Newbury House.
Elgort, I. (in press). Effects of L1 definitions and cognate status on the Vocabulary Size Test of English as a foreign language.
Eyckmans, J. (2004). Measuring Receptive Vocabulary Size. Utrecht: Netherlands Graduate School of Linguistics.
Grant, L., & Bauer, L. (2004). Criteria for redefining idioms: Are we barking up the wrong tree? Applied Linguistics, 25(1), 38-61.
Grant, L., & Nation, I. S. P. (2006). How many idioms are there in English? ITL International Journal of Applied Linguistics, 151, 1-14.
Kirkpatrick, E. A. (1891). Number of words in an ordinary vocabulary. Science, 18(446), 107-108.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English. Harlow: Longman.
Liu, D. (2010). Going beyond patterns: Involving cognitive analysis in the learning of collocations. TESOL Quarterly, 44(1), 4-30.
Lorge, I., & Chall, J. (1963). Estimating the size of vocabularies of children and adults: An analysis of methodological issues. Journal of Experimental Education, 32(2), 147-157.
Martinez, R., & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics.
Meara, P., & Buxton, B. (1987). An alternative to multiple choice vocabulary tests. Language Testing, 4(2), 142-151.
Meara, P., & Jones, G. (1987). Tests of vocabulary size in English as a foreign language. Polyglot, 8(Fiche 1).
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59-82.
Nation, I. S. P., & Webb, S. (2011). Researching and Analyzing Vocabulary. Boston: Heinle Cengage Learning.
Nation, P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9-13.

Nguyen, L. T. C., & Nation, I. S. P. (2011). A bilingual vocabulary size test of English for Vietnamese learners. RELC Journal, 42(1), 86-99.
Parent, K. (2012). The most frequent English homonyms. RELC Journal, 43(1), 69-81.
Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educational Measurement: Issues and Practice, 24(2), 3-13.
Schmitt, N. (Ed.). (2004). Formulaic Sequences. Amsterdam: John Benjamins.
Shin, D., & Nation, I. S. P. (2008). Beyond single words: The most frequent collocations in spoken English. ELT Journal, 62(4), 339-348.
Thorndike, E. L. (1924). The vocabularies of school pupils. In J. C. Bell (Ed.), Contributions to Education (pp. 69-76). New York: World Book Co.
Thorndike, E. L., & Lorge, I. (1944). The Teacher's Word Book of 30,000 Words. New York: Teachers College, Columbia University.
Vermeer, A. (2000). Coming to grips with lexical richness in spontaneous speech data. Language Testing, 17(1), 65-83.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Wray, A. (2008). Formulaic Language: Pushing the Boundaries. Oxford: Oxford University Press.
