

LANGUAGE ASSESSMENT

CHAPTER 1 TESTING, ASSESSING AND TEACHING


In an era of communicative language teaching, our classroom tests should measure up to standards of authenticity and meaningfulness. Language teachers should design tests that serve as motivating learning experiences rather than anxiety-provoking threats.

Tests should:
- be positive experiences
- build a person's confidence and become learning experiences
- bring out the best in students

Tests should not:
- be degrading
- be artificial
- be anxiety-provoking

This lesson (Language Assessment) aims to create more authentic, intrinsically motivating assessment procedures that are appropriate for their context and designed to offer constructive feedback to students.

What is a test?

A test is a method of measuring a person's ability, knowledge, or performance in a given domain.

Method = a set of techniques, procedures, or items. To qualify as a test, the method must be explicit and structured. For example:
- multiple-choice questions with prescribed correct answers
- a writing prompt with a scoring rubric
- an oral interview based on a question script and a checklist of expected responses to be filled in by the administrator

Measure = a means of offering the test-taker some kind of result. If an instrument does not specify a form of reporting measurement, that technique cannot appropriately be defined as a test. Scoring might look like the following:
- A classroom-based short-answer essay test may earn the test-taker a letter grade accompanied by the instructor's marginal comments.
- Large-scale standardized tests provide a total numerical score, a percentile rank, and perhaps some subscores.

The test-taker (the individual) = the person who takes the test. Testers need to understand:
- who the test-takers are
- what their previous experience and background are
- whether the test is appropriately matched to their abilities
- how test-takers should interpret their scores


Performance = A test measures performance, but the results imply the test-taker's ability or competence. Some language tests measure one's ability to perform language: to speak, write, read, or listen to a subset of language. Others measure a test-taker's knowledge about language: defining a vocabulary item, reciting a grammatical rule, or identifying a rhetorical feature in written discourse.

Measuring a given domain = measuring the desired criterion without including other factors.
- Proficiency tests: Even though the actual performance on the test involves only a sampling of skills, the domain is overall proficiency in a language, that is, general competence in all skills of a language.
- Classroom-based performance tests: These have more specific criteria. For example, a test of pronunciation might well be a test of only a limited set of phonemic minimal pairs, and a vocabulary test may focus on only the set of words covered in a particular lesson.

A well-constructed test is an instrument that provides an accurate measure of the test-taker's ability within a particular domain.

TESTING, ASSESSMENT & TEACHING

TESTING

Tests are prepared administrative procedures that occur at identifiable times in a curriculum. When tested, learners know that their performance is being measured and evaluated, and they muster all their faculties to offer peak performance. Tests are a subset of assessment: they are only one among many procedures and tasks that teachers can ultimately use to assess students. Tests are usually time-constrained (usually spanning a class period or at most several hours) and draw on a limited sample of behaviour.

ASSESSMENT

Assessment is an ongoing process that encompasses a much wider domain. A good teacher never ceases to assess students, whether those assessments are incidental or intended. Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student's performance. Assessment includes testing; it is more extensive and includes many more components.


What about TEACHING? For optimal learning to take place, learners must have opportunities to "play" with language in a classroom without being formally graded. Teaching sets up the practice games of language learning: the opportunities for learners to listen, think, take risks, set goals, and process feedback from the teacher (the "coach"), and then recycle through the skills they are trying to master. During these practice activities, teachers are indeed observing students' performance and making various evaluations of each learner. It can be said, then, that testing and assessment are subsets of teaching.

[Diagram: three nested circles, with testing inside assessment and assessment inside teaching]

ASSESSMENT

Informal Assessment: Incidental, unplanned comments and responses, such as "Nice job!", "Well done!", "Good work!", "Did you say can or can't?", "Broke or break?", or putting a smiley face on some homework. Classroom tasks are designed to elicit performance without recording results or making fixed judgements about a student's competence. Examples of unrecorded assessment: marginal comments on papers, responding to a draft of an essay, advice about how to pronounce a word better, a suggestion for a strategy for compensating for a reading difficulty, and showing how to modify a student's note-taking to better remember the content of a lecture.

Formal Assessment: Exercises or procedures specifically designed to tap into a storehouse of skills and knowledge. They are systematic, planned sampling techniques constructed to give teacher and student an appraisal of student achievement. They are the "tournament games" that occur periodically in the course of teaching. All tests are formal assessments, but not all formal assessment is testing. Example 1: a student's journal or portfolio of materials can be used as a formal assessment of the attainment of certain course objectives, but it would be problematic to call those two procedures "tests." Example 2: a systematic set of observations of a student's frequency of oral participation in class is certainly a formal assessment, but it is not a test.

THE FUNCTION OF AN ASSESSMENT

Formative Assessment: Evaluating students in the process of forming their competencies and skills, with the goal of helping them to continue that growth process. It supports the ongoing development of the learner's language. Example: when you give a student a comment or a suggestion, or call attention to an error, that feedback is offered to improve the learner's language ability. Virtually all kinds of informal assessment are formative.
Summative Assessment: Aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course. It does not necessarily point the way to future progress. Examples: final exams in a course and general proficiency exams. All tests and formal assessments (quizzes, periodic review tests, midterm exams, etc.) are summative.

IMPORTANT:

Where summative assessment is concerned, in the aftermath of any test students tend to think, "Whew! I'm glad that's over. Now I don't have to remember that stuff anymore!" An ideal teacher should try to change this attitude among students. A teacher should instill a more formative quality into his or her lessons and offer students opportunities to convert tests into learning experiences.


TESTS

Norm-Referenced Tests: Each test-taker's score is interpreted in relation to a mean (average score), median (middle score), standard deviation (extent of variance in scores), and/or percentile rank. The purpose of such tests is to place test-takers along a mathematical continuum in rank order. Scores are usually reported back to the test-taker in the form of a numerical score (e.g., 230 out of 300, 84%). Typical of these are standardized tests like the SAT (Scholastic Aptitude Test), TOEFL (Test of English as a Foreign Language), ÜDS, KPDS, YDS, etc. These tests are intended to be administered to large audiences, with results efficiently disseminated to test-takers. They must have fixed, predetermined responses in a format that can be scored quickly at minimum expense. Money and efficiency are primary concerns in these tests.
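The statistics named above can be computed directly. The following is a minimal sketch using Python's statistics module; the scores are invented, and the "percent scoring below" definition of percentile rank is one simple convention among several.

```python
# Minimal sketch of the descriptive statistics behind norm-referenced
# score reports. The score values are invented for illustration.
import statistics

scores = [48, 52, 55, 61, 61, 67, 70, 74, 80, 91]  # hypothetical class scores

mean = statistics.mean(scores)      # average score
median = statistics.median(scores)  # middle score
stdev = statistics.stdev(scores)    # extent of variance in scores

def percentile_rank(score, all_scores):
    """Percent of test-takers scoring below the given score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

print(f"mean={mean:.1f}, median={median}, stdev={stdev:.1f}")
print(f"percentile rank of 74: {percentile_rank(74, scores):.0f}%")
```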

Criterion-Referenced Tests: These are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives. Tests that involve the students of only one class, and that are connected to a curriculum, are typical criterion-referenced tests. Much time and effort on the part of the teacher are required to deliver useful, appropriate feedback to students. The distribution of students' scores across a continuum may be of little concern as long as the instrument assesses the appropriate objectives. In classroom-based testing, as opposed to standardized large-scale testing, criterion-referenced testing is of more prominent interest than norm-referenced testing.

Approaches to Language Testing:

A Brief History

Historically, language-testing trends have followed the trends of teaching methods. During the 1950s, an era of behaviourism and special attention to contrastive analysis, testing focused on specific language elements such as phonological, grammatical, and lexical contrasts between two languages. During the 1970s and '80s, communicative theories were widely accepted, bringing a more integrative view of testing. Today, test designers are trying to build authentic, valid instruments that simulate real-world interaction.


APPROACHES TO LANGUAGE TESTING

a) Discrete-Point Testing: Language can be broken down into its component parts, and those parts can be tested successfully. The component parts are the skills of listening, speaking, reading, and writing; the units of language (discrete points) are phonology, graphology, morphology, lexicon, syntax, and discourse. On this view, an overall language proficiency test should sample all four skills and as many linguistic discrete points as possible. (In the face of evidence that, in one study, each student scored differently across the various skills depending on background, country, and major field, one of the chief supporters of the unitary trait hypothesis, Oller, retreated from his earlier stand and admitted that the hypothesis was wrong.)

b) Integrative Testing: Language competence is a unified set of interacting abilities that cannot be tested separately. Communicative competence is so global and requires such integration that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. Two types of tests have historically been claimed to be examples of integrative tests: the *cloze test and **dictation.

Unitary trait hypothesis: An indivisible view of language proficiency, holding that vocabulary, grammar, phonology, the four skills, and other discrete points of language cannot be disentangled from each other in language performance.

*Cloze Test: Cloze test results are good measures of overall proficiency. The ability to supply appropriate words in blanks requires a number of abilities that lie at the heart of competence in a language: knowledge of vocabulary, grammatical structure, discourse structure, and reading skills and strategies. It was argued that successful completion of cloze items taps into all of those abilities, which were said to be the essence of global language proficiency.

**Dictation: Learners listen to a passage of 100 to 150 words read aloud by an administrator (or played from tape) and write what they hear, using correct spelling. Supporters argue that dictation is an integrative test because success on a dictation requires careful listening, reproduction in writing of what is heard, efficient short-term memory, and, to an extent, some expectancy rules to aid the short-term memory.
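A classic fixed-ratio cloze deletes roughly every nth word after an intact lead-in. The sketch below is illustrative only: the make_cloze helper, the sample passage, the every-7th-word ratio, and the exact-word answer key are our assumptions, not prescriptions from the text.

```python
# Illustrative fixed-ratio cloze generator (assumption: every 7th word is
# blanked after a short intact lead-in; real tests vary these choices).
import re

def make_cloze(passage: str, ratio: int = 7, skip: int = 10):
    """Return (cloze_text, answer_key); the first `skip` words stay intact."""
    words = passage.split()
    key = {}
    for i in range(skip, len(words), ratio):
        # keep trailing punctuation in the text, not in the answer key
        core, punct = re.match(r"(.*?)([.,;:!?]*)$", words[i]).groups()
        key[len(key) + 1] = core
        words[i] = f"({len(key)}) ______{punct}"
    return " ".join(words), key

text = ("The boy walked to the store because he wanted to buy some bread. "
        "On the way he met an old friend who asked him where he was going.")
cloze, answers = make_cloze(text)
print(cloze)    # passage with numbered blanks
print(answers)  # exact-word answer key
```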

c) Communicative Language Testing (a more recent approach, after the mid-1980s)

What does it criticise? For a language test to be useful for its intended purposes, test performance must correspond in demonstrable ways to language use in non-test situations. Integrative tests such as cloze only tell us about a candidate's linguistic competence; they do not tell us anything directly about a student's performance ability (knowledge about a language, not the use of language).

Any suggestion? A quest for authenticity was launched, as test designers centered on communicative performance. Supporters emphasized the importance of strategic competence (the ability to employ communicative strategies to compensate for breakdowns as well as to enhance the rhetorical effect of utterances) in the process of communication.

Any problem in using this approach? Yes. Communicative testing presented challenges to test designers, who began to identify the real-world tasks that language learners were called upon to perform. It was clear that the contexts for those tasks were extraordinarily varied and that the sampling of tasks for any one assessment procedure needed to be validated by what language users actually do with language. As a result, the assessment field became more and more concerned with the authenticity of tasks and the genuineness of texts.

d) Performance-Based Assessment

In language courses and programs around the world, test designers are now tackling this new and more student-centered agenda. Instead of offering only paper-and-pencil selective-response tests of a plethora of separate items, performance-based assessment of language typically involves oral production, written production, open-ended responses, integrated performance (across skill areas), group performance, and other interactive tasks.

Any problems? Such assessment is time-consuming and therefore expensive, but the extra effort pays off in the form of more direct testing, because students are assessed as they perform actual or simulated real-world tasks.

The advantage of this approach? Higher content validity is achieved, because learners are measured in the process of performing the targeted linguistic acts.

IMPORTANT: In an English language-teaching context, performance-based assessment means that a teacher should rely a little less on formally structured tests and a little more on evaluation while students are performing various tasks.

In performance-based assessment, interactive tests (speaking, requesting, responding, etc.) are IN; paper-and-pencil tests are OUT. Result: in performance-based assessment, tasks can approach the authenticity of real-life language use.

CURRENT ISSUES IN CLASSROOM TESTING

The design of communicative, performance-based assessment continues to challenge both assessment experts and classroom teachers. Three issues are helping to shape our current understanding of effective assessment:
- the effect of new theories of intelligence on the testing industry
- the advent of what has come to be called "alternative" assessment
- the increasing popularity of computer-based testing

New Views on Intelligence

In the past: Intelligence was viewed strictly as the ability to perform linguistic and logical-mathematical problem solving. For many years we lived in a world of standardized, norm-referenced tests that were timed, in a multiple-choice format, consisting of a multiplicity of logic-constrained items, many of them inauthentic. We relied on timed, discrete-point, analytical tests in measuring language, and we were forced to stay within the limits of objectivity and give impersonal responses.

Recently: Alongside the traditional conceptualizations of linguistic intelligence and logical-mathematical intelligence on which standardized IQ (Intelligence Quotient) tests are based, five other "frames of mind" have been recognized: spatial intelligence, musical intelligence, bodily-kinesthetic intelligence, interpersonal intelligence, and intrapersonal intelligence. More recently, the concept of EQ (Emotional Quotient) has spurred us to underscore the importance of the emotions in our cognitive processing. Those who manage their emotions tend to be more capable of fully intelligent processing, because anger, grief, resentment, self-doubt, and other feelings can easily impair peak performance in everyday tasks as well as higher-order problem solving.

These new conceptualizations of intelligence have not been universally accepted by the academic community. However, their intuitive appeal infused the decade of the 1990s with a sense of both freedom and responsibility in our testing agenda. Our challenge now is to test interpersonal, creative, communicative, and interactive skills, and in doing so to place some trust in our subjectivity and intuition. (Individuals can now make subjective judgements, and so each individual can be original.)

Traditional and Alternative Assessment

Traditional Assessment:
- One-shot, standardized exams
- Timed, multiple-choice format
- Decontextualized test items
- Scores suffice for feedback
- Norm-referenced scores
- Focus on the "right" answer
- Summative
- Oriented to product
- Non-interactive process
- Fosters extrinsic motivation

Alternative Assessment:
- Continuous long-term assessment
- Untimed, free-response format
- Contextualized communicative tests
- Individualized feedback and washback
- Criterion-referenced scores
- Open-ended, creative answers
- Formative
- Oriented to process
- Interactive process
- Fosters intrinsic motivation

IMPORTANT:

It is difficult to draw a clear line of distinction between traditional and alternative assessment. Many forms of assessment fall in between the two, and some combine the best of both; one cannot simply declare one better than the other. More time and higher institutional budgets are required to administer and score assessments that presuppose more subjective evaluation, more individualization, and more interaction in the process of offering feedback. But the payoff of alternative assessment comes with more useful feedback to students, the potential for intrinsic motivation, and ultimately a more complete description of a student's ability.

Computer-Based Testing

Computer-based testing has been increasing in recent years. Some computer-based tests are small-scale; others are standardized, large-scale tests (e.g., TOEFL) in which thousands of test-takers are involved. A specific type of computer-based test, the Computer-Adaptive Test (CAT), has been available for many years. Its defining feature is that the questions get harder as the test-taker answers correctly and easier as the test-taker answers incorrectly. In a CAT, the test-taker sees only one question at a time, and the computer scores each question before selecting the next one; test-takers therefore cannot skip questions and, once they have entered and confirmed their answers, they cannot return to questions. (A minimal sketch of this adaptive loop follows the lists below.)

Advantages of computer-based testing:
- classroom-based testing
- self-directed testing on various aspects of a language (vocabulary, grammar, discourse, etc.)
- practice for upcoming high-stakes standardized tests
- some individualization, in the case of CATs
- electronic scoring for rapid reporting of results

Disadvantages of computer-based testing:
- lack of security and the possibility of cheating in unsupervised computerized tests
- home-grown quizzes that appear on unofficial websites may be mistaken for validated assessments
- open-ended responses are less likely to appear because of the need for human scorers
- the human interactive element is absent
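The adaptive mechanic just described can be sketched in a few lines. This is a toy illustration only, not the algorithm of TOEFL or any operational CAT (real CATs use item response theory); the item pool, difficulty scale, and step size are all invented.

```python
# Toy sketch of computer-adaptive item selection: pick the unused item
# whose difficulty is closest to the current ability estimate, then nudge
# the estimate up after a correct answer and down after a wrong one.
items = [  # (question_id, difficulty on an arbitrary -3..+3 scale)
    ("q1", -2.0), ("q2", -1.0), ("q3", 0.0), ("q4", 1.0), ("q5", 2.0),
]

def run_cat(answer_fn, n_questions=5, ability=0.0, step=0.5):
    remaining = dict(items)
    for _ in range(min(n_questions, len(items))):
        # one question at a time: the closest-difficulty unused item
        qid = min(remaining, key=lambda q: abs(remaining[q] - ability))
        difficulty = remaining.pop(qid)
        correct = answer_fn(qid)               # scored before the next pick
        ability += step if correct else -step  # harder if right, easier if wrong
        print(f"{qid} (difficulty {difficulty:+.1f}): "
              f"{'right' if correct else 'wrong'} -> ability {ability:+.1f}")
    return ability

# Simulate a learner who can handle items up to difficulty 0.5:
run_cat(lambda qid: dict(items)[qid] <= 0.5)
```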


IMPORTANT

Some argue that computer-based testing lacks the artful form of being tailored by teachers for their classrooms, of being designed to be performance-based, and of allowing a teacher-student dialogue to form the basis of assessment. But this need not be the case: computer technology can be a boon to communicative language testing, and by using technological innovations creatively, testers will be able to enhance authenticity, increase interactive exchange, and promote autonomy.

An Overall Summary

Tests: I hope you appreciate the place of testing in assessment. Assessment is an integral part of the teaching-learning cycle; in an interactive, communicative curriculum, assessment is almost constant. Tests, which are a subset of assessment, can provide authenticity, motivation, and feedback to the learner. Tests are essential components of a successful curriculum and one of several partners in the learning process.

Assessments: Periodic assessments, both formal and informal, can increase motivation by serving as milestones of student progress. Appropriate assessments aid in the reinforcement and retention of information. Assessments can confirm areas of strength and pinpoint areas needing further work; provide a sense of periodic closure to modules within a curriculum; promote student autonomy by encouraging students' self-evaluation of their progress; spur learners to set goals for themselves; and aid in evaluating teaching effectiveness.

- END OF CHAPTER 1 -


CHAPTER 1 EXERCISES

EXERCISE 1: Decide whether the following statements are TRUE or FALSE.
1. It's possible to create authentic and motivating assessment to offer constructive feedback to the learners. ----------
2. All tests should offer the test-takers some kind of measurement or result. ----------
3. Performance-based tests measure the test-taker's knowledge about the language. ----------
4. Tests are the best tools to assess students. ----------
5. Assessment and testing are synonymous terms. ----------
6. Teachers' incidental and unplanned comments and responses to the students are an example of formal assessment. ----------
7. Most of our classroom assessment is summative assessment. ----------
8. Formative assessment always points toward the future formation of learning. ----------
9. The distribution of students' scores across a continuum is a great concern in norm-referenced tests. ----------
10. Criterion-referenced testing has more instructional value than norm-referenced testing for classroom teachers. ----------

ANSWER KEY
1. TRUE
2. TRUE
3. FALSE (They are designed to test the actual use of the language, not knowledge about the language.)
4. FALSE (We cannot say they are the best; they are just one of many useful devices to assess students.)
5. FALSE (They are not.)
6. FALSE (They are informal assessment.)
7. FALSE (It's formative assessment.)
8. TRUE
9. TRUE
10. TRUE


CHAPTER 2 PRINCIPLES OF LANGUAGE ASSESSMENT


There are five criteria for "testing" a test:
1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback

1. PRACTICALITY

A practical test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient. Furthermore, for a test to be practical:
- administrative details should be clearly established before the test,
- students should be able to complete the test reasonably within the set time frame,
- the test should be able to be administered smoothly, without drowning anyone in procedure,
- all materials and equipment should be ready,
- the cost of the test should be within budgeted limits,
- the scoring/evaluation system should be feasible in the teacher's time frame, and
- methods for reporting results should be determined in advance.

2. RELIABILITY

A reliable test is consistent and dependable: if the same test is given to the same student at different times, it should yield similar results. The issue of reliability may best be addressed by considering the factors that can contribute to the unreliability of a test: fluctuations in the student (student-related reliability), in scoring (rater reliability), in test administration (test administration reliability), and in the test itself (test reliability).

Student-Related Reliability: Temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors may make an observed score deviate from one's "true" score. A test-taker's "test-wiseness," or strategies for efficient test taking, can also be included in this category.

Rater Reliability: Human error, subjectivity, lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases may enter into the scoring process.
- Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test.
- Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. One solution is to read through about half of the tests before rendering any final scores or grades, then recycle back through the whole set of tests to ensure an even-handed judgement. The careful specification of an analytical scoring instrument can increase rater reliability. (A simple consistency check is sketched after this reliability section.)

Test Administration Reliability: Unreliability may also result from the conditions in which the test is administered. Examples: street noise, photocopying variations, poor light, variations in temperature, and the condition of desks and chairs.

Test Reliability: Sometimes the nature of the test itself can cause measurement errors.
- Timed tests may discriminate against students who do not perform well under a time limit.
- Poorly written test items (ambiguous ones, or ones with more than one correct answer) may be a further source of test unreliability.
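Inter-rater consistency can be checked with very simple numbers. The sketch below (all scores invented) computes the exact-agreement rate and the average score gap between two raters; real reliability studies would add fuller statistics such as correlations or kappa.

```python
# Minimal inter-rater consistency check: the share of essays on which two
# raters award exactly the same score, plus their mean absolute score gap.
rater_a = [4, 3, 5, 2, 4, 3, 5, 4]   # hypothetical ratings on a 1-5 scale
rater_b = [4, 2, 5, 2, 3, 3, 5, 5]

pairs = list(zip(rater_a, rater_b))
agreement = sum(a == b for a, b in pairs) / len(pairs)
mean_gap = sum(abs(a - b) for a, b in pairs) / len(pairs)

print(f"exact agreement: {agreement:.0%}")   # 62% here
print(f"mean score gap:  {mean_gap:.2f}")    # 0.38 here
```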

3. VALIDITY

Validity is arguably the most important principle: the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment.

How is the validity of a test established? There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in support:
- In some cases it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested.
- In other cases we may be concerned with how well a test determines whether or not students have reached an established set of goals or level of competence.
- In still other cases it may be appropriate to study statistical correlations with other related but independent measures.
- Other concerns about a test's validity may focus on the consequences of a test, beyond measuring the criteria themselves, or even on the test-taker's perception of validity.

We will look at these five types of evidence below.


Content Validity: If a test requires the test-taker to perform the behaviour that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity. Example: if you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgements does not achieve content validity; a test that requires the learner actually to speak within some sort of authentic context does. Additionally, for content validity to be achieved, a test should meet the following conditions:
- Classroom objectives should be identified and appropriately framed. The first measure of an effective classroom test is the identification of objectives.
- Lesson objectives should be represented in the form of test specifications. In other words, a test should have a structure that follows logically from the lesson or unit being tested.

If you clearly perceive the performance of test-takers as reflective of the classroom objectives, then you can argue that content validity has probably been achieved. Another way of understanding content validity is to consider the difference between direct and indirect testing. Direct testing involves the test-taker in actually performing the target task; indirect testing involves the test-taker in performing not the target task itself but a task related to it in some way. Example: when you test learners' oral production of syllable stress, having them mark stressed syllables in a list of written words is indirect testing, whereas requiring them actually to produce the target words orally is direct testing. Consequently, direct testing is the most feasible way to achieve content validity in classroom assessment.

Criterion-related Validity: This examines the extent to which the criterion of the test has actually been reached, that is, how well the tested skill or knowledge has genuinely been mastered. For example, a classroom test designed to assess a point of grammar in communicative use will have criterion validity if test scores are corroborated either by observed subsequent behaviour or by other communicative measures of the grammar point in question (either the test-taker's real behaviour on the tested point is observed for consistency, or a different test of the same point is given and the two sets of results are checked for consistency). Criterion-related evidence usually falls into one of two categories:

Concurrent validity: A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language: success on the test should be mirrored in real use of the language.

Predictive validity: Here the assessment criterion is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success. Predictive validity becomes important in the case of placement tests, language aptitude tests, and the like (for example, using a placement test to form homogeneous groups in the hope of producing more successful classes).
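Such a criterion-related check is, at bottom, a correlation between test scores and an independent measure. A minimal sketch follows; the score pairs are invented, and it uses Python's statistics.correlation, which requires Python 3.10 or later.

```python
# Illustrative predictive-validity check: correlate placement-test scores
# with end-of-course grades. All numbers are invented for the example.
# statistics.correlation computes Pearson's r (Python 3.10+).
import statistics

placement = [35, 42, 50, 58, 63, 71, 77, 85]  # placement-test scores
final     = [55, 60, 58, 72, 70, 81, 79, 90]  # later course grades

r = statistics.correlation(placement, final)
print(f"predictive validity estimate: r = {r:.2f}")  # nearer 1.0 = stronger
```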

Construct Validity: Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been defined?" That is, does the test really have the properties needed to test the skill or content it claims to test?

Example 1: Imagine that you have been given a procedure for conducting an oral interview. The scoring analysis for the interview includes five factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in a theoretical construct that claims them to be major components of oral proficiency. So if you were asked to conduct an oral proficiency interview that evaluated only pronunciation and grammar, you could be justifiably suspicious about the construct validity of that test.

Example 2: Suppose you have created a simple written vocabulary quiz, covering the content of a recent unit, that asks students to correctly define a set of words. Your chosen items may be a perfectly adequate sample of what was covered in the unit, but if the lexical objective of the unit was the communicative use of vocabulary, then the writing of definitions certainly fails to match a construct of communicative language use.

Note: Large-scale standardized tests are often weak in terms of construct validity, because for the sake of practicality (both time and cost) they cannot measure all the language skills that ought to be measured. For example, the absence of an oral production section in the (earlier) TOEFL was a major obstacle to its construct validity.

Consequential Validity: Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use. McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions such as opportunities for coaching (private tutoring and special attention): only some families can afford coaching, and children with more highly educated parents can get help from their parents. Teachers should consider the effect of assessments on students' motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work.

Face Validity: Face validity refers to the degree to which a test "looks right," and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgement of the test-takers. Face validity means that the students perceive the test to be valid; it asks, "Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?" Face validity is not something that can be empirically tested by a teacher or even by a testing expert; it depends on the subjective evaluation of the test-taker. A classroom test is not the time to introduce new tasks: if a test samples the actual content of what the learner has achieved or expects to achieve, face validity is more likely to be perceived. Content validity is a very important ingredient in achieving face validity. Students will generally judge a test to be face valid if:
- directions are clear,
- the structure of the test is organized logically,
- its difficulty level is appropriately pitched,
- the test has no surprises, and
- timing is appropriate.

To give an assessment procedure that is "biased for best," a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.

4. AUTHENTICITY

In an authentic test:
- the language is as natural as possible,
- items are as contextualized as possible,
- topics and situations are interesting, enjoyable, and/or humorous,
- some thematic organization is provided, such as through a story line or episode, and
- tasks represent real-world tasks.

Furthermore:
- Reading passages are selected from real-world sources that test-takers are likely to have encountered or will encounter.
- Listening comprehension sections feature natural language with hesitations, white noise, and interruptions.
- More and more tests offer items that are episodic, in that they are sequenced to form meaningful units, paragraphs, or stories.

5. WASHBACK

Washback includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects, because the teacher is usually providing interactive feedback. (Interim quizzes given before formal exams, which let students pull their performance together, have a washback effect.) Formal tests can also have positive washback, but they provide no washback if students receive only a simple letter grade or a single overall numerical score.

Classroom tests should serve as learning devices through which washback is achieved. Students' incorrect responses can become windows of insight into further work; their correct responses need to be praised, especially when they represent accomplishments in a student's interlanguage. Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others. One way to enhance washback is to comment generously and specifically on test performance. Washback also implies that students have ready access to the teacher to discuss the feedback and evaluation he or she has given, and teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.

What is washback?
- In general terms: the effect of testing on teaching and learning.
- In large-scale assessment: the effects that tests have on instruction in terms of how students prepare for the test.
- In classroom assessment: the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses.

What does washback enhance?
- Intrinsic motivation
- Autonomy
- Self-confidence
- Language ego
- Interlanguage
- Strategic investment

What should teachers do to enhance washback?
- Comment generously and specifically on test performance
- Respond to as many details as possible
- Praise strengths
- Criticize weaknesses constructively
- Give strategic hints to improve performance

- END OF CHAPTER 2 -


CHAPTER 2 EXERCISES

EXERCISE 1: Decide whether the following statements are TRUE or FALSE.
1. An expensive test is not practical.
2. One of the sources of unreliability of a test is the school.
3. Students, raters, the test, and the administration of it may affect the test's reliability.
4. In indirect tests, students do not actually perform the task.
5. If students are aware of what is being tested when they take a test, and think that the questions are appropriate, the test has face validity.
6. Face validity can be tested empirically.
7. Diagnosing strengths and weaknesses of students in language learning is a facet of washback.
8. One way of achieving authenticity in testing is to use simplified language.

EXERCISE 2: Decide which type of validity each sentence relates to.
a) Content validity  b) Criterion-related validity  c) Construct validity  d) Consequential validity  e) Face validity
1. It is based on subjective judgment. ----------
2. It questions the accuracy of measuring the intended criteria. ----------
3. It appears to measure the knowledge and abilities it claims to measure. ----------
4. It measures whether the test meets the classroom objectives. ----------
5. It requires the test to be based on a theoretical background. ----------
6. Washback is part of it. ----------
7. It requires the test-taker to perform the behavior being measured. ----------
8. The students (test-takers) think they are given enough time to do the test. ----------
9. It assesses a test-taker's likelihood of future success (e.g., placement tests). ----------
10. The students' psychological mood may affect it negatively or positively. ----------
11. It includes consideration of the test's effect on the learner. ----------
12. Items of the test do not seem to be complicated. ----------
13. The test covers the objectives of the course. ----------
14. The test has clear directions. ----------

EXERCISE 3: Decide which type of reliability each sentence could be related to.
a) Student-related reliability  b) Rater reliability  c) Test administration reliability  d) Test reliability
1. There are ambiguous items.
2. The student is anxious.
3. The tape is of bad quality.
4. The teacher is tired but continues scoring.
5. The test is too long.
6. The room is dark.
7. The student has had an argument with the teacher.
8. The scorers interpret the criteria differently.
9. There is a lot of noise outside the building.

ANSWER KEY

EXERCISE 1:
1. TRUE
2. FALSE
3. TRUE
4. TRUE
5. TRUE
6. FALSE
7. TRUE
8. FALSE

EXERCISE 2:
1. Face validity
2. Consequential validity
3. Face validity
4. Content validity
5. Construct validity
6. Content validity
7. Criterion-related validity
8. Face validity
9. Criterion-related validity
10. Consequential validity
11. Consequential validity
12. Face validity
13. Content validity
14. Face validity

EXERCISE 3:
1. Test reliability
2. Student-related reliability
3. Test administration reliability
4. Rater reliability
5. Test reliability
6. Test administration reliability
7. Student-related reliability
8. Rater reliability
9. Test administration reliability


CHAPTER 3 DESIGNING CLASSROOM LANGUAGE TESTS


In this chapter we will examine test types and learn how to design tests and revise existing ones. To start the process of designing tests, we ask some critical questions. The following five questions should form the basis of your approach to designing tests for your classroom.

Question 1: What is the purpose of the test? Why am I creating this test?
- For an evaluation of overall proficiency? (proficiency test)
- To place students into a course? (placement test)
- To measure achievement within a course? (achievement test)

Once you have established the major purpose of a test, you can determine its objectives.

Question 2: What are the objectives of the test? What specifically am I trying to find out? What language abilities are to be assessed?

Question 3: How will the test specifications reflect both the purpose and the objectives? When a test is designed, the objectives should be incorporated into a structure that appropriately weights the various competencies being assessed.

Question 4: How will the test tasks be selected and the separate items arranged? The tasks need to be practical. They should also achieve content validity by mirroring the tasks of the course being assessed, and they should be able to be evaluated reliably by the teacher or scorer. The tasks themselves should strive for authenticity, and the progression of tasks ought to be biased for best performance.

Question 5: What kind of scoring, grading, and/or feedback is expected? Tests vary in the form and function of feedback, depending on their purpose. For every test, the way results are reported is an important consideration. Under some circumstances a letter grade or a holistic score may be appropriate; other circumstances may require that a teacher offer substantive washback to the learner.

TEST TYPES

Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. Below are the test types to be examined:
1. Language Aptitude Tests
2. Proficiency Tests
3. Placement Tests
4. Diagnostic Tests
5. Achievement Tests

1. Language Aptitude Tests

These predict a person's success prior to exposure to the second language. A language aptitude test is designed to measure general capacity or ability to learn a foreign language; such tests are ostensibly designed to apply to the classroom learning of any language. Two standardized aptitude tests have been used in the US: the Modern Language Aptitude Test (MLAT) and the Pimsleur Language Aptitude Battery (PLAB). Tasks in the MLAT include number learning, phonetic script, spelling clues, words in sentences, and paired associates. There is no unequivocal evidence that language aptitude tests predict communicative success in a language: although their results tend to be broadly consistent with learners' progress, they cannot be treated as absolute. Any test that claims to predict success in learning a language is undoubtedly flawed, because we now know that, with appropriate self-knowledge and active strategic involvement in learning, virtually everyone can eventually succeed.

2. Proficiency Tests

A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. It typically includes standardized multiple-choice items on grammar, vocabulary, reading comprehension, and aural comprehension; sometimes a sample of writing is added, and more recent tests also include oral production. Such tests often have content validity weaknesses. Proficiency tests are almost always summative and norm-referenced, and they are usually not equipped to provide diagnostic feedback: their role is to accept or deny someone's passage into the next stage of a journey. TOEFL is a typical standardized proficiency test. Creating such tests and validating them with research is a time-consuming and costly process, so choosing one of the commercially available proficiency tests is a far more practical option for classroom teachers.

3. Placement Tests

The ultimate objective of a placement test is to place a student correctly into a course or level. Certain proficiency tests can act in the role of placement tests. A placement test usually includes a sampling of the material to be covered in the various courses in a curriculum.

In a placement test, a student should find the test material neither too easy nor too difficult, but appropriately challenging. The English as a Second Language Placement Test (ESLPT) at San Francisco State University has three parts. In Part 1, students read a short article and then write a summary essay. In Part 2, students write a composition in response to an article. Part 3 is multiple-choice: students read an essay and identify grammar errors in it. The ESLPT is more authentic but less practical, because human evaluators are required for the first two parts. Reliability problems are also present, but they are mitigated by conscientious training of all evaluators of the test. What is lost in practicality and reliability is gained in the diagnostic information that the ESLPT provides.

4. Diagnostic Tests

A diagnostic test is designed to diagnose specified aspects of a language. It can help a student become aware of errors and encourage the adoption of appropriate compensatory strategies. A test of pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually such tests offer a checklist of features for the administrator to use in pinpointing difficulties. Another example: a writing diagnostic elicits a writing sample from students that allows the teacher to identify the rhetorical and linguistic features on which the course needs to focus special attention.

A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. In the test, test-takers are directed to read a 150-word passage while they are tape-recorded. The test administrator then refers to an inventory of phonological items for analyzing a learner's production. After multiple listenings, the administrator produces a checklist of errors in five separate categories:
- stress and rhythm,
- intonation,
- vowels,
- consonants, and
- other factors.

This information can help the teacher make decisions about which aspects of English phonology to address.

5. Achievement Tests

An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests should be limited to the particular material addressed in a curriculum within a particular time frame, and should be offered after a course has focused on the objectives in question. There is a fine line of difference between a diagnostic test and an achievement test: achievement tests analyze the extent to which students have acquired language features that have already been taught (an analysis of the past), whereas diagnostic tests elicit information on what students need to work on in the future.

The primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction. Achievement tests are often summative, because they are administered at the end of a unit or term of study, but effective achievement tests can also serve as useful washback by showing students their errors and helping them analyse their weaknesses and strengths.

Achievement tests range from five- or ten-minute quizzes to three-hour final examinations, with an almost infinite variety of item types and formats.

IMPORTANT: New and innovative testing formats take a lot of effort to design and a long time to refine through trial and error. Traditional testing techniques can, with a little creativity, conform to the spirit of an interactive, communicative language curriculum. Your best tack as a new teacher is to work within the guidelines of accepted, known, traditional testing techniques; slowly, with experience, you can get bolder in your attempts. In that spirit, let us consider some practical steps in constructing classroom tests.

A) Assessing Clear, Unambiguous Objectives

Before giving a test, examine the objectives for the unit you are testing. Your first task in designing a test is to determine appropriate objectives. An objective such as "tag questions," or "Students will learn tag questions," is inadequate because it is not testable: it does not state whether the tag questions are to be spoken or written, nor whether the context will be a conversation, an essay, or an academic lecture. A properly stated objective would be: "Students will recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations." (For more, see the original book, p. 50.)

B) Drawing Up Test Specifications

Test specifications simply comprise (a) a broad outline of the test, (b) the skills you will test, and (c) what the items will look like. Below is an example of test specifications based on the objective stated above ("Students will recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations."):

Speaking (5 minutes per person, previous day)
Format: oral interview, T and S
Task: T asks questions of S

Listening (10 minutes)
Format: T makes an audiotape in advance, with one other voice on it
Tasks: a. 5 minimal-pair items, multiple-choice
       b. 5 interpretation items, multiple-choice

Reading (10 minutes)
Format: cloze test items (10 total) in a story line
Task: fill in the blanks

Writing (10 minutes)
Format: prompt for a topic: why I liked/didn't like a recent TV sitcom
Task: writing a short opinion paragraph

These informal, classroom-oriented specifications give you an indication of:
- the topics (objectives) you will cover
- the implied elicitation and response formats for items
- the number of items in each section
- the time to be allocated for each

C) Devising Test Tasks

As you devise your test items, consider such factors as how students will perceive them (face validity), the extent to which authentic language and contexts are present, and the potential difficulty caused by cultural schemata. In revising your draft, you should ask yourself some important questions:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section?
3. Does each item measure a specified objective?
4. Is each item stated in clear, simple language?
5. Does each multiple-choice item have appropriate distractors; that is, are the wrong choices clearly wrong and yet sufficiently alluring that they aren't ridiculously easy?
6. Is the difficulty of each item appropriate for your students?
7. Is the language of each item sufficiently authentic?
8. Do the sum of the items and the test as a whole adequately reflect the learning objectives?

In the final revision of your test, imagine that you are a student taking it:
- go through each set of directions and all items slowly and deliberately, and time yourself
- if the test should be shortened or lengthened, make the necessary adjustments
- make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction
- if there is an audio component, make sure that the script is clear, that your voice and any other voices are clear, and that the equipment is in working order before starting the test

D) Designing Multiple-Choice Test Items

There are a number of weaknesses in multiple-choice items:
- The technique tests only recognition knowledge.
- Guessing may have a considerable effect on test scores.
- The technique severely restricts what can be tested.
- It is very difficult to write successful items.
- Washback may be harmful. (Thoughts like "I can guess the answer anyway; even a blind guess may hit" can create negative washback.)
- Cheating may be facilitated.

These informal classroom-oriented specifications give you an indication of the topics(objectives) you will recover the implied elicitation and response formats for items the number of items in each section the time to be allocated for each C) Devising Test Tasks As you devise your test items, consider such factors as how students will perceive them(face validity) the extent to which authentic language and contexts are present potential difficulty caused by cultural schemata In revising your draft, you should ask yourself some important questions: 1. 2. 3. 4. 5. Are the directions to each section absolutely clear? Is there an example item for each section? Does each item measure a specified objective? Is each item stated in clear, simple language? Does each multiple choice have appropriate distractors; that is, are the wrong items clearly wrong and yet sufficiently alluring that they arent ridiculously easy? 6. Is the difficulty of each item appropriate for your students? 7. Is the language of each item sufficiently authentic? 8. Do the sum of the items and the test as a whole adequately reflect the learning objectives? In the final revision of your test, imagine that you are a student taking the test go through each set of directions and all items slowly and deliberately. Time yourself if the test should be shortened or lengthened, make the necessary adjustments make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction if there is an audio component, make sure that the script is clear, that your voice and any other voices are clear, and that the equipment is in working order before starting the test. D) Designing Multiple-Choice Test Items Therere a number of weaknesses in multiple-choice items: The technique tests only recognition knowledge. Guessing may have a considerable effect on test scores. The technique severely restricts what can be tested. It is very difficult to write successful items. Washback may be harmful. (Nasl olsa cevab tahmin ederim. Atsam bile tutar gibi dncelerle negatif bir washback olabilir.) Cheating may be facilitated.

However, two principles stand out in support of multiple-choice formats: practicality and reliability.

Some important terms for multiple-choice items: Multiple-choice items are receptive, or selective; that is, the test-taker chooses from a set of responses rather than creating a response. (Other receptive item types include true-false questions and matching lists.) Every multiple-choice item has a stem, which presents several options or alternatives to choose from (usually between three and five). One of those options, the key, is the correct response, while the others serve as distractors.

IMPORTANT!!! Consider the following four guidelines for designing multiple-choice items for both classroom-based and large-scale situations:

1. Design each item to measure a specific objective. (For example, do not measure knowledge of modals and of articles in the same item; see p. 56.)

2. State both stem and options as simply and directly as possible. Do not use superfluous words; another rule of succinctness is to remove needless redundancy from your options. (See p. 57.)

3. Make certain that the intended answer is clearly the only correct one. Eliminating unintended possible answers is often the most difficult problem in designing multiple-choice items: with only a minimum of context in each stem, a wide variety of responses may be perceived as correct.

4. Use item indices to accept, discard, or revise items. The appropriate selection and arrangement of suitable multiple-choice items on a test can best be accomplished by measuring items against three indices:
a) item facility (IF), or item difficulty,
b) item discrimination (ID), or item differentiation, and
c) distractor analysis.

a) Item facility (IF) is the extent to which an item is easy or difficult for the proposed group of test-takers. Items that are far too easy or far too hard do not help us distinguish the strongest students from the weakest, which is why IF is an important parameter. If 13 out of 20 students answer an item correctly, IF = 13/20 = 0.65 (65%). Although there is no fixed rule about what the value should be, roughly 15%-85% can be considered an acceptable range. Note: Two good reasons for occasionally including a very easy item (85% or higher) are to build in some affective feelings of success among lower-ability students and to serve as warm-up items; and very difficult items can provide a challenge to the highest-ability students.
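The IF computation above is a single ratio. A tiny sketch follows; the function name is ours, and the threshold check reflects the rough 15%-85% range mentioned above, not a hard rule.

```python
# Item facility (IF): the proportion of test-takers answering an item
# correctly. Numbers follow the 13-of-20 example in the text.
def item_facility(correct: int, total: int) -> float:
    return correct / total

if_value = item_facility(13, 20)
print(f"IF = {if_value:.2f}")  # 0.65
if not 0.15 <= if_value <= 0.85:
    print("outside the rough 15%-85% range: review this item")
```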

b) Item discrimination (ID) is the extent to which an item differentiates between high- and low-ability test-takers. An item on which high-ability students and low-ability students score equally well has poor ID, because it does not discriminate between the two groups. An item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power.

Example: rank 30 students from highest to lowest total score and divide them into three equal groups. Suppose the 10 highest scorers and the 10 lowest scorers answered one item as follows:

                                   Correct   Incorrect
High-ability students (top 10)        7          3
Low-ability students (bottom 10)      2          8

ID = (7 - 2) / 10 = 0.50

The result tells us that the item has a moderate level of ID. A high discriminating level would approach 1.0, and no discriminating power at all would be 0. In most cases, you would want to discard an item that scored near zero. As with IF, no absolute rule governs the establishment of acceptable and unacceptable ID indices.

c) Distractor efficiency (DE) is the extent to which the distractors lure a sufficient number of test-takers, especially lower-ability ones, and the extent to which those responses are somewhat evenly distributed across all distractors. Example:

Choices                       A    B    C*   D    E
High-ability students (10)    0    1    7    0    2
Low-ability students (10)     3    5    2    0    0

*Note: C is the correct response. The item might be improved in two ways:
a) Distractor D doesn't fool anyone. Therefore it probably has no utility. A revision might provide a distractor that actually attracts a response or two.
b) Distractor E attracts more responses (2) from the high-ability group than from the low-ability group (0). Why are good students choosing this one? Perhaps it includes a subtle reference that entices the high group but is over the head of the low group, and therefore the latter students don't even consider it.
The other two distractors (A and B) seem to be fulfilling their function of attracting some attention from the lower-ability students.
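To make these two indices concrete, here is a minimal Python sketch that reproduces the worked examples above; the function name and the tallies are illustrative assumptions, not a standard API:

def item_discrimination(high_correct, low_correct, group_size):
    # ID = (correct in high group - correct in low group) / size of one group
    return (high_correct - low_correct) / group_size

print(item_discrimination(7, 2, 10))  # 0.5, a moderate level of ID

# Distractor analysis: responses per option in each ability group.
high = {"A": 0, "B": 1, "C": 7, "D": 0, "E": 2}
low = {"A": 3, "B": 5, "C": 2, "D": 0, "E": 0}
key = "C"
for option in sorted(high):
    if option == key:
        continue
    if high[option] + low[option] == 0:
        print(option, "lures no one; revise or replace it")              # D
    elif high[option] > low[option]:
        print(option, "attracts more high scorers; check its wording")   # E
    else:
        print(option, "is doing its job as a distractor")                # A, B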


SCORING, GRADING AND GIVING FEEDBACK

A) Scoring
As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. (Whichever skill the lesson objectives emphasize most should receive the most points.) If the lesson objectives or curriculum concentrate on listening and speaking and place less emphasis on reading and writing, the score distribution might be: oral production 30%, listening 30%, reading 20%, and writing 20%.
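A rough sketch of such a weighted scoring plan in Python (the weights are the hypothetical distribution just mentioned; the section scores are invented for illustration):

# Weights for a curriculum that emphasizes oral skills.
weights = {"oral": 0.30, "listening": 0.30, "reading": 0.20, "writing": 0.20}

# One student's percentage score on each section (invented numbers).
scores = {"oral": 80, "listening": 90, "reading": 70, "writing": 60}

total = sum(weights[s] * scores[s] for s in weights)
print(total)  # 77.0, the weighted overall score out of 100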

B) Grading
Grading doesn't mean just giving an A for 90-100 and a B for 80-89. It's not that simple. How you assign letter grades to a test is a product of the country, culture, and context of the English classroom; institutional expectations (most of them unwritten); explicit and implicit definitions of grades that you have set forth; the relationship you have established with the class; and student expectations that have been engendered in previous tests and quizzes in the class.

C) Giving Feedback
Feedback should become beneficial washback. Here are some examples of feedback:
1. a letter grade
2. a total score
3. four subscores (speaking, listening, reading, writing)
4. for the listening and reading sections
   a. an indication of correct/incorrect responses
   b. marginal comments
5. for the oral interview
   a. scores for each element being rated
   b. a checklist of areas needing work
   c. oral feedback after the interview
   d. a post-interview conference to go over the results
6. on the essay
   a. scores for each element being rated
   b. a checklist of areas needing work
   c. marginal and end-of-essay comments, suggestions
   d. a post-test conference to go over work
   e. a self-assessment
7. on all or selected parts of the test, peer checking of results
8. a whole-class discussion of results of the test
9. individual conferences with each student to review the whole test

Options 1 and 2 give virtually no feedback. The feedback they present does not become washback. Option 3 gives a student a chance to see the relative strength of each skill area and so becomes minimally useful. Options 4, 5, and 6 represent the kind of response a teacher can give that approaches maximum feedback.

- END OF CHAPTER 3 -


CHAPTER 3 EXERCISES

EXERCISE 1: Decide whether the following statements are TRUE or FALSE.
1. A language aptitude test measures a learner's future success in learning a foreign language.
2. Language aptitude tests are very common today.
3. A proficiency test is limited to a particular course or curriculum.
4. The aim of a placement test is to place a student into a particular level.
5. Placement tests have many varieties.
6. Any placement test can be used in any teaching program.
7. Achievement tests are related to classroom lessons, units, or curriculum.
8. A five-minute quiz can be an achievement test.
9. The first task in designing a test is to determine test specifications.

EXERCISE 2: Decide whether the following statements are TRUE or FALSE.
1. It is very easy to develop multiple-choice tests.
2. Multiple-choice tests are practical but not reliable.
3. Multiple-choice tests are time-saving in terms of scoring and grading.
4. Multiple-choice items are receptive.
5. Each multiple-choice item in a test should measure a specific objective.
6. The stem of a multiple-choice item should be as long as possible in order to help students to understand the context.
7. If the Item Facility value of a multiple-choice item is .10 (10%), it means the item is very easy.
8. The item discrimination index differentiates between high- and low-ability students.

ANSWER KEY

EXERCISE 1:
1. TRUE
2. FALSE
3. FALSE
4. TRUE
5. TRUE
6. FALSE (Not all placement tests suit every teaching program.)
7. TRUE
8. FALSE
9. FALSE (The first task is to determine appropriate objectives.)

EXERCISE 2:
1. FALSE (It seems easy, but it is not very easy.)
2. FALSE (They can be both practical and reliable.)
3. TRUE
4. TRUE
5. TRUE
6. FALSE (It should be short and to the point.)
7. FALSE (An item with an IF value of .10 is a very difficult one.)
8. TRUE


LANGUAGE ASSESSMENT
FIRST MID-TERM EXAM SAMPLE QUESTIONS
CHAPTERS 1, 2, 3


1. Which of the following is false about tests?
A) They should be positive experiences.
B) They should build a person's confidence and become learning experiences.
C) They should be natural.
D) They should bring out the best in students.
E) They should serve as anxiety-provoking threats.

2. Which one of the following isn't one of the characteristics of tests?
A) Tests are usually time-constrained and draw on a limited sample of behaviour.
B) Tests are a subset of assessment.
C) When tested, learners know that their performance is being measured and evaluated.
D) Tests are prepared administrative procedures that occur at identifiable times in a curriculum.
E) Testing is more extended than assessment and it includes a lot more components.

3. I. It is an ongoing process that encompasses a wide domain.
II. Assessment includes teaching.
III. A good teacher doesn't assess students all the time.
IV. Assessments are incidental all the time. They can never be intended.
Which of the above is false about assessment?
A) II, III and IV
B) I, II and III
C) II and IV
D) All of the above
E) None of the above

4. Teachers,
I. make marginal comments on papers,
II. respond to a draft of an essay, advise on how to better pronounce a word,
III. suggest a strategy for compensating for a reading difficulty,
IV. show how to modify a student's note-taking to better remember the content of a lecture.
Which one of the following is described above?
A) Cloze test
B) Informal assessment
C) Performance-based assessment
D) Formal assessment
E) Summative assessment

5. Which one of the following is true about summative assessment?
A) Virtually all kinds of informal assessment are summative.
B) It aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course.
C) Giving a student a comment or a suggestion, and calling attention to an error, are examples of summative assessment.
D) Classroom tasks are designed to elicit performance without recording results.
E) It aims at the future development of the students.


6. I. They are designed to give test-takers feedback, usually in the form of grades, on specific course or lesson objectives.
II. Tests that involve the students in only one class, and are connected to a curriculum, are typical of criterion-referenced tests.
III. Much time and effort on the part of the teacher are required to deliver useful, appropriate feedback to students.
IV. The distribution of students' scores across a continuum may be of little concern as long as the instrument assesses appropriate objectives.
Which of the above is true about criterion-referenced tests?
A) I, II and IV
B) II and III
C) III and IV
D) None of the above
E) All of the above

7. In ----, language can be broken down into its component parts and those parts can be tested successfully.
A) Norm-referenced tests
B) Formative tests
C) Discrete-point testing
D) Communicative testing
E) Summative tests

8. ---- suggests an indivisible view of language proficiency.
A) Formal assessment
B) Teaching
C) Traditional assessment
D) Unitary trait hypothesis
E) Computer-based testing

9. Which of the following is/are not among the disadvantages of computer-based testing?
A) Lack of security and the possibility of cheating in unsupervised computerized tests.
B) Scores are electronically evaluated for rapid reporting of results.
C) Home-grown quizzes that appear on unofficial websites may be mistaken for validated assessments.
D) Open-ended responses are less likely to appear because of the need for human scorers.
E) The human interactive element is absent.

10. Which one of the following is false about assessment?
A) Periodic assessments, both formal and informal, can increase motivation by serving as milestones of student progress.
B) Appropriate assessments aid in the reinforcement and retention of information.
C) Assessments can confirm areas of strength and pinpoint areas needing further work.
D) Assessments can aid in evaluating teaching effectiveness.
E) Assessments prevent student autonomy by discouraging students' self-evaluation of their progress.


11. For a test to be ----,
- administrative details should clearly be established before the test,
- students should be able to complete the test reasonably within the set time frame,
- the test should be able to be administered smoothly,
- all materials and equipment should be ready,
- the cost of the test should be within budgeted limits,
- the scoring/evaluation system should be feasible in the teacher's time frame,
- methods for reporting results should be determined in advance.
A) reliable
B) sustainable
C) authentic
D) valid
E) practical

12. If there's poor lighting, then the test will have ----.
A) test unreliability
B) test administration unreliability
C) rater unreliability
D) student-related unreliability
E) intra-rater unreliability

13. When you test learners' oral production of syllable stress, if you have them mark stressed syllables in a list of written words, this will be a(n) ----.
A) direct testing
B) summative testing
C) indirect testing
D) concurrent validity
E) predictive validity

14. If lesson objectives are effectively represented in the test, then the test will achieve ----.
A) construct validity
B) consequential validity
C) face validity
D) content validity
E) washback

15. If a test has all the components it needs to have, it has ----.
A) construct validity
B) consequential validity
C) face validity
D) content validity
E) washback

16. ---- refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the test-takers.
A) construct validity
B) consequential validity
C) face validity
D) content validity
E) washback

17. I. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback.
II. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.
III. One way to enhance washback is to comment generously and specifically on test performance.
IV. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.
Which of the above is true about washback?
A) III and IV
B) Only III
C) II and IV
D) All of the above
E) None of the above

18. Which one of the following isn't enhanced through washback?
A) Intrinsic motivation
B) Lack of confidence
C) Autonomy
D) Strategic investment
E) Language ego


19. Which of the following is false for a placement test?
A) In a placement test, a student should find the test material neither too easy nor too difficult but appropriately challenging.
B) A placement test is designed to measure capacity or general ability to learn a foreign language.
C) Certain proficiency tests can act in the role of placement tests.
D) A placement test usually includes a sampling of the material to be covered in the various courses in a curriculum.
E) The ultimate objective of a placement test is to correctly place a student into a course or level.

20. A(n) ---- is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability.
A) diagnostic test
B) placement test
C) proficiency test
D) achievement test
E) language aptitude test

21. ---- analyze the extent to which students have acquired language features that have already been taught. ---- should elicit information on what students need to work on in the future.
A) Achievement tests / Diagnostic tests
B) Proficiency tests / Placement tests
C) Diagnostic tests / Language aptitude tests
D) Achievement tests / Proficiency tests
E) Language aptitude tests / Achievement tests

22. Which of the following is false for multiple-choice tests?
A) The technique tests only recognition knowledge.
B) Guessing may have a considerable effect on test scores.
C) The technique severely restricts what can be tested.
D) It is very difficult to write successful items.
E) Washback can't be harmful.

23. Which of the following can't be considered as a guideline for designing multiple-choice items?
A) Use item indices to accept, discard, or revise items.
B) Design each item to measure a specific objective.
C) Make certain that the intended answer is clearly the only correct one.
D) State both stem and options as simply and directly as possible.
E) Include some options which have no utility.

ANSWER KEY
1.E  2.E  3.A  4.B  5.B  6.E  7.C  8.D  9.B  10.E  11.E  12.B
13.C  14.D  15.A  16.C  17.D  18.B  19.B  20.C  21.A  22.E  23.E
