
Eskişehir / YESDİL

CHAPTER 2 PRINCIPLES OF LANGUAGE ASSESSMENT


There are five criteria for evaluating a test:

1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback

1. PRACTICALITY

A practical test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient. Furthermore, for a test to be practical:

o administrative details should be clearly established before the test,
o students should be able to complete the test reasonably within the set time frame,
o the test should be able to be administered smoothly (without bogging students down in procedure),
o all materials and equipment should be ready,
o the cost of the test should be within budgeted limits,
o the scoring/evaluation system should be feasible in the teacher's time frame,
o methods for reporting results should be determined in advance.

2. RELIABILITY

A reliable test is consistent and dependable. (The same test, given to the same student at different times, should yield the same results.) The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to its unreliability. Consider the following possibilities: fluctuations in the student (student-related reliability), in scoring (rater reliability), in test administration (test administration reliability), and in the test itself (test reliability).

Student-Related Reliability: Temporary illness, fatigue, a bad day, anxiety, and other physical or psychological factors may make an observed score deviate from one's true score. A test-taker's "test-wiseness," or strategies for efficient test taking, can also be included in this category.

www.yesdil.com


Rater Reliability: Human error, subjectivity, lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test. Intra-rater unreliability is a common occurrence for classroom teachers because of unclear scoring criteria, fatigue, bias toward particular "good" and "bad" students, or simple carelessness. One solution to such intra-rater unreliability is to read through about half of the tests before rendering any final scores or grades, then to recycle back through the whole set of tests to ensure an even-handed judgement. The careful specification of an analytical scoring instrument can increase rater reliability.
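Inter-rater consistency can also be quantified. The sketch below (an illustration added to these notes, not part of the original chapter; the rater scores are invented example data) computes a simple Pearson correlation between two raters' scores for the same set of essays: a coefficient near 1.0 suggests consistent raters, while a low coefficient signals inter-rater unreliability.

```python
# Illustrative sketch: quantifying inter-rater reliability with a
# Pearson correlation between two raters' scores.
# The score lists below are invented example data.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Scores given by two raters to the same ten essays (0-100 scale).
rater_a = [78, 85, 62, 90, 71, 55, 88, 67, 74, 81]
rater_b = [75, 88, 60, 92, 70, 58, 85, 70, 72, 80]

r = pearson(rater_a, rater_b)
print(f"Inter-rater correlation: {r:.2f}")  # a value near 1.0 = consistent raters
```

In practice, testing programs often use more refined agreement statistics (e.g., Cohen's kappa for categorical ratings), but a simple correlation already makes the idea of "two scorers yielding consistent scores" concrete.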

Test Administration Reliability: Unreliability may also result from the conditions in which the test is administered. Examples: street noise, photocopying variations, poor lighting, variations in temperature, and the condition of desks and chairs.

Test Reliability: Sometimes the nature of the test itself can cause measurement errors.

o Timed tests may discriminate against students who do not perform well on a test with a time limit.
o Poorly written test items (those that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.

3. VALIDITY

Arguably, validity is the most important principle: in classroom terms, the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons. How is the validity of a test established? There is no final, absolute measure of validity, but several different kinds of evidence may be invoked in its support. In some cases it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested. In other cases we may be concerned with how well a test determines whether or not students have reached an established set of goals or a level of competence. In still other cases it may be appropriate to study statistical correlations with other related but independent measures. Other concerns about a test's validity may focus on its consequences, beyond the measurement of the criteria themselves,



or even on the test-taker's perception of validity. We will look at these five types of evidence below.

Content Validity: If a test requires the test-taker to perform the behaviour that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity. Example: If you are trying to assess a person's ability to speak a second language in a conversational setting, asking the learner to answer paper-and-pencil multiple-choice questions requiring grammatical judgements does not achieve content validity. In contrast, a test that requires the learner actually to speak within some sort of authentic context does. Additionally, for content validity to be achieved, a test should satisfy the following conditions:

o Classroom objectives should be identified and appropriately framed. The first measure of an effective classroom test is the identification of objectives.
o Lesson objectives should be represented in the form of test specifications. In other words, a test should have a structure that follows logically from the lesson or unit you are testing.

If you clearly perceive the performance of test-takers as reflective of the classroom objectives, then you can argue that content validity has probably been achieved. Another way of understanding content validity is to consider the difference between direct and indirect testing. Direct testing involves the test-taker in actually performing the target task. Indirect testing involves the test-taker in performing not the target task itself, but a task that is related to it in some way. Example: When you test learners' oral production of syllable stress, having them mark stressed syllables in a list of written words is indirect testing; requiring them actually to produce the target words orally is direct testing. Consequently, it can be said that direct testing is the most feasible way to achieve content validity in classroom assessment.

Criterion-related Validity: This examines the extent to which the criterion of the test has actually been reached, i.e., how well the tested skill, topic, or knowledge has in fact been mastered. For example, a classroom test designed to assess a point of grammar in communicative use will have criterion validity if the test scores are corroborated either by observed subsequent behaviour or by other communicative measures of the grammar point in question.



(Either the consistency of the test-taker's behaviour in the area tested is checked through observation, or the test-taker is given a different test on the same material and the two sets of results are checked for consistency.) Criterion-related evidence usually falls into one of two categories:

o Concurrent validity: A test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. For example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language. (The success achieved on the test can be reflected in real use of the language.)

o Predictive validity: Here the assessment criterion is not to measure concurrent ability but to assess (and predict) a test-taker's likelihood of future success. Predictive validity becomes important, for example, in the case of placement tests, language aptitude tests, and the like. (For instance, forming homogeneous groups with a placement test in order to obtain more successful classes.)

Construct Validity: Virtually every issue in language learning and teaching involves theoretical constructs. In the field of assessment, construct validity asks, "Does this test actually tap into the theoretical construct as it has been defined?" (That is, does the test really have the structural features needed to test the subject or skill I want to test?)

Example 1: Imagine that you have been given a procedure for conducting an oral interview. The scoring analysis for the interview includes several factors in the final score: pronunciation, fluency, grammatical accuracy, vocabulary use, and sociolinguistic appropriateness. The justification for these five factors lies in a theoretical construct that claims them to be major components of oral proficiency. So if you were asked to conduct an oral proficiency interview that evaluated only pronunciation and grammar, you could be justifiably suspicious about the construct validity of that test.

Example 2: Let's suppose you've created a simple written vocabulary quiz, covering the content of a recent unit, that asks students to correctly define a set of words. Your chosen items may be a perfectly adequate sample of what was covered in the unit, but if the lexical objective of the unit was the communicative use of vocabulary, then the writing of definitions certainly fails to match a construct of communicative language use.

Note: Large-scale standardized tests tend to fall short in terms of construct validity, because for the sake of practicality (both time and cost) they cannot measure all the language skills that should be measured. For example, the absence of an oral production section in the TOEFL stands out as a major obstacle to construct validity.



Consequential Validity: Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring the intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use. McNamara (2000, p. 54) cautions against test results that may reflect socioeconomic conditions such as opportunities for coaching (private tutoring and special attention): only some families can afford coaching, and children with more highly educated parents get help from their parents. Teachers should consider the effect of assessments on students' motivation, subsequent performance in a course, independent learning, study habits, and attitude toward school work.

Face Validity: Face validity refers to the degree to which a test "looks right" and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgement of the test-takers. (How well-formed, relevant, and useful the test-takers find the test.) In other words, face validity means that the students perceive the test to be valid. Face validity asks the question, "Does the test, on the face of it, appear from the learner's perspective to test what it is designed to test?" Face validity is not something that can be empirically tested by a teacher or even by a testing expert; it depends on the subjective evaluation of the test-taker. A classroom test is not the time to introduce new tasks. If a test samples the actual content of what the learner has achieved or expects to achieve, face validity will be more likely to be perceived. Content validity is therefore a very important ingredient in achieving face validity. Students will generally judge a test to be face valid if:

o directions are clear,
o the structure of the test is organized logically,
o its difficulty level is appropriately pitched,
o the test has no surprises, and
o timing is appropriate.

To give an assessment procedure that is "biased for best" (designed to bring out the test-takers' best possible performance), a teacher offers students appropriate review and preparation for the test, suggests strategies that will be beneficial, and structures the test so that the best students will be modestly challenged and the weaker students will not be overwhelmed.

4. AUTHENTICITY

In an authentic test:

o the language is as natural as possible,
o items are as contextualized as possible,
o topics and situations are interesting, enjoyable, and/or humorous,
o some thematic organization, such as a story line or episode, is provided,
o tasks represent real-world tasks.



Reading passages are selected from real-world sources that test-takers are likely to have encountered or will encounter. Listening comprehension sections feature natural language with hesitations, white noise, and interruptions. More and more tests offer items that are "episodic," in that they are sequenced to form meaningful units, paragraphs, or stories.

5. WASHBACK

Washback includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects, because the teacher is usually providing interactive feedback. (Quizzes given before formal exams, so that students can get themselves in shape, have a washback effect.) Formal tests can also have positive washback, but they provide no washback if the students receive only a simple letter grade or a single overall numerical score. Classroom tests should serve as learning devices through which washback is achieved. Students' incorrect responses can become windows of insight into further work. Their correct responses need to be praised, especially when they represent accomplishments in a student's interlanguage. Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others. One way to enhance washback is to comment generously and specifically on test performance. Washback also implies that students have ready access to the teacher to discuss the feedback and evaluation they have been given. Teachers can raise the washback potential by asking students to use test results as a guide to setting goals for their future effort.
What is washback? What does washback enhance? What should teachers do to enhance washback?

o In general terms: the effect of testing on teaching and learning.
o In large-scale assessment: the effects that tests have on instruction, in terms of how students prepare for the test.
o In classroom assessment: the information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses.

o Intrinsic motivation
o Autonomy
o Self-confidence
o Language ego
o Interlanguage
o Strategic investment

o Comment generously and specifically on test performance.
o Respond to as many details as possible.
o Praise strengths.
o Criticize weaknesses constructively.
o Give strategic hints to improve performance.

- END OF CHAPTER 2 -

