EVALUATING THE ENGLISH TEST OF NATIONAL EXAMINATION FOR SMA

Rentauli Mariah Silalahi
Politeknik Informatika Del

Journal of Language Literature Culture and Education, POLYGLOT Vol 5 No. 1 January 2011, Universitas Pelita Harapan

Abstract

English is one of the subjects tested in the National Examination, or Ujian Nasional (UN), in senior high schools (SMA) in Indonesia. The English test is used as an achievement test that should reflect the school curriculum. However, the English test is inconsistent with the school curriculum: it assesses only listening and reading skills, while the curriculum is based on communicative competence. To agree with the principles of communicative competence, the English test should comprise the four major skills: listening, reading, speaking, and writing. This study discusses this inconsistency using five principles of what makes a good test: validity, reliability, authenticity, practicality, and washback. The findings point out some factors that need to be considered in designing an English test that is consistent with the test’s objectives. In general, this paper should also be useful for English teachers preparing an English test, because it is expected to motivate teachers to always design their English tests carefully.

Key words: validity, reliability, authenticity, practicality, washback

Introduction

English is one of the subjects tested in the National Examination, or Ujian Nasional (UN), in senior high schools. As a national test, the English test is intended to assess students’ achievement against national standards. Achievement tests relate to the previous learning the students have experienced (Hughes, 2003). Therefore, the English test is used to assess students’ achievement after learning English for three years. According to the school curriculum, students learn English in order to be communicative. To be communicative means to integrate receptive (listening and reading) and productive (speaking and writing) skills. Yet the English test does not seem to reflect what students learn in schools, because it tests only students’ receptive skills. This issue is important because the UN is a high-stakes test that has a powerful influence on students’ lives. The English test from the 2008 UN for SMA will be evaluated based on the key testing principles of validity, reliability, authenticity, practicality, and washback.

The validity of the UN in general has been widely criticised. Many people, including parents, concerned citizens, and education stakeholders, criticise the validity of the national examination in representing what students have learned during three years in school (Alam, 2008). Yet there is hardly any published research that critically examines the validity of the English subject tested in the UN. Therefore, this study might be a useful starting point to trigger critical views for evaluating the contents of each subject tested in the UN. It is expected that this study will encourage English teachers in general to design any English test based on the principles of a good test. Hopefully, it can also provide input for improving the English test for the UN in the future.

Due to restricted access to the 2008 UN audio recordings and transcripts, the listening section could not be analysed. The analysis therefore focuses primarily on the reading section of the UN. In addition, no information was collected from students or teachers to gain their perspectives on the impact of the test. Given the limited sample of data used for analysis (only the reading section) and the fact that no data was collected from the people involved in the test, including the students who sat the test, the teachers who prepared the students, and the committee who designed the test, no generalizations should be made from this article. This article is a stepping stone to future research that will discuss the complete test, once the recordings are available and ethics approval to collect data from participants has been obtained.

A. Validity

The validity of a test is the extent to which the test measures what it is intended to measure (Harrison, 1983; Weir, 1993; Hughes, 2003; Brown, 2004). While Harrison (1983) emphasises that only content and face validity are important for a teacher designing a test, Hughes (2003) argues that a test should also have construct and criterion-related validity. However, only the content, construct, and face validity of the 2008 UN English test are observable. Therefore the discussion of the test’s validity is limited to these three.

Harrison (1983) argues that a test will have content validity if its content is decided by considering the purposes of the test and is drawn up as a list known as a content specification. He continues to argue that:

content specification is important because it ensures as far as possible that the test reflects all the areas to be assessed in suitable proportions and also because it represents a balanced sample, without bias towards the kinds of items which are easiest to write or towards the test materials which happens to be available (p.11).

In light of Harrison’s argument about the importance of content specification, it seems difficult to condense three years of study material into a fifty-question UN English test.
Moreover, the purpose of the test is to assess students’ achievement against national standards. This means that, despite all the differences students have in learning support and facilities, they must pass the same test. However, the Indonesian government seems to have anticipated the issue of the test’s content validity. The government made a policy that the test’s contents must be written based on the standard of graduates’ competence (SKL). The SKL was introduced to schools a few months before the test was held. It was designed from the intersection of the previous years’ curricula SKL and the standard of content (National Department of Education, 2007a, p. 4). It is quite obvious that the government expects all students to have been taught the same content, because all schools have the same curriculum. While the standard competence in listening is defined as the ability to understand a short text, such as a news item, a conversation, or a simple monologue, competence in reading is defined as the ability to understand short functional texts (National Department of Education, 2007a, p. 28).

Construct validity refers to the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure (Bachman & Palmer, 1996). The senior high school English curriculum is based on communicative competence and is intended to enable students to be communicatively competent in listening, speaking, reading, and writing (National Department of Education, 2003, pp. 16-17). However, the UN English test tests only students’ listening and reading skills. Though the multiple-choice listening and reading tests might yield very reliable scores, these scores would not be sufficient to support claims about students’ speaking and writing abilities, and such tests ‘do not contain authentic representations of classroom activities’ (O’Malley & Pierce, 1996, p. 2).
Generally, many activities are employed in English classrooms, such as role play and group work. Therefore the test lacks content and construct validity: it does not reflect all the objectives of the curriculum, which aims at communicative competence.

Face validity is ‘concerned with what teachers and students think of the test’ (Harrison, 1983, p.11). There has been a great deal of news on television and in newspapers about teachers’ and students’ disappointment and anger about the UN. Their disappointment was based solely on the unfairness of judging students’ success or failure to graduate from school on the UN, which obviously tests only some of the subjects they learn in three years (Alam, 2008). Recently, public protest questioning the validity of the UN has been increasing. An unsolved case in the Indonesian Supreme Court stated that the UN had many problems (Rahmatullah, 2009).

Overall, the UN English test has low content, construct, and face validity. However, if the reading test is evaluated independently, it has a high degree of validity because it has been written based on the standard of graduates’ competence (National Department of Education, 2007a). Unfortunately, because the recordings are unavailable, the listening test’s validity will not be discussed. The reading test consists of different types of texts, including narrative, descriptive, argumentative, and exposition. Examples of each type can be found in passages 1, 2, 3, and 4 respectively (National Department of Education, 2008). All texts are set in daily contexts, and they test students’ understanding in finding general descriptions, main ideas, explicit, implicit, and specific information, and the meaning of words or phrases from the texts. Therefore, the text structures and questions of the reading test have met the criteria required by the standard of graduates’ competence.
Some samples of the questions are given in Table 1 below, aligning what the UN aims to assess (e.g. the students’ ability to identify the main idea in the text) with the questions used to measure that ability.

Table 1. Samples of Questions in line with their Objectives

The student is able to identify | Question number | Question and answer choices
General description | 20 | The text mainly discusses ... A. The founding of Asunción B. The development of Paraguay C. The center of Paraguay’s cultural life D. The capital and largest city of Paraguay E. The relationship between Asunción and its neighbor
Main idea | 19 | What is the main idea of the passage? A. Daisy liked Donald’s old hat B. Daisy liked Donald’s new hat C. Donald wanted to change his old hat D. Daisy wanted to buy Donald a new hat E. Donald asked Daisy to buy him a new hat
Explicit information | 31 | What is the writer’s suggestion to the editor? A. The duration for ads should be shortened B. Ads on TV should be limited C. Ads should not annoy TV reviewers D. Films should be more shown than ads E. Ads on TV should be stopped
Implicit information | 34 | Among the qualifications, all applications should ... A. Live downtown B. Have a computer C. Be an S1 graduate from any faculty D. Be able to communicate in English E. Have at least 30 years of work experience
Specific information | 49 | Which of the following statements is TRUE? A. Harry rocked his hips B. William sang “Man Eater” C. Elton John played a guitar D. Nelly Furtado sang Candle in the Wind E. Princes William and Harry commemorated their mother’s death
The meaning of words or phrases from the texts | 39 | “This disease affects Aya’s nervous system ..., to eat.” The underlined word has the same meaning as ... A. Transforms B. Attacks C. Changes D. Alters E. Hits

B. Reliability

Reliability is defined as consistency of measurement (Weir, 1993; Bachman & Palmer, 1996; Harrison, 1983; Hughes, 2003; Brown, 2004). According to Harrison (1983), there are three aspects of reliability: those related to students and administration, to scoring, and to the test itself.

Regarding the first aspect, the UN test is conducted nationally under the same supervision conditions. Each test room is supervised by two teacher-supervisors, and each school is supervised by two supervisors from the Independent Monitoring Team (TPI) and one policeman. However, many people in Indonesia comment that the additional test supervisors increase students’ discomfort and anxiety during the test. Yet this concern is not taken seriously, on the grounds that, as long as students are well prepared, the additional supervisors will not bother them. Moreover, the TPI supervisors and the policeman are not allowed to enter any of the test rooms.

Using recordings for the listening test ensures uniformity in what is presented to all test-takers, provided they are played in rooms with good acoustic qualities (Hughes, 2003). However, the UN listening test is administered in ordinary classrooms, which caused sound quality problems. Different schools used different methods of playing the recordings, which raises reliability issues. In some schools, a tape recorder was used to play the recordings through a loudspeaker so that the sound could be heard in all rooms. Consequently, the farther a test room was from the loudspeaker, the poorer the sound quality was. In other schools, each test room had a tape recorder to play the recordings. However, the recording played in one room could be heard in the others because the rooms were close to one another. Both methods brought problems that could disrupt students’ concentration while listening.
The second reliability issue is related to scoring. The English test uses a multiple-choice format and the marking is done by computer. The test type and method of marking make it a highly reliable test because there is no concern about inter- or intra-rater reliability. Brown (2004) and Luoma (2004) refer to inter-rater reliability as the extent to which two or more scorers score the same test consistently, and to intra-rater reliability as the extent to which an individual scorer consistently scores a group of students taking the same test. Several factors might cause inconsistencies among raters, such as lack of attention to scoring criteria, fatigue, and bias towards different students. Harrison (1983) argues that objective marking is possible with a multiple-choice format because each question has only one correct answer. It appears that the scoring procedure applied in this test, being objective, is highly reliable too (O’Malley & Pierce, 1996), as no ‘human error, subjectivity, and bias may enter into the scoring process’ (Brown, 2004, p. 21) when the answer sheet is marked by computer.

However, because the answer sheet is marked by computer, another issue arises: to what extent can the computer read or identify students’ answers? Can the computer recognise an answer if students do not blacken the circle of the correct answer properly? Sometimes students do not blacken the circle darkly enough for the computer program to recognise it, and so the answer is counted wrong even though it is really the correct answer. Yet there is no tolerance for this sort of issue, because the government has ordered schools to give students enough time to practise answering questions on a computer answer sheet before taking the UN test.

The 2008 UN English test evaluated here, however, is inconsistent in the number of its answer choices.
Based on the guidelines for writing a multiple-choice test published by the educational research and development agency (National Department of Education, 2007b, pp. 13-14), the number of answer choices for senior high school tests must be five. However, the sample question and sample answer choices in the section for questions 6-10 make it very obvious that questions 6-10 have only four answer choices each. Below are the sample question and sample answer as written on page 4 of the test script:

Now listen to a sample question.
You will hear:
Woman: Good morning, John. How are you?
Man: ...
You will hear:
A. I am fine, thank you
B. I am in the living room
C. Let me go now.
D. See you tomorrow
You are supposed to choose the correct answer in the computer answer sheet.

The guidelines also require that the answer choices of each question be arranged in good order and be of the same length. Yet some questions have answer choices of very obviously different lengths. They are questions number 17, 20, 32, 34, 37, 41, 43, 47, and 48. For instance, in question no. 41 (National Department of Education, 2008, p. 14), answer choices A to D have 7 to 8 words each, but answer choice E consists of 12 words.

Question no. 41
What does the text mainly discuss?
A. The role of television in communicating news
B. The positive and negative impacts of television
C. The best inventions of the twentieth century
D. Television is the invention of the twentieth century
E. An argument whether television is the best invention of the twentieth century

The test-related reliability aspect of the UN can be observed from its uniformity. Since Indonesia is a large country that consists of 33 provinces, the government provides different sets of tests. The tests have similar levels of difficulty because they are written based on the same rules.
The tests are then differentiated with unique codes.

C. Authenticity

Bachman & Palmer (1996) define authenticity as ‘the degree of correspondence of the characteristics of a given language test task to the features of a TLU task’ (p. 23). TLU stands for target language use. In this way, test-takers can judge the extent to which the test assesses the purposes for which they actually use the language, and so they can also evaluate their own performance. O’Malley & Pierce (1996) argue that authentic tests should ‘reflect student learning, achievement, motivation, and attitudes on instructionally-relevant classroom activities’ (p. 4).

Comparing this definition and the factors that make a test authentic with the UN English test, it seems that the test is less authentic because it does not completely reflect the communicative use of language for students. In real life, students will use English for various purposes using both productive and receptive skills. They might use English for speaking with teachers and other students, writing e-mails or letters, listening to the radio or television, and reading books. However, the UN English test does not test students’ speaking and writing skills, only their receptive skills. It is possible that the productive skills are not tested because such tests are costly to administer. Testing the productive skills requires a lot of money and time to conduct and mark the tests, because hundreds of interlocutors, assessors, and writing test examiners would need to be hired. As McNamara (2000) claims, a test becomes more authentic as it becomes more expensive and complex.

The listening test comprises 15 questions and can be classed as an ‘unscripted speech’ (Harrison, 1983, p. 67) test because the monologue or dialogue is not written on the test paper. Unfortunately, neither the transcripts nor the recordings are available to be evaluated.
The questions are also not provided; instead, only the answer choices for 10 numbers are given, while the other five have no questions or answer choices but only the instruction ‘mark your answer on your answer sheet’ (National Department of Education, 2008, p. 4). Therefore, it is difficult to know the topic of each question with certainty, which prevents an evaluation of its authenticity. For example:

Table 2. Examples of incomplete test items

Question no | Answer choices or instruction | Author’s comments
1 | A. She cut herself quite badly B. She wasn’t given any help C. She cried while slicing onions D. She sliced the onions hurriedly E. ... was going to make fried rice | Question is not provided but only answer choices
(number not legible in the source) | Mark your answer on your answer sheet. | Question and answer choices are not provided but only instruction

For the reading test, there are 11 passages, each followed by two, three, or four questions. The topics vary from common themes, such as debates about banning advertisements on TV programs, to quite uncommon themes, such as solar energy. Yet solar energy may be a common theme for students whose major is science. The theme of each reading text is shown in Table 3 below. It appears that all reading texts are based on general themes around daily life, except for the passages about solar energy and the process of making paper. In addition, the passages meet the criteria for general interest and suitable language level for the test-takers (Harrison, 1983). Therefore, the reading test shows evidence of authenticity since it offers a range of topics students might find in their daily lives.

Table 3. The themes of the reading texts

Theme | Reading text title | For question number
Common text about students’ social interaction with friends | Donald’s new hat | 16-19
Common text about historical places | Asunción | 20-23
Common notice from the authority or government | Announcement | 24-25
Not common themes for students in general but may be common for science students | No title (about solar energy) | 26-28
Not common themes for students in general but may be common for science students | No title (about the process of making paper) | 35-37
Common public opinions that can usually be found in newspapers | Should Ads be banned from TV program? | 29-32
Common public opinions that can usually be found in newspapers | No title (argument about television as the best invention of the twentieth century) | 41-44
Common job advertisement | No title (about job vacancy as an Engineering Manager) | 33-34
Common text about people’s biography | No title (about Aya’s illness) | 38-40
Common text about people’s story or experiences | No title (about a Sunday travel experiences) | 45-47
Common news about celebrity | Diana’s sons pay homage to her at a concert | 48-50

D. Practicality

‘Practicality is a matter of the extent to which the demands of the particular test specifications can be met within the limits of existing resources’ (Bachman & Palmer, 1996), or how cost-effective the test is (Weir, 1993). The resources Bachman & Palmer refer to include the human resources that write and rate the test, the administrative and clerical support, the material resources needed in the process of writing the test, and the time from planning to reporting after the scoring and analysis of the test results. Therefore, a practical test ‘is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time efficient’ (Brown, 2004, p. 19).

There is hardly any information about the budget for the 2008 national examination.
Generally speaking, however, conducting the UN is very costly and the budget increases every year. To conduct the UN, the government spends billions of rupiah every year. The funds are used for producing, duplicating, and administering the UN test and for reporting its results. Assigning two supervisors to one test room is quite costly because they supervise only 20 students. Moreover, the policy of adding supervision by an Independent Monitoring Team (TPI) and policemen makes the UN budget swell.

One of the impractical aspects of administering the listening test is related to tape recorder problems. Some schools have good facilities and support while others do not. Some schools even have no tape recorders. Yet providing the tape recorders for the listening test is the schools’ responsibility.

The process of delivering test scripts and marking the answer sheets is quite complicated, time-consuming, and costly. It is complicated and time-consuming because the test scripts are sent from provinces to schools in cities and villages on the test day or the day before. Therefore, delivery might be done in the evenings, and the independent monitoring team that supervises the delivery must work during the evenings. Some schools are situated in rural areas where transportation is rarely available, particularly at night. For these two reasons, extra working hours in the evening and transportation problems, the government must spend a lot more money.

E. Washback

Washback refers to the influence of testing on teaching and learning (Alderson & Wall, 1993, cited in Cheng et al., 2004; Hughes, 1989, cited in Bachman & Palmer, 1996). Bachman & Palmer (1996) explain how teachers can turn to teaching to the test in the classroom in order to prepare the students to face the test.
Weir (1993) argues that testing can result in positive washback if the test has a beneficial influence on the teaching that precedes it. Therefore the test should represent the syllabus or the target situation of the testing. However, the UN English test does not reflect the syllabus very well because it is not a communicative test. There is no obvious record that ‘teachers and learners have a positive attitude toward the examination or test, and work willingly and collaboratively towards its objectives’ (Cheng & Curtis, 2004, p. 10).

It is very common in Indonesia for schools to hold extra classes in the afternoon, after school hours, to prepare students for the test. Most English teachers will focus on teaching listening and reading in order to prepare students for the test, because they do not want their students to fail. Apart from giving teachers a heavier load, this makes them neglect the principles of the curriculum based on communicative competence, because they ignore the importance of the speaking and writing skills that students should have.

Bachman & Palmer (1996) argue that testing can affect test-takers in three ways: through the experience of preparing for the test, the feedback they receive about their performance on the test, and the decisions that may be made about them on the basis of their test scores. Regarding the UN test, is it fair to judge students’ failure or success to graduate solely on the basis of the UN test? Is it fair to ignore the other marks, probably good to excellent, that students gain during their studies in subjects not tested in the UN? This decision might put some students under great pressure. Students might learn only what will be tested in the UN and pay less attention to their other subjects, and for English test preparation, they might focus only on learning listening and reading skills.
Some students cannot bear the burden of failure because it means not only shame but also failure to continue to university. That is probably the biggest reason why there were some incidents in which students committed suicide after learning that they did not pass the UN. Recently, a student who failed to pass the 2010 UN tried to commit suicide by drinking insect poison, but luckily her life was saved (Abdurrahman, 2010). Schools have been so busy preparing students for the test that they have ignored an essential factor: students should also be mentally and psychologically prepared.

So far, there has been no significant change in the educational system after the implementation of the 2008 UN. The government still maintains the policy of a curriculum based on communicative competence. Unfortunately, the contents of the UN English test in 2009 and 2010 remained the same, consisting of listening and reading tests only.

Conclusion

Generally, the 2008 UN English test was written following the principles of writing good questions, because all questions have clear instructions and plausible answer choices (Harrison, 1983). However, based on the principles of what makes a good test, it appears that the test was written primarily with reliability in mind. Based on the evaluation discussed earlier, this factor seems to be the priority because reliability is strongly evident in the test, while its authenticity and practicality are relatively good in some respects. Although the test produced only negative washback, it is quite obvious that validity is its weakest factor.

Hughes (2003, p. 136) argues ‘the challenge for the language tester is to set tasks which will not only cause the candidate to exercise reading (or listening) skills, but will also result in behaviour that will demonstrate the successful use of those skills’.
As an achievement test, the 2008 UN English test would have been better if it had comprised both receptive and productive skills, so that it measured both students’ understanding of the language and their ability to actually produce it, in keeping with the curriculum and the test’s objectives.

References:

Abdurrahman, M. N. “Not passed UN, Riska gulp insect poison”. Detik News. April 28, 2010.
Alam, S. “National examination, a poor mark of learning”. The Jakarta Post. May 4, 2008.
Bachman, Lyle F. & Palmer, Adrian S. Language Testing in Practice. Oxford: Oxford University Press, 1996.
Brown, H. D. Language Assessment Principles And Classroom Practices. White Plains, New York: Longman, 2004.
Cheng, L., Watanabe, Y., & Curtis, A. Washback In Language Testing. Mahwah, New Jersey: Lawrence Erlbaum Associates, Inc., 2004.
Harrison, Andrew. A Language Testing Handbook. London: The Macmillan Press Limited, 1983.
Hughes, Arthur. Testing For Language Teachers, 2nd ed. Cambridge: Cambridge University Press, 2003.
Luoma, S. Assessing Speaking. Cambridge: Cambridge University Press, 2004.
McNamara, Tim. Language Testing. Oxford: Oxford University Press, 2000.
McNamara, Tim & Roever, Carsten. Language Testing: The Social Dimension. USA: Blackwell Publishing, 2006.
National Department of Education. Kurikulum Bahasa Inggris 2004 Sekolah Menengah Atas Dan Madrasah Aliyah (The 2004 English curriculum of Senior High Schools and Madrasah Aliyah). Jakarta: Departemen Pendidikan Nasional (Department of National Education), 2003.
National Department of Education. Peraturan Menteri Pendidikan Nasional Republik Indonesia Nomor 34 Tahun 2007 Tentang Ujian Nasional Tahun Pelajaran 2007/2008 (The Decree of the National Minister of Education number 34/2007 on the national examination, academic year 2007/2008). Jakarta: Departemen Pendidikan Nasional (National Department of Education), 2007a.
National Department of Education. Panduan Penulisan Soal Pilihan Ganda (Guide to writing multiple choice questions). Jakarta: Pusat Penilaian Pendidikan-BALITBANG DEPDIKNAS (Education assessment center-Research and Development Agency-Ministry of National Education), 2007b.
National Department of Education. Prosedur Operasional Standar (Standard Operating Procedures). Jakarta: Badan Standar Nasional Pendidikan (National Bureau of Educational Standards), 2007c.
National Department of Education. Ujian Nasional SMA Tahun Pelajaran 2007/2008 Bahasa Inggris (National Examination for Senior High Schools English academic year 2007/2008). Jakarta: Departemen Pendidikan Nasional (National Department of Education), 2008.
O’Malley, J. M. & Pierce, L. V. Authentic Assessment For Foreign Language Learners: Practical Approaches For Teachers. United States of America: Addison-Wesley, 1996.
Rahmatullah, A. “Affirm the UN SC decision Many Problems”. Detik News. November 26, 2009.
Weir, C. J. Understanding And Developing Language Tests. Hertfordshire: Prentice Hall Europe, 1993.
