INTRODUCTION

A. Background

Evaluation of learning is an inseparable part of organizing learning as a whole.

In a learning program, evaluation is held to determine whether the learning objectives, which are formulated on the basis of an in-depth study of learning needs, have been achieved. These objectives are pursued through a series of learning activities that are designed maturely and carefully so that the learning goals can be reached. In the design of learning, evaluation is the last of the three principal components of implementing learning: learning objectives, learning activities, and evaluation of the results of those activities. In practice, when carrying out teaching and learning activities (KBM), including the assessment system, many educators still have trouble compiling tests and developing items that are valid and reliable. Therefore, the Government has issued various guidelines for implementing KBM, one of which is a guideline for writing test items.

B. Purpose

This paper was written to:
1. fulfill an assignment for the course on mathematics learning assessment;
2. add to the knowledge of teachers in particular about drawing up tests for learning;
3. add to the knowledge of teachers and related institutions about preparing test items;
4. add insight for teachers on scoring and on converting scores into grades.

C. Benefits

DISCUSSION

A. Test Items in the Evaluation of Learning

In general, the process of developing, presenting, and using a learning evaluation can be described in the following steps.

1. Define the purpose of the evaluation
In conducting an evaluation, a teacher has a specific purpose. For example, the purpose may be to determine learners' mastery of a specific competency or sub-competency after they have taken part in the learning process, or to identify students' learning difficulties (a diagnostic test). The purpose should be clear so that it gives direction and scope to the further development of the evaluation. Formulating the purpose of evaluating learning outcomes is important because, without a clear purpose, the evaluation will run without direction and may in turn lose its meaning and function.

2. Prepare the test blueprint
The test grid is also known as a test blueprint or table of specifications. In essence, a blueprint is required before anyone constructs a test: it describes the scope and content of what is tested and details the items the evaluation requires. Item writing is an important step in producing a good test instrument. Writing an item means translating the type and level of the behavioral indicator to be measured into a question whose characteristics match the details in the blueprint. Every question or item must therefore be written so that it is clear what answer it requires, and the quality of each item determines the quality of the test as a whole. Once the scope of the lessons to be assessed is known, a detailed chart is drafted; this chart serves as the guideline for preparing the evaluation instrument. When preparing it, note the following:
1) The field of study, sub-field, or subject to be assessed.

2) The levels of mastery to be measured: cognitive (recall, comprehension, application), affective, and psychomotor.
3) The number of items to be developed and the type of evaluation instrument to be used.
4) The number of items for each aspect of each subject, determined by the teacher according to the level of mastery or aspect to be assessed.
5) The time needed to complete the test.

3. Review and analyze the items
This step deserves attention because weaknesses in an item are often not visible to the item writer. Ideally, items are reviewed and revised by other competent people rather than the writer, working as a team that includes experts in the subject, in measurement, and in language. After finishing the items, the writer must also examine them; reviewing items in this way amounts to analyzing them qualitatively. Item analysis is performed to determine whether an item functions, and it is generally done in two ways: qualitative analysis (qualitative control) and quantitative analysis (quantitative control). Qualitative analysis, often called logical validity, is conducted before an item is used, to judge whether it will function. Quantitative analysis, often called empirical validity, is performed after the item has been tried out on a representative sample, to see whether it actually functions. One purpose of the analysis is to improve item quality by deciding whether an item (1) is accepted because it is supported by adequate statistical data, (2) is revised because it has shown some weaknesses, or (3) is discarded because it has been shown empirically not to function at all.
Qualitative Analysis.
This is a review intended to analyze items in terms of technique, content, and editing. Technical analysis reviews an item against the principles of measurement and the format for writing items.

Content analysis is a specific review of whether the knowledge being asked is appropriate. Editorial analysis reviews, in particular, the overall format and the consistency of editing from one item to another. Qualitative analysis can also be categorized in terms of material, construction, and language. Material analysis reviews the subject matter addressed by the item and whether the required level of ability suits it. Construction analysis reviews the techniques of item writing. Language analysis reviews whether the item uses good and correct Indonesian according to EYD, the official Indonesian spelling guidelines.

Quantitative Analysis. Quantitative analysis is used to determine, through statistical analysis, the extent to which an item can distinguish test takers of high ability, as defined by the criterion, from test takers of low ability. It emphasizes the internal characteristics of the test, using data obtained empirically. These internal characteristics include quantitative item parameters such as difficulty, discriminating power, and reliability. For multiple-choice items, two additional parameters can be examined: the probability of guessing the answer, and whether the answer options function properly, that is, how the test takers' responses are spread across all the alternatives.

Level of difficulty. There are several possible reasons why an item is difficult. Difficulty may be determined by the depth of the material, its complexity, or other factors related to the ability the item measures. However, on closer examination it is hard to determine exactly why one item is more difficult than another.
In general, according to classical test theory, item difficulty can be expressed in several ways, including (1) the proportion of correct answers, (2) a linear difficulty scale, (3) the Davis index, and (4) a bivariate scale. The proportion of correct answers (P), that is, the number of test takers who answer the analyzed item correctly divided by the total number of test takers, is the most commonly used difficulty measure. In essence, the quality of a learning-outcomes test item can first be judged from its degree of difficulty. An item is considered good if it is neither too difficult nor too easy, in other words, if its difficulty is moderate. The number that indicates an item's level of difficulty is called the difficulty index (item difficulty index); in the evaluation of learning outcomes it is generally denoted by the letter P, an abbreviation of "proportion".

Difficulty Level Categories
P Value            Category
P < 0.30           Difficult
0.30 ≤ P ≤ 0.70    Moderate
P > 0.70           Easy
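As an illustration, the difficulty index P and the categories above can be computed from a scored response matrix. This is a minimal Python sketch, assuming each row is one test taker and each column one item, with 1 for a correct answer and 0 otherwise; the function names and the ten-student data are hypothetical, and counting P = 0.30 and P = 0.70 as "moderate" is an assumption about the table's boundaries.

```python
def difficulty_indices(matrix):
    """Return P for each item: correct answers divided by the number of test takers."""
    n_takers = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_takers for j in range(n_items)]


def difficulty_category(p):
    """Classify P with the cutoffs 0.30 and 0.70 (boundaries counted as moderate)."""
    if p < 0.30:
        return "difficult"
    if p <= 0.70:
        return "moderate"
    return "easy"


# Hypothetical scored responses: 10 test takers (rows) x 3 items (columns).
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for item, p in enumerate(difficulty_indices(responses), start=1):
    print(f"Item {item}: P = {p:.2f} ({difficulty_category(p)})")
```

With these hypothetical data, item 1 is answered correctly by 9 of the 10 students (P = 0.90, easy), item 2 by 5 (P = 0.50, moderate), and item 3 by 2 (P = 0.20, difficult).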

Discriminating power. One goal of quantitative item analysis is to determine whether an item distinguishes between groups in the aspect being measured, in line with the differences that actually exist between those groups. The index used to distinguish high-ability from low-ability test takers is the item discrimination index. It is determined from the difference between the proportions of each group that answer the item correctly, and it shows how well the item functions in agreement with the test as a whole. In this sense, the validity of an item is the same as its discriminating power: its power to distinguish high-ability test takers from low-ability ones.

The discrimination index ranges from -1 to +1. A negative sign means that low-ability test takers answered the item correctly while high-ability test takers answered it incorrectly; a negative index therefore indicates that the item reverses the order of the test takers' quality. The item discrimination index is generally denoted by the letter D (for discriminating power).

Item Discrimination Index (D)   Classification   Interpretation
Less than 0.20                  Poor             The item has very weak discriminating power and is considered not to discriminate well.
0.20 to 0.40                    Sufficient       The item has sufficient (medium) discriminating power.
0.40 to 0.70                    Good             The item has good discriminating power.
0.70 to 1.00                    Excellent        The item has excellent discriminating power.
Negative (-)                    Very poor        The item has negative discriminating power.
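The discrimination index can be sketched in the same way. The text says only that D comes from the difference between the proportions of each group answering correctly; this Python sketch assumes the common approach of ranking test takers by total score and comparing an upper group with a lower group of equal size. The `fraction` parameter (half of the test takers by default; an upper and lower 27% is another frequently used convention), the inclusive lower bounds of each interval, the function names, and the data are all hypothetical.

```python
def discrimination_indices(matrix, fraction=0.5):
    """D = P_upper - P_lower for each item, rounded to two decimals.

    matrix: rows are test takers, columns are items (1 = correct, 0 = wrong).
    fraction: share of test takers placed in each of the upper and lower groups.
    """
    n = len(matrix)
    totals = [sum(row) for row in matrix]
    # Rank test takers from highest to lowest total score (sort is stable for ties).
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * fraction))
    upper, lower = order[:k], order[-k:]
    indices = []
    for j in range(len(matrix[0])):
        p_upper = sum(matrix[i][j] for i in upper) / k
        p_lower = sum(matrix[i][j] for i in lower) / k
        indices.append(round(p_upper - p_lower, 2))
    return indices


def classify_discrimination(d):
    """Map D onto the classification table (lower bounds inclusive)."""
    if d < 0:
        return "very poor (negative)"
    if d < 0.20:
        return "poor"
    if d < 0.40:
        return "sufficient"
    if d < 0.70:
        return "good"
    return "excellent"


# Hypothetical scored responses: 10 test takers (rows) x 3 items (columns).
responses = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0],
    [1, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0],
]

for item, d in enumerate(discrimination_indices(responses), start=1):
    print(f"Item {item}: D = {d:.2f} ({classify_discrimination(d)})")
```

Here the items get D = 0.20 (sufficient), 1.00 (excellent), and 0.40 (good). Rounding D to two decimals keeps floating-point noise (1.0 - 0.8 is slightly below 0.2 in binary arithmetic) from pushing a value across a table boundary.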



Distractor function. In multiple-choice objective tests of learning outcomes, each item is equipped with several possible answers, called options or alternatives. There are usually 3 to 5 options; one of them is the correct answer (the answer key) and the rest are wrong answers, commonly known as distractors. Analyzing distractor function is also known as analyzing the distribution pattern of answers to an item, that is, the pattern that describes how test takers chose among the possible answers attached to each item. It can also happen that a test taker selects none of the alternatives for a particular item; the testee is then said to have left it blank. A blank response is usually called an omit and is denoted by the letter O.

A distractor is considered to function well if it is chosen by at least 5% of all test takers. Following this analysis, distractors that function properly can be reused in future tests, while distractors that do not function properly should be revised or replaced with other distractors.
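The 5% rule can be checked mechanically. This Python sketch assumes responses are recorded as one option letter per test taker, with None for a blank (counted as an omit, O); it counts how often each option was chosen and flags distractors chosen by fewer than 5% of all test takers. The function name and the 40-student item are hypothetical.

```python
from collections import Counter


def analyze_distractors(responses, options, key, threshold=0.05):
    """Count option choices and check each distractor against the 5% rule.

    responses: one chosen option per test taker, or None for a blank (omit).
    Returns (counts, omits, functioning), where functioning maps each
    distractor to True if at least `threshold` of all test takers chose it.
    """
    n = len(responses)
    counts = Counter("O" if r is None else r for r in responses)
    omits = counts["O"]
    functioning = {opt: counts[opt] / n >= threshold for opt in options if opt != key}
    return counts, omits, functioning


# Hypothetical item: 40 test takers, options A-E, answer key C.
answers = ["A"] * 6 + ["B"] * 1 + ["C"] * 25 + ["D"] * 5 + [None] * 3
counts, omits, functioning = analyze_distractors(answers, "ABCDE", key="C")

print(f"Omits (O): {omits}")
for opt, ok in functioning.items():
    verdict = "functions" if ok else "revise or replace"
    print(f"Distractor {opt}: {counts[opt]} choices, {verdict}")
```

For this item, distractors A (15%) and D (12.5%) function, while B (2.5%) and E (0%) should be revised or replaced; the 3 blanks are reported as omits.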

4. Try-out
A try-out of test items is, in principle, an attempt to obtain empirical information about the extent to which an item measures what it is intended to measure. This information generally concerns everything that may affect the validity of the items, such as the level of difficulty, the distribution of answers, discriminating power, cultural influence, the language used, and so on.

5. Assembling the Items into a Test

For the scores obtained to be trustworthy, a sufficient number of items is required. The items therefore need to be assembled into an integrated measuring instrument. Factors that may affect the validity of the test, such as the order of item numbers, the grouping of item formats when a test contains more than one format, and a good layout, must be addressed when assembling the items into a test.
There are several things to note:

1) Place each item so that its position relative to other items does not help students guess the answer.
2) Write the instructions for working the items in detail, clearly, and completely, without making them complicated.

3) The layout of the items, including font, spacing, paper size, and the like, should be suited to the age of the students.

6. Scoring
Scoring, or checking answer sheets, is a series of steps for obtaining quantitative information about each learner. In principle, scoring should be carried out objectively. That is, if scoring is done by two or more people of the same level of competence, they should produce the same score; likewise, if the same person repeats the scoring, the same score should result.
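Objective scoring of selected-response items can be illustrated with a short Python sketch: the raw score is the sum of the points for each item answered correctly, so any scorer (or any repeated run) that applies the same key obtains the same number. The answer key, the answer sheet, the function name, and the optional per-item weights are hypothetical.

```python
def raw_score(answers, key, weights=None):
    """Sum the points of the items answered correctly (1 point each by default)."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same number of items")
    if weights is None:
        weights = [1] * len(key)
    return sum(w for a, k, w in zip(answers, key, weights) if a == k)


key = "BADCA"    # hypothetical answer key for five items
sheet = "BACCA"  # hypothetical answer sheet: item 3 is wrong

print(raw_score(sheet, key))                   # unweighted
print(raw_score(sheet, key, [1, 1, 2, 2, 1]))  # hypothetical item weights
```

The sheet matches the key on four of the five items, so the unweighted raw score is 4 and the weighted score is 5.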

B. Converting Scores into Grades

a. The difference between scores and grades
People sometimes assume that a score means the same as a grade, but this assumption is not necessarily correct. Experts explain the difference as follows. According to Suharsimi (2005: 235), a score is the result of scoring, obtained by summing the numbers for each test question the student answered correctly, whereas a grade is the score converted by means of a specific reference, either a norm reference or a criterion reference. According to Anas Sudijono (2007: 309), a score is the result of scoring, obtained by summing the numbers earned for each item the testee answered correctly, with the correct answers taken into account, whereas a grade is a number (or letter) that results from converting the score. It can be concluded that scoring and assessment form a series of activities that cannot be separated. Scoring is the activity of collecting data through tests and non-tests to obtain raw scores, which are then processed or converted.

b. Converting raw scores of learning test results using norms (norm-referenced evaluation)
According to Suharsimi Arikunto (2005: 238), norm-referenced evaluation means grading a student's learning achievement by comparing it with that of the other students in the group. A student's standing is strongly affected by the quality of the group: a student who counts as "excellent" in group A might count only as "average" after moving to another group. The measure is relative. Two important points must be understood before converting scores into grades:

To convert raw scores into grades, two approaches can be taken: criterion-referenced assessment (PAP) and norm-referenced assessment (PAN).

Various scales may be used to transform raw scores into grades: the five-point scale (stanfive), the nine-point scale (stanine), the eleven-point scale (stanel), the Z-score scale, and the T-score scale. These scales apply equally to PAP and PAN; the difference is that PAN uses the actual mean and the actual standard deviation of the group, whereas PAP uses an ideal mean and an ideal standard deviation.
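All of these scales place a raw score relative to the mean and standard deviation. As a hedged illustration for PAN, where the group's actual mean and standard deviation are used, the following Python sketch converts raw scores with the five-point cutoffs (M ± 0.5 SD, M ± 1.5 SD) and the nine-point cutoffs (M ± 0.25, 0.75, 1.25, and 1.75 SD) given in the techniques below. It assumes the population standard deviation (dividing by N, as in the z-score example) and that a score lying exactly on a cutoff receives the higher grade; the function names are hypothetical. The sample scores are the English (X1) scores from the z-score example.

```python
from statistics import mean, pstdev

# Cutoffs in SD units above (+) or below (-) the mean, highest grade first.
FIVE_POINT = [(1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")]
NINE_POINT = [(1.75, 9), (1.25, 8), (0.75, 7), (0.25, 6),
              (-0.25, 5), (-0.75, 4), (-1.25, 3), (-1.75, 2)]


def to_grade(score, m, sd, cutoffs, lowest):
    """Return the first grade whose cutoff M + k*SD the score reaches."""
    for k, grade in cutoffs:
        if score >= m + k * sd:
            return grade
    return lowest


def convert_group(scores, cutoffs, lowest):
    """Norm-referenced (PAN) conversion using the group's actual mean and SD."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD: sqrt(sum of squared deviations / N)
    return [to_grade(s, m, sd, cutoffs, lowest) for s in scores]


english = [72, 65, 76, 64, 71, 73, 75, 68, 70, 66]  # M = 70, SD about 3.95
print(convert_group(english, FIVE_POINT, "E"))
print(convert_group(english, NINE_POINT, 1))
```

With M = 70 and SD ≈ 3.95, a score of 76 reaches M + 1.5 SD (about 75.92) and receives an A, while 64 falls below M - 1.5 SD (about 64.08) and receives an E. For PAP, the same functions could be called with an ideal mean and SD in place of the computed ones.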

c. Techniques for converting scores into grades
1. Converting scores with the five-point standard scale
To convert scores into the five-point standard scale, that is, into letter grades, use the following cutoffs:

Grade   Score range
A       X ≥ M + 1.5 SD
B       M + 0.5 SD ≤ X < M + 1.5 SD
C       M - 0.5 SD ≤ X < M + 0.5 SD
D       M - 1.5 SD ≤ X < M - 0.5 SD
E       X < M - 1.5 SD

(X = raw score, M = mean, SD = standard deviation.)

2. Converting scores with the nine-point standard scale
If the raw test scores are to be converted into the nine-point standard scale (stanine), the cutoffs used are as follows:

Grade   Score range
9       X ≥ M + 1.75 SD
8       M + 1.25 SD ≤ X < M + 1.75 SD
7       M + 0.75 SD ≤ X < M + 1.25 SD
6       M + 0.25 SD ≤ X < M + 0.75 SD
5       M - 0.25 SD ≤ X < M + 0.25 SD
4       M - 0.75 SD ≤ X < M - 0.25 SD
3       M - 1.25 SD ≤ X < M - 0.75 SD
2       M - 1.75 SD ≤ X < M - 1.25 SD
1       X < M - 1.75 SD

3. Converting raw scores into the eleven-point standard scale (stanel)
The eleven-point standard scale covers standard values from 0 to 10 and is generally used in primary and secondary schools. Scores are converted into grades using the following cutoffs:

Grade   Score range
10      X ≥ M + 2.25 SD
9       M + 1.75 SD ≤ X < M + 2.25 SD
8       M + 1.25 SD ≤ X < M + 1.75 SD
7       M + 0.75 SD ≤ X < M + 1.25 SD
6       M + 0.25 SD ≤ X < M + 0.75 SD
5       M - 0.25 SD ≤ X < M + 0.25 SD
4       M - 0.75 SD ≤ X < M - 0.25 SD
3       M - 1.25 SD ≤ X < M - 0.75 SD
2       M - 1.75 SD ≤ X < M - 1.25 SD
1       M - 2.25 SD ≤ X < M - 1.75 SD
0       X < M - 2.25 SD

4. Converting scores into the 101-point scale
With the 1-10 scale, the whole numbers still give a rather coarse assessment, so the 1-100 scale can be used instead. Because it has 100 whole numbers, it allows a finer assessment: grades of 5.5 and 6.4 on the eleven-point scale, which would usually both be rounded to 6, can be written as 55 and 64 on the 1-100 scale.

5. Converting scores into z-scores
Standard scores, or z-scores, are generally used to convert raw scores obtained from different kinds of measurement onto a common scale. For example, suppose a university entrance selection requires five tests: English (X1), IQ (X2), personality (X3), attitude (X4), and physical health (X5). The testees obtained the following raw scores:

Testee  English (X1)  IQ (X2)  Personality (X3)  Attitude (X4)  Physical health (X5)
A       72            114      48                172            221
B       65            105      51                163            205
C       76            115      44                169            224
D       64            107      42                179            198
E       71            101      55                181            207
F       73            120      56                175            219
G       75            125      57                183            225
H       68            109      49                168            216
I       70            103      51                167            224
J       66            111      47                153            211

In terms of z-scores, testees with a positive z-score (+) are regarded as having higher ability, and those with a negative z-score (-) as weaker. The general formula is

$$z = \frac{x}{SD_X}, \qquad x = X - M_X,$$

where $z$ is the z-score, $x$ is the deviation of score $X$ from its mean $M_X$, and $SD_X$ is the standard deviation of $X$.

To convert the raw scores into z-scores, follow these steps:
I. Sum the scores of each variable, X1 to X5, to obtain $\sum X_1, \sum X_2, \ldots, \sum X_5$.
II. Compute the mean of each variable: $M_{X_1} = \frac{\sum X_1}{N}$, and likewise for the other variables.
III. Compute the deviations: $x_1 = X_1 - M_{X_1}$, and so on.
IV. Square the deviations $x_1$ to $x_5$ and sum them to obtain $\sum x_1^2, \sum x_2^2, \ldots, \sum x_5^2$.
V. Compute the standard deviation of each of the five variables: $SD_{X_1} = \sqrt{\frac{\sum x_1^2}{N}}$, and so on.
VI. Compute the z-scores with the formula above.
VII. Sum the z-scores obtained by each testee; this shows which testees have positive and which have negative totals.

Applying these steps to the data above gives the following results.

Steps I, II, and III:

Testee     Raw score (X)                  Deviation (x)
           X1   X2    X3   X4    X5       x1   x2    x3   x4    x5
A          72   114   48   172   221      +2   +3    -2   +1    +6
B          65   105   51   163   205      -5   -6    +1   -8    -10
C          76   115   44   169   224      +6   +4    -6   -2    +9
D          64   107   42   179   198      -6   -4    -8   +8    -17
E          71   101   55   181   207      +1   -10   +5   +10   -8
F          73   120   56   175   219      +3   +9    +6   +4    +4
G          75   125   57   183   225      +5   +14   +7   +12   +10
H          68   109   49   168   216      -2   -2    -1   -3    +1
I          70   103   51   167   224      0    -8    +1   -4    +9
J          66   111   47   153   211      -4   0     -3   -18   -4
Σ (N=10)   700  1110  500  1710  2150     0    0     0    0     0

$M_{X_1} = 70$, $M_{X_2} = 111$, $M_{X_3} = 50$, $M_{X_4} = 171$, $M_{X_5} = 215$.

Steps IV, V, VI, and VII:

Testee     Squared deviation (x²)         z-score                                 Total z
           x1²  x2²   x3²  x4²   x5²      z1     z2     z3     z4     z5
A          4    9     4    1     36       +0.51  +0.42  -0.42  +0.12  +0.68   +1.31
B          25   36    1    64    100      -1.27  -0.83  +0.21  -0.93  -1.13   -3.95
C          36   16    36   4     81       +1.52  +0.55  -1.26  -0.23  +1.02   +1.60
D          36   16    64   64    289      -1.52  -0.55  -1.68  +0.93  -1.92   -4.74
E          1    100   25   100   64       +0.25  -1.38  +1.05  +1.16  -0.90   +0.18
F          9    81    36   16    16       +0.76  +1.25  +1.26  +0.46  +0.45   +4.18
G          25   196   49   144   100      +1.27  +1.94  +1.47  +1.39  +1.13   +7.20
H          4    4     1    9     1        -0.51  -0.28  -0.21  -0.35  +0.11   -1.24
I          0    64    1    16    81       0      -1.11  +0.21  -0.46  +1.02   -0.34
J          16   0     9    324   16       -1.01  0      -0.63  -2.09  -0.45   -4.18
Σ (N=10)   156  522   226  742   784      0      0      0      0      0

$SD_{X_1} = 3.95$, $SD_{X_2} = 7.22$, $SD_{X_3} = 4.75$, $SD_{X_4} = 8.61$, $SD_{X_5} = 8.85$.

Positive totals indicate high z-scores and negative totals low ones. Which testees are admitted then depends on the quota the university wishes to accept.

6. Converting scores into T-scores
The T-score is a standard score with a mean of 50 and a standard deviation of 10. The 101-point scale and the T-score appear to be the same: Depdiknas (2004: 21) explains that the formula used to determine grades on the 101-point scale is

$$T = 50 + 10\,\frac{X - M}{SD_X}$$

T-scores are computed to remove the minus sign that appears in front of negative z-scores, which makes them easier to understand.

- Determining the Final Grade
The final grade is the accumulation of the various grades or scores a student has obtained (Masidjo, 1996: 179). Accordingly, the final grade can be calculated either from the scores of the individual measurements or from scores that have already been converted into grades. The advantage of using scores is their purity, since they have not yet been influenced by any assessment reference. It also does not require much extra work, and the procedure is the same.
One way to determine the final grade is to combine the assignment grade (T), the daily test grade (H), and the general examination grade (U), weighting them 2, 3, and 5 respectively, and to divide by 10 (the sum of the weights). The formula is:

$$NA = \frac{2T + 3H + 5U}{10}$$

For example, suppose Banu (a student) obtained the following grades:
Assignment I: 10
Assignment II: 8
Daily test I: 6
Daily test II: 8
Midterm examination: 7
Final examination: 6

The average assignment grade is then 9, the average daily test grade is 7, and the average of the midterm and final examinations is (7 + 6)/2 = 6.5, so

$$NA = \frac{(2 \times 9) + (3 \times 7) + (5 \times 6.5)}{10} = \frac{71.5}{10} = 7.15,$$

which is rounded to 7.

- Determining a Student's Position in the Group
A student's position in the group is his or her place in the order of the group, commonly called a ranking. There are several ways to determine it: simple ranking, percentile ranking, the standard deviation, and z-scores and T-scores. Some of them are explained below.

Simple ranking. This is the order that gives a person's place in the group, expressed as an ordinary number. Students are ranked by the scores they obtained; students with equal scores receive the same rank.

Percentile ranking. This is a student's position in the group based on the percentage of scores that lie below his or her score. To find it, first determine the simple rank, then count the number of students in the group below the student, divide by the number of students in the group, and multiply by 100. For example, Ani's percentile rank (PR) is 12/20 × 100 = 60, which means Ani outperforms 60% of her group on that achievement.
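The weighted final grade and the two rankings described above can be sketched in Python as follows. The weights 2, 3, and 5 come from the text; averaging the midterm and final examinations into U follows the Banu example, whose mean is (7 + 6)/2 = 6.5. Ani's full class list is not given, so the 20 scores used for the percentile rank are hypothetical, chosen so that 12 lie below hers; the function names are also illustrative.

```python
from statistics import mean


def final_grade(assignments, daily_tests, exams, weights=(2, 3, 5)):
    """NA = (2T + 3H + 5U) / 10, where T, H, U are the averages of each component."""
    t, h, u = mean(assignments), mean(daily_tests), mean(exams)
    w_t, w_h, w_u = weights
    return (w_t * t + w_h * h + w_u * u) / (w_t + w_h + w_u)


def simple_rank(score, scores):
    """1 + number of higher scores, so equal scores share the same rank."""
    return 1 + sum(1 for s in scores if s > score)


def percentile_rank(score, scores):
    """Percentage of the group whose scores lie below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


na = final_grade([10, 8], [6, 8], [7, 6])
print(f"Banu: NA = {na:.2f}, rounded to {round(na)}")

# Hypothetical class of 20: Ani scored 75, and 12 classmates scored lower.
class_scores = [50] * 12 + [75] + [80] * 7
print(simple_rank(75, class_scores), percentile_rank(75, class_scores))
```

For Banu this gives NA = 7.15, rounded to 7; in the hypothetical class, a score of 75 has simple rank 8 (seven higher scores) and percentile rank 60.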

Standard deviation. Determining a student's position with the standard deviation means dividing the class into groups. This can be done in two ways: with 3 ranks or with 11 ranks.
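Ranking with z-scores and T-scores, listed above as one way to determine a student's position, can be sketched in Python using the selection-test data from the z-score example. The sketch follows steps I to VII: population standard deviation (dividing by N), z-scores rounded to two decimals before they are summed, and T = 50 + 10z. The function and variable names are illustrative, not part of the source.

```python
from math import sqrt

# Raw scores from the selection example (X1 English, X2 IQ, X3 personality,
# X4 attitude, X5 physical health), one list per testee.
RAW = {
    "A": [72, 114, 48, 172, 221],
    "B": [65, 105, 51, 163, 205],
    "C": [76, 115, 44, 169, 224],
    "D": [64, 107, 42, 179, 198],
    "E": [71, 101, 55, 181, 207],
    "F": [73, 120, 56, 175, 219],
    "G": [75, 125, 57, 183, 225],
    "H": [68, 109, 49, 168, 216],
    "I": [70, 103, 51, 167, 224],
    "J": [66, 111, 47, 153, 211],
}


def z_scores(raw):
    """Steps I-VI: per-variable mean, deviation, population SD, then z = x / SD."""
    n = len(raw)
    columns = list(zip(*raw.values()))  # one tuple of scores per variable
    means = [sum(col) / n for col in columns]
    sds = [sqrt(sum((x - m) ** 2 for x in col) / n)
           for col, m in zip(columns, means)]
    return {
        name: [round((x - m) / sd, 2) for x, m, sd in zip(scores, means, sds)]
        for name, scores in raw.items()
    }


def t_score(z):
    """T = 50 + 10z, which removes the minus sign of below-average z-scores."""
    return 50 + 10 * z


z = z_scores(RAW)
# Step VII: total z per testee, then rank from the highest to the lowest total.
totals = {name: round(sum(zs), 2) for name, zs in z.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
for position, name in enumerate(ranking, start=1):
    t_values = [round(t_score(v), 1) for v in z[name]]
    print(position, name, f"total z = {totals[name]:+.2f}", "T =", t_values)
```

Because every value is derived directly from the raw scores, testee A's physical-health score of 221 gives x5 = +6 and z5 = +0.68, and testee J's score of 211 gives x5 = -4 and z5 = -0.45. The resulting order places G, F, and C at the top.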

A. Conclusion
Item analysis is carried out to determine whether an item functions. It is generally done in two ways: qualitative analysis (qualitative control) and quantitative analysis (quantitative control). Qualitative analysis, often called logical validity, is conducted before an item is used, to judge whether it will function. Quantitative analysis, often called empirical validity, is conducted after the item has been tried out on a representative sample, to see whether it actually functions. One purpose of the analysis is to improve item quality by deciding whether an item (1) is accepted because it is supported by adequate statistical data, (2) is revised because it has shown some weaknesses, or (3) is discarded because it has been shown empirically not to function at all. Qualitative analysis reviews items in terms of technique, content, and editing: technical analysis reviews an item against the principles of measurement and item-writing format, and content analysis reviews whether the knowledge asked is appropriate. Quantitative analysis uses statistical analysis to determine the extent to which an item distinguishes test takers of high ability, as defined by the criterion, from test takers of low ability.

Norm-referenced assessment is assessment based on the condition of the group: a student's achievement is judged relative to the group, whose scores are expressed in the form of a normal distribution. Scoring and grading are different. A score is the result of counting the answers that are correct or the work actually done, whereas a grade is the result of converting a score. Scores can be converted into grades using grading scales, namely the five-point, nine-point, eleven-point, 101-point, Z, and T scales.

B. Suggestions

