Psychometrics Summary

Psychometrics Final Exam (UAS) Review
Item analysis
Measuring psychological phenomena

Behavior - Psychological attribute/trait/construct - Operational definition/indicators - Sample
of behavior in the form of items - Measuring instrument/test
Test development process

Test conceptualization - Test construction - Test tryout - Item analysis - Test revision
General purpose
To determine whether each test item is a good item (there are specific criteria for calling an
item good)
Specific purposes
- Improve the reliability and validity of the test
- Produce a better ordering of difficulty (easy items first, hard items last)
- Improve how well items differentiate between individuals
- Obtain information about distractor quality for multiple-choice items (how good the
available options are, judged from how responses spread across the distractors)
- Produce a better score distribution.
Item analysis
- Items can be analyzed qualitatively (through content and form) and quantitatively
(through statistical properties)
- Item analysis can make a test shorter while also increasing its validity and
reliability
- Although “a longer test is more valid and reliable than a shorter one”, when a test is
shortened by deleting poor items identified through item analysis, the shorter test
becomes more valid and reliable
Qualitative item analysis techniques
1. Content
- Do the selected items match the dimensions or behavioral indicators to be
measured (content validity)?
- Do they match what the test is intended to measure?
2. Form
- Are the items written following effective item-writing procedures?
- Covers how the items are written and the test environment at administration
Quantitative item analysis techniques
1. Item difficulty
2. Item discrimination
3. Distractor power
Qualitative item analysis
- A general term for various nonstatistical procedures designed to explore how
individual test items work
- Uses qualitative judgement rather than numbers; the outcome is a decision to revise or delete an item
- Involves exploration of the issues through verbal means such as interviews and group
discussions conducted with test takers and other relevant parties (e.g., publishers)
- Table 8-3: potential areas of exploration by means of qualitative item analysis (Cohen &
Swerdlik, 2010)
Qualitative item analysis
- On a one-to-one basis with an examiner, examinees are asked to take a test, thinking aloud as
they respond to each item
- Alternatively, examinees complete the test first and are interviewed afterwards
- Achievement tests: assessing not only whether certain students (such as low/high scorers on
previous examinations) are misinterpreting a particular item but also why and how they
are misinterpreting it, i.e., what they perceived as difficult when answering: the layout,
the language, the print quality, an unclear font, etc.
- Personality tests: exploring the way individuals perceive, interpret, and
respond to the items (whether the response matches how they perceive or interpret the item). Test takers
can be asked to paraphrase the items.
- Expert panels: a group of people with expertise in the field related to the construct, or
language experts and psychometricians. In addition to interviewing test takers individually or in
groups, expert panels may also provide qualitative analysis of test items, judging how well
each item meets the required qualities.
- A sensitivity review is a study of test items, typically conducted during the test
development process, in which items are examined for fairness to all prospective test
takers and for the presence of offensive language, stereotypes, or situations.
- Who counts as an expert? Usually someone who has frequently conducted research in a
particular area of behavior within psychology.
Item difficulty
- Used in maximum performance tests whose items are scored dichotomously
(right/wrong)
- Computed as the item difficulty index (p): the proportion of test takers who
answer an item correctly
p = number of persons answering the item correctly / number of test takers who answer the item
- The type of test must be taken into account (time-limited or speed test)
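The proportion formula for p can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture material: the function name `item_difficulty` and the response data are hypothetical, and responses are assumed to be already scored 1 (correct) or 0 (incorrect).

```python
def item_difficulty(responses):
    """Item difficulty index p: proportion of test takers answering correctly.

    `responses` holds one 0/1 score per test taker for a single item
    (1 = correct, 0 = incorrect). Hypothetical helper for illustration.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical data: 10 test takers, 7 of them answered the item correctly.
scores = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(item_difficulty(scores))  # 0.7
```

A higher p means more test takers passed the item, so the item is easier.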

Purposes of measuring item difficulty
- Select items whose difficulty level fits the purpose of the test
- Test purposes include selection and mastery. Selection: how well item difficulty
matches the selection ratio. Mastery: a p of 0.8 (80%) is the usual benchmark
for considering an item in a mastery test good
- Arrange the items so that difficulty increases progressively: easy items at the
beginning of the test, then increasingly difficult ones (so that test takers
gain confidence in working on the test and waste less time on difficult
items)
Item difficulty index (p)
- Ordinal in nature (it ranks items by difficulty)
- The weakness of p is that it is ordinal → p values only rank items by
difficulty; the size of the difference in difficulty cannot be determined
- The range of p is 0.0 <= p <= 1.0
- The larger the value of p, the easier the item
- Example: p1 = 0.20; p2 = 0.40; p3 = 0.60
1. Item 1 is more difficult than item 2 and item 3
2. It cannot be said that the difference in difficulty between item 1 and item 2 equals
the difference in difficulty between item 2 and item 3
3. Equal differences in p can be treated as equal differences in difficulty only when the
distribution is rectangular
Interpreting the item difficulty index
1. Matched to the purpose of the test
- For screening/selection: the difficulty index is matched to the desired selection ratio (the
percentage of test takers we want to pass); for a scholarship, for example, the items should
be difficult
- For mastery testing: easy items are desired (p = 0.8
or 0.9)
- For most situations, a test with an average p of 0.5 provides the maximum information
about individual differences.
- Decide whether the instrument needs items of varying difficulty
or not (sometimes variation is unnecessary)
- The recommendation to retain an item or not comes down to whether the item fits the purpose
of the test; this drives the decision to revise or discard the item
Categories of the item difficulty index
- p >= 0.8 → very easy
- 0.6 <= p < 0.8 → easy
- 0.4 <= p < 0.6 → moderate
- 0.2 <= p < 0.4 → difficult
- p < 0.2 → very difficult
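The category boundaries can be written as a small lookup, which makes the handling of the edge values (0.8, 0.6, 0.4, 0.2) explicit. This is only a sketch: the function name `difficulty_category` is hypothetical and the returned labels are English renderings of the categories in these notes.

```python
def difficulty_category(p):
    """Map an item difficulty index p (0..1) to its difficulty category."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p >= 0.8:
        return "very easy"
    if p >= 0.6:
        return "easy"
    if p >= 0.4:
        return "moderate"
    if p >= 0.2:
        return "difficult"
    return "very difficult"


print(difficulty_category(0.84))  # very easy
print(difficulty_category(0.35))  # difficult
```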
Linear difficulty index (z)
- Item scores are assumed to follow a normal distribution, and so is
item difficulty
- Through conversion based on the normal curve, the ordinal p can be
converted into an interval-scale z. For example, an item with p = 0.84 has z
= -1
- For the curve figure, see the lecture slides (PPT)
1. The proportion is measured from the right
2. Using a normal curve table, the proportion is converted into z
3. The easier the item, the smaller the value of z
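The table lookup can be reproduced with the inverse normal CDF from Python's standard library. This is a sketch under the convention stated in these notes, that the proportion p is the area measured from the right tail of the standard normal curve, so z is the point with area 1 - p to its left. The function name `linear_difficulty` is hypothetical.

```python
from statistics import NormalDist


def linear_difficulty(p):
    """Convert item difficulty p into z, with p taken as the right-tail area.

    Easy items (large p) get small (negative) z values, hard items large ones.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Area to the left of z is 1 - p when p is the area to the right.
    return NormalDist().inv_cdf(1.0 - p)


print(round(linear_difficulty(0.84), 2))  # -0.99, i.e. about -1
print(round(linear_difficulty(0.50), 2))  # 0.0
```

Because z is on an interval scale under the normality assumption, differences between z values can be compared, which p alone does not allow.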
Item endorsement index (PoE)
- In typical performance tests (e.g., Likert scales, which have no right or wrong answers),
the concept analogous to item difficulty is the proportion of endorsement (PoE) index.
- PoE is the proportion of participants who agree with an
item. The principle is much the same: agreement is treated as the “correct” response (social desirability)
- Procedure: (a) recode responses into 1 and 0 (1: agree, 0: disagree), (b) compute the
proportion of participants who agree
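The two-step procedure (recode to 1/0, then take the proportion) might look like this. The 5-point scale and the cut point are assumptions made for illustration only: ratings of 4 or 5 are counted as agreement and the neutral midpoint as non-agreement. The notes do not specify how a neutral response should be recoded.

```python
def proportion_of_endorsement(ratings, agree_from=4):
    """Proportion of respondents who endorse (agree with) an item.

    `ratings` are Likert responses; any rating >= `agree_from` is recoded
    to 1 (agree) and everything else to 0. The cut point is an assumption.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    endorsed = [1 if r >= agree_from else 0 for r in ratings]
    return sum(endorsed) / len(endorsed)


# Hypothetical responses of 8 participants on a 1-5 Likert item.
ratings = [5, 4, 2, 3, 4, 1, 5, 4]
print(proportion_of_endorsement(ratings))  # 0.625
```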
Distractor power analysis
- What is the function of a distractor?
- Distractors are the incorrect answer options.
- High-ability test takers will choose the correct option,
whereas low-ability test takers will choose among the options
at random.
- However, an individual can also answer correctly by guessing.
- A good distractor is (a) chosen by individuals who lack high ability and
(b) chosen about equally often by the individuals who answer incorrectly
- Distractor analysis compares expected distractor power with
actual distractor power (Friedenberg, 1995)
- Correlation between the distractor score and the total score
(a) No relationship is expected between the distractor score and the total score (or even
a significant negative relationship).
(b) Procedure: every individual who chooses the distractor is scored 1 and those who choose the
answer key are scored 0; correlate the distractor score with the total score (the number of
correct answers)
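The distractor-total correlation can be sketched as a Pearson correlation between a 0/1 "chose this distractor" indicator and the total score. The data and the helper names (`pearson_r`, `distractor_total_r`) are hypothetical. One assumption goes beyond the notes: test takers who chose a different distractor are also scored 0 for the distractor under analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either list has no variance


def distractor_total_r(choices, distractor, total_scores):
    """Correlate choosing `distractor` (scored 1, otherwise 0) with total score."""
    chose = [1 if c == distractor else 0 for c in choices]
    return pearson_r(chose, total_scores)


# Hypothetical item with key "A"; one chosen option and one total score per person.
choices = ["A", "B", "A", "C", "B", "A", "B", "A"]
totals = [9, 4, 8, 5, 3, 7, 2, 9]
r_b = distractor_total_r(choices, "B", totals)
print(r_b < 0)  # True: distractor B is chosen mostly by low scorers
```

A clearly positive value would be a warning sign, since it means high scorers are being drawn to the distractor.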
Expected distractor power
- The ideal number of individuals expected to choose each distractor of the item being
analyzed.
- EDP = number of test takers who answer incorrectly / number of distractors
Actual distractor power
- The number of individuals who actually choose each distractor of the item being analyzed
- For an example table and analysis, see the lecture slides (PPT)
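A small sketch of the expected-versus-actual comparison follows. The response data and the function name `distractor_power` are hypothetical, and it is assumed that every test taker chose exactly one option (no omitted answers).

```python
from collections import Counter


def distractor_power(choices, key, options):
    """Return (expected distractor power, actual count per distractor)."""
    distractors = [o for o in options if o != key]
    wrong = [c for c in choices if c != key]
    # EDP: wrong answers spread evenly over the distractors.
    expected = len(wrong) / len(distractors)
    counts = Counter(wrong)
    actual = {d: counts[d] for d in distractors}
    return expected, actual


# Hypothetical item with four options and key "A", answered by 12 test takers.
choices = ["A", "B", "A", "C", "D", "A", "B", "C", "A", "B", "D", "B"]
expected, actual = distractor_power(choices, key="A", options="ABCD")
print(round(expected, 2))  # 2.67
print(actual)              # {'B': 4, 'C': 2, 'D': 2}
```

Here distractor B attracts more test takers than expected and C and D slightly fewer; a distractor whose actual count is near zero is not doing its job and is a candidate for revision.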
Item discrimination
- The extent to which an item is able to differentiate
- Shows how well an item distinguishes individuals with
high ability/levels of a characteristic from individuals with
low ability/levels of a characteristic
- An item that cannot do this is a problem, because the function of a test and its items is to differentiate individuals
- Item discrimination matters most when deciding whether an item is good or not
- Can be used for both maximum and typical performance tests.
- Maximum performance: the item should be answered correctly by high-ability individuals
and incorrectly by low-ability individuals.
- Typical performance (e.g., Likert scale): higher response scores should be chosen by individuals
who possess the characteristic being measured rather than by individuals who possess
less of it
Methods of item discrimination analysis (Anastasi & Urbina, 1997)
- Using a criterion
- Extreme group method (cannot be used for typical performance tests)
- Distractor power (Friedenberg, 1995) or analysis of item alternatives (Cohen & Swerdlik,
2010)
Using a criterion
- To maximize test homogeneity/internal consistency: the item-total
correlation (rit). In Cohen, Swerdlik & Sturman (2013), this analysis is called the
item-reliability index.
- rit = correlation of the item with an internal criterion, the total score (reliability)
- A correlation basically shows whether variation in one variable is
accompanied by variation in another variable.
- Correlations can be positive or negative; the larger the absolute value, the stronger the relationship
- For item discrimination, a strong positive correlation is desired
- To maximize criterion validity: ric, called the item-validity index in Cohen, Swerdlik, & Sturman
(2013)
- ric = correlation of the item with an external criterion (validity)
- The item-total correlation tends to be overestimated because the total test score includes
the item's own score (especially when the instrument has few items).
- This is addressed with the corrected item-total correlation
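The correction can be sketched by correlating each item with the total of the remaining items (total minus the item's own score). The score matrix and the function names are hypothetical; `pearson_r` is a plain Pearson correlation written out so the example needs no external library.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_r(data, item):
    """Uncorrected: the item is correlated with a total that includes itself."""
    return pearson_r([row[item] for row in data], [sum(row) for row in data])


def corrected_item_total_r(data, item):
    """Corrected: the item is correlated with the total of the other items."""
    return pearson_r([row[item] for row in data],
                     [sum(row) - row[item] for row in data])


# Hypothetical 0/1 scores: 6 test takers (rows) by 4 items (columns).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(item_total_r(data, 0), 2), round(corrected_item_total_r(data, 0), 2))  # 0.7 0.44
```

With only four items the item's own score makes up a large share of the total, so the uncorrected value is visibly inflated.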
Advantages
- No need to split test takers into upper and lower groups
- The significance of the correlation can be tested
Interpretation
- If the correlation obtained is significant at the chosen significance level, the item can be
considered to have good discriminating power.
- Statistically, significance is affected by the degrees of freedom: the larger the df, the
more likely a significant result becomes even when the correlation is small. For that reason the
magnitude of the correlation is used more often (usually a minimum of +0.2 or +0.3).
- The higher the cutoff, the fewer items are eligible.
Item discrimination analysis using the extreme group method
- Compares performance on an item between the group of individuals with high total scores (upper
group) and the group of individuals with low total scores (lower group)
Extreme group method
- D = pU - pL
- pU = item difficulty in the upper group
- pL = item difficulty in the lower group
Extreme group method (2)
- Used when the number of test takers is large
- The split into upper and lower groups is based on the total score
- The groups are formed from the highest/lowest-scoring 27%-33%
- The upper-lower method is easier to carry out by hand
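The extreme group computation can be sketched as follows. The function name `discrimination_index`, the data, and the handling of ties (left to the sort order) are illustrative assumptions; the group fraction is a parameter so that any value in the 27%-33% range can be used.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Discrimination index D: item difficulty in the upper group minus
    item difficulty in the lower group, with groups formed on total score.

    `fraction` is the share of test takers placed in each extreme group.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: 10 test takers, 0/1 scores on one item plus total scores.
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(round(discrimination_index(item, totals, fraction=0.3), 2))  # 0.67
```

In this example all three upper-group members pass the item and one of three lower-group members does, giving D = 1.00 - 0.33.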
Weaknesses of D
- Only for maximum performance test items scored right/wrong (because it
uses item difficulty)
- The exact size of the item's discriminating power cannot be known, because all that is
known is the difference in the proportions of individuals answering correctly.
Interpreting D values (Ebel, in Crocker & Algina, 1986)
- D >= 0.40: the item is functioning quite satisfactorily
- 0.30 <= D <= 0.39: little/no revision required
- 0.20 <= D <= 0.29: the item is marginal and needs revision
- D <= 0.19: the item should be eliminated or completely revised
Relation of maximum value of D to item difficulty
Percentage passing item (p)    Maximum value of D
100                            0
90                             20
70                             60
50                             100
30                             60
10                             20
0                              0
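The pattern in the table can be reproduced with a one-line formula, under the assumption (not stated in the notes) that the upper and lower groups are the two equal halves of the sample. In that case the largest possible D at difficulty p is 2 * min(p, 1 - p): at p = 0.9, for instance, the best case is everyone in the upper half passing and 80% of the lower half passing, so D = 1.0 - 0.8 = 0.2. The function name is hypothetical.

```python
def max_discrimination(p):
    """Largest possible D for an item with difficulty p (as a proportion),
    assuming upper and lower groups are two equal halves of the sample."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return 2 * min(p, 1 - p)


# Reproduce the table rows (both columns expressed as percentages).
for pct in (100, 90, 70, 50, 30, 10, 0):
    print(pct, round(100 * max_discrimination(pct / 100)))
```

The maximum is reached at p = 0.5, which is consistent with the note that an average p of 0.5 gives the most information about individual differences.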
Validity
Definition of validity
- Judgement or estimate of how well a test measures what it purports to measure in a
particular context (Cohen, Swerdlik & Sturman, 2013).
- What the test measures and how well it measures what it is intended to measure
(Anastasi & Urbina, 1997).
- Agreement between a test score or measure and the quality it is believed to measure
(Kaplan & Saccuzzo, 2005).
- A test can be called valid only if the interpretations made from
its results correspond to the actual state of affairs.
Validation procedure
- Validation is the process of gathering and evaluating evidence about validity (Cohen,
Swerdlik & Sturman, 2013).
- All test validation procedures examine the relationship between test scores and
other independently observable facts about the trait being measured (Anastasi &
Urbina, 1997).
Requirements for a criterion measure
- Relevant
- Observable and measurable
- Reliable
- Independent (not influenced by the results of the test being validated).
- Free of bias.
Validation procedure (Cohen, Swerdlik & Sturman, 2013)
- Three approaches to assessing validity, associated respectively with content validity,
criterion-related validity, and construct validity, are:
1. Scrutinizing the test’s content.
2. Relating scores obtained on the test to other test scores or other measures.
3. Executing a comprehensive analysis of :
a. How scores on the test relate to other test scores and measures.
b. How scores on the test can be understood within some theoretical
framework for understanding the construct that the test was designed to
measure.
Traditionally, validity is conceptualized into three categories :

1. Content validity/Content description
2. Criterion related validity/Criterion prediction
3. Construct validity/Construct identification
In this classic conception of validity, referred to as the trinitarian view (Guion, 1980, in Cohen,
Swerdlik & Sturman, 2013), it might be useful to visualize construct validity as being “umbrella
validity” since every other variety of validity falls under it.
Sources of validity evidence

- Validity is a unitary concept. It is the degree to which all the accumulated evidence
supports the intended interpretation of test scores for the proposed use.
1. Evidence based on test content
2. Evidence based on response processes
3. Evidence based on internal structure
4. Evidence based on relations to other variables
Face Validity (Cohen, Swerdlik, & Sturman, 2013)

- Relates more to what a test appears to measure to the person being tested than to what the test
actually measures.
- Face validity is a judgement concerning how relevant the test items appear to be.
- More a matter of public relations than psychometric soundness, but it seems important
nonetheless.
- ...is really not validity at all because it does not offer evidence to support conclusions
drawn from test scores (Kaplan & Saccuzzo, 2005).
- A test’s lack of face validity could contribute to a lack of confidence in the perceived
effectiveness of the test, with a consequent decrease in the test taker's cooperation or
motivation to do his or her best.
- If a test definitely appears to measure what it purports to measure “on the face of it”, then
it could be said to be high in face validity.
- A paper & pencil personality test labeled The Introversion/Extraversion Test, may be
perceived by respondents as a highly face-valid test.
- A personality test in which respondents are asked to report what they see in inkblots
may be perceived as a test with low face validity.
Content validity
- Describes a judgement of how adequately a test samples behavior representative of the
universe of behavior that the test was designed to sample (Cohen, Swerdlik, & Sturman,
2013).
- ...a systematic examination of the test content to determine whether it
covers a representative sample of the behavior domain to be
measured (Anastasi & Urbina, 1997).
- Assertiveness tests would contain items sampling from hypothetical situations: at home,
on the job, in social situations.
- Educational achievement tests are content-valid when the proportion of material covered by the
test approximates the proportion of material covered in the course.
- Content validity evidence has been of greatest concern in educational testing (Kaplan &
Saccuzzo, 2005)
The quantification of content validity
- Content validity ratio (CVR): developed by C.H. Lawshe, is essentially a method for
gauging agreement among raters or judges regarding how essential a particular item is. Each
panelist rates the item as:
1. Essential
2. Useful but not essential
3. Not necessary
- CVR = (ne - N/2) / (N/2)
- Where:
- ne = number of panelists indicating “essential”
- N = total number of panelists
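Lawshe's ratio, CVR = (ne - N/2) / (N/2), can be computed directly from the two counts defined above. The function name and the panel figures are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's content validity ratio: (ne - N/2) / (N/2)."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need 0 <= n_essential <= n_panelists and n_panelists > 0")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts rating one item.
print(content_validity_ratio(9, 10))  # 0.8  (9 of 10 say "essential")
print(content_validity_ratio(5, 10))  # 0.0  (exactly half say "essential")
print(content_validity_ratio(2, 10))  # -0.6 (fewer than half say "essential")
```

The ratio runs from -1 (no panelist rates the item essential) to +1 (all do), and is 0 when exactly half do.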
Criterion related validity (Cohen, Swerdlik, & Sturman, 2013)

- Judgement of how adequately a test score can be used to infer an individual’s most
probable standing on some measure of interest - the measure of interest being the
criterion.
1. Concurrent validity : is an index of the degree to which a test score is related to
some criterion measure obtained at the same time (concurrently).
2. Predictive validity : is an index of the degree to which a test score predicts some
criterion measure.
What is a criterion? (Cohen, Swerdlik, & Sturman, 2013)

- A criterion can be most anything.
- It can be a test score, a specific behavior or group of behaviors, an amount of time, a
rating, a psychiatric diagnosis, a training cost, an index of absenteeism, an index of
alcohol intoxication, and so on.
- Whatever the criterion, ideally it is relevant, valid, and uncontaminated.
What is a criterion?
- Relevant : it is pertinent or applicable to the matter at hand.
- Valid : If one test (X) is being used as the criterion to validate a second test (Y), then
evidence should exist that test X is valid.
- Uncontaminated: the criterion measure has not been based, even in part, on predictor
measures.
Criterion contamination
- Occurs when the criterion measure is influenced by knowledge of the scores on the test
being validated.
- Example: the “Inmate Violence Potential Test” (IVPT) is designed to predict a prisoner’s
potential for violence in the cell block. In part, this evaluation entails ratings from fellow
inmates, guards, and other staff in order to come up with a number that represents each
inmate's violence potential.
- After all of the inmates in the study have been given scores on this test, the study
authors then attempt to validate the test by asking guards to rate each inmate on their
violence potential. Because the guards' opinions feed into both the test score and the criterion,
the criterion is contaminated.
Concurrent validity (Cohen, Swerdlik, & Sturman, 2013)

- ...is an index of the degree to which a test score is related to some criterion measure
obtained at the same time (concurrently)
- Statements of concurrent validity indicate the extent to which test scores may be used
to estimate an individual’s present standing on a criterion.
Predictive validity (Cohen, Swerdlik, & Sturman, 2013)

- ...is an index of the degree to which a test score predicts some criterion measure.
- Test scores may be obtained at one time and the criterion measures obtained at a future
time, usually after some intervening event has taken place.
- The intervening event may take varied forms, such as training, experience, therapy,
medication, or simply the passage of time.
- A measure of how accurately a test forecasts a future non-test behavior or
condition.
1. Intelligence test score at university entrance → academic success at university.
2. Aggressiveness test score in adolescence → aggressive behavior in the workplace in
adulthood.
Differences between predictive and concurrent validity
- Anastasi: depends on the purpose
1. Predicting = predictive validity
2. Diagnosing = concurrent validity
- Cronbach: depends on when the criterion/evidence becomes available
1. Criterion/evidence available in the future = predictive validity
2. Criterion/evidence available now = concurrent validity
- Concurrent validation is considered more practical because the interval between the test and
the criterion/evidence measurement is short.
Overcoming difficulties in determining a criterion
- Consider how far the criterion can be scored or defined operationally.
- Pay attention to the specificity of the criterion (differences in place, time, and situation lead
to differences in the test's validity coefficient).
Common categories of criterion measures in criterion-prediction procedures (Anastasi & Urbina,
1997)
- Academic achievement
- Performance in specialized training
- Job performance
- Contrasted groups
- Psychiatric diagnosis
- Ratings
- Previously available tests
The validity coefficient

- ...is a correlation coefficient that provides a measure of the relationship between test
scores and scores on the criterion measure.
- It is the responsibility of the test developer to report validation data in the test manual.
- It is the responsibility of test users to read carefully the description of the validation study
and then to evaluate the suitability of the test for their specific purposes.
How high should a validity coefficient be for a user or a test developer to infer that the test is
valid?
- The validity coefficient should be high enough to result in the identification and
differentiation of test takers with respect to target attribute(s) (Cohen, Swerdlik &
Sturman, 2013).
- It must be significant at a given level and high enough to identify and
differentiate individuals (Anastasi & Urbina, 1997).
Interpreting validity coefficients (Anastasi & Urbina, 1997)
- Related to the purpose of the test.
- Related to the theory of the construct.
- Related to the validation method used.
- For criterion validity specifically, the correlation is expected to be significant and strong.
Example interpretation
- A predictive validity study between the SIMAK UI test and GPA (IPK) yields a correlation
coefficient of r = 0.8 (significant at the 0.05 level of significance).
- This means that 64% of the variance in GPA is explained by the SIMAK UI test (r^2 = 0.8^2 = 0.64).
The SIMAK UI test can therefore be considered valid for predicting academic success
at university.
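The step from a validity coefficient to "proportion of variance explained" is simply squaring r. The sketch below computes a validity coefficient from paired scores and then squares it; the entrance-test scores and GPA values are invented for illustration and are not data from any real validation study.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(r):
    """Coefficient of determination: share of criterion variance linked to the test."""
    return r * r


# The example from the notes: r = 0.8 corresponds to 64% of the variance.
print(round(variance_explained(0.8), 2))  # 0.64

# Hypothetical paired data: entrance-test score and later GPA for 6 students.
test_scores = [55, 60, 65, 70, 80, 90]
gpa = [2.6, 2.9, 2.8, 3.2, 3.4, 3.7]
r = pearson_r(test_scores, gpa)
print(0 < variance_explained(r) <= 1)  # True
```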
Construct validity
- Judgement about the appropriateness of inferences drawn from test scores regarding
individual standings on a variable called a construct (Cohen, Swerdlik & Sturman, 2013).
- A measure of how accurately a test measures a particular theoretical construct (a trait or
an ability) (Anastasi & Urbina, 1997).
Testing construct validity (Anastasi & Urbina, 1997)
- Study the theory surrounding the construct to be measured.
- Develop hypotheses based on the theory.
- Analyze how well the empirical results match the hypotheses.
Evidence of construct validity (Cohen, Swerdlik & Sturman, 2013)

- The test is homogeneous, measuring a single construct.
- Test scores increase or decrease as a function of age, the passage of time, or an
experimental manipulation as theoretically predicted.
- Test scores obtained after some event or the mere passage of time (that is, posttest
scores) differ from pretest scores as theoretically predicted.
- Test scores obtained by people from distinct groups vary as predicted by the theory.
- Test scores correlate with scores on other tests in accordance with what would be
predicted from a theory that covers the manifestation of the construct in question.
Evidence of homogeneity
- ...refers to how uniform a test is in measuring a single concept
- Example:
1. Correlate average subtest scores with the average total test score.
2. Subtests/items that in the test developer’s judgement do not correlate very well
with the test as a whole might have to be reconstructed (or eliminated) lest the
test not measure the construct (e.g., academic achievement).
- Equivalent in Anastasi & Urbina (1997) = internal consistency: examines the validity of a test
that measures a unidimensional construct (not a combination of several sub-constructs).
Internal consistency (Anastasi & Urbina, 1997)
- Used (as a last resort) when no other external criterion exists or one is hard to find.
- Because no external criterion is used for comparison, what the test
actually measures cannot be known.
Evidence of change with age

- If a test score purports to be a measure of a construct that could be expected to
change over time, then the test score, too, should
show the same progressive changes with age to be considered a valid measure of the
construct.
- Equivalent in Anastasi & Urbina (1997) = developmental changes: examines the validity of a test
used to measure a construct that, according to theory, changes
along developmental stages.
Evidence of pretest-post test changes

- Depending on the construct being measured, almost any intervening life experience
could be predicted to yield changes in score from pretest to post test.
- Equivalent in Anastasi & Urbina (1997) = experimental intervention
1. The intervention can be training or a specific treatment given to the
participants.
2. Evidence that can be used: performance on specific things that, according to
theory, are related to the construct being measured.
- An aggressiveness test and anger management training
- A work attitude test and an internship program
Evidence from distinct groups

- If a test is a valid measure of a particular construct, then test scores from groups of
people who would be presumed to differ with respect to that construct should have
correspondingly different test scores.
- Equivalent in Anastasi & Urbina (1997) = contrasted groups
Test scores correlate with scores on other tests in accordance with what would be predicted
from a theory that covers the manifestation of the construct in question.
- Convergent evidence: correlation with other tests
- Discriminant evidence
- Factor analysis
Convergent evidence
- Scores on the test undergoing construct validation tend to correlate highly in the
predicted direction with scores on older, more established, and already validated tests
designed to measure the same (or similar) construct.
- Convergent evidence for validity may come not only from correlations with tests
purporting to measure an identical construct but also from correlations with measures
purporting to measure related constructs.
Discriminant evidence
- A validity coefficient showing little (that is, a statistically insignificant) relationship
between test scores and other variables with which scores on the test being
construct-validated should not theoretically be correlated.
Convergent-discriminant validation (Anastasi & Urbina, 1997)

- A test that measures X should correlate clearly with convergent factors
and at the same time be uncorrelated with discriminant factors.
- Check whether the computed correlations match the theory.
Factor analysis
- ...mathematical procedures designed to identify factors or specific variables that are
typically attributes, characteristics, or dimensions on which people may differ.
- Both convergent and discriminant evidence of construct validity can be obtained by the
use of factor analysis.
- Factor analysis is frequently employed as a data reduction method in which several sets
of scores and the correlations between them are analyzed.
