
ESTIMATION TO EXAMINEE PARAMETER USING
NEWTON-RAPHSON METHOD: AN APPLICATION FOR
LANGUAGE TESTING

Widiatmoko
E-mail: moko.geong@gmail.com

Center for Languages Teacher Training and Development, MOE

Presented at the Annual Conference on Linguistics
Atma Jaya Catholic University Jakarta
15-16 February 2006

moko_geong@yahoo.com.my
Why?

Curriculum/Syllabus, Learning Process, Evaluation
Approach, Method, Technique; Materials; Evaluation

Steps in the Construction of a Content-Valid Examination
► Test Construction
► Development of a Test Blueprint
► Item Development and Validation
► Review and Revision
► Test Assembly
► Pretesting of Test Items
► Content-Valid Examination
► Passing Point
► Test Administration

Steps in the Construction of Calibrated Items
► Outlining Topics
► Domain Specification
► Constructing Items
► Selecting Items
► Items Try-out
► Reviewing and Revising Items
► Good Items
► Items Calibration (Estimation)
► Item Banking

Test & Items
Test:
► questions to measure examinee’s trait in a situation
► examiner focuses on what the examinees are like in their norm groups
► when the test is easy, the examinee appears to have higher ability, and
vice versa (Hambleton, Swaminathan, & Rogers, 1991; Naga, 1992)
Items:
► their statistics are subject to change, or inconsistent, depending on the
traits of the examinee groups
► designed based on the aforementioned judgment
► difficulty (i.e., proportion of examinees passing the item) and
discrimination (i.e., item-total test biserial or point biserial) are
group-dependent, implying that the values of these statistics depend on the
examinee group in which they are acquired (Magnusson, 1967;
Hambleton, 1989)

IRT
► IRT used by nearly all of the largest test
publishers, many state departments of
education, and industrial and professional
organizations (Hambleton & Murray, 1983;
Hambleton, 1989)

IRT: Local independence
∗ composite scores of items within a homogeneous
subpopulation of examinees are independent
(Naga, 1992 cited in Widiatmoko, 2005)
∗ responses to any two items are uncorrelated in a
homogeneous subpopulation with a particular level
of θ (Hulin, Drasgow, & Parsons, 1983)
∗ within any group of examinees all characterized by
the same values θ1, θ2, ..., θk, the (conditional)
distributions of the item scores are all independent
of each other (Lord & Novick, 1968; McDonald, 1999)

IRT: Parameter invariance
► parameters characterising an item do not
depend on the trait distribution of the
examinees and the parameter characterising
an examinee does not depend on the set of
test items (Hambleton, Swaminathan, &
Rogers, 1991)

IRT: Unidimensionality
► presence of a dominant component or
factor influencing test performance
(Hambleton, Swaminathan, & Rogers, 1991)
► the items measure one trait or
characteristic of the examinees (Traub,
1983; Naga, 1992)

Language tests
► designed using the concept of IRT in the
discrete-point test paradigm (Weir, 1990)
► a discrete-point test focuses on one point of
grammar at a time: only one element of a
particular component of grammar, only
one skill at a time, and one aspect of a skill
(Oller, 1979)

Question formulated
Does the test characteristic curve generated by
the Newton-Raphson method satisfy the
one-parameter logistic model?

Theory: Parameter estimation
► determining the value of an examinee’s trait with
adequate precision and classifying an examinee into trait
categories with small probabilities of misclassification (Lord
& Novick, 1968)
► incorporates item parameter and examinee parameter
► basic consideration: that parameter estimates are chosen
by selecting the values that make an observed data set
appear most likely in light of a particular model (Hulin,
Drasgow, & Parsons, 1983)
► item parameter: estimation of item difficulty and
discrimination
► examinee parameter: estimation of the examinee’s trait
► concerns item banking

IRT models
► 1PL, 2PL, 3PL, and 4PL models
► systematic procedure for considering and
quantifying the probability or improbability
of individual item and examinee’s response
patterns in a set of test data (Henning,
1987)
► appropriate for dichotomous data
► distinction among the models: the number
of parameters

Model parameters
► 1st: scale of examinee’s trait and item
difficulty
► 2nd: continuous estimate of discriminability
► 3rd: index of pseudo chance-level (guessing)
► 4th: index of carelessness by the high
achiever (Hambleton, 1989)
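
As a compact summary of how the four parameters enter the model, the general four-parameter logistic form can be written as below. The symbols a_i (discrimination), c_i (pseudo chance-level) and d_i (upper asymptote reflecting carelessness) are notation assumed here for illustration; the slides themselves only name b_i, θ and D.

```latex
P_i(\theta) = c_i + (d_i - c_i)\,
  \frac{e^{D a_i (\theta - b_i)}}{1 + e^{D a_i (\theta - b_i)}}
```

Setting d_i = 1 gives the 3PL model, additionally setting c_i = 0 gives the 2PL model, and additionally setting a_i = 1 gives the 1PL model.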

1PL model
► widely used
► probabilistic where the examinees and items are not only
graded for trait and difficulty, but also judged according to
the probability or likelihood of their response patterns
given the observed examinee’s trait and item difficulty
(Henning, 1987)
► assumes all items are equally discriminating
► applied to relatively easy tests
► much smaller sample sizes are required if the main
purpose is to estimate θ (Hulin et al. cited in Crocker &
Algina, 1986)

Estimation of parameters
► quite often occurs in the 1PL model
► bi and θ are employed
► situations: estimation of the trait with item
parameters known, and estimation of both item
and trait parameters (Hambleton, 1989)
► estimation of the latent trait with item parameters
known is the simple case; it employs the N-R
method

N-R Method
► finds zeros of the derivatives of the
maximized function (Krass, 2005)
► obtains results where the drift of parameter
estimates is arrested and the parameters are
estimated more accurately than with the
joint maximum likelihood procedure
(Swaminathan, Hambleton, Sireci, Xing, &
Rizavi, 2003)

N-R Steps
► The method employs the equation (Naga, 2003):

\[
\theta_{s+1} = \theta_s + \frac{\sum_{i=1}^{N} \left[ X_i - P_i(\theta) \right]}{D \sum_{i=1}^{N} P_i(\theta)\, Q_i(\theta)}
\]

► θs: examinee’s initial latent trait; θs+1: examinee’s
subsequent latent trait; N: number of items in the
test; Xi: examinee’s response; Pi(θ): probability of an
examinee with trait θ answering item i
correctly; Qi(θ): probability of an examinee with trait
θ answering item i incorrectly; and D: scaling
constant, i.e., 1.7
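
The update can be motivated as Newton-Raphson applied to the log-likelihood of one examinee's responses. The derivation below is a standard sketch added for clarity, not taken from the slides; it assumes the 1PL response function with D(θ − b_i) in the exponent and local independence across items.

```latex
\begin{align*}
\ln L(\theta) &= \sum_{i=1}^{N} \left[ X_i \ln P_i(\theta) + (1 - X_i) \ln Q_i(\theta) \right],
  \qquad \frac{dP_i(\theta)}{d\theta} = D\, P_i(\theta)\, Q_i(\theta) \\
\frac{d \ln L}{d\theta} &= D \sum_{i=1}^{N} \left[ X_i - P_i(\theta) \right] \\
\frac{d^2 \ln L}{d\theta^2} &= -D^2 \sum_{i=1}^{N} P_i(\theta)\, Q_i(\theta) \\
\theta_{s+1} &= \theta_s - \frac{d \ln L / d\theta}{d^2 \ln L / d\theta^2}
  = \theta_s + \frac{\sum_{i=1}^{N} \left[ X_i - P_i(\theta) \right]}{D \sum_{i=1}^{N} P_i(\theta)\, Q_i(\theta)}
\end{align*}
```

One factor of D cancels between the first and second derivatives, which is why a single D remains in the denominator.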

Steps
► the first examinee’s responses are aligned
with the item numbers and the item
difficulties: correct response (X=1),
incorrect response (X=0)
► the examinee’s initial trait θs is calculated
from the natural logarithm of the ratio between
Pi(θ) and Qi(θ)
► Pi(θ) is calculated for all items:

\[
P_i(\theta) = \frac{e^{D(\theta - b_i)}}{1 + e^{D(\theta - b_i)}}
\]
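
The first steps can be sketched in Python as follows. This is a minimal illustration, not the author's program: the item difficulties and responses are hypothetical, and the starting value is assumed to be the log-odds of the observed proportion correct, which is one common reading of the "natural logarithm" step.

```python
import math

D = 1.7  # scaling constant given in the slides


def p_correct(theta, b):
    """1PL probability of a correct response to an item of difficulty b.

    1 / (1 + e^(-x)) is algebraically the same as e^x / (1 + e^x).
    """
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def initial_theta(responses):
    """Assumed starting value: ln(number right / number wrong).

    Undefined for all-correct or all-incorrect response patterns.
    """
    n_right = sum(responses)
    n_wrong = len(responses) - n_right
    return math.log(n_right / n_wrong)


# Hypothetical data: five items, one examinee (1 = correct, 0 = incorrect)
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]

theta0 = initial_theta(responses)
probs = [p_correct(theta0, b) for b in difficulties]
```

With these made-up values the success probabilities fall as item difficulty rises, as the 1PL model requires.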

Steps
► Qi(θ) = 1 – Pi(θ)
► the success probability Pi(θ) is subtracted from the examinee’s
response Xi
► D, the success probability Pi(θ), and the failure probability Qi(θ)
are multiplied
► the next iteration of the examinee’s trait, θ1, is then
calculated
► the distance between θ0 and θ1 is used to decide whether another
iteration is needed
► when the distance is equal to or less than 0.001, the estimate is
considered to have reached the maximum likelihood and the iteration
has converged. According to Krass (2005), convergence here means
finding the zeros of the derivatives of the maximized function
► the θ estimation is done for all examinees
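
The steps above can be sketched as a short Python routine. It is an illustrative sketch under stated assumptions rather than the author's program: the function names, the default starting value of 0.0, the iteration cap, and the example data are all hypothetical, while D = 1.7 and the 0.001 stopping rule come from the slides.

```python
import math

D = 1.7  # scaling constant given in the slides


def p_correct(theta, b):
    """1PL probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def estimate_theta(responses, difficulties, theta_start=0.0,
                   tol=0.001, max_iter=50):
    """Newton-Raphson estimate of one examinee's trait, item difficulties known.

    Returns (theta, iterations_used). max_iter is a safety cap assumed here.
    """
    theta = theta_start
    for iteration in range(1, max_iter + 1):
        p = [p_correct(theta, b) for b in difficulties]
        # Numerator: sum of (X_i - P_i); denominator: D * sum of P_i * Q_i
        numerator = sum(x - pi for x, pi in zip(responses, p))
        denominator = D * sum(pi * (1.0 - pi) for pi in p)
        theta_next = theta + numerator / denominator
        if abs(theta_next - theta) <= tol:
            return theta_next, iteration
        theta = theta_next
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical data: five items, one examinee (1 = correct, 0 = incorrect)
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]
theta_hat, n_iter = estimate_theta(responses, difficulties)
```

At the converged value the sum of the success probabilities is close to the observed number correct, which is the maximum likelihood condition for this model. All-correct and all-incorrect response patterns have no finite maximum likelihood estimate, so the loop would not converge for them.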

Methodology
► a survey
► purposive random sampling
► the population: 45 items along with their bi and 2000
examinees responding to the items
► 40 bi randomly selected as the subpopulation of items
► only examinees with both correct and incorrect responses
to the items are purposively selected
► 70 examinees responding to the items randomly selected as the
subpopulation of examinees
► the research analysis units: 40 bi and 70 examinees
responding to the items
► the values of the examinees’ latent traits are analyzed

Analysis
► the initial θ values of the examinees’ latent traits range from -1.735 to +3.664
► the first iteration includes the examinees 10, 20, 30, and 50
► the second iteration includes the examinees 5, 9, and 39
► the third iteration includes the examinees 1, 2, 4, 7, 8, 11, 12, 13, 14,
15, 17, 18, 19, 21, 22, 24, 25, 27, 28, 31, 32, 34, 35, 37, 41, 42, 43,
44, 49, 51, 53, 54, 55, 57, 59, 61, 63, 64, 65, 67, 68, and 69
► the fourth iteration includes the examinees 3, 6, 16, 23, 26, 29, 36, 38,
45, 46, 47, 56, 58, and 66
► the fifth iteration includes the examinees 33, 40, 48, 52, 62, and 70
► the sixth iteration includes the examinee 60
► this results in the examinees’ traits θ varying from -1.735 to +2.912

Test Characteristic Curve for 1PL

[Figure: test characteristic curve for the 1PL model. Horizontal axis: Examinees Latent Trait, from -1.74 to 2.91; vertical axis: Probability of Examinees Latent Trait, from 0.0 to 1.0.]
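
For readers who want to reproduce a curve of this kind, the sketch below computes a test characteristic curve on a 0 to 1 scale. It assumes the plotted quantity is the expected proportion correct, i.e. the mean of the item success probabilities at each θ; the slides do not state this explicitly. The item difficulties are hypothetical, and the θ values are a few of those shown on the plot's horizontal axis.

```python
import math

D = 1.7  # scaling constant given in the slides


def p_correct(theta, b):
    """1PL probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))


def tcc_proportion(theta, difficulties):
    """Expected proportion-correct score at trait level theta."""
    return sum(p_correct(theta, b) for b in difficulties) / len(difficulties)


difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical item difficulties
thetas = [-1.74, -0.60, 0.08, 0.90, 2.91]    # sample trait values from the axis
curve = [tcc_proportion(t, difficulties) for t in thetas]
```

Under the 1PL model such a curve rises monotonically with θ, so departures from a smooth increasing shape in empirical data are one informal sign of misfit.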

Conclusion
► The 1PL model is not sufficiently satisfied.
Hypothetically, this may be due to the number
of examinees, the method employed, the
model chosen, the test length, and
other factors

Recommendation & Implication
► continuous study
► in language testing, it is recommended to
employ several estimation methods for
widely ranged test items using the 2PL model,
the 3PL model, and other models
► computer programs for accurate and
quick iteration
► item banking

Questions

See You Next Year

