
RESEARCH DESIGN

Four Functions of Research


1. Basic: research designed to test or refine theory
2. Applied: research conducted in a field of
common practice and concerned with the
application and development of research based
knowledge
3. Action: research designed to solve a specific
classroom or school problem, improve practice,
or make a decision at a single local site
4. Evaluation: research designed to assess the
merit and worth of a specific practice in terms
of the values operating at a site

Differences in conducting conventional research
& action research

Conventional research
1. Read the literature
2. Identify the problem
3. Research objectives
4. Research questions
5. Collect data
6. Analyze data
7. Conclusion
8. Emphasis on methodology, especially research
design, instrumentation, validity &
reliability

Action research
1. Identify the problem (with evidence from
preliminary survey data)
2. Read the literature
3. Plan the solution
4. Implement
5. Collect data
6. Analyze data
7. Reflection
8. Repeat in a second cycle (if necessary)
9. Emphasis on solving the problem without
generalizing to other settings

PROCESS OF DESIGNING AND CONDUCTING A RESEARCH PROJECT:

What -- What was studied?
What about -- What aspects of the subject were studied?
What for -- What is/was the significance of the study?
What did prior lit./research say?
What was done -- How was the study conducted?
What was found?
So what?
What now?

These questions correspond to the sections of a research report:

1. Introduction, Research Problems/Objectives, & Justification
2. Literature Review
3. Methodology (research sample, data collection, measurement, data analysis)
4. Results & Discussion
5. Implications
6. Conclusions and Recommendations for Future Research

Research as Scientific
Inquiry
Scientific inquiry is the search for
knowledge using recognized methods in
data collection, analysis, and
interpretation
The purpose of scientific inquiry is to
develop knowledge
1. Describe phenomena
2. Examine empirical relationships
between or among phenomena
3. Test whether such relationships are
causal in nature

Educational Research
Lack of a single, appropriate
methodological approach to study
education
Two major approaches
Quantitative
Qualitative

Goals
Quantitative: tests theory, establishes facts, shows
relationships, predicts, or statistically describes
Qualitative: develops grounded theory, develops
understanding, describes multiple realities, captures
naturally occurring behavior

Research design
Quantitative: highly structured, formal, and specific
Qualitative: unstructured, flexible, evolving

Participants
Quantitative: many participants representative of the
groups from which they were chosen using probabilistic
sampling techniques
Qualitative: few participants chosen using non-probabilistic
sampling techniques for specific characteristics of
interest to the researchers

Researchers' role
Quantitative: detached, objective observers of events
Qualitative: participant observers reporting participants'
perspectives, understood only after developing long-term,
close, trusting relationships with participants

Context
Quantitative: manipulated and controlled settings
Qualitative: naturalistic settings

Data, data collection, and data analysis
Quantitative: numerical data collected at specific times
from tests or surveys and analyzed statistically
Qualitative: narrative data collected over a long period
of time from observations and interviews and analyzed
using interpretive techniques

CHAPTER 3: METHODOLOGY

3.1 Introduction
3.2 Research Design
Experimental
Non-experimental
Controlling extraneous variables (internal & external validity)

3.3 Population and Sample
Type, size, and distribution of the population
Sample size and selection method appropriate to the research design
Sampling method (aligned with the research questions)

3.4 Research Instruments
Types of instruments
Source of instruments; permission for use & translation
Validity and reliability
Administration of instruments

CHAPTER 3: METHODOLOGY

3.5 Research Procedure
Steps in managing / administering the research

3.6 Data Collection
Data collection procedures

3.7 Data Analysis
Descriptive & inferential statistics
Statistical software used
Types of data analysis
Significance level of statistical tests

3.8 Pilot Study
Testing the validity and reliability of the instruments
Sample drawn from the population but not involved in the actual study

3.9 Summary

CHAPTER 4: ANALYSIS AND FINDINGS

4.1 Introduction
4.2 Data Management
Coding
Types of data
Input into the statistical software
Managing the data
Creating variables; Transform: compute, recode, etc.
Identifying outliers & statistical assumptions
Reliability analysis

4.3 Descriptive Analysis
Frequencies, crosstabs
Mean & standard deviation
Correlation

4.4 Inferential Analysis
t-tests: dependent / independent, one- / two-tailed
One-way or factorial ANOVA
MANOVA
ANCOVA (covariate)
Regression / multiple regression
Chi-square / contingency table / crosstab

4.5 Summary

Educational Research
Differentiating characteristics
Goals
Quantitative: tests theory, establishes facts, shows
relationships, predicts, or statistically describes
Qualitative: develops grounded theory, develops
understanding, describes multiple realities, captures
naturally occurring behavior

Research design
Quantitative: highly structured, formal, and specific
Qualitative: unstructured, flexible, evolving


Educational Research
Differentiating characteristics
Participants
Quantitative: many participants representative of the
groups from which they were chosen using probabilistic
sampling techniques
Qualitative: few participants chosen using non-probabilistic
sampling techniques for specific characteristics of interest
to the researchers

Data, data collection, and data analysis


Quantitative: numerical data collected at specific times
from tests or surveys and analyzed statistically
Qualitative: narrative data collected over a long period of
time from observations and interviews and analyzed using
interpretive techniques


Educational Research
Differentiating characteristics
Researchers' role
Quantitative: detached, objective observers of events
Qualitative: participant observers reporting participants'
perspectives understood only after developing long-term,
close, trusting relationships with participants

Context
Quantitative: manipulated and controlled settings
Qualitative: naturalistic settings


Major Types of Research


Studies
Experimental: A type of research
used to establish cause-and-effect
relationships by manipulating
variables/treatments
Observational/Correlational: A
type of research that measures two
or more variables and looks to see
how the variables are related to each
other.

Classes of Research Design

1. Pre-experimental
2. Experimental
3. Quasi-experimental
4. Ex Post Facto

Pre-Experimental
Designs:
No Control Group and/or
Randomization

1. One-shot case study


2. One-group pretest-posttest design
3. Intact-group comparison


True Experimental Designs:


Control Group & Randomization

1. Posttest-only control-group
design
2. Pretest-posttest control-group design
3. Factorial experimental design

Quasi-Experimental Designs:
Control Group But No Randomization

1. Non-equivalent control group


design
2. Time-series designs
3. Others


Ex-Post Facto Designs:


Researcher Arrives After Treatment Is Given

Correlational designs
-- Simple predictive
-- Causal modeling
Criterion-group designs


Types of Research Designs

Research Designs
  Quantitative
    Experimental
      True
      Quasi
      Single Subject
    Non-Experimental
      Descriptive
      Comparative
      Correlational
      Causal Comparative
  Qualitative
    Case Study
    Phenomenology
    Ethnography
    Grounded Theory
    Analytical Study
      Concept Analysis
      Historical Analysis
  Mixed Method

Copyright Allyn & Bacon 2008

Quantitative Designs
Two major categories
Experimental
The investigation of causal effects through
direct manipulation of an independent
variable and control of extraneous variables

Non-experimental
The investigation of the current state of a
variable or the relationships, other than
causal, between variables


Quantitative Designs
An example of an experimental design
Randomly assign students to one of two
classrooms in which the same social studies
unit is being taught. Teach the first class using
the traditional lecture approach, the second
class using co-operative learning groups.
Examine the achievement differences between
the two groups to see if the type of approach
to instruction had an effect.
This study is characterized by the investigation
of cause (instructional approach) and effect
(achievement), and by manipulation (the choice of
instructional approach assigned to each group).

Quantitative Designs
Differentiating the four types of experimental
designs
1. True experimental - Random assignment of
subjects to groups
2. Quasi-experimental - Non-random assignment of
subjects to groups
3. Pre-experimental - One-shot case study,
pretest-posttest designs with no control group
4. Single subject - Non-random selection of a single
subject

EXPERIMENTAL DESIGNS
One of the simplest experimental designs is the ONE-GROUP PRETEST-POSTTEST DESIGN--EXAMPLE?
One way to examine efficacy of a drug:

O1                      X                  O2
Measure                 DRUG               Measure
patient's condition     Experimental       patient's condition
(Pretest)               condition/         (Posttest)
                        intervention

RESULT: Significant improvement from O1 to O2
(i.e., sig. O2 - O1 difference)
QUESTION: Did X (the drug) cause the
improvement?

EXPERIMENTAL DESIGNS

CONTROL GROUP simulates absence of X

Origin of using control groups (a tale from ancient Egypt)

Pretest Post-Test Control Group Design--Suppose random
assignment (R) was used to control confounding variables:

R   Exp. Group    O1E    X    O2E
R   Ctrl Group    O1C         O2C

RESULT: O2E > O1E & O2C Not > O1C
QUESTION: Did X cause the improvement in Exp.
Group?

EXPERIMENTAL DESIGNS
NOT NECESSARILY! Why not?
Power of suggestibility (the Hawthorne Effect)
CONCLUSION?
Need proper form of control, e.g., a placebo:

R   Exp. Group    O1E    X          O2E
R   Ctrl Group    O1C    Placebo    O2C

QUESTION: Can we now conclude X caused the improvement
in Exp. Group?
Maybe, but be aware of the Experimenter Effect (it tends to
prejudice the results, especially in medical research).
SOLUTION: Double-blind experiments (neither the subjects
nor the experimenter knows who is getting the placebo/drug).

EXPERIMENTAL DESIGNS
Experimental studies need to control for potential
confounding factors that may threaten internal validity
of the experiment:
Hawthorne Effect is only one potential confounding factor
in experimental studies.
Other such factors are:
History?
Biasing events that occur between pretest and post-test

Maturation?
Physical/biological/psychological changes in the subjects

Testing?
Exposure to pretest influences scores on post-test

Instrumentation?
Flaws in measurement instrument/procedure

EXPERIMENTAL DESIGNS
Experimental studies need to control for potential
confounding factors that may threaten internal validity
of the experiment (Continued):
Selection?
Subjects in experimental & control groups different from the start

Statistical Regression (regression toward the mean)?


Subjects selected based on extreme pretest values

Experimental Mortality?
Differential drop-out of subjects from experimental and control groups
during the study

Etc.
Experimental designs mostly used in natural and physical sciences.
Generally, higher internal validity, lower external validity.

Types of experimental designs

A. Single-variable designs - involve only one
dependent variable.
1. Pre-experimental designs - do not
control/hold constant extraneous variables;
best avoided.
2. True experimental designs - higher control
through random assignment.
3. Quasi-experimental designs - use existing
(intact) groups; better than pre-experimental
but not as good as true experimental.
B. Factorial designs - involve two or more variables.

Experimental design studies
(X = treatment)

Pre-experimental design
Pretest-posttest without control group
O1 ----- X ------ O2

Quasi-experimental design
Pretest-posttest with control group
(intact groups: existing groups without
random assignment)
O1 ----- X1 ------ O2
O3 ----- X2 ------ O4

True experimental design
Pretest-posttest with control group
(allocation to groups by random assignment)
R   O1 ----- X1 ------ O2
R   O3 ----- X2 ------ O4

Quantitative Designs

Examples of non-experimental designs


Approximately 10% of Louisiana's public school students
do not finish high school.
The GPA of students participating in extra-curricular
activities is higher than that of students who do not
participate.
Student attitude is moderately related to achievement.
Several factors are related to the high dropout rate in
Louisiana. These include the student's age, academic
record, repetition of grade(s), gender, and ethnicity.
These studies are characterized by descriptions (dropout
rate, GPA differences, opinions) or relationships
(attitudes and achievement, factors related to dropping
out).

Quantitative Designs
Differentiating the four types of
non-experimental designs
1. Descriptive - Makes careful
descriptions of the current situation
or status of a variable(s) of interest
2. Comparative - Compares two or more
groups on some variable of interest
3. Correlational - Establishes a
relationship (i.e., non-causal)
between or among variables
4. Ex-post-facto (after the fact) - Explores
possible causes and effects among
variables that cannot be manipulated
by the researcher

Test and retest

Student name    Test 1    Test 2
Abu             76        78
Bakar           77        78
Chang           62        63
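One way to quantify the stability suggested by this table is a Pearson correlation between the two test administrations. A minimal sketch in Python using the three scores above, hand-rolled so it needs only the standard library:

```python
import math

# Test-retest scores from the table above
test1 = [76, 77, 62]   # Abu, Bakar, Chang
test2 = [78, 78, 63]

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(test1, test2)
print(f"test-retest r = {r:.3f}")  # close to +1: the two scores move together
```

With only three students this illustrates the computation, not a credible reliability estimate; real test-retest studies use far larger samples.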

Correlation
Relationship between the 2 test scores of the
same students

[Scatter plot: Test 1 on the x-axis, Test 2 on the y-axis]

Correlation
Magnitude
Direction

Correlation Coefficient (Pearson)

r ranges from -1 (perfect negative correlation) to +1 (perfect
positive correlation)

[Scatter plots illustrating r = 1.0, r = 1.0 (different slope), r = -0.8, r = 0]

Note: functions having different slopes may have
the same correlation coefficient

Correlation
Calculation
Test significance of the produced value
Significance level
Degrees of freedom
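The significance test mentioned above converts r into a t statistic with n - 2 degrees of freedom, which is then compared against a critical value for the chosen significance level. A sketch; the r = .60 and n = 27 values are invented for illustration:

```python
import math

def t_for_r(r, n):
    """Convert a correlation r from a sample of size n into a t statistic
    with df = n - 2; compare |t| against a t-table critical value."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Illustration: r = .60 obtained from n = 27 paired scores
t, df = t_for_r(0.60, 27)
print(f"t = {t:.2f}, df = {df}")
# t works out to 3.75; the critical t (two-tailed, alpha = .05, df = 25)
# is about 2.06, so r = .60 would be judged significant here.
```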

CORRELATIONAL DESIGNS
NON-EXPERIMENTAL/CORRELATIONAL DESIGNS
The design of choice in social sciences, since the phenomenon
under study is usually not reproducible in a laboratory setting.
The researcher has little or no control over the study's independent,
dependent, and numerous potential confounding variables.
Often the researcher concomitantly measures all the study
variables (e.g., independent, dependent, etc.),
then examines the following types of relationships:
correlations among variables or
differences among groups.
Inability to control for the effects of confounding variables makes
causal inferences regarding relationships among variables
more difficult and, thus:
Generally, higher external validity, lower internal validity.

CORRELATIONAL STUDIES
WHEN IS IT SAFER TO INFER CAUSAL
LINKAGES FROM STRONG CORRELATIONS?
Covariation Rule (X and Y must be
correlated)--Necessary but not sufficient condition.
Temporal Precedence Rule (If X is the cause, Y
should not occur until after X).
Internal Validity Rule (Alternative plausible
explanations of Y and X-Y relationships should be
ruled out (i.e., eliminate other possible causes).
In practice, this means exercising caution by
identifying potential confounding variables and
controlling for their effects).

Paired Samples Statistics

            Mean      N    Std. Deviation   Std. Error Mean
pretest     12.3667   30   2.38506          .43545
posttest    17.5333   30   1.85199          .33813

Paired Samples Statistics

            Mean      N    Std. Deviation   Std. Error Mean
pretest     12.3667   30   2.38506          .43545
posttest    17.5333   30   1.85199          .33813

Paired Samples Test

                       Paired Differences
                                                   95% Confidence Interval
                       Std.        Std. Error      of the Difference
             Mean      Deviation   Mean            Lower      Upper        t        df   Sig. (2-tailed)
pretest -
posttest     -5.16667  3.27038     .59709          -6.38785   -3.94549     -8.653   29   .000
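The SPSS output above comes from a dependent (paired) t-test, which can be reproduced by hand: take each student's posttest-minus-pretest difference, then divide the mean difference by its standard error. A minimal sketch with made-up marks for five students (not the n = 30 data behind the table):

```python
import math
from statistics import mean, stdev

# Hypothetical pretest/posttest marks for five students
pre  = [10, 12, 11, 14, 13]
post = [16, 18, 17, 19, 18]

diffs = [b - a for a, b in zip(pre, post)]   # posttest - pretest
n = len(diffs)
se = stdev(diffs) / math.sqrt(n)             # standard error of the mean difference
t = mean(diffs) / se
df = n - 1

print(f"mean difference = {mean(diffs):.2f}, t = {t:.2f}, df = {df}")
# A large |t| relative to the critical value for df = n - 1 would mirror
# the significant gain reported in the table above.
```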

pretest
MINIMUM MARKS: 8
MAXIMUM MARKS: 20
AVERAGE MARKS: 12
RANGE: 12

posttest
MINIMUM MARKS: 15
MAXIMUM MARKS: 20
AVERAGE MARKS: 18
RANGE: 5

Summary of the analysis
Based on the provided analysis,
the minimum mark scored by
students after the teacher-initiated
lesson on Order of Adjectives is
15, while the maximum is the full mark of
20. The average mark is 18, and
the difference between the maximum
and minimum marks is 5. After the
teacher conducted the lesson on Order
of Adjectives, with exposure using
various methods, most students
managed to score up to 18/20, and
the range between the lowest and
highest marks decreased from 8
(during the pretest) to 5, a
clear indication that the students
understood the lesson well.

The pretest scores had a mean of
12.37, SD = 2.38, and the
posttest scores a mean of
17.53, SD = 1.85.
Since p < .05, H0 is
rejected. This means there was a
significant increase in scores,
i.e., the teaching was successful.

Qualitative Designs
Much less precision in the definitions
of and distinctions between
qualitative designs in comparison to
quantitative designs
Four major categories of designs
1. Case study
2. Phenomenology
3. Ethnography
4. Grounded theory

Qualitative Designs
Case Study
An examination of a specific instance of a
phenomenon in its natural context viewed
from the perspective of the participants
This study explored the meaning of
inclusion for three disabled students
who had been placed in a regular
education setting.
This study examines in depth a
phenomenon of interest to the researcher
(i.e., the meaning of inclusion) in a
natural context, viewing it from the
participants' perspectives

Qualitative Designs
Phenomenology
A description of the meaning of an
experience
The purpose of this study was to examine
the meaning of being left out for an
adolescent
This study examines in depth the
experience of being left out from the
perspective of the adolescent experiencing
this phenomenon

Qualitative Designs
Ethnography
A description of the beliefs and practices of a
cultural or social group or system
The purpose of this study was to identify
and describe the conflicts that
experienced second-grade teachers
encountered as they switched from a
traditional approach to teaching
mathematics to a constructivist-sociological approach
This study examines the beliefs and
practices of second-grade teachers
experiencing a common phenomenon

Qualitative Designs
Grounded theory
A description of a conceptual understanding of
a particular phenomenon
The purpose of this study was to understand
the relationship of the bar to the teachers
who frequented it on Friday evenings. We
found that teachers used the bar to facilitate
their movement from professional to
personal self.
This study examined a phenomenon of
interest to the researcher (i.e., teachers
congregating at a particular bar on Friday
evenings) and developed a conceptual
understanding of it.

RESEARCH DESIGN
RESEARCH DESIGN: The blueprint/roadmap that will guide the
research.
The test for the quality of a study's research design is the
study's conclusion validity.
CONCLUSION VALIDITY refers to the extent of the researcher's
ability to draw accurate conclusions from the research. That is,
the degree of a study's:
a) Internal Validity -- correctness of conclusions regarding the
relationships among the variables examined
Whether the research findings accurately reflect how the research variables are
really connected to each other.
b) External Validity -- generalizability of the findings to the
intended/appropriate population/setting
Whether appropriate subjects were selected for conducting
the study

Classification of Variables
Independent Variable: a variable that is
manipulated, measured, or selected by the
researcher in order to observe its relation to the
subject's response on another variable. An
antecedent condition.
Dependent Variable: the variable that is
observed and measured in response to an
independent variable.
Intervening Variable: a hypothetical variable
that is not observed directly in the research
study, but is inferred from the relationship
between the independent and dependent
variables

Dependent variable - The values of the dependent variable depend upon
another variable, the independent variable.
Intervening variable - A variable that explains a relation or provides a
causal link between other variables. Also called by some authors a
mediating variable or intermediary variable. Example: The statistical
association between income and longevity needs to be explained, because
just having money does not make one live longer. Other variables
intervene between money and long life. People with high incomes tend to
have better medical care than those with low incomes. Medical care is an
intervening variable. It mediates the relation between income and
longevity.
Mediating variable - Synonym for intervening variable. Example: Parents
transmit their social status to their children directly, but they also do so
indirectly, through education: viz. parents' status -> child's education ->
child's status
Moderating variable - A variable that influences, or moderates, the relation
between two other variables and thus produces an interaction effect
Confounding variable - A variable that obscures the effects of another
variable. If one elementary reading teacher used a phonics textbook in her
class and another instructor used a whole language textbook in his class,
and students in the two classes were given achievement tests to see how
well they read, the independent variables (teacher effectiveness and
textbooks) would be confounded. There is no way to determine if
differences in reading achievement are due to the teacher or to the
textbook.

Scales
1. Nominal - male/female (least information)
2. Ordinal - 1 = dislike, 2 = like, 3 = like very much
3. Interval
4. Ratio (most information)

Four Types of Measurement Scales

(Figure 7.25)


Four Types of Measurement Scales (Table 7.2)

Measurement Scale   Characteristics
Nominal             Groups and labels data only; reports frequencies or percentages.
Ordinal             Ranks data; uses numbers only to indicate ranking.
Interval            Assumes that equal differences between scores really mean equal differences in the variable measured.
Ratio               All of the above, plus a true zero point.

Characteristics of the Types of Data

                         Level of Measurement
Characteristic           Nominal   Ordinal   Interval   Ratio
Distinctiveness          Yes       Yes       Yes        Yes
Ordering in magnitude    No        Yes       Yes        Yes
Equal intervals          No        No        Yes        Yes
Absolute zero            No        No        No         Yes

Copyright 2000 - 2009 by Michael J. Miller. All rights reserved.

A Nominal Scale of Measurement (Figure 7.26)


An Ordinal Scale: The Winner of a Horse Race


(Figure 7.27)

Descriptive Statistics
Methods used to obtain indices that
characterize or summarize data
collected
Focus is on the sample(s) at hand
Simple description of:
Individuals
Collection of individuals

Used as basis for inferential statistics


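The indices named above (frequencies, mean, standard deviation) can all be produced with Python's standard library. A small sketch with invented marks:

```python
from collections import Counter
from statistics import mean, stdev

marks = [12, 15, 15, 18, 20, 12, 15]   # hypothetical sample of marks

freq = Counter(marks)                   # frequency of each mark
print("frequencies:", dict(freq))
print(f"mean = {mean(marks):.2f}")
print(f"standard deviation = {stdev(marks):.2f}")
```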

Inferential Statistics
Methods that allow the researcher to
generalize the characteristics from a
set of sample data to a larger
population.
Concerned with:
Describing the population from the
sample
Testing differences between sample and
population, between two samples,
between two measures of the same
population.

Measurement error
Error variance--the extent of
variability in test scores that is
attributable to error rather than a
true measure of behavior.

Observed score  =  True score     +  Error variance
(actual score      (stable score)    (chance/random error +
obtained)                             systematic error)

Data collection tools

1. Tests (written and non-written)
2. Questionnaires
3. Checklists
4. Rating scales
5. Anecdotal records
6. Sociograms

Validity, Reliability, and Objectivity

VALIDITY
A valid instrument is one which measures what it
is supposed to measure.
The authors of this text say that this is an
old-fashioned definition and propose that a more
accurate definition of validity revolves around
the defensibility of the inferences researchers
make from the data collected through the use
of an instrument.
All researchers want instruments that permit
them to draw warranted conclusions about the
characteristics of the subjects they study.

Validity, Reliability, and Objectivity

A reliable instrument is one
which gives consistent
results.
Objectivity refers to the
absence of subjective
judgments. Complete
objectivity is probably never
attained.

Usability
There are a number of practical considerations
every researcher needs to think about.
How easy will it be to use any instruments that are
designed?
How long will it take to administer?
Are the directions clear?
Is it appropriate for the ethnic or other groups to whom it
will be administered?
How easy is it to score?
How easy is it to interpret the results?
How much does it cost?
Do equivalent forms exist?
Have any problems been reported by others who used it?
Does evidence of its reliability and validity exist?


Validity
The accuracy of the measure in
reflecting the concept it is supposed
to measure.


Reliability
Stability and consistency of the
measuring instrument.
A measure can be reliable without
being valid, but it cannot be valid
without being reliable.


Factors That Affect Reliability

1. Test length or number of items. The more
items, or the longer the test, the higher the
reliability.
2. Variability of ability of individuals in the
group. Reliability is higher for heterogeneous
groups than for homogeneous groups.
3. Ability of the students taking the test.
If the items are too difficult, students will
guess the answers, lowering the consistency
of the results.

Factors That Affect Reliability

4. Method or procedure used to estimate
reliability. Example - reliability from the
equivalent-forms method is usually lower than from
the test-retest or split-half procedures.
5. The variable being measured. Reliability is generally
higher when we measure knowledge or
skills than attitudes or values. For example,
when measuring academic achievement, results are usually
more consistent than for personality or attitudes.
6. Type of test. Reliability of objective tests is usually
higher than that of essay tests, owing to test length
and differences between examiners. A clear
marking scheme can help reduce
differences between examiners and thereby
increase the reliability of the test.
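Factor 1 above (a longer test is more reliable) is usually quantified with the Spearman-Brown prophecy formula, which predicts reliability when a test's length is multiplied by a factor k. A sketch; the r = .60 value is illustrative:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test's length is multiplied by k,
    given its current reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Illustration: doubling a test whose reliability is .60
print(spearman_brown(0.60, 2))
```

Doubling a test with reliability .60 predicts a reliability of about .75, consistent with the claim that longer tests are more reliable.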

1. Content Validity - The extent to which a test
represents the content/syllabus that has been
taught.
2. Face Validity - The extent to which a test
appears to measure a particular construct, as
perceived by the candidates taking the test.
3. Criterion Validity - The extent to which a test
relates to another test, whether administered
concurrently or later. Ability to predict,
e.g., the GRE.
4. Construct Validity - The extent to which a test
can measure a particular construct, e.g., the
teaching traits in the MEdSI. Example: a pupil who
is academically weak but helps the teacher repair
chairs - can that be measured?

Kinds of evidence for instrument validity

1. Content-related evidence of validity - content
and format of the instrument
2. Criterion-related evidence of validity -
relationship between scores obtained using the
instrument and scores obtained using one or
more other instruments or measures (often
called a criterion), i.e., how strong is the
relationship?
3. Construct-related evidence of validity - nature
of the psychological construct or characteristic
being measured by the instrument, i.e., how well
does this construct explain differences in the
behavior of individuals or their performance on
certain tasks?

Construct Validity - The extent to which a
test can measure a particular construct.
Content Validity - The extent to which a
test represents the content/syllabus that
has been taught.
Criterion Validity - The extent to which a
test relates to another test, whether
administered concurrently or later.
Face Validity - The extent to which a test
appears to measure a particular construct,
as perceived by the candidates taking the
test.

Face validity
Just on its face, the instrument
appears to be a good measure of the
concept; intuitive, arrived at
through inspection
e.g. Concept = pain level
Measure = verbal rating scale: rate your
pain from 1 to 10.
Face validity is sometimes considered a
subtype of content validity.
Question--is there any time when face validity is not
desirable?

Construct Validity
To determine the extent to which performance on a test
can be interpreted as important or meaningful for
measuring the quality we wish to measure. Useful for
variables that are conceptual and cannot be measured
easily, such as intelligence, anxiety, and personality.
In this method, we must first determine the
characteristics or indicators that reflect the quality
to be measured.
For example, to measure socio-economic status (SES),
indicators that can be used include education level;
income; type of occupation; number of dependents;
expenditure; property owned; and area of residence.

Construct Validity
To measure science reasoning and
problem-solving skills, indicators that
can be considered include:
the ability to explain the reasoning behind an idea
analyzing relationships using graphs,
charts, or tables
solving questions that have no obvious or
immediate solution steps
describing observations
arranging objects or events in sequence
and stating the reason.

Construct validity
Sensitivity of the instrument to
pick up minor variations in the
concept being measured.
Can an instrument to measure anxiety pick up
different levels of anxiety, or just its presence
or absence? Measure two groups known to
differ on the construct.
Ways of arriving at construct validity:
Hypothesis-testing method
Convergent and divergent validity
Multitrait-multimethod matrix
Contrasted-groups approach
Factor-analysis approach

Criterion Validity
To determine the extent to which performance
on the constructed test can predict future
performance (predictive criterion); or the
extent to which it relates to performance on a
past test whose validity has been established;
or to other tests taken at the same time
(concurrent criterion).
The way to do this is to compare performance
on the two tests and obtain the correlation
coefficient between them.

Criterion-related validity
The ability of a measure to measure
a criterion (usually set by the researcher).
If the criterion set for professionalism in nursing is
belonging to nursing organizations and reading
nursing journals, then couldn't we just count
memberships and subscriptions to come up with
a professionalism score?
Can you think of a simple criterion to measure leadership?

Concurrent and predictive validity are often listed as
forms of criterion-related validity.

Concurrent validity
Correspondence of one measure of a
phenomenon with another of the
same construct (administered at the same time).
Two tools are used to measure the same
concept, and then a correlational analysis
is performed. The tool which has already been
demonstrated to be valid is the gold
standard with which the other measure
must correlate.

Content validity
Content of the measure is justified
by other evidence, e.g. the
literature.
Entire range or universe of the
construct is measured.
Usually evaluated and scored by
experts in the content area.
A CVI (content validity index) of .80
or more is desirable.
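The CVI mentioned above is commonly computed at the item level as the proportion of expert raters who judge the item relevant (e.g., a rating of 3 or 4 on a 4-point relevance scale). A minimal sketch with invented ratings:

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Item-level content validity index: the share of experts who rate
    the item as relevant (3 or 4 on a 4-point relevance scale)."""
    return sum(r in relevant for r in ratings) / len(ratings)

# Five experts rate one item; four judge it relevant
print(item_cvi([4, 3, 4, 2, 4]))  # 0.8: just meets the .80 benchmark
```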

Predictive validity
The ability of one measure to predict
another future measure of the same
concept.
If IQ predicts SAT, and SAT predicts QPA, then shouldn't IQ predict
QPA (we could skip SATs for admission decisions)?
If scores on a parenthood readiness scale indicate levels of integrity,
trust, intimacy, and identity, couldn't this test be used to predict
successful achievement of the developmental tasks of adulthood?

The researcher is usually looking for a more efficient way to
measure a concept.


Reliability and Validity


Reliability
consistency of measurement across
time and judges

Validity
Extent to which scores on a test or
interview reflect true differences;
measures what it purports to measure

85

Reliability and Validity


(cont.)
Reliability does not indicate how useful the
measurement is
Validity is limited by how reliable the
measurement is (low reliability guarantees low
validity)
If measurements are reliable, they might be valid
If measurements are not reliable, they cannot be
valid
If measurements are valid, they must be reliable
86

Reliability
If a measurement has high reliability,
observed scores will be very close to
their true scores
Assessing reliability requires analyzing
the variability in a set of scores
Total variance in a set of scores = variance due to true scores + variance due to measurement error
87
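The variance decomposition above can be checked numerically. In this sketch the "true" scores and the errors are invented, with the errors chosen to sum to zero and be uncorrelated with the true scores, so the two variance components add exactly.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors (errors sum to zero and
# are uncorrelated with the true scores, so the variances add exactly).
true = [10, 12, 14, 16, 18]
error = [1, -1, 0, -1, 1]
observed = [t + e for t, e in zip(true, error)]

total_var = pvariance(observed)  # total variance in the observed scores
true_var = pvariance(true)       # variance due to true scores
error_var = pvariance(error)     # variance due to measurement error

print(total_var, true_var + error_var)    # the two sides of the identity
print(round(true_var / total_var, 3))     # reliability = true / total variance
```

Here reliability comes out around .909: about 91% of the observed-score variability reflects real differences, the rest is noise.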

Reliability (cont.)
Refers to the consistency of scores (not people!)
Important to know value(s) before you conduct
study
Correlation > .70 indicates acceptable reliability

Have other sources determined the reliability and validity of the instrument?
Which approach to reliability makes the most
sense?
Stability (test-retest)
Equivalence
Internal consistency

88

Assessing Reliability
Reliability = True-score variance / Total
variance
Usually done via correlation
coefficients which express the
strength of the relationship between
two variables
Values can range from -1.00 to +1.00
Correlation of .00 indicates no
relationship
The sign indicates whether the relationship is positive or negative.
89

Copyright 2000 - 2009 by Michael J. Miller. All rights reserved.

Reliability
Homogeneity, equivalence and
stability of a measure over time
and subjects. The instrument
yields the same results over
repeated measures and subjects.
Expressed as a correlation coefficient (the degree of agreement between times and subjects), 0 to +1.
The reliability coefficient expresses the relationship between error variance, true variance, and the observed score.
The higher the reliability coefficient, the lower the error variance and, hence, the higher the reliability.
90

Stability
The same results are obtained over
repeated administration of the
instrument.
Test-retest reliability
parallel, equivalent or alternate forms

91

Test-Retest reliability
The administration of the same
instrument to the same subjects two
or more times (under similar conditions--not
before and after treatment)

Scores are correlated and expressed as a Pearson r (usually .70 is acceptable).

92
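As a sketch of the computation, test-retest reliability is just the Pearson r between the two administrations. Pure standard-library Python; the score pairs are invented for illustration.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# The same (invented) instrument given to eight people twice, weeks apart.
time1 = [12, 15, 11, 18, 14, 16, 10, 17]
time2 = [13, 14, 12, 17, 15, 17, 11, 16]
print(round(pearson_r(time1, time2), 2))  # 0.95 -- well above the .70 rule of thumb
```

A coefficient this high would indicate good temporal stability, provided the attribute itself is not expected to change between administrations.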

Parallel or alternate forms


reliability

Parallel or alternate forms of a test


are administered to the same
individuals and scores are correlated.
This is desirable when the researcher
believes that repeated administration
will result in test-wiseness
Sample: I am able to tell my partner how I feel
My partner tries to understand my feelings

93

Homogeneity
Internal consistency (unidimensional)
Item-total correlations
split-half reliability
Kuder-Richardson coefficient
Cronbach's alpha

94

Item to total correlations


Each item on an instrument is correlated with the total score; an item with a low correlation may be deleted.
Highest and lowest correlations are
usually reported.
Only important if you desire
homogeneity of items.

95
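A sketch of the item-total check, with invented 1-5 ratings: each item column is correlated with the total score, and an item with a low (here negative) correlation is a candidate for deletion.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Rows = respondents, columns = items (invented 1-5 ratings).
scores = [
    [4, 5, 4, 2],
    [3, 4, 3, 5],
    [5, 5, 4, 1],
    [2, 3, 2, 4],
    [4, 4, 5, 2],
]
totals = [sum(row) for row in scores]  # note: totals include the item itself;
# a corrected version would correlate each item with the total of the others.
for j in range(len(scores[0])):
    item = [row[j] for row in scores]
    print(f"item {j + 1}: r = {pearson_r(item, totals):.2f}")
```

Item 4 comes out negative, which flags it for deletion (or reverse-scoring) if homogeneity of items is desired.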

Split-half reliability

Items are divided into two halves and then compared. Odd versus even items, or items 1-50 versus 51-100, are two ways to split items.
Only important when homogeneity and internal consistency are desirable.

96
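A sketch of an odd-even split with the Spearman-Brown correction, which adjusts the half-test correlation for the full test length; the response matrix is invented.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def split_half_reliability(scores):
    # Split items into odd and even halves, correlate the half-scores,
    # then apply the Spearman-Brown correction for full test length.
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)

# Rows = respondents, columns = six dichotomous items (invented data).
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 2))  # 0.88
```

The odd-even split avoids the difficulty-ordering problem of a first-half/second-half split on tests arranged from easy to hard.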

Kuder-Richardson coefficient
(KR-20)
Estimate of homogeneity when items
have a dichotomous response, e.g.
yes/no items.
Should be computed for a test on an
initial reliability testing, and
computed for the actual sample.
Based on the consistency of
responses to all of the items of a
single form of a test.
97
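The KR-20 formula is (k/(k-1)) (1 - Σ p_i q_i / σ²_total), where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 - p_i, and σ²_total the variance of total scores. A sketch with invented right/wrong data, using the population variance:

```python
from statistics import pvariance

def kr20(scores):
    """KR-20 internal consistency for dichotomous (0/1) items."""
    k = len(scores[0])           # number of items
    n = len(scores)              # number of respondents
    totals = [sum(row) for row in scores]
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion correct on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))

# Rows = respondents, columns = six yes/no items (invented data).
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(kr20(scores), 2))  # 0.79
```

As the slide suggests, the estimate should be recomputed for the actual study sample rather than reused from the initial tryout.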

Cronbach's alpha
Likert scale or linear graphic
response format.
Compares the consistency of
response of all items on the scale.
May need to be computed for each
sample.

98
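Cronbach's alpha generalizes KR-20 to multi-point (e.g. Likert) items: alpha = (k/(k-1)) (1 - Σ σ²_item / σ²_total). A sketch with invented 1-5 ratings:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])
    item_vars = sum(pvariance([row[j] for row in scores]) for j in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Rows = respondents, columns = four Likert items (invented 1-5 ratings).
scores = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(scores), 3))  # 0.933
```

An alpha this far above .70 suggests the four items are measuring a single underlying construct consistently for this (invented) sample.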

Equivalence
Consistency of agreement of
observers using the same measure
or among alternate forms of a tool.
Parallel or alternate forms (described
under stability)
Interrater reliability

99

Interrater reliability
Used with observational data.
Concordance between two or more observers' scores of the same event or phenomenon.

100

Stability
Test-retest reliability
Degree of temporal stability of a
measuring instrument or test
Assessed by having instrument
completed by same people during two
different time periods
Problems / issues
Familiarity with test items (practice effect)
Attribute being measured should not change
over time
101

Equivalence
Alternate (parallel) - forms reliability
Degree of relatedness of different forms
of the same test
Fixes familiarity problem

Interrater reliability
The consistency among two or more researchers / raters who observe and record participants' behavior

102

Internal Consistency
The degree of relatedness of individual
items measuring the same thing (i.e.,
factor / dimension)
How well items hang together
Interitem reliability
Assesses the degree of consistency among
the items on a scale (ideally exceeds .70)
Item-total correlation
Split-half reliability
Cronbach's alpha
103

Means of Classifying Data-Collection


Instruments

Who provides the information?


Researchers can get the information:
1. Themselves, with little or no involvement of other people.
2. Directly from the subjects of the study.
3. From others, frequently referred to as informants, who are knowledgeable about the subjects.
104

Means of Classifying Data-Collection


Instruments (contd)
Researcher Instruments
Tally sheet
Field notes

Subject Instruments
Spelling tests
Questionnaire
Daily log

Informant Instruments
Rating scale
Anecdotal records
Interview schedule

105

Where Did the Instrument Come


From?
There are essentially two basic ways for a researcher
to acquire an instrument:
Find and administer a previously existing instrument of
some sort
Administer an instrument the researcher personally developed or had developed by someone else. This is not easy to do: development of a good instrument takes a fair amount of time and effort, not to mention a considerable amount of skill. The authors of our text do not recommend it for those without a considerable amount of time, energy, and money to invest in the endeavor.

A number of already developed, useful, instruments


exist and can be found on-line

106

Written Response Versus


Performance
Written-response instruments: objective tests, short essay examinations, questionnaires, interview schedules, rating scales, and checklists
Performance instruments: any device designed to measure either a procedure or a product
Procedures are ways of doing things such as mixing a
chemical solution, diagnosing a problem in an automobile,
writing a letter, solving a puzzle, or setting the margins on
a typewriter.
Products are the end result of procedures, such as the correct chemical solution, the correct diagnosis of the automobile's problem, or a properly typed letter.

Written-response instruments are generally preferred


over performance instruments.
107

Examples of Data-Collection
Instruments
Researcher completes

Rating Scales
Interview Schedules
Tally Sheets
Flowcharts
Performance Checklists
Anecdotal Records
Time-and-motion logs

Subject Completes

Questionnaires
Self-checklists
Attitude Scales
Personality Inventories
Achievement/aptitude
Tests
Performance Tests
Projective Devices
Sociometric Devices

108

Excerpt from a Behavior Rating Scale for


Teachers
(Figure 7.5)

Instructions: For each of the behaviors listed below, circle the


appropriate number, using the following key:
5 = Excellent, 4 = Above Average, 3 = Average,
2 = Below Average, 1 = Poor.
A. Explains course material clearly.
1
2
3
4
5
B. Establishes rapport with students.
1
2
3
4
5
C. Asks high-level questions.
1
2
3
4
5
D. Varies class activities.
1
2
3
4
5

109

Excerpt from a Graphic Rating Scale


(Figure 7.6)

Instructions: Indicate the quality of the student's participation in the following class activities by placing an X anywhere along each line.
1. Listens to teacher's instructions.
Always    Frequently    Occasionally    Seldom    Never

2. Listens to the opinions of other students.
Always    Frequently    Occasionally    Seldom    Never

3. Offers own opinions in class discussions.
Always    Frequently    Occasionally    Seldom    Never

110

Participation Flowchart (Figure 7.10)

111

Item Formats

Selection Items

True/false
Multiple choice
Matching
Interpretive

Supply Items
Short-answer items which
require the respondent to
supply a word, phrase,
number or symbol
Essay questions

Unobtrusive measures: data-collection procedures that involve no intrusion into the naturally occurring course of events. Usually no instrument is required; only some form of record keeping.

112

Scale Development
Scales = the approach used to
measure concepts (constructs).
Two options:
1. Use published scales.
2. Develop original scales.

113

MEASUREMENT SCALES
Types of Scales

Metric (interval & ratio)


Likert-type
Summated-Ratings (Likert)
Numerical
Semantic Differential
Graphic-Ratings
Nonmetric (nominal & ordinal)
Categorical
Constant Sum Method
Paired Comparisons
Rank Order
Sorting

114

MEASUREMENT SCALES Metric

Examples of Likert-Type Scales:


When I hear about a new restaurant, I eat there to see what it is like.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

When I hear about a new restaurant, I eat there to see what it is like.
Strongly Disagree (1)   (2)   (3)   (4)   Strongly Agree (5)

115

MEASUREMENT SCALES Metric

Summated Ratings Scales:


A scaling technique in which respondents are asked to indicate their degree of agreement or disagreement with each of a number of statements. A subject's attitude score (summated rating) is the total obtained by summing over the items in the scale and dividing by the number of items to get the average.
Example:
My sales representative is . . . .
                SD    D     N     A     SA
Courteous      ___   ___   ___   ___   ___
Friendly       ___   ___   ___   ___   ___
Helpful        ___   ___   ___   ___   ___
Knowledgeable  ___   ___   ___   ___   ___

116

MEASUREMENT SCALES Metric

Alternative Approach to Summated Ratings scales:


When I hear about a new restaurant, I eat there to see what it is like.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

I always eat at new restaurants when someone tells me they are good.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

This approach includes a separate labeled Likert scale with each item (statement). The summated rating is the total of the responses for all the items divided by the number of items.
117
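As the slides describe, the summated rating is the total of the item responses divided by the number of items, with negatively worded items reverse-scored first. A minimal sketch; the responses are invented.

```python
def reverse_score(response, points=5):
    # Flip a negatively worded item on a 1..points scale (e.g. 2 -> 4 on 5 points).
    return points + 1 - response

def summated_rating(responses):
    # Total of the item responses divided by the number of items (the average).
    return sum(responses) / len(responses)

# Four 1-5 agreement ratings; suppose the last statement is negatively worded.
raw = [4, 5, 3, 2]
responses = raw[:3] + [reverse_score(raw[3])]
print(summated_rating(responses))  # (4 + 5 + 3 + 4) / 4 = 4.0
```

Dividing by the number of items keeps the score on the original 1-5 metric, so scales with different item counts remain comparable.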

MEASUREMENT SCALES Metric

Numerical Scales:
Example:
Using a 10-point scale, where 1 is "not at all important" and 10 is "very important," how important is ______ in your decision to do business with a particular vendor?
Note: fill in the blank with an attribute, such as reliable delivery, product quality, complaint resolution, and so forth.

118

MEASUREMENT SCALES Metric

Semantic Differential Scales:


A scaling technique in which respondents are asked to
check which space between a set of bipolar adjectives or
phrases best describes their feelings toward the stimulus
object.
Example:
My sales representative is . . . .
Courteous  ___ ___ ___ ___ ___  Discourteous
Friendly   ___ ___ ___ ___ ___  Unfriendly
Helpful    ___ ___ ___ ___ ___  Unhelpful
Honest     ___ ___ ___ ___ ___  Dishonest

119

MEASUREMENT SCALES Metric

Graphic-Ratings Scales:
A scaling technique in which respondents are asked to indicate their
ratings of an attribute by placing a check at the appropriate point
on a line that runs from one extreme of the attribute to the other.
Please evaluate each attribute in terms of how important the attribute is to you personally (your company) by placing an X at the position on the horizontal line that most reflects your feelings.
Not Important                                Very Important
Courteousness  _____________________________________
Friendliness   _____________________________________
Helpfulness    _____________________________________
Knowledgeable  _____________________________________

120

MEASUREMENT SCALES Nonmetric

Categorical scale:
Categorical scales are nominally measured opinion
scales that have two or more response categories.
How satisfied are you with your current job?
[ ] Very Satisfied
[ ] Somewhat Satisfied
[ ] Neither Satisfied nor Dissatisfied
[ ] Somewhat Dissatisfied
[ ] Very Dissatisfied

Note: Some researchers consider this a metric scale when coded 1-5.


121

MEASUREMENT SCALES Nonmetric

Constant-Sum Method:
A scaling technique in which respondents are asked to divide
some given sum among two or more attributes on the basis of
their importance to them.
Please divide 100 points among the following attributes in terms of the relative importance of each attribute to you.
Courteous Service      ____
Friendly Service       ____
Helpful Service        ____
Knowledgeable Service  ____
Total                   100
122

MEASUREMENT SCALES Nonmetric

Paired Comparison Method:


A scaling technique in which respondents are given
pairs of stimulus objects and asked which object in a
pair they prefer most.
Please circle the attribute describing a sales
representative which you consider most desirable.
Courteous versus
Knowledgeable
Friendly
versus
Helpful
Helpful
versus
Courteous

123

MEASUREMENT SCALES Nonmetric

Sorting:
A scaling technique in which respondents are
asked to indicate their beliefs or opinions by
arranging objects (items) on the basis of
perceived importance, similarity, preference or
some other attribute.

124

MEASUREMENT SCALES Nonmetric

Rank Order Method:


A scaling technique in which respondents are presented
with several stimulus objects simultaneously and asked
to order or rank them with respect to a specific
characteristic.
Please rank the following attributes on how important each is to
you in relation to a sales representative. Place a 1 beside the
attribute which is most important, a 2 next to the attribute that
is second in importance, and so on.
Courteous Service      ___
Friendly Service       ___
Helpful Service        ___
Knowledgeable Service  ___
125

Scale Development
Practical Decisions When Developing
Scales:

Number of items (indicators) to measure a concept?
Number of scale categories?
Odd or even number of categories? (Include a neutral point?)
Balanced or unbalanced scales?
Forced or non-forced choice? (Include "Don't Know"?)
Category labels for scales?
Scale reliability and validity?

126

Scale Development
Balanced vs. Unbalanced Scales?
Balanced:
To what extent do you consider TV shows with sex and
violence to be acceptable for teenagers to view?

__ Very Acceptable
__ Somewhat Acceptable
__ Neither Acceptable nor Unacceptable
__ Somewhat Unacceptable
__ Very Unacceptable
Unbalanced:
__ Very Acceptable
__ Somewhat Acceptable
__ Unacceptable

127

Scale Development

Forced or Non-Forced?
How likely are you to purchase a laptop PC in the next six months?
Very Unlikely  1   2   3   4   5  Very Likely
__ No Opinion

128

Scale Development
Category Labels for Scales?

Verbal label:
How important is the size of the hard drive in selecting a laptop PC to purchase?
Very Unimportant (1)   Somewhat Unimportant (2)   Neither Important nor Unimportant (3)   Somewhat Important (4)   Very Important (5)

Numerical label:
How likely are you to purchase a laptop PC in the next six months?
Very Unlikely  1   2   3   4   5  Very Likely

Unlabeled:
How important is the weight of the laptop PC in deciding which brand to purchase?
Very Unimportant  ___   ___   ___   ___   ___  Very Important
129

MEASUREMENT SCALES

Choosing a Measurement
Scale:

Capabilities of Respondents.
Context of Scale Application.
Data Analysis Approach.
Validity and Reliability.

130

MEASUREMENT SCALES

Assessing Measurement Scales:

Validity

Reliability

Measurement error: occurs when the values obtained in a survey (observed values) are not the same as the true values (population values).

131

RESEARCH DESIGN

Types of Errors:

Nonresponse = problem definition, refusal, sampling, etc.
Response = respondent or interviewer.
Data Collection Instrument:
Construct Development.
Scaling Measurement.
Questionnaire Design/Sequence, etc.
Data Analysis.
Interpretation.

132

Types of Scores
Raw scores: not easily interpreted, since they have little meaning on their own
Derived scores: scores which have been derived from raw scores into more useful scores on some type of standardized basis
Age and grade-level equivalents
Percentile ranks: a percentile is the point below which a certain percentage of scores fall. The 99th percentile is the point below which 99 percent of the scores fall.
Standard scores
Z-scores
T-scores
133

Examples of Raw Scores and Percentile Ranks (Table 7.1)

Raw Score | Frequency | Cumulative Frequency | Percentile Rank
   95     |     1     |          25          |      100
   93     |     1     |          24          |       96
   88     |     2     |          23          |       92
   85     |     3     |          21          |       84
   79     |     1     |          18          |       72
   75     |     4     |          17          |       68
   70     |     6     |          13          |       52
   65     |     2     |           7          |       28
   62     |     1     |           5          |       20
   58     |     1     |           4          |       16
   54     |     2     |           3          |       12
   50     |     1     |           1          |        4

N = 25

134
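The percentile-rank column in Table 7.1 follows the cumulative-frequency convention: rank = 100 × (number of scores at or below the score) / N. A sketch that reproduces the table's values from the raw data:

```python
def percentile_ranks(raw_scores):
    # Percentile rank of each distinct score: the percentage of scores at or
    # below it (the at-or-below convention that matches Table 7.1).
    n = len(raw_scores)
    return {s: 100 * sum(x <= s for x in raw_scores) / n
            for s in sorted(set(raw_scores), reverse=True)}

# The 25 raw scores behind Table 7.1 (each score repeated by its frequency).
data = ([95, 93] + [88] * 2 + [85] * 3 + [79] + [75] * 4 + [70] * 6
        + [65] * 2 + [62, 58] + [54] * 2 + [50])
ranks = percentile_ranks(data)
print(ranks[95], ranks[70], ranks[50])  # 100.0 52.0 4.0
```

Note the convention: the table counts scores at or below each value, which is why the top score of 95 gets a rank of 100 rather than 96.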

Norm-Referenced Versus Criterion-Referenced Instruments

Norm-Referenced Instruments
All derived scores give meaning to individual scores by
comparing them to the scores of a group. This means that
the nature of the group is extremely important. The group
used to determine the derived scores is called the norm
group and instruments that provide such scores are
referred to as norm-referenced instruments.
Examples:
A student
Scored at the 50th percentile in his group
Scored above 90 percent of all the students in the class
Received a higher grade point average in English literature than
any other student in the school.
Ran faster than all but one other student on the team
And one other student in the class were the only ones to receive A's on the midterm
135

Norm-Referenced Versus Criterion-Referenced Instruments


(contd)

Criterion-referenced instruments: this is usually a test which focuses on instruction. It is based on a specific goal, or target (called a criterion), for each learner to achieve. The criterion for mastery is usually stated as a fairly high percentage of questions to be answered correctly.
Examples
A student

Spelled every word in the weekly spelling list correctly


Solved at least 75 percent of the assigned problems
Achieved a score of at least 80 out of 100 on the final exam
Did at least 25 push-ups within a five-minute period
Read a minimum of one nonfiction book a week
136

Norm-Referenced Versus Criterion-Referenced Instruments


(contd)

While a criterion-referenced test may be more useful at times and in certain circumstances than the more customary norm-referenced test, it is often inferior for research purposes. This is mostly because a criterion-referenced test will provide much less variability of scores, because it is easier. Whereas the usual norm-referenced test will provide a range of scores somewhat less than the possible range, a criterion-referenced test, if it is true to its rationale, will have most of the students getting a high score. Because in research we usually want maximum variability in order to have any hope of finding relationships with other variables, the use of criterion-referenced tests is often self-defeating.
137

Measurement Scales
Reconsidered
There are two reasons why you should have at least a
rudimentary understanding of the differences among
these four types of scales.

They convey different amounts of information. If possible,


researchers should use the type of measurement scale that will
provide them with the maximum amount of information needed
to answer the research question being investigated.
Some types of statistical procedures are inappropriate for the
different scales. The way in which the data in a research study
are organized dictates the use of certain types of statistical
analyses.
Often researchers must decide whether to consider data ordinal or interval level. It is possible to analyze the data both ways, provided researchers are prepared to defend the assumptions underlying each of these measurement scales.

138

Preparing Data for Analysis

Scoring: the data must be scored accurately and consistently.
If a commercially purchased instrument is used, scoring
procedures are made much easier. Usually a scoring
manual will be provided by the instrument developer.
The scoring of a self-developed test can produce difficulties,
and researchers should carefully prepare their scoring
plans, in writing, ahead of time, and try out their instrument
by administering and scoring it with a pilot group similar to
their population.
After scoring, data should be entered into a summary sheet.
Usually some sort of spreadsheet software is used.

139

Hypothetical Results of a Comparison of Two Counseling Methods (Table 7.3)

Score for Rapport | Method A | Method B
      96-100      |    0     |    0
      91-95       |    0     |    2
      86-90       |    0     |    3
      81-85       |    2     |    3
      76-80       |    2     |    4
      71-75       |    5     |    3
      66-70       |    6     |    4
      61-65       |    9     |    4
      56-60       |    4     |    5
      51-55       |    5     |    3
      46-50       |    2     |    2
      41-45       |    0     |    0
      36-40       |    0     |    1

N = 35            35

140

Test quality
Validity: the instrument measures what it claims to measure
Content validity (JPK / JPU, the test specification table)
Face validity: influences the way pupils answer
Construct-related validity, e.g. the construct of motivation
141

Factors affecting validity
1. Unclear instructions
2. Vague vocabulary and sentence structure
3. Insufficient time allowed
4. Unsuitable difficulty level
5. Insufficient number of items
6. Poor item ordering
7. Predictable answer pattern (ABCD)
8. Poorly constructed items
142

Factors Affecting Reliability

1. Test length, or number of items. The more items (or the longer the test), the higher the reliability.
2. Variability of ability within the group. Reliability is higher for a heterogeneous group than for a homogeneous one.
3. Ability of the students taking the test. If the items are too difficult, students will guess, making the results less consistent.
143

Factors Affecting Reliability

4. The method or procedure used to estimate reliability. For example, reliability from the equivalent-forms method is usually lower than from the test-retest or split-half procedures.
5. The variable being measured. Reliability is generally higher when measuring knowledge or skills than attitudes or values; for example, results for academic achievement are usually more consistent than for personality or attitude.
6. The type of test. Reliability of objective tests is usually higher than that of essay tests, owing to test length and to differences among markers. A clear marking scheme helps reduce differences among markers and so improves test reliability.

144

1. Content validity: the extent to which a test represents the content/syllabus that has been taught.
2. Face validity: the extent to which a test appears to measure a given construct, as perceived by the candidates sitting the test.
3. Criterion validity: the extent to which a test is related to another test, administered either at the same time or later. Predictive power, e.g. the GRE.
4. Construct validity: the extent to which a test measures a particular construct, e.g. the teaching traits in the MEdSI test. E.g., a pupil who is weak academically helps the teacher repair a chair: can that be counted?
145

General considerations in planning and constructing a test

1. Know the subject content well.
2. Know and understand the students to be tested.
3. Be skilled.
4. Be creative.
5. Ensure test validity and reliability.
146

General considerations in planning and constructing a test

i. Know the subject content well: the teacher must have a good command of the content taught. This is important to ensure the teacher can determine the scope of the content to be tested and the students' ability to understand the topics taught.
ii. Know and understand the students to be tested: the planned test must take into account the students' background and ability, so that the teacher can match the test content, format, and items to the students' level.
iii. Be skilled: writing test items requires skill and a good command of language in order to produce a quality test.
147

General considerations in planning and constructing a test

iv. Be creative: writing test items also requires creativity, to produce items that are suitable and engaging. Using various media, diagrams, symbols, pictures, and other forms of stimulus makes the items more varied and able to measure a range of skill levels.
v. Test validity and reliability: how far the test measures what it is supposed to measure is a question of test validity. The teacher must ensure that the content scope tested consists of knowledge and skills that have been taught and that are important for students to know. This involves content validity, an important aspect of test preparation. In addition, the consistency of the scores the test produces must be checked to ensure test reliability.
148

Basic Test Construction Process

1. Determine the purpose of the test
2. Prepare the test specification table
3. Write the items
4. Review the questions
5. Item/question analysis
6. Select quality questions
7. Arrange the questions
8. Print the questions
149

Basic Test Construction Process

1. Determine the purpose of the test: before a test is constructed, the teacher must first determine why it is being given. Is it for formative, summative, placement, or diagnostic purposes?
2. Prepare the test specification table: determine the coverage of the test, i.e. the content to be tested, and the skill levels or types of behaviour expected.
3. Write the items: determine the behaviour to be measured by referring to the instructional objectives, and decide which item types are suitable.
4. Review the questions: have them reviewed by colleagues or a committee to improve aspects such as the idea tested, the skill tested, the item format, the question stem, the sentence construction, the structure of the answer options, and the answer key.
150

Basic Test Construction Process

5. Item/question analysis: to determine the percentage of students who answer each item correctly, the effectiveness of the distractors, the discriminating power of the questions, and how well the questions match the learning objectives.
6. Select quality questions: select questions that satisfy the specification table (JPU) laid down, based on the item analysis.
7. Arrange the questions: the selected questions are arranged by item type, to avoid confusion, help students maintain a mental set, and ease the teacher's marking. Questions are also ordered by difficulty, so that the mental activity develops from simple to complex, building confidence and motivation; correct answers are arranged in a random pattern.
8. Print the questions: print quality matters, and aspects such as paper quality, spacing between questions, the use of diagrams, and ink need attention.
151

152

Analytical Designs
Descriptions of historical, legal, or policy
issues through an analysis of documents, oral
histories, and relics
Two basic approaches
Concept analysis the study of educational concepts
(e.g., co-operative learning, leadership, etc.) to
describe the different meanings and the uses of the
concept
Historical analysis the systematic collection and
criticism of documents that describe past events of
relevance to education

153

Analytical Designs

An example of a concept analysis


The purpose of this study is to
examine the meanings and uses of
the term standards-based curriculum.
This study examined the varied
meanings, interpretations, and uses
of an important curricular concept.

154

Analytical Designs
An example of an historical
analysis
The purpose of this study is to
examine the changes in standardized
testing over the last 40 years.
This study addresses the historical
developments characterizing the use
of standardized tests over a 40 year
period.
155

Mixed Method Designs


The use of quantitative and
qualitative designs and methods
within a single study
Allows the researcher to better
match the approach to gathering
and analyzing data to the research
questions
Relative emphasis given to any
particular method varies widely
156

Action Research Design


Systematic investigation
Emphasis on teachers, counselors,
and administrators
Brings together characteristics of
systematic inquiry and practice

157
