
RESEARCH DESIGN

Four Functions of Research


1. Basic: research designed to test or refine theory
2. Applied: research conducted in a field of
common practice and concerned with the
application and development of research based
knowledge
3. Action: research designed to solve a specific
classroom or school problem, improve practice,
or make a decision at a single local site
4. Evaluation: research designed to assess the
merit and worth of a specific practice in terms
of the values operating at a site

Differences in conducting conventional research
& action research

Conventional research
1. Read the literature
2. Identify the problem
3. Research objectives
4. Research questions
5. Collect data
6. Analyze data
7. Conclusion
8. Emphasis on methodology, especially research
design, instrumentation, validity &
reliability

Action research
1. Identify the problem (with evidence from
preliminary survey data)
2. Read the literature
3. Plan the solution
4. Implement
5. Collect data
6. Analyze data
7. Reflection
8. Repeat in a second cycle (if necessary)
9. Emphasis on solving the problem without
generalizing to other settings

PROCESS OF DESIGNING AND CONDUCTING A RESEARCH PROJECT:

What -- What was studied?
What about -- What aspects of the subject were studied?
What for -- What is/was the significance of the study?
What did prior lit./research say?
What was done -- How was the study conducted?
What was found?
So what?
What now?

These questions correspond to the sections of a research report:

1. Introduction, Research Problems/Objectives, & Justification
2. Literature Review
3. Methodology (research sample, data collection, measurement, data analysis)
4. Results & Discussion
5. Implications
6. Conclusions and Recommendations for Future Research

Research as Scientific
Inquiry
Scientific inquiry is the search for
knowledge using recognized methods in
data collection, analysis, and
interpretation
The purpose of scientific inquiry is to
develop knowledge
1. Describe phenomena
2. Examine empirical relationships
between or among phenomena
3. Test whether such relationships are
causal in nature

Educational Research
Lack of a single, appropriate
methodological approach to study
education
Two major approaches
Quantitative
Qualitative

Goals
Quantitative: tests theory, establishes facts, shows
relationships, predicts, or statistically describes
Qualitative: develops grounded theory, develops
understanding, describes multiple realities, captures
naturally occurring behavior

Research design
Quantitative: highly structured, formal, and specific
Qualitative: unstructured, flexible, evolving

Participants
Quantitative: many participants representative of the
groups from which they were chosen using probabilistic
sampling techniques
Qualitative: few participants chosen using non-probabilistic
sampling techniques for specific characteristics of
interest to the researchers

Researchers' role
Quantitative: detached, objective observers of events
Qualitative: participant observers reporting participants'
perspectives, understood only after developing long-term,
close, trusting relationships with participants

Context
Quantitative: manipulated and controlled settings
Qualitative: naturalistic settings

Data, data collection, and data analysis
Quantitative: numerical data collected at specific times
from tests or surveys and analyzed statistically
Qualitative: narrative data collected over a long period
of time from observations and interviews and analyzed
using interpretive techniques

CHAPTER 3: METHODOLOGY

3.1 Introduction
3.2 Research Design
Experimental
Non-experimental
Controlling extraneous variables (internal & external validity)

3.3 Population and Sample
Type, size, and distribution of the population
Sample size and selection method appropriate to the research design
Sampling method (aligned with the research questions)

3.4 Research Instruments
Types of instruments
Source of instruments; permission for use & translation
Validity and reliability
Administration of instruments

CHAPTER 3: METHODOLOGY

3.5 Research Procedure
Steps in managing / administering the research

3.6 Data Collection
Data collection procedures

3.7 Data Analysis
Descriptive & inferential statistics
Statistical software used
Types of data analysis
Significance level of statistical tests

3.8 Pilot Study
Testing the validity and reliability of the instruments
Sample drawn from the population but not involved in the actual study

3.9 Summary

CHAPTER 4: ANALYSIS AND FINDINGS

4.1 Introduction
4.2 Data Management
Coding
Types of data
Input into the statistical software
Managing the data
Creating variables; Transform: compute, recode, etc.
Identifying outliers & statistical assumptions
Reliability analysis

4.3 Descriptive Analysis
Frequencies, crosstabs
Mean & standard deviation
Correlation

4.4 Inferential Analysis
t-tests: dependent / independent, one- / two-tailed
One-way or factorial ANOVA
MANOVA
ANCOVA (covariate)
Regression / multiple regression
Chi-square / contingency table / crosstab

4.5 Summary

Educational Research
Differentiating characteristics
Goals
Quantitative: tests theory, establishes facts, shows
relationships, predicts, or statistically describes
Qualitative: develops grounded theory, develops
understanding, describes multiple realities, captures
naturally occurring behavior

Research design
Quantitative: highly structured, formal, and specific
Qualitative: unstructured, flexible, evolving


Educational Research
Differentiating characteristics
Participants
Quantitative: many participants representative of the
groups from which they were chosen using probabilistic
sampling techniques
Qualitative: few participants chosen using non-probabilistic
sampling techniques for specific characteristics of interest
to the researchers

Data, data collection, and data analysis


Quantitative: numerical data collected at specific times
from tests or surveys and analyzed statistically
Qualitative: narrative data collected over a long period of
time from observations and interviews and analyzed using
interpretive techniques


Educational Research
Differentiating characteristics
Researchers' role
Quantitative: detached, objective observers of events
Qualitative: participant observers reporting participants'
perspectives understood only after developing long-term,
close, trusting relationships with participants

Context
Quantitative: manipulated and controlled settings
Qualitative: naturalistic settings


Major Types of Research


Studies
Experimental: A type of research
used to establish cause-and-effect
relationships by manipulating
variables/treatments
Observational/Correlational: A
type of research that measures two
or more variables and looks to see
how the variables are related to each
other.

Classes of Research Design

1. Pre-experimental
2. Experimental
3. Quasi-experimental
4. Ex Post Facto

Pre-Experimental
Designs:
No Control Group and/or
Randomization

1. One-shot case study


2. One-group pretest-posttest design
3. Intact-group comparison


True Experimental Designs:


Control Group & Randomization

1. Posttest-only control-group
design
2. Pretest-posttest control-group design
3. Factorial experimental design

Quasi-Experimental Designs:
Control Group But No Randomization

1. Non-equivalent control group


design
2. Time-series designs
3. Others


Ex-Post Facto Designs:


Researcher Arrives After Treatment Is Given

Correlational designs
-- Simple predictive
-- Causal modeling
Criterion-group designs


Types of Research Designs

Research Designs
  Quantitative
    Experimental
      True
      Quasi
      Single Subject
    Non-Experimental
      Descriptive
      Comparative
      Correlational
      Causal Comparative
  Qualitative
    Case Study
    Phenomenology
    Ethnography
    Grounded Theory
    Analytical Study
      Concept Analysis
      Historical Analysis
  Mixed Method

Copyright Allyn & Bacon 2008

Quantitative Designs
Two major categories
Experimental
The investigation of causal effects through
direct manipulation of an independent
variable and control of extraneous variables

Non-experimental
The investigation of the current state of a
variable or the relationships, other than
causal, between variables


Quantitative Designs
An example of an experimental design
Randomly assign students to one of two
classrooms in which the same social studies
unit is being taught. Teach the first class using
the traditional lecture approach, the second
class using co-operative learning groups.
Examine the achievement differences between
the two groups to see if the type of approach
to instruction had an effect.
This study is characterized by the investigation
of cause (instructional approach) and effect
(achievement), and by manipulation (the choice of
instructional approach assigned to each group).

Quantitative Designs
Differentiating the four types of experimental
designs
1. True experimental - Random assignment of
subjects to groups
2. Quasi-experimental - Non-random assignment of
subjects to groups
3. Pre-experimental - One-shot case study,
pretest-posttest designs with no control group
4. Single subject - Non-random selection of a single
subject

EXPERIMENTAL DESIGNS
One of the simplest experimental designs is the ONE-GROUP PRETEST-POSTTEST DESIGN--EXAMPLE?
One way to examine efficacy of a drug:

O1                      X                  O2
Measure                 DRUG               Measure
patient's condition     Experimental       patient's condition
(Pretest)               condition/         (Posttest)
                        intervention

RESULT: Significant improvement from O1 to O2
(i.e., sig. O2 - O1 difference)
QUESTION: Did X (the drug) cause the
improvement?

EXPERIMENTAL DESIGNS

CONTROL GROUP simulates absence of X

Origin of using control groups (a tale from ancient Egypt)

Pretest Post-Test Control Group Design--Suppose random
assignment (R) was used to control confounding variables:

R   Exp. Group    O1E    X    O2E
R   Ctrl Group    O1C         O2C

RESULT: O2E > O1E & O2C Not > O1C
QUESTION: Did X cause the improvement in Exp.
Group?

EXPERIMENTAL DESIGNS
NOT NECESSARILY! Why not?
Power of suggestibility (the Hawthorne Effect)
CONCLUSION?
Need proper form of control, e.g., a placebo:

R   Exp. Group    O1E    X          O2E
R   Ctrl Group    O1C    Placebo    O2C

QUESTION: Can we now conclude X caused the improvement
in Exp. Group?
Maybe, but be aware of the Experimenter Effect (it tends to
prejudice the results, especially in medical research).
SOLUTION: Double-blind experiments (neither the subjects
nor the experimenter knows who is getting the placebo/drug).

EXPERIMENTAL DESIGNS
Experimental studies need to control for potential
confounding factors that may threaten internal validity
of the experiment:
Hawthorne Effect is only one potential confounding factor
in experimental studies.
Other such factors are:
History?
Biasing events that occur between pretest and post-test

Maturation?
Physical/biological/psychological changes in the subjects

Testing?
Exposure to pretest influences scores on post-test

Instrumentation?
Flaws in measurement instrument/procedure

EXPERIMENTAL DESIGNS
Experimental studies need to control for potential
confounding factors that may threaten internal validity
of the experiment (Continued):
Selection?
Subjects in experimental & control groups different from the start

Statistical Regression (regression toward the mean)?


Subjects selected based on extreme pretest values

Experimental Mortality?
Differential drop-out of subjects from experimental and control groups
during the study

Etc.
Experimental designs mostly used in natural and physical sciences.
Generally, higher internal validity, lower external validity.

Types of experimental designs

A. Single-variable designs - involve only one
dependent variable.
1. Pre-experimental designs - do not
control/hold constant extraneous variables;
best avoided.
2. True experimental designs - higher control
through random assignment.
3. Quasi-experimental designs - use existing
(intact) groups; better than pre-experimental
but not as good as true experimental.
B. Factorial designs - involve two or more variables.

Experimental design studies
(X = treatment)

Pre-experimental design
Pretest-posttest without control group
O1 ----- X ------ O2

Quasi-experimental design
Pretest-posttest with control group
(intact groups: existing groups without
random assignment)
O1 ----- X1 ------ O2
O3 ----- X2 ------ O4

True experimental design
Pretest-posttest with control group
(allocation to groups by random assignment)
R   O1 ----- X1 ------ O2
R   O3 ----- X2 ------ O4

Quantitative Designs

Examples of non-experimental designs


Approximately 10% of Louisiana's public school students
do not finish high school.
The GPA of students participating in extra-curricular
activities is higher than that of students who do not
participate.
Student attitude is moderately related to achievement.
Several factors are related to the high dropout rate in
Louisiana. These include the student's age, academic
record, repetition of grade(s), gender, and ethnicity.
These studies are characterized by descriptions (dropout
rate, GPA differences, opinions) or relationships
(attitudes and achievement, factors related to dropping
out).

Quantitative Designs
Differentiating the four types of
non-experimental designs
1. Descriptive - Makes careful
descriptions of the current situation
or status of a variable(s) of interest
2. Comparative - Compares two or more
groups on some variable of interest
3. Correlational - Establishes a
relationship (i.e., non-causal)
between or among variables
4. Ex-post-facto (after the fact) - Explores
possible causes and effects among
variables that cannot be manipulated
by the researcher

Test and retest

Student name    Test 1    Test 2
Abu             76        78
Bakar           77        78
Chang           62        63
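One way to quantify the stability suggested by this table is a Pearson correlation between the two test administrations. A minimal sketch in Python using the three scores above, hand-rolled so it needs only the standard library:

```python
import math

# Test-retest scores from the table above
test1 = [76, 77, 62]   # Abu, Bakar, Chang
test2 = [78, 78, 63]

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(test1, test2)
print(f"test-retest r = {r:.3f}")  # close to +1: the two scores move together
```

With only three students this illustrates the computation, not a credible reliability estimate; real test-retest studies use far larger samples.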

Correlation
Relationship between the 2 test scores of the
same students

[Scatter plot: Test 1 on the x-axis, Test 2 on the y-axis]

Correlation
Magnitude
Direction

Correlation Coefficient (Pearson)

r ranges from -1 (perfect negative correlation) to +1 (perfect
positive correlation)

[Scatter plots illustrating r = 1.0, r = 1.0 (different slope), r = -0.8, r = 0]

Note: functions having different slopes may have
the same correlation coefficient

Correlation
Calculation
Test significance of the produced value
Significance level
Degrees of freedom
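The significance test mentioned above converts r into a t statistic with n - 2 degrees of freedom, which is then compared against a critical value for the chosen significance level. A sketch; the r = .60 and n = 27 values are invented for illustration:

```python
import math

def t_for_r(r, n):
    """Convert a correlation r from a sample of size n into a t statistic
    with df = n - 2; compare |t| against a t-table critical value."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Illustration: r = .60 obtained from n = 27 paired scores
t, df = t_for_r(0.60, 27)
print(f"t = {t:.2f}, df = {df}")
# t works out to 3.75; the critical t (two-tailed, alpha = .05, df = 25)
# is about 2.06, so r = .60 would be judged significant here.
```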

CORRELATIONAL DESIGNS
NON-EXPERIMENTAL/CORRELATIONAL DESIGNS
The design of choice in social sciences, since the phenomenon
under study is usually not reproducible in a laboratory setting.
The researcher has little or no control over the study's independent,
dependent, and numerous potential confounding variables.
Often the researcher concomitantly measures all the study
variables (e.g., independent, dependent, etc.),
then examines the following types of relationships:
correlations among variables or
differences among groups.
Inability to control for the effects of confounding variables makes
causal inferences regarding relationships among variables
more difficult and, thus:
Generally, higher external validity, lower internal validity.

CORRELATIONAL STUDIES
WHEN IS IT SAFER TO INFER CAUSAL
LINKAGES FROM STRONG CORRELATIONS?
Covariation Rule (X and Y must be
correlated)--Necessary but not sufficient condition.
Temporal Precedence Rule (If X is the cause, Y
should not occur until after X).
Internal Validity Rule (Alternative plausible
explanations of Y and X-Y relationships should be
ruled out (i.e., eliminate other possible causes).
In practice, this means exercising caution by
identifying potential confounding variables and
controlling for their effects).

Paired Samples Statistics

            Mean      N    Std. Deviation   Std. Error Mean
pretest     12.3667   30   2.38506          .43545
posttest    17.5333   30   1.85199          .33813

Paired Samples Statistics

            Mean      N    Std. Deviation   Std. Error Mean
pretest     12.3667   30   2.38506          .43545
posttest    17.5333   30   1.85199          .33813

Paired Samples Test

                       Paired Differences
                                                   95% Confidence Interval
                       Std.        Std. Error      of the Difference
             Mean      Deviation   Mean            Lower      Upper        t        df   Sig. (2-tailed)
pretest -
posttest     -5.16667  3.27038     .59709          -6.38785   -3.94549     -8.653   29   .000
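The SPSS output above comes from a dependent (paired) t-test, which can be reproduced by hand: take each student's posttest-minus-pretest difference, then divide the mean difference by its standard error. A minimal sketch with made-up marks for five students (not the n = 30 data behind the table):

```python
import math
from statistics import mean, stdev

# Hypothetical pretest/posttest marks for five students
pre  = [10, 12, 11, 14, 13]
post = [16, 18, 17, 19, 18]

diffs = [b - a for a, b in zip(pre, post)]   # posttest - pretest
n = len(diffs)
se = stdev(diffs) / math.sqrt(n)             # standard error of the mean difference
t = mean(diffs) / se
df = n - 1

print(f"mean difference = {mean(diffs):.2f}, t = {t:.2f}, df = {df}")
# A large |t| relative to the critical value for df = n - 1 would mirror
# the significant gain reported in the table above.
```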

pretest
MINIMUM MARKS: 8
MAXIMUM MARKS: 20
AVERAGE MARKS: 12
RANGE: 12

posttest
MINIMUM MARKS: 15
MAXIMUM MARKS: 20
AVERAGE MARKS: 18
RANGE: 5

Summary of the analysis
Based on the provided analysis,
the minimum mark scored by
students after the teacher-initiated
lesson on Order of Adjectives is
15, while the maximum is the full mark of
20. The average mark is 18, and
the difference between the maximum
and minimum marks is 5. After the
teacher conducted the lesson on Order
of Adjectives, with exposure using
various methods, most students
managed to score up to 18/20, and
the range between the lowest and
highest marks decreased from 8
(during the pretest) to 5, a
clear indication that the students
understood the lesson well.

The pretest scores had a mean of
12.37, SD = 2.38, and the
posttest scores a mean of
17.53, SD = 1.85.
Since p < .05, H0 is
rejected. This means there was a
significant increase in scores,
i.e., the teaching was successful.

Qualitative Designs
Much less precision in the definitions
of and distinctions between
qualitative designs in comparison to
quantitative designs
Four major categories of designs
1. Case study
2. Phenomenology
3. Ethnography
4. Grounded theory

Qualitative Designs
Case Study
An examination of a specific instance of a
phenomenon in its natural context viewed
from the perspective of the participants
This study explored the meaning of
inclusion for three disabled students
who had been placed in a regular
education setting.
This study examines in depth a
phenomenon of interest to the researcher
(i.e., the meaning of inclusion) in a
natural context, viewing it from the
participants' perspectives

Qualitative Designs
Phenomenology
A description of the meaning of an
experience
The purpose of this study was to examine
the meaning of being left out for an
adolescent
This study examines in depth the
experience of being left out from the
perspective of the adolescent experiencing
this phenomenon

Qualitative Designs
Ethnography
A description of the beliefs and practices of a
cultural or social group or system
The purpose of this study was to identify
and describe the conflicts that
experienced second-grade teachers
encountered as they switched from a
traditional approach to teaching
mathematics to a constructivist-sociological approach
This study examines the beliefs and
practices of second-grade teachers
experiencing a common phenomenon

Qualitative Designs
Grounded theory
A description of a conceptual understanding of
a particular phenomenon
The purpose of this study was to understand
the relationship of the bar to the teachers
who frequented it on Friday evenings. We
found that teachers used the bar to facilitate
their movement from professional to
personal self.
This study examined a phenomenon of
interest to the researcher (i.e., teachers
congregating at a particular bar on Friday
evenings) and developed a conceptual
understanding of it.

RESEARCH DESIGN
RESEARCH DESIGN: The blueprint/roadmap that will guide the
research.
The test for the quality of a study's research design is the
study's conclusion validity.
CONCLUSION VALIDITY refers to the extent of the researcher's
ability to draw accurate conclusions from the research. That is,
the degree of a study's:
a) Internal Validity -- correctness of conclusions regarding the
relationships among the variables examined
Whether the research findings accurately reflect how the research variables are
really connected to each other.
b) External Validity -- generalizability of the findings to the
intended/appropriate population/setting
Whether appropriate subjects were selected for conducting
the study

Classification of Variables
Independent Variable: a variable that is
manipulated, measured, or selected by the
researcher in order to observe its relation to the
subject's response on another variable. An
antecedent condition.
Dependent Variable: the variable that is
observed and measured in response to an
independent variable.
Intervening Variable: a hypothetical variable
that is not observed directly in the research
study, but is inferred from the relationship
between the independent and dependent
variables

Dependent variable - The values of the dependent variable depend upon
another variable, the independent variable.
Intervening variable - A variable that explains a relation or provides a
causal link between other variables. Also called by some authors a
mediating variable or intermediary variable. Example: The statistical
association between income and longevity needs to be explained, because
just having money does not make one live longer. Other variables
intervene between money and long life. People with high incomes tend to
have better medical care than those with low incomes. Medical care is an
intervening variable. It mediates the relation between income and
longevity.
Mediating variable - Synonym for intervening variable. Example: Parents
transmit their social status to their children directly, but they also do so
indirectly, through education: viz. parents' status -> child's education ->
child's status
Moderating variable - A variable that influences, or moderates, the relation
between two other variables and thus produces an interaction effect
Confounding variable - A variable that obscures the effects of another
variable. If one elementary reading teacher used a phonics textbook in her
class and another instructor used a whole language textbook in his class,
and students in the two classes were given achievement tests to see how
well they read, the independent variables (teacher effectiveness and
textbooks) would be confounded. There is no way to determine if
differences in reading achievement are due to the teacher or to the
textbook.

Scales
1. Nominal - male/female (least information)
2. Ordinal - 1 = dislike, 2 = like, 3 = like very much
3. Interval
4. Ratio (most information)

Four Types of Measurement Scales

(Figure 7.25)


Four Types of Measurement Scales (Table 7.2)

Measurement Scale   Characteristics
Nominal             Groups and labels data only; reports frequencies or percentages.
Ordinal             Ranks data; uses numbers only to indicate ranking.
Interval            Assumes that equal differences between scores really mean equal differences in the variable measured.
Ratio               All of the above, plus a true zero point.

Characteristics of the Types of Data

                         Level of Measurement
Characteristic           Nominal   Ordinal   Interval   Ratio
Distinctiveness          Yes       Yes       Yes        Yes
Ordering in magnitude    No        Yes       Yes        Yes
Equal intervals          No        No        Yes        Yes
Absolute zero            No        No        No         Yes

Copyright 2000 - 2009 by Michael J. Miller. All rights reserved.

A Nominal Scale of Measurement (Figure 7.26)


An Ordinal Scale: The Winner of a Horse Race


(Figure 7.27)

Descriptive Statistics
Methods used to obtain indices that
characterize or summarize data
collected
Focus is on the sample(s) at hand
Simple description of:
Individuals
Collection of individuals

Used as basis for inferential statistics


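The indices named above (frequencies, mean, standard deviation) can all be produced with Python's standard library. A small sketch with invented marks:

```python
from collections import Counter
from statistics import mean, stdev

marks = [12, 15, 15, 18, 20, 12, 15]   # hypothetical sample of marks

freq = Counter(marks)                   # frequency of each mark
print("frequencies:", dict(freq))
print(f"mean = {mean(marks):.2f}")
print(f"standard deviation = {stdev(marks):.2f}")
```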

Inferential Statistics
Methods that allow the researcher to
generalize the characteristics from a
set of sample data to a larger
population.
Concerned with:
Describing the population from the
sample
Testing differences between sample and
population, between two samples,
between two measures of the same
population.

Measurement error
Error variance--the extent of
variability in test scores that is
attributable to error rather than a
true measure of behavior.

Observed score  =  True score     +  Error variance
(actual score      (stable score)    (chance/random error +
obtained)                             systematic error)

Data collection tools

1. Tests (written and non-written)
2. Questionnaires
3. Checklists
4. Rating scales
5. Anecdotal records
6. Sociograms

Validity, Reliability, and Objectivity

VALIDITY
A valid instrument is one which measures what it
is supposed to measure.
The authors of this text say that this is an
old-fashioned definition and propose that a more
accurate definition of validity revolves around
the defensibility of the inferences researchers
make from the data collected through the use
of an instrument.
All researchers want instruments that permit
them to draw warranted conclusions about the
characteristics of the subjects they study.

Validity, Reliability, and Objectivity

A reliable instrument is one
which gives consistent
results.
Objectivity refers to the
absence of subjective
judgments. Complete
objectivity is probably never
attained.

Usability
There are a number of practical considerations
every researcher needs to think about.
How easy will it be to use any instruments that are
designed?
How long will it take to administer?
Are the directions clear?
Is it appropriate for the ethnic or other groups to whom it
will be administered?
How easy is it to score?
How easy is it to interpret the results?
How much does it cost?
Do equivalent forms exist?
Have any problems been reported by others who used it?
Does evidence of its reliability and validity exist?


Validity
The accuracy of the measure in
reflecting the concept it is supposed
to measure.


Reliability
Stability and consistency of the
measuring instrument.
A measure can be reliable without
being valid, but it cannot be valid
without being reliable.


Factors That Affect Reliability

1. Test length or number of items. The more
items, or the longer the test, the higher the
reliability.
2. Variability of ability of individuals in the
group. Reliability is higher for heterogeneous
groups than for homogeneous groups.
3. Ability of the students taking the test.
If the items are too difficult, students will
guess the answers, lowering the consistency
of the results.

Factors That Affect Reliability

4. Method or procedure used to estimate
reliability. Example - reliability from the
equivalent-forms method is usually lower than from
the test-retest or split-half procedures.
5. The variable being measured. Reliability is generally
higher when we measure knowledge or
skills than attitudes or values. For example,
when measuring academic achievement, results are usually
more consistent than for personality or attitudes.
6. Type of test. Reliability of objective tests is usually
higher than that of essay tests, owing to test length
and differences between examiners. A clear
marking scheme can help reduce
differences between examiners and thereby
increase the reliability of the test.
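Factor 1 above (a longer test is more reliable) is usually quantified with the Spearman-Brown prophecy formula, which predicts reliability when a test's length is multiplied by a factor k. A sketch; the r = .60 value is illustrative:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test's length is multiplied by k,
    given its current reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Illustration: doubling a test whose reliability is .60
print(spearman_brown(0.60, 2))
```

Doubling a test with reliability .60 predicts a reliability of about .75, consistent with the claim that longer tests are more reliable.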

1. Content Validity - The extent to which a test
represents the content/syllabus that has been
taught.
2. Face Validity - The extent to which a test
appears to measure a particular construct, as
perceived by the candidates taking the test.
3. Criterion Validity - The extent to which a test
relates to another test, whether administered
concurrently or later. Ability to predict,
e.g., the GRE.
4. Construct Validity - The extent to which a test
can measure a particular construct, e.g., the
teaching traits in the MEdSI. Example: a pupil who
is academically weak but helps the teacher repair
chairs - can that be measured?

Kinds of evidence for instrument validity

1. Content-related evidence of validity - content
and format of the instrument
2. Criterion-related evidence of validity -
relationship between scores obtained using the
instrument and scores obtained using one or
more other instruments or measures (often
called a criterion), i.e., how strong is the
relationship?
3. Construct-related evidence of validity - nature
of the psychological construct or characteristic
being measured by the instrument, i.e., how well
does this construct explain differences in the
behavior of individuals or their performance on
certain tasks?

Construct Validity - The extent to which a
test can measure a particular construct.
Content Validity - The extent to which a
test represents the content/syllabus that
has been taught.
Criterion Validity - The extent to which a
test relates to another test, whether
administered concurrently or later.
Face Validity - The extent to which a test
appears to measure a particular construct,
as perceived by the candidates taking the
test.

Face validity
Just on its face, the instrument
appears to be a good measure of the
concept; intuitive, arrived at
through inspection
e.g. Concept = pain level
Measure = verbal rating scale: rate your
pain from 1 to 10.
Face validity is sometimes considered a
subtype of content validity.
Question--is there any time when face validity is not
desirable?

Construct Validity
To determine the extent to which performance on a test
can be interpreted as important or meaningful for
measuring the quality we wish to measure. Useful for
variables that are conceptual and cannot be measured
easily, such as intelligence, anxiety, and personality.
In this method, we must first determine the
characteristics or indicators that reflect the quality
to be measured.
For example, to measure socio-economic status (SES),
indicators that can be used include education level;
income; type of occupation; number of dependents;
expenditure; property owned; and area of residence.

Construct Validity
To measure science reasoning and
problem-solving skills, indicators that
can be considered include:
the ability to explain the reasoning behind an idea
analyzing relationships using graphs,
charts, or tables
solving questions that have no obvious or
immediate solution steps
describing observations
arranging objects or events in sequence
and stating the reason.

Construct validity
Sensitivity of the instrument to
pick up minor variations in the
concept being measured.
Can an instrument to measure anxiety pick up
different levels of anxiety, or just its presence
or absence? Measure two groups known to
differ on the construct.
Ways of arriving at construct validity:
Hypothesis-testing method
Convergent and divergent validity
Multitrait-multimethod matrix
Contrasted-groups approach
Factor-analysis approach

Criterion Validity
To determine the extent to which performance
on the constructed test can predict future
performance (predictive criterion); or the
extent to which it relates to performance on a
past test whose validity has been established;
or to other tests taken at the same time
(concurrent criterion).
The way to do this is to compare performance
on the two tests and obtain the correlation
coefficient between them.

Criterion-related validity
The ability of a measure to measure
a criterion (usually set by the researcher).
If the criterion set for professionalism in nursing is
belonging to nursing organizations and reading
nursing journals, then couldn't we just count
memberships and subscriptions to come up with
a professionalism score?
Can you think of a simple criterion to measure leadership?

Concurrent and predictive validity are often listed as
forms of criterion-related validity.

Concurrent validity
Correspondence of one measure of a
phenomenon with another of the
same construct (administered at the same time).
Two tools are used to measure the same
concept, and then a correlational analysis
is performed. The tool which has already been
demonstrated to be valid is the gold
standard with which the other measure
must correlate.

Content validity
Content of the measure is justified
by other evidence, e.g. the
literature.
Entire range or universe of the
construct is measured.
Usually evaluated and scored by
experts in the content area.
A CVI (content validity index) of .80
or more is desirable.
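The CVI mentioned above is commonly computed at the item level as the proportion of expert raters who judge the item relevant (e.g., a rating of 3 or 4 on a 4-point relevance scale). A minimal sketch with invented ratings:

```python
def item_cvi(ratings, relevant=(3, 4)):
    """Item-level content validity index: the share of experts who rate
    the item as relevant (3 or 4 on a 4-point relevance scale)."""
    return sum(r in relevant for r in ratings) / len(ratings)

# Five experts rate one item; four judge it relevant
print(item_cvi([4, 3, 4, 2, 4]))  # 0.8: just meets the .80 benchmark
```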

Predictive validity
The ability of one measure to predict
another future measure of the same
concept.
If IQ predicts SAT, and SAT predicts QPA, then shouldn't IQ predict
QPA (we could skip SATs for admission decisions)?
If scores on a parenthood readiness scale indicate levels of integrity,
trust, intimacy, and identity, couldn't this test be used to predict
successful achievement of the developmental tasks of adulthood?

The researcher is usually looking for a more efficient way to
measure a concept.


Reliability and Validity


Reliability
consistency of measurement across
time and judges

Validity
Extent to which scores on a test or
interview reflect true differences;
measures what it purports to measure

85

Reliability and Validity


(cont.)
Reliability does not indicate how useful the
measurement is
Validity is limited by how reliable the
measurement is (low reliability guarantees low
validity)
If measurements are reliable, they might be valid
If measurements are not reliable, they cannot be
valid
If measurements are valid, they must be reliable
86

Reliability
If a measurement has high reliability,
observed scores will be very close to
their true scores
Assessing reliability requires analyzing
the variability in a set of scores
Total variance in a set of scores = variance due to true scores + variance due to measurement error
87
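The variance decomposition above can be checked numerically. In this sketch the "true" scores and the errors are invented, with the errors chosen to sum to zero and be uncorrelated with the true scores, so the two variance components add exactly.

```python
from statistics import pvariance

# Hypothetical true scores and measurement errors (errors sum to zero and
# are uncorrelated with the true scores, so the variances add exactly).
true = [10, 12, 14, 16, 18]
error = [1, -1, 0, -1, 1]
observed = [t + e for t, e in zip(true, error)]

total_var = pvariance(observed)  # total variance in the observed scores
true_var = pvariance(true)       # variance due to true scores
error_var = pvariance(error)     # variance due to measurement error

print(total_var, true_var + error_var)    # the two sides of the identity
print(round(true_var / total_var, 3))     # reliability = true / total variance
```

Here reliability comes out around .909: about 91% of the observed-score variability reflects real differences, the rest is noise.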

Reliability (cont.)
Refers to the consistency of scores (not people!)
Important to know value(s) before you conduct
study
Correlation > .70 indicates acceptable reliability

Have other sources determined the reliability and validity of the instrument?
Which approach to reliability makes the most
sense?
Stability (test-retest)
Equivalence
Internal consistency

88

Assessing Reliability
Reliability = True-score variance / Total
variance
Usually done via correlation
coefficients which express the
strength of the relationship between
two variables
Values can range from -1.00 to +1.00
Correlation of .00 indicates no
relationship
The sign indicates whether the relationship is positive or negative.
89

Copyright 2000 - 2009 by Michael J. Miller. All rights reserved.

Reliability
Homogeneity, equivalence and
stability of a measure over time
and subjects. The instrument
yields the same results over
repeated measures and subjects.
Expressed as a correlation coefficient (the degree of agreement between times and subjects), 0 to +1.
The reliability coefficient expresses the relationship between error variance, true variance, and the observed score.
The higher the reliability coefficient, the lower the error variance and, hence, the higher the reliability.
90

Stability
The same results are obtained over
repeated administration of the
instrument.
Test-retest reliability
parallel, equivalent or alternate forms

91

Test-Retest reliability
The administration of the same
instrument to the same subjects two
or more times (under similar conditions--not
before and after treatment)

Scores are correlated and expressed as a Pearson r (usually .70 is acceptable).

92
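As a sketch of the computation, test-retest reliability is just the Pearson r between the two administrations. Pure standard-library Python; the score pairs are invented for illustration.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# The same (invented) instrument given to eight people twice, weeks apart.
time1 = [12, 15, 11, 18, 14, 16, 10, 17]
time2 = [13, 14, 12, 17, 15, 17, 11, 16]
print(round(pearson_r(time1, time2), 2))  # 0.95 -- well above the .70 rule of thumb
```

A coefficient this high would indicate good temporal stability, provided the attribute itself is not expected to change between administrations.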

Parallel or alternate forms


reliability

Parallel or alternate forms of a test


are administered to the same
individuals and scores are correlated.
This is desirable when the researcher
believes that repeated administration
will result in test-wiseness
Sample: I am able to tell my partner how I feel
My partner tries to understand my feelings

93

Homogeneity
Internal consistency (unidimensional)
Item-total correlations
split-half reliability
Kuder-Richardson coefficient
Cronbach's alpha

94

Item to total correlations


Each item on an instrument is correlated with the total score; an item with a low correlation may be deleted.
Highest and lowest correlations are
usually reported.
Only important if you desire
homogeneity of items.

95
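A sketch of the item-total check, with invented 1-5 ratings: each item column is correlated with the total score, and an item with a low (here negative) correlation is a candidate for deletion.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Rows = respondents, columns = items (invented 1-5 ratings).
scores = [
    [4, 5, 4, 2],
    [3, 4, 3, 5],
    [5, 5, 4, 1],
    [2, 3, 2, 4],
    [4, 4, 5, 2],
]
totals = [sum(row) for row in scores]  # note: totals include the item itself;
# a corrected version would correlate each item with the total of the others.
for j in range(len(scores[0])):
    item = [row[j] for row in scores]
    print(f"item {j + 1}: r = {pearson_r(item, totals):.2f}")
```

Item 4 comes out negative, which flags it for deletion (or reverse-scoring) if homogeneity of items is desired.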

Split-half reliability

Items are divided into two halves and then compared. Odd versus even items, or items 1-50 versus 51-100, are two ways to split items.
Only important when homogeneity and internal consistency are desirable.

96
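A sketch of an odd-even split with the Spearman-Brown correction, which adjusts the half-test correlation for the full test length; the response matrix is invented.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def split_half_reliability(scores):
    # Split items into odd and even halves, correlate the half-scores,
    # then apply the Spearman-Brown correction for full test length.
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)

# Rows = respondents, columns = six dichotomous items (invented data).
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 2))  # 0.88
```

The odd-even split avoids the difficulty-ordering problem of a first-half/second-half split on tests arranged from easy to hard.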

Kuder-Richardson coefficient
(KR-20)
Estimate of homogeneity when items
have a dichotomous response, e.g.
yes/no items.
Should be computed for a test on an
initial reliability testing, and
computed for the actual sample.
Based on the consistency of
responses to all of the items of a
single form of a test.
97
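The KR-20 formula is (k/(k-1)) (1 - Σ p_i q_i / σ²_total), where k is the number of items, p_i the proportion answering item i correctly, q_i = 1 - p_i, and σ²_total the variance of total scores. A sketch with invented right/wrong data, using the population variance:

```python
from statistics import pvariance

def kr20(scores):
    """KR-20 internal consistency for dichotomous (0/1) items."""
    k = len(scores[0])           # number of items
    n = len(scores)              # number of respondents
    totals = [sum(row) for row in scores]
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion correct on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))

# Rows = respondents, columns = six yes/no items (invented data).
scores = [
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(kr20(scores), 2))  # 0.79
```

As the slide suggests, the estimate should be recomputed for the actual study sample rather than reused from the initial tryout.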

Cronbach's alpha
Likert scale or linear graphic
response format.
Compares the consistency of
response of all items on the scale.
May need to be computed for each
sample.

98
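Cronbach's alpha generalizes KR-20 to multi-point (e.g. Likert) items: alpha = (k/(k-1)) (1 - Σ σ²_item / σ²_total). A sketch with invented 1-5 ratings:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = len(scores[0])
    item_vars = sum(pvariance([row[j] for row in scores]) for j in range(k))
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Rows = respondents, columns = four Likert items (invented 1-5 ratings).
scores = [
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
print(round(cronbach_alpha(scores), 3))  # 0.933
```

An alpha this far above .70 suggests the four items are measuring a single underlying construct consistently for this (invented) sample.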

Equivalence
Consistency of agreement of
observers using the same measure
or among alternate forms of a tool.
Parallel or alternate forms (described
under stability)
Interrater reliability

99

Interrater reliability
Used with observational data.
Concordance between two or more observers' scores of the same event or phenomenon.

100

Stability
Test-retest reliability
Degree of temporal stability of a
measuring instrument or test
Assessed by having instrument
completed by same people during two
different time periods
Problems / issues
Familiarity with test items (practice effect)
Attribute being measured should not change
over time
101

Equivalence
Alternate (parallel) - forms reliability
Degree of relatedness of different forms
of the same test
Fixes familiarity problem

Interrater reliability
The consistency among two or more researchers / raters who observe and record participants' behavior

102

Internal Consistency
The degree of relatedness of individual
items measuring the same thing (i.e.,
factor / dimension)
How well items hang together
Interitem reliability
Assesses the degree of consistency among
the items on a scale (ideally exceeds .70)
Item-total correlation
Split-half reliability
Cronbach's alpha
103

Means of Classifying Data-Collection


Instruments

Who provides the information?


Researchers can get the information:
1. Themselves, with little or no involvement of other people.
2. Directly from the subjects of the study.
3. From others, frequently referred to as informants, who are knowledgeable about the subjects.
104

Means of Classifying Data-Collection


Instruments (contd)
Researcher Instruments
Tally sheet
Field notes

Subject Instruments
Spelling tests
Questionnaire
Daily log

Informant Instruments
Rating scale
Anecdotal records
Interview schedule

105

Where Did the Instrument Come


From?
There are essentially two basic ways for a researcher
to acquire an instrument:
Find and administer a previously existing instrument of
some sort
Administer an instrument the researcher personally developed or had developed by someone else. This is not easy to do: development of a good instrument takes a fair amount of time and effort, not to mention a considerable amount of skill. The authors of our text do not recommend it for those without a considerable amount of time, energy, and money to invest in the endeavor.

A number of already developed, useful, instruments


exist and can be found on-line

106

Written Response Versus


Performance
Written-response instruments: objective tests, short essay examinations, questionnaires, interview schedules, rating scales, and checklists
Performance instruments: any device designed to measure either a procedure or a product
Procedures are ways of doing things such as mixing a
chemical solution, diagnosing a problem in an automobile,
writing a letter, solving a puzzle, or setting the margins on
a typewriter.
Products are the end result of procedures, such as the correct chemical solution, the correct diagnosis of the automobile's problem, or a properly typed letter.

Written-response instruments are generally preferred


over performance instruments.
107

Examples of Data-Collection
Instruments
Researcher completes

Rating Scales
Interview Schedules
Tally Sheets
Flowcharts
Performance Checklists
Anecdotal Records
Time-and-motion logs

Subject Completes

Questionnaires
Self-checklists
Attitude Scales
Personality Inventories
Achievement/aptitude
Tests
Performance Tests
Projective Devices
Sociometric Devices

108

Excerpt from a Behavior Rating Scale for


Teachers
(Figure 7.5)

Instructions: For each of the behaviors listed below, circle the


appropriate number, using the following key:
5 = Excellent, 4 = Above Average, 3 = Average,
2 = Below Average, 1 = Poor.
A. Explains course material clearly.
1
2
3
4
5
B. Establishes rapport with students.
1
2
3
4
5
C. Asks high-level questions.
1
2
3
4
5
D. Varies class activities.
1
2
3
4
5

109

Excerpt from a Graphic Rating Scale


(Figure 7.6)

Instructions: Indicate the quality of the student's participation in the following class activities by placing an X anywhere along each line.
1. Listens to teacher's instructions.
Always    Frequently    Occasionally    Seldom    Never

2. Listens to the opinions of other students.
Always    Frequently    Occasionally    Seldom    Never

3. Offers own opinions in class discussions.
Always    Frequently    Occasionally    Seldom    Never

110

Participation Flowchart (Figure 7.10)

111

Item Formats

Selection Items

True/false
Multiple choice
Matching
Interpretive

Supply Items
Short-answer items which
require the respondent to
supply a word, phrase,
number or symbol
Essay questions

Unobtrusive measures: data-collection procedures that involve no intrusion into the naturally occurring course of events. Usually no instrument is required; only some form of record keeping.

112

Scale Development
Scales = the approach used to
measure concepts (constructs).
Two options:
1. Use published scales.
2. Develop original scales.

113

MEASUREMENT SCALES
Types of Scales

Metric (interval & ratio)


Likert-type
Summated-Ratings (Likert)
Numerical
Semantic Differential
Graphic-Ratings
Nonmetric (nominal & ordinal)
Categorical
Constant Sum Method
Paired Comparisons
Rank Order
Sorting

114

MEASUREMENT SCALES Metric

Examples of Likert-Type Scales:


When I hear about a new restaurant, I eat there to see what it is like.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

When I hear about a new restaurant, I eat there to see what it is like.
Strongly Disagree (1)   (2)   (3)   (4)   Strongly Agree (5)

115

MEASUREMENT SCALES Metric

Summated Ratings Scales:


A scaling technique in which respondents are asked to indicate their degree of agreement or disagreement with each of a number of statements. A subject's attitude score (summated rating) is the total obtained by summing over the items in the scale and dividing by the number of items to get the average.
Example:
My sales representative is . . . .
                SD    D     N     A     SA
Courteous      ___   ___   ___   ___   ___
Friendly       ___   ___   ___   ___   ___
Helpful        ___   ___   ___   ___   ___
Knowledgeable  ___   ___   ___   ___   ___

116

MEASUREMENT SCALES Metric

Alternative Approach to Summated Ratings scales:


When I hear about a new restaurant, I eat there to see what it is like.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

I always eat at new restaurants when someone tells me they are good.
Strongly Agree (1)   Agree Somewhat (2)   Neither Agree nor Disagree (3)   Disagree Somewhat (4)   Strongly Disagree (5)

This approach includes a separate labeled Likert scale with each item (statement). The summated rating is the total of the responses for all the items divided by the number of items.
117
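As the slides describe, the summated rating is the total of the item responses divided by the number of items, with negatively worded items reverse-scored first. A minimal sketch; the responses are invented.

```python
def reverse_score(response, points=5):
    # Flip a negatively worded item on a 1..points scale (e.g. 2 -> 4 on 5 points).
    return points + 1 - response

def summated_rating(responses):
    # Total of the item responses divided by the number of items (the average).
    return sum(responses) / len(responses)

# Four 1-5 agreement ratings; suppose the last statement is negatively worded.
raw = [4, 5, 3, 2]
responses = raw[:3] + [reverse_score(raw[3])]
print(summated_rating(responses))  # (4 + 5 + 3 + 4) / 4 = 4.0
```

Dividing by the number of items keeps the score on the original 1-5 metric, so scales with different item counts remain comparable.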

MEASUREMENT SCALES Metric

Numerical Scales:
Example:
Using a 10-point scale, where 1 is "not at all important" and 10 is "very important," how important is ______ in your decision to do business with a particular vendor?
Note: fill in the blank with an attribute, such as reliable delivery, product quality, complaint resolution, and so forth.

118

MEASUREMENT SCALES Metric

Semantic Differential Scales:


A scaling technique in which respondents are asked to
check which space between a set of bipolar adjectives or
phrases best describes their feelings toward the stimulus
object.
Example:
My sales representative is . . . .
Courteous  ___ ___ ___ ___ ___  Discourteous
Friendly   ___ ___ ___ ___ ___  Unfriendly
Helpful    ___ ___ ___ ___ ___  Unhelpful
Honest     ___ ___ ___ ___ ___  Dishonest

119

MEASUREMENT SCALES Metric

Graphic-Ratings Scales:
A scaling technique in which respondents are asked to indicate their
ratings of an attribute by placing a check at the appropriate point
on a line that runs from one extreme of the attribute to the other.
Please evaluate each attribute in terms of how important the attribute is to you personally (your company) by placing an X at the position on the horizontal line that most reflects your feelings.
Not Important                                Very Important
Courteousness  _____________________________________
Friendliness   _____________________________________
Helpfulness    _____________________________________
Knowledgeable  _____________________________________

120

MEASUREMENT SCALES Nonmetric

Categorical scale:
Categorical scales are nominally measured opinion
scales that have two or more response categories.
How satisfied are you with your current job?
[ ] Very Satisfied
[ ] Somewhat Satisfied
[ ] Neither Satisfied nor Dissatisfied
[ ] Somewhat Dissatisfied
[ ] Very Dissatisfied

Note: Some researchers consider this a metric scale when coded 1-5.


121

MEASUREMENT SCALES Nonmetric

Constant-Sum Method:
A scaling technique in which respondents are asked to divide
some given sum among two or more attributes on the basis of
their importance to them.
Please divide 100 points among the following attributes in terms of the relative importance of each attribute to you.
Courteous Service      ____
Friendly Service       ____
Helpful Service        ____
Knowledgeable Service  ____
Total                   100
122

MEASUREMENT SCALES Nonmetric

Paired Comparison Method:


A scaling technique in which respondents are given
pairs of stimulus objects and asked which object in a
pair they prefer most.
Please circle the attribute describing a sales
representative which you consider most desirable.
Courteous versus
Knowledgeable
Friendly
versus
Helpful
Helpful
versus
Courteous

123

MEASUREMENT SCALES Nonmetric

Sorting:
A scaling technique in which respondents are
asked to indicate their beliefs or opinions by
arranging objects (items) on the basis of
perceived importance, similarity, preference or
some other attribute.

124

MEASUREMENT SCALES Nonmetric

Rank Order Method:


A scaling technique in which respondents are presented
with several stimulus objects simultaneously and asked
to order or rank them with respect to a specific
characteristic.
Please rank the following attributes on how important each is to
you in relation to a sales representative. Place a 1 beside the
attribute which is most important, a 2 next to the attribute that
is second in importance, and so on.
Courteous Service      ___
Friendly Service       ___
Helpful Service        ___
Knowledgeable Service  ___
125

Scale Development
Practical Decisions When Developing
Scales:

Number of items (indicators) to measure a concept?
Number of scale categories?
Odd or even number of categories? (Include a neutral point?)
Balanced or unbalanced scales?
Forced or non-forced choice? (Include "Don't Know"?)
Category labels for scales?
Scale reliability and validity?

126

Scale Development
Balanced vs. Unbalanced Scales?
Balanced:
To what extent do you consider TV shows with sex and
violence to be acceptable for teenagers to view?

__ Very Acceptable
__ Somewhat Acceptable
__ Neither Acceptable nor Unacceptable
__ Somewhat Unacceptable
__ Very Unacceptable
Unbalanced:
__ Very Acceptable
__ Somewhat Acceptable
__ Unacceptable

127

Scale Development

Forced or Non-Forced?
How likely are you to purchase a laptop PC in the next six months?
Very Unlikely  1   2   3   4   5  Very Likely
__ No Opinion

128

Scale Development
Category Labels for Scales?

Verbal label:
How important is the size of the hard drive in selecting a laptop PC to purchase?
Very Unimportant (1)   Somewhat Unimportant (2)   Neither Important nor Unimportant (3)   Somewhat Important (4)   Very Important (5)

Numerical label:
How likely are you to purchase a laptop PC in the next six months?
Very Unlikely  1   2   3   4   5  Very Likely

Unlabeled:
How important is the weight of the laptop PC in deciding which brand to purchase?
Very Unimportant  ___   ___   ___   ___   ___  Very Important
129

MEASUREMENT SCALES

Choosing a Measurement
Scale:

Capabilities of Respondents.
Context of Scale Application.
Data Analysis Approach.
Validity and Reliability.

130

MEASUREMENT SCALES

Assessing Measurement Scales:

Validity

Reliability

Measurement error: occurs when the values obtained in a survey (observed values) are not the same as the true values (population values).

131

RESEARCH DESIGN

Types of Errors:

Nonresponse = problem definition, refusal, sampling, etc.
Response = respondent or interviewer.
Data Collection Instrument:
Construct Development.
Scaling Measurement.
Questionnaire Design/Sequence, etc.
Data Analysis.
Interpretation.

132

Types of Scores
Raw scores: not easily interpreted, since they have little meaning on their own
Derived scores: scores which have been derived from raw scores into more useful scores on some type of standardized basis
Age and grade-level equivalents
Percentile ranks: a percentile is the point below which a certain percentage of scores fall. The 99th percentile is the point below which 99 percent of the scores fall.
Standard scores
Z-scores
T-scores
133

Examples of Raw Scores and Percentile Ranks (Table 7.1)

Raw Score | Frequency | Cumulative Frequency | Percentile Rank
   95     |     1     |          25          |      100
   93     |     1     |          24          |       96
   88     |     2     |          23          |       92
   85     |     3     |          21          |       84
   79     |     1     |          18          |       72
   75     |     4     |          17          |       68
   70     |     6     |          13          |       52
   65     |     2     |           7          |       28
   62     |     1     |           5          |       20
   58     |     1     |           4          |       16
   54     |     2     |           3          |       12
   50     |     1     |           1          |        4

N = 25

134
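The percentile-rank column in Table 7.1 follows the cumulative-frequency convention: rank = 100 × (number of scores at or below the score) / N. A sketch that reproduces the table's values from the raw data:

```python
def percentile_ranks(raw_scores):
    # Percentile rank of each distinct score: the percentage of scores at or
    # below it (the at-or-below convention that matches Table 7.1).
    n = len(raw_scores)
    return {s: 100 * sum(x <= s for x in raw_scores) / n
            for s in sorted(set(raw_scores), reverse=True)}

# The 25 raw scores behind Table 7.1 (each score repeated by its frequency).
data = ([95, 93] + [88] * 2 + [85] * 3 + [79] + [75] * 4 + [70] * 6
        + [65] * 2 + [62, 58] + [54] * 2 + [50])
ranks = percentile_ranks(data)
print(ranks[95], ranks[70], ranks[50])  # 100.0 52.0 4.0
```

Note the convention: the table counts scores at or below each value, which is why the top score of 95 gets a rank of 100 rather than 96.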

Norm-Referenced Versus Criterion-Referenced Instruments

Norm-Referenced Instruments
All derived scores give meaning to individual scores by
comparing them to the scores of a group. This means that
the nature of the group is extremely important. The group
used to determine the derived scores is called the norm
group and instruments that provide such scores are
referred to as norm-referenced instruments.
Examples:
A student
Scored at the 50th percentile in his group
Scored above 90 percent of all the students in the class
Received a higher grade point average in English literature than
any other student in the school.
Ran faster than all but one other student on the team
And one other student in the class were the only ones to receive A's on the midterm
135

Norm-Referenced Versus Criterion-Referenced Instruments


(contd)

Criterion-referenced instruments: this is usually a test which focuses on instruction. It is based on a specific goal, or target (called a criterion), for each learner to achieve. The criterion for mastery is usually stated as a fairly high percentage of questions to be answered correctly.
Examples
A student

Spelled every word in the weekly spelling list correctly


Solved at least 75 percent of the assigned problems
Achieved a score of at least 80 out of 100 on the final exam
Did at least 25 push-ups within a five-minute period
Read a minimum of one nonfiction book a week
136

Norm-Referenced Versus Criterion-Referenced Instruments


(contd)

While a criterion-referenced test may be more useful at times and in certain circumstances than the more customary norm-referenced test, it is often inferior for research purposes. This is mostly because a criterion-referenced test will provide much less variability of scores, because it is easier. Whereas the usual norm-referenced test will provide a range of scores somewhat less than the possible range, a criterion-referenced test, if it is true to its rationale, will have most of the students getting a high score. Because in research we usually want maximum variability in order to have any hope of finding relationships with other variables, the use of criterion-referenced tests is often self-defeating.
137

Measurement Scales
Reconsidered
There are two reasons why you should have at least a
rudimentary understanding of the differences among
these four types of scales.

They convey different amounts of information. If possible,


researchers should use the type of measurement scale that will
provide them with the maximum amount of information needed
to answer the research question being investigated.
Some types of statistical procedures are inappropriate for the
different scales. The way in which the data in a research study
are organized dictates the use of certain types of statistical
analyses.
Often researchers must decide whether to consider data ordinal or interval level. It is possible to analyze the data both ways, provided researchers are prepared to defend the assumptions underlying each of these measurement scales.

138

Preparing Data for Analysis

Scoring: the data must be scored accurately and consistently.
If a commercially purchased instrument is used, scoring
procedures are made much easier. Usually a scoring
manual will be provided by the instrument developer.
The scoring of a self-developed test can produce difficulties,
and researchers should carefully prepare their scoring
plans, in writing, ahead of time, and try out their instrument
by administering and scoring it with a pilot group similar to
their population.
After scoring, data should be entered into a summary sheet.
Usually some sort of spreadsheet software is used.

139

Hypothetical Results of a Comparison of Two Counseling Methods (Table 7.3)

Score for Rapport | Method A | Method B
      96-100      |    0     |    0
      91-95       |    0     |    2
      86-90       |    0     |    3
      81-85       |    2     |    3
      76-80       |    2     |    4
      71-75       |    5     |    3
      66-70       |    6     |    4
      61-65       |    9     |    4
      56-60       |    4     |    5
      51-55       |    5     |    3
      46-50       |    2     |    2
      41-45       |    0     |    0
      36-40       |    0     |    1

N = 35            35

140

Test quality
Validity: the instrument measures what it claims to measure
Content validity (JPK / JPU, the test specification table)
Face validity: influences the way pupils answer
Construct-related validity, e.g. the construct of motivation
141

Factors affecting validity
1. Unclear instructions
2. Vague vocabulary and sentence structure
3. Insufficient time allowed
4. Unsuitable difficulty level
5. Insufficient number of items
6. Poor item ordering
7. Predictable answer pattern (ABCD)
8. Poorly constructed items
142

Factors Affecting Reliability

1. Test length, or number of items. The more items (or the longer the test), the higher the reliability.
2. Variability of ability within the group. Reliability is higher for a heterogeneous group than for a homogeneous one.
3. Ability of the students taking the test. If the items are too difficult, students will guess, making the results less consistent.
143

Factors Affecting Reliability

4. The method or procedure used to estimate reliability. For example, reliability from the equivalent-forms method is usually lower than from the test-retest or split-half procedures.
5. The variable being measured. Reliability is generally higher when measuring knowledge or skills than attitudes or values; for example, results for academic achievement are usually more consistent than for personality or attitude.
6. The type of test. Reliability of objective tests is usually higher than that of essay tests, owing to test length and to differences among markers. A clear marking scheme helps reduce differences among markers and so improves test reliability.

144

1. Content validity: the extent to which a test represents the content/syllabus that has been taught.
2. Face validity: the extent to which a test appears to measure a given construct, as perceived by the candidates sitting the test.
3. Criterion validity: the extent to which a test is related to another test, administered either at the same time or later. Predictive power, e.g. the GRE.
4. Construct validity: the extent to which a test measures a particular construct, e.g. the teaching traits in the MEdSI test. E.g., a pupil who is weak academically helps the teacher repair a chair: can that be counted?
145

General considerations in planning and constructing a test

1. Know the subject content well.
2. Know and understand the students to be tested.
3. Be skilled.
4. Be creative.
5. Ensure test validity and reliability.
146

General considerations in planning and constructing a test

i. Know the subject content well: the teacher must have a good command of the content taught. This is important to ensure the teacher can determine the scope of the content to be tested and the students' ability to understand the topics taught.
ii. Know and understand the students to be tested: the planned test must take into account the students' background and ability, so that the teacher can match the test content, format, and items to the students' level.
iii. Be skilled: writing test items requires skill and a good command of language in order to produce a quality test.
147

General considerations in planning and constructing a test

iv. Be creative: writing test items also requires creativity, to produce items that are suitable and engaging. Using various media, diagrams, symbols, pictures, and other forms of stimulus makes the items more varied and able to measure a range of skill levels.
v. Test validity and reliability: how far the test measures what it is supposed to measure is a question of test validity. The teacher must ensure that the content scope tested consists of knowledge and skills that have been taught and that are important for students to know. This involves content validity, an important aspect of test preparation. In addition, the consistency of the scores the test produces must be checked to ensure test reliability.
148

Basic Test Construction Process

1. Determine the purpose of the test
2. Prepare the test specification table
3. Write the items
4. Review the questions
5. Item/question analysis
6. Select quality questions
7. Arrange the questions
8. Print the questions
149

Basic Test Construction Process

1. Determine the purpose of the test: before a test is constructed, the teacher must first determine why it is being given. Is it for formative, summative, placement, or diagnostic purposes?
2. Prepare the test specification table: determine the coverage of the test, i.e. the content to be tested, and the skill levels or types of behaviour expected.
3. Write the items: determine the behaviour to be measured by referring to the instructional objectives, and decide which item types are suitable.
4. Review the questions: have them reviewed by colleagues or a committee to improve aspects such as the idea tested, the skill tested, the item format, the question stem, the sentence construction, the structure of the answer options, and the answer key.
150

Basic Test Construction Process

5. Item/question analysis: to determine the percentage of students who answer each item correctly, the effectiveness of the distractors, the discriminating power of the questions, and how well the questions match the learning objectives.
6. Select quality questions: select questions that satisfy the specification table (JPU) laid down, based on the item analysis.
7. Arrange the questions: the selected questions are arranged by item type, to avoid confusion, help students maintain a mental set, and ease the teacher's marking. Questions are also ordered by difficulty, so that the mental activity develops from simple to complex, building confidence and motivation; correct answers are arranged in a random pattern.
8. Print the questions: print quality matters, and aspects such as paper quality, spacing between questions, the use of diagrams, and ink need attention.
151

152

Analytical Designs
Descriptions of historical, legal, or policy
issues through an analysis of documents, oral
histories, and relics
Two basic approaches
Concept analysis the study of educational concepts
(e.g., co-operative learning, leadership, etc.) to
describe the different meanings and the uses of the
concept
Historical analysis the systematic collection and
criticism of documents that describe past events of
relevance to education

153

Analytical Designs

An example of a concept analysis


The purpose of this study is to
examine the meanings and uses of
the term standards-based curriculum.
This study examined the varied
meanings, interpretations, and uses
of an important curricular concept.

154

Analytical Designs
An example of an historical
analysis
The purpose of this study is to
examine the changes in standardized
testing over the last 40 years.
This study addresses the historical
developments characterizing the use
of standardized tests over a 40 year
period.
155

Mixed Method Designs


The use of quantitative and
qualitative designs and methods
within a single study
Allows the researcher to better
match the approach to gathering
and analyzing data to the research
questions
Relative emphasis given to any
particular method varies widely
156

Action Research Design


Systematic investigation
Emphasis on teachers, counselors,
and administrators
Brings together characteristics of
systematic inquiry and practice

157
