[Figure: scatter plot of GPA (y, 1.50 to 4.00) against Math SAT score (x, 300 to 800)]
Positive correlation: as x increases, y increases.
Scatter Plots and Types of Correlation
[Figure: scatter plot of number of accidents (y, 0 to 60) against hours of training (x, 0 to 20)]
Negative correlation: as x increases, y decreases.
Scatter Plots and Types of Correlation
[Figure: scatter plot of IQ (y, 80 to 160) against height (x, 60 to 80)]
No linear correlation
Correlation analysis shows three important things, namely:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[\,n\sum x^2 - (\sum x)^2\,\right]\left[\,n\sum y^2 - (\sum y)^2\,\right]}}$$
r ranges from -1 through 0 to 1:
If r is close to -1, there is a strong negative correlation.
If r is close to 0, there is no linear correlation.
If r is close to 1, there is a strong positive correlation.
Guilford's Rule of Thumb
r-value   Interpretation
0.00      No relationship
The same strength interpretations hold for negative values of r; only the direction of the association changes.
Association Between Two Scores: Degree and Strength of Association
.20–.35:
When correlations range from .20 to .35, there is only a
slight relationship
.35–.65:
When correlations are above .35, they are useful for
limited prediction.
.66–.85:
When correlations fall into this range, good prediction
can result from one variable to the other. Coefficients
in this range would be considered very good.
.86 and above:
Correlations in this range are typically achieved for
studies of construct validity or test-retest reliability.
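The bands above can be sketched as a small lookup. This is only an illustration: the function name `describe_strength` is hypothetical, the handling of the boundary values (.35 counted as "slight", values between .65 and .66 counted as "limited prediction") is an assumption, and values below .20 are reported as not covered because the guidelines above do not describe them.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the guideline bands listed above."""
    a = abs(r)  # strength ignores the sign; the sign only gives direction
    if a >= 0.86:
        return "very high: typical of construct validity or test-retest reliability studies"
    if a >= 0.66:
        return "very good: good prediction from one variable to the other"
    if a > 0.35:
        return "useful for limited prediction"
    if a >= 0.20:
        return "slight relationship"
    return "below .20: not covered by these guidelines"


for r in (0.25, -0.50, 0.77, 0.90):
    print(r, "->", describe_strength(r))
```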
Step 1. State the hypotheses
Research hypothesis:
There is a significant relationship between principals' level of instructional leadership and the academic performance of schools in Sabah.
Null hypothesis:
There is no significant relationship between principals' level of instructional leadership and the academic performance of schools in Sabah.
Step 2. SET THE ALPHA LEVEL = 0.01 / 0.05 / 0.10, THE SAMPLING DISTRIBUTION, AND THE TEST STATISTIC
The alpha value is set by the researcher.
It is the amount of error the researcher is willing to accept when making a decision in the hypothesis test.
The smallest error levels used are 0.01 (1%), 0.05 (5%), or 0.10 (10%).
This value is also called the significance value, the significance level, or the alpha level.
Step 2. Sampling distribution
The distribution appropriate to the analysis being carried out. It is the model distribution of correlation values, in which the correlation values are normally distributed.
The critical region contains the "unusual" correlation values -> Ha is true.
The non-critical region contains the "usual" correlation values -> Ho is true.
Step 3. Critical value
The critical value is the boundary between the region where Ho is true and the region where Hp (the research hypothesis) is true.
It is the point at which the researcher decides whether there is enough evidence to reject Ho (and so accept Hp) or not enough evidence to reject Ho (accept Ho).
This value depends on the alpha level and on the direction of the hypothesis test.
Step 4. Test statistic value
This is the value that is calculated and used as evidence of whether the null hypothesis is true or false.
If the test statistic falls in the critical region, Ho is false: it is rejected and Hp is accepted.
If the test statistic falls in the non-critical region, Ho is true, so Ho is accepted.
Step 4. Test statistic value
Tested r =
Tested r (rank correlation) = $1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$
Step 5. Making the decision, conclusion and interpretation
If the test statistic falls in the non-critical region, Ho is true, so Ho is accepted.
If the test statistic falls in the critical region, Ho is not true, so Ho is rejected and Hp is accepted (meaning there is evidence that Hp is true).
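The decision rule can be written as a one-line comparison. The sketch below assumes a two-tailed test, so the critical region is everything at or beyond the critical value in either direction; the function name `decide` and the numbers passed to it are purely illustrative and do not come from any example in these notes.

```python
def decide(test_statistic: float, critical_value: float) -> str:
    """Two-tailed rule: reject Ho when the statistic lies in the critical region."""
    if abs(test_statistic) >= abs(critical_value):
        return "reject Ho, accept Hp"
    return "accept Ho"


# Illustrative values only (not taken from the notes).
print(decide(0.70, 0.60))  # statistic in the critical region
print(decide(0.30, 0.60))  # statistic in the non-critical region
```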
Example of Pearson correlation
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The data are presented in the table below. Assume the data are normally distributed.
1. Calculate an appropriate correlation coefficient.
2. Describe the nature of the relationship between the two variables.
3. Test the hypothesis on the relationship at the 0.01 level of significance.
Data set:
Assign   Test
8.5      88
6        66
9        94
10       98
8        87
7        72
5        45
6        63
7.5      85
5        77
Calculate the test statistic
X      Y     XY       X²       Y²
8.5    88    748      72.25    7744
6      66    396      36       4356
9      94    846      81       8836
10     98    980      100      9604
8      87    696      64       7569
7      72    504      49       5184
5      45    225      25       2025
6      63    378      36       3969
7.5    85    637.5    56.25    7225
5      77    385      25       5929
Σ 72   775   5795.5   544.5    62441
Steps in Hypothesis Testing
1. State the null and alternative hypotheses
H0: ρ = 0, HA: ρ ≠ 0
2. Calculate the test statistic: r = .865
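As a check on the value r = .865, the raw-score Pearson formula can be applied to the ten (assignment, test) pairs of this example. This is a minimal sketch using only the standard library; the helper name `pearson_r` is an assumption for illustration.

```python
from math import sqrt

assign = [8.5, 6, 9, 10, 8, 7, 5, 6, 7.5, 5]
test = [88, 66, 94, 98, 87, 72, 45, 63, 85, 77]


def pearson_r(x, y):
    """Raw-score formula: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


r = pearson_r(assign, test)
print(round(r, 3))  # 0.865
```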
ID   X   Y   Rank X   Rank Y   d      d²
1    1   1   1.5      1.5      0      0
2    2   1   3.5      1.5      2      4
3    3   2   5.5      3.5      2      4
4    4   3   7.5      6        1.5    2.25
5    5   4   9.5      8        1.5    2.25
6    1   3   1.5      6        -4.5   20.25
7    2   3   3.5      6        -2.5   6.25
8    3   2   5.5      3.5      2      4
9    4   5   7.5      10       -2.5   6.25
10   5   5   9.5      10       -0.5   0.25
11   6   5   11       10       1      1
Σd² = 50.5
Make a decision: Reject the null hypothesis and hence accept the research hypothesis.
Conclusion: There was a statistically significant positive correlation between ratings of working environment and work commitment among employees (rho = 0.77, p < 0.05, N = 11).
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
$$r = 1 - \frac{6(50.5)}{11(121 - 1)}$$
$$r = 1 - 0.2295$$
$$r = 0.77$$
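The rank-correlation arithmetic above can be reproduced from the eleven raw score pairs. The sketch assigns tied scores the mean of the ranks they occupy, which is what the rank columns of the table use, and then applies the same formula; `average_ranks` is a hypothetical helper name.

```python
def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6]
y = [1, 1, 2, 3, 4, 3, 3, 2, 5, 5, 5]

rank_x, rank_y = average_ranks(x), average_ranks(y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
n = len(x)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 2))  # 50.5 0.77
```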
• Mean of group 1
• Mean of group 2
• Std dev of continuous variable
• No of subjects in group 1
• No of subjects in group 2
• Total no of subjects
Example on Point-biserial correlation
A psychologist hypothesizes an association between marital status (1 = single, 2 = married) and need for achievement. A questionnaire measuring need for achievement is administered to married and single people.
1. Calculate the appropriate correlation coefficient.
2. Describe the nature of the relationship between the two variables.
3. Test the hypothesis on the relationship at the 0.05 level of significance.
Marital status   Need for Achievement
2                3
2                7
1                12
1                16
1                24
2                11
1                15
2                10
2                11
1                18
1                22
2                9
1                19
1                17
Point-biserial Correlation
$$r = \frac{\bar{y}_1 - \bar{y}_2}{s_y}\sqrt{\frac{n_1 n_2}{n(n-1)}}$$
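A minimal sketch of the point-biserial calculation for the marital status data follows. It assumes that group 1 is the single respondents and group 2 the married respondents, and that s_y is the sample standard deviation (n - 1 in the denominator) of all need-for-achievement scores, which is what the n(n - 1) term in the formula implies.

```python
from math import sqrt
from statistics import mean, stdev

status = [2, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1, 1]
need = [3, 7, 12, 16, 24, 11, 15, 10, 11, 18, 22, 9, 19, 17]

single = [y for s, y in zip(status, need) if s == 1]   # group 1
married = [y for s, y in zip(status, need) if s == 2]  # group 2

n1, n2, n = len(single), len(married), len(need)
s_y = stdev(need)  # sample standard deviation of the continuous variable
r_pb = (mean(single) - mean(married)) / s_y * sqrt(n1 * n2 / (n * (n - 1)))

print(n1, n2, mean(single), mean(married), round(r_pb, 3))
```

With this coding a positive r means that single respondents score higher on need for achievement than married respondents.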
Y’ = a + bx
Using this equation, the value of y can then be determined for any given value of x, and vice versa.
EQUATION OF THE REGRESSION LINE
(LEAST-SQUARES REGRESSION LINE)
Y’ = a + bx
Y’ = estimated value of y
b = slope of the line
a = intercept on the y-axis
SLOPE OF THE REGRESSION LINE
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$a = \bar{y} - b\bar{x}$$
Data: Principals' leadership level and teachers' perception of principals' leadership level
X Y
12 8
2 3
1 4
6 6
5 9
8 6
4 6
15 22
11 14
13 6
REGRESSION ANALYSIS CALCULATION
X Y XY X² Y²
12 8 96 144 64
2 3 6 4 9
1 4 4 1 16
6 6 36 36 36
5 9 45 25 81
8 6 48 64 36
4 6 24 16 36
15 22 330 225 484
11 14 154 121 196
13 6 78 169 36
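The column totals and the resulting least-squares line for this table can be worked out with a few lines of code. This is a sketch that applies the raw-score slope formula b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄; the helper name `least_squares` is an assumption for illustration.

```python
x = [12, 2, 1, 6, 5, 8, 4, 15, 11, 13]
y = [8, 3, 4, 6, 9, 6, 6, 22, 14, 6]


def least_squares(x, y):
    """Return (a, b) for the line Y' = a + b*x using the raw-score formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n  # mean of y minus b times mean of x
    return a, b


a, b = least_squares(x, y)
print(sum(x), sum(y), round(a, 3), round(b, 3))  # 77 84 2.076 0.821
```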
Example: Height vs. Weight
Graph One: Relationship between Height
and Weight
Example: Symptom Index vs Drug A
[Figure: scatter plot of Symptom Index (0 to 100) against Drug A (dose in mg, 0 to 250)]
What Symptom Index might we predict for a standard dose of 150mg?
Correlation examples
Regression
Regression analysis procedures have as their
primary purpose the development of an
equation that can be used for predicting
values on some DV for all members of a
population.
A secondary purpose is to use regression
analysis as a means of explaining causal
relationships among variables.
The most basic application of regression analysis is the bivariate situation, referred to as simple linear regression, or just simple regression.
Simple regression involves a single IV and a single DV.
Goal: to obtain a linear equation so that we can predict
the value of the DV if we have the value of the IV.
Simple regression capitalizes on the correlation between
the DV and IV in order to make specific predictions
about the DV.
The correlation tells us how much information about
the DV is contained in the IV.
If the correlation is perfect (i.e., r = ±1.00), the IV
contains everything we need to know about the DV,
and we will be able to perfectly predict one from the
other.
Regression analysis is the means by which we
determine the best-fitting line, called the regression
line.
The regression line is the straight line that lies closest to all points in a given scatterplot.
The least-squares line always passes through the centroid (x̄, ȳ) of the scatterplot.
Example: Symptom Index vs Drug A
[Figure: scatter plot with fitted line of Symptom Index (0 to 120) against Drug A (dose in mg, 0 to 250)]
We can now predict specific values of one variable from knowledge of the other.
All points are close to the line.
Example: Symptom Index vs Drug B
♠ Purpose
To determine relationship between two metric variables
To predict value of the dependent variable (Y) based on
value of independent variable (X)
♠ Requirement:
DV: Interval / Ratio
IV: Interval / Ratio
♠ Requirement:
The independent and dependent variables are normally distributed in the population.
The cases represent a random sample from the population.
Simple Regression
How best to summarise the data?
[Figure: two scatter plots of Symptom Index against Drug A (dose in mg, 0 to 250)]
[Figure: scatter plot with fitted line; labels: (constant), Y = dependent variable, X = independent variable]
Simple Regression
R² - “Goodness of fit”
R² = 0 (0%): randomly scattered points, no apparent relationship between X and Y.
Implies that a best-fit line will be a very poor description of the data.
[Figure: scatter plot of DV against IV (regressor, predictor)]
Simple Regression
High values of R²
R² = 1
[Figure: scatter plot of DV against IV]
Simple Regression
R² - “Goodness of fit”
[Figure: scatter plots of Symptom Index against Drug A (dose in mg) and against Drug B (dose in mg)]
[Figure: scatter plot with fitted line]
The line can then be used to predict Y from X.
Regression
Establish equation for the best-fit line:
Y = a + bX
Regression - Types
Step - Descriptive Analysis
● Calculate a and b
$a = \bar{y} - b\bar{x}$
Ŷ = a + bX
Example on regression analysis
Data were collected from a randomly selected sample to determine the relationship between average assignment scores and test scores in statistics. The data are presented in the table below.
1. Calculate the coefficient of determination and the correlation coefficient.
2. Determine the prediction equation.
3. Test the hypothesis for the slope at the 0.05 level of significance.
Data set (scores):
ID   Assign   Test
1    8.5      88
2    6        66
3    9        94
4    10       98
5    8        87
6    7        72
7    5        45
8    6        63
9    7.5      85
10   5        77
1. Derive the regression / prediction equation
Summary statistics:
n = 10
ΣX = 72
ΣY = 775
ΣX² = 544.5
ΣY² = 62,441
ΣXY = 5,795.5
$$b = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n} = \frac{215.5}{26.1} = 8.257$$
$$a = \bar{Y} - b\bar{X} = 77.5 - 8.257(7.2) = 18.050$$
Prediction equation:
Ŷ = 18.05 + 8.257X
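The slope, intercept and coefficient of determination for this example can be recomputed from the raw scores as a check. The sketch below uses the deviation-sum form (ΣXY - ΣXΣY/n and so on); the variable names are illustrative, and r² is obtained as the squared correlation, which equals the coefficient of determination in simple regression.

```python
assign = [8.5, 6, 9, 10, 8, 7, 5, 6, 7.5, 5]
test = [88, 66, 94, 98, 87, 72, 45, 63, 85, 77]

n = len(assign)
sx, sy = sum(assign), sum(test)
sp_xy = sum(a * b for a, b in zip(assign, test)) - sx * sy / n  # about 215.5
ss_x = sum(a * a for a in assign) - sx ** 2 / n                 # about 26.1
ss_y = sum(b * b for b in test) - sy ** 2 / n                   # about 2378.5

b = sp_xy / ss_x               # slope
a = sy / n - b * sx / n        # intercept: mean of Y minus b times mean of X
r2 = sp_xy ** 2 / (ss_x * ss_y)  # coefficient of determination


def predict(x):
    """Predicted test score for an average assignment score x."""
    return a + b * x


print(round(b, 3), round(a, 2), round(r2, 3))  # 8.257 18.05 0.748
```

For instance, `predict(8)` gives the predicted test score for an average assignment score of 8, which is about 84.1 under this fitted line.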
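Interpretation of regression equation
Ŷ = 18.05 + 8.257x
Intercept: a = 18.05, the predicted test score when x = 0.
Slope: b = ΔY/ΔX = 8.257, so each one-point increase in the average assignment score raises the predicted test score by 8.257 points.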
Example on regression analysis:
MARITAL SATISFACTION
Parents : X Children : Y
1 3
3 2
7 6
9 7
8 8
4 6
5 3
Summary statistics for the calculation: number of pairs; mean of X and mean of Y; ΣX and ΣY; ΣX² and ΣY²; standard deviation of X and of Y; ΣXY.
1. Derive the regression / prediction equation
$a = \bar{y} - b\bar{x}$
= 5.00 - 0.65(5.29)
= 1.56
Prediction equation:
Ŷ = 1.56 + 0.65x
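Interpretation of regression equation
Ŷ = 1.56 + 0.65x
Intercept: a = 1.56, the predicted children's score when x = 0.
Slope: b = ΔY/ΔX = 0.65, so each one-point increase in the parents' marital satisfaction score raises the predicted children's score by 0.65 points.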
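The marital satisfaction example can be checked from its seven (parent, child) pairs. This is a minimal sketch using the intercept formula a = ȳ - b·x̄; it keeps the unrounded slope when computing the intercept, so the intercept comes out near 1.58 rather than the 1.56 obtained when b is first rounded to 0.65.

```python
from statistics import mean

parents = [1, 3, 7, 9, 8, 4, 5]   # X
children = [3, 2, 6, 7, 8, 6, 3]  # Y

n = len(parents)
sum_xy = sum(p * c for p, c in zip(parents, children))
sum_x2 = sum(p * p for p in parents)

# Raw-score slope, then intercept as mean(Y) minus slope times mean(X).
b = (n * sum_xy - sum(parents) * sum(children)) / (n * sum_x2 - sum(parents) ** 2)
a = mean(children) - b * mean(parents)

print(round(mean(parents), 2), mean(children), round(b, 2), round(a, 2))
```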
CHI-SQUARE ANALYSIS
This is also an analysis of relationships, but it is better known as an analysis of association.
It is used to determine the association between pairs of variables measured on a nominal or ordinal scale, or where one of them is paired with interval or ratio data.
This covers variables such as:
Race,
Gender,
Likes/dislikes a food,
High achievement/low achievement,
High anxiety/moderate anxiety/low anxiety
Frequency data are observed by counting the occurrences of each item. This suits survey studies.
From the observed frequencies, chi-square analysis tells us whether or not there is an association between the two variables.
CHI-SQUARE ANALYSIS
SUPPOSE a researcher collects information on the respondents' race and on each respondent's category of eating habits,
OR a researcher surveys students at several schools on gender and interest/no interest in the science stream,
OR a researcher surveys fathers, collects information on their level of education (high/moderate/low) and relates it to salary category.
For all three examples, the appropriate analysis is a non-parametric analysis (chi-square analysis), followed by construction of a contingency table or "crosstabulation" table.
From the observed frequencies, chi-square analysis tells us whether or not there is an association between the two variables.
CHI-SQUARE ANALYSIS
There are two types: the CHI-SQUARE TEST OF GOODNESS OF FIT and the TEST OF INDEPENDENCE/DEPENDENCE.
TEST OF GOODNESS OF FIT: answers the question "is there a difference in rates for some matter/event/agreement?"
TEST OF INDEPENDENCE/DEPENDENCE: answers the question "is there an association/dependence/relationship between two matters?"
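As an illustration of the goodness-of-fit question, the sketch below computes the chi-square statistic Σ(O - E)²/E for a single categorical variable. The observed counts are hypothetical (they are not from these notes), and the expected counts assume equal rates across categories unless others are supplied; `chi_square_gof` is an illustrative name.

```python
def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic and its degrees of freedom."""
    total = sum(observed)
    if expected is None:
        # Assumption: equal expected frequency in every category.
        expected = [total / len(observed)] * len(observed)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts for three response categories (illustration only).
observed = [30, 20, 10]
statistic, df = chi_square_gof(observed)
print(round(statistic, 2), df)  # 10.0 2
```

The statistic is then compared with the critical value from a chi-square table for the chosen alpha and df (5.99 for alpha = 0.05 and df = 2).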
CHI-SQUARE ANALYSIS
The findings of this analysis are usually presented as a frequency table called a contingency table or "crosstabulation" table.
From the observed frequencies, chi-square analysis tells us whether or not there is a significant association between the two variables under study,
or whether or not there is a significant difference in frequencies between the categories under study.
• From the table we can examine whether there is a relationship or association between the two variables.