Business Statistics
Webex 5
Chapter 13: Simple linear regression
Types of relationship
[Figure: scatter plots of Y against X illustrating types of relationship, including a panel showing no relationship]
Introduction to regression analysis
◼ Regression analysis is used to:
◼ Predict the value of a dependent variable based on the value of at least one independent variable
◼ Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
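Simple Linear Regression
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent variable, $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent variable and $\varepsilon_i$ is the random error term.
$\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component.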
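Simple Linear Regression
[Figure: the line $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with intercept $\beta_0$ and slope $\beta_1$; for a given $X_i$, the random error $\varepsilon_i$ is the vertical gap between the observed value of Y and the predicted value of Y on the line]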
Simple Linear Regression Equation (prediction line)
The simple linear regression equation provides an estimate of the population regression line:
$\hat{Y}_i = b_0 + b_1 X_i$
where $\hat{Y}_i$ is the estimated (or predicted) Y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $X_i$ is the value of X for observation i.
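The Least Squares Method
◼ $b_0$ and $b_1$ are the values that minimize the sum of squared differences between $Y$ and $\hat{Y}$:
$\min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \min \sum_{i=1}^{n} \left(Y_i - (b_0 + b_1 X_i)\right)^2$
◼ Dependent variable (Y) = house price in $1000s
◼ Independent variable (X) = square feet
Data
House price in $1000s (Y)    Square feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700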
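The least squares estimates for this sample can be reproduced outside Excel. The sketch below is a minimal standard-library Python version; the function name `least_squares` is hypothetical, and it assumes the textbook closed-form solution $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$, not whatever routine Excel uses internally.

```python
# House price example: X = square feet, Y = house price in $1000s
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def least_squares(x, y):
    """Return (b0, b1) minimising the sum of squared errors of y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of X, and sum of cross-products of deviations
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares(square_feet, price)
print(f"b0 = {b0:.3f}, b1 = {b1:.5f}")  # b0 = 98.248, b1 = 0.10977
```

These agree with the intercept (98.248) and slope (0.10977) in the Excel output.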
Scatter plot
[Figure: scatter plot of house price ($1000s) against square feet]
Using Excel
3. Choose Regression
ANOVA
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000
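[Figure: scatter plot of house price ($1000s) against square feet with the fitted prediction line; slope = 0.10977, intercept = 98.248]
Prediction line: $\widehat{\text{house price}} = 98.248 + 0.10977\,(\text{square feet})$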
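Once the coefficients are known, the prediction line turns a house size into a predicted price. A minimal Python sketch, assuming the rounded coefficients from the Excel output; the function name `predict_price` and the 2000-square-foot input are illustrative choices, not taken from the slides.

```python
# Rounded coefficients from the Excel output
b0 = 98.248   # intercept, in $1000s
b1 = 0.10977  # slope: change in price ($1000s) per extra square foot


def predict_price(sq_ft):
    """Predicted house price in $1000s for a house of sq_ft square feet."""
    return b0 + b1 * sq_ft


# A 2000 sq ft house: 98.248 + 0.10977 * 2000 = 317.788, i.e. about $317,788
print(f"{predict_price(2000):.3f}")  # 317.788
```

Because Y is measured in $1000s, the slope says the predicted price rises by about 0.10977 × $1000 = $109.77 for each additional square foot.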
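Measures of variation
$SST = SSR + SSE$
Total sum of squares: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
Error sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
where:
$\bar{Y}$ = mean value of the dependent variable
$Y_i$ = observed value of the dependent variable
$\hat{Y}_i$ = predicted value of Y for the given $X_i$ value
[Figure: for a given $X_i$, the vertical distances between $Y_i$, $\hat{Y}_i$ and $\bar{Y}$ that make up SST, SSR and SSE]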
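The three sums of squares can be checked against the ANOVA table in the Excel output. A minimal standard-library Python sketch; it refits the line with the closed-form least squares formulas (an assumption about method, not Excel's internals) and then confirms that SST = SSR + SSE.

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in square_feet]  # predicted values

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))  # unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
# SST = 32600.5000, SSR = 18934.9348, SSE = 13665.5652
```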
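Coefficient of determination, $r^2$
◼ $r^2 = \dfrac{SSR}{SST}$ is the proportion of the total variation in Y that is explained by variation in X; $0 \le r^2 \le 1$
Examples of $r^2$ values
[Figures: scatter plots of Y against X for $r^2 = 1$, for $0 < r^2 < 1$, and for $r^2 = 0$ (no linear relationship between X and Y)]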
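For the house price example, $r^2$ follows directly from the ANOVA sums of squares. A short Python sketch; it also relies on the fact that, with a single independent variable, the "Multiple R" that Excel reports is the square root of $r^2$.

```python
import math

ssr = 18934.9348  # regression sum of squares (ANOVA table)
sst = 32600.5000  # total sum of squares (ANOVA table)

r_squared = ssr / sst               # share of the variation in Y explained by X
multiple_r = math.sqrt(r_squared)   # with one X this is |r|, Excel's "Multiple R"

print(f"r^2 = {r_squared:.5f}, Multiple R = {multiple_r:.5f}")
# r^2 = 0.58082, Multiple R = 0.76211
```

So roughly 58% of the variation in house price is explained by variation in square feet.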
Standard error of estimate
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
where
SSE = error sum of squares
n = sample size
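The standard error of estimate for the house price data can be recomputed from SSE. A minimal Python sketch using the SSE value from the ANOVA table.

```python
import math

sse = 13665.5652  # error sum of squares (ANOVA table)
n = 10            # sample size

# n - 2 degrees of freedom: two parameters (b0 and b1) were estimated
s_yx = math.sqrt(sse / (n - 2))
print(f"S_YX = {s_yx:.5f}")  # S_YX = 41.33032
```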
Simple Linear Regression Example: Standard Error of Estimate in Excel
Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10
$S_{YX} = 41.33032$
Assumptions of Regression
◼ Linearity
◼ The relationship between X and Y is linear
◼ Independence of Errors
◼ Error values are statistically independent
◼ Normality of Error
◼ Error values are normally distributed for any given
value of X
◼ Equal Variance (also called homoscedasticity)
◼ The probability distribution of the errors has constant
variance
Residual Analysis
$e_i = Y_i - \hat{Y}_i$
◼ The residual for observation i, $e_i$, is the difference between its observed and predicted value
◼ Check the regression assumptions by examining the residuals:
◼ Examine the linearity assumption
◼ Evaluate the independence assumption
◼ Evaluate the normal distribution assumption
◼ Examine for constant variance for all levels of X (homoscedasticity)
◼ Source: Wikipedia
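A residual is computed for every observation, and listing $e_i$ against $X_i$ is the raw material for the residual plots. A minimal standard-library Python sketch for the house price data; it refits the line by closed-form least squares (an assumption about method) rather than reading coefficients from Excel.

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# e_i = Y_i - Y_hat_i for every observation
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, price)]

for i, (x, e) in enumerate(zip(square_feet, residuals), start=1):
    print(f"obs {i:2d}: X = {x}, e = {e:9.5f}")

# With an intercept in the model, least squares residuals sum to (almost) zero
print(f"sum of residuals = {sum(residuals):.1e}")
```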
Residual Analysis for Linearity
[Figure: plots of Y against x and of residuals against x, for a relationship that is not linear and for one that is linear (✓)]
Residual Analysis for Independence
[Figure: residuals plotted against X, for residuals that are not independent and for residuals that are independent (✓)]
Checking for Normality
[Figure: normal probability plot of the residuals, percent against residual]
Residual Analysis for Equal Variance
[Figure: plots of Y against x and of residuals against x]
Residual output (house price example)
Observation   Predicted house price ($1000s)   Residual
2             273.87671                        38.12329
3             284.85348                        -5.853484
4             304.06284                        3.937162
5             218.99284                        -19.99284
6             268.38832                        -49.38832
7             356.20251                        48.79749
8             367.17929                        -43.17929
9             254.6674                         64.33264
[Figure: residual plot, residuals against square feet]
◼ Here, the residuals show a cyclic pattern, not a random one. Cyclical patterns are a sign of positive autocorrelation
[Figure: residuals plotted against time (t)]
Measuring autocorrelation: the Durbin-Watson statistic
$D = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
▪ D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation
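The Durbin-Watson formula is straightforward to compute once the residuals are in time order. A minimal Python sketch; the function name `durbin_watson` and both residual series are hypothetical, made up only to show how a smooth cyclic pattern pushes D below 2 and an alternating pattern pushes it above 2.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2 (residuals in time order)."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series, not taken from the slides
cyclic = [3, 2, 1, -1, -2, -3, -2, -1, 1, 2]  # slow swings: neighbours alike
alternating = [2, -2, 2, -2, 2, -2]           # sign flips every period

print(f"cyclic:      D = {durbin_watson(cyclic):.3f}")       # D = 0.395
print(f"alternating: D = {durbin_watson(alternating):.3f}")  # D = 3.333
```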
$S_{b_1} = \dfrac{S_{YX}}{\sqrt{SSX}} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
where:
$S_{b_1}$ = estimate of the standard error of the slope
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}}$ = standard error of the estimate
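The standard error of the slope for the house price data can be rebuilt from $S_{YX}$ and the X values. A minimal Python sketch using the $S_{YX}$ value from the Excel output.

```python
import math

square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
s_yx = 41.33032  # standard error of the estimate (Excel output)

x_bar = sum(square_feet) / len(square_feet)
ssx = sum((x - x_bar) ** 2 for x in square_feet)  # SSX = sum of (X_i - X_bar)^2

s_b1 = s_yx / math.sqrt(ssx)
print(f"SSX = {ssx:.0f}, S_b1 = {s_b1:.5f}")  # SSX = 1571500, S_b1 = 0.03297
```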
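Inferences About the Slope: t Test
$t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}}$, d.f. = n − 2
where $b_1$ = regression slope coefficient, $\beta_1$ = hypothesized slope, $S_{b_1}$ = standard error of the slope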
Inferences About the Slope: t Test Example
$t_{STAT} = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{0.10977 - 0}{0.03297} = 3.32938$
Inferences About the Slope: t Test Example
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: $t_{STAT} = 3.329$
d.f. = 10 − 2 = 8
$\alpha/2 = 0.025$ in each tail
Decision: Reject $H_0$, since $t_{STAT} = 3.329 > t_{0.025,8} = 2.3060$
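The slope test for the house price data can be replayed in a few lines. A minimal Python sketch; the standard library has no t distribution, so the critical value $t_{0.025,8} = 2.3060$ is hard-coded from a standard t table rather than computed.

```python
b1 = 0.10977      # sample slope (Excel output)
beta1_h0 = 0.0    # slope under H0: beta1 = 0
s_b1 = 0.03297    # standard error of the slope

t_stat = (b1 - beta1_h0) / s_b1

# Two-tailed critical value for alpha = 0.05 and d.f. = n - 2 = 8, from a t table
t_crit = 2.3060

reject_h0 = abs(t_stat) > t_crit
print(f"t_STAT = {t_stat:.3f}, critical value = ±{t_crit}, reject H0: {reject_h0}")
# t_STAT = 3.329, critical value = ±2.306, reject H0: True
```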
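F test for significance
$F_{STAT} = \dfrac{MSR}{MSE}$
where $MSR = \dfrac{SSR}{k}$ and $MSE = \dfrac{SSE}{n - k - 1}$, and k is the number of independent variables in the model (k = 1 in simple linear regression)
$F_{STAT}$ follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom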
F test for significance: Excel output
$F_{STAT} = \dfrac{MSR}{MSE} = \dfrac{18934.9348}{1708.1957} = 11.0848$, with 1 and 8 degrees of freedom
The p-value for the F test is reported under Significance F (0.01039)
ANOVA
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000
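The F statistic in the output can be rebuilt from the ANOVA sums of squares. A minimal Python sketch; it also checks the standard identity $F_{STAT} = t_{STAT}^2$ for simple linear regression, using the t statistic 3.32938 from the slope test.

```python
ssr, sse = 18934.9348, 13665.5652  # sums of squares (ANOVA table)
n, k = 10, 1                       # sample size, number of independent variables

msr = ssr / k              # mean square due to regression
mse = sse / (n - k - 1)    # mean square error
f_stat = msr / mse         # F with k = 1 and n - k - 1 = 8 degrees of freedom

# In simple linear regression the F test and the two-tailed t test on the
# slope are equivalent: F_STAT equals t_STAT squared.
t_stat = 3.32938
print(f"F_STAT = {f_stat:.4f}, t_STAT^2 = {t_stat ** 2:.4f}")
# F_STAT = 11.0848, t_STAT^2 = 11.0848
```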
Strategies for avoiding the pitfalls of regression analysis
End of Chapter 13