JTW125 - Webex 5 PDF

JTW 125
Business Statistics
Webex 5
CHAPTER 13: Simple Linear Regression
29 March 2019, 9am – 10am

Learning Objectives
◼ Use regression analysis to predict the value of a dependent variable based on the value of an independent variable
◼ Understand the meaning of the regression coefficients b0 and b1
◼ Evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
◼ Make inferences about the slope and the correlation coefficient
◼ Estimate mean values and predict individual values
Correlation vs. Regression
◼ A scatter plot can be used to show the relationship between two variables
◼ Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  ◼ Correlation only shows the strength of the relationship
  ◼ No causal effect is implied by correlation
◼ Scatter plots were first discussed in Chapter 2
◼ Correlation was first discussed in Chapter 3
Types of relationships
[Figure: scatter plots of Y against X in five panels: linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Introduction to regression analysis
◼ Regression analysis is used to:
  ◼ Predict the value of a dependent variable based on the value of at least one independent variable
  ◼ Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
Simple Linear Regression
◼ There is only one independent variable, X
◼ The relationship between X and Y is described by a linear function
◼ Changes in Y are assumed to be related to changes in X
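The population regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
Yi = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
Xi = independent variable
εi = random error term
β0 + β1Xi is the linear component and εi is the random error component.
[Figure: the line Yi = β0 + β1Xi + εi with intercept β0 and slope β1; for a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y for that Xi]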
Simple Linear Regression Equation (prediction line)
The simple linear regression equation provides an estimate of the population regression line:
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
Ŷi = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Y and Ŷ:
$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \left(Y_i - (b_0 + b_1 X_i)\right)^2$$
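For reference, the minimization has a standard closed-form solution, obtained by setting the partial derivatives of the sum of squares with respect to b0 and b1 to zero. The slide does not state it; the symbol SSXY is notation assumed here, while SSX is the same sum of squares that appears later in the standard error of the slope.

```latex
b_1 = \frac{SSXY}{SSX}
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```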
Interpretation of the slope and the intercept
◼ b0 is the estimated mean value of Y when the value of X is zero
◼ b1 is the estimated change in the mean value of Y as a result of a one-unit increase in X
Example
◼ A real estate agent wishes to examine the relationship between the selling price of a house and its size (measured in square feet)
◼ A random sample of 10 houses is selected
◼ Dependent variable (Y) = house price in $1000s
◼ Independent variable (X) = square feet
Data
House price in $1000s (Y)    Square feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
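As a cross-check on the spreadsheet workflow, the least squares fit for these ten houses can be sketched in plain Python. The function name `least_squares` and the variable names are illustrative assumptions; the closed form used (b1 = SSXY / SSX, b0 = Ȳ − b1·X̄) is the standard least squares solution rather than something the slides spell out.

```python
# House data from the slide: price in $1000s (Y) and size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared differences between y and b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares of X around their means.
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(sqft, prices)
print(f"house price = {b0:.5f} + {b1:.5f} (square feet)")
```

The printed coefficients should agree with the Intercept and Square Feet rows of the Excel output to the displayed precision.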
Scatter plot
[Figure: House price model, scatter plot of house price ($1000s, axis 0 to 450) against square feet (axis 0 to 3000)]
Using Excel
1. Choose Data
2. Choose Data Analysis
3. Choose Regression
Enter the Y range, the X range, and the desired options

Excel output
The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet    0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
Graphical representation
[Figure: House price model, scatter plot and prediction line of house price ($1000s) against square feet, annotated with slope = 0.10977 and intercept = 98.248]
house price = 98.24833 + 0.10977 (square feet)

Interpretation of b0
◼ b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is within the range of observed X values)
◼ Because a house cannot have a size of 0 square feet, b0 has no practical application here
Interpretation of b1
◼ b1 estimates the change in the mean value of Y as a result of a one-unit increase in X
◼ Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977 ($1000) = $109.77, on average, for each additional one square foot of size
Making predictions
Predict the price of a house with 2000 square feet:
house price = 98.25 + 0.1098 (square feet)
= 98.25 + 0.1098 (2000)
= 317.85
The predicted price of a house with 2000 square feet is 317.85 ($1000s) = $317,850
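The same arithmetic can be written as a small function. This is only a sketch: `predict_price` is a hypothetical helper name, and the default coefficients are the rounded values used on this slide (98.25 and 0.1098), not the full-precision Excel coefficients. Note that 2000 square feet lies inside the observed range of X in the sample (1100 to 2450).

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s from the rounded prediction line on the slide."""
    return b0 + b1 * square_feet


predicted = predict_price(2000)
# Report in $1000s and in dollars.
print(f"{predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
```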
Measures of variation
◼ SST = total sum of squares (Total Variation)
  ◼ Measures the variation of the Yi values around their mean Ȳ
◼ SSR = regression sum of squares (Explained Variation)
  ◼ Variation attributable to the relationship between X and Y
◼ SSE = error sum of squares (Unexplained Variation)
  ◼ Variation in Y attributable to factors other than X
Measures of variation
◼ Total variation is made up of two parts:
$$SST = SSR + SSE$$
(total sum of squares = regression sum of squares + error sum of squares)
$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$
where:
Ȳ = mean value of the dependent variable
Yi = observed value of the dependent variable
Ŷi = predicted value of Y for the given Xi value
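To make the decomposition concrete, the following sketch computes the three sums of squares for the house price example and checks that they add up. Variable names are illustrative assumptions; the coefficients are recomputed at full precision inside the block, because the identity SST = SSR + SSE holds exactly only for the least squares line, not for rounded coefficients.

```python
# House data from the example: price in $1000s (Y) and size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n

# Least squares slope and intercept (standard closed form).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in prices)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(prices, y_hat))  # unexplained variation

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

The three values should match the Total, Regression and Residual rows of the SS column in the Excel ANOVA table.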
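Measures of variation
[Figure: the prediction line and the horizontal line at Ȳ; for an observation (Xi, Yi), the vertical distances marked are SST = Σ(Yi − Ȳ)², SSE = Σ(Yi − Ŷi)² and SSR = Σ(Ŷi − Ȳ)²]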
Coefficient of determination, r²
◼ The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
◼ Also known as r-squared and denoted by r²
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
Note: $0 \le r^2 \le 1$
Examples of r² values
◼ r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X
◼ 0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X
◼ r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X)
[Figure: scatter plots of Y against X illustrating each case]
r² in Excel
$$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$
58.08% of the variation in house prices is explained by variation in square feet
(In the Excel output, r² is reported as R Square = 0.58082; SSR and SST are the Regression and Total entries of the SS column in the ANOVA table.)
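A short sketch of the same calculation, using the SSR and SST values reported in the Excel ANOVA table. Taking the square root to recover Multiple R is an added step not shown on the slide; for simple linear regression, Multiple R is the absolute value of the correlation coefficient.

```python
import math

# Values copied from the Excel ANOVA table (SS column).
ssr = 18934.9348  # Regression
sst = 32600.5000  # Total

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)

print(f"r^2 = {r_squared:.5f}  ({r_squared:.2%} of the variation in price explained)")
print(f"Multiple R = {multiple_r:.5f}")
```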
Standard error of the estimate
◼ The standard deviation of the variation of observations around the regression line is given by:
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
where:
SSE = error sum of squares
n = sample size
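Plugging the house price example into this formula gives the value Excel labels Standard Error. A minimal sketch, with SSE and n taken from the Excel output and illustrative variable names:

```python
import math

sse = 13665.5652  # error sum of squares (Residual row, SS column)
n = 10            # sample size (Observations)

# Standard error of the estimate: square root of SSE over (n - 2).
s_yx = math.sqrt(sse / (n - 2))
print(f"S_YX = {s_yx:.5f}")
```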
Simple Linear Regression Example: Standard Error of the Estimate in Excel
S_YX = 41.33032
(Reported as Standard Error in the Regression Statistics block of the Excel output.)
Comparing standard errors
SYX is a measure of the variation of observed Y values from the regression line
[Figure: two scatter plots of Y against X around a line, labelled "small SYX" and "large SYX"]
The magnitude of SYX should always be judged relative to the size of the Y values in the sample data
e.g., SYX = $41.33K is moderately small relative to house prices in the $200K - $400K range
Regression assumptions
L.I.N.E
◼ Linearity
◼ The relationship between X and Y is linear
◼ Independence of Errors
◼ Error values are statistically independent
◼ Normality of Error
◼ Error values are normally distributed for any given
value of X
◼ Equal Variance (also called homoscedasticity)
◼ The probability distribution of the errors has constant
variance
Residual Analysis
$$e_i = Y_i - \hat{Y}_i$$
◼ The residual for observation i, ei, is the difference between its observed value and its predicted value
◼ Check the regression assumptions by examining the residuals
  ◼ Examine for linearity assumption
  ◼ Evaluate independence assumption
  ◼ Evaluate normal distribution assumption
  ◼ Examine for constant variance for all levels of X (homoscedasticity)
◼ Graphical analysis of residuals
  ◼ Can plot residuals vs. X
Errors vs. residuals
◼ The error (or disturbance) of an observed value is the deviation of the observed value from the (unobservable) true value of a quantity of interest (for example, a population mean), and the residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean).
◼ Source: Wikipedia
Residual Analysis for Linearity
[Figure: plots of Y against x and of residuals against x for two cases, labelled "Not Linear" and "✓ Linear"]
Residual Analysis for Independence
[Figure: plots of residuals against X, labelled "Not Independent" and "✓ Independent"]
Checking for Normality
◼ Examine the Stem-and-Leaf Display of the Residuals
◼ Examine the Boxplot of the Residuals
◼ Examine the Histogram of the Residuals
◼ Construct a Normal Probability Plot of the Residuals
Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line
[Figure: normal probability plot, Percent (0 to 100) against Residual (-3 to 3)]
Residual Analysis for Equal Variance
[Figure: plots of Y against x and of residuals against x for two cases, labelled "Non-constant variance" and "✓ Constant variance"]

Simple Linear Regression Example: Excel Residual Output
RESIDUAL OUTPUT
Observation    Predicted House Price    Residuals
1              251.92316                -6.923162
2              273.87671                38.12329
3              284.85348                -5.853484
4              304.06284                3.937162
5              218.99284                -19.99284
6              268.38832                -49.38832
7              356.20251                48.79749
8              367.17929                -43.17929
9              254.6674                 64.33264
10             284.85348                -29.85348
[Figure: House Price Model Residual Plot, residuals (-60 to 80) against Square Feet (0 to 3000)]
Does not appear to violate any regression assumptions
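The residual table can be reproduced from the raw data with a few lines of Python. This is a sketch under the assumption that Excel's predicted values come from the full-precision least squares coefficients, which are recomputed here; the variable names are illustrative.

```python
# House data from the example: price in $1000s (Y) and size in square feet (X).
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [y - yh for y, yh in zip(prices, predicted)]

for i, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>2}  {yh:10.5f}  {e:10.5f}")
```

Plotting `residuals` against `sqft` would give the residual plot described on the slide; with an intercept in the model, least squares residuals always sum to zero up to rounding.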
Measuring autocorrelation: The Durbin-Watson Statistic
◼ Used when data are collected over time to detect the presence of autocorrelation
◼ Autocorrelation exists if the residuals in one time period are related to the residuals in another time period
Autocorrelation
◼ Autocorrelation is the correlation of the errors (residuals) over time
[Figure: Time (t) Residual Plot, residuals (-15 to 15) against Time (t) (0 to 8)]
◼ Here, the residuals show a cyclic pattern, not a random one. Cyclical patterns are a sign of positive autocorrelation
◼ Violates the regression assumption that residuals are random and independent
◼ The Durbin-Watson statistic is used to test for autocorrelation
H0: residuals are not correlated
H1: positive autocorrelation is present
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
▪ The possible range is 0 ≤ D ≤ 4
▪ D should be close to 2 if H0 is true
▪ D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation
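The statistic is simple to compute directly from a list of time-ordered residuals. In the sketch below, `durbin_watson` is a hypothetical helper name and both residual series are made-up illustrations, not data from the slides: one drifts slowly (neighbouring residuals are alike) and one flips sign every period.

```python
def durbin_watson(residuals):
    """Durbin-Watson D: sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical time-ordered residuals, invented for illustration only.
drifting = [5, 6, 7, 6, 5, -5, -6, -7, -6, -5]      # long runs of the same sign
alternating = [5, -5, 5, -5, 5, -5, 5, -5, 5, -5]   # sign flips every period

d_drifting = durbin_watson(drifting)
d_alternating = durbin_watson(alternating)
print(f"drifting:    D = {d_drifting:.3f}")     # well below 2
print(f"alternating: D = {d_alternating:.3f}")  # well above 2
```

The drifting series gives D well below 2, in line with positive autocorrelation, and the alternating series gives D well above 2, in line with negative autocorrelation.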
Measuring autocorrelation:
◼ A rule of thumb is that test statistic values in the range of 1.5 to 2.5 are relatively normal. Any value outside this range could be a cause for concern. The Durbin–Watson statistic, while displayed by many regression analysis programs, is not applicable in certain situations. For instance, when lagged dependent variables are included in the explanatory variables, then it is inappropriate to use this test. (Source: Investopedia)
Inferences About the Slope
◼ The standard error of the regression slope coefficient (b1) is estimated by
$$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$
where:
S_b1 = estimate of the standard error of the slope
$S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate
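For the house price example, SSX can be computed from the square footage column and combined with the standard error of the estimate reported by Excel. A minimal sketch with illustrative variable names:

```python
import math

# Square footage (X) of the ten sampled houses.
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
s_yx = 41.33032  # standard error of the estimate, from the Excel output

x_bar = sum(sqft) / len(sqft)
ssx = sum((x - x_bar) ** 2 for x in sqft)  # sum of squared deviations of X

s_b1 = s_yx / math.sqrt(ssx)
print(f"SSX = {ssx:.0f}, S_b1 = {s_b1:.5f}")
```

The result should agree with the Standard Error entry in the Square Feet row of the Excel coefficient table.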
Inferences About the Slope: t Test
◼ t test for a population slope
  ◼ Is there a linear relationship between X and Y?
◼ Null and alternative hypotheses
  ◼ H0: β1 = 0 (no linear relationship)
  ◼ H1: β1 ≠ 0 (linear relationship does exist)
◼ Test statistic
$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_b1 = standard error of the slope
t Test Example
Estimated Regression Equation:
house price = 98.25 + 0.1098 (sq.ft.)
The slope of this model is 0.1098
Is there a relationship between the square footage of the house and its sales price?

House Price in $1000s (y)    Square Feet (x)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
From Minitab output:
Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010
In both outputs, the Square Feet row gives b1 = 0.10977 (Coefficients / Coef) and S_b1 = 0.03297 (Standard Error / SE Coef).
$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
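t Test Example
H0: β1 = 0
H1: β1 ≠ 0
Test statistic: tSTAT = 3.329
d.f. = 10 − 2 = 8
α/2 = .025 in each tail; critical values −tα/2 = −2.3060 and tα/2 = 2.3060
[Figure: t distribution with "Reject H0" regions below −2.3060 and above 2.3060 and a "Do not reject H0" region between them; tSTAT = 3.329 lies in the upper rejection region]
Decision: Reject H0
There is sufficient evidence that square footage affects house price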
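The critical value approach can be sketched as follows, using the rounded b1 and S_b1 from the Excel output and the critical value 2.3060 quoted on the slide. The variable names are illustrative. The last two lines go one step beyond the slide and assume the standard interval b1 ± tα/2 · S_b1, which appears to be what Excel reports in its Lower 95% and Upper 95% columns.

```python
b1 = 0.10977     # estimated slope (Excel, Square Feet row)
s_b1 = 0.03297   # standard error of the slope
beta1_h0 = 0.0   # hypothesized slope under H0
n = 10           # sample size
t_crit = 2.3060  # two-tail critical value for alpha = .05 and 8 d.f., as quoted on the slide

df = n - 2
t_stat = (b1 - beta1_h0) / s_b1
reject_h0 = abs(t_stat) > t_crit

print(f"d.f. = {df}, t_STAT = {t_stat:.3f}, reject H0: {reject_h0}")

# Assumed standard 95% confidence interval for the slope.
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(f"95% interval for the slope: ({lower:.5f}, {upper:.5f})")
```

Because the coefficients are rounded, t_STAT may differ from Excel's 3.32938 in the last decimal place.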
t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
               Coefficients    Standard Error    t Stat     P-value
Intercept      98.24833        58.03348          1.69296    0.12892
Square Feet    0.10977         0.03297           3.32938    0.01039
From Minitab output:
Predictor      Coef       SE Coef    T       P
Constant       98.25      58.03      1.69    0.129
Square Feet    0.10977    0.03297    3.33    0.010
The p-value is the P-value (Excel) or P (Minitab) entry in the Square Feet row.
Decision: Reject H0, since p-value < α
There is sufficient evidence that square footage affects house price.
F test for significance
◼ F test statistic:
$$F_{STAT} = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1}$$
FSTAT follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
F test for significance
Excel output:
$$F_{STAT} = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
with 1 and 8 degrees of freedom. The p-value for the F test is the Significance F entry, 0.01039.

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000
F test for significance
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical value: F.05 = 5.32
Test statistic:
$$F_{STAT} = \frac{MSR}{MSE} = 11.08$$
[Figure: F distribution with the α = .05 "Reject H0" region to the right of F.05 = 5.32 and the "Do not reject H0" region to the left]
Decision: Reject H0
Conclusion: There is significant evidence that house size affects selling price at the α = 0.05 level of significance
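The F test arithmetic for the house price example can be sketched as below, using SSR and SSE from the Excel ANOVA table and the critical value 5.32 quoted on the slide. Variable names are illustrative. The final comparison relies on a standard fact not stated in the slides: with a single independent variable, FSTAT equals the square of the slope's t statistic.

```python
ssr = 18934.9348  # regression sum of squares
sse = 13665.5652  # error sum of squares
n = 10            # sample size
k = 1             # number of independent variables
f_crit = 5.32     # F critical value for alpha = .05 with 1 and 8 d.f., as quoted on the slide

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
reject_h0 = f_stat > f_crit

print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F_STAT = {f_stat:.4f}, reject H0: {reject_h0}")

# In simple linear regression the F statistic matches t squared.
t_stat = 3.32938  # t Stat for Square Feet in the Excel output
print(f"t_STAT squared = {t_stat ** 2:.4f}")
```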
Pitfalls of regression analysis
◼ Lacking an awareness of the assumptions underlying least squares regression
◼ Not knowing how to evaluate the assumptions
◼ Not knowing the alternatives to least squares regression if a particular assumption is violated
◼ Using a regression model without knowledge of the subject matter
◼ Extrapolating outside the relevant range
Strategies for avoiding the pitfalls of regression analysis
◼ Start with a scatter plot of X vs. Y to observe possible relationships
◼ Perform residual analysis to check the assumptions
  ◼ Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
  ◼ Use a histogram, stem-and-leaf display, boxplot, or normal probability plot of the residuals to uncover possible non-normality
Strategies for avoiding the pitfalls of regression analysis
◼ If any assumption is violated, use alternative methods or models
◼ If there is no evidence of assumption violation, test the significance of the regression coefficients and construct confidence intervals
◼ Avoid making predictions outside the relevant range
End of Chapter 13
Thank you.
