
Simple Linear Regression (SLR)

presented by: Dudi Barmana, M.Si.
Agenda
SLR equation & the assumptions underlying the model
Estimation (point & interval) of the model parameters
Testing the model parameters with the t-test and the F-test (ANOVA), and their interpretation
Correlation in SLR: the linear correlation coefficient ($\rho$)
Measures for assessing model performance/goodness of fit
Prediction using the model
Quote of the Day
If people feel that they have never made a mistake in their lives, then they have never actually tried anything new in their lives.
---Einstein---
SLR Equation and the Assumptions Underlying the Model
Simple Linear Regression Model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
$x_i$: regressor variable
$y_i$: response variable
$\beta_0$: the intercept, unknown
$\beta_1$: the slope, unknown
$\varepsilon_i$: error with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$ (unknown)
The errors are uncorrelated, so $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
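Given $x$,
$$E(y \mid x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x$$
$$\mathrm{Var}(y \mid x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$$
The responses are also uncorrelated.
Regression coefficients: $\beta_0$, $\beta_1$
$\beta_1$: the change in $E(y \mid x)$ produced by a unit change in $x$
$\beta_0$: $E(y \mid x = 0)$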
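To make $E(y \mid x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y \mid x) = \sigma^2$ concrete, here is a minimal Python sketch (standard library only) that draws many responses at one fixed $x$. The parameter values, the helper name `simulate_responses`, and the use of normal errors are illustrative assumptions; the model itself only requires zero-mean, constant-variance, uncorrelated errors.

```python
import random
import statistics

def simulate_responses(x, beta0, beta1, sigma, n, seed=0):
    """Draw n responses y = beta0 + beta1*x + eps at a fixed x.

    eps is drawn here as N(0, sigma^2) (an assumption for illustration);
    the SLR model only needs E(eps) = 0 and Var(eps) = sigma^2.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]

# Hypothetical parameter values, chosen only for this illustration.
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = 4.0
ys = simulate_responses(x, beta0, beta1, sigma, n=100_000)

print("sample mean:", statistics.fmean(ys), "vs E(y|x) =", beta0 + beta1 * x)
print("sample variance:", statistics.variance(ys), "vs sigma^2 =", sigma ** 2)
```

With this many draws the sample mean should sit close to 3.0 and the sample variance close to 4.0.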


Estimation of the Model Parameters (Point & Interval)
Point Estimation
Least-Squares Estimation of the Parameters
Estimation of $\beta_0$ and $\beta_1$
$n$ pairs: $(y_i, x_i)$, $i = 1, \ldots, n$
Method of least squares: minimize
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$$
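Least-squares normal equations (obtained by setting $\partial S/\partial \beta_0$ and $\partial S/\partial \beta_1$ equal to zero):
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$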
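The least-squares estimators:
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$
where $\bar x = \frac{1}{n}\sum_{i} x_i$, $\bar y = \frac{1}{n}\sum_{i} y_i$, and
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad S_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar x) = \sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x)$$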
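A minimal sketch of these estimators in plain Python, computing $S_{xx}$, $S_{xy}$, $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ straight from their definitions. The function name `least_squares` and the five-point data set are made up for illustration; they do not come from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1, sxx, sxy) for the simple linear regression of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1, sxx, sxy

# Hypothetical toy data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, sxx, sxy = least_squares(x, y)
print(b0, b1)  # 2.2 and 0.6, up to floating-point rounding
```

Here $S_{xx} = 10$ and $S_{xy} = 6$, so the estimates satisfy both normal equations.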
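The fitted simple regression model:
$$\hat y = \hat\beta_0 + \hat\beta_1 x$$
It gives a point estimate of the mean of $y$ for a particular $x$.
Residual:
$$e_i = y_i - \hat y_i, \quad i = 1, \ldots, n$$
Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions.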
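A short sketch of the fitted values $\hat y_i$ and residuals $e_i = y_i - \hat y_i$, again on hypothetical toy data; `fit_and_residuals` is a name introduced here for illustration.

```python
def fit_and_residuals(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, fitted, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]                 # y_hat_i
    residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - y_hat_i
    return b0, b1, fitted, residuals

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
b0, b1, fitted, residuals = fit_and_residuals(x, y)
for xi, yi, fi, ei in zip(x, y, fitted, residuals):
    print(f"x={xi}  y={yi}  y_hat={fi:.2f}  e={ei:+.2f}")
```

Plotting these residuals against $x$ or $\hat y$ is the usual first check of the model assumptions.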
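Properties of the Least-Squares Estimators and the Fitted Regression Model
$\hat\beta_0$ and $\hat\beta_1$ are linear combinations of the $y_i$:
$$\hat\beta_1 = \sum_{i=1}^{n} c_i y_i, \quad c_i = \frac{x_i - \bar x}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
$\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators:
$$E(\hat\beta_0) = \beta_0 \quad \text{and} \quad E(\hat\beta_1) = \beta_1$$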
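$$E(\hat\beta_1) = E\left(\sum_{i} c_i y_i\right) = \sum_{i} c_i E(y_i) = \sum_{i} c_i (\beta_0 + \beta_1 x_i) = \beta_1$$
because $\sum_{i} c_i = 0$ and $\sum_{i} c_i x_i = 1$.
$$E(\hat\beta_0) = E(\bar y - \hat\beta_1 \bar x) = \beta_0 + \beta_1 \bar x - \beta_1 \bar x = \beta_0$$
$$\mathrm{Var}(\hat\beta_1) = \mathrm{Var}\left(\sum_{i} c_i y_i\right) = \sum_{i} c_i^2 \,\mathrm{Var}(y_i) = \sigma^2 \frac{\sum_{i} (x_i - \bar x)^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$$
(the cross terms vanish because the $y_i$ are uncorrelated)
$$\mathrm{Var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right)$$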
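A Monte Carlo sketch of unbiasedness and of the two variance formulas: keep the $x$ values fixed, regenerate the errors many times, and compare the spread of $\hat\beta_0$, $\hat\beta_1$ with $\sigma^2(1/n + \bar x^2/S_{xx})$ and $\sigma^2/S_{xx}$. The true parameter values, the design $x = 1, \ldots, 10$, and normal errors are all assumptions made for the illustration.

```python
import random
import statistics

def slope_intercept(x, y):
    """Least-squares (b0, b1) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical "true" model and fixed design, for illustration only.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [float(i) for i in range(1, 11)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

rng = random.Random(42)
b0s, b1s = [], []
for _ in range(20_000):
    # Fresh errors each replication; the x values stay fixed.
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    b0, b1 = slope_intercept(x, y)
    b0s.append(b0)
    b1s.append(b1)

print("mean b1:", statistics.fmean(b1s), "beta1 =", beta1)
print("var b1:", statistics.variance(b1s), "sigma^2/Sxx =", sigma ** 2 / sxx)
print("mean b0:", statistics.fmean(b0s), "beta0 =", beta0)
print("var b0:", statistics.variance(b0s),
      "sigma^2*(1/n + xbar^2/Sxx) =", sigma ** 2 * (1 / n + x_bar ** 2 / sxx))
```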
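Some useful properties:
The sum of the residuals in any regression model that contains an intercept $\beta_0$ is always 0, i.e.
$$\sum_{i} e_i = \sum_{i} (y_i - \hat y_i) = \sum_{i} \left( y_i - \bar y - \hat\beta_1 (x_i - \bar x) \right) = 0$$
Equivalently, $\sum_{i} y_i = \sum_{i} \hat y_i$.
The regression line always passes through the centroid of the data, $(\bar x, \bar y)$.
$$\sum_{i} x_i e_i = \sum_{i} x_i \left( y_i - \bar y - \hat\beta_1 (x_i - \bar x) \right) = 0$$
$$\sum_{i} \hat y_i e_i = \sum_{i} \left( \bar y + \hat\beta_1 (x_i - \bar x) \right) \left( y_i - \bar y - \hat\beta_1 (x_i - \bar x) \right) = 0$$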
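These identities are easy to check numerically. The sketch below fits a line to a small made-up data set (hypothetical values, not from the slides) and prints each quantity, which should be zero up to floating-point rounding.

```python
def fit(x, y):
    """Least-squares (b0, b1) for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical toy data with non-integer values.
x = [0.5, 1.7, 2.2, 3.9, 4.4, 6.1]
y = [1.2, 2.9, 3.1, 5.6, 5.2, 7.8]
b0, b1 = fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
e = [yi - fi for yi, fi in zip(y, fitted)]
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

print("sum e_i              :", sum(e))
print("sum y_i - sum y_hat_i:", sum(y) - sum(fitted))
print("line at x_bar - y_bar:", b0 + b1 * x_bar - y_bar)
print("sum x_i e_i          :", sum(xi * ei for xi, ei in zip(x, e)))
print("sum y_hat_i e_i      :", sum(fi * ei for fi, ei in zip(fitted, e)))
```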
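Estimator of $\sigma^2$
Residual sum of squares:
$$SS_{Res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} \left[ (y_i - \bar y) - \hat\beta_1 (x_i - \bar x) \right]^2 = \sum_{i=1}^{n} (y_i - \bar y)^2 - \hat\beta_1 S_{xy} = SS_T - \hat\beta_1 S_{xy}$$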
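Since $E(SS_{Res}) = (n-2)\sigma^2$, the unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = MS_{Res} = \frac{SS_{Res}}{n-2}$$
$MS_{Res}$ (also written $MS_E$, with $SS_E = SS_{Res}$) is called the residual mean square.
This estimate is model-dependent: because it is computed from the residuals, a misspecified model form or violated error assumptions can seriously distort it as an estimate of $\sigma^2$.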
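A minimal sketch computing $SS_{Res}$ both from its definition and from the shortcut $SS_T - \hat\beta_1 S_{xy}$, then $MS_{Res} = SS_{Res}/(n-2)$. The toy data and the name `residual_mean_square` are hypothetical.

```python
def residual_mean_square(x, y):
    """Return (SS_Res, SS_T - b1*S_xy, MS_Res) for simple linear regression."""
    n = len(x)
    if n <= 2:
        raise ValueError("need n > 2 to estimate sigma^2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Direct definition: sum of squared residuals e_i = y_i - y_hat_i
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Shortcut form: SS_Res = SS_T - b1 * S_xy
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_res_shortcut = ss_t - b1 * sxy
    return ss_res, ss_res_shortcut, ss_res / (n - 2)

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
ss_res, ss_res_shortcut, ms_res = residual_mean_square(x, y)
print(ss_res, ss_res_shortcut, ms_res)  # about 2.4, 2.4 and 0.8
```

The divisor is $n - 2$ because two parameters, $\beta_0$ and $\beta_1$, were estimated before the residuals could be formed.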
Interval Estimation
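Assume that the $\varepsilon_i$ are normally and independently distributed. Then
$$\frac{\hat\beta_0 - \beta_0}{se(\hat\beta_0)} \sim t_{n-2}, \qquad \frac{\hat\beta_1 - \beta_1}{se(\hat\beta_1)} \sim t_{n-2}$$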
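Parameter $\beta_0$
$100(1-\alpha)\%$ interval estimate for $\beta_0$:
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$
$$\hat\beta_0 \pm t_{\alpha/2,\, n-2}\; s \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar X)^2}}$$
$$se(\hat\beta_0) = s \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar X)^2}} = s \sqrt{\frac{1}{n} + \frac{\bar X^2}{S_{xx}}}$$
where $s = \sqrt{MS_{Res}}$.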
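Parameter $\beta_1$
$100(1-\alpha)\%$ interval estimate for $\beta_1$:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$$
$$se(\hat\beta_1) = \frac{s}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^2}}$$
$$\hat\beta_1 \pm t_{\alpha/2,\, n-2} \frac{s}{\sqrt{\sum_{i=1}^{n} (X_i - \bar X)^2}}$$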
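A sketch of both coefficient intervals. To stay within the standard library, the critical value $t_{\alpha/2,\,n-2}$ is passed in rather than computed; the value 3.182 for 3 degrees of freedom is taken from a t table (with SciPy installed, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` gives it). The data and the name `coefficient_cis` are hypothetical.

```python
import math

def coefficient_cis(x, y, t_crit):
    """100(1-alpha)% CIs for beta0 and beta1, given t_crit = t_{alpha/2, n-2}."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))                         # s = sqrt(MS_Res)
    se_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    se_b1 = s / math.sqrt(sxx)
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))

x = [1, 2, 3, 4, 5]  # hypothetical toy data, n = 5
y = [2, 4, 5, 4, 5]
# t_{0.025, 3} = 3.182 (two-sided 95%, 3 degrees of freedom), from a t table.
ci_b0, ci_b1 = coefficient_cis(x, y, t_crit=3.182)
print("95% CI for beta0:", ci_b0)
print("95% CI for beta1:", ci_b1)
```

Both intervals here contain 0, which with only five points is not surprising.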
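Confidence interval for $\sigma^2$:
$$\frac{(n-2)\,MS_{Res}}{\chi^2_{\alpha/2,\, n-2}} \le \sigma^2 \le \frac{(n-2)\,MS_{Res}}{\chi^2_{1-\alpha/2,\, n-2}}$$
where $\chi^2_{\alpha/2,\, n-2}$ denotes the upper $\alpha/2$ percentage point of the chi-square distribution with $n-2$ degrees of freedom.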
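A sketch of the interval for $\sigma^2$, using $(n-2)MS_{Res} = SS_{Res}$. The chi-square points are passed in from a table (3 degrees of freedom: upper 2.5% point 9.348, upper 97.5% point 0.216); with SciPy they would be `scipy.stats.chi2.ppf(1 - alpha / 2, n - 2)` and `scipy.stats.chi2.ppf(alpha / 2, n - 2)`. The value $SS_{Res} = 2.4$ comes from a hypothetical toy fit with $n = 5$.

```python
def sigma2_ci(ss_res, chi2_upper, chi2_lower):
    """CI for sigma^2 given SS_Res = (n-2) * MS_Res.

    chi2_upper: upper alpha/2 point, chi^2_{alpha/2, n-2}
    chi2_lower: upper 1-alpha/2 point, chi^2_{1-alpha/2, n-2}
    (both assumed to come from a chi-square table)
    """
    return ss_res / chi2_upper, ss_res / chi2_lower

# Hypothetical toy fit: SS_Res = 2.4 with n - 2 = 3 degrees of freedom.
lo, hi = sigma2_ci(2.4, chi2_upper=9.348, chi2_lower=0.216)
print(f"95% CI for sigma^2: ({lo:.3f}, {hi:.3f})")
```

The interval is very wide because only three degrees of freedom are available.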



Testing the Model Parameters with the t-Test and the F-Test (ANOVA), and Their Interpretation
Hypothesis Testing on the Slope and Intercept
Assume the $\varepsilon_i$ are normally distributed:
$$y_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$$
Use of t-Tests
Test on slope:
$H_0: \beta_1 = \beta_{10}$ vs. $H_1: \beta_1 \neq \beta_{10}$
$$\hat\beta_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$$
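If $\sigma^2$ is known, under the null hypothesis,
$$Z_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\sigma^2 / S_{xx}}} \sim N(0, 1)$$
$(n-2)\,MS_E/\sigma^2$ follows a $\chi^2_{n-2}$ distribution.
If $\sigma^2$ is unknown,
$$t_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{MS_E / S_{xx}}} = \frac{\hat\beta_1 - \beta_{10}}{se(\hat\beta_1)} \sim t_{n-2}$$
Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$.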
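A minimal sketch of the slope t test with $\sigma^2$ unknown. The critical value is supplied by the caller (3.182 is $t_{0.025,3}$ from a t table); the function name `slope_t_test` and the data are hypothetical.

```python
import math

def slope_t_test(x, y, beta10=0.0, t_crit=None):
    """t statistic for H0: beta1 = beta10 with sigma^2 unknown.

    Returns (t0, reject); reject is None unless t_crit = t_{alpha/2, n-2} is given.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ms_e = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    t0 = (b1 - beta10) / math.sqrt(ms_e / sxx)   # (b1 - beta10) / se(b1)
    reject = None if t_crit is None else abs(t0) > t_crit
    return t0, reject

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
t0, reject = slope_t_test(x, y, beta10=0.0, t_crit=3.182)  # t_{0.025,3} = 3.182
print(f"t0 = {t0:.4f}, reject H0: {reject}")
```

Setting `beta10 = 0.0` gives the test of significance of regression.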
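Test on intercept:
$H_0: \beta_0 = \beta_{00}$ vs. $H_1: \beta_0 \neq \beta_{00}$
If $\sigma^2$ is unknown,
$$t_0 = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{MS_E \left( 1/n + \bar x^2 / S_{xx} \right)}} = \frac{\hat\beta_0 - \beta_{00}}{se(\hat\beta_0)} \sim t_{n-2}$$
Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$.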
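The intercept test follows the same pattern, with $se(\hat\beta_0) = \sqrt{MS_E(1/n + \bar x^2/S_{xx})}$. As before, `intercept_t_test`, the toy data, and the table value 3.182 are illustrative assumptions.

```python
import math

def intercept_t_test(x, y, beta00=0.0):
    """t0 = (b0 - beta00) / sqrt(MS_E * (1/n + x_bar^2 / S_xx)) for H0: beta0 = beta00."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ms_e = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b0 = math.sqrt(ms_e * (1 / n + x_bar ** 2 / sxx))
    return (b0 - beta00) / se_b0

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
t0 = intercept_t_test(x, y, beta00=0.0)
t_crit = 3.182  # t_{0.025, 3} from a t table
print(f"t0 = {t0:.4f}; reject H0: {abs(t0) > t_crit}")
```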
Testing Significance of Regression
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
Fail to reject $H_0$: there is no evidence of a linear relationship between $x$ and $y$.
Reject $H_0$: $x$ is of value in explaining the variability in $y$.
$$t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)} \sim t_{n-2}$$
Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$.
The Analysis of Variance (ANOVA)
Use an analysis-of-variance approach to test the significance of regression.


$SS_T$: the corrected sum of squares of the observations. It measures the total variability in the observations.
$SS_{Res}$: the residual or error sum of squares, i.e. the residual variation left unexplained by the regression line.
$SS_R$: the regression or model sum of squares, i.e. the amount of variability in the observations accounted for by the regression line.
$$SS_T = SS_R + SS_{Res}$$
$$\sum_{i} (y_i - \bar y)^2 = \sum_{i} (\hat y_i - \bar y)^2 + \sum_{i} (y_i - \hat y_i)^2$$

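The degrees of freedom:
$df_T = n - 1$
$df_R = 1$
$df_{Res} = n - 2$
$df_T = df_R + df_{Res}$
Testing significance of regression by ANOVA:
$SS_{Res}/\sigma^2 = (n-2)\,MS_{Res}/\sigma^2 \sim \chi^2_{n-2}$
Under $H_0: \beta_1 = 0$, $SS_R/\sigma^2 = MS_R/\sigma^2 \sim \chi^2_1$
$SS_R$ and $SS_{Res}$ are independent, with $SS_R = \hat\beta_1 S_{xy}$.
$$F_0 = \frac{SS_R / 1}{SS_{Res} / (n-2)} = \frac{MS_R}{MS_{Res}} \sim F_{1,\, n-2} \quad \text{(under } H_0\text{)}$$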
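$E(MS_{Res}) = \sigma^2$
$E(MS_R) = \sigma^2 + \beta_1^2 S_{xx}$
Reject $H_0$ if $F_0 > F_{\alpha,\, 1,\, n-2}$ (the test is one-sided, since only large values of $F_0$ indicate $\beta_1 \neq 0$).
If $\beta_1 \neq 0$, $F_0$ follows a noncentral F distribution with 1 and $n-2$ degrees of freedom and noncentrality parameter
$$\lambda = \frac{\beta_1^2 S_{xx}}{\sigma^2}$$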
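A sketch of the ANOVA quantities for the significance test, on hypothetical toy data. The critical value $F_{0.05,1,3} = 10.13$ is taken from an F table; with SciPy, `scipy.stats.f.ppf(1 - alpha, 1, n - 2)` would give it. The function name `anova_slr` is introduced here for illustration.

```python
def anova_slr(x, y):
    """ANOVA for significance of regression; returns a dict of table entries."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                  # df = n - 1
    ss_r = sum((fi - y_bar) ** 2 for fi in fitted)             # df = 1 (equals b1 * S_xy)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # df = n - 2
    ms_r, ms_res = ss_r / 1, ss_res / (n - 2)
    return {"SS_T": ss_t, "SS_R": ss_r, "SS_Res": ss_res,
            "MS_R": ms_r, "MS_Res": ms_res, "F0": ms_r / ms_res}

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
table = anova_slr(x, y)
f_crit = 10.13  # F_{0.05, 1, 3} from an F table (one-sided, alpha = 0.05)
print(table)
print("reject H0: beta1 = 0?", table["F0"] > f_crit)
```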
More About the t Test
$$t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)} = \frac{\hat\beta_1}{\sqrt{MS_{Res} / S_{xx}}}$$
$$t_0^2 = \frac{\hat\beta_1^2 S_{xx}}{MS_{Res}} = \frac{\hat\beta_1 S_{xy}}{MS_{Res}} = \frac{MS_R}{MS_{Res}} = F_0$$
The square of a t random variable with $f$ degrees of freedom is an F random variable with 1 and $f$ degrees of freedom.
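A quick numerical check that $t_0^2 = F_0$ for the test of $H_0: \beta_1 = 0$, on the same kind of hypothetical toy data used above.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

t0 = b1 / math.sqrt(ms_res / sxx)   # t statistic for H0: beta1 = 0
f0 = (b1 * sxy) / ms_res            # F0 = MS_R / MS_Res with MS_R = b1 * S_xy
print(t0 ** 2, f0)                  # the two numbers agree up to rounding
```

Because of this identity, the two-sided t test and the ANOVA F test of $H_0: \beta_1 = 0$ always reach the same decision.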
Correlation in SLR: The Linear Correlation Coefficient ($\rho$)
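The estimator of $\rho$ (the sample correlation coefficient):
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\, SS_T}} = \hat\beta_1 \sqrt{\frac{S_{xx}}{SS_T}}$$
Test on $\rho$:
$H_0: \rho = 0$ vs. $H_1: \rho \neq 0$
$$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$$
Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$. This test is equivalent to the t test of $H_0: \beta_1 = 0$.
$100(1-\alpha)\%$ C.I. for $\rho$ (based on Fisher's transformation $\operatorname{arctanh} r$, which is approximately normal with variance $1/(n-3)$):
$$\tanh\left( \operatorname{arctanh} r - \frac{z_{\alpha/2}}{\sqrt{n-3}} \right) \le \rho \le \tanh\left( \operatorname{arctanh} r + \frac{z_{\alpha/2}}{\sqrt{n-3}} \right)$$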
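A sketch of the correlation estimator, the t statistic for $H_0: \rho = 0$, and the Fisher-$z$ interval, using only the standard library (`statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$). It assumes $n > 3$ and $|r| < 1$; the function name `correlation_inference` and the data are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_inference(x, y, alpha=0.05):
    """Return (r, t0, (lo, hi)): sample correlation, t statistic for H0: rho = 0,
    and an approximate 100(1-alpha)% CI for rho via Fisher's z (needs n > 3, |r| < 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * ss_t)
    t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)                              # Fisher's z transformation
    return r, t0, (math.tanh(z - half), math.tanh(z + half))

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
r, t0, (lo, hi) = correlation_inference(x, y)
print(f"r = {r:.4f}, t0 = {t0:.4f}, 95% CI for rho: ({lo:.3f}, {hi:.3f})")
```

On these data $t_0$ matches $\hat\beta_1/se(\hat\beta_1)$ from the slope test, illustrating the equivalence of the two tests.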

Measures for Assessing Model Performance/Goodness of Fit
Coefficient of Determination
The coefficient of determination:
$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$
It is the proportion of variation explained by the regressor $x$, with $0 \le R^2 \le 1$.
For example, $R^2 = 0.9018$ means that 90.18% of the variability in strength is accounted for by the regression model.
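A minimal sketch of $R^2 = 1 - SS_{Res}/SS_T$ on hypothetical toy data; `r_squared` is a name chosen for this illustration.

```python
def r_squared(x, y):
    """Coefficient of determination R^2 = SS_R / SS_T = 1 - SS_Res / SS_T."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_t

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"R^2 = {r2:.4f} -> {100 * r2:.2f}% of the variability in y is explained by x")
```

In simple linear regression $R^2$ also equals $r^2$, the squared sample correlation.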
$R^2$ can be increased by adding terms to the model.
For a simple regression model,
$$E(R^2) \approx \frac{\beta_1^2 S_{xx}}{\beta_1^2 S_{xx} + \sigma^2}$$
$E(R^2)$ increases (decreases) as $S_{xx}$ increases (decreases).
$R^2$ does not measure the magnitude of the slope of the regression line. A large value of $R^2$ does not imply a steep slope.
$R^2$ does not measure the appropriateness of the linear model.
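Two of these points can be illustrated in a few lines: rescaling $y$ changes the slope but leaves $R^2$ unchanged, and the approximation for $E(R^2)$ grows with $S_{xx}$. The data, the parameter values in `expected_r2`, and both helper names are hypothetical.

```python
def fit_r2(x, y):
    """Return (slope, R^2) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    return b1, b1 * sxy / ss_t        # R^2 = SS_R / SS_T with SS_R = b1 * S_xy

def expected_r2(beta1, sxx, sigma2):
    """Approximation E(R^2) ~ beta1^2 S_xx / (beta1^2 S_xx + sigma^2)."""
    return beta1 ** 2 * sxx / (beta1 ** 2 * sxx + sigma2)

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
steep = fit_r2(x, y)
flat = fit_r2(x, [yi / 100 for yi in y])   # same data, y rescaled by 1/100
print("steep:", steep, "flat:", flat)      # same R^2, slopes differ 100-fold

# Spreading the x values out (larger S_xx) raises the approximate E(R^2).
for sxx in (1.0, 10.0, 100.0):
    print(sxx, expected_r2(beta1=0.5, sxx=sxx, sigma2=1.0))
```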
Prediction Using the Model
Prediction of New Observations
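$$\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$$
is the point estimate of the new value of the response $y_0$ at $x = x_0$.
$\psi = y_0 - \hat y_0$ follows a normal distribution with mean 0 and variance
$$\mathrm{Var}(\psi) = \mathrm{Var}(y_0 - \hat y_0) = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}} \right]$$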
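The $100(1-\alpha)\%$ confidence interval on a future observation at $x_0$ (a prediction interval for the future observation $y_0$):
$$\hat y_0 - t_{\alpha/2,\, n-2} \sqrt{MS_{Res} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}} \right)} \le y_0 \le \hat y_0 + t_{\alpha/2,\, n-2} \sqrt{MS_{Res} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}} \right)}$$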
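A sketch of the point prediction and prediction interval, with $\sigma^2$ replaced by $MS_{Res}$. The critical value 3.182 ($t_{0.025,3}$) is taken from a t table, and `prediction_interval` and the toy data are hypothetical. The loop compares a point at the centre of the data with one outside its range.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Point prediction y0_hat and 100(1-alpha)% prediction interval at x0.

    t_crit must be t_{alpha/2, n-2} (e.g. from a t table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ms_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_hat = b0 + b1 * x0
    half = t_crit * math.sqrt(ms_res * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))
    return y0_hat, (y0_hat - half, y0_hat + half)

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
for x0 in (3.0, 6.0):   # at the centre of the data and outside its range
    y0_hat, (lo, hi) = prediction_interval(x, y, x0, t_crit=3.182)
    print(f"x0={x0}: y0_hat={y0_hat:.2f}, 95% PI=({lo:.2f}, {hi:.2f})")
```

The interval is narrowest at $x_0 = \bar x$ and widens as $x_0$ moves away from it, through the $(x_0 - \bar x)^2/S_{xx}$ term.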



Questions
