
Non-Normality

13
presented by:
Dudi Barmana, M.Si.

Agenda
• Consequences to expect
• Identification/detection (examining residual patterns and formal tests)
• Some alternative solutions: transformation

Today's Quote
People often throw stones in our path. It is up to us whether we turn those stones into a wall or a bridge. ---Chinese book of wisdom---

Consequences to Expect

• If the errors come from a distribution with thicker or heavier tails than the normal, the LS fit may be sensitive to a small subset of the data.
• Heavy-tailed error distributions often generate outliers that pull the LS fit too far in their direction.
• Predictions could be invalid.
• Individual t-tests and the overall model F-test could be misleading.

Identification/Detection

Residual Plot
• Graphical analysis is a very effective way to investigate the adequacy of the fit of a regression model and to check the underlying assumptions.
• Normal probability plot: a simple way to check the normality assumption.
  • A straight plot is indicative of normality.
  • A [figure] shape is indicative of right-skewed errors (e.g., gamma).
  • A [figure] shape is indicative of a symmetric, short-tailed distribution.
  • A [figure] shape is indicative of a symmetric, long-tailed distribution.
• Ranked residuals: $e_{[1]} < \dots < e_{[n]}$.
• Plot $e_{[i]}$ against $P_i = (i - 1/2)/n$.
• Sometimes plot $e_{[i]}$ against $\Phi^{-1}[(i - 1/2)/n]$.
• The plot is nearly a straight line for large samples ($n > 32$) if the $e_{[i]}$ are normal.
• Small samples ($n \le 16$) may deviate from a straight line even if the $e_{[i]}$ are normal.
• Usually about 20 points are required for a normal probability plot to be interpretable.
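A minimal sketch of the plotting coordinates described above, using only the Python standard library: the residuals are ranked and each $e_{[i]}$ is paired with $\Phi^{-1}[(i - 1/2)/n]$ via `statistics.NormalDist().inv_cdf`. The correlation between the two coordinates is an added heuristic for "how straight" the plot is (it is not part of the slides), and the residuals are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Return (normal quantile, ordered residual) pairs for a normal probability plot.

    Ranks the residuals e[1] < ... < e[n] and pairs e[i] with Phi^{-1}[(i - 1/2)/n].
    """
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(ordered, start=1)]


def straightness(points):
    """Pearson correlation of the plotted pairs (heuristic: near 1 means a straight plot)."""
    n = len(points)
    mx = sum(q for q, _ in points) / n
    my = sum(e for _, e in points) / n
    sxy = sum((q - mx) * (e - my) for q, e in points)
    sxx = sum((q - mx) ** 2 for q, _ in points)
    syy = sum((e - my) ** 2 for _, e in points)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical residuals, for illustration only.
    residuals = [0.3, -1.2, 0.8, -0.1, 1.9, -0.6, 0.2, -0.4]
    pts = normal_probability_points(residuals)
    for q, e in pts:
        print(f"{q:7.3f}  {e:6.2f}")
    print("correlation:", round(straightness(pts), 4))
```

Plotting the pairs (e.g., with any charting library) gives the normal probability plot; with fewer than about 20 points, a low correlation alone should not be over-interpreted.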

Kolmogorov-Smirnov Test
Let

$$F(x) = P(X_i \le x)$$

be the cdf for the distribution.

In the uniform(0,1) case:

$$F(x) = x, \quad 0 \le x \le 1$$

Compare this to the empirical distribution function:

$$F_n(x) = \frac{1}{n}\left(\#\, X_i \text{ in the sample} \le x\right)$$
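The empirical distribution function can be sketched in a few lines of Python (an illustration, not the slides' own code): after sorting the sample, `bisect.bisect_right` counts how many observations are $\le x$, which divided by $n$ is exactly $F_n(x)$. The usage below applies it to the seven-value sample used in the following slides.

```python
from bisect import bisect_right


def ecdf(sample):
    """Build F_n(x) = (1/n) * (# X_i in the sample <= x)."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right returns the number of ordered values that are <= x
        return bisect_right(ordered, x) / n

    return F_n


if __name__ == "__main__":
    F7 = ecdf([0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2])
    for x in (0.05, 0.2, 0.45, 0.9):
        # Compare F_7(x) with the uniform(0,1) cdf F(x) = x
        print(x, F7(x), x)
```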


Kolmogorov-Smirnov Test
If $X_1, X_2, \dots, X_n$ really come from the distribution with cdf $F$, the distance

$$D = D_n = \max_x \left| F_n(x) - F(x) \right|$$

should be small.


Kolmogorov-Smirnov Test
Computing the test statistic: suppose we simulate 7 uniform(0,1)s and get:

0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2

(obviously simplified)


Kolmogorov-Smirnov Test

Put them in order: 0.1, 0.2, 0.2, 0.4, 0.5, 0.6, 0.9

Kolmogorov-Smirnov Test
0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2

Now the empirical cdf is:

$$F_7(x) = \begin{cases} 0, & x < 0.1 \\ 1/7, & 0.1 \le x < 0.2 \\ 3/7, & 0.2 \le x < 0.4 \\ 4/7, & 0.4 \le x < 0.5 \\ 5/7, & 0.5 \le x < 0.6 \\ 6/7, & 0.6 \le x < 0.9 \\ 1, & x \ge 0.9 \end{cases}$$


Kolmogorov-Smirnov Test
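0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2

$$D_7 = \frac{9}{35} \approx 0.2571429$$

The largest gap occurs at $x = 0.6$, where $F_7(0.6) = 6/7$ while $F(0.6) = 0.6$, so $D_7 = 6/7 - 0.6 = 9/35$.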

Kolmogorov-Smirnov Test
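Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the ordered sample. Then $D_n$ can be computed as

$$D_n = \max\left(D_n^{+}, D_n^{-}\right)$$

where

$$D_n^{+} = \max_{1 \le i \le n}\left[\frac{i}{n} - F(X_{(i)})\right], \qquad D_n^{-} = \max_{1 \le i \le n}\left[F(X_{(i)}) - \frac{i-1}{n}\right].$$

This is exact for the uniform distribution, and more generally for any continuous $F$ (assuming non-repeating values)!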
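The $D_n^{+}$/$D_n^{-}$ recipe translates directly into a short Python sketch (illustrative only; the function names are hypothetical). Applied to the seven simulated values with the uniform(0,1) cdf, it reproduces $D_7 = 9/35$.

```python
def uniform_cdf(x):
    """cdf of uniform(0,1): F(x) = x on [0, 1], clamped outside that range."""
    return min(max(x, 0.0), 1.0)


def ks_statistic(sample, cdf):
    """D_n = max(D_n^+, D_n^-) computed from the ordered sample X_(1) <= ... <= X_(n)."""
    x = sorted(sample)
    n = len(x)
    d_plus = max(i / n - cdf(x[i - 1]) for i in range(1, n + 1))
    d_minus = max(cdf(x[i - 1]) - (i - 1) / n for i in range(1, n + 1))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    sample = [0.6, 0.2, 0.5, 0.9, 0.1, 0.4, 0.2]
    d7 = ks_statistic(sample, uniform_cdf)
    print(round(d7, 7))  # 0.2571429, i.e. 9/35
```

As a cross-check, `scipy.stats.kstest(sample, "uniform")` should report the same statistic, if SciPy is available.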

Kolmogorov-Smirnov Test

We reject the hypothesis that this sample came from the proposed distribution if the empirical cdf is too far from the true cdf of the proposed distribution,

i.e., we reject if $D_n$ is too large.

But how large is "large"?


Kolmogorov-Smirnov Test
In the 1930s, Kolmogorov and Smirnov showed that

$$\lim_{n \to \infty} P\left(n^{1/2} D_n \le t\right) = 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}.$$

So, for large sample sizes, you could assume

$$P\left(n^{1/2} D_n \le t\right) \approx 1 - 2\sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$

and find the value of $t$ that makes the right-hand side equal to $1 - \alpha$ for an $\alpha$-level test.
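A sketch of "find the value of $t$" in Python: the limiting cdf is evaluated by truncating the series (100 terms, an assumption that is ample for $t \ge 0.2$) and then solved for $t$ by bisection. For $\alpha = 0.05$ this should land near the tabulated constant 1.3581.

```python
import math


def kolmogorov_cdf(t, terms=100):
    """Limiting cdf of sqrt(n) * D_n: 1 - 2 * sum_{i>=1} (-1)^(i-1) exp(-2 i^2 t^2)."""
    if t <= 0:
        return 0.0
    s = sum((-1) ** (i - 1) * math.exp(-2 * i * i * t * t)
            for i in range(1, terms + 1))
    return 1 - 2 * s


def asymptotic_critical_value(alpha, lo=0.2, hi=3.0, tol=1e-10):
    """Solve kolmogorov_cdf(t) = 1 - alpha for t by bisection (the cdf is increasing)."""
    target = 1 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for alpha in (0.20, 0.10, 0.05, 0.02, 0.01):
        print(alpha, round(asymptotic_critical_value(alpha), 4))
```

Dividing the returned $t$ by $\sqrt{n}$ gives the approximate critical value for $D_n$ itself.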

Kolmogorov-Smirnov Test
For small samples, people have worked out and tabulated critical values, but there is no nice closed-form solution (J. Pomeranz, 1973; J. Durbin, 1968).

Good approximations for n > 40:

α      0.20         0.10         0.05         0.02         0.01
cv     1.0730/√n    1.2239/√n    1.3581/√n    1.5174/√n    1.6276/√n
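The table can be wrapped in a small helper (hypothetical names, a sketch only) that returns $c_\alpha/\sqrt{n}$ and applies the "reject if $D_n$ is too large" rule. The usage line plugs in the $n = 100{,}000$ example from the slides.

```python
import math

# Large-sample (n > 40) constants from the table: cv = c_alpha / sqrt(n)
KS_CONSTANTS = {0.20: 1.0730, 0.10: 1.2239, 0.05: 1.3581, 0.02: 1.5174, 0.01: 1.6276}


def approx_ks_critical_value(alpha, n):
    """Approximate critical value for D_n; only intended for n > 40."""
    if n <= 40:
        raise ValueError("approximation intended for n > 40; use tabulated values")
    return KS_CONSTANTS[alpha] / math.sqrt(n)


def ks_reject(d_n, alpha, n):
    """Reject H0 when D_n exceeds the approximate critical value."""
    return d_n > approx_ks_critical_value(alpha, n)


if __name__ == "__main__":
    cv = approx_ks_critical_value(0.05, 100_000)
    print(f"{cv:.9f}")  # 0.004294689
    print(ks_reject(0.001523922, 0.05, 100_000))  # False: do not reject
```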

Kolmogorov-Smirnov Test
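For our small sample of size 7,

$$D_7 = \frac{9}{35} \approx 0.2571429.$$

From a table, the critical value for a 0.05-level test with n = 7 is 0.483. Since 0.257 < 0.483, we do not reject the hypothesis that the sample came from the uniform(0,1) distribution.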

Kolmogorov-Smirnov Test
For our large sample of size 100,000,

$$D_{100000} = 0.001523922.$$

The approximate critical value for a 0.05-level test with n = 100,000 is

$$\frac{1.3581}{\sqrt{100000}} \approx 0.004294689.$$

Since 0.0015 < 0.0043, we again do not reject.

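Bera-Jarque Test

• It can be shown that the coefficients of skewness and kurtosis can be expressed, respectively, as

$$b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \quad\text{and}\quad b_2 = \frac{E[u^4]}{(\sigma^2)^2}.$$

• The Bera-Jarque test statistic is given by

$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \sim \chi^2_2,$$

asymptotically under the null hypothesis of normality.
• We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.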
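A standard-library sketch of the statistic above (illustrative, with hypothetical names): $b_1$ and $b_2$ are estimated from the residuals using moments with divisor $T$, and the p-value uses the fact that a $\chi^2_2$ variable has survival function $e^{-w/2}$. The residuals are centered first, which changes nothing for OLS residuals from a model with an intercept.

```python
import math


def bera_jarque(residuals):
    """Return (W, p-value) for the Bera-Jarque normality test on OLS residuals."""
    T = len(residuals)
    mean = sum(residuals) / T  # ~0 for OLS residuals when the model has an intercept
    u = [r - mean for r in residuals]
    m2 = sum(x ** 2 for x in u) / T  # estimate of sigma^2
    m3 = sum(x ** 3 for x in u) / T
    m4 = sum(x ** 4 for x in u) / T
    b1 = m3 / m2 ** 1.5  # skewness coefficient
    b2 = m4 / m2 ** 2  # kurtosis coefficient (3 for a normal distribution)
    W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    p_value = math.exp(-W / 2)  # P(chi-square with 2 df > W)
    return W, p_value


if __name__ == "__main__":
    # Hypothetical residuals, for illustration only.
    W, p = bera_jarque([-2, -1, 0, 1, 2])
    print(round(W, 4), round(p, 4))
```

If SciPy is available, `scipy.stats.jarque_bera` offers a cross-check; with only a handful of residuals, the asymptotic $\chi^2_2$ reference is crude.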

Some Alternative Solutions: Transformation

Transformation on y: The Box-Cox Method (power transformation: $y^{\lambda}$)


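• Box and Cox (1964) show how the parameters of the regression model and $\lambda$ can be estimated simultaneously using the method of maximum likelihood.
• Use

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, \dot{y}^{\lambda - 1}}, & \lambda \ne 0 \\ \dot{y} \ln y, & \lambda = 0 \end{cases}$$

where $\dot{y} = \ln^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} \ln y_i\right]$ is the geometric mean of the observations, and fit the model

$$y^{(\lambda)} = X\beta + \varepsilon.$$

• $\dot{y}^{\lambda - 1}$ is related to the Jacobian of the transformation converting the response variable $y$ into $y^{(\lambda)}$.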

• Computation procedure:
  • Choose $\lambda$ to minimize $SS_{Res}(\lambda)$.
  • Use 10 to 20 values of $\lambda$ to compute $SS_{Res}(\lambda)$, plot $SS_{Res}(\lambda)$ vs. $\lambda$, and read the value of $\lambda$ that minimizes $SS_{Res}(\lambda)$ from the graph.
  • A second iteration can be performed using a finer mesh of values if desired.
  • $\lambda$ cannot be selected by directly comparing residual sums of squares from the regressions of $y^{\lambda}$ on $x$, because each regression is on a different scale.
  • Once $\lambda$ is selected, the analyst is free to fit the model using $y^{\lambda}$ ($\lambda \ne 0$) or $\ln y$ ($\lambda = 0$).
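The computation procedure can be sketched in Python as a grid search. Two assumptions keep it short: the model has a single regressor $x$ (the slides use a general $X\beta$), and the grid is 17 values from $-2$ to $2$. Each candidate uses the scaled transform $y^{(\lambda)}$ with the geometric mean $\dot{y}$, so the $SS_{Res}(\lambda)$ values are on a common scale and can be compared directly.

```python
import math


def geometric_mean(y):
    return math.exp(sum(math.log(v) for v in y) / len(y))


def box_cox(y, lam, y_dot):
    """Scaled Box-Cox transform y^(lambda); requires all y > 0."""
    if lam == 0:
        return [y_dot * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * y_dot ** (lam - 1)) for v in y]


def ss_res_simple(x, z):
    """Residual sum of squares of the least-squares line z = b0 + b1 * x."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    b0 = zbar - b1 * xbar
    return sum((zi - b0 - b1 * xi) ** 2 for xi, zi in zip(x, z))


def box_cox_grid(x, y, lambdas):
    """Return ({lambda: SS_Res(lambda)}, lambda minimizing SS_Res)."""
    y_dot = geometric_mean(y)
    ss = {lam: ss_res_simple(x, box_cox(y, lam, y_dot)) for lam in lambdas}
    return ss, min(ss, key=ss.get)


if __name__ == "__main__":
    # Hypothetical data: sqrt(y) is exactly linear in x, so lambda = 0.5 should win.
    x = list(range(1, 11))
    y = [(1 + 0.5 * xi) ** 2 for xi in x]
    grid = [k / 4 for k in range(-8, 9)]  # 17 values from -2 to 2
    ss, lam_hat = box_cox_grid(x, y, grid)
    for lam in grid:
        print(f"{lam:5.2f}  {ss[lam]:.6f}")
    print("lambda_hat =", lam_hat)
```

Plotting the printed pairs gives the $SS_{Res}(\lambda)$ vs. $\lambda$ curve; a finer grid around `lam_hat` implements the optional second iteration.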


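• An Approximate Confidence Interval for $\lambda$
• The C.I. can be useful in selecting the final value for $\lambda$. For example, if $\hat{\lambda} = 0.596$ is the minimizing value of $SS_{Res}(\lambda)$ but 0.5 is in the C.I., then we would prefer to choose $\lambda = 0.5$. If 1 is in the C.I., then no transformation may be necessary.
• Maximize

$$L(\lambda) = -\frac{n}{2}\ln\left[SS_{Res}(\lambda)\right].$$

• An approximate $100(1-\alpha)\%$ C.I. for $\lambda$ consists of the values of $\lambda$ satisfying

$$L(\hat{\lambda}) - L(\lambda) \le \frac{1}{2}\chi^2_{\alpha,1}.$$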

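• Let

$$SS^{*} = SS_{Res}(\hat{\lambda})\exp\left(\chi^2_{\alpha,1}/n\right);$$

the C.I. consists of all $\lambda$ with $SS_{Res}(\lambda) \le SS^{*}$. The factor $\exp(\chi^2_{\alpha,1}/n)$ can be approximated by

$$1 + \frac{z^2_{\alpha/2}}{n}, \qquad 1 + \frac{t^2_{\alpha/2,\nu}}{\nu}, \qquad \text{or} \qquad 1 + \frac{\chi^2_{\alpha,1}}{n},$$

where $\nu$ is the number of residual degrees of freedom.
• This is based on
  • $\exp(x) = 1 + x + x^2/2! + \dots \approx 1 + x$
  • $\chi^2_{\alpha,1} = z^2_{\alpha/2} \approx t^2_{\alpha/2,\nu}$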
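The interval rule $SS_{Res}(\lambda) \le SS^{*}$ can be sketched in Python as follows. It takes a dictionary of $SS_{Res}(\lambda)$ values on a grid (the numbers in the usage are hypothetical) and uses the exact factor $\exp(z^2_{\alpha/2}/n)$, relying on $\chi^2_{\alpha,1} = z^2_{\alpha/2}$. Reporting the smallest and largest qualifying grid values is a simplifying assumption that presumes the qualifying set is contiguous.

```python
import math
from statistics import NormalDist


def box_cox_ci(ss_by_lambda, n, alpha=0.05):
    """Approximate 100(1-alpha)% C.I. for lambda from a grid of SS_Res(lambda) values.

    Returns (lambda_hat, SS*, (lower, upper)), where the interval spans the grid
    values with SS_Res(lambda) <= SS* = SS_Res(lambda_hat) * exp(chi2_{alpha,1} / n).
    """
    lam_hat = min(ss_by_lambda, key=ss_by_lambda.get)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}; z**2 equals chi2_{alpha,1}
    ss_star = ss_by_lambda[lam_hat] * math.exp(z * z / n)
    inside = sorted(lam for lam, ss in ss_by_lambda.items() if ss <= ss_star)
    return lam_hat, ss_star, (inside[0], inside[-1])


if __name__ == "__main__":
    # Hypothetical SS_Res values on a coarse grid, with n = 20 observations.
    ss = {0.0: 30.0, 0.25: 12.0, 0.5: 10.0, 0.75: 11.0, 1.0: 25.0}
    lam_hat, ss_star, ci = box_cox_ci(ss, n=20)
    print(lam_hat, round(ss_star, 4), ci)  # 1.0 is outside the C.I. here
```

Replacing `math.exp(z * z / n)` with `1 + z * z / n` gives the first approximation listed above; for borderline grid points (here $\lambda = 0.25$), the choice can change whether they fall inside the interval.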


Questions
