13 Nonnormality

Non-Normality
presented by:
Dudi Barmana, M.Si.
Agenda
• Consequences to be faced
• Identification/detection (examining residual patterns and formal tests)
• Some alternative solutions: transformation
Today's Quote
People often throw stones in our path. It is up to us whether we turn those stones into a wall or a bridge. ---Chinese book of wisdom---
Consequences to be faced
• If the errors come from a distribution with thicker or heavier tails than the normal, the least-squares (LS) fit may be sensitive to a small subset of the data.
• Heavy-tailed error distributions often generate outliers that pull the LS fit too much in their direction.
• Predictions could be invalid.
• Individual t-tests and the model F-test could be misleading.
Identification/Detection
Residual Plot
• Graphical analysis is a very effective way to investigate the adequacy of the fit of a regression model and to check the underlying assumptions.
Normal Probability Plot
• A normal probability plot is a simple way to check the normality assumption.
• A straight-line plot is indicative of normality.
• A [shape shown in the slide figure] pattern is indicative of right-skewed errors (e.g., gamma).
• A [shape shown in the slide figure] pattern is indicative of a symmetric, short-tailed distribution.
• A [shape shown in the slide figure] pattern is indicative of a symmetric, long-tailed distribution.
• Rank the residuals: $e_{[1]} < \dots < e_{[n]}$.
• Plot $e_{[i]}$ against $P_i = (i - 1/2)/n$.
• Sometimes $e_{[i]}$ is plotted against $\Phi^{-1}[(i - 1/2)/n]$ instead.
• The plot is nearly a straight line for large samples ($n > 32$) if the $e_{[i]}$ are normal.
• Small samples ($n \le 16$) may deviate from a straight line even if the $e_{[i]}$ are normal.
• Usually about 20 points are required for a normal probability plot.
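The plotting positions for a normal probability plot can be sketched in a few lines of Python. This is only an illustration: the residual values are made up, the function name `normal_plot_points` is hypothetical, and the standard-library `NormalDist.inv_cdf` stands in for $\Phi^{-1}$.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each ranked residual e[i] with the normal quantile of (i - 1/2)/n."""
    ordered = sorted(residuals)          # e[1] <= ... <= e[n]
    n = len(ordered)
    std_normal = NormalDist()            # standard normal, mean 0 and sd 1
    points = []
    for i, e in enumerate(ordered, start=1):
        p = (i - 0.5) / n                # plotting position P_i
        points.append((std_normal.inv_cdf(p), e))
    return points

# Hypothetical residuals, for illustration only
residuals = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.2]
for quantile, e in normal_plot_points(residuals):
    print(f"{quantile:7.3f}  {e:6.2f}")
```

Plotting the second column against the first should give a roughly straight line if the residuals are normal. With only 7 points, as here, deviations from a line are not strong evidence either way.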
Kolmogorov-Smirnov Test
Let
$$F(x) = P(X_i \le x)$$
be the cdf for the distribution.
In the uniform(0,1) case:
$$F(x) = x, \quad 0 \le x \le 1$$
Compare this to the empirical distribution function:
$$\hat{F}_n(x) = \frac{1}{n}\,(\#\, X_i \text{ in the sample} \le x)$$
If $X_1, X_2, \dots, X_n$ really come from the distribution with cdf $F$, the distance
$$D = D_n = \max_x \left| \hat{F}_n(x) - F(x) \right|$$
should be small.
Computing the test statistic: Suppose we simulate 7 uniform(0,1)s and get:
0.6  0.2  0.5  0.9  0.1  0.4  0.2
(obviously simplified)
Put them in order: 0.1  0.2  0.2  0.4  0.5  0.6  0.9
Now the empirical cdf is:
$$\hat{F}_7(x) = \begin{cases}
0 & \text{for } x < 0.1 \\
1/7 & \text{for } 0.1 \le x < 0.2 \\
3/7 & \text{for } 0.2 \le x < 0.4 \\
4/7 & \text{for } 0.4 \le x < 0.5 \\
5/7 & \text{for } 0.5 \le x < 0.6 \\
6/7 & \text{for } 0.6 \le x < 0.9 \\
1 & \text{for } x \ge 0.9
\end{cases}$$
The largest gap between $\hat{F}_7(x)$ and $F(x) = x$ occurs at $x = 0.6$, where $\hat{F}_7(0.6) = 6/7$:
$$D_7 = \frac{6}{7} - 0.6 = \frac{9}{35} \approx 0.2571429$$
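As a check on the arithmetic, here is a minimal sketch that builds the empirical cdf for the seven values and measures its largest distance from the uniform(0,1) cdf. The function names are hypothetical, and exact fractions are used so the result can be compared with 9/35 directly.

```python
from fractions import Fraction

# The seven simulated values from the slide, as exact fractions
sample = [Fraction(s) for s in ("0.6", "0.2", "0.5", "0.9", "0.1", "0.4", "0.2")]

def ecdf(sample, x):
    """Empirical cdf: fraction of sample values that are <= x."""
    return Fraction(sum(1 for s in sample if s <= x), len(sample))

def ks_distance_uniform(sample):
    """Largest |F_n(x) - x| for the uniform(0,1) cdf F(x) = x.

    The gap can only peak at a jump point, so it is checked at each
    distinct sample value and just to the left of it.
    """
    n = len(sample)
    best = Fraction(0)
    for x in sorted(set(sample)):
        at_jump = ecdf(sample, x)
        just_below = Fraction(sum(1 for s in sample if s < x), n)
        best = max(best, abs(at_jump - x), abs(just_below - x))
    return best

print(ecdf(sample, Fraction("0.2")))   # 3/7
print(ks_distance_uniform(sample))     # 9/35
```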
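Let $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ be the ordered sample. Then $D_n$ can be computed as
$$D_n = \max\{D_n^+, D_n^-\}$$
where
$$D_n^+ = \max_{1 \le i \le n} \left( \frac{i}{n} - F(X_{(i)}) \right), \qquad D_n^- = \max_{1 \le i \le n} \left( F(X_{(i)}) - \frac{i-1}{n} \right)$$
This is exact for the uniform distribution! (assuming non-repeating values)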
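The order-statistic form of $D_n$ lends itself to a short function. The sketch below is illustrative only: `ks_statistic` is a hypothetical name, and the cdf is passed in as a plain Python callable. It is applied to the seven-value sample from the earlier slide, which contains a repeated 0.2. The formula is stated for non-repeating values, but for this particular sample it still returns 9/35.

```python
from fractions import Fraction

def ks_statistic(sample, cdf):
    """D_n = max(D_n^+, D_n^-) computed from the ordered sample."""
    xs = sorted(sample)
    n = len(xs)
    # D_n^+ : largest amount by which i/n exceeds F(X_(i))
    d_plus = max(Fraction(i, n) - cdf(x) for i, x in enumerate(xs, start=1))
    # D_n^- : largest amount by which F(X_(i)) exceeds (i - 1)/n
    d_minus = max(cdf(x) - Fraction(i - 1, n) for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)

sample = [Fraction(s) for s in ("0.6", "0.2", "0.5", "0.9", "0.1", "0.4", "0.2")]
uniform_cdf = lambda x: x          # F(x) = x on [0, 1]
print(ks_statistic(sample, uniform_cdf))   # 9/35
```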
We reject the hypothesis that this sample came from the proposed distribution if the empirical cdf is too far from the true cdf of the proposed distribution,
i.e., we reject if $D_n$ is too large.
How large is large?
In the 1930s, Kolmogorov and Smirnov showed that
$$\lim_{n \to \infty} P\left(n^{1/2} D_n \le t\right) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$
So, for large sample sizes, you could assume
$$P\left(n^{1/2} D_n \le t\right) \approx 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 t^2}$$
and find the value of $t$ that makes the right-hand side equal to $1 - \alpha$ for an $\alpha$-level test.
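Finding the value of $t$ that makes the limiting series equal to $1 - \alpha$ can be done numerically. The following sketch truncates the series at 100 terms and solves by bisection. The function names, the bracket [0.2, 3.0] and the truncation point are all assumptions chosen for illustration.

```python
import math

def kolmogorov_cdf(t, terms=100):
    """Truncated series 1 - 2 * sum_{i>=1} (-1)^(i-1) * exp(-2 i^2 t^2)."""
    if t <= 0:
        return 0.0
    series = sum((-1) ** (i - 1) * math.exp(-2 * i * i * t * t)
                 for i in range(1, terms + 1))
    return 1 - 2 * series

def critical_t(alpha, lo=0.2, hi=3.0, tol=1e-10):
    """Bisection for the t at which the limiting cdf equals 1 - alpha."""
    target = 1 - alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.05, 0.01):
    print(alpha, round(critical_t(alpha), 4))
```

Dividing the resulting $t$ by $\sqrt{n}$ gives an approximate large-sample critical value for $D_n$.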
For small samples, people have worked out and tabulated critical values, but there is no nice closed-form solution.
J. Pomeranz (1973), J. Durbin (1968)
Good approximations for n > 40:
α     0.20         0.10         0.05         0.02         0.01
cv    1.0730/√n    1.2239/√n    1.3581/√n    1.5174/√n    1.6276/√n
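For our small sample of size 7,
$$D_7 = \frac{9}{35} \approx 0.2571429$$
From a table, the critical value for a 0.05 level test for n = 7 is 0.483. Since 0.2571429 < 0.483, we do not reject the uniform(0,1) hypothesis.
For our large sample of size 100,000,
$$D_{100000} = 0.00152392$$
The approximate critical value for a 0.05 level test for n = 100,000 is
$$\frac{1.3581}{\sqrt{100000}} \approx 0.00429468$$
Since 0.00152392 < 0.00429468, we again do not reject.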
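The large-sample decision rule can be written as a small helper. This is a sketch only: the coefficient table copies the approximations quoted for n > 40, and the names `CV_COEFF`, `approx_critical_value` and `reject` are hypothetical.

```python
import math

# Large-sample coefficients (valid roughly for n > 40): cv = coeff / sqrt(n)
CV_COEFF = {0.20: 1.0730, 0.10: 1.2239, 0.05: 1.3581, 0.02: 1.5174, 0.01: 1.6276}

def approx_critical_value(n, alpha=0.05):
    """Approximate critical value of D_n for a level-alpha test."""
    return CV_COEFF[alpha] / math.sqrt(n)

def reject(d_n, n, alpha=0.05):
    """True if D_n exceeds the approximate critical value."""
    return d_n > approx_critical_value(n, alpha)

n = 100_000
print(approx_critical_value(n))     # about 0.0042947
print(reject(0.00152392, n))        # False: do not reject
```

For the size-7 sample the approximation does not apply, so the tabulated value 0.483 is used instead.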
Bera and Jarque testing

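• It can be proved that the coefficients of skewness and kurtosis can be expressed respectively as:
$$b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \quad \text{and} \quad b_2 = \frac{E[u^4]}{(\sigma^2)^2}$$
• The Bera-Jarque test statistic is given by
$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \sim \chi^2(2)$$
under the null hypothesis of normality.
• We estimate $b_1$ and $b_2$ using the residuals from the OLS regression, $\hat{u}$.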
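A minimal sketch of the Bera-Jarque computation from a list of residuals follows. The function name and the toy residuals are hypothetical. The moments are sample moments with divisor T, and the residuals are centered first, which changes nothing when the regression includes an intercept.

```python
def jarque_bera(residuals):
    """Return (b1, b2, W) with W = T * (b1^2 / 6 + (b2 - 3)^2 / 24)."""
    T = len(residuals)
    mean = sum(residuals) / T
    centered = [u - mean for u in residuals]
    m2 = sum(c ** 2 for c in centered) / T   # estimate of sigma^2
    m3 = sum(c ** 3 for c in centered) / T   # estimate of E[u^3]
    m4 = sum(c ** 4 for c in centered) / T   # estimate of E[u^4]
    b1 = m3 / m2 ** 1.5                      # skewness coefficient
    b2 = m4 / m2 ** 2                        # kurtosis coefficient
    w = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    return b1, b2, w

# Toy residuals for illustration only (far too few for a real test)
b1, b2, w = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(b1, b2, w)
```

$W$ is compared with a $\chi^2(2)$ critical value (5.99 at the 5% level). Values above it lead to rejecting normality.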
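Some alternative solutions: Transformation
Transformation on y: The Box-Cox Method (power transformation: $y^{\lambda}$)
• Box and Cox (1964) show how the parameters of the regression model and $\lambda$ can be estimated simultaneously using the method of maximum likelihood.
• Use
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\, \dot{y}^{\lambda - 1}}, & \lambda \ne 0 \\[1ex] \dot{y} \ln y, & \lambda = 0 \end{cases}$$
where $\dot{y} = \ln^{-1}\left[\frac{1}{n} \sum_{i=1}^{n} \ln y_i\right]$ is the geometric mean of the observations, and fit the model
$$y^{(\lambda)} = X\beta + \varepsilon$$
• $\dot{y}^{\lambda - 1}$ is related to the Jacobian of the transformation converting the response variable $y$ into $y^{(\lambda)}$.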
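The geometric-mean-scaled power transformation can be sketched as below. It assumes the usual normalized Box-Cox form, $(y^{\lambda} - 1)/(\lambda \dot{y}^{\lambda-1})$ for $\lambda \ne 0$ and $\dot{y} \ln y$ for $\lambda = 0$, with $\dot{y}$ the geometric mean. The function names are hypothetical, and all responses must be positive.

```python
import math

def geometric_mean(y):
    """exp of the average log, i.e. ln^{-1}[(1/n) * sum(ln y_i)]."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def box_cox(y, lam):
    """Normalized Box-Cox transform y^(lambda); requires all y > 0."""
    g = geometric_mean(y)
    if lam == 0:
        return [g * math.log(v) for v in y]
    scale = lam * g ** (lam - 1)      # Jacobian-related scaling factor
    return [(v ** lam - 1) / scale for v in y]

y = [1.0, 4.0]                        # hypothetical responses
print(geometric_mean(y))              # 2.0 (up to rounding)
print(box_cox(y, 1))                  # lambda = 1 just shifts y by 1
print(box_cox(y, 0))                  # geometric mean times ln y
```

The scaling by $\dot{y}^{\lambda-1}$ keeps the transformed response in comparable units across values of $\lambda$, so that residual sums of squares can be compared.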
• Computation procedure:
• Choose $\lambda$ to minimize $SS_{Res}(\lambda)$.
• Use 10-20 values of $\lambda$ to compute $SS_{Res}(\lambda)$. Then plot $SS_{Res}(\lambda)$ vs. $\lambda$. Finally, read the value of $\lambda$ that minimizes $SS_{Res}(\lambda)$ from the graph.
• A second iteration can be performed using a finer mesh of values if desired.
• $\lambda$ cannot be selected by directly comparing residual sums of squares from the regressions of $y^{\lambda}$ on $x$, because each $\lambda$ puts the response on a different scale.
• Once $\lambda$ is selected, the analyst is free to fit the model using $y^{\lambda}$ ($\lambda \ne 0$) or $\ln y$ ($\lambda = 0$).
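The grid-search procedure can be sketched for a single regressor as follows. Everything here is illustrative: the data are noise-free and made up so that the log transformation is exactly linear, the function names are hypothetical, and the response is scaled by the geometric mean so that residual sums of squares are comparable across $\lambda$.

```python
import math

def box_cox(y, lam):
    """Geometric-mean-normalized power transform (all y must be > 0)."""
    g = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [g * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * g ** (lam - 1)) for v in y]

def ss_res(x, y):
    """Residual sum of squares from the simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def grid_search(x, y, lambdas):
    """Return the lambda on the grid with the smallest SSRes(lambda)."""
    return min(lambdas, key=lambda lam: ss_res(x, box_cox(y, lam)))

# Hypothetical data: ln y is exactly linear in x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
lambdas = [-2 + 0.25 * k for k in range(17)]    # 17 values from -2 to 2
best = grid_search(x, y, lambdas)
print(best)
```

In practice one would plot `ss_res` against the grid and, if needed, repeat the search on a finer mesh around the minimum.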
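• An approximate confidence interval for $\lambda$
• The C.I. can be useful in selecting the final value for $\lambda$.
• For example, if 0.596 is the minimizing value of $SS_{Res}(\lambda)$ but 0.5 is in the C.I., then we would prefer to choose $\lambda = 0.5$. If 1 is in the C.I., then no transformation may be necessary.
• Maximize
$$L(\lambda) = -\frac{1}{2}\, n \ln\left[SS_{Res}(\lambda)\right]$$
• An approximate $100(1-\alpha)\%$ C.I. for $\lambda$ is
$$\left\{ \lambda : L(\hat{\lambda}) - L(\lambda) \le \frac{1}{2} \chi^2_{\alpha,1} \right\}$$
• Let
$$SS^* = SS_{Res}(\hat{\lambda})\, e^{\chi^2_{\alpha,1}/n}$$
so that the interval consists of the values of $\lambda$ with $SS_{Res}(\lambda) \le SS^*$.
• $\exp(\chi^2_{\alpha,1}/n)$ can be approximated by $1 + z^2_{\alpha/2}/n$, $1 + t^2_{\alpha/2,\nu}/n$, or $1 + \chi^2_{\alpha,1}/n$, where $\nu$ is the number of residual degrees of freedom.
• This is based on $\exp(x) = 1 + x + x^2/2! + \dots \approx 1 + x$ and $\chi^2_1 = z^2 \approx t^2_{\nu}$.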
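One way to read an approximate confidence interval off a grid of $SS_{Res}(\lambda)$ values is sketched below. It assumes the cutoff $SS^* = SS_{Res}(\hat{\lambda}) \exp(\chi^2_{\alpha,1}/n)$ and uses the identity $\chi^2_{\alpha,1} = z^2_{\alpha/2}$, so only the standard library is needed. The function name and the grid values are hypothetical.

```python
import math
from statistics import NormalDist

def box_cox_ci(ss_by_lambda, n, alpha=0.05):
    """Approximate C.I. for lambda from a dict {lambda: SSRes(lambda)}.

    Assumes the SSRes curve has a single dip, so the grid points at or
    below the cutoff SS* form one contiguous interval.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    chi2 = z * z                                   # chi-square(alpha, 1) = z^2
    lam_hat = min(ss_by_lambda, key=ss_by_lambda.get)
    ss_star = ss_by_lambda[lam_hat] * math.exp(chi2 / n)
    inside = sorted(lam for lam, ss in ss_by_lambda.items() if ss <= ss_star)
    return lam_hat, ss_star, (inside[0], inside[-1])

# Hypothetical grid of residual sums of squares, n = 20 observations
grid = {0.0: 12.0, 0.5: 10.0, 1.0: 10.5, 1.5: 14.0}
lam_hat, ss_star, ci = box_cox_ci(grid, n=20)
print(lam_hat, round(ss_star, 4), ci)
```

Because the interval in this made-up example contains 1, it would suggest that no transformation may be necessary. A finer grid would give sharper endpoints.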
Questions
