
Australian School of Business

Probability and Statistics


Solutions Week 10
1. (a) Consider the given exponential regression model. First, transform the regression equation so that
you have a linear regression form by taking the logarithm of both sides:
$$\log(y_i) = \log(\gamma) + x_i\log(\delta) + \varepsilon_i = \alpha + \beta x_i + \varepsilon_i,$$
where $\alpha = \log(\gamma)$ and $\beta = \log(\delta)$.
(b) Now, consider the sum of squares:
$$SS(\alpha,\beta) = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}\left(\log(y_i) - \alpha - \beta x_i\right)^2$$

and differentiating with respect to the parameters and setting to zero, that is,
$$\frac{\partial SS(\alpha,\beta)}{\partial\alpha} = \sum_{i=1}^{n}(-2)\left(\log(y_i) - \alpha - \beta x_i\right) = 0$$
$$\frac{\partial SS(\alpha,\beta)}{\partial\beta} = \sum_{i=1}^{n}(-2x_i)\left(\log(y_i) - \alpha - \beta x_i\right) = 0.$$

Rearranging and simplifying leads us to the following normal equations:
$$\sum_{i=1}^{n}\log(y_i) = n\alpha + \beta\sum_{i=1}^{n}x_i$$
$$\sum_{i=1}^{n}x_i\log(y_i) = \alpha\sum_{i=1}^{n}x_i + \beta\sum_{i=1}^{n}x_i^2.$$

Solving, we get:
$$\hat{\alpha} = \sum_{i=1}^{n}\log(y_i)/n - \hat{\beta}\sum_{i=1}^{n}x_i/n = \overline{\log(y)} - \hat{\beta}\bar{x}$$
and, from the second normal equation,
$$\hat{\beta} = \frac{\sum_{i=1}^{n}x_i\log(y_i) - \hat{\alpha}\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i^2}
= \frac{\sum_{i=1}^{n}x_i\log(y_i) - \left(\overline{\log(y)} - \hat{\beta}\bar{x}\right)\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i^2}.$$
Collecting the $\hat{\beta}$ terms gives
$$\hat{\beta}\left(\sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}\right) = \sum_{i=1}^{n}x_i\log(y_i) - \overline{\log(y)}\sum_{i=1}^{n}x_i,$$
so that
$$\hat{\beta} = \frac{\sum_{i=1}^{n}x_i\log(y_i) - \overline{\log(y)}\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}
= \frac{\sum_{i=1}^{n}x_i\log(y_i) - \underbrace{\sum_{i=1}^{n}x_i/n}_{=\bar{x}}\,\sum_{j=1}^{n}\log(y_j)}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}
= \frac{\sum_{i=1}^{n}(x_i-\bar{x})\log(y_i)}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}
= \sum_{i=1}^{n}c_i\log(y_i),$$
where $c_i = (x_i-\bar{x})/\left(\sum_{j=1}^{n}x_j^2 - n\bar{x}^2\right)$.
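As a quick numerical check of these closed-form estimators, the following sketch fits the log-transformed model to simulated data (the x values and parameters below are made up purely for illustration) and compares the result with an ordinary least-squares fit of log(y) on x:

    import numpy as np

    # Check the closed-form estimators for log(y_i) = alpha + beta*x_i + eps_i
    # on simulated data; all numbers below are illustrative only.
    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 10.0, 20)
    alpha_true, beta_true, sigma = 0.5, -0.2, 0.1
    y = np.exp(alpha_true + beta_true * x + rng.normal(0.0, sigma, x.size))

    logy = np.log(y)
    xbar = x.mean()
    beta_hat = np.sum((x - xbar) * logy) / (np.sum(x**2) - x.size * xbar**2)
    alpha_hat = logy.mean() - beta_hat * xbar

    # Least-squares fit of log(y) on x; should agree with the formulas above.
    slope, intercept = np.polyfit(x, logy, deg=1)
    print(alpha_hat, beta_hat)
    print(intercept, slope)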

(c)
$$E\left[\hat{\beta}\,\big|\,X=x\right] = E\left[\sum_{i=1}^{n}\frac{x_i-\bar{x}}{\sum_{j=1}^{n}x_j^2-n\bar{x}^2}\log(Y_i)\,\Big|\,X=x\right]
= \sum_{i=1}^{n}\frac{x_i-\bar{x}}{\sum_{j=1}^{n}x_j^2-n\bar{x}^2}\,E\left[\log(Y_i)|X=x\right]$$
$$= \sum_{i=1}^{n}\frac{x_i-\bar{x}}{\sum_{j=1}^{n}x_j^2-n\bar{x}^2}\left(\alpha+\beta x_i\right)
= \alpha\underbrace{\sum_{i=1}^{n}\frac{x_i-\bar{x}}{\sum_{j=1}^{n}x_j^2-n\bar{x}^2}}_{=0}
+ \beta\underbrace{\frac{\sum_{i=1}^{n}x_i(x_i-\bar{x})}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2}}_{=1} = \beta.$$

(d)
$$\mathrm{Var}\left(\hat{\beta}\,\big|\,X=x\right) = \mathrm{Var}\left(\sum_{i=1}^{n}c_i\log(Y_i)\,\Big|\,X=x\right)
= \sum_{i=1}^{n}c_i^2\,\mathrm{Var}\left(\log(Y_i)|X=x\right) = \sigma^2\sum_{i=1}^{n}c_i^2$$
$$= \sigma^2\,\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\left(\sum_{i=1}^{n}x_i^2-n\bar{x}^2\right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2}.$$

(e) We have that $(\varepsilon_i|X=x)\sim N(0,\sigma^2)$ and $\log(Y_i) = \alpha + \beta x_i + \varepsilon_i$ for $i=1,\ldots,n$. From that
it follows that $(\log(Y_i)|X=x)\sim N(\alpha+\beta x_i,\sigma^2)$, because $\log(Y_i)$ is $\varepsilon_i$ shifted by the
constants $\alpha$ and $\beta x_i$. Note that $\alpha$ and $\beta$ are population parameters, which are constant (but
unknown) and not random variables themselves. Because $(\hat{\beta}|X=x)$ is a linear combination of the $\log(Y_i)$
(we have $\hat{\beta} = \sum_{i=1}^{n}\frac{x_i-\bar{x}}{\sum_{j=1}^{n}x_j^2-n\bar{x}^2}\log(Y_i)$), it must hold that $(\hat{\beta}|X=x)$ also has a
normal distribution, with mean and variance as given in (c) and (d).
(f)
$$E\left[\hat{\alpha}|X=x\right] = E\left[\overline{\log(y)} - \hat{\beta}\bar{x}\,\big|\,X=x\right]
= E\left[\sum_{i=1}^{n}\log(Y_i)/n\,\Big|\,X=x\right] - \bar{x}\,E\left[\hat{\beta}\,\big|\,X=x\right]$$
$$= \frac{1}{n}\sum_{i=1}^{n}E\left[\log(Y_i)|X=x\right] - \beta\bar{x}
= \frac{1}{n}\sum_{i=1}^{n}\left(\alpha+\beta x_i\right) - \beta\bar{x}
= \frac{1}{n}\left(n\alpha + \beta\sum_{i=1}^{n}x_i\right) - \beta\bar{x}
= \alpha + \beta\left(\frac{\sum_{i=1}^{n}x_i}{n} - \bar{x}\right) = \alpha.$$

(g)
$$\mathrm{Var}\left(\hat{\alpha}|X=x\right) = \mathrm{Var}\left(\overline{\log(y)} - \hat{\beta}\bar{x}\,\big|\,X=x\right)
= \mathrm{Var}\left(\overline{\log(y)}\,\big|\,X=x\right) + \bar{x}^2\,\mathrm{Var}\left(\hat{\beta}|X=x\right) - 2\bar{x}\,\mathrm{Cov}\left(\overline{\log(y)},\hat{\beta}\,\big|\,X=x\right)$$
$$= \mathrm{Var}\left(\sum_{i=1}^{n}\frac{\log(Y_i)}{n}\,\Big|\,X=x\right) + \bar{x}^2\,\mathrm{Var}\left(\hat{\beta}|X=x\right)
- 2\bar{x}\,\mathrm{Cov}\left(\sum_{i=1}^{n}\frac{\log(Y_i)}{n},\,\sum_{i=1}^{n}c_i\log(Y_i)\,\Big|\,X=x\right)$$
$$\stackrel{*}{=} \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}\left(\log(Y_i)|X=x\right) + \bar{x}^2\,\mathrm{Var}\left(\hat{\beta}|X=x\right)
- \frac{2\bar{x}}{n}\sum_{i=1}^{n}c_i\,\mathrm{Var}\left(\log(Y_i)|X=x\right)$$
$$= \frac{\sigma^2}{n} + \frac{\bar{x}^2\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
- \frac{2\bar{x}\sigma^2}{n}\underbrace{\sum_{i=1}^{n}c_i}_{=0\ \left(\text{since}\ \sum_{i=1}^{n}(x_i-\bar{x})=0\right)}
= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right).$$
* using that $\mathrm{Cov}\left(\log(Y_i),\log(Y_j)|X=x\right)$ is equal to zero if $i\neq j$ and equal to $\mathrm{Var}\left(\log(Y_i)|X=x\right)$ if $i=j$.
(h) We have that $\hat{\alpha}$ is a linear combination of two normally distributed random variables, i.e.,
$(\overline{\log(Y)}|X=x)$ and $(\hat{\beta}|X=x)$, and is thus also normally distributed. The mean and variance
are given in questions (f) and (g).
(i)
$$\mathrm{Cov}\left(\hat{\alpha},\hat{\beta}\,\big|\,X=x\right) = \mathrm{Cov}\left(\overline{\log(y)} - \hat{\beta}\bar{x},\,\hat{\beta}\,\big|\,X=x\right)
= \underbrace{\mathrm{Cov}\left(\overline{\log(y)},\hat{\beta}\,\big|\,X=x\right)}_{=0\ (\text{see (g)})} - \bar{x}\,\mathrm{Cov}\left(\hat{\beta},\hat{\beta}\,\big|\,X=x\right)
= -\frac{\bar{x}\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}.$$

(j) We have $\gamma=\exp(\alpha)$ and $\delta=\exp(\beta)$. Moreover, we have that $(\hat{\alpha}|X=x)$, $(\hat{\beta}|X=x)$ and $\log(Y)$ are
normally distributed with their means and variances as given in (e), (f), and (g). Thus, $(\hat{\gamma}|X=x)$,
$(\hat{\delta}|X=x)$ and $(Y|X=x)$ are lognormally distributed, with parameters $\mu$ the mean of the
logarithm of the variable and $\sigma^2$ the variance of the logarithm of the variable. For example, for $\hat{\gamma}$
the parameter $\mu$ is $E[\hat{\alpha}|X=x]$ and $\sigma^2$ is $\mathrm{Var}(\hat{\alpha}|X=x)$.


[Figure: scatterplot of 3-MT concentration (y, roughly 0.5 to 3.5) against post mortem interval in hours (x, roughly 5 to 60).]
2. (a) Interesting features are that, in general, the concentration of 3-MT in the brain seems to decrease
as the post mortem interval increases, and that there are two observations with a much higher post mortem
interval than the other observations.
The data seem appropriate for linear regression. The linear relationship seems to hold, especially
for values of the interval between 5 and 26 (we have enough observations there). Care should be
taken when evaluating y for x lower than 5 or larger than 26 (only two observations), because we
do not know whether the linear relationship between x and y still holds there.
(b) We test:
$$H_0: \rho = 0 \quad\text{v.s.}\quad H_1: \rho\neq 0.$$
The corresponding test statistic is given by:
$$T = \frac{R\sqrt{n-2}}{\sqrt{1-R^2}}\sim t_{n-2}.$$
We reject the null hypothesis for large and small values of the test statistic.
We have $n=18$ and the correlation coefficient is given by:
$$r = \frac{\sum x_iy_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2-n\bar{x}^2\right)\left(\sum y_i^2-n\bar{y}^2\right)}}
= \frac{672.8 - 18\cdot(337/18)\cdot(42.98/18)}{\sqrt{\left(9854.5-337^2/18\right)\left(109.7936-42.98^2/18\right)}} = -0.827.$$
Thus, the value of our test statistic is given by:
$$T = \frac{-0.827\sqrt{16}}{\sqrt{1-(-0.827)^2}} = -5.89.$$
From Formulae and Tables page 163 we observe $\Pr(t_{16}\le -4.015) = \Pr(t_{16}\ge 4.015) = 0.05\%$,
* using the symmetry property of the Student-t distribution. We observe that the value of our test
statistic ($-5.89$) is smaller than $-4.015$, thus our p-value should be smaller than $2\cdot 0.05\% = 0.1\%$.
Thus, we can reject the null hypothesis even at a significance level of 0.1%, hence we can conclude
that there is a linear dependency between interval and concentration. Note that the alternative
hypothesis here is a linear dependency (of either sign), not specifically a negative linear dependency,
so by rejecting the null hypothesis you accept that alternative. Although an alternative hypothesis
of negative dependency would also have been accepted, due to the construction of the test we have to
use the phrase "a linear dependency" and not "a negative linear dependency".
(c) The linear regression model is given by:
$$y = \alpha + \beta x + \varepsilon.$$
The (BLUE) estimate of the slope is given by:
$$\hat{\beta} = \frac{\sum x_iy_i - n\cdot\left(\sum x_i/n\right)\left(\sum y_i/n\right)}{\sum x_i^2 - n\left(\sum x_i/n\right)^2}
= \frac{672.8 - 337\cdot 42.98/18}{9854.5 - 337^2/18} = -0.0372008.$$
The (BLUE) estimate of the intercept is given by:
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 42.98/18 + 0.0372008\cdot 337/18 = 3.084259.$$
Thus, the estimate of y given a value of x is given by:
$$\hat{y} = \hat{\alpha} + \hat{\beta}x = 3.084259 - 0.0372008\,x.$$
1. One day equals 24 hours, i.e., $x=24$, thus $\hat{y} = \hat{\alpha} + 24\hat{\beta} = 3.084259 - 0.0372008\cdot 24 = 2.19$.
2. Two days equal 48 hours, i.e., $x=48$, thus $\hat{y} = \hat{\alpha} + 48\hat{\beta} = 3.084259 - 0.0372008\cdot 48 = 1.30$.
The data set contains accurate data up to 26 hours only, as for observations 17 and 18 (at 48 hours
and 60 hours respectively) there was no direct eye-witness testimony available. Predicting the 3-MT
concentration after 26 hours may therefore not be advisable, even though x = 48 is within the range of the
x-values (5.5 hours to 60 hours).
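The following short sketch recomputes the correlation, the fitted line and the two predictions directly from the summary statistics quoted above ($n=18$, $\sum x = 337$, $\sum y = 42.98$, $\sum xy = 672.8$, $\sum x^2 = 9854.5$, $\sum y^2 = 109.7936$):

    import math

    n, Sx, Sy = 18, 337.0, 42.98
    Sxy, Sxx, Syy = 672.8, 9854.5, 109.7936

    sxy = Sxy - Sx * Sy / n          # corrected sum of cross-products
    sxx = Sxx - Sx**2 / n
    syy = Syy - Sy**2 / n

    r = sxy / math.sqrt(sxx * syy)                       # about -0.827
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)  # about -5.89
    beta = sxy / sxx                                     # about -0.0372
    alpha = Sy / n - beta * Sx / n                       # about  3.084

    print(r, t_stat)
    print(alpha, beta)
    print(alpha + beta * 24, alpha + beta * 48)          # predictions at 24 h and 48 h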
(d) The pivotal quantity is given by:
$$\frac{\hat{\beta}-\beta}{s.e.(\hat{\beta})}\sim t_{n-2}.$$
We have:
$$s.e.(\hat{\beta}) = \sqrt{\frac{\hat{\sigma}^2}{\sum x_i^2 - n\bar{x}^2}} = \sqrt{\frac{\hat{\sigma}^2}{9854.5 - 337^2/18}},$$
where
$$\hat{\sigma}^2 = \frac{1}{n-2}\left(\sum y_i^2 - \left(\sum y_i\right)^2/n - \frac{\left(\sum x_iy_i - \sum x_i\sum y_i/n\right)^2}{\sum x_i^2 - \left(\sum x_i\right)^2/n}\right)
= \frac{1}{16}\left(109.7936 - 42.98^2/18 - \frac{(672.8 - 337\cdot 42.98/18)^2}{9854.5 - 337^2/18}\right) = 0.1413014,$$
so that
$$s.e.(\hat{\beta}) = \sqrt{\frac{0.1413014}{9854.5 - 337^2/18}} = 0.00631331.$$
From Formulae and Tables page 163 we have $t_{16,1-0.005} = 2.921$.
Using the pivotal quantity, the 99% confidence interval of the slope is given by:
$$\hat{\beta} - t_{16,1-\alpha/2}\,s.e.(\hat{\beta}) < \beta < \hat{\beta} + t_{16,1-\alpha/2}\,s.e.(\hat{\beta})$$
$$-0.0372008 - 2.921\cdot 0.00631331 < \beta < -0.0372008 + 2.921\cdot 0.00631331$$
$$-0.0556 < \beta < -0.0188.$$
Thus the 99% confidence interval of $\beta$ is given by $(-0.0556, -0.0188)$.
Note that $\beta=0$ is not within the 99% confidence interval, therefore we would reject the null
hypothesis that $\beta$ equals zero and accept the alternative that $\beta\neq 0$ at a 1% level of significance.
This confirms the result in (b), where the correlation coefficient was shown to differ from zero at the
1% significance level.
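A brief numerical check of the standard error and of the 99% interval, using the same summary statistics and the tabulated value $t_{16,1-0.005}=2.921$:

    import math

    n, Sx, Sy = 18, 337.0, 42.98
    Sxy, Sxx, Syy = 672.8, 9854.5, 109.7936
    sxy, sxx, syy = Sxy - Sx * Sy / n, Sxx - Sx**2 / n, Syy - Sy**2 / n

    beta = sxy / sxx
    sigma2 = (syy - sxy**2 / sxx) / (n - 2)        # about 0.1413
    se_beta = math.sqrt(sigma2 / sxx)              # about 0.00631
    t = 2.921                                      # t_{16, 0.995} from the tables
    print(beta - t * se_beta, beta + t * se_beta)  # about (-0.0556, -0.0188)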
3. (a) 1. The least squares estimator of $\beta$ minimizes:
$$S(\beta) = \sum_{i=1}^{n}\left(y_i - \beta x_i\right)^2 = \sum_{i=1}^{n}y_i^2 + \beta^2\sum_{i=1}^{n}x_i^2 - 2\beta\sum_{i=1}^{n}y_ix_i.$$
Differentiating $S(\beta)$ with respect to $\beta$ and setting it equal to zero gives:
$$0 = \frac{\partial S(\beta)}{\partial\beta} = 2\left(\beta\sum_{i=1}^{n}x_i^2 - \sum_{i=1}^{n}y_ix_i\right).$$
Solving for $\beta$ we obtain the LSE estimator for $\beta$:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}y_ix_i}{\sum_{i=1}^{n}x_i^2}.$$

2. The mean value of $\hat{\beta}_1$ is given by:
$$E\left[\hat{\beta}_1\right] = E\left[\frac{\sum_{i=1}^{n}y_ix_i}{\sum_{i=1}^{n}x_i^2}\right]
\stackrel{*}{=} \frac{\sum_{i=1}^{n}E\left[y_i|x_i\right]x_i}{\sum_{i=1}^{n}x_i^2}
\stackrel{**}{=} \frac{\sum_{i=1}^{n}\beta x_i\cdot x_i}{\sum_{i=1}^{n}x_i^2} = \beta,$$
* using that $\hat{\beta}_1$ given a value of $x_i$ only depends on the value of $y_i$, hence the $E[y_i|x_i]$
with the condition, and ** using $E[y_i|x_i] = \beta x_i$.
For the variance we have:
$$\mathrm{Var}\left(\hat{\beta}_1\right) = \mathrm{Var}\left(\frac{\sum_{i=1}^{n}y_ix_i}{\sum_{i=1}^{n}x_i^2}\right)
= \frac{\sum_{i=1}^{n}x_i^2\,\mathrm{Var}\left(y_i|x_i\right)}{\left(\sum_{i=1}^{n}x_i^2\right)^2}
= \frac{\sigma^2}{\sum_{i=1}^{n}x_i^2}.$$

(b) 1. The expected value of the alternative estimator $\hat{\beta}_2 = \sum_{i=1}^{n}Y_i/\sum_{i=1}^{n}x_i$ is given by:
$$E\left[\hat{\beta}_2\right] = E\left[\frac{\sum_{i=1}^{n}Y_i}{\sum_{i=1}^{n}x_i}\right]
= \frac{\sum_{i=1}^{n}E\left[Y_i|x_i\right]}{\sum_{i=1}^{n}x_i}
= \frac{\beta\sum_{i=1}^{n}x_i}{\sum_{i=1}^{n}x_i} = \beta.$$
The variance of the estimator is given by:
$$\mathrm{Var}\left(\hat{\beta}_2\right) = \mathrm{Var}\left(\frac{\sum_{i=1}^{n}Y_i}{\sum_{i=1}^{n}x_i}\right)
= \frac{\sum_{i=1}^{n}\mathrm{Var}\left(Y_i|x_i\right)}{\left(\sum_{i=1}^{n}x_i\right)^2}
= \frac{n\sigma^2}{(n\bar{x})^2} = \frac{\sigma^2}{n\bar{x}^2}.$$
2. We need to prove $\mathrm{Var}(\hat{\beta}_2)\ge\mathrm{Var}(\hat{\beta}_1)$, which is equivalent to proving that $\mathrm{Var}(\hat{\beta}_2)-\mathrm{Var}(\hat{\beta}_1)\ge 0$:
$$\mathrm{Var}(\hat{\beta}_2)-\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{n\bar{x}^2} - \frac{\sigma^2}{\sum_{i=1}^{n}x_i^2}
= \sigma^2\left(\frac{1}{n\bar{x}^2} - \frac{1}{\sum_{i=1}^{n}x_i^2}\right)\ge 0,$$
which holds because
$$\sum_{i=1}^{n}x_i^2 - n\bar{x}^2 \stackrel{*}{=} \sum_{i=1}^{n}(x_i-\bar{x})^2 = (n-1)s_x^2\ge 0,$$
where $s_x^2$ is the sample variance of $X$, * using
$$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}\left(x_i^2 + \bar{x}^2 - 2x_i\bar{x}\right)
= \sum_{i=1}^{n}x_i^2 + n\bar{x}^2 - 2\bar{x}\sum_{i=1}^{n}x_i
= \sum_{i=1}^{n}x_i^2 + n\bar{x}^2 - 2n\bar{x}^2
= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2.$$
Thus the variance of the estimator $\hat{\beta}_2$ is at least as large as the variance of the least squares
estimator $\hat{\beta}_1$, and is strictly larger if there is variability in the values the $x_i$ can take.
(c) 1. Our estimator is now $\hat{\beta}_3 = \sum_{i=1}^{n}a_iY_i$. The mean of the estimator is:
$$E\left[\hat{\beta}_3\right] = E\left[\sum_{i=1}^{n}a_iY_i\right] = \sum_{i=1}^{n}a_iE\left[Y_i|x_i\right] = \sum_{i=1}^{n}a_i\beta x_i = \beta\sum_{i=1}^{n}a_ix_i.$$
Thus if $\hat{\beta}_3$ is unbiased we have $E[\hat{\beta}_3]=\beta$, which is only the case if $\sum_{i=1}^{n}a_ix_i = 1$.
The variance of the estimator is given by:
$$\mathrm{Var}\left(\hat{\beta}_3\right) = \mathrm{Var}\left(\sum_{i=1}^{n}a_iY_i\right) = \sum_{i=1}^{n}a_i^2\,\mathrm{Var}\left(Y_i|x_i\right) = \sum_{i=1}^{n}a_i^2\sigma^2.$$

2. For $\hat{\beta}_1$ we have:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}x_iY_i}{\sum_{i=1}^{n}x_i^2} = \sum_{i=1}^{n}\frac{x_i}{\sum_{j=1}^{n}x_j^2}\,Y_i,$$
hence $a_i = \frac{x_i}{\sum_{j=1}^{n}x_j^2}$ for $i=1,\ldots,n$. We need to verify the condition $\sum_{i=1}^{n}a_ix_i=1$:
$$\sum_{i=1}^{n}a_ix_i = \sum_{i=1}^{n}\frac{x_i}{\sum_{j=1}^{n}x_j^2}\,x_i = \frac{\sum_{i=1}^{n}x_i^2}{\sum_{j=1}^{n}x_j^2} = 1.$$
For $\hat{\beta}_2$ we have:
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n}Y_i}{\sum_{i=1}^{n}x_i} = \sum_{i=1}^{n}\frac{1}{\sum_{j=1}^{n}x_j}\,Y_i,$$
hence $a_i = \frac{1}{\sum_{j=1}^{n}x_j} = \frac{1}{n\bar{x}}$ for $i=1,\ldots,n$. We need to verify the condition $\sum_{i=1}^{n}a_ix_i=1$:
$$\sum_{i=1}^{n}a_ix_i = \sum_{i=1}^{n}\frac{x_i}{\sum_{j=1}^{n}x_j} = \frac{\sum_{i=1}^{n}x_i}{\sum_{j=1}^{n}x_j} = 1.$$

3. We have that $\hat{\beta}_3$ is the general notation of a linear estimator. The condition $\sum_{i=1}^{n}a_ix_i = 1$
implies that we only look at unbiased estimators. This means that the linear estimator
with $a_i = x_i/\sum_{j=1}^{n}x_j^2$, which is the least squares estimator, is the best (i.e., minimum variance)
linear unbiased estimator (BLUE).
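A small Monte Carlo sketch comparing the two estimators; the x values, $\beta$ and $\sigma$ below are made up purely for illustration, and the empirical variances should match the formulas derived above:

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])   # illustrative design points
    beta_true, sigma, reps = 2.0, 1.0, 100_000

    Y = beta_true * x + rng.normal(0.0, sigma, size=(reps, x.size))

    b1 = (Y * x).sum(axis=1) / np.sum(x**2)   # least squares estimator
    b2 = Y.sum(axis=1) / x.sum()              # ratio-of-totals estimator

    print(b1.var(), sigma**2 / np.sum(x**2))            # should be close
    print(b2.var(), sigma**2 / (x.size * x.mean()**2))  # should be close, and larger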
4. (a) The linear regression model is given by:
$$y_i = \alpha + \beta x_i + \varepsilon_i,$$
where the $\varepsilon_i\sim N(0,\sigma^2)$ are i.i.d. for $i=1,\ldots,n$.
The fitted linear regression equation is given by:
$$\hat{y} = \hat{\alpha} + \hat{\beta}x.$$
The estimated coefficients of the linear regression model are given by (see Formulae and Tables page 25):
$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{1122}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2} = \frac{1122}{60016 - 836^2/12} = \frac{1122}{1774.67} = 0.63223$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{\sum_{i=1}^{n}y_i}{n} - \hat{\beta}\,\frac{\sum_{i=1}^{n}x_i}{n} = \frac{867}{12} - 0.63223\cdot\frac{836}{12} = 28.205.$$
Thus, the fitted linear regression equation is given by:
$$\hat{y} = 28.205 + 0.63223\,x.$$

(b) The estimate for $\sigma^2$ is given by:
$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2
= \frac{1}{n-2}\left(\sum_{i=1}^{n}y_i^2 - n\bar{y}^2 - \frac{\left(\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right)^2}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2}\right)$$
$$= \frac{1}{10}\left(63603 - 867^2/12 - \frac{1122^2}{60016 - 836^2/12}\right) = 25.289.$$
We know the pivotal quantity:
$$\frac{(n-2)\,s^2}{\sigma^2}\sim\chi^2_{n-2}.$$
Note: we have $n-2$ degrees of freedom because we have to estimate two parameters from the data
($\hat{\alpha}$ and $\hat{\beta}$). We have that $s^2 = \hat{\sigma}^2$. Thus the 90% confidence interval is given by:
$$\frac{10\hat{\sigma}^2}{\chi^2_{0.95,10}} < \sigma^2 < \frac{10\hat{\sigma}^2}{\chi^2_{0.05,10}}$$
$$\frac{10\cdot 25.289}{18.3} < \sigma^2 < \frac{10\cdot 25.289}{3.94}$$
$$13.8 < \sigma^2 < 64.2.$$
Thus the 90% confidence interval of $\sigma^2$ is given by $(13.8, 64.2)$.
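A sketch of the same interval using scipy for the chi-square quantiles rather than the printed tables:

    from scipy import stats

    n, sigma2_hat = 12, 25.289
    lo = (n - 2) * sigma2_hat / stats.chi2.ppf(0.95, df=n - 2)
    hi = (n - 2) * sigma2_hat / stats.chi2.ppf(0.05, df=n - 2)
    print(lo, hi)   # roughly (13.8, 64.2)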


(c) i) We test the following:
$$H_0: \beta = 0 \quad\text{v.s.}\quad H_1: \beta > 0,$$
with a level of significance $\alpha = 0.05$.
ii) The test statistic is:
$$T = \frac{\hat{\beta}}{\sqrt{\hat{\sigma}^2/\sum_{i=1}^{n}(x_i-\bar{x})^2}}\sim t_{n-2}.$$
iii) The rejection region of the test is given by:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(t_{10,1-0.05},\infty\right)\right\} = \left\{(X_1,\ldots,X_n): T\in(1.812,\infty)\right\}.$$
iv) The value of the test statistic is given by:
$$T = \frac{0.63223 - 0}{\sqrt{25.289/\left(\sum_{i=1}^{n}x_i^2 - n\bar{x}^2\right)}} = \frac{0.63223 - 0}{\sqrt{25.289/(60016 - 836^2/12)}} = 5.296.$$
v) The value of the test statistic is in the rejection region, hence we reject the null hypothesis of
a zero slope.
(d) We have that
$$\frac{(y_i|x_i) - (\hat{y}|x_i)}{\sqrt{\widehat{\mathrm{Var}}(\hat{y}|x_i)}}\sim t_{n-2}.$$
The predicted value is given by:
$$\hat{y}|x_i = \hat{\alpha} + \hat{\beta}x_i = 28.205 + 0.63223\cdot 53 = 61.713.$$
The estimated variance at $x=53$ is given by:
$$\widehat{\mathrm{Var}}(\hat{y}|x_i=53) = \hat{\sigma}^2\left(\frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)
= \left(\frac{1}{12} + \frac{(53-836/12)^2}{60016 - 836^2/12}\right)\hat{\sigma}^2 = 6.0657.$$
Thus, the 95% confidence interval for the value of y given that $x=53$ is given by:
$$\hat{y} - t_{1-0.05/2}\sqrt{\widehat{\mathrm{Var}}(\hat{y}|x=53)} < y|x=53 < \hat{y} + t_{1-0.05/2}\sqrt{\widehat{\mathrm{Var}}(\hat{y}|x=53)}$$
$$61.713 - 2.228\sqrt{6.0657} < y|x=53 < 61.713 + 2.228\sqrt{6.0657}$$
$$56.2 < y|x=53 < 67.2.$$
Thus the 95% confidence interval of y given $x=53$ is $(56.2, 67.2)$.


(e) i) We test the following hypothesis:
$$H_0: \rho = 0.75 \quad\text{v.s.}\quad H_1: \rho\neq 0.75.$$
ii) The test statistic is given by:
$$T = \frac{Z_r - z_\rho}{\sqrt{\frac{1}{n-3}}}\sim N(0,1).$$
iii) The critical region is given by:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(-\infty,-z_{1-\alpha/2}\right)\cup\left(z_{1-\alpha/2},\infty\right)\right\}.$$
iv) The value of the test statistic is given by:
$$\frac{Z_r - z_\rho}{\sqrt{\frac{1}{9}}} = 3\left(z_r - z_\rho\right) = 3\,(1.2880 - 0.97296) = 0.94512,$$
where
$$z_r = \frac{1}{2}\log\left(\frac{1+r}{1-r}\right) = \frac{1}{2}\log\left(\frac{1+0.85860}{1-0.85860}\right) = 1.2880,$$
$$z_\rho = \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right) = \frac{1}{2}\log\left(\frac{1+0.75}{1-0.75}\right) = 0.97296,$$
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
= \frac{1122}{\sqrt{\left(\sum_{i=1}^{n}y_i^2-n\bar{y}^2\right)\left(\sum_{i=1}^{n}x_i^2-n\bar{x}^2\right)}}
= \frac{1122}{\sqrt{962.25\cdot 1774.667}} = 0.85860.$$
v) We have that $\Phi(0.95) = 0.82894$. Thus, the p-value is given by $2\cdot(1-0.82894) = 0.34212$. The
value of the test statistic is not in the critical region if the level of significance is lower than
0.34212 (which is normally the case). Hence, for reasonable values of the level of significance we
would not reject the null hypothesis.
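The Fisher-transform test can be reproduced numerically as follows, using only the quantities quoted above (the normal CDF is evaluated via the error function rather than the tables):

    import math

    n, r, rho0 = 12, 0.85860, 0.75
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    z_0 = 0.5 * math.log((1 + rho0) / (1 - rho0))
    T = (z_r - z_0) * math.sqrt(n - 3)

    Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))   # standard normal CDF
    p_value = 2 * (1 - Phi(abs(T)))
    print(T, p_value)   # about 0.945 and 0.34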
(f) The proportion of the variability explained by the model is given by:
$$R^2 = \frac{SSM}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}
= \frac{\left(\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right)^2}{\left(\sum_{i=1}^{n}y_i^2-n\bar{y}^2\right)\left(\sum_{i=1}^{n}x_i^2-n\bar{x}^2\right)}
= \frac{1122^2}{962.25\cdot 1774.667} = 0.737193.$$
Hence, a large proportion of the variability of Y is explained by X.


5. The completed ANOVA table is given below:

Source        D.F.   Sum of Squares       Mean Squares   F-Ratio
Regression       1   639.5-475.6=163.9           163.9   163.9/8.2=19.99
Error           58   8.2*58=475.6                  8.2
Total           59   639.5
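A one-line check of the completed table and of the significance of the regression (the p-value is computed here only for reference; it is not quoted in the original table):

    from scipy import stats

    sst, mse, df_err = 639.5, 8.2, 58
    sse = mse * df_err          # 475.6
    ssm = sst - sse             # 163.9
    F = ssm / mse               # about 19.99 on (1, 58) d.f.
    print(ssm, sse, F, stats.f.sf(F, 1, df_err))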

6. A simple linear regression problem:
(a) Since we know that $\hat{\beta} = r\,\frac{s_y}{s_x}$, then $r = \hat{\beta}\,\frac{s_x}{s_y} = 7.445\cdot(2.004/21.56) = 69.2\%$, where $s_x$, $s_y$ are
the sample standard deviations. Alternatively, you can use the fact that $R^2 = r^2$, so that from (d)
below, $r^2 = 0.4794 \Rightarrow r = +\sqrt{0.4794} = 69.2\%$. You take the positive square root because of the
positive sign of the coefficient of EPS.

(b) Given EPS = 2, we have:
$$\widehat{STKPRICE} = 25.044 + 7.445\cdot 2 = 39.934.$$
A 95% confidence interval of this estimate is given by:
$$\left(\hat{\alpha}+\hat{\beta}x_0\right)\pm t_{1-\alpha/2,n-2}\cdot s\cdot\sqrt{\frac{1}{n} + \frac{(\bar{x}-x_0)^2}{(n-1)s_x^2}}
= 39.934 \pm \underbrace{t_{1-0.025,46}}_{=2.012896}\cdot\sqrt{247}\cdot\sqrt{\frac{1}{48} + \frac{(2.338-2)^2}{(47)(2.004^2)}}$$
$$= 39.934 \pm 4.636 = (35.298,\ 44.570),$$
where $s_x^2$ is the sample variance of X.


(c) A 95% confidence interval for $\beta$ is:
$$\hat{\beta}\pm t_{1-\alpha/2,n-2}\,se\left(\hat{\beta}\right) = 7.445 \pm 2.0147\cdot\frac{\sqrt{247}}{2.004\sqrt{47}} = 7.445 \pm 2.305 = (5.14,\ 9.75).$$
(d) $s = \sqrt{247} = 15.716$ and $R^2 = \frac{SSM}{SST} = \frac{10475}{21851} = 47.94\%$.
(e) A scatter plot or diagram of the fitted values against the (standardised) residuals will provide us
an indication of the constancy of the variation in the errors.

(f) To test for the significance of the variable EPS, we test $H_0:\beta=0$ against $H_a:\beta\neq 0$. The test
statistic is:
$$t\left(\hat{\beta}\right) = \frac{\hat{\beta}}{se\left(\hat{\beta}\right)} = \frac{7.445}{1.144} = 6.508.$$
This is larger than $t_{1-\alpha/2,n-2} = 2.0147$ and therefore we reject the null. There is evidence to
support the fact that the EPS variable is a significant predictor of stock price.
(g) To test $H_0:\beta=24$ against $H_a:\beta>24$, the test statistic is given by:
$$t\left(\hat{\beta}\right) = \frac{\hat{\beta}-\beta_0}{se\left(\hat{\beta}\right)} = \frac{7.445-24}{1.144} = -14.47.$$
Thus, since this test statistic is smaller than $t_{1-\alpha,n-2} = t_{0.95,46} = 1.676$, we do not reject the null
hypothesis.
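The intervals and test statistics of this question can be reproduced from the quoted quantities; the sketch below uses $t_{0.975,46}\approx 2.0147$ throughout (the text uses 2.012896 in part (b)), so the last decimals may differ slightly:

    import math

    n, alpha_hat, beta_hat = 48, 25.044, 7.445
    s, sx, xbar, x0 = math.sqrt(247), 2.004, 2.338, 2.0
    t975 = 2.0147

    fit = alpha_hat + beta_hat * x0
    se_fit = s * math.sqrt(1 / n + (xbar - x0) ** 2 / ((n - 1) * sx ** 2))
    print(fit - t975 * se_fit, fit + t975 * se_fit)    # about (35.3, 44.6)

    se_beta = s / (sx * math.sqrt(n - 1))              # about 1.144
    print(beta_hat - t975 * se_beta, beta_hat + t975 * se_beta)   # about (5.14, 9.75)
    print(beta_hat / se_beta, (beta_hat - 24) / se_beta)          # about 6.51 and -14.5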
7. The grand total/sum is $\sum x = 2479+2619+2441+2677 = 10216$, so that the grand mean is $\bar{x} = 10216/40 = 255.4$. Also, $\sum x^2 = 617163+687467+597607+718973 = 2621210$. Therefore the total
sum of squares is:
$$SST = \sum\left(x-\bar{x}\right)^2 = \sum x^2 - N\bar{x}^2 = 2621210 - (40)(255.4)^2 = 12043.6.$$
The sum of squares between the regions is:
$$SSM = \sum_i n_i\left(\bar{x}_{i.}-\bar{x}\right)^2 = 10\left[(247.9-255.4)^2 + (261.9-255.4)^2 + (244.1-255.4)^2 + (267.7-255.4)^2\right] = 3774.8.$$
The difference gives the sum of squares within the regions:
$$SSE = SST - SSM = 12043.6 - 3774.8 = 8268.8.$$
The one-way ANOVA table is then summarised below:

ANOVA Table for the One-Way Layout
Source     d.f.   Sum of Squares   Mean Square   F-Statistic
Between       3           3774.8       1258.27   1258.27/229.69=5.478
Within       36           8268.8        229.69
Total        39          12043.6

Thus, to test the equality of the mean premiums across the regions, we test:
$$H_0: \tau_A = \tau_B = \tau_C = \tau_D = 0 \quad\text{(all treatment effects are zero; all variances are equal)}$$
against the alternative:
$$H_a: \text{at least one }\tau_i\text{ is not zero (all variances are equal)},$$
using the F-test. Since $F = 5.478 > F_{0.95}(3,36) = 2.9$ (approximately), we therefore reject $H_0$. There
is evidence to support a difference in the mean premiums across regions. The one-way ANOVA model
assumptions are as follows: each random variable $x_{ij}$ is observed according to the model
$$x_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad\text{for } i=1,\ldots,I,\text{ and } j=1,2,\ldots,n_i,$$
where $\varepsilon_{ij}$ refers to the random error in the $j$th observation of the $i$th treatment, which satisfies:
- $E[\varepsilon_{ij}] = 0$ and $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$ for all $i,j$;
- the $\varepsilon_{ij}$ are independent and normally distributed (normal errors),
and where $\mu$ is the overall mean and $\tau_i$ is the effect of the $i$th treatment, with:
$$\sum_{i=1}^{I}\tau_i = 0.$$
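The ANOVA table of this question follows directly from the four regional totals (10 policies per region) and the total sum of squares; a short check, including the F p-value for reference:

    from scipy import stats

    totals = [2479, 2619, 2441, 2677]        # regional totals, 10 observations each
    n_i, sum_sq = 10, 2621210
    N = n_i * len(totals)
    grand_mean = sum(totals) / N

    sst = sum_sq - N * grand_mean ** 2
    ssm = sum(n_i * (t / n_i - grand_mean) ** 2 for t in totals)
    sse = sst - ssm
    F = (ssm / 3) / (sse / (N - 4))
    print(sst, ssm, sse, F, stats.f.sf(F, 3, N - 4))   # F about 5.478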

8. Consider the one-way ANOVA model:
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad\text{for } i=1,\ldots,I \text{ and } j=1,\ldots,J,$$
where the error terms $\varepsilon_{ij}$ are i.i.d. normal random variables with mean 0 and common variance $\sigma^2$.
Since
$$Y_{ij}\sim N\left(\mu+\tau_i,\sigma^2\right),$$
the likelihood function is given by:
$$L\left(y_{ij};\mu,\tau_i,\sigma^2\right) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij}-\mu-\tau_i\right)^2\right),$$
where $N = I\cdot J$ is the grand total sample size. Now, take the log-likelihood and differentiate with
respect to each parameter:
$$\log L\left(y_{ij};\mu,\tau_i,\sigma^2\right) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij}-\mu-\tau_i\right)^2$$
and
$$\frac{\partial\log L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij}-\mu-\tau_i\right)
= \frac{1}{\sigma^2}\left(\sum_{i=1}^{I}\sum_{j=1}^{J}y_{ij} - IJ\mu - J\sum_{i=1}^{I}\tau_i\right) = 0$$
$$\frac{\partial\log L}{\partial\tau_k} = \frac{1}{\sigma^2}\sum_{j=1}^{J}\left(y_{kj}-\mu-\tau_k\right) = 0, \quad\text{for } k=1,2,\ldots,I$$
$$\frac{\partial\log L}{\partial\sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij}-\mu-\tau_i\right)^2 = 0.$$
Assuming $\sum_{i=1}^{I}\tau_i = 0$, which is a standard assumption in the one-way ANOVA model, we have from
the first equation:
$$\hat{\mu} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}y_{ij} = \bar{y}.$$
From the second equation, we have:
$$\hat{\tau}_k = \frac{1}{J}\sum_{j=1}^{J}y_{kj} - \bar{y} = \bar{y}_{k.} - \bar{y}$$
and from the last equation, we have the MLE for the variance of the error term:
$$\hat{\sigma}^2 = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij} - \bar{y} - (\bar{y}_{i.}-\bar{y})\right)^2
= \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J}\left(y_{ij} - \bar{y}_{i.}\right)^2.$$

9. For the one-way ANOVA model we have $Y_{ij}\sim N(\mu+\tau_i,\sigma^2)$, hence
$$f\left(y_{ij};\mu,\tau_i,\sigma\right) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y_{ij}-(\mu+\tau_i)}{\sigma}\right)^2\right).$$
The likelihood function can be written as:
$$L\left(y_{ij};\mu,\tau_i,\sigma\right) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{\sum_{i=1}^{I}n_i}\exp\left(-\frac{1}{2}\sum_{i=1}^{I}\sum_{j=1}^{n_i}\left(\frac{y_{ij}-(\mu+\tau_i)}{\sigma}\right)^2\right)$$
and the log-likelihood function is:
$$l\left(y_{ij};\mu,\tau_i,\sigma\right) = -\frac{1}{2}\left(\sum_{i=1}^{I}n_i\right)\log(2\pi) - \left(\sum_{i=1}^{I}n_i\right)\log(\sigma) - \frac{1}{2}\sum_{i=1}^{I}\sum_{j=1}^{n_i}\left(\frac{y_{ij}-(\mu+\tau_i)}{\sigma}\right)^2.$$
Taking the partial derivative of $l$ w.r.t. $\mu$ and equating to 0:
$$\frac{\partial l}{\partial\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{I}\sum_{j=1}^{n_i}\left(y_{ij}-(\mu+\tau_i)\right) = 0$$
$$\sum_{i=1}^{I}\sum_{j=1}^{n_i}y_{ij} - \mu\sum_{i=1}^{I}n_i - \underbrace{\sum_{i=1}^{I}n_i\tau_i}_{=0} = 0$$
$$\hat{\mu} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{n_i}y_{ij}}{N}.$$
Taking the partial derivative of $l$ w.r.t. $\tau_i$ and equating to 0:
$$\frac{\partial l}{\partial\tau_i} = \frac{1}{\sigma^2}\sum_{j=1}^{n_i}\left(y_{ij}-(\mu+\tau_i)\right) = 0$$
$$\sum_{j=1}^{n_i}y_{ij} - n_i\mu - n_i\tau_i = 0$$
$$\hat{\tau}_i = \frac{\sum_{j=1}^{n_i}y_{ij}}{n_i} - \hat{\mu}.$$


10. For a single observation, note that $L(y|\theta) = \theta y^{\theta-1}$. Hence
$$\frac{L(y|\theta=2)}{L(y|\theta=1)} = \frac{2y}{1} = 2y, \quad 0<y<1.$$
The form of the critical region of the best test is
$$2y < k,$$
or equivalently
$$y < \frac{k}{2} = c.$$
To find $c$, note that $\alpha = 0.05$ is specified and
$$0.05 = \Pr(y<c\,|\,\theta=2) = \int_0^c 2y\,\mathrm{d}y = c^2.$$
Therefore $c = \sqrt{0.05} = 0.2236$ and the rejection region of the best test is defined by:
$$y < 0.2236,$$
i.e. reject $H_0$ when $y < 0.2236$ for a 5% level of significance.
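A quick simulation check of the size of this test (and, for interest, its power against $\theta=1$); samples from the density $2y$ are drawn by inverting the CDF $y^2$:

    import numpy as np

    c = np.sqrt(0.05)
    rng = np.random.default_rng(2)
    y_h0 = np.sqrt(rng.uniform(0, 1, 200_000))   # theta = 2: F(y) = y^2, so y = sqrt(U)
    y_h1 = rng.uniform(0, 1, 200_000)            # theta = 1: uniform density on (0, 1)
    print(c, (y_h0 < c).mean())   # size of the test, close to 0.05
    print((y_h1 < c).mean())      # power against theta = 1, close to c = 0.2236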
11. (a) We have the estimated correlation coefficient:
$$r = \frac{s_{ms}}{\sqrt{s_{mm}s_{ss}}} = \frac{\sum ms - n\bar{m}\bar{s}}{\sqrt{\left(\sum m^2 - n\bar{m}^2\right)\left(\sum s^2 - n\bar{s}^2\right)}}
= \frac{221{,}022.58 - 1136.1\cdot 1934.2/10}{\sqrt{\left(129{,}853.03 - 1136.1^2/10\right)\left(377{,}700.62 - 1934.2^2/10\right)}} = 0.764.$$
i) The hypotheses are:
$$H_0: \rho = 0 \quad\text{v.s.}\quad H_1: \rho > 0.$$
ii) The test statistic is:
$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\sim t_{n-2}.$$
iii) The critical region is given by:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(t_{n-2,1-\alpha},\infty\right)\right\}.$$
iv) The value of the test statistic is:
$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.764\sqrt{10-2}}{\sqrt{1-0.764^2}} = 3.35.$$
v) We have $t_{8,1-0.005} = 3.355\approx 3.35$. Thus the p-value is approximately 0.005, and we reject the
null hypothesis of a zero correlation for any level of significance of at least 0.005 (usually the level
of significance is larger, so we reject the null).
(b) Given the issue of whether mortality can be used to predict sickness, we require a plot of sickness
against mortality:


[Figure: scatterplot of sickness rate s (roughly 160 to 230) against mortality rate m (roughly 100 to 130).]

There seems to be an increasing linear relationship, such that mortality could be used to predict
sickness.
(c) We have the estimates:
$$\hat{\beta} = \frac{s_{ms}}{s_{mm}} = \frac{\sum ms - n\bar{m}\bar{s}}{\sum m^2 - n\bar{m}^2}
= \frac{221{,}022.58 - 1136.1\cdot 1934.2/10}{129{,}853.03 - 1136.1^2/10} = 1.6371$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{1934.2}{10} - 1.6371\cdot\frac{1136.1}{10} = 7.426$$
$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2 = \frac{1}{n-2}\left(s_{ss} - \frac{s_{ms}^2}{s_{mm}}\right)
= \frac{1}{8}\left(\sum s^2 - n\bar{s}^2 - \frac{\left(\sum ms - n\bar{m}\bar{s}\right)^2}{\sum m^2 - n\bar{m}^2}\right)$$
$$= \frac{1}{8}\left(3587.656 - \frac{(1278.118)^2}{780.709}\right) = 186.902$$
$$\widehat{\mathrm{Var}}\left(\hat{\beta}\right) = \hat{\sigma}^2/s_{mm} = 186.902/780.709 = 0.2394.$$
i) Hypothesis:
$$H_0: \beta = 2 \quad\text{v.s.}\quad H_1: \beta < 2.$$
ii) Test statistic:
$$T = \frac{\hat{\beta}-\beta}{\sqrt{\hat{\sigma}^2/s_{xx}}}\sim t_{n-2}.$$
iii) Critical region:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(-\infty,\ -t_{n-2,1-\alpha}\right)\right\}.$$
iv) Value of statistic:
$$T = \frac{\hat{\beta}-2}{\sqrt{\hat{\sigma}^2/s_{xx}}} = \frac{1.6371-2}{\sqrt{0.2394}} = -0.74.$$
v) We have from Formulae and Tables page 163: $t_{8,1-0.25} = 0.7064$ and $t_{8,1-0.20} = 0.8889$. Thus
the p-value (using symmetry) is between 0.2 and 0.25. Thus, we accept the null hypothesis if the
level of significance is smaller than the p-value (which is usually the case). Note: the exact p-value
using a computer package is 0.2402.
(d) For a region with $m = 115$ we have the estimated value:
$$\hat{s} = 7.426 + 1.6371\cdot 115 = 195.69$$
with corresponding variance:
$$\hat{\sigma}^2\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{s_{mm}}\right)
= 186.902\left(\frac{1}{10} + \frac{(115-113.61)^2}{780.709}\right) = 19.1528.$$
The corresponding 95% confidence limits are $195.69 - t_{8,1-0.025}\cdot s.e.(s|m=115) = 195.69 - 2.306\sqrt{19.1528} = 185.60$ and $195.69 + t_{8,1-0.025}\cdot s.e.(s|m=115) = 195.69 + 2.306\sqrt{19.1528} = 205.78$.
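The whole of this question can be reproduced from the summary statistics quoted in part (a); a compact sketch:

    import math

    n = 10
    Sm, Ss = 1136.1, 1934.2
    Smm, Sss, Sms = 129853.03, 377700.62, 221022.58

    smm = Smm - Sm**2 / n
    sss = Sss - Ss**2 / n
    sms = Sms - Sm * Ss / n

    r = sms / math.sqrt(smm * sss)            # about 0.764
    beta = sms / smm                          # about 1.6371
    alpha = Ss / n - beta * Sm / n            # about 7.426
    sigma2 = (sss - sms**2 / smm) / (n - 2)   # about 186.9

    fit = alpha + beta * 115
    se_fit = math.sqrt(sigma2 * (1 / n + (115 - Sm / n) ** 2 / smm))
    print(r, beta, alpha, sigma2)
    print(fit - 2.306 * se_fit, fit + 2.306 * se_fit)   # about (185.6, 205.8)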
12. (a) 1. Points are shown in the scatterplot. [Figure: scatterplot of the number of deaths $n_i$ (roughly 0 to 40) against quarter $i$, $i=1,\ldots,12$.]

2. The mean number of deaths increases, at an increasing rate, with quarter. The variance also
appears to increase with quarter.
(b) 1. We have $q = \sum_{i=1}^{12}\left(n_i - \gamma i^2\right)^2$. Taking the derivative of q with respect to $\gamma$ and equating it
to zero we obtain:
$$\frac{\partial q}{\partial\gamma} = -2\sum_{i=1}^{12}i^2\left(n_i-\gamma i^2\right) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{12}n_ii^2 = \gamma\sum_{i=1}^{12}i^4
\quad\Longrightarrow\quad \hat{\gamma} = \frac{\sum_{i=1}^{12}n_ii^2}{\sum_{i=1}^{12}i^4}.$$
To prove that it is a minimum, we need to prove that $\frac{\partial^2 q}{\partial\gamma^2}>0$:
$$\frac{\partial^2 q}{\partial\gamma^2} = 2\sum_{i=1}^{12}i^4 > 0.$$

2. We have $q^* = \sum_{i=1}^{12}\left(n_i/i - \gamma i\right)^2 = \sum_{i=1}^{12}\frac{\left(n_i-\gamma i^2\right)^2}{i^2}$. Taking the derivative of $q^*$ with respect to $\gamma$
and equating it to zero we obtain:
$$\frac{\partial q^*}{\partial\gamma} = -2\sum_{i=1}^{12}i\left(n_i/i - \gamma i\right) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{12}n_i = \gamma\sum_{i=1}^{12}i^2
\quad\Longrightarrow\quad \tilde{\gamma} = \frac{\sum_{i=1}^{12}n_i}{\sum_{i=1}^{12}i^2}.$$
To prove that it is a minimum, we need to prove that $\frac{\partial^2 q^*}{\partial\gamma^2}>0$:
$$\frac{\partial^2 q^*}{\partial\gamma^2} = 2\sum_{i=1}^{12}i^2 > 0.$$
3. We have:
$$\hat{\gamma} = \frac{\sum_{i=1}^{12}n_ii^2}{\sum_{i=1}^{12}i^4} = \frac{15694}{60710} = 0.259,
\qquad \tilde{\gamma} = \frac{\sum_{i=1}^{12}n_i}{\sum_{i=1}^{12}i^2} = \frac{174}{650} = 0.268.$$
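A short check of these two estimates; the totals 15694 ($=\sum n_i i^2$) and 174 ($=\sum n_i$) are the values quoted above, while the denominators are recomputed:

    quarters = range(1, 13)
    sum_i2 = sum(i**2 for i in quarters)   # 650
    sum_i4 = sum(i**4 for i in quarters)   # 60710

    gamma_hat = 15694 / sum_i4     # least squares fit of n_i on i^2, about 0.259
    gamma_tilde = 174 / sum_i2     # least squares fit of n_i / i on i, about 0.268
    print(sum_i2, sum_i4, gamma_hat, gamma_tilde)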

(c) 1. We have $E[N_i] = \gamma i^{\delta}$. If we take the logarithm on both sides we obtain:
$$\log\left(E[N_i]\right) = \log(\gamma) + \delta\log(i).$$
Thus $\alpha = \log(\gamma)$ and $\beta = \delta$.
2. It is given that $\hat{\beta} = 1.6008$ and $s.e.(\hat{\beta}) = 0.2525$.
i) Hypothesis:
$$H_0: \beta = 2 \quad\text{v.s.}\quad H_1: \beta\neq 2.$$
ii) Test statistic:
$$T = \frac{\hat{\beta}-\beta}{s.e.(\hat{\beta})}\sim t_{n-2}.$$
iii) Critical region:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(-\infty,-t_{n-2,1-\alpha/2}\right)\cup\left(t_{n-2,1-\alpha/2},\infty\right)\right\}.$$
iv) Value of the test:
$$T = \frac{\hat{\beta}-2}{s.e.(\hat{\beta})} = \frac{1.6008-2}{0.2525} = -1.58.$$
v) From Formulae and Tables page 163 we obtain $t_{10,1-0.10} = 1.372$ and $t_{10,1-0.05} = 1.812$.
Thus the p-value of the hypothesis is between 0.1 and 0.2 (two-sided test!). For levels of
significance lower than 0.1 we accept the null hypothesis that $\beta = \delta = 2$, and thus this
assumption seems appropriate. Note: the exact p-value using a computer package is 0.1452.
13. (a) 1. We have:
$$SST = \sum y^2 - \left(\sum y\right)^2/n = 70.8744 - 29.12^2/16 = 17.8760$$
$$\sum x = 4\cdot(1+2+3+4) = 40, \qquad \sum x^2 = 4\cdot(1^2+2^2+3^2+4^2) = 120, \qquad s_{xx} = 120 - 40^2/16 = 20$$
$$\sum xy = 1\cdot 2.73 + 2\cdot 6.26 + 3\cdot 9.22 + 4\cdot 10.91 = 86.55$$
$$s_{xy} = \sum xy - \sum x\sum y/n = 86.55 - 40\cdot 29.12/16 = 13.75$$
$$SSM = \hat{\beta}_1^2\,s_{xx} = \left(\frac{13.75}{20}\right)^2\cdot 20 = \frac{13.75^2}{20} = 9.453125$$
$$SSE = SST - SSM = 17.8760 - 9.453125 = 8.422875.$$

2.
$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{13.75}{20} = 0.6875$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 29.12/16 - 0.6875\cdot 40/16 = 0.1012.$$
Thus, the fitted model is given by $\hat{y} = \hat{\alpha} + \hat{\beta}x = 0.1012 + 0.6875\,x$.
For $x=1$ we have: $\hat{y} = 0.1012 + 0.6875\cdot 1 = 0.7887$.
For $x=4$ we have: $\hat{y} = 0.1012 + 0.6875\cdot 4 = 2.8512$.
3. We have $s.e.(\hat{\beta}) = \sqrt{\frac{8.4229/14}{20}} = 0.1734$.
i) Hypothesis:
$$H_0: \beta = 0 \quad\text{v.s.}\quad H_1: \beta\neq 0.$$
ii) Test statistic:
$$T = \frac{\hat{\beta}-\beta}{s.e.(\hat{\beta})}\sim t_{n-2}.$$
iii) Critical region:
$$C = \left\{(X_1,\ldots,X_n): T\in\left(-\infty,-t_{n-2,1-\alpha/2}\right)\cup\left(t_{n-2,1-\alpha/2},\infty\right)\right\}.$$
iv) Value of statistic:
$$T = \frac{\hat{\beta}-0}{s.e.(\hat{\beta})} = \frac{0.6875-0}{0.1734} = 3.965.$$
v) We have $t_{14,1-0.001} = 3.787$ and $t_{14,1-0.0005} = 4.140$. Thus the p-value is between 0.1% and
0.2%. We would accept the null hypothesis only if the level of significance were lower than the p-value (which
is usually not the case). Hence, we have strong evidence against the no-linear-relationship
hypothesis. Note: the exact p-value using a computer package is 0.00070481.
(b) 1. We have:
$$SST = 17.8760$$
$$SSB = \left(2.73^2 + 6.26^2 + 9.22^2 + 10.91^2\right)/4 - 29.12^2/16 = 9.6709$$
$$SSR = SST - SSB = 17.8760 - 9.6709 = 8.2051.$$
2.
$$\hat{\mu} = 29.12/16 = 1.82$$
$$\hat{\tau}_1 = 2.73/4 - 1.82 = -1.1375$$
$$\hat{\tau}_2 = 6.26/4 - 1.82 = -0.255$$
$$\hat{\tau}_3 = 9.22/4 - 1.82 = 0.485$$
$$\hat{\tau}_4 = 10.91/4 - 1.82 = 0.9075$$
3. Company A: fitted value = 2.73/4 = 0.6825; Company D: fitted value = 10.91/4 = 2.7275.
4. The observed F statistic is $(9.6709/3)/(8.2051/12) = 4.715$ on (3,12) d.f.
5. From Formulae and Tables pages 173 and 174 we observe that $F_{3,12}(4.474) = 2.5\%$ and
$F_{3,12}(5.953) = 1\%$. Thus the p-value is between 0.01 and 0.025, so we have some evidence
against the no-company-effects hypothesis. Note: the exact p-value using a computer package is
0.0213.
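A check of the one-way ANOVA in part (b) from the four company totals (four policies each) and $\sum y^2 = 70.8744$:

    from scipy import stats

    totals = [2.73, 6.26, 9.22, 10.91]
    n_i, sum_y2 = 4, 70.8744
    N = n_i * len(totals)
    grand = sum(totals)

    sst = sum_y2 - grand**2 / N
    ssb = sum(t**2 for t in totals) / n_i - grand**2 / N
    ssr = sst - ssb
    F = (ssb / 3) / (ssr / (N - 4))
    print(sst, ssb, ssr, F, stats.f.sf(F, 3, N - 4))   # F about 4.715, p about 0.021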
-End of Week 10 Tutorial Solutions-

