
STAT 200 Final Exam Solutions

Summer 2008 (modified version)


Note: This modified version is slightly shorter than a standard 2.5-hour final exam.
Problem 1
(a)
Since 51 + 74 = 125 patients had a catheterization duration of 1 or 2 days,
and 5 + 10 = 15 of them got infections,
P(getting an infection given that the duration is 1 or 2 days) = 15/125 = 0.12

(b)
H0: infection and duration are independent vs. Ha: infection and duration are dependent
Table of expected counts, E = (row total × column total) / 266:

Duration (days)   1                    2                    3                    4                    Total
Infection         51×41/266 = 7.86     74×41/266 = 11.41    47×41/266 = 7.24     94×41/266 = 14.49    41
No infection      51×225/266 = 43.14   74×225/266 = 62.59   47×225/266 = 39.76   94×225/266 = 79.51   225
Total             51                   74                   47                   94                   266

Test statistic:
$$X^2 = \sum \frac{(O-E)^2}{E} = \frac{(5-7.86)^2}{7.86} + \frac{(10-11.41)^2}{11.41} + \frac{(8-7.24)^2}{7.24} + \frac{(18-14.49)^2}{14.49} + \frac{(46-43.14)^2}{43.14} + \frac{(64-62.59)^2}{62.59} + \frac{(39-39.76)^2}{39.76} + \frac{(76-79.51)^2}{79.51} = 2.536$$
df = (2 - 1)(4 - 1) = 3, and the critical value is $\chi^2_{3} = 11.34$ (the upper 1% point of the χ² distribution with 3 df).
Since X² = 2.536 < χ²₃ = 11.34, we fail to reject H0 and conclude that there is not
enough evidence to say that infection and duration of catheterization are associated.
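As a cross-check of part (b), here is a minimal Python sketch (standard library only) that rebuilds the expected counts from the observed table and sums the Pearson statistic. The standard library has no chi-square quantile function, so the critical value 11.34 is hard-coded from the solution; the variable names are illustrative. Because it does not round the expected counts, it gives X² ≈ 2.535 rather than the hand-computed 2.536.

```python
# Observed counts from Problem 1.
# Rows: infection, no infection; columns: duration 1, 2, 3, 4 days.
observed = [
    [5, 10, 8, 18],
    [46, 64, 39, 76],
]

row_totals = [sum(row) for row in observed]        # [41, 225]
col_totals = [sum(col) for col in zip(*observed)]  # [51, 74, 47, 94]
grand_total = sum(row_totals)                      # 266

# Expected count under H0 (independence): row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all 8 cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(4-1) = 3
critical_value = 11.34  # chi-square table value for df = 3, taken from the solution

print(f"X^2 = {chi_sq:.3f}, df = {df}")             # X^2 = 2.535, df = 3
print("reject H0" if chi_sq > critical_value else "fail to reject H0")
```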
(c)
The distribution of duration conditioned on having an infection is:

Duration (days)   1               2                3               4                Total
# Infection       5               10               8               18               41
Proportion        5/41 = 0.122    10/41 = 0.244    8/41 = 0.195    18/41 = 0.439    1.000

(d)
The marginal distribution of duration of catheterization is:

Duration (days)   1                2                3                4                Total
Total #           51               74               47               94               266
Proportion        51/266 = 0.192   74/266 = 0.278   47/266 = 0.177   94/266 = 0.353   1.000
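The proportions in parts (c) and (d) are counts divided by the relevant total. A short sketch with illustrative variable names, standard library only:

```python
infections = [5, 10, 8, 18]       # infected patients, duration 1-4 days
all_patients = [51, 74, 47, 94]   # all patients, duration 1-4 days

# (c) conditional distribution of duration, given infection
conditional = [count / sum(infections) for count in infections]
# (d) marginal distribution of duration
marginal = [count / sum(all_patients) for count in all_patients]

print([round(q, 3) for q in conditional])  # [0.122, 0.244, 0.195, 0.439]
print([round(q, 3) for q in marginal])     # [0.192, 0.278, 0.177, 0.353]
```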

Problem 2
(a)
Let μ1 be the true mean gene expression for a stem cell
Let μ2 be the true mean gene expression for a mesoderm cell
Let μ3 be the true mean gene expression for a neuronal cell
H0: μ1 = μ2 = μ3 vs. Ha: μi ≠ μj for some i ≠ j

Since N = 11 + 9 + 11 = 31 and I = 3, the grand mean is
$$\bar{y} = \frac{11(10.2) + 9(7.5) + 11(6.9)}{11 + 9 + 11} = 8.245$$
$$SSG = \sum_{i=1}^{I} n_i(\bar{y}_i - \bar{y})^2 = 11(10.2 - 8.245)^2 + 9(7.5 - 8.245)^2 + 11(6.9 - 8.245)^2 = 66.937$$
$$SSE = \sum_{i=1}^{I} (n_i - 1)s_i^2 = (11-1)(1.8)^2 + (9-1)(1.5)^2 + (11-1)(2.2)^2 = 98.8$$
$$MSG = \frac{SSG}{I-1} = \frac{66.937}{3-1} = 33.469, \qquad MSE = \frac{SSE}{N-I} = \frac{98.8}{31-3} = 3.529$$
$$F_0 = \frac{MSG}{MSE} = \frac{33.469}{3.529} = 9.48$$
and the critical value is $F_{0.05,\,2,\,28} = 3.34$.
Since F0 = 9.48 > F(0.05, 2, 28) = 3.34, we reject H0 and conclude, at the 5% significance
level, that the true mean gene expression differs for at least one of the cell types.
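The one-way ANOVA in part (a) works entirely from the group sizes, means, and standard deviations, so it can be reproduced in a few lines of Python. This sketch hard-codes the table value F(0.05; 2, 28) = 3.34 from the solution, since the standard library has no F quantile function; carrying full precision changes only the last reported digit of some intermediate values (for example, MSG prints as 33.468).

```python
# Summary statistics: stem, mesoderm, neuronal cells
n = [11, 9, 11]
means = [10.2, 7.5, 6.9]
sds = [1.8, 1.5, 2.2]

n_total = sum(n)     # N in the solution
num_groups = len(n)  # I in the solution
grand_mean = sum(ni * yi for ni, yi in zip(n, means)) / n_total

# Between-group (SSG) and within-group (SSE) sums of squares
ssg = sum(ni * (yi - grand_mean) ** 2 for ni, yi in zip(n, means))
sse = sum((ni - 1) * si ** 2 for ni, si in zip(n, sds))

msg = ssg / (num_groups - 1)
mse = sse / (n_total - num_groups)
f_stat = msg / mse

F_CRIT = 3.34  # F(0.05; 2, 28) table value, taken from the solution
print(f"grand mean = {grand_mean:.3f}, SSG = {ssg:.3f}, SSE = {sse:.1f}")
print(f"MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRIT else "fail to reject H0")
```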
(b)
If the conclusion from part (a) were incorrect, we would have made a Type I error:
rejecting H0 when H0 is actually true. In the context of this example, this means we
concluded that the true mean gene expression differs for at least one of the cell types,
when in reality the true mean gene expressions of all three cell types are equal.
Problem 3
(a)
Let X be the time (in seconds) the 1st runner takes to finish their share of the race.
μX = 10.5 seconds, σX = 0.35 seconds, and X ~ N(10.5, 0.35)
$$P(X < 10) = P\left(Z < \frac{10 - \mu_X}{\sigma_X}\right) = P\left(Z < \frac{10 - 10.5}{0.35}\right) = P(Z < -1.43) = 0.0764$$
(b)

P(at least one of the four runners finishes their share of the race in under 10 seconds)
= 1 - P(none of the four runners finishes their share of the race in under 10 seconds)
= 1 - [P(a runner does not finish their share of the race in under 10 seconds)]^4
= 1 - (1 - 0.0764)^4     [by independence, using the result from part (a)]
= 0.2723
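Parts (a) and (b) can be checked with `statistics.NormalDist` (Python 3.8+). This is a sketch assuming, as the solution does, that the runners' times are independent; because it uses the unrounded z = -0.5/0.35 instead of the table value at -1.43, the results may differ from the solution in the fourth decimal place.

```python
from statistics import NormalDist

# One runner's time in seconds: N(10.5, 0.35)
runner = NormalDist(mu=10.5, sigma=0.35)

# (a) P(X < 10) for one runner
p_under_10 = runner.cdf(10)

# (b) P(at least one of four independent runners is under 10 s)
p_at_least_one = 1 - (1 - p_under_10) ** 4

print(f"(a) {p_under_10:.4f}")
print(f"(b) {p_at_least_one:.4f}")
```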
(c)
Let X1, X2, X3, and X4 be the times the 1st, 2nd, 3rd, and 4th runners take to finish their shares of the race,
and let T = X1 + X2 + X3 + X4 be the total time in which the team finishes the race.
$$\mu_T = \mu_{X_1} + \mu_{X_2} + \mu_{X_3} + \mu_{X_4} = 10.5 + 10.5 + 10.5 + 10.5 = 42 \text{ seconds}$$
Since the runners' times are independent,
$$\sigma_T^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2 + \sigma_{X_4}^2 = (0.35)^2 + (0.35)^2 + (0.35)^2 + (0.35)^2 = 0.49 \text{ sec}^2, \qquad \sigma_T = \sqrt{0.49} = 0.7 \text{ seconds}$$
So, T ~ N(42, 0.7), and
$$P(T < 40) = P\left(Z < \frac{40 - \mu_T}{\sigma_T}\right) = P\left(Z < \frac{40 - 42}{0.7}\right) = P(Z < -2.86) = 0.0021$$
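For part (c), a sketch of the sum of independent normals: means add and, under the independence assumption, variances add, so the team's standard deviation is the square root of the summed variances rather than the sum of the standard deviations.

```python
import math
from statistics import NormalDist

mu_each, sigma_each, n_runners = 10.5, 0.35, 4

mu_total = n_runners * mu_each                        # means add: 42 s
sigma_total = math.sqrt(n_runners * sigma_each ** 2)  # variances add (independence): 0.7 s

team = NormalDist(mu=mu_total, sigma=sigma_total)
p_under_40 = team.cdf(40)
print(f"mean = {mu_total}, sd = {sigma_total:.2f}, P(T < 40) = {p_under_40:.4f}")
```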

Problem 4
(a)
Given n1 = 30, x̄1 = 45.8, s1 = 1.2, and n2 = 30, x̄2 = 46.2, s2 = 1.1.
Assuming equal variances,
$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(30-1)(1.2)^2 + (30-1)(1.1)^2}{(30-1) + (30-1)} = 1.325$$
$$s_{\bar{X}_1 - \bar{X}_2} = s_{\text{pooled}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{1.325}\,\sqrt{\frac{1}{30} + \frac{1}{30}} = 0.2972$$
df = n1 + n2 - 2 = 30 + 30 - 2 = 58, and t*58 ≈ t*60 = 1.671 (using the df = 60 row of the t table).
$$(\bar{x}_1 - \bar{x}_2) \pm t^*_{df}\, s_{\bar{X}_1 - \bar{X}_2} = (45.8 - 46.2) \pm 1.671 \times 0.2972 = -0.4 \pm 0.497 = (-0.897,\ 0.097)$$
Therefore, we are 90% confident that the true mean hardness reading from instrument 1 is
between 0.897 lower and 0.097 higher than the true mean hardness reading from instrument 2.
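The pooled two-sample interval in part (a) can be reproduced as below. A minimal sketch: the critical value t* = 1.671 is copied from the solution's table lookup rather than computed, since Python's standard library has no t distribution.

```python
import math

n1, xbar1, s1 = 30, 45.8, 1.2
n2, xbar2, s2 = 30, 46.2, 1.1

# Pooled variance (assumes equal population variances)
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1))
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)  # SE of xbar1 - xbar2

t_star = 1.671  # 90% critical value from the df = 60 table row, as in the solution
diff = xbar1 - xbar2
margin = t_star * se
ci = (diff - margin, diff + margin)
print(f"s_p^2 = {sp2:.3f}, SE = {se:.4f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```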

(b)
Based on this interval, we would fail to reject H0: μ1 = μ2 against the two-sided alternative
at the 10% significance level, because zero lies within the 90% confidence interval.
(c)

H0: μ1 = μ2 vs. Ha: μ1 ≠ μ2

Assuming equal variances, from part (a), the standard error is 0.2972, so the test statistic is
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{45.8 - 46.2}{0.2972} = -1.346$$
df = n1 + n2 - 2 = 30 + 30 - 2 = 58, and t*58 ≈ t*60 = 1.671
$$\text{p-value} = 2P(T \ge |t_0|) = 2P(T \ge |-1.346|) = 2P(T \ge 1.346), \quad 0.10 < \text{p-value} < 0.20$$
Since p-value > α = 0.10, we fail to reject H0 and conclude that there is not enough
evidence, at the 10% significance level, to say that the true mean hardness readings from
the two instruments differ.
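The test statistic in part (c) reuses the pooled standard error from part (a). The sketch below recomputes it and applies the equivalent critical-value rule (reject when |t0| > 1.671 for a two-sided test at α = 0.10); it does not compute the exact p-value, which would need a t distribution outside the standard library.

```python
import math

n1, xbar1, s1 = 30, 45.8, 1.2
n2, xbar2, s2 = 30, 46.2, 1.1

sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t0 = (xbar1 - xbar2) / se
t_crit = 1.671  # two-sided alpha = 0.10; df = 58 approximated by the df = 60 row
print(f"t0 = {t0:.3f}")  # t0 = -1.346
print("reject H0" if abs(t0) > t_crit else "fail to reject H0")
```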

Problem 5
(a)
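n = 32, x̄ = 6.15 hours, s = 45 min = 0.75 hour
df = n - 1 = 32 - 1 = 31, and t*31 ≈ t*30 = 2.457 (using the df = 30 row of the t table)
$$\bar{x} \pm t^*_{31}\frac{s}{\sqrt{n}} = 6.15 \pm 2.457 \times \frac{0.75}{\sqrt{32}} = 6.15 \pm 0.326 = (5.824,\ 6.476)$$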

Therefore, we are 98% confident that the true mean lifetime of a fully charged battery is
between 5.824 hours and 6.476 hours.
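Part (a) is a one-sample t interval. A sketch with the table value t* = 2.457 (df = 30 row) hard-coded from the solution:

```python
import math

n, xbar, s = 32, 6.15, 0.75  # s = 45 min expressed in hours
t_star = 2.457               # 98% critical value, df = 30 table row (from the solution)

se = s / math.sqrt(n)        # standard error of the mean
margin = t_star * se
print(f"margin = {margin:.3f}, 98% CI = ({xbar - margin:.3f}, {xbar + margin:.3f})")
# margin = 0.326, 98% CI = (5.824, 6.476)
```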
(b)
Margin of error m = 10 min, s = 45 min, z* = 2.326
$$n = \left(\frac{z^* s}{m}\right)^2 = \left(\frac{2.326 \times 45}{10}\right)^2 = 109.558$$
Rounding up, the sample size should be n = 110 iPods.
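The sample-size formula in part (b) always rounds up, since rounding down would give a margin of error slightly larger than 10 minutes. A sketch, treating s = 45 min as the planning value for σ as the solution does:

```python
import math

z_star = 2.326  # z critical value for 98% confidence
sigma = 45      # minutes; s used as the planning value for sigma
m = 10          # desired margin of error, minutes

n_exact = (z_star * sigma / m) ** 2
n_required = math.ceil(n_exact)  # round up so the margin is at most m
print(f"n = {n_exact:.3f}, round up to {n_required}")  # n = 109.558, round up to 110
```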


(c)
H0: μ = 6 hours vs. Ha: μ > 6 hours
$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{6.15 - 6}{0.75/\sqrt{32}} = 1.131$$
df = n - 1 = 32 - 1 = 31 (using the df = 30 row of the t table)
$$\text{p-value} = P(T_{31} \ge 1.131) \approx P(T_{30} \ge 1.131), \quad 0.10 < \text{p-value} < 0.15$$
Since p-value > α = 0.05, we fail to reject H0 and conclude that there is not enough
evidence, at the 5% significance level, to say that the true mean lifetime of a fully
charged battery is greater than 6 hours.
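The one-sided test statistic in part (c), sketched in Python. The comparison value 1.697 is the upper 5% point of the t distribution with 30 df; it is an addition here (the solution brackets the p-value instead), and both approaches lead to the same decision.

```python
import math

n, xbar, s, mu0 = 32, 6.15, 0.75, 6.0

t0 = (xbar - mu0) / (s / math.sqrt(n))  # one-sample t statistic
t_crit = 1.697  # upper 5% point of t with 30 df (table row used for df = 31)
print(f"t0 = {t0:.3f}, df = {n - 1}")   # t0 = 1.131, df = 31
print("reject H0" if t0 > t_crit else "fail to reject H0")
```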
Problem 6
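Let X = # of hours per month working as a barista, μX = 40 hours, σX = 10 hours.
Let a = $9/hour.
$$\mu_{aX} = a\mu_X = \$9/\text{hour} \times 40\text{ hours} = \$360, \qquad \sigma_{aX} = a\sigma_X = \$9/\text{hour} \times 10\text{ hours} = \$90$$
Let Y = # of hours per month working as a tutor, μY = 15 hours, σY = 3 hours.
Let b = $25/hour.
$$\mu_{bY} = b\mu_Y = \$25/\text{hour} \times 15\text{ hours} = \$375, \qquad \sigma_{bY} = b\sigma_Y = \$25/\text{hour} \times 3\text{ hours} = \$75$$
Let E = aX + bY be the total earnings for a month.
$$\mu_E = \mu_{aX} + \mu_{bY} = \$360 + \$375 = \$735$$
Assuming X and Y are independent,
$$\sigma_E^2 = \sigma_{aX}^2 + \sigma_{bY}^2 = 90^2 + 75^2 = 13725\ (\$^2), \qquad \sigma_E = \sqrt{13725} = \$117.1537$$
Assuming also that X and Y are normally distributed, E ~ N(735, 117.1537), so
$$P(E > 850) = P\left(Z > \frac{850 - \mu_E}{\sigma_E}\right) = P\left(Z > \frac{850 - 735}{117.1537}\right) = P(Z > 0.98) = 1 - P(Z \le 0.98) = 1 - 0.8365 = 0.1635$$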
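Problem 6 combines the rules for scaling and adding random variables. This sketch assumes, as the calculation above does, that X and Y are independent and normal, and uses `statistics.NormalDist` (Python 3.8+); with the unrounded z ≈ 0.9816 the tail probability comes out at about 0.163, close to the table-based 0.1635.

```python
import math
from statistics import NormalDist

a, mu_x, sd_x = 9, 40, 10   # barista: $/hour, mean hours, SD of hours
b, mu_y, sd_y = 25, 15, 3   # tutor: $/hour, mean hours, SD of hours

mu_e = a * mu_x + b * mu_y                           # 360 + 375 = 735
sd_e = math.sqrt((a * sd_x) ** 2 + (b * sd_y) ** 2)  # sqrt(90^2 + 75^2); assumes independence

earnings = NormalDist(mu=mu_e, sigma=sd_e)           # assumes X and Y are normal
p_over_850 = 1 - earnings.cdf(850)
print(f"mean = {mu_e}, sd = {sd_e:.4f}, P(E > 850) = {p_over_850:.4f}")
```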
Problem 7
(a)
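Let X = # of correct answers, n = 10, p = 1/4, and X ~ Bin(10, 1/4)
$$P(X \ge 3) = 1 - P(X=0) - P(X=1) - P(X=2) = 1 - \binom{10}{0}\left(\tfrac{1}{4}\right)^0\left(\tfrac{3}{4}\right)^{10} - \binom{10}{1}\left(\tfrac{1}{4}\right)^1\left(\tfrac{3}{4}\right)^{9} - \binom{10}{2}\left(\tfrac{1}{4}\right)^2\left(\tfrac{3}{4}\right)^{8}$$
$$= 1 - 0.0563 - 0.1877 - 0.2816 = 0.4744$$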
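The exact binomial calculation in part (a), sketched with `math.comb` (Python 3.8+):

```python
from math import comb

n, p = 10, 0.25  # 10 questions, 1/4 chance of guessing each correctly

# P(X = 0), P(X = 1), P(X = 2) from the binomial formula
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(3)]
p_at_least_3 = 1 - sum(pmf)

print([round(x, 4) for x in pmf])  # [0.0563, 0.1877, 0.2816]
print(round(p_at_least_3, 4))      # 0.4744
```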
(b)
Let X = # of correct answers, n = 100, p = 1/4, and X ~ Bin(100, 1/4).
Since np = 100 × 1/4 = 25 > 10 and n(1 - p) = 100 × (1 - 1/4) = 75 > 10, we can use the normal
approximation to the binomial:
μX = np = 25, σX = √(np(1 - p)) = √(100 × 1/4 × (1 - 1/4)) = 4.33, and X is approximately N(25, 4.33).
With the continuity correction,
$$P(X \ge 30) = P(X \ge 29.5) \approx P\left(Z \ge \frac{29.5 - 25}{4.33}\right) = P(Z \ge 1.04) = 1 - P(Z < 1.04) = 1 - 0.8508 = 0.1492$$
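A sketch of the normal approximation with continuity correction from part (b), using `statistics.NormalDist` and `math.comb` (Python 3.8+). The exact binomial tail is computed only as a sanity check on the approximation; it is not part of the original solution.

```python
import math
from statistics import NormalDist

n, p = 100, 0.25
mu = n * p                          # 25
sigma = math.sqrt(n * p * (1 - p))  # about 4.33

# Continuity correction: P(X >= 30) for discrete X is approximated by P(Y > 29.5), Y normal
approx = 1 - NormalDist(mu=mu, sigma=sigma).cdf(29.5)

# Exact binomial tail, used only as a sanity check on the approximation
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(30, n + 1))

print(f"normal approximation = {approx:.4f}, exact binomial = {exact:.4f}")
```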

(c)
They are neither independent nor disjoint.
By definition, two events are independent if the occurrence of one event gives no
information about whether the other event will occur; that is, the events have no
influence on each other. In this case, however, if we know a student has obtained at least
35 correct answers (event B occurs), then we are sure that the student has obtained at least 30
correct answers (event A must also occur). So P(A | B) = 1, whereas P(A) < 1.
Thus, A and B are not independent.

By definition, two events are disjoint if it is impossible for them to occur together.
However, if a student gets 35 answers correct, then events A and B both occur.
Thus, A and B are not disjoint.
Problem 8
(a)
Let X = midterm grade, Y = final exam grade.
Given x̄ = 80%, sx = 16%, ȳ = 73%, sy = 12%, and r = 0.63,
$$b_1 = r\frac{s_y}{s_x} = 0.63 \times \frac{12}{16} = 0.4725, \qquad b_0 = \bar{y} - b_1\bar{x} = 73 - 0.4725 \times 80 = 35.2\%$$
So ŷ = b0 + b1x = 35.2 + 0.4725x.
When x = 65%, ŷ = 35.2 + 0.4725 × 65 = 65.9125%.
That is, we predict a final exam grade of 65.9125% for a student who scored 65% on the
midterm.
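The least-squares line in part (a) comes from the summary statistics alone (b1 = r·sy/sx, b0 = ȳ - b1·x̄). A sketch with illustrative variable names:

```python
x_bar, s_x = 80, 16  # midterm mean and SD (%)
y_bar, s_y = 73, 12  # final exam mean and SD (%)
r = 0.63             # correlation

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # intercept
y_hat = b0 + b1 * 65     # predicted final grade for a 65% midterm

print(f"y_hat = {b0:.1f} + {b1:.4f}x")       # y_hat = 35.2 + 0.4725x
print(f"prediction at x = 65: {y_hat:.4f}")  # prediction at x = 65: 65.9125
```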
(b)
For every 1 percentage point increase in the midterm grade, we expect a 0.4725 percentage
point increase in the student's final exam grade.
(c)
This new observation would decrease the correlation: because it lies far from the
regression line, it increases the scatter of the points about the line.
(d)
Let Y = final exam grade, μY = 73%, σY = 12%, and Y ~ N(73, 12).
The first quartile satisfies P(Z ≤ z) = 0.25, so z = -0.67.
$$z = \frac{y - \mu_Y}{\sigma_Y} \;\Rightarrow\; y = \mu_Y + z\,\sigma_Y = 73\% + (-0.67)(12\%) = 64.96\%$$
So, the first quartile of the final exam grades is 64.96%.
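Part (d) inverts the normal CDF. A sketch with `statistics.NormalDist.inv_cdf` (Python 3.8+), showing both the table-based answer and the one from the unrounded z ≈ -0.674, which gives a first quartile of about 64.91%:

```python
from statistics import NormalDist

grades = NormalDist(mu=73, sigma=12)  # final exam grades, in %

q1_exact = grades.inv_cdf(0.25)  # unrounded z, about -0.674
q1_table = 73 + (-0.67) * 12     # table z = -0.67, as in the solution

print(f"Q1 (table z) = {q1_table:.2f}")  # Q1 (table z) = 64.96
print(f"Q1 (exact z) = {q1_exact:.2f}")  # Q1 (exact z) = 64.91
```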
Problem 9
(a)
The response variable is students' scores on a reading test.
(b)
The factors are teaching methods and teachers.
(c)
A
(d)
A

Problem 10
(a)
B
(b)
D
(c)
C
