STAT 3008: Applied Linear Regression

2014-15 Term 2
Assignment #3 Solutions
Problem 1:
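(a) Since $\hat\beta = (X'X)^{-1}X'Y$, $E(\hat\beta) = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X_2\beta$.

(b) Since
\[
X'X_2 = \begin{pmatrix} n & \sum x_i^2 \\ \sum x_i & \sum x_i^3 \end{pmatrix},
\]
\[
(X'X)^{-1}X'X_2
= \frac{1}{n\sum x_i^2 - (\sum x_i)^2}
\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}
\begin{pmatrix} n & \sum x_i^2 \\ \sum x_i & \sum x_i^3 \end{pmatrix}
= \begin{pmatrix}
1 & \dfrac{(\sum x_i^2)^2 - \sum x_i \sum x_i^3}{n\sum x_i^2 - (\sum x_i)^2} \\[2ex]
0 & \dfrac{n\sum x_i^3 - \sum x_i \sum x_i^2}{n\sum x_i^2 - (\sum x_i)^2}
\end{pmatrix}.
\]
Hence
\[
E(\hat\beta) = \begin{pmatrix}
\beta_0 + \beta_1 \dfrac{(\sum x_i^2)^2 - \sum x_i \sum x_i^3}{n\sum x_i^2 - (\sum x_i)^2} \\[2ex]
\beta_1 \dfrac{n\sum x_i^3 - \sum x_i \sum x_i^2}{n\sum x_i^2 - (\sum x_i)^2}
\end{pmatrix}.
\]

(c) Divide numerator and denominator by $n^2$ and write $\overline{x^k} = \frac{1}{n}\sum x_i^k$. As $n \to \infty$,
\[
E(\hat\beta_1) = \beta_1\,\frac{\overline{x^3} - \bar{x}\,\overline{x^2}}{\overline{x^2} - \bar{x}^2} \not\to \beta_1
\]
and
\[
E(\hat\beta_0) = \beta_0 + \beta_1\,\frac{(\overline{x^2})^2 - \bar{x}\,\overline{x^3}}{\overline{x^2} - \bar{x}^2} \not\to \beta_0 .
\]

Therefore $(\hat\beta_0, \hat\beta_1)$ is not a consistent estimator for $(\beta_0, \beta_1)$.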
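
The bias formula in parts (a) and (b) can be checked numerically. The sketch below assumes, as in the derivation, that the fitted model uses the columns (1, x_i) while the true mean uses (1, x_i^2). The x values and the names `A` and `closed` are made up for illustration.

```r
# Illustrative check of E(beta_hat) = (X'X)^{-1} X' X_2 beta (x values are hypothetical)
x  <- c(1, 2, 3, 4, 6)
n  <- length(x)
X  <- cbind(1, x)      # design matrix of the fitted model
X2 <- cbind(1, x^2)    # design matrix assumed for the true mean
A  <- solve(t(X) %*% X) %*% t(X) %*% X2

# Closed-form entries of the second column of A
D <- n * sum(x^2) - sum(x)^2
closed <- c((sum(x^2)^2 - sum(x) * sum(x^3)) / D,
            (n * sum(x^3) - sum(x) * sum(x^2)) / D)

print(A)
print(closed)
```

The first column of `A` is (1, 0)', so beta_0 itself passes through unchanged; the second column carries the terms that multiply beta_1 and produce the bias.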

Problem 2: (a)

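\[
X'X = \begin{pmatrix} n & 0 & 0 \\ 0 & SUU & 0 \\ 0 & 0 & SVV \end{pmatrix}.
\]
Hence
\[
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}
= (X'X)^{-1}X'Y
= \begin{pmatrix} n & 0 & 0 \\ 0 & SUU & 0 \\ 0 & 0 & SVV \end{pmatrix}^{-1}
\begin{pmatrix} \sum y_i \\ SUY \\ SVY \end{pmatrix}
= \begin{pmatrix} \bar{y} \\ SUY/SUU \\ SVY/SVV \end{pmatrix}.
\]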

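(b) Regressing Y on U alone gives
\[
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix}
= \begin{pmatrix} \bar{y} - (SUY/SUU)\,\bar{u} \\ SUY/SUU \end{pmatrix}
= \begin{pmatrix} \bar{y} \\ SUY/SUU \end{pmatrix},
\]
because $\bar{u} = 0$ (the off-diagonal entry $\sum u_i$ of $X'X$ is zero). Both estimates agree with those in part (a).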
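
The closed-form estimates can be verified on a small example. This is only a sketch: the vectors `u`, `v` and `y` are hypothetical and are chosen so that sum(u) = sum(v) = sum(u*v) = 0, which is what the diagonal X'X in part (a) requires.

```r
# Hypothetical centered, mutually orthogonal regressors
u <- c(-2, -1, 0, 1, 2)
v <- c(2, -1, -2, -1, 2)
y <- c(3.1, 4.0, 5.2, 5.9, 8.3)   # made-up responses

fit <- lm(y ~ u + v)

# ybar, SUY/SUU, SVY/SVV (SUY reduces to sum(u*y) because sum(u) = 0)
closed <- c(mean(y), sum(u * y) / sum(u^2), sum(v * y) / sum(v^2))

print(coef(fit))
print(closed)
print(coef(lm(y ~ u)))   # dropping v leaves the first two estimates unchanged
```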

Problem 3: [See R codings on the next page]

(a)
(b) $\hat{y} = 12.79653 + 0.20479\,x_i$
(c) $H_0: \beta_1 = 1$ vs $H_1: \beta_1 \neq 1$
T-statistic = -20.83884, p-value = 4.712817e-06
Decision: Since p-value < $\alpha = 0.05$, we reject $H_0$ at $\alpha = 0.05$.
Conclusion: We have sufficient evidence that perfect inheritance does not exist in
the diameters of sweet peas.
(d) Hypothesis: H0: The simple linear regression model is adequate
H1: The simple linear regression model is not adequate

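ANOVA table (values taken from the R output for Part (d)):
Source         df    SS          MS        F       p-value
Lack of fit      5   0.2095026   0.04190   1.975   0.0841
Pure error     191   4.0521974   0.02122
Total          196   4.2617000

Test Statistic:
\[
F_0 = \frac{SS_{lof}/df_{lof}}{(RSS_1 - SS_{lof})/(n - J)} = 1.975
\]

Decision: Since p-value = 0.0841 > $\alpha = 0.05$, we do not reject $H_0$ at $\alpha = 0.05$.

Conclusion: We do not have sufficient evidence that the simple linear regression model
is not adequate.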
### Problem #3 ###
x<-seq(21,15,-1); y<-c(17.26, 17.07, 16.37, 16.40, 16.13, 16.17, 15.98)
SD<-c(1.069, 1.042, 1.019, 1.095, 0.889, 0.857, 0.948)
### Part (a) ###
# x holds the parent diameters and y the average offspring diameters
plot(x,y, xlab="Parent Diameter", ylab="Average Offspring Diameter")
### Part (b) ###
# weighted least squares with weights 1/SD^2
fit0<-lm(y~x, weights=1/SD^2); summary(fit0)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.79653    0.68130  18.783 7.88e-06 ***
x            0.20479    0.03816   5.366  0.00302 **
Residual standard error: 0.2047 on 5 degrees of freedom
Multiple R-squared: 0.852
F-statistic: 28.79 on 1 and 5 DF, p-value: 0.003024
### Part (c) ###
t0<-(0.20479-1)/0.03816; pvalue.c<-2*pt(-abs(t0),5); c(t0, pvalue.c)
[1] -2.083884e+01 4.712817e-06
### Part (d) ###
# sumsq() is not defined in the original; it is assumed here to be a plain sum of squares
sumsq<-function(v) sum(v^2)
SS.lof<-sumsq(fit0$residuals); SS.total<-4.2617; df.lof<-5; df.total<-196
SS.pe<-SS.total-SS.lof; df.pe<-df.total-df.lof
c(SS.lof, SS.pe, SS.total, df.lof, df.pe, df.total)
[1] 0.2095026 4.0521974 4.2617000 5.0000000 191.0000000 196.0000000
F0<-(SS.lof/df.lof)/(SS.pe/df.pe); pvalue.d<-1-pf(F0, df.lof, df.pe); c(F0, pvalue.d)
[1] 1.97497758 0.08409666
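
The test in Part (c) can also be computed from the fitted object instead of retyping rounded numbers. The following is a minimal sketch using the same data; the names `fit`, `est`, `se` and `p` are chosen here for illustration.

```r
# Data as given in the solution
x  <- seq(21, 15, -1)
y  <- c(17.26, 17.07, 16.37, 16.40, 16.13, 16.17, 15.98)
SD <- c(1.069, 1.042, 1.019, 1.095, 0.889, 0.857, 0.948)
fit <- lm(y ~ x, weights = 1 / SD^2)

# Slope estimate and its standard error from the coefficient table
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]

# Two-sided t test of H0: beta1 = 1 on the residual degrees of freedom
t0 <- (est - 1) / se
p  <- 2 * pt(-abs(t0), df.residual(fit))
print(c(t0, p))
```

Reading the estimate and standard error from `coef(summary(fit))` avoids the small rounding error introduced by copying 0.20479 and 0.03816 by hand.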


Problem 4: (a) $\hat{Y} = 15952.1 + 409.9\,x + 244.5\,s + 4383.11\,U_2 + 8975.97\,U_3 - 1059.19\,U_2 s + 1582.95\,U_3 s$

fit0<-lm(Salary~Year+factor(Rank)+factor(Sex)+factor(Sex):factor(Rank),data=salary); summary(fit0)
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                 15952.10     855.91  18.638  < 2e-16 ***
factor(Rank)2                4383.11    1063.99
factor(Rank)3                8975.97    1133.16   7.921 4.49e-10 ***
factor(Sex)1                  244.50    1159.16   0.211 0.833894
Year                          409.90      78.21
factor(Rank)2:factor(Sex)1  -1059.19    2188.78  -0.484 0.630791
factor(Rank)3:factor(Sex)1   1582.95    1836.99   0.862 0.393417
Residual standard error: 2432 on 45 degrees of freedom
Multiple R-squared: 0.8509, Adjusted R-squared: 0.831

sum(fit0$res^2)
[1] 266244659

(b) Hypothesis: $H_0: \beta_1 = \beta_{12} = \beta_{13} = 0$ (the coefficients of $s$, $U_2 s$ and $U_3 s$) vs $H_1$: $H_0$ is false

Test Statistic:
\[
F_0 = \frac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2} = 0.6055375
\]

Decision: Since p-value = $\Pr(F_{3,45} > F_0) = 0.6148 > \alpha = 0.05$, we do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: We do not have sufficient evidence that any of the three sex-related coefficients
is different from 0.
fit1<-lm(Salary~factor(Rank)+Year,data=salary); summary(fit1); sum(fit1$res^2)
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   16203.27     638.68  25.370  < 2e-16 ***
factor(Rank)2  4262.28     882.89   4.828 1.45e-05 ***
factor(Rank)3  9454.52
Year            375.70      70.92   5.298 2.90e-06 ***
Residual standard error: 2402 on 48 degrees of freedom
Multiple R-squared: 0.8449, Adjusted R-squared: 0.8352
F-statistic: 87.15 on 3 and 48 DF, p-value: < 2.2e-16
[1] 276992734
F0<-(276992734-266244659)/3/(266244659/45); F0
[1] 0.6055375
1-pf(F0,3,45)
[1] 0.6148383
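
The partial F test in part (b) can be wrapped in a small function. This is a sketch only: `nested_f` is a hypothetical helper and is not part of the original solution. It is applied here to the residual sums of squares and degrees of freedom reported above.

```r
# Hypothetical helper: F test of a reduced model (1) against a full model (2)
nested_f <- function(rss1, df1, rss2, df2) {
  f0 <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
  p  <- pf(f0, df1 - df2, df2, lower.tail = FALSE)
  c(F0 = f0, p.value = p)
}

# Reduced model: RSS 276992734 on 48 df; full model: RSS 266244659 on 45 df
res <- nested_f(rss1 = 276992734, df1 = 48, rss2 = 266244659, df2 = 45)
print(res)
```

When both fitted objects are available, `anova(fit1, fit0)` reports the same comparison without copying the sums of squares by hand.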

(c) The results in part (b) suggest that sex is not a significant factor in determining the amount of salary,
and therefore we do not have sufficient evidence of sex discrimination in terms of amount
of salary.

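Problem 5: (a) $\hat\beta = (X'X)^{-1}X'Y = (62.74663,\ 10.54172,\ -1.394128)'$

(b) $RSS = Y'Y - Y'X(X'X)^{-1}X'Y = 4107.409$, $\hat\sigma^2 = RSS/(n-3) = 195.5909$, $\hat\sigma = 13.9854$
(c) Optimal $x = -\hat\beta_1/(2\hat\beta_2) = 3.780757$. $g'(\hat\beta) = \left(0,\ -1/(2\hat\beta_2),\ \hat\beta_1/(2\hat\beta_2^2)\right)' = (0,\ 0.358647,\ 2.71191)'$.
$\widehat{Var}(g(\hat\beta)) = \hat\sigma^2\, g'(\hat\beta)^T (X'X)^{-1} g'(\hat\beta) = 0.456518$. A 95% confidence interval for the optimal x is
$[3.780757 - 1.96\sqrt{0.456518},\ 3.780757 + 1.96\sqrt{0.456518}] = [2.456,\ 5.105]$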
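
The arithmetic in part (c) can be reproduced from the reported numbers. This is a sketch under stated assumptions: the coefficient of x^2 is taken as negative (the positive optimum and the gradient values in part (c) require it), and the delta-method variance 0.456518 is used as reported because X'X itself is not given here. The names `b1`, `b2`, `grad` and `ci` are illustrative.

```r
# Estimates reported in part (a); b2 is negative so that the optimum is a maximum
b1 <- 10.54172
b2 <- -1.394128

# Stationary point of b0 + b1*x + b2*x^2 and its gradient with respect to (b0, b1, b2)
x_opt <- -b1 / (2 * b2)
grad  <- c(0, -1 / (2 * b2), b1 / (2 * b2^2))

# Delta-method variance as reported in part (c)
var_hat <- 0.456518
ci <- x_opt + c(-1, 1) * 1.96 * sqrt(var_hat)

print(x_opt)
print(grad)
print(ci)
```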

(d) (Section 4.3: x-values should be scattered like a normal distribution in order to obtain a
balance between goodness-of-fit and locating the center.)
Compared with linear regression, quadratic regression should rely on more data points on
the two sides to provide better information about the curvature. The problem setup,
however, has only one data point in the middle, which makes it difficult to locate the optimal
value of x.
Suggestion: Allocate 1/4 to 1/3 of the data points to the middle of the x range [1,10], and
spread the rest evenly on the two sides.