# UNIVERSITY OF CAPE TOWN

## DEPARTMENT OF STATISTICAL SCIENCES

Test 2
Internal examiners: Hannah Gerber
Date: 21 April 2011
Time: 1 hour 30 minutes
Total number of questions: 1
Total number of pages: 13 (6 + 1 + 6)
Total marks: 50
Instructions: Answer all questions in the answer book(s) provided. The appropriate tables
and formulae have been provided.
Question 1
[50]
Data was gathered about advertising expenditure and sales for 44 outlets in South Africa.
The first variable, $X_1$, was related to the on-site advertising expenditure (i.e. at the outlet,
using for example window displays), while $X_2$ and $X_3$ were related to advertising via the local
and national media respectively. The variables $X_1$, $X_2$ and $X_3$ were all measured in tens of
thousands of Rands. The response variable corresponded to sales (in thousands of cases)
over the last month. Provided below is some output obtained from Excel. Table 1 is the
correlation matrix, Tables 2 and 3 relate to building the regression model using backward
selection ( ), while Tables 4 to 6 correspond to output using the forward selection
method ( ).
Figures 1 and 2 are the normal probability and residual plots from the model using the backward
selection method. The output follows after the questions.
a) With respect to the model constructed using the backward selection method (Tables
1, 2 and 3):
i) State the resulting regression model.
(1)
ii) State and interpret the coefficient of determination associated with the full
model (Table 2).
(2)
iii) What does the correlation coefficient associated with a multiple regression
model measure?
(1)
iv) Comment on the validity of the full model (Table 2) and perform the
(4)
v) Consider the full model (Table 2). Using p-values, state which (if any) of the
variables are significant in the model at the 5% level of significance. Be sure
to mention how you identify a significant variable, using p-values.
(2)
vi) What could your observations in (a)(iv) and (a)(v) be an indication of and
how would you check for it?
(2)
vii) State which variable was removed from the full regression model and explain
why this variable was removed based on the variable selection method in use.
(3)

b) With respect to the model constructed using the forward selection method (Tables 1,
2 and 4 to 6):
i) State the regression model resulting from the forward selection procedure as
well as why this particular model was selected. Note that the full model can
be found in Table 2.
(2)
ii) Identify the first variable to enter the model and state how this variable was
selected.
(2)
iii) Calculate the missing values, A-F, in Table 4.
(6)
iv) Consider Table 5. Calculate the 95% confidence interval for the National
media variable. What can you infer from this confidence interval?
(5)
v) Consider Table 5. State and interpret the Local media variable's coefficient.
(2)
vi) If 

was chosen, rather than 
would the regression model
(3)
c) Comparing the forward and backward selection models:
i) State if the models stated in (a)(i) and (b)(i) are the same or if they differ and
if they differ give a possible explanation.
(3)
ii) If the models differ, which would you prefer? Justify your answer.
(2)
d) Regarding the regression assumptions:
i) State the assumptions associated with performing the multiple regression
analysis.
(4)
ii) If the regression assumptions are satisfied, what should you observe on the
given graphs (Figures 1 and 2)?
(2)
iii) Are any concerns raised in Figures 1 and 2? If so, specify and state what
features of the graph gave rise to your concerns.
(4)

| | Sales | On-site | Local media | National media |
|---|---|---|---|---|
| Sales | | | | |
| On-site | 0.8417 | | | |
| Local media | 0.8424 | 0.9744 | | |
| National media | 0.4740 | 0.3759 | 0.4099 | |

## Table 1: Correlation matrix

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8612 |
| R Square | 0.7416 |
| Adjusted R Square | 0.7222 |
| Standard Error | 1.8254 |
| Observations | 44 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 382.6588 | 127.5529 | 38.2793 | 7.82E-12 |
| Residual | 40 | 133.2863 | 3.3321 | | |
| Total | 43 | 515.9451 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 1.0232 | 1.2028 | 0.8506 | 0.4000 |
| On-site | 0.9656 | 0.7092 | 1.3616 | 0.1809 |
| Local media | 0.6291 | 0.7783 | 0.8083 | 0.4236 |
| National media | 0.6760 | 0.3557 | 1.9003 | 0.0646 |

## Table 2: Backward selection - Step 1 (full model)
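The entries of an Excel regression summary are linked by simple identities, so a table such as Table 2 can be checked for internal consistency. The sketch below is illustrative only: it copies the sums of squares and degrees of freedom from Table 2 and recomputes the derived quantities using the definitions on the formula sheet. The variable names are not part of the original output, and because the printed values are rounded, agreement is only expected to about three decimal places.

```python
import math

# Values copied from Table 2 (full model: k = 3 predictors, n = 44 outlets).
n, k = 44, 3
ss_regression = 382.6588
ss_residual = 133.2863
ss_total = 515.9451

# Degrees of freedom and mean squares.
df_regression = k
df_residual = n - k - 1
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Derived summary statistics.
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / (n - 1))
standard_error = math.sqrt(ms_residual)
f_statistic = ms_regression / ms_residual

print(f"R Square          ~ {r_squared:.3f}")
print(f"Adjusted R Square ~ {adj_r_squared:.3f}")
print(f"Standard Error    ~ {standard_error:.3f}")
print(f"F                 ~ {f_statistic:.3f}")
```

The same identities apply to every summary table in this paper; only `k` and the sums of squares change.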

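SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8587 |
| R Square | 0.7374 |
| Adjusted R Square | 0.7246 |
| Standard Error | 1.8176 |
| Observations | 44 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 380.4813 | 190.2406 | 57.5789 | 1.24E-12 |
| Residual | 41 | 135.4638 | 3.3039 | | |
| Total | 43 | 515.9451 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 1.0172 | 1.1977 | 0.8493 | 0.4006 |
| On-site | 1.5221 | 0.1701 | 8.9478 | 3.45E-11 |
| National media | 0.7362 | 0.3463 | 2.1254 | 0.0396 |

## Table 3: Backward selection - Step 2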


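SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | |
| R Square | 0.7097 |
| Adjusted R Square | B |
| Standard Error | |
| Observations | 44 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 366.2080 | 366.2080 | | 7.51E-13 |
| Residual | | 149.7371 | 3.5651 | | |
| Total | 43 | 515.9451 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.8315 | 0.6989 | | 0.0002 |
| Local media | 1.7926 | 0.1768 | 10.1350 | 7.51E-13 |

## Table 4: Forward selection - Step 1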

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8542 |
| R Square | 0.7296 |
| Adjusted R Square | 0.7165 |
| Standard Error | 1.8443 |
| Observations | 44 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 376.4809 | 188.2405 | 55.3393 | 2.25E-12 |
| Residual | 41 | 139.4642 | 3.4015 | | |
| Total | 43 | 515.9451 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 1.0861 | 1.2144 | 0.8943 | 0.3763 |
| Local media | 1.6577 | 0.1894 | 8.7516 | 6.31E-11 |
| National media | 0.6205 | 0.3570 | 1.7378 | 0.0897 |

## Table 5: Forward selection - Step 2.1


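SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.8475 |
| R Square | 0.7183 |
| Adjusted R Square | 0.7046 |
| Standard Error | 1.8826 |
| Observations | 44 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 370.6255 | 185.3128 | 52.2835 | 5.24E-12 |
| Residual | 41 | 145.3196 | 3.5443 | | |
| Total | 43 | 515.9451 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 2.9099 | 0.7004 | 4.1544 | 0.0001 |
| On-site | 0.8112 | 0.7266 | 1.1164 | 0.2707 |
| Local media | 0.9387 | 0.7849 | 1.1960 | 0.2385 |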

## Normal probability plot

[Plot of expected normal values (vertical axis) against residuals (horizontal axis)]

Figure 1: Normal probability plot for the model using backward selection

[Figure 2: plot of residuals (vertical axis) against predicted values (horizontal axis)]

## FORMULAE: MULTIPLE REGRESSION

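Multiple regression model (for the sake of defining $\mathbf{X}$ and $\boldsymbol{\beta}$), with $n$ observations and $k$ explanatory variables:

$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$

Parameter estimate: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

Residual: $e_i = y_i - \hat{y}_i$

Variance of regression coefficient estimate: $\mathrm{Var}(\hat{\beta}_j) = s^2\,(\mathbf{X}'\mathbf{X})^{-1}_{jj}$

Mean squared error: $s^2 = \dfrac{SS_{error}}{n - k - 1}$

Sum of squares:

$$
SS_{total} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SS_{regression} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SS_{error} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Multiple coefficient of determination: $R^2 = \dfrac{SS_{regression}}{SS_{total}}$

Adjusted multiple coefficient of determination: $R^2_{adj} = 1 - \dfrac{SS_{error}/(n - k - 1)}{SS_{total}/(n - 1)}$

Test statistics:

$$
t = \frac{\hat{\beta}_j - \beta_j}{s\sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{jj}}}, \qquad
F = \frac{MSR}{MSE}
$$

Confidence intervals:

$$
\hat{\beta}_j \pm t_{\alpha/2,\,n-k-1} \times s\sqrt{(\mathbf{X}'\mathbf{X})^{-1}_{jj}}
$$

$$
\hat{y} \pm t_{\alpha/2,\,n-k-1} \times s\sqrt{1 + \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}}
$$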
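The matrix formulae in this section can be followed step by step on a small data set. The sketch below is a minimal illustration under stated assumptions, not the method used to produce the Excel output. The data are made up (six observations, two explanatory variables), and the helper names `transpose`, `matmul`, `inverse` and `fit_ols` are hypothetical. The critical value `t_crit` is passed in by hand, as it would be read from the supplied t-tables (3.182 is the two-sided 5% value for 3 degrees of freedom).

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(size)]
           for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def fit_ols(x_rows, y, t_crit):
    """Least-squares fit with an intercept, following the formula sheet."""
    n, k = len(y), len(x_rows[0])
    X = [[1.0] + [float(v) for v in row] for row in x_rows]   # design matrix
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))                          # (X'X)^-1
    beta = [r[0] for r in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum(e ** 2 for e in residuals)
    s2 = ss_error / (n - k - 1)                               # mean squared error
    se = [math.sqrt(s2 * xtx_inv[j][j]) for j in range(k + 1)]
    return {
        "n": n, "k": k, "beta": beta, "se": se, "residuals": residuals,
        "ss_total": ss_total, "ss_regression": ss_regression, "ss_error": ss_error,
        "r2": ss_regression / ss_total,
        "adj_r2": 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1)),
        "t": [b / s for b, s in zip(beta, se)],               # tests H0: beta_j = 0
        "f": (ss_regression / k) / s2,
        "ci": [(b - t_crit * s, b + t_crit * s) for b, s in zip(beta, se)],
    }


# Made-up illustrative data; these are not the outlet data from this paper.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [4.1, 5.4, 9.2, 10.3, 14.1, 15.6]
result = fit_ols(x_rows, y, t_crit=3.182)

for name in ("beta", "se", "t", "ci", "r2", "adj_r2", "f"):
    print(name, result[name])
```

The inverse is computed explicitly here only because the formula sheet is written in terms of $(\mathbf{X}'\mathbf{X})^{-1}$; statistical software typically uses a numerically more stable decomposition.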
