
Chapter 12

Multiple Regression

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.


Chapter Goals
After completing this chapter, you should be able to:

- apply multiple regression analysis to business decision-making situations
- analyze and interpret the computer output for a multiple regression model
- perform residual analysis for the multiple regression model
- test the significance of the independent variables in a multiple regression model

Chapter Goals
(continued)

After completing this chapter, you should be able to:

- use a coefficient of partial determination to test portions of the multiple regression model
- incorporate qualitative variables into the regression model by using dummy variables
- use interaction terms in regression models

Multiple Regression Model


Idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).

Multiple regression model with k independent variables:

Yi = β0 + β1X1i + β2X2i + … + βkXki + εi

where β0 is the Y-intercept, β1, …, βk are the population slopes, and εi is the random error.

12.2 Multiple Regression Equation


The coefficients of the multiple regression model are estimated using sample data.

Multiple regression equation with k independent variables:

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where Ŷi is the estimated (or predicted) value of Y, b0 is the estimated intercept, and b1, …, bk are the estimated slope coefficients.

In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.

Multiple Regression Equation


(continued)

Two-variable model:

Ŷ = b0 + b1X1 + b2X2

[Figure: the fitted regression plane over the (X1, X2) axes, with b1 the slope for variable X1 and b2 the slope for variable X2]

Example:
Two Independent Variables

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.

- Dependent variable: pie sales (units per week)
- Independent variables: price (in $) and advertising ($100s)

Data are collected for 15 weeks.

Pie Sales Example


Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50             3.3
  2       460        7.50             3.3
  3       350        8.00             3.0
  4       430        8.00             4.5
  5       350        6.80             3.0
  6       380        7.50             4.0
  7       430        4.50             3.0
  8       470        6.40             3.7
  9       450        7.00             3.5
 10       490        5.00             4.0
 11       340        7.20             3.5
 12       300        7.90             3.2
 13       440        5.90             4.0
 14       450        5.00             3.5
 15       300        7.00             2.7

Multiple regression equation:

Sales = b0 + b1(Price) + b2(Advertising)
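
The slides obtain these results with Excel/PHStat; as a cross-check, here is a minimal Python sketch (assuming pandas and statsmodels are installed; the variable names pie and model are illustrative) that fits the same model:

```python
import pandas as pd
import statsmodels.api as sm

# The 15 weeks of pie sales data from the table above
pie = pd.DataFrame({
    "sales": [350, 460, 350, 430, 350, 380, 430, 470, 450,
              490, 340, 300, 440, 450, 300],
    "price": [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
              7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00],
    "advertising": [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                    3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7],
})

X = sm.add_constant(pie[["price", "advertising"]])  # adds the intercept column b0
model = sm.OLS(pie["sales"], X).fit()
print(model.summary())  # coefficients, t stats, F, r-squared, as in the Excel output
```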

Multiple Regression Output


Regression Statistics
  Multiple R           0.72213
  R Square             0.52148
  Adjusted R Square    0.44172
  Standard Error      47.46341
  Observations              15

ANOVA          df          SS          MS         F    Significance F
  Regression    2   29460.027   14730.013   6.53861           0.01201
  Residual     12   27033.306    2252.776
  Total        14   56493.333

                Coefficients   Standard Error     t Stat   P-value    Lower 95%   Upper 95%
  Intercept        306.52619        114.25389    2.68285   0.01993     57.58835   555.46404
  Price            -24.97509         10.83213   -2.30565   0.03979    -48.57626    -1.37392
  Advertising       74.13096         25.96732    2.85478   0.01449     17.55303   130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

Multiple Regression Equation


Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

- b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising
- b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Using The Equation to Make Predictions

Predict sales for a week in which the selling price is $5.50 and advertising is $350:

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62

Predicted sales is 428.62 pies.

Note that Advertising is in $100s, so $350 means that X2 = 3.5.
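
The same arithmetic as a quick check in plain Python (the coefficient names are illustrative):

```python
# Predicted pie sales at Price = $5.50 and Advertising = $350 (i.e., 3.5 in $100s)
b0, b1, b2 = 306.526, -24.975, 74.131
sales_hat = b0 + b1 * 5.50 + b2 * 3.5
print(round(sales_hat, 2))  # 428.62 pies
```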

Predictions in PHStat

PHStat | regression | multiple regression

Check the confidence and prediction interval estimates box.

Predictions in PHStat
(continued)

The PHStat output shows:

- the input values
- the predicted Y value
- the confidence interval for the mean Y value, given these Xs
- the prediction interval for an individual Y value, given these Xs

12.6 Coefficient of
Multiple Determination

Reports the proportion of total variation in Y explained by all X variables taken together:

r²Y.12…k = SSR / SST = regression sum of squares / total sum of squares

Multiple Coefficient of Determination


(continued)
r²Y.12 = SSR / SST = 29460.0 / 56493.3 = 0.52148

52.1% of the variation in pie sales is explained by the variation in price and advertising (see the R Square line and ANOVA table of the regression output above).

Adjusted r²

- r² never decreases when a new X variable is added to the model
- This can be a disadvantage when comparing models

What is the net effect of adding a new variable?

- We lose a degree of freedom when a new X variable is added
- Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted r²
(continued)

Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

r²adj = 1 - [(1 - r²Y.12…k)((n - 1) / (n - k - 1))]

(where n = sample size, k = number of independent variables)

- Penalizes excessive use of unimportant independent variables
- Smaller than r²
- Useful in comparing among models
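
As a sketch, the formula above as a small Python helper (the function name is illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjust r-squared for sample size n and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.52148, n=15, k=2))  # ~0.4417, matching the Excel output up to rounding
```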

Adjusted r²
(continued)

r²adj = 0.44172

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables (the Adjusted R Square line of the regression output above).

12.10 Residuals in Multiple Regression (Model Checking)

Two-variable model: Ŷ = b0 + b1X1 + b2X2

Residual: ei = (Yi - Ŷi)

[Figure: a sample observation (x1i, x2i, Yi) plotted above the fitted plane; the residual is the vertical distance between Yi and Ŷi]

The best-fit equation, Ŷ, is found by minimizing the sum of squared errors, Σe².

Multiple Regression Assumptions


Errors (residuals) from the regression model:

ei = (Yi - Ŷi)

Assumptions:
- The errors are normally distributed
- Errors have a constant variance
- The model errors are independent

Residual Plots Used in Multiple Regression

These residual plots are used in multiple regression:

- Residuals vs. Ŷi
- Residuals vs. X1i
- Residuals vs. X2i
- Residuals vs. time (if time-series data)

Use the residual plots to check for violations of the regression assumptions.
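
A minimal sketch of these plots, assuming matplotlib is available and that model and pie come from the earlier fitting sketch:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(model.fittedvalues, model.resid)
axes[0].set_xlabel("predicted Y")
axes[1].scatter(pie["price"], model.resid)
axes[1].set_xlabel("X1: price")
axes[2].scatter(pie["advertising"], model.resid)
axes[2].set_xlabel("X2: advertising")
for ax in axes:
    ax.axhline(0, linewidth=1)  # residuals should scatter randomly about zero
    ax.set_ylabel("residual")
plt.tight_layout()
plt.show()
```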

Is the Model Significant?

F-Test for Overall Significance of the Model

- Shows if there is a linear relationship between all of the X variables considered together and Y
- Use the F test statistic

Hypotheses:
  H0: β1 = β2 = … = βk = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)

F-Test for Overall Significance

Test statistic:

F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))

where F has k degrees of freedom in the numerator and (n - k - 1) degrees of freedom in the denominator.
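
Computed by hand from the pie sales ANOVA quantities (a sketch assuming scipy is available):

```python
from scipy import stats

SSR, SSE, n, k = 29460.027, 27033.306, 15, 2
MSR, MSE = SSR / k, SSE / (n - k - 1)
F = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)  # upper-tail area beyond F
print(round(F, 4), round(p_value, 5))  # 6.5386 0.01201
```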

F-Test for Overall Significance
(continued)

From the ANOVA table of the regression output above:

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386

with 2 and 12 degrees of freedom; the P-value for the F-test (Significance F) is 0.01201.

F-Test for Overall Significance
(continued)

H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 12

Critical value: F.05 = 3.885

Test statistic: F = MSR / MSE = 6.5386

Decision: since the F test statistic is in the rejection region (F = 6.5386 > 3.885, p-value < .05), reject H0.

Conclusion: there is evidence that at least one independent variable affects Y.

[Figure: F distribution with the α = .05 rejection region to the right of F.05 = 3.885]

Are Individual Variables Significant?

Use t-tests of individual variable slopes.

- Shows if there is a linear relationship between the variable Xi and Y

Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (linear relationship does exist between Xi and Y)

Are Individual Variables Significant?
(continued)

H0: βi = 0 (no linear relationship)
H1: βi ≠ 0 (linear relationship does exist between Xi and Y)

Test statistic:

t = (bi - 0) / Sbi    (df = n - k - 1)
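
The Price slope from the pie sales output, tested by hand (a sketch assuming scipy is available):

```python
from scipy import stats

b_price, s_b_price, n, k = -24.97509, 10.83213, 15, 2
t = (b_price - 0) / s_b_price
p_value = 2 * stats.t.sf(abs(t), df=n - k - 1)  # two-tailed
print(round(t, 5), round(p_value, 5))  # -2.30565 0.03979
```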

Are Individual Variables Significant?
(continued)

From the coefficients table of the regression output above:

- t-value for Price is t = -2.306, with p-value .0398
- t-value for Advertising is t = 2.855, with p-value .0145

Inferences about the Slope: t Test Example

H0: βi = 0
H1: βi ≠ 0

From Excel output, d.f. = 15 - 2 - 1 = 12:

               Coefficients   Standard Error    t Stat    P-value
  Price          -24.97509        10.83213     -2.30565   0.03979
  Advertising     74.13096        25.96732      2.85478   0.01449

At α = .05, tα/2 = 2.1788. The test statistic for each variable falls in the rejection region (p-values < .05).

Decision: reject H0 for each variable.

Conclusion: there is evidence that both Price and Advertising affect pie sales at α = .05.

[Figure: t distribution with two-tailed rejection regions beyond ±2.1788]

12.12 Nonlinear Relationships

- The relationship between the dependent variable and an independent variable may not be linear
- Can review the scatter diagram to check for nonlinear relationships

Example: quadratic model

Yi = β0 + β1X1i + β2X1i² + εi

The second independent variable is the square of the first variable.

Quadratic Regression Model

Model form:

Yi = β0 + β1X1i + β2X1i² + εi

where:
  β0 = Y intercept
  β1 = regression coefficient for linear effect of X on Y
  β2 = regression coefficient for quadratic effect on Y
  εi = random error in Y for observation i
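
In code, the quadratic model is just a multiple regression whose second regressor is the square of the first. A sketch with statsmodels, using the recoverable rows of the purity example that follows (so the estimates will not exactly match the slide, which used the full data set):

```python
import numpy as np
import statsmodels.api as sm

time = np.array([10, 12, 13, 14, 15, 15, 16, 17], dtype=float)    # filter time
purity = np.array([40, 54, 67, 70, 78, 85, 87, 99], dtype=float)  # purity

X = sm.add_constant(np.column_stack([time, time ** 2]))  # [1, X1, X1^2]
quad = sm.OLS(purity, X).fit()
print(quad.params)        # b0, b1 (linear effect), b2 (quadratic effect)
print(quad.rsquared_adj)  # compare against the simple linear model
```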

Linear vs. Nonlinear Fit

[Figure: Y vs. X scatter with a linear fit and a nonlinear fit, each with its residual plot]

- A linear fit does not give random residuals
- A nonlinear fit gives random residuals

Testing the Overall Quadratic Model

Estimate the quadratic model to obtain the regression equation:

Ŷi = b0 + b1X1i + b2X1i²

Test for overall relationship:

H0: β1 = β2 = 0 (no overall relationship between X and Y)
H1: β1 and/or β2 ≠ 0 (there is a relationship between X and Y)

F test statistic = MSR / MSE

Testing for Significance: Quadratic Effect

Testing the quadratic effect:

H0: β2 = 0 (the quadratic term does not improve the model)
H1: β2 ≠ 0 (the quadratic term improves the model)

The test statistic is

t = (b2 - β2) / Sb2    (d.f. = n - 3)

where:
  b2 = squared-term slope coefficient
  β2 = hypothesized slope (zero)
  Sb2 = standard error of the slope

Testing for Significance: Quadratic Effect
(continued)

Testing the quadratic effect:

- Compare r² from the simple regression to the adjusted r² from the quadratic model
- If adj. r² from the quadratic model is larger than r² from the simple model, then the quadratic model is the better model

Example: Quadratic Model

Purity   Filter Time
  15          –
  22          –
  33          –
  40         10
  54         12
  67         13
  70         14
  78         15
  85         15
  87         16
  99         17

Purity increases as filter time increases:

[Figure: scatter plot of purity vs. filter time]

Example: Quadratic Model
(continued)

Simple regression results:

Ŷ = -11.283 + 5.985(Time)

            Coefficients   Standard Error     t Stat     P-value
  Intercept    -11.28267        3.46805      -3.25332    0.00691
  Time           5.98520        0.30966      19.32819    2.078E-10

Regression Statistics
  R Square            0.96888
  Adjusted R Square   0.96628
  Standard Error      6.15997

F = 373.57904, Significance F = 2.0778E-10

The t statistic, F statistic, and r² are all high, but the residuals are not random:

[Figure: residual plot for the linear fit, showing a systematic curved pattern]

Example: Quadratic Model
(continued)

Quadratic regression results:

Ŷ = 1.539 + 1.565(Time) + 0.245(Time)²

               Coefficients   Standard Error    t Stat    P-value
  Intercept       1.53870         2.24465       0.68550   0.50722
  Time            1.56496         0.60179       2.60052   0.02467
  Time-squared    0.24516         0.03258       7.52406   1.165E-05

Regression Statistics
  R Square            0.99494
  Adjusted R Square   0.99402
  Standard Error      2.59513

F = 1080.7330, Significance F = 2.368E-13

The quadratic term is significant and improves the model: adj. r² is higher, SYX is lower, and the residuals are now random.

12.9 Model Building

Goal is to develop a model with the best set of independent variables.

Stepwise regression procedure:
- Easier to interpret if unimportant variables are removed
- Lower probability of collinearity
- Provides evaluation of alternative models as variables are added

Best-subset approach:
- Try all combinations and select the best using the highest adjusted r² and lowest standard error

Stepwise Regression

- Idea: develop the least squares regression equation in steps, adding one explanatory variable at a time and evaluating whether existing variables should remain or be removed
- The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model

Best Subsets Regression

- Idea: estimate all possible regression equations using all possible combinations of independent variables
- Choose the best fit by looking for the highest adjusted r² and lowest standard error (a sketch follows below)
- Stepwise regression and best subsets regression can be performed using PHStat
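
A minimal best-subsets sketch (assuming statsmodels; the helper name is illustrative) that fits every combination of candidate X columns and ranks them by adjusted r²:

```python
from itertools import combinations
import statsmodels.api as sm

def best_subsets(y, X):
    """Fit all combinations of columns of X; return (adj r2, columns), best first."""
    results = []
    for r in range(1, len(X.columns) + 1):
        for cols in combinations(X.columns, r):
            fit = sm.OLS(y, sm.add_constant(X[list(cols)])).fit()
            results.append((fit.rsquared_adj, cols))
    return sorted(results, reverse=True)

# e.g. best_subsets(pie["sales"], pie[["price", "advertising"]])
```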

12.11 Alternative Best Subsets Criterion

- Calculate the Cp value for each potential regression model
- Consider models with Cp values close to or below k + 1, where k is the number of independent variables in the model under consideration

Alternative Best Subsets Criterion
(continued)

The Cp statistic:

Cp = ((1 - R²k)(n - T)) / (1 - R²T) - (n - 2(k + 1))

where:
  k = number of independent variables included in a particular regression model
  T = total number of parameters to be estimated in the full regression model
  R²k = coefficient of multiple determination for the model with k independent variables
  R²T = coefficient of multiple determination for the full model with all T estimated parameters
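
The same statistic as a small Python helper (a sketch; the function name is illustrative):

```python
def cp_statistic(r2_k, r2_T, n, k, T):
    """Cp for a model with k independent variables inside a full model with T parameters."""
    return (1 - r2_k) * (n - T) / (1 - r2_T) - (n - 2 * (k + 1))

# Sanity check: for the full model itself, k + 1 = T and r2_k = r2_T, so Cp = T = k + 1.
```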

10 Steps in Model Building

1. Choose explanatory variables to include in the model
2. Estimate the full model and check VIFs
3. Check if any VIFs > 5
4. If no VIF > 5, go to step 5; if exactly one VIF > 5, remove that variable; if more than one, eliminate the variable with the highest VIF and go back to step 2 (a VIF-check sketch appears after step 10)
5. Perform best subsets regression with the remaining variables

10 Steps in Model Building
(continued)

6. List all models with Cp close to or less than (k + 1)
7. Choose the best model (consider parsimony: do the extra variables make a significant contribution?)
8. Perform a complete analysis with the chosen model, including residual analysis
9. Transform the model if necessary to deal with violations of linearity or other model assumptions
10. Use the model for prediction
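
Steps 2-4 as a sketch, using the variance_inflation_factor helper from statsmodels (pie is the data frame from the earlier fitting sketch):

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(pie[["price", "advertising"]])
for i, name in enumerate(X.columns):
    if name != "const":  # skip the intercept column
        print(name, round(variance_inflation_factor(X.values, i), 2))
# Any VIF > 5 flags collinearity: drop the worst offender and refit (steps 3-4).
```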

Model Building Flowchart

1. Choose X1, X2, …, Xk
2. Run the regression to find VIFs
3. Any VIF > 5?
   - Yes, more than one: remove the variable with the highest VIF and return to step 2
   - Yes, exactly one: remove this X and return to step 2
   - No: continue to step 4
4. Run subsets regression to obtain the best models in terms of Cp
5. Do a complete analysis; add quadratic terms and/or transform variables as indicated
6. Perform predictions
