Chapter 6

Variable
Screening
Methods

Copyright 2012 Pearson Education, Inc. All rights reserved.

Why variable screening?

Commonly used variable screening methods

Forward Selection
Backward Elimination
Stepwise Regression
Best Subset Regression

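Stepwise Regression
This is the most popular of the three stepwise-type methods
(forward selection, backward elimination, stepwise regression).
It is a combination of the forward and backward methods:
variables are added one at a time, and variables already in the
model are re-checked at each step and may be removed.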
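
The combination of forward and backward steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MINITAB or SAS procedure shown in the figures: the F-to-enter and F-to-remove thresholds (`f_in`, `f_out`, both 4.0), the helper names `solve`, `sse`, and `stepwise`, and the toy two-level design are all hypothetical choices made for the example.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sse(cols, y):
    """Residual sum of squares for OLS of y on an intercept plus cols."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def stepwise(cols, y, f_in=4.0, f_out=4.0):
    """Stepwise selection using partial F statistics (assumed thresholds)."""
    n = len(y)
    selected = []
    while True:
        changed = False
        # Forward step: enter the excluded variable with the largest partial F.
        current = sse([cols[k] for k in selected], y)
        best_j, best_f = None, f_in
        for j in range(len(cols)):
            if j in selected:
                continue
            trial = sse([cols[k] for k in selected + [j]], y)
            f = (current - trial) / (trial / (n - len(selected) - 2))
            if f > best_f:
                best_j, best_f = j, f
        if best_j is not None:
            selected.append(best_j)
            changed = True
        # Backward step: drop the included variable with the smallest partial F.
        full = sse([cols[k] for k in selected], y)
        worst_j, worst_f = None, f_out
        for j in selected:
            reduced = sse([cols[k] for k in selected if k != j], y)
            f = (reduced - full) / (full / (n - len(selected) - 1))
            if f < worst_f:
                worst_j, worst_f = j, f
        if worst_j is not None:
            selected.remove(worst_j)
            changed = True
        if not changed:
            return selected


# Toy data (hypothetical): a 2^3 two-level design; y depends on x1 and x2 only.
design = list(product([-1.0, 1.0], repeat=3))
cols = [[row[j] for row in design] for j in range(3)]
y = [10 + 2 * a + 3 * b + 0.5 * a * b * c for a, b, c in design]

selected = stepwise(cols, y)
print("entered, in order (0-based column indices):", selected)
```

On this toy design the second column enters first, then the first, and the third never enters. The sketch assumes the residual sum of squares stays positive and that `f_out` does not exceed `f_in`; real software usually works with p-value thresholds instead of fixed F values.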

Some caveats of this procedure:

Stepwise and other variable screening methods do not
guarantee that the best model has been found, since:

Figure 6.1 MINITAB stepwise regression results for executive salaries

Figure 6.2 SAS backward stepwise regression for executive salaries

Best Subset Regression

While the stepwise regressions are objective
methods of arriving at a satisfactory model,
there are some more subjective methods as well.
These methods aim to select the subset of
explanatory variables that yields the best model
with respect to some measure. Several such
measures are described in the following slides.
In the absence of multicollinearity, the various
methods usually result in the same model being
selected.
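
To make "best with respect to some measure" concrete, the sketch below fits every non-empty subset of predictors and reports two widely used measures: adjusted R2 = 1 - (SSE/(n - p)) / (SST/(n - 1)) and Mallows' Cp = SSE/s2 + 2p - n, where p counts the parameters including the intercept and s2 is the mean square error of the full model. The helper names and the toy data are assumptions for illustration only; they are not the executive salary data from the figures.

```python
from itertools import combinations, product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sse(cols, y):
    """Residual sum of squares for OLS of y on an intercept plus cols."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def all_subsets(cols, y):
    """Return (subset, adjusted R2, Cp) for every non-empty predictor subset."""
    n, k = len(y), len(cols)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    s2_full = sse(cols, y) / (n - k - 1)      # MSE of the full model
    rows = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            e = sse([cols[j] for j in subset], y)
            p = size + 1                       # parameters, incl. intercept
            adj_r2 = 1 - (e / (n - p)) / (sst / (n - 1))
            cp = e / s2_full + 2 * p - n
            rows.append((subset, adj_r2, cp))
    return rows


# Toy data (hypothetical): a 2^3 two-level design; y depends on x1 and x2 only.
design = list(product([-1.0, 1.0], repeat=3))
cols = [[row[j] for row in design] for j in range(3)]
y = [10 + 2 * a + 3 * b + 0.5 * a * b * c for a, b, c in design]

rows = all_subsets(cols, y)
for subset, adj_r2, cp in sorted(rows, key=lambda r: -r[1]):
    print(subset, round(adj_r2, 4), round(cp, 2))
```

Because the toy predictors are uncorrelated, both criteria point to the same two-variable subset, which is in line with the remark that the methods usually agree when there is no multicollinearity.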

Figure 6.3 MINITAB all-possible-regressions selection results for executive salaries

Figure 6.4 MINITAB plots of all-possible-regressions selection criteria for Example 6.2

Which model to use?

According to Chatterjee and Hadi (2006),
Regression Analysis by Example, a regression
analysis can serve different objectives. These are:

So a desirable model-building strategy is to select two or three
models that describe the data well, e.g. using the adjusted R2,
mean square error, or Cp criteria.
Among these short-listed models, choose the model or models that
predict well out of sample if the objective is to predict a
new observation. The PRESS statistic is a suitable measure in this
regard. However, all of these stepwise and best subset regressions
work well when the explanatory variables are not collinear. In
the case of collinear data, the coefficient estimates will not
be precise, i.e. their variances will be large. So if the objective
is control, one should find a model that has low
collinearity in addition to being good with respect to PRESS and
adjusted R2.
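
As an illustration of the out-of-sample step, PRESS can be computed directly from its definition: leave each observation out in turn, refit the model on the rest, and sum the squared errors of predicting the held-out point. The sketch below does this by brute-force refitting, assuming hypothetical helper names and toy data; statistical packages normally obtain the same number from the leverages without refitting.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(cols, y):
    """OLS coefficients (intercept first) of y on the predictor columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


def press(cols, y):
    """PRESS: sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(y)):
        keep = [r for r in range(len(y)) if r != i]
        beta = fit([[c[r] for r in keep] for c in cols], [y[r] for r in keep])
        pred = beta[0] + sum(b * c[i] for b, c in zip(beta[1:], cols))
        total += (y[i] - pred) ** 2
    return total


# Toy data (hypothetical): a 2^3 two-level design; y depends on x1 and x2 only.
design = list(product([-1.0, 1.0], repeat=3))
cols = [[row[j] for row in design] for j in range(3)]
y = [10 + 2 * a + 3 * b + 0.5 * a * b * c for a, b, c in design]

press_small = press(cols[:2], y)   # model with x1 and x2
press_full = press(cols, y)        # model with x1, x2 and x3
print("PRESS (x1, x2):", round(press_small, 4))
print("PRESS (x1, x2, x3):", round(press_full, 4))
```

Here both models have the same residual sum of squares, but the smaller model has the lower PRESS, so it would be preferred if the objective is prediction.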

How to measure collinearity?

In cases of multicollinearity (or simply collinearity),
the variances of the estimated regression coefficients
are very large, making the coefficients less precise. This
can create a problem if our interest is in estimating the
coefficients precisely, e.g. when the objective of
the regression is control. Omitting an important variable
from the regression creates bias in the estimates of the
included coefficients. This bias becomes serious when the
omitted variable is correlated with the included
variables. Thus a good sign of multicollinearity is that
including or excluding variables from the regression
may significantly change the values, or even the signs,
of the coefficients.
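
One widely used numerical measure, offered here as an assumption since the surviving slide text does not name its measure, is the variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other x variables. A common rule of thumb treats values above about 10 as a sign of serious collinearity. The sketch below computes VIFs for hypothetical toy data in which the third predictor is nearly a linear combination of the first two.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sse(cols, y):
    """Residual sum of squares for OLS of y on an intercept plus cols."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def vif(cols):
    """Variance inflation factor of each predictor against the others."""
    out = []
    for j, xj in enumerate(cols):
        others = [c for k, c in enumerate(cols) if k != j]
        mean = sum(xj) / len(xj)
        sst = sum((v - mean) ** 2 for v in xj)
        out.append(sst / sse(others, xj))   # equals 1 / (1 - R_j^2)
    return out


# Toy predictors (hypothetical): x3 is close to x1 + x2, so it is collinear.
design = list(product([-1.0, 1.0], repeat=3))
x1 = [a for a, b, c in design]
x2 = [b for a, b, c in design]
x3 = [a + b + 0.5 * c for a, b, c in design]

vifs = vif([x1, x2, x3])
print([round(v, 2) for v in vifs])
```

Dropping x3 from this toy set would bring the remaining VIFs back to 1, which mirrors the remark above that adding or removing a collinear variable can change the other coefficients noticeably.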

Is there evidence of multicollinearity in the executive salary data?

A regression with the full set of x variables gives:

Consider the data set reported by Montgomery et al. (2012).

These data are from Hald (1952) and are known as the Hald
cement data. They relate the heat evolved in
calories per gram of cement (y) to three x
variables, namely tricalcium aluminate (x1),
tricalcium silicate (x2), and tetracalcium ferrite (x3).
