2 - Multiple Regression Models

Types of regression models
Regression Models
  Simple: 1st order, 2nd order, higher order
  Multiple: 1st order, interaction, 2nd order, higher order
A quadratic second order model
$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$
Interpretation of model parameters:
$\beta_0$: y-intercept, the value of $E(Y)$ when $x = 0$;
$\beta_1$: the shift parameter;
$\beta_2$: the rate of curvature.
Example with quadratic terms

The true model, supposedly unknown, is
$Y_i = 2 + x_i^2 + \varepsilon_i$, with $\varepsilon_i \sim N(0, 2)$
Data: (x, y). See SQM.sav
[Figure: scatter plot of y (0 to 100) against x (2 to 10)]
Model 1: $E(Y) = \beta_0 + \beta_1 x$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.973   .947       .947                6.60994
Predictors: (Constant), x

ANOVA (dependent variable: y)
             Sum of Squares   df    Mean Square   F          Sig.
Regression   80624.915        1     80624.915     1845.332   .000
Residual     4500.202         103   43.691
Total        85125.117        104
Predictors: (Constant), x

Coefficients (dependent variable: y)
             B         Std. Error   Beta   t         Sig.
(Constant)   -19.959   1.483               -13.454   .000
x            10.744    .250         .973   42.957    .000
[Figure: scatter plot of y against x with the fitted line y = -19.96 + 10.74 x; R-Square = 0.95]
Model 2: $E(Y) = \beta_0 + \beta_1 x^2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.68707
Predictors: (Constant), XSquare
Smaller variance and SE than in Model 1.

ANOVA (dependent variable: y)
             Sum of Squares   df    Mean Square   F           Sig.
Regression   84381.422        1     84381.422     11686.632   .000
Residual     743.695          103   7.220
Total        85125.117        104
Predictors: (Constant), XSquare

Coefficients (dependent variable: y)
             B       Std. Error   Beta   t         Sig.
(Constant)   2.340   .417                5.608     .000
XSquare      .997    .009         .996   108.105   .000
[Figure: scatter plot of y against XSquare with the fitted line y = 2.34 + 1.00 XSquare; R-Square = 0.99]
Model 3: $E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.996   .991       .991                2.66608
Predictors: (Constant), XSquare, x

ANOVA (dependent variable: y)
             Sum of Squares   df    Mean Square   F          Sig.
Regression   84400.103        2     42200.052     5936.999   .000
Residual     725.014          102   7.108
Total        85125.117        104
Predictors: (Constant), XSquare, x

Coefficients (dependent variable: y)
             B       Std. Error   Beta    t        Sig.
(Constant)   4.177   1.206                3.463    .001
x            -.830   .512         -.075   -1.621   .108
XSquare      1.071   .046         1.069   23.046   .000
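The comparison between Model 1 and Model 2 can be mimicked on simulated data. The sketch below is only an illustration: SQM.sav is not used, the seed, the range of x and the noise standard deviation are assumptions, and `simple_r2` is a hypothetical helper that computes R-square for a one-predictor regression as the squared correlation.

```python
import random


def simple_r2(x, y):
    """R-square of the least-squares line y = b0 + b1*x (squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


random.seed(0)  # arbitrary seed, for reproducibility only
n = 105  # same sample size as in the tables (total df = 104)
x = [random.uniform(0.0, 10.0) for _ in range(n)]
# True model Y = 2 + x^2 + error; the noise s.d. of 2.5 is an assumption
y = [2.0 + a * a + random.gauss(0.0, 2.5) for a in x]

r2_linear = simple_r2(x, y)                   # Model 1: predictor x
r2_square = simple_r2([a * a for a in x], y)  # Model 2: predictor x^2
print(round(r2_linear, 3), round(r2_square, 3))
```

As in the SPSS output, the regression on x squared should explain more of the variation than the regression on x, because it matches the form of the true model.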
A third order model with 1 IV
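$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

Use with caution, given the numerical problems that could arise.
[Figure: two cubic curves, one for $\beta_3 > 0$ and one for $\beta_3 < 0$]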
First-Order model in k Quantitative variables

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = \dots = x_k = 0$;
$\beta_1$: change in $E(Y)$ for a 1-unit increase in $x_1$ when $x_2, \dots, x_k$ are held fixed;
$\beta_2$: change in $E(Y)$ for a 1-unit increase in $x_2$ when $x_1, x_3, \dots, x_k$ are held fixed;
...
A bivariate model
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Changing $x_2$ changes only the y-intercept of the line relating $E(Y)$ to $x_1$.
In the first-order model, a 1-unit change in one independent variable has the same effect on the mean value of y regardless of the values of the other independent variables.
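A bivariate model
[Figure: response plane $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ over the $(X_1, X_2)$ plane, with intercept $\beta_0$ on the Y axis; the observed value $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ is plotted at the point $(X_{1i}, X_{2i})$]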
Example: executive salaries
Y = Annual salary (in dollars)

x1 = Years of experience
x2 = Years of education
x3 = Gender : 1 if male; 0 if female
x4 = Number of employees supervised
x5 = Corporate assets (in millions of dollars)
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4 + \beta_5 x_5$
Data: ExecSal.sav
Do not consider $x_3$ (Gender) for the moment.
Executive salaries: Computer Output

Model Summary
Model                 R      R Square   Adjusted R Square   Std. Error of the Estimate
Multiple regression   .870   .757       .747                12685.309
Simple regression     .783   .613       .609                15760.006
Multiple regression predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
Simple regression predictors: (Constant), Years of Experience
Coefficient of determination
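The coefficient $R^2$ is computed exactly as in the simple regression case:
$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
where
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (Total)
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ (Regression)
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (Error)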
A drawback of $R^2$: it never decreases when variables are added, even if these are NOT relevant to the problem.
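Adjusted $R^2$ and estimate of the variance $\sigma^2$
A solution: adjusted $R^2$
Each additional variable reduces adjusted $R^2$, unless SSE decreases enough to compensate.
$R_a^2 = 1 - \frac{n-1}{n-k-1} \cdot \frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2$
An unbiased estimator of the variance $\sigma^2$ is computed as
$s^2 = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n-k-1} = \frac{SSE}{n-k-1}$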
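As a rough check of these formulas, the sketch below plugs in the rounded sums of squares reported in the ANOVA table of the executive salaries model (SSE = 1.529E10, SST = 6.295E10, with n = 100 and k = 4). The function name `fit_summary` is hypothetical, and because the inputs are rounded the results agree with the SPSS output only approximately.

```python
import math


def fit_summary(sse, sst, n, k):
    """Return R-square, adjusted R-square and s for a model with k slopes."""
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * sse / sst
    s = math.sqrt(sse / (n - k - 1))  # square root of the variance estimate s^2
    return r2, adj_r2, s


r2, adj_r2, s = fit_summary(sse=1.529e10, sst=6.295e10, n=100, k=4)
print(round(r2, 3), round(adj_r2, 3), round(s, 1))
```

The first two values should round to the R-square (.757) and adjusted R-square (.747) of the model summary, and the third should be close to the reported standard error of the estimate.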
Executive salaries: Computer Output (2)

Coefficients (dependent variable: Annual salary in $)
                                 B            Std. Error   Beta   t        Sig.
(Constant)                       -37082.148   17052.089           -2.175   .032
Years of Experience              2696.360     173.647      .785   15.528   .000
Years of Education               2656.017     563.476      .243   4.714    .000
Number of Employees supervised   41.092       7.807        .272   5.264    .000
Corporate assets (in million $)  244.569      83.420       .149   2.932    .004
The t and Sig. columns report the t-tests on the individual coefficients.
Testing overall significance: the F-test

1. Shows whether there is a linear relationship between all the X variables together and Y
2. Uses the F test statistic
3. Hypotheses
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (no linear relationship)
$H_a$: at least one coefficient is not 0 (at least one X variable affects Y)
The F-test for a single coefficient is equivalent to the t-test.
ANOVA table (dependent variable: Annual salary in $)
             Sum of Squares   df   Mean Square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000
Residual     1.529E10         95   1.609E8
Total        6.295E10         99
Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
Regression df = k: the number of regression slopes.
Total df = n - 1, where n is the number of observations.
Residual mean square = MSE (mean square error), the estimate of the variance.
F is the F-statistic and Sig. is the p-value of the F-test.
Decision: reject H0, i.e. accept this model.
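The F-statistic in the table is the ratio of the regression mean square to the residual mean square. A minimal sketch, using the rounded sums of squares from the table (so the result only approximately matches 74.045); `overall_f` is a hypothetical helper name.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic: (SSR / k) / (SSE / (n - k - 1))."""
    msr = ssr / k              # regression mean square, df = k
    mse = sse / (n - k - 1)    # residual mean square, df = n - k - 1
    return msr / mse


f_stat = overall_f(ssr=4.766e10, sse=1.529e10, n=100, k=4)
print(round(f_stat, 2))
```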
Interaction (second order) model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = 0$;
$\beta_1 + \beta_3 x_2$: change in $E(Y)$ for a 1-unit increase in $x_1$ when $x_2$ is held fixed;
$\beta_2 + \beta_3 x_1$: change in $E(Y)$ for a 1-unit increase in $x_2$ when $x_1$ is held fixed;
$\beta_3$: controls the rate of change of the surface.
Interaction (second order) model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Contour lines are not parallel
The effect of one variable depends on the level of the other
Example: Antique grandfather clocks auction

Clocks are sold at an auction on competitive offers. Data are:
Y : auction price in dollars
X1: age of clocks
X2: number of bidders
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Data: GFCLOCKS.sav
Data summaries
Descriptive Statistics
Variable   N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age        32   108       194       144.94    27.395           .216 (.414)             -1.323 (.809)
Bidders    32   5         15        9.53      2.840            .420 (.414)             -.788 (.809)
Price      32   729       2131      1326.88   393.487          .396 (.414)             -.727 (.809)
Valid N (listwise): 32
If the data are Normal, skewness is 0.
If the data are Normal, (excess) kurtosis is 0.
Note: skewness and kurtosis are not enough to establish Normality.
P-P plot for Normality
If the data are Normal, the points should lie along the straight line.
In this example the situation is fairly good.
Bivariate scatter-plots
[Figure: two scatter plots, one against Age (120 to 180) and one against Bidders (10 to 14); vertical axis from 800 to 2000]
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.945   .892       .885                133.485
Predictors: (Constant), Bidders, Age

ANOVA (dependent variable: Price)
             Sum of Squares   df   Mean Square   F         Sig.
Regression   4283062.960      2    2141531.480   120.188   .000
Residual     516726.540       29   17818.157
Total        4799789.500      31
Predictors: (Constant), Bidders, Age

Coefficients (dependent variable: Price)
             B           Std. Error   Beta   t        Sig.
(Constant)   -1338.951   173.809             -7.704   .000
Age          12.741      .905         .887   14.082   .000
Bidders      85.953      8.729        .620   9.847    .000
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.977   .954       .949                88.915
Predictors: (Constant), AgeBid, Age, Bidders

ANOVA (dependent variable: Price)
             Sum of Squares   df   Mean Square   F         Sig.
Regression   4578427.367      3    1526142.456   193.041   .000
Residual     221362.133       28   7905.790
Total        4799789.500      31
Predictors: (Constant), AgeBid, Age, Bidders

Coefficients (dependent variable: Price)
             B         Std. Error   Beta    t        Sig.
(Constant)   320.458   295.141              1.086    .287
Age          .878      2.032        .061    .432     .669
Bidders      -93.265   29.892       -.673   -3.120   .004
AgeBid       1.298     .212         1.369   6.112    .000
Interpreting interaction models

The coefficient for the interaction term is significant.
If an interaction term is present, the corresponding first-order terms also need to be included in order to interpret the model correctly.
In the example, a careless analyst could conclude that the effect of Bidders is negative, since $b_2 = -93.26$.
Since an interaction term is present, the slope estimate for Bidders ($x_2$) is
$b_2 + b_3 x_1$
Note: $b = \hat{\beta}$
For $x_1 = 150$ (age) the estimated slope for Bidders is

-93.26 + 1.3 (150) = 101.74
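The same calculation can be repeated for any age. The short sketch below uses the rounded estimates from the slide (b2 = -93.26, b3 = 1.3); the function name `bidders_slope` is only illustrative.

```python
def bidders_slope(age, b2=-93.26, b3=1.3):
    """Estimated change in price per extra bidder for a clock of the given age."""
    return b2 + b3 * age


slope_150 = bidders_slope(150)   # the case worked out above
slope_min = bidders_slope(108)   # youngest clock in the data
slope_max = bidders_slope(194)   # oldest clock in the data
print(round(slope_150, 2), round(slope_min, 2), round(slope_max, 2))
```

Over the whole range of ages observed in the data (108 to 194) the estimated slope for Bidders is positive, even though $b_2$ taken alone is negative.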
Models with qualitative Xs

Regression models can also include qualitative (or categorical) independent variables (QIV).
The categories of a QIV are called levels.
Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding in order to avoid introducing fictitious linear relations into the model.
Coding is done by using independent variables that take only two values: 0 or 1.
These coded variables are called dummy variables.
Models with QIV

Suppose we want to model Income (Y) as a function of Sex (x), using a coded, or dummy, variable:
x = 1 if Male, x = 0 if Female
$E(Y) = \beta_0 + \beta_1 x$
$E(Y) = \beta_0 + \beta_1$ if x = 1, i.e. Male
$E(Y) = \beta_0$ if x = 0, i.e. Female
$\beta_0$ is the mean for the base level, i.e. Female is the reference category
$\beta_1$ is the additional effect if Male
In this simple model, only the means of the two groups are modeled.
QIV with q levels

As a general rule, if a QIV has q levels we need q-1 dummies for coding. The uncoded level is the reference one.
Example: a QIV has three levels, A, B and C
Define
$x_1 = 1$ if level A, $x_1 = 0$ if not
$x_2 = 1$ if level B, $x_2 = 0$ if not
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
C is the reference level
Interpreting the $\beta$s
$\beta_0 = \mu_C$ (mean for base level C)
$\beta_1 = \mu_A - \mu_C$ (additional effect with respect to C if level A)
$\beta_2 = \mu_B - \mu_C$ (additional effect with respect to C if level B)
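The interpretation of the coefficients can be verified numerically. The sketch below uses a small made-up data set (the responses are hypothetical, not taken from any file of the course) and two hypothetical helpers, `solve` and `ols`, which fit the model by solving the normal equations in plain Python.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: group means are A = 11, B = 22, C = 6
levels = ["A", "A", "B", "B", "B", "C", "C"]
y = [10.0, 12.0, 20.0, 22.0, 24.0, 5.0, 7.0]

# q = 3 levels -> q - 1 = 2 dummies; C is the uncoded reference level
rows = [[1.0, float(lv == "A"), float(lv == "B")] for lv in levels]
b0, b1, b2 = ols(rows, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

The fitted intercept equals the mean of level C, and the two slopes equal the differences between the means of A and B and the mean of C.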
Models with dummies

Although models that contain only dummy variables in practice just estimate the means of the various groups, the testing machinery of the regression setup can be useful for group comparisons.
Dummies can be combined with other dummies and with quantitative Xs to construct models with first-order effects (or main effects) and interactions, in order to test hypotheses of interest.
To define dummies in SPSS see
Computing dummy vars in SPSS.ppt
Example: executive salaries

A management consulting firm has developed a regression model in order to analyze the salary structure of executives.
Data: ExecSal.sav
A simple model: $E(Y) = \beta_0 + \beta_3 x_3$

[Figure: salaries of the Male group and of the Female group]
This model estimates the means of the two groups (M, F).

We want to test whether the difference in means is significant, i.e. not due to chance.
Regression Output

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.392   .153       .145                23320.282
Predictors: (Constant), Gender

ANOVA (dependent variable: Annual salary in $)
             Sum of Squares    df   Mean Square      F        Sig.
Regression   9651865066.845    1    9651865066.845   17.748   .000
Residual     53295882433.156   98   543835535.032
Total        62947747500.001   99
Predictors: (Constant), Gender
The salary difference between the groups is significant.

Coefficients (dependent variable: Annual salary in $)
             B           Std. Error   Beta   t        Sig.   95% C.I. Lower   95% C.I. Upper
(Constant)   83847.059   3999.395            20.965   .000   75910.389        91783.729
Gender       20739.305   4922.915     .392   4.213    .000   10969.940        30508.670
The Gender coefficient is the mean increment for Male; its confidence interval is the C.I. for the mean increment.
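Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$
[Figure: plot of the data by gender] It seems that the two groups are separated.
Model 2 considers the same slope but different intercepts:
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_3) + \beta_1 x_1$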
Computer output for model 2

R square improved greatly.

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.860   .740       .735                12981.615
Predictors: (Constant), Years of Experience, Gender

ANOVA (dependent variable: Annual salary in $)
             Sum of Squares    df   Mean Square       F         Sig.
Regression   46601081714.527   2    23300540857.264   138.264   .000
Residual     16346665785.474   97   168522327.685
Total        62947747500.001   99
Predictors: (Constant), Years of Experience, Gender

Coefficients (dependent variable: Annual salary in $)
                      B           Std. Error   Beta   t        Sig.   95% C.I. Lower   95% C.I. Upper
(Constant)            50614.312   3161.279            16.011   .000   44340.048        56888.576
Gender                18894.215   2743.253     .357   6.888    .000   13449.618        24338.812
Years of Experience   2633.831    177.875      .767   14.807   .000   2280.799         2986.863
The new intercept for Male is significant.
In this model the effect of experience is assumed equal for the two groups.
Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1 x_3$

With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) x_1$
Here $\beta_0 + \beta_3$ is the new intercept for male and $\beta_1 + \beta_4$ is the new slope for male.
Remark: running the regression for the two groups together gives more degrees of freedom (a larger n) for estimating the parameters and the model variance.
Model 3 considers different slopes and different intercepts.
Computer output for model 3

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.868   .754       .746                12700.080
Predictors: (Constant), ExpGender, Years of Experience, Gender

Coefficients
                      B           Std. Error   Beta   t        Sig.   95% C.I. Lower   95% C.I. Upper
(Constant)            58049.768   4461.179            13.012   .000   49194.397        66905.139
Gender                7798.504    5497.470     .147   1.419    .159   -3113.888        18710.896
Years of Experience   2044.541    308.565      .595   6.626    .000   1432.045         2657.036
ExpGender             864.122     373.653      .301   2.313    .023   122.426          1605.818
There is evidence that salaries for the two groups grow at different rates with experience.

Estimated lines:
$\hat{Y} = 58049.8 + 2044.5 \cdot$ (Years of Experience) for female
$\hat{Y} = 65848.3 + 2908.7 \cdot$ (Years of Experience) for male
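The two estimated lines follow directly from the coefficient table: the male intercept and slope are obtained by adding the Gender and ExpGender estimates to the female ones. A minimal sketch, where `group_line` is a hypothetical helper and the coefficients are copied from the output above:

```python
def group_line(b0, b1, b3, b4, male):
    """Intercept and slope of the fitted line for one gender (male = 0 or 1)."""
    intercept = b0 + b3 * male
    slope = b1 + b4 * male
    return intercept, slope


coef = dict(b0=58049.768, b1=2044.541, b3=7798.504, b4=864.122)
female_line = group_line(male=0, **coef)
male_line = group_line(male=1, **coef)
print(female_line, male_line)
```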
A complete second order model

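$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
$\beta_0$: y-intercept, the value of $E(Y)$ when $x_1 = x_2 = 0$;
$\beta_1$ and $\beta_2$: shifts along the $x_1$ and $x_2$ axes;
$\beta_3$: rotation of the surface;
$\beta_4$ and $\beta_5$: control the rate of curvature.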
Back to Executive salaries

What if we suspect that the rate of growth changes and has opposite signs for M and F?
$x_3$ = Gender (1 if Male)
Note: $x_3^2 = x_3$ since it is a dummy
Model 4: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2$

Model 5: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$
Comparing Model 4 and 5

Model 4
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \beta_4 x_1^2$
Different intercept and slope for M and F but same curvature

Model 5
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + (\beta_4 + \beta_5) x_1^2$
Different intercept, slope and curvature for M and F
Model 5: computer output

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.875   .766       .754                12507.735
Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

ANOVA (dependent variable: Annual salary in $)
             Sum of Squares   df   Mean Square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000
Residual     1.471E10         94   1.564E8
Total        6.295E10         99

Coefficients (dependent variable: Annual salary in $)
                      B           Std. Error   Beta    t        Sig.
(Constant)            52391.973   6497.971             8.063    .000
Years of Experience   3373.970    1165.248     .982    2.895    .005
Gender                21122.152   8285.802     .399    2.549    .012
ExpGen                -2081.897   1459.842     -.724   -1.426   .157
ExpSqu                -53.181     45.001       -.422   -1.182   .240
Exp2Gen               112.836     54.950       .904    2.053    .043
Which model is preferable? Model 3 or Model 5?
A test for comparing nested models

Two models are nested if one model contains all the terms of the other model and at least one additional term.
The more complex of the two models is called the complete (or full) model.
The other is called the reduced (or restricted) model.
Example: model 1 is nested in model 2
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
To compare the two models we are interested in testing
$H_0: \beta_4 = \beta_5 = 0$ vs. $H_1$: at least one of $\beta_4$, $\beta_5$ differs from 0
F-test for comparing nested models

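Reduced model:
$E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$
Complete model:
$E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$
To test
$H_0: \beta_{g+1} = \dots = \beta_k = 0$
$H_1$: at least one of the parameters being tested is not 0
Compute
$F = \frac{(SSE_R - SSE_C)/(k - g)}{MSE_C}$
Reject $H_0$ when $F > F_\alpha$, where $F_\alpha$ is the level-$\alpha$ critical point of an F distribution with $(k - g,\ n - (k+1))$ d.f.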
F-test for nested models

Where:
$SSE_R$ = sum of squared errors for the reduced model;
$SSE_C$ = sum of squared errors for the complete model;
$MSE_C$ = mean square error for the complete model.
Remark:
$k - g$ = number of parameters tested
$k + 1$ = number of parameters in the complete model
$n$ = total sample size
Compute partial F-tests with SPSS

1. Enter your complete model in the Regression dialog box and choose the Method Enter
2. Click on Next
3. In the new box for Independent variables, enter those you want to remove (i.e. those you'd like to test) and choose the Method Remove
4. In the Statistics option select R squared change
5. Click OK.
Applying the F-test

Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example.
Note that Model 3 is nested in Model 5
Model 3:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3$
Model 5:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$
Apply the F-test for $H_0: \beta_4 = \beta_5 = 0$
Computer output

Variables Entered/Removed (dependent variable: Annual salary in $)
Model   Variables Entered                                      Variables Removed   Method
1       Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen   .                   Enter
2       .                                                      Exp2Gen, ExpSqu     Remove
Model 1: all requested variables entered. Model 2: all requested variables removed.

Model Summary with Change Statistics
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .875   .766       .754                12507.735                    .766              61.673     5     94    .000
2       .868   .754       .746                12700.080                    -.012             2.488      2     94    .089
Model 2 predictors: (Constant), Gender, Years of Experience, ExpGen
For Model 2, F Change is the F-statistic of the nested-model test and Sig. F Change is its p-value.
Do NOT reject $H_0: \beta_4 = \beta_5 = 0$ at the 5% level, i.e. Model 3 is preferred.
A quadratic model example: Shipping costs

Although a regional delivery service bases the charge for shipping a
package on the package weight and distance shipped, its profit per
package depends on the package size (volume of space it occupies) and
the size and nature of the delivery truck.
The company conducted a study to investigate the relationship
between the cost of shipment and the variables that control the
shipping charge: weight and distance.
Y : cost of shipment in dollars
X1: package weight in pounds
X2: distance shipped in miles
It is suspected that nonlinear effects may be present.
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Data: Express.sav
Scatter plots
[Figure: scatter plots of Cost of shipment against Weight of parcel in lbs. and against Distance shipped]
Scatter plots in multiple regression often do not reveal much information.
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4428
Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

ANOVA (dependent variable: Cost of shipment)
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19

Coefficients (dependent variable: Cost of shipment)
                           B           Std. Error   Beta    t        Sig.
(Constant)                 .827        .702                 1.178    .259
Weight of parcel in lbs.   -.609       .180         -.316   -3.386   .004
Distance shipped           .004        .008         .062    .503     .623
Weight squared             .090        .020         .382    4.442    .001
Distance squared           1.51E-005   .000         .075    .672     .513
Weight*Distance            .007        .001         .850    11.495   .000
Distance squared is not significant: try to eliminate it.
Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2$

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.997   .994       .992                .4346
Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.

ANOVA (dependent variable: Cost of shipment)
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.252          4    112.313       594.623   .000
Residual     2.833            15   .189
Total        452.086          19

Coefficients (dependent variable: Cost of shipment)
                           B       Std. Error   Beta    t        Sig.
(Constant)                 .475    .458                 1.035    .317
Weight of parcel in lbs.   -.578   .171         -.300   -3.387   .004
Distance shipped           .009    .003         .141    3.421    .004
Weight squared             .087    .019         .369    4.485    .000
Weight*Distance            .007    .001         .842    11.753   .000
Applying the F-test: Shipping costs

A company conducted a study to investigate the relationship
between the cost of shipment and the variables that control the
shipping charge: weight and distance.
Y : cost of shipment in dollars
X1: package weight in pounds
X2: distance shipped in miles
It is suspected that nonlinear effects may be present; use the F-test for nested models to decide between
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Data: Express.sav
ANOVA Tables

Full model
             Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000
Residual     2.745            14   .196
Total        452.086          19
Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

Reduced model
             Sum of Squares   df   Mean Square   F         Sig.
Regression   445.452          3    148.484       358.154   .000
Residual     6.633            16   .415
Total        452.086          19
Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance
F-statistic
To test $H_0: \beta_4 = \beta_5 = 0$, from the ANOVA tables we have
$F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92$
The critical value $F_\alpha$ (at the 5% level) for an F-distribution with 2 and 14 d.f. is 3.74.
Since $F$ (9.92) > $F_\alpha$ (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.
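The calculation can be reproduced from the two ANOVA tables. In the sketch below `partial_f` is a hypothetical helper; it recomputes $MSE_C$ from $SSE_C$ instead of using the rounded value 0.196, so the result differs slightly in the third decimal. The critical value 3.74 is taken from the text rather than computed.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """F statistic for H0: the k - g extra coefficients of the complete model are 0."""
    mse_complete = sse_complete / (n - (k + 1))
    return ((sse_reduced - sse_complete) / (k - g)) / mse_complete


# Shipping costs: n = 20 observations, complete model k = 5, reduced model g = 3
f_stat = partial_f(sse_reduced=6.633, sse_complete=2.745, n=20, k=5, g=3)
critical_value = 3.74  # F at the 5% level with 2 and 14 d.f., as quoted above
reject_h0 = f_stat > critical_value
print(round(f_stat, 2), reject_h0)
```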
Computer output

Variables Entered/Removed (dependent variable: Cost of shipment)
Model   Variables Entered                                                                                   Variables Removed                  Method
1       Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped       .                                  Enter
2       .                                                                                                   Distance squared, Weight squared   Remove
Model 1: all requested variables entered. Model 2: all requested variables removed.

Model Summary with Change Statistics
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .997   .994       .992                .4428                        .994              458.388    5     14    .000
2       .993   .985       .983                .6439                        -.009             9.917      2     14    .002
Model 1 predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
Model 2 predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped
For Model 2, F Change is the F-statistic of the nested-model test and Sig. F Change is its p-value.
Reject $H_0: \beta_4 = \beta_5 = 0$
Executive salaries: a final model (?)

Try adding other variables to model 3

Model 6: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_4 + \beta_6 x_5$
Computer Output: Model 6

Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.963   .927       .922                7020.089
Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

ANOVA (dependent variable: Annual salary in $)
             Sum of Squares   df   Mean Square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000
Residual     4.583E9          93   4.928E7
Total        6.295E10         99

Coefficients (dependent variable: Annual salary in $)
                                  B            Std. Error   Beta   t        Sig.
(Constant)                        -38331.331   9533.238            -4.021   .000
Years of Experience               2178.964     171.979      .634   12.670   .000
Gender                            13203.101    3137.775     .249   4.208    .000
ExpGender                         669.546      209.042      .233   3.203    .002
Years of Education                2689.594     311.914      .246   8.623    .000
Number of Employees supervised    53.239       4.470        .353   11.910   .000
Corporate assets (in million $)   180.310      46.600       .110   3.869    .000
Executive salaries: comparison of models

Mod.   Predictors                 Adj. R2   Standard error   F-stat
-      x1, x2, x4, x5             0.747     12685.31         74.05
2      x1, x3                     0.735     12981.62         138.26
3      x1, x3, x1x3               0.746     12700.08         98.09
6      x1, x2, x3, x1x3, x4, x5   0.922     7020.09          197.38
