
Types of regression models

Regression Models
  Simple: 1st order, 2nd order, higher order
  Multiple: 1st order, interaction, 2nd order, higher order

A quadratic second order model

$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$
Interpretation of model parameters:
$\beta_0$: y-intercept. The value of E(Y) when x = 0
$\beta_1$: the shift parameter;
$\beta_2$: the rate of curvature.

Example with quadratic terms

The true model, supposedly unknown, is
$Y_i = 2 + x_i^2 + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$

Data: (x,y). See SQM.sav

[Figure: scatter plot of y (0 to 100) against x (2 to 10)]
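The comparison carried out in the following outputs can be imitated on simulated data. The sketch below is only an illustration under stated assumptions: it does not read SQM.sav, the noise standard deviation (2.5) and the range of x (0 to 10) are arbitrary choices, and the sample size of 105 is inferred from the total df of 104 in the ANOVA tables. It fits a straight line in x and a straight line in x squared and compares the two R-squared values.

```python
import random


def simple_ols(u, y):
    """Least-squares fit of y = b0 + b1*u. Returns (b0, b1, r_squared)."""
    n = len(u)
    u_bar = sum(u) / n
    y_bar = sum(y) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, y))
    b1 = s_uy / s_uu
    b0 = y_bar - b1 * u_bar
    sse = sum((yi - (b0 + b1 * ui)) ** 2 for ui, yi in zip(u, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst


random.seed(0)   # fixed seed so the run is reproducible
N_OBS = 105      # assumption: inferred from the total df of 104
NOISE_SD = 2.5   # assumption: arbitrary, not taken from the slides

# Simulate from the true model Y = 2 + x^2 + error.
x = [random.uniform(0, 10) for _ in range(N_OBS)]
y = [2 + xi ** 2 + random.gauss(0, NOISE_SD) for xi in x]

# Model 1 regresses y on x; Model 2 regresses y on x squared.
_, b1_lin, r2_lin = simple_ols(x, y)
_, b1_sq, r2_sq = simple_ols([xi ** 2 for xi in x], y)

print(f"y on x:   slope = {b1_lin:.3f}, R^2 = {r2_lin:.3f}")
print(f"y on x^2: slope = {b1_sq:.3f}, R^2 = {r2_sq:.3f}")
```

On data generated this way the regression on x squared should recover a slope close to 1 and a higher R-squared than the straight line in x.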

Model 1: $E(Y) = \beta_0 + \beta_1 x$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .973a   .947       .947                6.60994
a. Predictors: (Constant), x

ANOVAb
Model 1      Sum of Squares   df    Mean Square   F          Sig.
Regression   80624.915        1     80624.915     1845.332   .000a
Residual     4500.202         103   43.691
Total        85125.117        104
a. Predictors: (Constant), x
b. Dependent Variable: y

Coefficientsa
Model 1      B         Std. Error   Beta   t         Sig.
(Constant)   -19.959   1.483               -13.454   .000
x            10.744    .250         .973   42.957    .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: y

[Figure: scatter plot of y against x with the fitted regression line y = -19.96 + 10.74 * x, R-Square = 0.95]

Model 2: $E(Y) = \beta_0 + \beta_1 x^2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .996a   .991       .991                2.68707
a. Predictors: (Constant), XSquare

Smaller variance and SE

ANOVAb
Model 1      Sum of Squares   df    Mean Square   F           Sig.
Regression   84381.422        1     84381.422     11686.632   .000a
Residual     743.695          103   7.220
Total        85125.117        104
a. Predictors: (Constant), XSquare
b. Dependent Variable: y

Coefficientsa
Model 1      B       Std. Error   Beta   t         Sig.
(Constant)   2.340   .417                5.608     .000
XSquare      .997    .009         .996   108.105   .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: y

[Figure: scatter plot of y against XSquare with the fitted regression line y = 2.34 + 1.00 * XSquare, R-Square = 0.99]

Model 3: $E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .996a   .991       .991                2.66608
a. Predictors: (Constant), XSquare, x

ANOVAb
Model 1      Sum of Squares   df    Mean Square   F          Sig.
Regression   84400.103        2     42200.052     5936.999   .000a
Residual     725.014          102   7.108
Total        85125.117        104
a. Predictors: (Constant), XSquare, x
b. Dependent Variable: y

Coefficientsa
Model 1      B       Std. Error   Beta    t        Sig.
(Constant)   4.177   1.206                3.463    .001
x            -.830   .512         -.075   -1.621   .108
XSquare      1.071   .046         1.069   23.046   .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: y


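A third order model with 1 IV

$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

Use with caution, given the numerical problems that could arise.

[Figure: two cubic curves, one for $\beta_3 > 0$ and one for $\beta_3 < 0$]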


First-order model in k quantitative variables

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
Interpretation of model parameters:
$\beta_0$: y-intercept. The value of E(Y) when $x_1 = x_2 = \dots = x_k = 0$
$\beta_1$: change in E(Y) for a 1-unit increase in $x_1$ when $x_2, \dots, x_k$ are held fixed;
$\beta_2$: change in E(Y) for a 1-unit increase in $x_2$ when $x_1, x_3, \dots, x_k$ are held fixed;
...

A bivariate model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Changing $x_2$ changes only the y-intercept.

In the first order model a 1-unit change in one independent variable has the same effect on the mean value of y regardless of the values of the other independent variables.
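A small numerical check can make this additivity concrete. The coefficients below are hypothetical values chosen for illustration, not estimates from any data set in these slides; the sketch evaluates the effect of a 1-unit increase in x1 at several levels of x2.

```python
def mean_response(x1, x2, b0=1.0, b1=2.0, b2=3.0):
    """First-order model E(Y) = b0 + b1*x1 + b2*x2 (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2


# Effect on E(Y) of raising x1 by one unit, at different starting points.
effects = [
    mean_response(x1 + 1, x2) - mean_response(x1, x2)
    for x2 in (0.0, 5.0, 10.0)
    for x1 in (0.0, 4.0)
]
print(effects)  # every entry equals b1, whatever x1 and x2 are
```

Changing x2 moves the whole line in x1 up or down by b2 per unit, which is the change of intercept described above.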

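A bivariate model

[Figure: the response plane $E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ drawn over the $(X_1, X_2)$ plane, with intercept $\beta_0$ on the Y axis and an observed value $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ above the point $(X_{1i}, X_{2i})$]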

Example: executive salaries

Y = Annual salary (in dollars)
x1 = Years of experience
x2 = Years of education
x3 = Gender: 1 if male; 0 if female
x4 = Number of employees supervised
x5 = Corporate assets (in millions of dollars)

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4 + \beta_5 x_5$

Data: ExecSal.sav

Do not consider x3 (Gender) for the moment.

Executive salaries: Computer Output

Multiple regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .870a   .757       .747                12685.309
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised

Simple regression

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .783a   .613       .609                15760.006
a. Predictors: (Constant), Years of Experience

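Coefficient of determination

The coefficient $R^2$ is computed exactly as in the simple regression case.

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$$\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST\ \text{(Total)}} = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR\ \text{(Regression)}} + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE\ \text{(Error)}}$$

A drawback of $R^2$: it never decreases when variables are added, even if these are NOT relevant to the problem.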

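Adjusted $R^2$ and estimate of the variance $\sigma^2$

A solution: adjusted $R^2$
Each additional variable reduces adjusted $R^2$, unless SSE decreases enough to compensate.

$$R_a^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSE}{SST} \le 1 - \frac{SSE}{SST} = R^2$$

An unbiased estimator of the variance $\sigma^2$ is computed as

$$s^2 = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n-k-1} = \frac{SSE}{n-k-1}$$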
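These three quantities can be recomputed from the entries of an ANOVA table. The sketch below uses the sums of squares reported for Model 1 of the quadratic example (SSE = 4500.202, SST = 85125.117, total df = 104, so n = 105 and k = 1); the function name `fit_summary` is a hypothetical helper, not part of any package.

```python
import math


def fit_summary(sse, sst, n, k):
    """Return (R^2, adjusted R^2, s^2) for a model with k slopes and n observations."""
    r2 = 1 - sse / sst
    adj_r2 = 1 - (n - 1) / (n - k - 1) * sse / sst
    s2 = sse / (n - k - 1)  # unbiased estimate of the error variance
    return r2, adj_r2, s2


# Values read from the ANOVA table of Model 1 (y regressed on x).
r2, adj_r2, s2 = fit_summary(sse=4500.202, sst=85125.117, n=105, k=1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
print(f"s^2 = {s2:.3f}, s = {math.sqrt(s2):.5f}")
```

The results agree with the reported R Square (.947), Mean Square residual (43.691) and Std. Error of the Estimate (6.60994).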

Executive salaries: Computer Output (2)

Coefficientsa
Model 1 (Variables)              B            Std. Error   Beta   t        Sig.
(Constant)                       -37082.148   17052.089           -2.175   .032
Years of Experience              2696.360     173.647      .785   15.528   .000
Years of Education               2656.017     563.476      .243   4.714    .000
Number of Employees supervised   41.092       7.807        .272   5.264    .000
Corporate assets (in million $)  244.569      83.420       .149   2.932    .004
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients; t and Sig.: t-tests)
a. Dependent Variable: Annual salary in $
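Each t statistic in this table is the estimated coefficient divided by its standard error. The sketch below recomputes the ratios from the reported B and Std. Error columns; small discrepancies in the third decimal come from the rounding of the printed inputs.

```python
# (B, Std. Error, reported t) copied from the coefficients table.
coefficients = {
    "(Constant)": (-37082.148, 17052.089, -2.175),
    "Years of Experience": (2696.360, 173.647, 15.528),
    "Years of Education": (2656.017, 563.476, 4.714),
    "Number of Employees supervised": (41.092, 7.807, 5.264),
    "Corporate assets (in million $)": (244.569, 83.420, 2.932),
}

t_values = {}
for name, (b, se, t_reported) in coefficients.items():
    t_values[name] = b / se  # t = estimate / standard error
    print(f"{name:32s} t = {t_values[name]:8.3f}  (reported {t_reported})")
```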

Testing overall significance: the F-test

1. Shows whether there is a linear relationship between all the X variables together and Y
2. Uses the F test statistic
3. Hypotheses
H0: $\beta_1 = \beta_2 = \dots = \beta_k = 0$
No linear relationship
Ha: at least one coefficient is not 0
At least one X variable affects Y

The F-test for one single coefficient is equivalent to the t-test.

Anova table

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4.766E10         4    1.192E10      74.045   .000a
Residual     1.529E10         95   1.609E8
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Number of Employees supervised
b. Dependent Variable: Annual salary in $

Reading the table:
Regression df = k: number of regression slopes
Total df = n-1: n = number of observations
Residual Mean Square = MSE (mean square error), the estimate of the variance
F = F-statistic
Sig. = p-value of the F-test

Decision: reject H0, i.e. accept this model
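The F statistic in the ANOVA table is the ratio of the regression mean square to the residual mean square. The sketch below recomputes it from the printed sums of squares (which are rounded to four significant digits, so the result matches the reported 74.045 only approximately); `overall_f` is a hypothetical helper name. The p-value is not computed here because the Python standard library has no F distribution.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic for H0: all k slopes are zero."""
    msr = ssr / k            # regression mean square, df = k
    mse = sse / (n - k - 1)  # residual mean square, df = n - k - 1
    return msr / mse, msr, mse


# Sums of squares from the executive salaries ANOVA table (n = 100, k = 4).
f_stat, msr, mse = overall_f(ssr=4.766e10, sse=1.529e10, n=100, k=4)
print(f"MSR = {msr:.4g}, MSE = {mse:.4g}, F = {f_stat:.2f}")
```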

Interaction (second order) model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Interpretation of model parameters:
$\beta_0$: y-intercept. The value of E(Y) when $x_1 = x_2 = 0$
$\beta_1 + \beta_3 x_2$: change in E(Y) for a 1-unit increase in $x_1$ when $x_2$ is held fixed;
$\beta_2 + \beta_3 x_1$: change in E(Y) for a 1-unit increase in $x_2$ when $x_1$ is held fixed;
$\beta_3$: controls the rate of change of the surface.

Interaction (second order) model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Contour lines are not parallel.

The effect of one variable depends on the level of the other.

Example: Antique grandfather clocks auction

Clocks are sold at an auction by competitive bidding. The data are:
Y: auction price in dollars
X1: age of clocks
X2: number of bidders

Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Data: GFCLOCKS.sav

Data summaries

Descriptive Statistics
                     N    Minimum   Maximum   Mean      Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
Age                  32   108       194       144.94    27.395           .216 (.414)             -1.323 (.809)
Bidders              32   5         15        9.53      2.840            .420 (.414)             -.788 (.809)
Price                32   729       2131      1326.88   393.487          .396 (.414)             -.727 (.809)
Valid N (listwise)   32

If data are Normal, Skewness is 0.
If data are Normal, (excess) Kurtosis is 0.
Note: Skewness and Kurtosis are not enough to establish Normality.
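As a rough illustration of the two summaries, the sketch below computes moment-based skewness and excess kurtosis with the standard library only. This is an assumption about the formula: statistical packages typically report bias-adjusted versions, so values computed this way would differ slightly from the table above. The sample used is a made-up symmetric list, not the clock data.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2^2 - 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical symmetric sample: skewness is 0, tails are lighter than Normal.
skew, kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A symmetric but flat sample like this one has zero skewness and negative excess kurtosis, which is one reason why the two numbers alone cannot establish Normality.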

P-P plot for Normality

If data are Normal, points should lie along the straight line.
In this example the situation is fairly good.

Bivariate scatter-plots

[Figure: two scatter plots, one with Age and one with Bidders on the horizontal axis]

Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .945a   .892       .885                133.485
a. Predictors: (Constant), Bidders, Age

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   4283062.960      2    2141531.480   120.188   .000a
Residual     516726.540       29   17818.157
Total        4799789.500      31
a. Predictors: (Constant), Bidders, Age
b. Dependent Variable: Price

Coefficientsa
Model 1      B           Std. Error   Beta   t        Sig.
(Constant)   -1338.951   173.809             -7.704   .000
Age          12.741      .905         .887   14.082   .000
Bidders      85.953      8.729        .620   9.847    .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Price

Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .977a   .954       .949                88.915
a. Predictors: (Constant), AgeBid, Age, Bidders

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   4578427.367      3    1526142.456   193.041   .000a
Residual     221362.133       28   7905.790
Total        4799789.500      31
a. Predictors: (Constant), AgeBid, Age, Bidders
b. Dependent Variable: Price

Coefficientsa
Model 1      B         Std. Error   Beta    t        Sig.
(Constant)   320.458   295.141              1.086    .287
Age          .878      2.032        .061    .432     .669
Bidders      -93.265   29.892       -.673   -3.120   .004
AgeBid       1.298     .212         1.369   6.112    .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Price

Interpreting interaction models

The coefficient for the interaction term is significant.
If an interaction term is present, the corresponding first order terms also need to be included in order to interpret the model correctly.
In the example a careless analyst could estimate the effect of Bidders as negative, since $b_2 = -93.26$.
Since an interaction term is present, the slope estimate for Bidders ($x_2$) is

$b_2 + b_3 x_1$

Note: $b_j = \hat{\beta}_j$

For $x_1 = 150$ (age) the estimated slope for Bidders is

-93.26 + 1.3 (150) = 101.74
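The same calculation can be repeated for any age. The sketch below uses the unrounded coefficients from the Model 2 output (-93.265 and 1.298), so at age 150 it gives a value slightly below the 101.74 obtained above with the rounded figures -93.26 and 1.3. The function name `bidders_slope` is a hypothetical helper.

```python
B_BIDDERS = -93.265  # coefficient of Bidders in Model 2
B_AGEBID = 1.298     # coefficient of the Age*Bidders interaction


def bidders_slope(age):
    """Estimated change in price per additional bidder for a clock of given age."""
    return B_BIDDERS + B_AGEBID * age


slope_at_150 = bidders_slope(150)
slope_at_min_age = bidders_slope(108)  # 108 is the minimum Age in the data
zero_slope_age = -B_BIDDERS / B_AGEBID  # age at which the slope changes sign

print(f"slope at age 150: {slope_at_150:.3f}")
print(f"slope at age 108: {slope_at_min_age:.3f}")
print(f"slope is zero at age {zero_slope_age:.1f}")
```

Because the slope changes sign at an age well below the youngest clock in the sample, the estimated effect of an extra bidder is positive over the whole observed range of ages.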

Models with qualitative Xs

Regression models can also include qualitative (or categorical) independent variables (QIV).
The categories of a QIV are called levels.
Since the levels of a QIV are not measured on a natural numerical scale, we need a specific type of coding to avoid introducing fictitious linear relations in the model.
Coding is done with independent variables that take only two values: 0 or 1.
These coded variables are called dummy variables.

Models with QIV

Suppose we want to model Income (Y) as a function of Sex (x) -> use coded, or dummy, variables
x = 1 if Male, x = 0 if Female

$E(Y) = \beta_0 + \beta_1 x$
$E(Y) = \beta_0 + \beta_1$ if x = 1, i.e. Male
$E(Y) = \beta_0$ if x = 0, i.e. Female
$\beta_0$ is the mean for the base level, i.e. Female is the reference category
$\beta_1$ is the additional effect if Male
In this simple model, only the means for the two groups are modeled.

QIV with q levels

As a general rule, if a QIV has q levels we need q-1 dummies for coding. The uncoded level is the reference one.
Example: a QIV has three levels, A, B and C
Define

$x_1 = 1$ if level A, $x_1 = 0$ if not
$x_2 = 1$ if level B, $x_2 = 0$ if not

Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

C is the reference level

Interpreting the $\beta$s
$\beta_0 = \mu_C$ (mean for base level C)
$\beta_1 = \mu_A - \mu_C$ (additional effect with respect to C if level A)
$\beta_2 = \mu_B - \mu_C$ (additional effect with respect to C if level B)
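The coding rule and the interpretation of the coefficients can be sketched as follows. The observations are made-up numbers for illustration only, and `dummy_code` is a hypothetical helper. In a model containing only these dummies the least-squares fit reproduces the group means, so the coefficients can be obtained directly from them.

```python
def dummy_code(level, levels=("A", "B", "C"), reference="C"):
    """Return the q-1 dummy values for `level`; the reference level is all zeros."""
    return [1 if level == lv else 0 for lv in levels if lv != reference]


# Hypothetical observations: (level of the QIV, response value).
observations = [("A", 12.0), ("A", 14.0), ("B", 9.0), ("B", 11.0),
                ("C", 5.0), ("C", 7.0)]

# Group means, which the dummy-only regression reproduces exactly.
means = {}
for lv in ("A", "B", "C"):
    values = [y for level, y in observations if level == lv]
    means[lv] = sum(values) / len(values)

beta0 = means["C"]               # mean of the reference level
beta1 = means["A"] - means["C"]  # additional effect of level A
beta2 = means["B"] - means["C"]  # additional effect of level B

for level, _ in observations[::2]:
    x1, x2 = dummy_code(level)
    print(level, (x1, x2), "fitted mean:", beta0 + beta1 * x1 + beta2 * x2)
```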

Models with dummies


Even if models which consider only dummy variables do in
practice estimate the means of various groups, the testing
machinery of the regression setup can be useful for group
comparisons.
Dummies can be used in combination with any other
dummies and quantitative Xs to construct models with
first order effects (or main effects) and interactions to
test hypotheses of interest.
In order to define dummies in SPSS see
Computing dummy vars in SPSS.ppt

Example: executive salaries

A management consulting firm has developed a regression model in order to analyze the executives' salary structure.
Y = Annual salary (in dollars)
x1 = Years of experience
x2 = Years of education
x3 = Gender: 1 if male; 0 if female
x4 = Number of employees supervised
x5 = Corporate assets (in millions of dollars)
Data: ExecSal.sav

A simple model: $E(Y) = \beta_0 + \beta_3 x_3$

[Figure: plot with the Male group and the Female group marked]

This model estimates the means of the two groups (M, F).
We want to test whether the difference in means is significant, i.e. not due to chance.

Regression Output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .392a   .153       .145                23320.282
a. Predictors: (Constant), Gender

ANOVAb
Model 1      Sum of Squares    df   Mean Square      F        Sig.
Regression   9651865066.845    1    9651865066.845   17.748   .000a
Residual     53295882433.156   98   543835535.032
Total        62947747500.001   99
a. Predictors: (Constant), Gender
b. Dependent Variable: Annual salary in $

Salary difference between groups is significant.

Coefficientsa
Model 1      B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)   83847.059   3999.395            20.965   .000   75910.389            91783.729
Gender       20739.305   4922.915     .392   4.213    .000   10969.940            30508.670
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients; CI: 95% Confidence Interval for B)
a. Dependent Variable: Annual salary in $

B for Gender: mean increment for Male
CI for Gender: C.I. for mean increment
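The confidence interval for a coefficient has the form B plus or minus a critical t value times the standard error. Since the Python standard library has no t quantile function, the sketch below works backwards from the reported bounds for Gender: it checks that the interval is centred on B and recovers the critical value implied by the half-width. It also derives the two group means from the coefficients.

```python
# Values copied from the coefficients table for the Gender-only model.
b_gender, se_gender = 20739.305, 4922.915
lower, upper = 10969.940, 30508.670
b_constant = 83847.059

midpoint = (lower + upper) / 2     # should equal the estimate B
half_width = (upper - lower) / 2   # equals t_critical * Std. Error
implied_t = half_width / se_gender  # critical value with 98 residual df

female_mean = b_constant            # x3 = 0
male_mean = b_constant + b_gender   # x3 = 1

print(f"midpoint = {midpoint:.3f}, implied critical t = {implied_t:.4f}")
print(f"female mean = {female_mean:.3f}, male mean = {male_mean:.3f}")
```

The interval does not contain 0, which is consistent with the significant t-test for Gender.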

Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3$

It seems that the two groups are separated.
Model 2 considers the same slope but different intercepts.

If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
If $x_3 = 1$ (male) then $E(Y) = \beta_0 + \beta_3 + \beta_1 x_1$
Computer output for model 2

R square improved greatly.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .860a   .740       .735                12981.615
a. Predictors: (Constant), Years of Experience, Gender

ANOVAb
Model 1      Sum of Squares    df   Mean Square       F         Sig.
Regression   46601081714.527   2    23300540857.264   138.264   .000a
Residual     16346665785.474   97   168522327.685
Total        62947747500.001   99
a. Predictors: (Constant), Years of Experience, Gender
b. Dependent Variable: Annual salary in $

Coefficientsa
Model 1               B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)            50614.312   3161.279            16.011   .000   44340.048            56888.576
Gender                18894.215   2743.253     .357   6.888    .000   13449.618            24338.812
Years of Experience   2633.831    177.875      .767   14.807   .000   2280.799             2986.863
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients; CI: 95% Confidence Interval for B)
a. Dependent Variable: Annual salary in $

The new intercept for Male is significant.
In this model the effect of experience is assumed equal for the two groups.

Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1 x_3$

With this model we want to test whether gender and experience interact, i.e. whether male salaries tend to grow at a faster (or slower) rate with experience.
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_3) + (\beta_1 + \beta_4) x_1$

$\beta_0 + \beta_3$: new intercept for male
$\beta_1 + \beta_4$: new slope for male

Remark: running one regression for the two groups together gives more degrees of freedom (a larger n) for estimating the parameters and the model variance.

Model 3: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1 x_3$

Model 3 considers different slopes and different intercepts.

Computer output for model 3

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .868a   .754       .746                12700.080
a. Predictors: (Constant), ExpGender, Years of Experience, Gender

Coefficientsa
Model 1               B           Std. Error   Beta   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
(Constant)            58049.768   4461.179            13.012   .000   49194.397            66905.139
Gender                7798.504    5497.470     .147   1.419    .159   -3113.888            18710.896
Years of Experience   2044.541    308.565      .595   6.626    .000   1432.045             2657.036
ExpGender             864.122     373.653      .301   2.313    .023   122.426              1605.818
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients; CI: 95% Confidence Interval for B)
a. Dependent Variable: Annual salary in $

There is evidence that salaries for the two groups grow at different rates with experience.

Estimated lines:
$\hat{Y}$ = 58049.8 + 2044.5*(Years of Experience) for female
$\hat{Y}$ = 65848.3 + 2908.7*(Years of Experience) for male
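The two estimated lines follow from the Model 3 coefficients by setting the Gender dummy to 0 or 1. The sketch below reproduces that arithmetic; `line_for` and `predicted_salary` are hypothetical helper names, and the 10-year comparison is only an illustrative evaluation point.

```python
# Unstandardized coefficients from the Model 3 output.
B0 = 58049.768           # (Constant)
B_EXPERIENCE = 2044.541  # Years of Experience
B_GENDER = 7798.504      # Gender (1 if male)
B_EXPGENDER = 864.122    # Experience * Gender interaction


def line_for(gender):
    """Return (intercept, slope) of the estimated line for gender 0 or 1."""
    return B0 + B_GENDER * gender, B_EXPERIENCE + B_EXPGENDER * gender


def predicted_salary(years, gender):
    intercept, slope = line_for(gender)
    return intercept + slope * years


female_line = line_for(0)
male_line = line_for(1)
gap_at_10_years = predicted_salary(10, 1) - predicted_salary(10, 0)

print("female:", female_line)
print("male:  ", tuple(round(v, 1) for v in male_line))
print(f"estimated male-female gap at 10 years: {gap_at_10_years:.1f}")
```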

A complete second order model

$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Interpretation of model parameters:
$\beta_0$: y-intercept. The value of E(Y) when $x_1 = x_2 = 0$
$\beta_1$ and $\beta_2$: shifts along the $x_1$ and $x_2$ axes;
$\beta_3$: rotation of the surface;
$\beta_4$ and $\beta_5$: control the rate of curvature.

Back to Executive salaries

What if we suspect that the rate of growth changes and has opposite signs for M and F?

x1 = Years of experience
x3 = Gender (1 if Male)
Note: $x_3^2 = x_3$ since it is a dummy

Model 4: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2$
Model 5: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$

Comparing Model 4 and 5

Model 4
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \beta_4 x_1^2$
Different intercept and slope for M and F but same curvature

Model 5
If $x_3 = 0$ (female) then $E(Y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$
If $x_3 = 1$ (male) then $E(Y) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + (\beta_4 + \beta_5) x_1^2$
Different intercept, slope and curvature for M and F

Model 5: computer output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .875a   .766       .754                12507.735
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   4.824E10         5    9.648E9       61.673   .000a
Residual     1.471E10         94   1.564E8
Total        6.295E10         99
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Dependent Variable: Annual salary in $

Model 5: computer output

Coefficientsa
Model 1               B           Std. Error   Beta    t        Sig.
(Constant)            52391.973   6497.971             8.063    .000
Years of Experience   3373.970    1165.248     .982    2.895    .005
Gender                21122.152   8285.802     .399    2.549    .012
ExpGen                -2081.897   1459.842     -.724   -1.426   .157
ExpSqu                -53.181     45.001       -.422   -1.182   .240
Exp2Gen               112.836     54.950       .904    2.053    .043
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Annual salary in $

Which model is preferable? Model 3 or Model 5?

A test for comparing nested models

Two models are nested if one model contains all the terms of the other model and at least one additional term.
The more complex of the two models is called the complete (or full) model.
The other is called the reduced (or restricted) model.
Example: model 1 is nested in model 2
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
To compare the two models we are interested in testing
H0: $\beta_4 = \beta_5 = 0$, vs. H1: at least one of $\beta_4$ or $\beta_5$ differs from 0

F-test for comparing nested models

Reduced model:
$E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g$
Complete model:
$E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_g x_g + \beta_{g+1} x_{g+1} + \dots + \beta_k x_k$

To test
H0: $\beta_{g+1} = \dots = \beta_k = 0$
H1: at least one of the parameters being tested is not 0
Compute

$$F = \frac{(SSE_R - SSE_C)/(k-g)}{MSE_C}$$

Reject H0 when $F > F_\alpha$, where $F_\alpha$ is the level $\alpha$ critical point of an F distribution with (k-g, n-(k+1)) d.f.

F-test for nested models

Where:
$SSE_R$ = Sum of squared errors for the reduced model;
$SSE_C$ = Sum of squared errors for the complete model;
$MSE_C$ = Mean square error for the complete model;
Remark:
k - g = number of parameters tested
k + 1 = number of parameters in the complete model
n = total sample size

Compute partial F-tests with SPSS

1. Enter your complete model in the Regression dialog box; choose the Method "Enter"
2. Click on "Next"
3. In the new box for Independent variables, enter those you want to remove (i.e. those you'd like to test); choose the Method "Remove"
4. In the Statistics option select "R squared change"
5. Click OK.

Applying the F-test

Let us use the F-test to compare Model 3 and Model 5 in the executive salaries example.
Note that Model 3 is nested in Model 5

Model 3:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3$
Model 5:
$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_3 + \beta_3 x_1 x_3 + \beta_4 x_1^2 + \beta_5 x_3 x_1^2$
Apply the F-test for H0: $\beta_4 = \beta_5 = 0$

Computer output

Variables Entered/Removedc
Model   Variables Entered                                            Variables Removed   Method
1       Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGena        .                   Enter
2       .a                                                           Exp2Gen, ExpSqub    Remove
a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Annual salary in $

Model Summary (with Change Statistics)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .875a   .766       .754                12507.735                    .766              61.673     5     94    .000
2       .868b   .754       .746                12700.080                    -.012             2.488      2     94    .089
a. Predictors: (Constant), Exp2Gen, Gender, Years of Experience, ExpSqu, ExpGen
b. Predictors: (Constant), Gender, Years of Experience, ExpGen

F Change for Model 2: F-statistic
Sig. F Change for Model 2: p-value of the F-test

Do NOT reject H0: $\beta_4 = \beta_5 = 0$, i.e. Model 3 is better

A quadratic model example: Shipping costs

Although a regional delivery service bases the charge for shipping a package on the package weight and distance shipped, its profit per package depends on the package size (volume of space it occupies) and the size and nature of the delivery truck.
The company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
Y: cost of shipment in dollars
X1: package weight in pounds
X2: distance shipped in miles
It is suspected that nonlinear effects may be present.

Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Data: Express.sav

Scatter plots

[Figure: scatter plots of Cost of shipment against Weight of parcel in lbs. and against Distance shipped]

Scatter plots in multiple regression often do not reveal much information.

Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .997a   .994       .992                .4428
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000a
Residual     2.745            14   .196
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Dependent Variable: Cost of shipment

Coefficientsa
Model 1                    B           Std. Error   Beta    t        Sig.
(Constant)                 .827        .702                 1.178    .259
Weight of parcel in lbs.   -.609       .180         -.316   -3.386   .004
Distance shipped           .004        .008         .062    .503     .623
Weight squared             .090        .020         .382    4.442    .001
Distance squared           1.51E-005   .000         .075    .672     .513
Weight*Distance            .007        .001         .850    11.495   .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Cost of shipment

Distance squared is not significant: try to eliminate it.

Model: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2$

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .997a   .994       .992                .4346
a. Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.252          4    112.313       594.623   .000a
Residual     2.833            15   .189
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance shipped, Weight squared, Weight of parcel in lbs.
b. Dependent Variable: Cost of shipment

Coefficientsa
Model 1                    B       Std. Error   Beta    t        Sig.
(Constant)                 .475    .458                 1.035    .317
Weight of parcel in lbs.   -.578   .171         -.300   -3.387   .004
Distance shipped           .009    .003         .141    3.421    .004
Weight squared             .087    .019         .369    4.485    .000
Weight*Distance            .007    .001         .842    11.753   .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Cost of shipment

Applying the F-test: Shipping costs

A company conducted a study to investigate the relationship between the cost of shipment and the variables that control the shipping charge: weight and distance.
Y: cost of shipment in dollars
X1: package weight in pounds
X2: distance shipped in miles

It is suspected that nonlinear effects may be present. Use the F-test for nested models to decide between
Model 1: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$
Model 2: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$

Data: Express.sav

ANOVA Tables

Full model
ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   449.341          5    89.868        458.388   .000a
Residual     2.745            14   .196
Total        452.086          19
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Dependent Variable: Cost of shipment

Reduced model
ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   445.452          3    148.484       358.154   .000a
Residual     6.633            16   .415
Total        452.086          19
a. Predictors: (Constant), Distance shipped, Weight of parcel in lbs., Weight*Distance
b. Dependent Variable: Cost of shipment

F-statistic
To test H0: $\beta_4 = \beta_5 = 0$, from the ANOVA tables we have

$$F = \frac{(SSE_R - SSE_C)/2}{MSE_C} = \frac{(6.633 - 2.745)/2}{0.196} = 9.92$$

The critical value $F_\alpha$ (at the 5% level) for an F-distribution with 2 and 14 d.f. is 3.74.
Since F (9.92) > $F_\alpha$ (3.74), the null hypothesis is rejected at the 5% significance level, i.e. the model with quadratic terms is preferred over the reduced one.
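The computation above can be wrapped in a small function. The sketch below is a minimal illustration with a hypothetical helper name, `partial_f`; the inputs are the residual sums of squares from the two ANOVA tables, with n = 20 observations (total df = 19), k = 5 slopes in the complete model and g = 3 in the reduced one. The critical value 3.74 is the one quoted in the text, since the standard library offers no F quantile function.

```python
def partial_f(sse_reduced, sse_complete, n, k, g):
    """F statistic for H0: the k - g extra coefficients of the complete model are 0."""
    mse_complete = sse_complete / (n - (k + 1))  # MSE of the complete model
    return ((sse_reduced - sse_complete) / (k - g)) / mse_complete


f_stat = partial_f(sse_reduced=6.633, sse_complete=2.745, n=20, k=5, g=3)
F_CRITICAL_5PCT = 3.74  # F(2, 14) critical value at the 5% level, from the text

print(f"F = {f_stat:.3f}, reject H0: {f_stat > F_CRITICAL_5PCT}")
```

The result differs from 9.92 only in the third decimal, because the hand calculation uses the mean square error rounded to 0.196.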

Computer output

Variables Entered/Removedc
Model   Variables Entered                                                                                   Variables Removed                   Method
1       Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shippeda      .                                   Enter
2       .                                                                                                   Distance squared, Weight squaredb   Remove
a. All requested variables entered.
b. All requested variables removed.
c. Dependent Variable: Cost of shipment

Model Summary (with Change Statistics)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .997a   .994       .992                .4428                        .994              458.388    5     14    .000
2       .993b   .985       .983                .6439                        -.009             9.917      2     14    .002
a. Predictors: (Constant), Weight*Distance, Distance squared, Weight squared, Weight of parcel in lbs., Distance shipped
b. Predictors: (Constant), Weight*Distance, Weight of parcel in lbs., Distance shipped

F Change for Model 2: F-statistic
Sig. F Change for Model 2: p-value of the F-test

Reject H0: $\beta_4 = \beta_5 = 0$

Executive salaries: a final model (?)

Y = Annual salary (in dollars)
x1 = Years of experience
x2 = Years of education
x3 = Gender: 1 if male; 0 if female
x4 = Number of employees supervised
x5 = Corporate assets (in millions of dollars)

Try adding other variables to Model 3

Model 6: $E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_3 + \beta_5 x_4 + \beta_6 x_5$

Computer Output: Model 6

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .963a   .927       .922                7020.089
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   5.836E10         6    9.727E9       197.384   .000a
Residual     4.583E9          93   4.928E7
Total        6.295E10         99
a. Predictors: (Constant), Corporate assets (in million $), Years of Experience, Years of Education, Gender, Number of Employees supervised, ExpGender

Computer Output: Model 6

Coefficientsa
Model 1                           B            Std. Error   Beta   t        Sig.
(Constant)                        -38331.331   9533.238            -4.021   .000
Years of Experience               2178.964     171.979      .634   12.670   .000
Gender                            13203.101    3137.775     .249   4.208    .000
ExpGender                         669.546      209.042      .233   3.203    .002
Years of Education                2689.594     311.914      .246   8.623    .000
Number of Employees supervised    53.239       4.470        .353   11.910   .000
Corporate assets (in million $)   180.310      46.600       .110   3.869    .000
(B and Std. Error: unstandardized coefficients; Beta: standardized coefficients)
a. Dependent Variable: Annual salary in $

Executive salaries: comparison of models

Model                      Predictors                 Adj. R2   Standard error   F-stat
First-order (no Gender)    x1, x2, x4, x5             0.747     12685.31         74.05
Model 2                    x1, x3                     0.735     12981.62         138.26
Model 3                    x1, x3, x1x3               0.746     12700.08         98.09
Model 6                    x1, x2, x3, x1x3, x4, x5   0.922     7020.09          197.38
