Chapter: Basics of Regression Analysis

1/14/2019
CHAPTER 11
Basics of regression analysis

Learning objectives
11.1 Apply regression analysis to estimate linear relationships between two variables.
11.2 Estimate the multiple linear regression model and interpret the coefficients.
11.3 Calculate and interpret the standard error of the estimate.
11.4 Calculate and interpret the coefficient of determination \(R^2\).
Copyright © 2016 McGraw-Hill Education (Australia) Pty Ltd
Jaggia, Essentials of Business Statistics, 1e
Updated by: Azizur Rahman
Learning objectives
11.5 Understand the difference between \(R^2\) and adjusted \(R^2\).
11.6 Conduct tests of individual significance.
11.7 Conduct a test of joint significance.

How are home loan (and other debt) repayments and income related?
• A recent study found that Australian households repay about $2,000 monthly on their mortgages.
How are home loan (and other debt) repayments and income related?
• Economist Madelyn Davis believes that income differences between regions are the primary reason for disparate mortgage and other debt repayments.
• She also wonders about the likely effect of unemployment.
• Madelyn would like to use regression analysis to:
– Make predictions for debt repayments for given values of income and the unemployment rate
– Find models that best fit the data, using goodness-of-fit measures
– Determine the significance of income and unemployment at the 5% level.

Simple linear regression model
LO 11.1 Apply regression analysis to estimate linear relationships between two variables.
• With regression analysis, we explicitly assume that one variable, called the response variable, is influenced by other variables, called the explanatory variables.
• Using regression analysis, we may predict the response variable given values for our explanatory variables.
LO 11.1 Simple linear regression model
• If the value of the response variable is uniquely determined by the values of the explanatory variables, we say that the relationship is deterministic.
• However, in most fields of research we find that the relationship is inexact, due to the omission of relevant factors that influence the response variable.
• In regression analysis, we include a random error term that acknowledges that the actual relationship between the response and explanatory variables is not deterministic.
• The simple linear regression model is defined as
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
where y and x are, respectively, the response and explanatory variables and \(\varepsilon\) is the random error term.
• The coefficients \(\beta_0\) and \(\beta_1\) are the unknown parameters to be estimated.
• By fitting our data to the model, we obtain the equation
\[ \hat{y} = b_0 + b_1 x \]
where ŷ is the estimated response variable, \(b_0\) is the estimate of \(\beta_0\) and \(b_1\) is the estimate of \(\beta_1\).

LO 11.1 Simple linear regression model
• Since the predictions cannot be totally accurate, the difference between the actual and predicted value is called the residual, e = y – ŷ.
– Example: In a scatterplot of home loan repayments against income with a superimposed sample regression equation, home loan repayments rise with income. The vertical distance between ŷ and y represents the residual e.
• The two parameters \(\beta_0\) and \(\beta_1\) are estimated by minimising the sum of the squared residuals.
– The slope coefficient is first estimated as
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
– Then the intercept is computed as
\[ b_0 = \bar{y} - b_1 \bar{x} \]
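The slope and intercept formulas can be sketched in a few lines of plain Python. This is only an illustration: the function name `fit_simple_ols` and the income/repayment figures are hypothetical and are not the data set used in the slides.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data (not from the slides): weekly income and monthly repayments
income = [1000.0, 1200.0, 1500.0, 1800.0, 2000.0]
repayments = [1300.0, 1500.0, 1850.0, 2100.0, 2250.0]

b0, b1 = fit_simple_ols(income, repayments)
# Residual e = y - y_hat for each observation
residuals = [y - (b0 + b1 * x) for x, y in zip(income, repayments)]
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
```

With an intercept in the model, the least-squares residuals sum to zero (up to floating-point error), which gives a quick sanity check on any fit.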

LO 11.1 Simple linear regression model
• Example: Household home loan repayments
– We denote home loan repayments as y and income as x. With \(\bar{x} = 1{,}560.93\) and \(\bar{y} = 1{,}855.86\),
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 1{,}218{,}830.51 \quad \text{and} \quad \sum_{i=1}^{n} (x_i - \bar{x})^2 = 1{,}319{,}215.23 \]
so
\[ b_1 = \frac{1{,}218{,}830.51}{1{,}319{,}215.23} = 0.9239 \]
and
\[ b_0 = \bar{y} - b_1 \bar{x} = 1{,}855.86 - 0.9239(1{,}560.93) \approx 413.714 \]
– The sample regression equation: ŷ = 413.714 + 0.924x
– The slope \(b_1 = 0.924\) implies that in a region where the median weekly household income is higher by $1, average home loan repayments are expected to be higher by $0.924 (about 92 cents).
– The intercept \(b_0 = 413.714\) suggests that if income were 0, home loan repayments would still be $413.71.
– We can use the sample regression equation to predict home loan repayments for individual cities or regions.
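The arithmetic in this example can be reproduced directly from the quoted summary statistics. The sketch below uses only the four numbers given in the example; small differences in the last decimal place come from rounding in the quoted means and sums.

```python
# Summary statistics as quoted in the example
x_bar = 1560.93        # mean income
y_bar = 1855.86        # mean home loan repayments
s_xy = 1218830.51      # sum of (x_i - x_bar)(y_i - y_bar)
s_xx = 1319215.23      # sum of (x_i - x_bar)^2

b1 = s_xy / s_xx       # slope estimate
b0 = y_bar - b1 * x_bar  # intercept estimate

print(f"b1 = {b1:.4f}")
print(f"b0 = {b0:.2f}")
```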

LO 11.1 Simple linear regression model
• Example: Using Excel for simple linear regression
– Open the Hh_Income_Debt_Repayments data in Excel.
– Choose Data > Data Analysis > Regression.
– In the dialog box, for Input Y Range, select the data for your response variable and, for Input X Range, select the data for your explanatory variable(s).
– We can display the output on a new worksheet, in the current worksheet, or even in a new workbook.
– The Excel output will look like this: [Excel regression output table]
Multiple linear regression model
LO 11.2 Estimate the multiple linear regression model and interpret the coefficients.
• If there is more than one explanatory variable, we use multiple regression.
• For example, we analysed how home loan repayments are influenced by income, but ignored the possible effect of unemployment.
• Multiple regression allows us to explore how several variables influence the response variable.

LO 11.2 Multiple linear regression model
• Suppose there are k explanatory variables. The multiple linear regression model is defined as
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \]
where \(x_1, x_2, \dots, x_k\) are the explanatory variables and the \(\beta_j\) values are the unknown parameters that we will estimate from the data.
• As before, \(\varepsilon\) is the random error term.
LO 11.2 Multiple linear regression model
• The sample multiple regression equation is
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k \]
• In multiple regression, there is a slight modification in the interpretation of the slopes \(b_1\) to \(b_k\), as they show ‘partial’ influences.
• For example, if there are k = 3 explanatory variables, the value \(b_1\) estimates how a change in \(x_1\) will influence y assuming \(x_2\) and \(x_3\) are held constant.
• Example: Using Excel for multiple linear regression
– Recall from the introductory case that, in addition to income, the unemployment rate may also influence a region’s average home loan repayments.
– Using Excel, we can easily include the additional explanatory variable by choosing Data > Data Analysis > Regression, as before, but now select both income and the unemployment rate data for Input X Range.
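Outside Excel, the same least-squares estimates can be obtained by solving the normal equations \((X'X)b = X'y\). The following is a minimal pure-Python sketch under stated assumptions: the helper names `solve_linear_system` and `fit_multiple_ols` are hypothetical, and the data are made up (built from known coefficients so the fit can be checked), not the slides' data set.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_ols(rows, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical (income, unemployment rate) observations; y is generated
# without noise from known coefficients 50, 1.0 and 20 so the fit is checkable
rows = [(1000.0, 4.0), (1200.0, 6.5), (1500.0, 5.0), (1800.0, 7.0), (2000.0, 4.5)]
y = [50.0 + 1.0 * inc + 20.0 * unemp for inc, unemp in rows]

b = fit_multiple_ols(rows, y)
print([round(v, 4) for v in b])
```

In practice a statistics package would be used instead; the point here is only that each slope is estimated jointly with the others, which is why it carries a "holding the other variables constant" interpretation.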
LO 11.2 Multiple linear regression model
• Example:
– The output reflects the additional coefficient estimate for unemployment. The other coefficient estimates also change.
– The estimated equation is
\[ \hat{y} = -202.28 + 1.08x_1 + 68.45x_2 \]
where \(x_1\) is income and \(x_2\) is the unemployment rate.
– The coefficient of 1.08 on income indicates that if income increases by $1, then home loan repayments are expected to increase by $1.08, assuming unemployment does not change.
– The coefficient of 68.45 on unemployment indicates that an increase in the unemployment rate of 1 percentage point is expected to lead to an increase in home loan repayments of $68.45, assuming income does not change.
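The partial-effect reading of the coefficients can be checked by evaluating the estimated equation at nearby inputs. The sketch below hard-codes the rounded coefficients shown above; the function name `predict_repayment` and the input values are illustrative assumptions, and results may differ by a few cents from figures computed with unrounded estimates.

```python
def predict_repayment(income, unemployment_rate):
    """Evaluate y_hat = -202.28 + 1.08*income + 68.45*unemployment_rate."""
    return -202.28 + 1.08 * income + 68.45 * unemployment_rate


# Illustrative inputs: weekly income of $2,000, unemployment rate of 5 (percent)
base = predict_repayment(2000, 5)

# Partial effects: change one input while holding the other constant
income_effect = predict_repayment(2001, 5) - base
unemployment_effect = predict_repayment(2000, 6) - base

print(f"prediction: {base:.2f}")
print(f"effect of +$1 income: {income_effect:.2f}")
print(f"effect of +1 point unemployment: {unemployment_effect:.2f}")
```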
LO 11.2 Multiple linear regression model
• Example:
– We can also use the estimated equation to predict home loan repayments given values for median income and the unemployment rate.
– Suppose we wish to predict home loan repayments that would occur in a city with a median weekly household income of $2,000 and a 5% unemployment rate.
– We simply insert those values into our estimated equation.
\[ \hat{y} = -202.28 + 1.08(2{,}000) + 68.45(5) = \$2{,}300.42 \]
(obtained with unrounded coefficient estimates; the rounded values shown give $2,299.97).

Goodness-of-fit measures
LO 11.3 Calculate and interpret the standard error of the estimate.
• We will introduce three measures to judge how well the sample regression fits the data:
– Standard error of the estimate
– Coefficient of determination
– Adjusted \(R^2\).
• To compute the standard error of the estimate, we first compute the error sum of squares.
\[ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
LO 11.3 Goodness-of-fit measures
• Dividing SSE by the appropriate degrees of freedom, n – k – 1, yields the mean square error.
\[ MSE = \frac{SSE}{n - k - 1} \]
• The square root of the MSE is the standard error of the estimate \(s_e\).
\[ s_e = \sqrt{MSE} = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{\frac{\sum e_i^2}{n - k - 1}} \]
• In general, the less dispersion around the regression line, the smaller the \(s_e\), which implies a better fit for the model.
• Example: Household home loan repayments
– The standard error of the estimate is 205.76 for the simple linear regression for home loan repayments (Model 1) and 201.80 for the multiple linear regression (Model 2).
– Notice that according to the standard error, adding the unemployment rate as an explanatory variable improved goodness-of-fit slightly (i.e. 201.80 < 205.76).
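The chain SSE, MSE, \(s_e\) translates directly into code. A minimal sketch, assuming the fitted values are already available; the function name and the small data set are hypothetical.

```python
import math


def standard_error_of_estimate(y, y_hat, k):
    """Return (SSE, MSE, s_e) for a model with k explanatory variables."""
    n = len(y)
    dof = n - k - 1  # degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / dof
    return sse, mse, math.sqrt(mse)


# Hypothetical actual and fitted values from a model with one explanatory variable
y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [9.0, 13.0, 16.0, 18.0, 24.0]

sse, mse, se = standard_error_of_estimate(y, y_hat, k=1)
print(f"SSE = {sse:.2f}, MSE = {mse:.4f}, s_e = {se:.4f}")
```

Because the divisor is n – k – 1 rather than n, adding a variable that barely reduces SSE can leave \(s_e\) unchanged or even larger.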
Goodness-of-fit measures
LO 11.4 Calculate and interpret the coefficient of determination \(R^2\).
• The coefficient of determination, commonly referred to as \(R^2\), is another goodness-of-fit measure that is easier to interpret than the standard error.
• The \(R^2\) quantifies the fraction of variation in the response variable that is explained by changes in the explanatory variables.

LO 11.4 Goodness-of-fit measures
• The coefficient of determination is computed as
\[ R^2 = 1 - \frac{SSE}{SST} \]
where \(SST = \sum (y_i - \bar{y})^2\) and \(SSE = \sum (y_i - \hat{y}_i)^2\).
– The SST, or total sum of squares, is the total variation in the response variable.
– The SST can be broken down into two components: the variation explained by the regression equation (the sum of squares due to regression, or SSR) and the unexplained variation (the sum of squares due to error, or SSE).
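The decomposition SST = SSR + SSE and the resulting \(R^2\) can be verified numerically. The sketch below fits a simple regression to a small hypothetical data set (not the slides' data) and then computes each sum of squares.

```python
# Hypothetical data for a simple regression
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares fit, so that the decomposition SST = SSR + SSE holds
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation

r_squared = 1 - sse / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r_squared:.2f}")
```

Here 60% of the variation in y is explained by x. The identity SST = SSR + SSE relies on the fit being least squares with an intercept; it need not hold for arbitrary fitted values.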
LO 11.4 Goodness-of-fit measures
• The \(R^2\) is also reported with the Excel regression statistics, in the second row of the output for each model estimated for the introductory case.

Goodness-of-fit measures
LO 11.5 Understand the difference between \(R^2\) and adjusted \(R^2\).
• Adding explanatory variables never lowers \(R^2\) and almost always raises it.
• Some of these variables may be unimportant and should not be in the model.
• The adjusted \(R^2\) tries to balance the raw explanatory power against the desire to include only important predictors.
• The adjusted \(R^2\) is computed as
\[ \text{Adjusted } R^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right) \]
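A short sketch of the adjusted \(R^2\) formula shows how the penalty works. The \(R^2\) values below are hypothetical and chosen only to illustrate a case where an extra variable raises \(R^2\) but lowers the adjusted \(R^2\).

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n - k - 1
    if dof <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / dof


n = 16  # hypothetical sample size

# Hypothetical fits: the second model adds a variable and R^2 rises only slightly
adj_one_var = adjusted_r_squared(0.60, n, k=1)
adj_two_var = adjusted_r_squared(0.61, n, k=2)

print(f"k=1: R^2 = 0.60, adjusted = {adj_one_var:.4f}")
print(f"k=2: R^2 = 0.61, adjusted = {adj_two_var:.4f}")
```

In this made-up case the small gain in \(R^2\) does not offset the lost degree of freedom, so the adjusted measure favours the smaller model.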
LO 11.5 Goodness-of-fit measures
• The adjusted \(R^2\) penalises the \(R^2\) for each additional explanatory variable.
• As with our other goodness-of-fit measures, we typically use computers to calculate the adjusted \(R^2\).
• The adjusted \(R^2\) is shown directly below the \(R^2\) in the Excel regression output.
• Example: Household home loan repayments
– Compare the simple linear regression for home loan repayments (Model 1) with the multiple linear regression (Model 2).
– The \(R^2\) is slightly higher in the multiple regression, and the adjusted \(R^2\) is also higher with the standard error lower, implying we are better off with the second model.
Tests of significance
LO 11.6 Conduct tests of individual significance.
• For the introductory case, with two explanatory variables (income and unemployment) to choose from, we can formulate three possible linear models.
Model 1: home loan repayments = \(\beta_0 + \beta_1\) Income + \(\varepsilon\)
Model 2: home loan repayments = \(\beta_0 + \beta_1\) Unemployment + \(\varepsilon\)
Model 3: home loan repayments = \(\beta_0 + \beta_1\) Income + \(\beta_2\) Unemployment + \(\varepsilon\)
• Consider our standard multiple regression model.
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon \]
• In general, we can test whether \(\beta_j\) differs from, is greater than, or is less than some hypothesised value \(\beta_{j0}\).

LO 11.6 Tests of significance
• This test could have one of three forms:
Two-tailed: \(H_0: \beta_j = \beta_{j0}\); \(H_A: \beta_j \neq \beta_{j0}\)
Right-tailed: \(H_0: \beta_j \leq \beta_{j0}\); \(H_A: \beta_j > \beta_{j0}\)
Left-tailed: \(H_0: \beta_j \geq \beta_{j0}\); \(H_A: \beta_j < \beta_{j0}\)
• The test statistic will follow a t distribution with degrees of freedom df = n – k – 1.
• It is calculated as
\[ t_{df} = \frac{b_j - \beta_{j0}}{se(b_j)} \]
where \(se(b_j)\) is the standard error of the estimator \(b_j\).
• By far the most common hypothesis test for an individual coefficient is to test whether its value differs from zero.
LO 11.6 Tests of significance
• If a coefficient is equal to zero, then it implies that the explanatory variable is not a significant predictor of the response variable.
• Virtually all statistical software will automatically report a test statistic and p-value with each coefficient estimate.
– These values can be used to test whether the regression coefficient differs from zero.
– To perform a one-sided test where the hypothesised value is zero, divide the computer-generated p-value in half (provided the sign of the estimate agrees with the alternative hypothesis).
– If we wish to test whether the coefficient differs from a non-zero value, then we cannot use the computer-generated test statistic and its p-value (i.e. we need to compute a new value of the test statistic and its p-value).
• Example: Household home loan repayments
– To test whether income influences home loan repayments, we set up the following hypotheses
\(H_0: \beta_1 = 0\); \(H_A: \beta_1 \neq 0\)
then examine the regression output.
– The value of the test statistic is \(t_{13} = 5.006\), and its p-value is very close to zero. We reject the null hypothesis and conclude that income is a significant predictor.

LO 11.6 Tests of significance
• A confidence interval for a \(\beta_j\) parameter can be constructed using the formula:
\[ b_j \pm t_{\alpha/2,df}\, se(b_j) \]
• This can also be used to perform the two-sided test to determine whether a coefficient differs from zero.
• In the results for the introductory case, the interval for income, [0.6140, 1.5464], does not include zero, indicating income is a significant predictor.

Capital asset pricing model
• The capital asset pricing model follows the equation
\[ y = \alpha + \beta x + \varepsilon \]
where y = the risk-adjusted return of an asset, \(R - R_f\), and x = the risk-adjusted return to the market, \(R_M - R_f\).
• The estimate of \(\beta\) is called the investment’s beta value.
• A beta greater than 1 implies the investment is aggressive, while a beta value less than 1 implies it is conservative.
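The link between the confidence interval and the t test can be illustrated with the reported income interval. In the sketch below, the critical value 2.160 is an assumption taken from a standard t table for 13 degrees of freedom at the 5% level (two-sided); the standard error is not quoted in the example, so it is backed out from the interval width. The function name is hypothetical.

```python
def confidence_interval(estimate, t_crit, std_err):
    """Return (lower, upper) for estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin


# Interval for the income coefficient as reported in the introductory case
lower, upper = 0.6140, 1.5464

# Assumption: two-sided critical value t(0.025, 13) from a standard t table
t_crit = 2.160

estimate = (lower + upper) / 2          # midpoint recovers the coefficient
std_err = (upper - lower) / 2 / t_crit  # implied standard error
t_stat = estimate / std_err             # statistic for H0: beta_j = 0

rebuilt = confidence_interval(estimate, t_crit, std_err)
significant = not (lower <= 0.0 <= upper)  # zero outside the interval

print(f"estimate = {estimate:.4f}, implied se = {std_err:.4f}, t = {t_stat:.3f}")
```

The implied t statistic comes out close to the reported 5.006, which is what one would expect if the interval and the test were produced from the same regression output.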

LO 11.6 Tests of significance
• Example: Capital asset pricing model
– We use 60 months of data to estimate the beta for Johnson & Johnson (J&J).
– We want to test whether J&J is a conservative stock, so we set up the competing hypotheses:
\(H_0: \beta \geq 1\) (J&J stock is not conservative)
\(H_A: \beta < 1\) (J&J stock is conservative).
– A simple regression yields \(b_1 = 0.5844\), with a standard error \(se(b_1) = 0.0803\).
– The test statistic for our hypothesis is then calculated as:
\[ t_{58} = \frac{0.5844 - 1}{0.0803} = -5.176 \]
– Since the value of the test statistic is less than the critical value \(-t_{0.05,58} = -1.672\), we reject the null hypothesis and conclude that the stock’s beta is less than 1, making it a conservative investment.
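Because the hypothesised value here is 1 rather than 0, the test statistic has to be computed by hand. A minimal sketch using the figures quoted in the example; the function name `t_statistic` is hypothetical.

```python
def t_statistic(estimate, hypothesised, std_err):
    """Return (b_j - beta_j0) / se(b_j)."""
    return (estimate - hypothesised) / std_err


b1 = 0.5844        # estimated beta for J&J
se_b1 = 0.0803     # its standard error
critical = -1.672  # left-tail critical value -t(0.05, 58) quoted in the example

t58 = t_statistic(b1, 1.0, se_b1)
reject_h0 = t58 < critical  # left-tailed test of H0: beta >= 1

print(f"t = {t58:.3f}, reject H0: {reject_h0}")
```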
Tests of significance
LO 11.7 Conduct a test of joint significance.
• In addition to conducting tests of individual significance, we may also want to test the joint significance of all k variables at once.
• The competing hypotheses for a test of joint significance are:
\(H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0\)
\(H_A\): At least one \(\beta_j \neq 0\).

LO 11.7 Tests of significance
• The test statistic for a test of joint significance is
\[ F_{(df_1, df_2)} = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE} \]
where MSR and MSE are, respectively, the mean square regression and the mean square error.
• The numerator degrees of freedom are \(df_1 = k\), while the denominator degrees of freedom are \(df_2 = n - k - 1\).
• Fortunately, statistical software will generally report the value of \(F_{(df_1, df_2)}\) and its p-value as standard output, making computation by hand rarely necessary.
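For completeness, the F statistic is simple to compute from the sums of squares. The sketch below uses hypothetical values of SSR and SSE (not the slides' ANOVA table) and cross-checks the result against the equivalent expression in terms of \(R^2\), which follows from \(R^2 = SSR/SST\).

```python
def f_statistic(ssr, sse, n, k):
    """Return (F, df1, df2) with F = (SSR / k) / (SSE / (n - k - 1))."""
    df1 = k
    df2 = n - k - 1
    if df1 <= 0 or df2 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    msr = ssr / df1  # mean square regression
    mse = sse / df2  # mean square error
    return msr / mse, df1, df2


# Hypothetical sums of squares from a simple regression on 5 observations
ssr, sse, n, k = 3.6, 2.4, 5, 1

f_value, df1, df2 = f_statistic(ssr, sse, n, k)

# Equivalent form using R^2 = SSR / SST
r2 = ssr / (ssr + sse)
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"F({df1},{df2}) = {f_value:.2f}")
```

The p-value still requires the F distribution, which is why the slides defer to software output for that step.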
LO 11.7 Tests of significance

• Example: Household home loan repayments
– For the introductory case, we want to conduct a test of joint significance for the model:
Home loan repayments = \(\beta_0 + \beta_1\) Income + \(\beta_2\) Unemployment + \(\varepsilon\)
– We set up the following hypotheses:
\(H_0: \beta_1 = \beta_2 = 0\)
\(H_A\): At least one \(\beta_j \neq 0\).
– From the ANOVA portion of the regression results, we see that \(F_{(2,13)} = 14.6047\) and its p-value is quite small, 4.736E–04 (see value under Significance F), so we reject the null hypothesis and conclude that the explanatory variables are jointly significant.