
CHAPTER 11
REGRESSION ANALYSIS

Fandy Valentino
Statistika Industri II
Program Studi Teknik Industri
Institut Teknologi Sumatera
Introduction

Variable Dependency

x ──(influences)──► y
x: independent / predictor variable;  y: dependent / response variable
 Relationship: 𝑦 = 𝛼 + 𝛽𝑥
𝛼: intercept
𝛽: slope
 Main requirements:
Type of each variable
Direction of the dependency
Exercise

1. Plot the set of points from the table on Cartesian coordinates and determine the
equation of the curve that passes through them.

x   -1   0   1   2   3   4
y   -2   0   2   4   6   8

 -2 = 2 × (-1)
   0 = 2 × 0
   …
 𝑦 = 2𝑥
Exercise

2. Plot the set of points from the table on Cartesian coordinates and determine the
equation of the curve that passes through them.

x   -1   0   1   2   3   4
y   -1   1   3   5   7   9

 -1 = 2 × (-1) + 1
   1 = 2 × 0 + 1
   …
 𝑦 = 2𝑥 + 1
Exercise

3. Plot the set of points from the table on Cartesian coordinates and determine the
equation of the curve that passes through them.

x   -1   0   1   2   3   4
y    3   3   3   3   3   3

 3 = 0 × (-1) + 3
   3 = 0 × 0 + 3
   …
 𝑦 = 0𝑥 + 3 = 3
Exercise

4. Plot the following set of points.

x   -1    0   1   2   3   4
y   -3   -1   0   4   3   8

 Can a straight line be predicted?
Exercise

4c. Prediction 𝑦 = 2𝑥

xᵢ    yᵢ    ŷᵢ = 2xᵢ    eᵢ = yᵢ − ŷᵢ    eᵢ²
-1    -3    -2          -1              1
 0    -1     0          -1              1
 1     0     2          -2              4
 2     4     4           0              0
 3     3     6          -3              9
 4     8     8           0              0
Sum of squared errors: 15
Exercise

4d. Prediction 𝑦 = (3/2)𝑥

xᵢ    yᵢ    ŷᵢ = (3/2)xᵢ    eᵢ = yᵢ − ŷᵢ    eᵢ²
-1    -3    -1.5            -1.5            2.25
 0    -1     0              -1              1
 1     0     1.5            -1.5            2.25
 2     4     3               1              1
 3     3     4.5            -1.5            2.25
 4     8     6               2              4
Sum of squared errors: 12.75
Exercise

4e. Prediction 𝑦 = 2𝑥 − 1

xᵢ    yᵢ    ŷᵢ = 2xᵢ − 1    eᵢ = yᵢ − ŷᵢ    eᵢ²
-1    -3    -3               0              0
 0    -1    -1               0              0
 1     0     1              -1              1
 2     4     3               1              1
 3     3     5              -2              4
 4     8     7               1              1
Sum of squared errors: 7
Exercise

4f. Prediction 𝑦 = (3/2)𝑥 − 1

xᵢ    yᵢ    ŷᵢ = (3/2)xᵢ − 1    eᵢ = yᵢ − ŷᵢ    eᵢ²
-1    -3    -2.5                -0.5            0.25
 0    -1    -1                   0              0
 1     0     0.5                -0.5            0.25
 2     4     2                   2              4
 3     3     3.5                -0.5            0.25
 4     8     5                   3              9
Sum of squared errors: 13.75
Exercise

4g. Conclusion

Prediction line      Sum of squared errors
𝑦 = 2𝑥               15
𝑦 = (3/2)𝑥           12.75
𝑦 = 2𝑥 − 1            7
𝑦 = (3/2)𝑥 − 1       13.75
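As a quick check (an added sketch, not part of the original slides), the four candidate lines can be compared on the six data points in Python:

# Compare candidate lines by their sum of squared errors (SSE).
x = [-1, 0, 1, 2, 3, 4]
y = [-3, -1, 0, 4, 3, 8]

candidates = {
    "y = 2x":          lambda t: 2 * t,
    "y = (3/2)x":      lambda t: 1.5 * t,
    "y = 2x - 1":      lambda t: 2 * t - 1,
    "y = (3/2)x - 1":  lambda t: 1.5 * t - 1,
}

for name, f in candidates.items():
    sse = sum((yi - f(xi)) ** 2 for xi, yi in zip(x, y))
    print(f"{name:>15}: SSE = {sse}")   # 15, 12.75, 7, 13.75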
Inference

Prediction 1 (from one sample): ŷ = a₁ + b₁x
Prediction 2 (from another sample): ŷ = a₂ + b₂x

Population: y = α + βx
Sample estimate (basis for inference): ŷ = α̂ + β̂x = a + bx
11.1 The Method of Least Squares
11.1 The Method of Least Squares
 We introduce the ideas of regression analysis in the simple setting where the
distribution of a random variable Y depends on the value x of one other variable.
Where:
x = independent variable, also called predictor variable, or input variable.
y = dependent variable, or response variable.
 Regression of Y on x: The relationship between x and the mean E [Y | x ] of the
corresponding distribution of Y.
 Linear regression curve of Y on x
➢ That is, for any given x, the mean of the distribution of the Y’s is given by α + βx. In
general, Y will differ from this mean, and we shall denote this difference by ε, writing
Y = α + βx + ε
➢ ε is a random variable. In this model we can choose α so that the mean of the
distribution of this random variable is equal to zero.
15
11.1 The Method of Least Squares (Cont)

Example: Metal Alloy Cooling Rate


 An engineer conducts an experiment with the purpose of showing that adding
a new component to the existing metal alloy increases the cooling rate. Faster
cooling rates lead to stronger materials and improve other properties. Let
➢ x = percentage of the new component present in the metal.
➢ y = cooling rate, during a heat-treatment stage, in °F per hour.

 The engineer decides to consider several different percentages of the new


component. Suppose the observed data are

16
11.1 The Method of Least Squares (Cont)

Example: Metal Alloy Cooling Rate


 The first step: plot the data in a scatter plot or
scattergram. The predictor variable x is
placed on the horizontal axis and the
response variable y on the vertical axis.

17
11.1 The Method of Least Squares (Cont)

 The question is how to estimate the parameters 𝛼 and 𝛽 of the regression line from the
observed data in a manner that provides the best fit to the data.
 There are n paired observations (xᵢ, yᵢ) for which it is reasonable to assume
that the regression of Y on x is linear. We want to determine the line (that is,
the equation of the line) which in some sense provides the best fit. If we
predict y by means of the equation
ŷ = a + bx
where a and b are constants, then eᵢ, the error in predicting the value of y
corresponding to the given xᵢ, is
eᵢ = yᵢ − ŷᵢ

18
11.1 The Method of Least Squares (Cont)

 Since we cannot simultaneously minimize each of the eᵢ individually, we might
try to make their sum ∑ᵢ₌₁ⁿ eᵢ as close as possible to zero.
 However, since this sum can be made equal to zero by many choices of totally
unsuitable lines for which the positive and negative errors cancel, we shall
minimize the sum of the squares of the eᵢ. In other words, we apply the
principle of least squares and choose a and b so that

   ∑ᵢ₌₁ⁿ eᵢ² = ∑ᵢ₌₁ⁿ (yᵢ − a − bxᵢ)²

is a minimum.

19
11.1 The Method of Least Squares (Cont)

 That process is equivalent to minimizing


the sum of the squares of the vertical
distances from the points to the line in
any scatter plot (see Figure 11.2).
 The procedure of finding the equation of
the line which best fits a given set of
paired data, called the method of least
squares, yields values for a and b
(estimates of α and β).

20
11.1 The Method of Least Squares (Cont)
 Before minimizing the sum of squared deviations to obtain the least squares
estimators, it is convenient to introduce some notation for the sums of squares and
sums of cross-products:

   Sxx = ∑(xᵢ − x̄)² = ∑xᵢ² − (∑xᵢ)²/n
   Syy = ∑(yᵢ − ȳ)² = ∑yᵢ² − (∑yᵢ)²/n
   Sxy = ∑(xᵢ − x̄)(yᵢ − ȳ) = ∑xᵢyᵢ − (∑xᵢ)(∑yᵢ)/n

 The first expressions (in terms of deviations) are preferred on conceptual grounds,
because they highlight deviations from the mean, and on computing grounds, because
they are less susceptible to roundoff error.
 The second expressions are convenient for handheld calculators.
11.1 The Method of Least Squares (Cont)

 The least squares estimates are

   β̂ = b = Sxy / Sxx
   α̂ = a = ȳ − b·x̄

where x̄ and ȳ are, respectively, the means of the values of x and y.

 The least squares estimates determine the best-fitting line

   ŷ = α̂ + β̂x
11.1 The Method of Least Squares (Cont)

 The individual deviations of the observations yᵢ from their fitted values
ŷᵢ = α̂ + β̂xᵢ are called the residuals.

 The minimum value of the sum of squares, SSE = Syy − (Sxy)²/Sxx, is called the residual sum of
squares or error sum of squares.
23
11.1 The Method of Least Squares (Cont)

EXAMPLE 1 Least squares calculations for the cooling rate data


 Calculate the least squares estimates and sum of squares error for the
cooling rate data below, where:
➢ x = percentage of the new component present in the metal.
➢ y = cooling rate, during a heat-treatment stage, in °F per hour.

24
11.1 The Method of Least Squares (Cont)

 EXAMPLE 1 (Solution)
 The structure of the table guides the calculations.

Continue to the next slide


25
11.1 The Method of Least Squares (Cont)
 EXAMPLE 1 (Solution Cont)

26
11.1 The Method of Least Squares (Cont)

EXAMPLE 2 A numerical example of fitting a straight line by least squares

 The table below gives measurements of the air velocity (cm/s) and the evaporation
coefficient of burning fuel droplets in an impulse engine.
 Fit a straight line to these data by the method of least squares, and use it to
estimate the evaporation coefficient of a droplet when the air velocity is 190 cm/s.

Air velocity (x)   Evaporation coefficient (y)
 20                0.18
 60                0.37
100                0.35
140                0.78
180                0.56
220                0.75
260                1.18
300                1.36
340                1.17
380                1.65
11.1 The Method of Least Squares (Cont)

  x       y       x²        y²        xy
  20     0.18       400    0.0324       3.6
  60     0.37      3600    0.1369      22.2
 100     0.35     10000    0.1225      35.0
 140     0.78     19600    0.6084     109.2
 180     0.56     32400    0.3136     100.8
 220     0.75     48400    0.5625     165.0
 260     1.18     67600    1.3924     306.8
 300     1.36     90000    1.8496     408.0
 340     1.17    115600    1.3689     397.8
 380     1.65    144400    2.7225     627.0
  Σ      2000     8.35    532000    9.1097    2175.4
11.1 The Method of Least Squares (Cont)

EXAMPLE 2 (Solution)
1) For these n = 10 pairs (xᵢ, yᵢ) we first calculate

   ∑x = 2000, ∑y = 8.35, ∑x² = 532,000, ∑xy = 2,175.40, ∑y² = 9.1097

2) and then we obtain

   Sxx = 532,000 − 2000²/10 = 132,000
   Sxy = 2,175.40 − (2000 × 8.35)/10 = 505.40
   Syy = 9.1097 − 8.35²/10 = 2.13745

3) Consequently, the estimate of the slope is

   b = Sxy / Sxx = 505.40 / 132,000 = 0.003829
Continue to the next slide
11.1 The Method of Least Squares (Cont)
EXAMPLE 2 (Solution cont)
4) and then the estimate of the intercept becomes

   a = ȳ − b·x̄ = 0.835 − (0.003829)(200) = 0.069

5) The equation of the straight line that best fits the given data in the sense of least squares is

   ŷ = 0.069 + 0.00383x

6) For x = 190, we predict that the evaporation coefficient will be

   ŷ = 0.069 + 0.00383(190) ≈ 0.80

7) Finally, the residual sum of squares is

   SSE = Syy − (Sxy)²/Sxx = 2.13745 − 505.40²/132,000 = 0.20238
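These calculations can be reproduced with a short Python sketch (added here, not part of the original example); it computes b, a, the prediction at x = 190, and the residual sum of squares directly from the raw data:

# Least squares fit for the Example 2 evaporation data.
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Syy = sum((yi - y_bar) ** 2 for yi in y)

b = Sxy / Sxx                     # slope estimate, about 0.00383
a = y_bar - b * x_bar             # intercept estimate, about 0.069
y_190 = a + b * 190               # predicted evaporation coefficient, about 0.80
sse = Syy - Sxy ** 2 / Sxx        # residual sum of squares, about 0.202

print(f"b = {b:.5f}, a = {a:.3f}, prediction at 190 = {y_190:.3f}, SSE = {sse:.5f}")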
11.1 The Method of Least Squares (Cont)

Normal Equations for the Least Squares Estimators

 A necessary condition that the sum of squared deviations,

   ∑ᵢ₌₁ⁿ (yᵢ − a − bxᵢ)²

be a minimum is the vanishing of the partial derivatives with respect to a and b.
We thus have

   −2 ∑ (yᵢ − a − bxᵢ) = 0
   −2 ∑ xᵢ (yᵢ − a − bxᵢ) = 0
11.1 The Method of Least Squares (Cont)

 and we can rewrite these two equations as

   ∑y  = n·a + b·∑x
   ∑xy = a·∑x + b·∑x²

 This set of two linear equations in the unknowns a and b, called the normal
equations, gives the same values of α̂ and β̂ for the line which provides the
best fit to a given set of paired data in accordance with the criterion of least
squares.
11.1 The Method of Least Squares (Cont)

EXAMPLE 4 The least squares estimates obtained from the normal equations

 Solve the normal equations for the data in Example 2 and confirm the values for the
least squares estimates.

  x       y       x²      xy
  20     0.18
  60     0.37
 100     0.35
 140     0.78
 180     0.56
 220     0.75
 260     1.18
 300     1.36
 340     1.17
 380     1.65
  Σ
11.1 The Method of Least Squares (Cont)

EXAMPLE 4 (Solution)
 Using the calculations in Example 2, the normal equations are

   8.35 = 10a + 2000b
   2175.40 = 2000a + 532,000b

 Solving this system of equations by use of determinants or the method of
elimination, we obtain a = 0.069 and b = 0.00383. As they must be, these
values are the same, up to rounding error, as those obtained in Example 2.
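An added sketch that solves this 2×2 system numerically, assuming NumPy is available:

import numpy as np

# Normal equations for Example 4; the sums come from Example 2.
A = np.array([[10.0,   2000.0],     # [ n,      sum(x)   ]
              [2000.0, 532000.0]])  # [ sum(x), sum(x^2) ]
rhs = np.array([8.35, 2175.40])     # [ sum(y), sum(xy) ]

a, b = np.linalg.solve(A, rhs)
print(f"a = {a:.3f}, b = {b:.5f}")  # roughly a = 0.069, b = 0.00383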
11.2 Inferences Based on the Least Squares Estimators
Inferences

Prediction 1 (from a sample): ŷ = a₁ + b₁x
Prediction 2 (from a sample): ŷ = a₂ + b₂x

Population: y = α + βx
Samples (basis for inferences): ŷ = α̂ + β̂x = a + bx
11.2 Inferences Based on the Least Squares Estimators

 The method of the preceding section is used when the relationship


between x and the mean of Y is linear or close enough to a straight line
so that the least squares line yields reasonably good predictions.
 In this section, we shall assume that the regression is linear in x and,
furthermore, that the n random variables Yi are independently normally
distributed with the means α + βxi and the common variance σ² for i =
1, 2, …, n. Equivalently, we write the model as

   Yᵢ = α + βxᵢ + εᵢ,   i = 1, 2, …, n

where it is assumed that the εi are independent normally distributed
random variables having zero means and the common variance σ².
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

 The various assumptions we have made here
are illustrated in the figure, showing the
distributions of Yi for several values of xi.
 Note that these additional assumptions are
required to discuss the goodness of predictions
based on least squares equations.
 The values for the least squares estimators of α
and β are given by

   β̂ = b = Sxy / Sxx
   α̂ = a = ȳ − b·x̄
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

 Note the close relationship between Sxx and Syy and the respective sample
variances of the x and the y; in fact

   Sxx / (n − 1) = ∑ᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) = sx²
   Syy / (n − 1) = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)² / (n − 1) = sy²

 The estimate of σ² is

   se² = SSE / (n − 2) = [Syy − (Sxy)²/Sxx] / (n − 2) = ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / (n − 2)

 Traditionally, se is referred to as the standard error of the estimate. The estimate se² is
the residual sum of squares, or the error sum of squares, divided by n − 2.
11.2 Inferences Based on the Least Squares Estimators
(Cont.)
 Statistics for inferences about α and β:

   t = [(a − α) / se] · √( n·Sxx / (Sxx + n·x̄²) )

   t = [(b − β) / se] · √Sxx

are random variables having the t distribution with n − 2 degrees of freedom.
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

 To construct confidence intervals for the regression coefficients α and β, we
substitute for the middle term of −tα/2 < t < tα/2 the appropriate t statistic.

 Confidence intervals for the regression coefficients:

   a − tα/2 · se · √(1/n + x̄²/Sxx)  <  α  <  a + tα/2 · se · √(1/n + x̄²/Sxx)

   b − tα/2 · se / √Sxx  <  β  <  b + tα/2 · se / √Sxx
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 5 A confidence interval for the intercept

 With reference to Example 2, construct a 95% confidence interval for the
regression coefficient α. Recall from Example 2:

   Sxx = 532,000 − 2000²/10 = 132,000
   Sxy = 2,175.40 − (2000 × 8.35)/10 = 505.40
   Syy = 9.1097 − 8.35²/10 = 2.13745

   x:  20, 60, 100, 140, 180, 220, 260, 300, 340, 380
   y:  0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 5 (Solution cont)

Recall from Example 2 that n = 10, x̄ = 200, and SSE = Syy − (Sxy)²/Sxx = 0.20238. Then
se² = 0.20238/8 = 0.0253, so se = √0.0253 = 0.1591. The 95% confidence limits are

   a ± t0.025 · se · √(1/n + x̄²/Sxx) = 0.069 ± 2.306 × 0.1591 × √(1/10 + 200²/132,000)
                                     = 0.069 ± 0.233

and, consistent with the computer-based calculation, the 95% confidence interval is

   −0.164 < α < 0.302
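An added Python check of this interval, using SciPy for the t quantile:

import math
from scipy import stats

n, x_bar = 10, 200.0
Sxx, a = 132000.0, 0.069
sse = 0.20238
se = math.sqrt(sse / (n - 2))                 # standard error of the estimate

t = stats.t.ppf(0.975, df=n - 2)              # t_0.025 with 8 degrees of freedom
half_width = t * se * math.sqrt(1 / n + x_bar**2 / Sxx)
print(f"95% CI for alpha: ({a - half_width:.3f}, {a + half_width:.3f})")
# roughly (-0.164, 0.302)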
11.2 Inferences Based on the Least Squares Estimators
(Cont.)
 In connection with tests of hypotheses concerning the regression coefficients
α and β, those concerning β are of special importance because β is the slope
of the regression line. That is, β is the change in the mean of Y corresponding
to a unit increase in x.
 If β = 0, the regression line is
horizontal and the mean of Y does
not depend linearly on x.
 For tests of the null hypothesis β = β₀ we use the test statistic

   t = (b − β₀) · √Sxx / se

which has the t distribution with n − 2 degrees of freedom; the rejection
criteria are like those in the table for tests about a mean, with μ replaced by β.
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 6 A test of hypotheses concerning the slope parameter

 With reference to Example 2, test the null hypothesis β = 0 against the alternative
hypothesis β ≠ 0 at the 0.05 level of significance. Recall:

   Sxx = 532,000 − 2000²/10 = 132,000
   Sxy = 2,175.40 − (2000 × 8.35)/10 = 505.40
   Syy = 9.1097 − 8.35²/10 = 2.13745
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 6 (Solution)
1. Null hypothesis: β = 0 ; Alternative hypothesis: β ≠ 0
2. Level of significance: α = 0.05
3. Criterion: Reject the null hypothesis if t < −2.306 or t > 2.306, where 2.306
is the value of t0.025 for 10 − 2 = 8 degrees of freedom
4. Calculations: Using the quantities obtained in Examples 2 and 5, we get

   t = (b − 0) · √Sxx / se = 0.003829 × √132,000 / 0.1591 = 8.744
Continue to the next slide
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 6. (Solution cont)


5. Decision:
➢ Since t = 8.744 exceeds 2.306, the
null hypothesis must be rejected;
➢ we conclude that there is a
relationship between air velocity and
the average evaporation coefficient.
➢ The evidence for non-zero slope β is
extremely strong with P-value less
than 0.00003.

47
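As an added check (not part of the slides), the t statistic and its two-sided P-value can be computed directly in Python:

import math
from scipy import stats

b, se, Sxx, n = 0.003829, 0.1591, 132000.0, 10

t = (b - 0) * math.sqrt(Sxx) / se             # test statistic for H0: beta = 0
p_value = 2 * stats.t.sf(abs(t), df=n - 2)    # two-sided P-value
print(f"t = {t:.3f}, P-value = {p_value:.6f}")
# t is about 8.74; the P-value is far below 0.05, so H0 is rejected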
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

 If x is held fixed at x0, the quantity we want to estimate is α + βx0 and it would seem
reasonable to use α̂ + β̂x0, where α̂ and β̂ are again the values obtained by the
method of least squares. In fact, it can be shown that this estimator is unbiased and
has the variance

   σ² [ 1/n + (x0 − x̄)² / Sxx ]

 (1 − α)100% confidence limits for α + βx0 are given by

   (a + b·x0) ± tα/2 · se · √( 1/n + (x0 − x̄)² / Sxx )
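A small, generic Python sketch of these limits (added for illustration; the function name and arguments are ours, not from the slides), shown here with the Example 2 quantities:

import math
from scipy import stats

def mean_response_ci(x0, a, b, se, n, x_bar, Sxx, level=0.95):
    """Confidence limits for the mean response alpha + beta*x0."""
    t = stats.t.ppf(0.5 + level / 2, df=n - 2)
    half = t * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / Sxx)
    center = a + b * x0
    return center - half, center + half

# Estimated mean evaporation coefficient at x0 = 190 cm/s, roughly (0.68, 0.91)
print(mean_response_ci(190, a=0.069, b=0.003829, se=0.1591,
                       n=10, x_bar=200, Sxx=132000))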
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 7 Modeling and making inferences concerning the effect of prestressing


sheets of aluminum alloy

 Because of their strength and lightness, sheets of


an aluminum alloy are an attractive option for
structural members in automobiles. Engineers
discovered that prestraining a sheet of one
aluminum alloy may increase its strength. One
aspect of their experiments concerns the effect of
prestrain (%) on the peak load (kN) that
corresponds to the critical buckling load.

49
Continue to the next slide
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 7 Modeling and making inferences concerning the effect of prestressing


sheets of aluminum alloy

(a) Does prestraining increase the strength of


the aluminum alloy?
(b) Obtain a 95% confidence interval for the
mean peak load when the prestrain is set at
9 percent.

50
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

 EXAMPLE 7 (Solution)
(a) The scatter plot in Figure 11.7
suggests fitting a straight line
model. Using computer software,
we obtain

51
11.2 Inferences Based on the Least Squares Estimators
(Cont.)
EXAMPLE 7 (Solution cont)
b) The estimated regression line is

   ŷ = 8.90667 + 0.1146x

➢ With x = 9 percent prestrain, we estimate ŷ = 8.90667 + 0.1146(9) = 9.938 kN.
➢ Next, a further simple calculation gives x̄ = 5.25 and Sxx = ∑ᵢ₌₁¹² (xᵢ − x̄)² = 236.250.
➢ Since t0.025 = 2.228 for n − 2 = 10 degrees of freedom, the half length of the
95% confidence interval is

   t0.025 · se · √( 1/12 + (9 − 5.25)² / 236.250 ) = 0.152 kN
Continue to the next slide
11.2 Inferences Based on the Least Squares Estimators
(Cont.)

EXAMPLE 7 (Solution cont)


b) The 95% confidence interval becomes
( 9.938 − 0.152, 9.938 + 0.152 ) or ( 9.79, 10.09 ) kN

➢ We are 95% confident that mean strength is between 9.79 and 10.09 kN
for all alloy sheets that could undergo a prestrain of 9 percent.

53
11.3 Curvilinear Regression
11.3 Curvilinear Regression

 Polynomial curve fitting is also used to obtain approximations
when the exact functional form of the regression curve is
unknown.

 Given a set of data consisting of n points (xi, yi), we estimate
the coefficients β0, β1, β2, …, βp of the pth-degree polynomial
by minimizing

   ∑ᵢ₌₁ⁿ [ yᵢ − (β0 + β1·xᵢ + β2·xᵢ² + ⋯ + βp·xᵢᵖ) ]²

 In other words, we are now applying the least squares criterion
by minimizing the sum of the squares of the vertical distances
from the points to the curve.
11.3 Curvilinear Regression

 Taking the partial derivatives with respect to β0, β1, β2, …, βp, equating these
partial derivatives to zero, rearranging some of the terms, and letting bi be
the estimate of βi, we obtain the p + 1 normal equations

   ∑y   = n·b0  + b1·∑x  + ⋯ + bp·∑xᵖ
   ∑xy  = b0·∑x + b1·∑x² + ⋯ + bp·∑xᵖ⁺¹
   ⋮
   ∑xᵖy = b0·∑xᵖ + b1·∑xᵖ⁺¹ + ⋯ + bp·∑x²ᵖ

where the subscripts and limits of summation are omitted for simplicity.
Note that this is a system of p + 1 linear equations in the p + 1 unknowns
b0, b1, b2, …, bp. If the x’s include p + 1 distinct values, then the normal
equations will have a unique solution.
11.3 Curvilinear Regression

EXAMPLE 11 Fitting a quadratic function by the method of least squares

The following are data on the drying time of a certain varnish (in hours) and the
amount of an additive (in grams) that is intended to reduce the drying time:

   Amount of additive, x:   0     1     2    3    4    5    6    7    8
   Drying time, y:         12.0  10.5  10.0  8.0  7.0  8.0  7.5  8.5  9.0
(a) Draw a scatter plot to verify that it is
reasonable to assume that the
relationship is parabolic.
(b) Fit a second-degree polynomial by the
method of least squares.
(c) Use the result of part (b) to predict the
drying time of the varnish when 6.5
grams of the additive is being used.
57
11.3 Curvilinear Regression

EXAMPLE 11 (SOLUTION)
a) As can be seen from Figure 11.12, the
overall pattern suggests fitting a second-
degree polynomial having one relative
minimum.
b) Normal equation method of least squares

58
Continue to the next slide
11.3 Curvilinear Regression

Normal equations (method of least squares) for a second-degree polynomial:

   ∑y   = n·b0  + b1·∑x  + b2·∑x²
   ∑xy  = b0·∑x + b1·∑x² + b2·∑x³
   ∑x²y = b0·∑x² + b1·∑x³ + b2·∑x⁴

  x     y     x²    x³    x⁴    xy    x²y
  0    12.0
  1    10.5
  2    10.0
  3     8.0
  4     7.0
  5     8.0
  6     7.5
  7     8.5
  8     9.0
  Σ
Continue to the next slide
11.3 Curvilinear Regression
EXAMPLE 11 (Solution Cont)
b) Alternatively, the summations required for substitution into the normal
equations are

   ∑x = 36    ∑x² = 204    ∑x³ = 1296    ∑x⁴ = 8772
   ∑y = 80.5  ∑xy = 299.0  ∑x²y = 1697.0

Thus we have to solve the following system of three linear equations in the
unknowns b0, b1, and b2:

   80.5   =   9·b0 +   36·b1 +  204·b2
   299.0  =  36·b0 +  204·b1 + 1296·b2
   1697.0 = 204·b0 + 1296·b1 + 8772·b2
Continue to the next slide
60
11.3 Curvilinear Regression

EXAMPLE 11 (Solution Cont)

Solving, we get b0 = 12.2, b1 = −1.85, and b2 = 0.183, so the equation of the
least squares polynomial is

   ŷ = 12.2 − 1.85x + 0.183x²

c) Substituting x = 6.5 into this equation, we get

   ŷ = 12.2 − 1.85(6.5) + 0.183(6.5)² = 7.9

that is, a predicted drying time of 7.9 hours.
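An added Python sketch that solves these normal equations with NumPy and reproduces the prediction at x = 6.5:

import numpy as np

# Normal equations for the second-degree polynomial (Example 11 sums).
A = np.array([[9.0,    36.0,   204.0],
              [36.0,  204.0,  1296.0],
              [204.0, 1296.0, 8772.0]])
rhs = np.array([80.5, 299.0, 1697.0])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.3f}")  # about 12.2, -1.85, 0.183

x = 6.5
print(f"predicted drying time at x = 6.5: {b0 + b1*x + b2*x**2:.1f} hours")  # about 7.9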

61
11.4 Multiple Regression
11.4 Multiple Regression

 Statistical methods of prediction and optimization are often referred to under
the general heading of response surface analysis. One of the methods of
response surface analysis is multiple regression.

 As in the case of one independent variable, we shall first treat the problem
where the regression equation is linear, namely, where for any given set of
values 𝑥1 , 𝑥2 , … 𝑥𝑟 , for the 𝑟 independent variables, the mean of the
distribution of Y is given by
𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑟 𝑥𝑟

63
11.4 Multiple Regression (cont)

 For two independent variables, this is
the problem of fitting a plane to a set
of n points with coordinates
(xi1, xi2, yi), as illustrated in the figure.
 Applying the method of least squares
to obtain estimates of the coefficients
β0, β1, β2, we minimize the sum of the
squares of the vertical distances from
the observations yi to the plane;
symbolically, we minimize

   ∑ᵢ₌₁ⁿ [ yᵢ − (b0 + b1·xi1 + b2·xi2) ]²
11.4 Multiple Regression (cont)

 The resulting normal equations are

   ∑y   = n·b0   + b1·∑x1   + b2·∑x2
   ∑x1y = b0·∑x1 + b1·∑x1²  + b2·∑x1x2
   ∑x2y = b0·∑x2 + b1·∑x1x2 + b2·∑x2²

 As before, we write the least squares estimates of β0, β1, and β2 as b0, b1,
and b2. Note that in the abbreviated notation, ∑x1 stands for ∑ᵢ₌₁ⁿ xi1, ∑x1x2
stands for ∑ᵢ₌₁ⁿ xi1·xi2, ∑x1y stands for ∑ᵢ₌₁ⁿ xi1·yi, and so forth.
11.4 Multiple Regression (cont)

EXAMPLE 12 A multiple regression with two predictor variables


The following are data on the
number of twists required to break
a certain kind of forged alloy bar
and the percentages of two alloying
elements present in the metal:
➢Fit a least squares regression
plane and use its equation to
estimate the number of twists
required to break one of the bars
when x1 = 2.5 and x2 = 12.

66
11.4 Multiple Regression (cont)
 y    x1   x2   x1·y   x2·y   x1²   x1·x2   x2²
41 1 5 41 205 1 5 25
49 2 5 98 245 4 10 25
69 3 5 207 345 9 15 25
65 4 5 260 325 16 20 25
40 1 10 40 400 1 10 100
50 2 10 100 500 4 20 100
58 3 10 174 580 9 30 100
57 4 10 228 570 16 40 100
31 1 15 31 465 1 15 225
36 2 15 72 540 4 30 225
44 3 15 132 660 9 45 225
57 4 15 228 855 16 60 225
19 1 20 19 380 1 20 400
31 2 20 62 620 4 40 400
33 3 20 99 660 9 60 400
43 4 20 172 860 16 80 400
Σ 723 40 200 1963 8210 120 500 3000
11.4 Multiple Regression (cont)

EXAMPLE 12 (Solution cont)

 Substituting the column totals into the normal equations gives

   723  =  16·b0 +  40·b1 +  200·b2
   1963 =  40·b0 + 120·b1 +  500·b2
   8210 = 200·b0 + 500·b1 + 3000·b2

 The unique solution of this system of equations is b0 = 46.4, b1 = 7.78, b2 =
−1.65, and the equation of the estimated regression plane is

   ŷ = 46.4 + 7.78·x1 − 1.65·x2

 Finally, substituting x1 = 2.5 and x2 = 12 into this equation, we get

   ŷ = 46.4 + 7.78(2.5) − 1.65(12) ≈ 46
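The same system can be solved numerically; this added sketch uses NumPy:

import numpy as np

# Normal equations for Example 12, built from the column totals in the table.
A = np.array([[16.0,   40.0,  200.0],
              [40.0,  120.0,  500.0],
              [200.0, 500.0, 3000.0]])
rhs = np.array([723.0, 1963.0, 8210.0])

b0, b1, b2 = np.linalg.solve(A, rhs)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
# about 46.438, 7.775, -1.655 (the slide rounds these to 46.4, 7.78, -1.65)
print(f"prediction at x1 = 2.5, x2 = 12: {b0 + b1*2.5 + b2*12:.1f} twists")  # about 46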
11.4 Multiple Regression (cont)

 Categorical variables can be included in any regression analysis.


When there are only two categories, we create a dummy variable
x1 = 1 if the case corresponds to the second category and 0
otherwise.
 The rapidly increasing number of predictor variables places strong
limits on the number of categorical variables that can be included
in most regression problems. The next example illustrates the
dummy variable technique.

69
11.4 Multiple Regression (cont)

EXAMPLE 13| Multiple regression to


understand problems fixing relay towers
 Wireless providers lose a great deal of
income when relay towers do not
function properly. Breakdowns must be
assessed and fixed in a timely manner.
To gain understanding of the problems
involved, engineers collected the data in the
table (a suitable number of years replaces the
three categories of experience). Table
11.1 already contains the dummy
variable for difficulty.
 Fit a multiple regression of assessment
time to difficulty and experience.
70
11.4 Multiple Regression

 x1    x2     y     x1·y   x2·y    x1²   x1·x2    x2²
  0    1.5    3      0      4.5     0     0        2.25
  0    2      2.3    0      4.6     0     0        4
  0    4.5    1.7    0      7.65    0     0       20.25
  0    8      1.2    0      9.6     0     0       64
  1    1.5    6.7    6.7   10.05    1     1.5      2.25
  1    0.5    7.1    7.1    3.55    1     0.5      0.25
  1    2.5    5.3    5.3   13.25    1     2.5      6.25
  1    3      5      5     15       1     3        9
  1    5      5.6    5.6   28       1     5       25
  1    6      4.5    4.5   27       1     6       36
  0    0      4.5    0      0       0     0        0
  0    0.5    4.7    0      2.35    0     0        0.25
  0    3.5    4      0     14       0     0       12.25
  0    4      4.5    0     18       0     0       16
  0    5      3.1    0     15.5     0     0       25
  0    6      3      0     18       0     0       36
  1    0      7.9    7.9    0       1     0        0
  1    3      6.9    6.9   20.7     1     3        9
  1    5.5    5      5     27.5     1     5.5     30.25
  1    5      5.3    5.3   26.5     1     5       25
  1    3.5    6.9    6.9   24.15    1     3.5     12.25
  Σ   11     70.5   98.2   66.2   289.9  11      35.5   335.25

 Normal equations: …
11.4 Multiple Regression (cont)

EXAMPLE 13 (Solution)
 We use software to produce the statistical analysis.

72
Continue to the next slide
11.4 Multiple Regression (cont)

EXAMPLE 13 (Solution cont)

 All of the parameter estimates are significantly different from 0. The estimated
regression is

   ŷ = 4.474 + 2.719·x1 − 0.3641·x2

 The value β̂1 = 2.719 tells us that if x1 is increased by one unit, while x2 is held
constant, the estimated mean assessment time will increase by 2.719 hours.
 This change in x1 corresponds to changing from a simple to a difficult problem.
Similarly, β̂2 = −0.3641 implies that if x2 is increased by one unit, while x1 is held
constant, the estimated mean assessment time decreases by 0.3641 hours.
11.6 Correlation
11.6 Correlation

 There are problems where the x’s as well as the y’s are values assumed by
random variables.
 This would be the case, for instance, if we studied the relationship between:
➢input and output of a wastewater treatment plant,
➢the tensile strength and the hardness of aluminum,
➢impurities in the air and the incidence of a certain disease.
 Problems like these are referred to as problems of correlation analysis, where
it is assumed that the data points (𝑥𝑖 , 𝑦𝑖 ) for i = 1, 2, . . . , n are values of a
pair of random variables whose joint density is given by 𝑓(𝑥, 𝑦)

75
11.6 Correlation (cont)

 The scatter plot provides a visual impression of the relation between the x
and y values in a bivariate data set. The best interpretation of the sample
correlation coefficient is in terms of the standardized observations

   (xᵢ − x̄)/sx   and   (yᵢ − ȳ)/sy

 where the subscript x on s distinguishes the sample variance of the x
observations,

   sx² = ∑ᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1)

from the sample variance of the y observations.
11.6 Correlation (cont)

 The sample correlation coefficient r is the sum of products of the standardized
variables divided by n − 1, the same divisor used for the sample variance:

   r = [1/(n − 1)] ∑ᵢ₌₁ⁿ [(xᵢ − x̄)/sx] · [(yᵢ − ȳ)/sy]

 Correspondence between the values of r and the pattern of scatter
11.6 Correlation (cont)
 Alternatively, if one component of the pair tends to be large when the other is small,
and vice versa, the correlation coefficient r is negative. This case corresponds to a
northwest to southeast pattern in the scatter plot. It can be shown that the value of r
is always between −1 and 1, inclusive.
1. The magnitude of r describes the strength of a linear relation and its sign
indicates the direction.
➢ r = +1 if all pairs (xᵢ, yᵢ) lie exactly on a straight line having a positive slope.
➢ r > 0 if the pattern in the scatter plot runs from lower left to upper right.
➢ r < 0 if the pattern in the scatter plot runs from upper left to lower right.
➢ r = −1 if all pairs (𝑥𝑖 , 𝑦𝑖 ) lie exactly on a straight line having a negative slope.
➢ A value of r near −1 or +1 describes a strong linear relation.

78
11.6 Correlation (cont)

2. A value of r close to zero implies that the linear association is weak. There may still
be a strong association along a curve.

From the definitions of Sxx, Sxy, and Syy, we obtain a simpler calculation formula for r:

   r = Sxy / √(Sxx × Syy)

79
11.6 Correlation (cont)
EXAMPLE 14 Calculating the sample correlation coefficient
 The data in the table are the numbers of minutes it took 10 mechanics to assemble a
piece of machinery in the morning, 𝑥, and in the late afternoon, 𝑦. Calculate 𝑟.
x: 11.1  10.3  12.0  15.1  13.7  18.5  17.3  14.2  14.8  15.3   (∑x = 142.3)
y: 10.9  14.2  13.8  21.5  13.2  21.1  16.4  19.3  17.4  19.0   (∑y = 166.8)

 The first step is to plot the data to make
sure a linear pattern exists and that there
are no outliers.
11.6 Correlation (cont)

EXAMPLE 14 (Solution)

   x       y       x²       xy       y²
  11.1    10.9    123.2   120.99   118.8
  10.3    14.2    106.1   146.26   201.6
  12.0    13.8    144.0   165.6    190.4
  15.1    21.5    228.0   324.65   462.3
  13.7    13.2    187.7   180.84   174.2
  18.5    21.1    342.3   390.35   445.2
  17.3    16.4    299.3   283.72   269.0
  14.2    19.3    201.6   274.06   372.5
  14.8    17.4    219.0   257.52   302.8
  15.3    19.0    234.1   290.7    361.0
   Σ     142.3   166.8  2085.3   2434.7   2897.8

 Determine the summations needed for the formulas. We get

   Sxx = 2085.3 − 142.3²/10 = 60.381
   Sxy = 2434.7 − (142.3 × 166.8)/10 = 61.126
   Syy = 2897.8 − 166.8²/10 = 115.576

 So,

   r = Sxy / √(Sxx × Syy) = 61.126 / √(60.381 × 115.576) = 0.732
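An added Python check of this correlation coefficient from the raw data (statistics.correlation requires Python 3.10 or later):

import statistics as st

x = [11.1, 10.3, 12.0, 15.1, 13.7, 18.5, 17.3, 14.2, 14.8, 15.3]
y = [10.9, 14.2, 13.8, 21.5, 13.2, 21.1, 16.4, 19.3, 17.4, 19.0]

r = st.correlation(x, y)     # sample correlation coefficient
print(f"r = {r:.3f}")        # about 0.732
print(f"r^2 = {r**2:.3f}")   # about 0.535, the proportion of y variation explained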
11.6 Correlation (cont)

EXAMPLE 14 (Solution cont)


 The positive value for r confirms a
positive association where long
assembly times tend to pair
together and so do short assembly
times. Further, it captures the
orientation of the pattern in figure,
which runs from lower left to upper
right. Since r = 0.732 is moderately
large, the pattern of scatter is
moderately narrow.

82
11.6 Correlation (cont)

Correlation and Regression


 There are two important relationships between r and the least squares fit of
a straight line. First,

   r = Sxy / √(Sxx × Syy) = √(Sxx/Syy) × (Sxy/Sxx) = √(Sxx/Syy) × b

 so the sample correlation coefficient, r, and the least squares estimate of the
slope, b, have the same sign.
 The second relationship concerns the proportion of variability in y explained
by x in a least squares fit. The total variation in y is

   Syy = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²
11.6 Correlation (cont)
Correlation and Regression
 The unexplained part of the variation is the sum of squares of the residuals,

   Syy − (Sxy)²/Sxx

 This leaves the difference

   Syy − [Syy − (Sxy)²/Sxx] = (Sxy)²/Sxx

as the regression sum of squares due to fitting x.
 This decomposes the total variability in y into two components: one due to
regression and the other due to error.
11.6 Correlation (cont)

Correlation and Regression


 The proportion of the y variability explained by the linear relation (the
coefficient of determination) is

   [(Sxy)²/Sxx] / Syy = (Sxy)² / (Sxx × Syy) = r²

 where r is the sample correlation coefficient. To summarize, the strength of
the linear relationship is measured by the proportion of the y variability
explained by the linear relation, the square of the sample correlation
coefficient.
11.6 Correlation (cont)
EXAMPLE 16 Calculating the proportion of y variation attributed to the linear relation
 Refer to Example 14 concerning the data on assembly times. Find the proportion of
variation in y, the afternoon assembly times, that can be explained by a straight-
line fit to x, the morning assembly times. Recall from Example 14 that

   Sxx = 60.381,  Sxy = 61.126,  Syy = 115.576,  and  r = Sxy / √(Sxx × Syy) = 0.732
11.6 Correlation (cont)

EXAMPLE 16 (Solution)
 In the earlier example, we obtained r = 0.732. Consequently, the proportion
of variation in y attributed to x is r² = 0.732² = 0.536.
 This result implies that r² = 53.6% of the variation
among the afternoon times is explained by (is accounted for or may be
attributed to) the corresponding differences among the morning times.
11.6 Correlation (cont)

Inference about the Correlation Coefficient (Normal Populations)


 To develop a population measure of association, or correlation, for two
random variables X and Y, we begin with the two standardized variables

   (X − μX)/σX   and   (Y − μY)/σY

 Each of these two standardized variables is free of its unit of measurement,
so their product is free of both units of measurement. The expected value of
this product, which is the covariance of the standardized variables, is the measure
of association between X and Y called the population correlation coefficient.
This measure of relationship or association is denoted by ρ (rho).
11.6 Correlation (cont)

Inference about the Correlation Coefficient (Normal Populations)


 The population correlation coefficient ρ is positive when both components (𝑋, 𝑌)
are simultaneously large or simultaneously small with high probability. A negative
value for ρ prevails when, with high probability, one member of the pair (𝑋, 𝑌) is
large and the other is small.
 The value of ρ is always between −1 and 1, inclusive.
 The extreme values ±1 arise only when probability is assigned to a straight line.
 When ρ = ±1, we say that there is a perfect linear correlation (relationship, or
association) between the two random variables
 When ρ = 0, we say there is no correlation (relationship, or association) between
the two random variables.

89
11.6 Correlation (cont)

Inference about the Correlation Coefficient (Normal Populations)


 Inferences about ρ are based on the sample correlation coefficient. Whenever r is
based on a random sample from a bivariate normal population, we can perform a
test of significance (a test of the null hypothesis ρ = ρ0) or construct a confidence
interval for ρ on the basis of the Fisher Ƶ transformation:

   Ƶ = (1/2) ln[(1 + r)/(1 − r)]

 This statistic is a value of a random variable having approximately a normal
distribution with

   mean μ_Ƶ = (1/2) ln[(1 + ρ)/(1 − ρ)]   and   variance 1/(n − 3)
11.6 Correlation (cont)

Inference about the Correlation Coefficient (Normal Populations)


 Thus, we can base inferences about ρ on

   Z = (Ƶ − μ_Ƶ) √(n − 3)

which is a random variable having approximately the standard normal distribution.

 In particular, we can test the null hypothesis of no correlation, namely, the
null hypothesis ρ = 0, with the statistic

   Z = Ƶ √(n − 3)
11.6 Correlation (cont)

EXAMPLE 17 Testing for nonzero correlation in a normal population


 With reference to Example 14, where n = 10 and r = 0.732, test the null
hypothesis ρ = 0 against the alternative hypothesis ρ ≠ 0 at the 0.05 level of
significance.

92
11.6 Correlation (cont)
EXAMPLE 17 (Solution)
1. Null hypothesis: ρ = 0 ; Alternative hypothesis: ρ ≠ 0
2. Level of significance: α = 0.05
3. Criterion: Reject the null hypothesis if Z < −1.96 or Z > 1.96, where Z = Ƶ √(n − 3)
4. Calculations: The value of Ƶ corresponding to r = 0.732 is

   Ƶ = (1/2) ln[(1 + r)/(1 − r)] = (1/2) ln[(1 + 0.732)/(1 − 0.732)] = 0.933

so that

   Z = Ƶ √(n − 3) = 0.933 × √(10 − 3) = 2.47

5. Decision: Since Z = 2.47 exceeds 1.96, the null hypothesis must be rejected; we
conclude that there is a relationship between the morning and late afternoon times it
takes a mechanic to assemble the given kind of machinery.
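An added Python sketch of this Fisher-Ƶ test:

import math

r, n = 0.732, 10

z_fisher = 0.5 * math.log((1 + r) / (1 - r))   # Fisher Z transformation, about 0.933
Z = z_fisher * math.sqrt(n - 3)                # test statistic, about 2.47
reject = abs(Z) > 1.96                         # two-sided test at the 0.05 level
print(f"Fisher Z = {z_fisher:.3f}, test statistic = {Z:.2f}, reject H0: {reject}")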
11.6 Correlation (cont)

 To construct a confidence interval for ρ, we first construct a confidence interval
for μ_Ƶ, the mean of the sampling distribution of Ƶ, and convert to r and ρ using
the inverse transformation. To obtain this transformation, we solve

   Ƶ = (1/2) ln[(1 + r)/(1 − r)]

 for r to obtain

   r = (e^Ƶ − e^(−Ƶ)) / (e^Ƶ + e^(−Ƶ))

 Using the theory above, the confidence interval for μ_Ƶ (normal population) is

   Ƶ − zα/2 / √(n − 3)  <  μ_Ƶ  <  Ƶ + zα/2 / √(n − 3)
11.6 Correlation (cont)

EXAMPLE 18 Determining a confidence interval for ρ (normal population)

 If r = 0.70 for the mathematics and physics grades of 30 students, construct a
95% confidence interval for the population correlation coefficient.
Solution:
 The value of Ƶ that corresponds to r = 0.70 is

   Ƶ = (1/2) ln[(1 + r)/(1 − r)] = (1/2) ln[(1 + 0.7)/(1 − 0.7)] = 0.867

 Substituting n = 30 and z0.025 = 1.96 into the preceding confidence interval
formula for μ_Ƶ, we get

   0.867 − 1.96/√(30 − 3)  <  μ_Ƶ  <  0.867 + 1.96/√(30 − 3)
   0.490 < μ_Ƶ < 1.244
11.6 Correlation (cont)

EXAMPLE 18 (Solution cont)

 Then, transforming the confidence limits back to the corresponding values of r:

   r = (e^0.490 − e^(−0.490)) / (e^0.490 + e^(−0.490)) = 0.45

and

   r = (e^1.244 − e^(−1.244)) / (e^1.244 + e^(−1.244)) = 0.85

 We get the 95% confidence interval

   0.45 < ρ < 0.85

 for the true strength of the linear relationship between the grades of students in
the two given subjects.
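A compact Python sketch (added) of this interval, using tanh as the inverse Fisher transformation:

import math

r, n, z_crit = 0.70, 30, 1.96

z_fisher = math.atanh(r)                 # same as 0.5 * ln((1+r)/(1-r)), about 0.867
half = z_crit / math.sqrt(n - 3)
lo, hi = math.tanh(z_fisher - half), math.tanh(z_fisher + half)
print(f"95% CI for rho: ({lo:.2f}, {hi:.2f})")   # about (0.45, 0.85)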
11.7 Multiple Linear Regression (Matrix Notation)
11.7 Multiple Linear Regression (Matrix Notation)

 The model we are using in multiple linear regression
lends itself uniquely to a unified treatment in matrix
notation. This notation makes it possible to state
general results in compact form and to use to great
advantage many of the results of matrix theory.

 It is customary to denote matrices by capital letters in
boldface type and vectors by lowercase boldface type.
To express the normal equations in matrix notation,
we define the three matrices X, y, and b described on
the following slides.
11.7 Multiple Linear Regression (Matrix Notation)
 Example: suppose we have triples (xi1, xi2, yi) as in the table

   xi1:  x11   x21   …   xn1
   xi2:  x12   x22   …   xn2
   yi:   y1    y2    …   yn

 We want to find values (b0, b1, b2) such that (approximately, in the least squares sense)

   b0 + b1·x11 + b2·x12 = y1
   b0 + b1·x21 + b2·x22 = y2
   ⋮
   b0 + b1·xn1 + b2·xn2 = yn

 In matrix notation, X b = y.
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 X is an n × (1 + 2) matrix consisting essentially of
the given values of the x’s, with a column of 1’s
appended to accommodate the constant term.
 y is an n × 1 matrix (or column vector) consisting
of the observed values of the response variable.
 b is the (1 + 2) × 1 matrix (or column vector)
consisting of possible values of the regression
coefficients.
 We get:

   X(n×3) · b(3×1) = y(n×1)
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 Multiplying on the left by X′, the transpose of X:

   X′(3×n) X(n×3) b(3×1) = X′(3×n) y(n×1)
   (X′X)(3×3) b(3×1) = (X′y)(3×1)

 Multiplying on the left by (X′X)⁻¹, the inverse of X′X:

   (X′X)⁻¹(3×3) (X′X)(3×3) b(3×1) = (X′X)⁻¹(3×3) (X′y)(3×1)

 Since (X′X)⁻¹(X′X) = I, where I is the identity matrix, and by definition I·b = b.
 We have assumed here that X′X is nonsingular, so that its inverse exists.
 Finally, the solution b satisfies

   b = (X′X)⁻¹ X′y
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 To verify this relation, we first determine 𝑿’𝑿, 𝑿’𝑿𝒃, and 𝑿’𝒚.

102
11.7 Multiple Linear Regression (Matrix Notation) [cont]

EXAMPLE 20 Fitting a straight line using the matrix formulas


 Use the matrix relations to fit a straight line to the data

Solution
 Here k = 1 and, dropping the subscript 1, we have

103
Continue to the next slide
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 EXAMPLE 20 (Solution cont)


 Consequently,

 and the fitted equation is

 The vector of fitted values is

104
Continue to the next slide
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 EXAMPLE 20 (Solution cont)


 so the vector of residuals

 Finally,

   se² = [1/(n − k − 1)] (y − ŷ)′(y − ŷ) = [1/(5 − 1 − 1)] [(−1)² + 2² + (−1)² + 0² + 0²] = 6/3 = 2
11.7 Multiple Linear Regression (Matrix Notation) [cont]
𝑦 𝑥1 𝑥2 𝑥1 𝑦 𝑥2 𝑦 𝑥12 𝑥1 𝑥2 𝑥22
EXAMPLE 19 Calculating the least 41 1 5 41 205 1 5 25
squares estimates using 49 2 5 98 245 4 10 25
𝑋 ′ 𝑋 −1 𝑋 ′ 𝑦 69 3 5 207 345 9 15 25
65 4 5 260 325 16 20 25
 With reference to the Example 40 1 10 40 400 1 10 100
12, use the matrix expressions to 50 2 10 100 500 4 20 100
determine the least squares 58 3 10 174 580 9 30 100
estimates of the multiple 57 4 10 228 570 16 40 100
regression coefficients. 31 1 15 31 465 1 15 225
36 2 15 72 540 4 30 225
44 3 15 132 660 9 45 225
57 4 15 228 855 16 60 225
19 1 20 19 380 1 20 400
31 2 20 62 620 4 40 400
33 3 20 99 660 9 60 400
43 4 20 172 860 16 80 400
Σ 723 40 200 1963 8210 120 500 3000 106
11.7 Multiple Linear Regression (Matrix Notation) [cont]
EXAMPLE 19 (Solution)
 Substituting ∑x1 = 40, ∑x2 = 200, ∑x1² = 120, ∑x1x2 = 500, ∑x2² = 3000,
and n = 16 into the expression for X′X above, we get

           | 16    40    200 |
   X′X  =  | 40   120    500 |
           | 200  500   3000 |

 Then the inverse of this matrix can be obtained by any one of a number of
different techniques; using the one based on cofactors, we find that

                        |  110000   −20000   −4000 |
   (X′X)⁻¹ = (1/160000) |  −20000     8000       0 |
                        |   −4000        0     320 |

 where 160,000 is the value of |X′X|, the determinant of X′X.
Continue to the next slide
11.7 Multiple Linear Regression (Matrix Notation) [cont]

EXAMPLE 19 (Solution cont)

 Substituting ∑y = 723, ∑x1y = 1963, and ∑x2y = 8210 into the expression for X′y,
we get

   X′y = ( 723, 1963, 8210 )′

 Finally,

   b = (X′X)⁻¹ X′y = ( 46.4375, 7.775, −1.655 )′

in agreement, up to rounding, with the values b0 = 46.4, b1 = 7.78, b2 = −1.65 found in Example 12.
Continue to the next slide
11.7 Multiple Linear Regression (Matrix Notation) [cont]

 The residual sum of squares also has a convenient matrix expression. The predicted
values ŷᵢ = β̂0 + β̂1·xi1 + β̂2·xi2 can be collected as a column vector

   ŷ = X β̂

 Then the residual sum of squares is

   ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = (y − ŷ)′(y − ŷ) = (y − Xβ̂)′(y − Xβ̂)

 Consequently, the estimate se² of σ² can be expressed as

   se² = [1/(n − 3)] (y − Xβ̂)′(y − Xβ̂)
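The matrix expressions can be evaluated directly from the raw data of Example 12; this added NumPy sketch builds the design matrix and computes β̂ = (X′X)⁻¹X′y together with se²:

import numpy as np

# Example 12 / 19 data: y = twists to break; x1, x2 = alloying element percentages.
y = np.array([41, 49, 69, 65, 40, 50, 58, 57, 31, 36, 44, 57, 19, 31, 33, 43], float)
x1 = np.tile([1, 2, 3, 4], 4).astype(float)
x2 = np.repeat([5, 10, 15, 20], 4).astype(float)

X = np.column_stack([np.ones_like(y), x1, x2])   # design matrix with a column of 1's

b = np.linalg.inv(X.T @ X) @ (X.T @ y)           # b = (X'X)^(-1) X'y
resid = y - X @ b
se2 = resid @ resid / (len(y) - 3)               # estimate of sigma^2, df = n - 3
print(b)      # about [46.4, 7.78, -1.65]
print(se2)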
11.7 Multiple Linear Regression (Matrix Notation) [cont]
EXAMPLE 19 (Solution cont)
 The same matrix expressions for b and the residual sum of squares hold for
any number of predictor variables. If the mean of Y has the form β0 + β1x1 +
β2x2 + ⋯ + βkxk, we define the matrices X, y, and b accordingly (X now has k + 1 columns).

 Then

   β̂ = (X′X)⁻¹ X′y   and   se² = [1/(n − k − 1)] (y − Xβ̂)′(y − Xβ̂)

 Generally, the sum of squares error, SSE, has degrees of freedom

   dof = n − (number of β’s in the model), that is, ν = n − (k + 1)
END
