
So, for the example we are working through, we can find 95% confidence intervals for a and b in the
following manner.
For the 95% confidence interval,    k := 1 − (1 − 0.95)/2 = 0.975

qt(k, n − 2) = 2.262

The function qt(α, β) in MATHCAD returns the critical value of the Student's t distribution at the
α probability level for β degrees of freedom.

a_uncertainty := σ·qt(k, n − 2)·√(Axx/Sxx)        b_uncertainty := σ·qt(k, n − 2)/√(Sxx)

a_uncertainty = 2.092                             b_uncertainty = 1.364

We can thus report the plus/minus uncertainties of the parameters at the 95% confidence level; i.e.,

4.322 < a < 8.505        0.445 < b < 3.174
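
For readers working outside MATHCAD, a minimal Python sketch of the same parameter-uncertainty
calculation is given below. It is not part of the handout: the x and y arrays are placeholders standing
in for the example data defined earlier in the worksheet, and scipy.stats.t.ppf plays the role of qt.

    # Minimal sketch of the parameter confidence intervals (placeholder data).
    import numpy as np
    from scipy import stats

    # Placeholder data -- the handout's actual example values are defined earlier
    # in the worksheet and are not reproduced here.
    x = np.array([1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0])
    y = np.array([6.9, 7.3, 7.8, 8.1, 8.6, 9.0, 9.4, 9.9, 10.2, 10.7, 11.1])
    n = len(x)

    Ax  = x.mean()                   # mean of x
    Axx = np.mean(x**2)              # mean of x squared
    Sxx = np.sum((x - Ax)**2)        # sum of squared deviations of x

    # Least-squares fit of y = a + b*x
    b = np.sum((x - Ax) * (y - y.mean())) / Sxx
    a = y.mean() - b * Ax

    # Standard error of the fit, with n - 2 degrees of freedom
    resid = y - (a + b * x)
    sigma = np.sqrt(np.sum(resid**2) / (n - 2))

    # Critical t value for a two-sided 95% interval: qt(0.975, n - 2) in MATHCAD
    k = 1 - (1 - 0.95) / 2
    t_crit = stats.t.ppf(k, n - 2)

    a_uncertainty = sigma * t_crit * np.sqrt(Axx / Sxx)
    b_uncertainty = sigma * t_crit / np.sqrt(Sxx)

    print(f"a = {a:.3f} +/- {a_uncertainty:.3f}")
    print(f"b = {b:.3f} +/- {b_uncertainty:.3f}")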

Check 4: Confidence Level for the Calculated Values


Similarly, we can estimate the accuracy of predicted values obtained from our equation. This is a
confidence interval on the value predicted at any x. It is again computed with the Student's t statistic
in the following manner.

y_uncertainty = t_{k, n−2} · σ · √( 1 + 1/n + (x − Ax)²/Sxx )

where the t statistic is the same as above. This equation can be applied at each desired value of x and the
plus and minus values generated to give an uncertainty band at the desired confidence level. In the example
we are using as an illustration, we would generate the 95% confidence band for predicted values in the
following manner:

y_uncertainty_i := qt(k, n − 2) · σ · √( 1 + 1/n + (x_i − Ax)²/Sxx )

ylo_i := a + b·x_i − y_uncertainty_i        yhi_i := a + b·x_i + y_uncertainty_i

(This gives the high and low values of the uncertainty band at each desired x.)

[Plot: the data y, the fitted line, and the yhi/ylo curves versus x (x from 1 to 2). The 95% confidence
interval is shown for the sample problem. Notice how the uncertainty band expands at the two ends.]

ChEn 475
Statistical Analysis of Regression
Lesson 3. Analysis of Multiple Linear Regression

Most of what we have learned from simple linear regression also applies to multiple linear regression where we
have more than one random independent variable. Thus,

y = b0 + b1·x1 + b2·x2 + ....
This of course is the situation that we had in the introductory illustration after we had linearized the surface
tension equation; i.e.,

ln(y) = ln(C1) + C2·ln(1 − x) + C3·x·ln(1 − x) + C4·x²·ln(1 − x) + C5·x³·ln(1 − x)
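Written in this form, the multiple-linear structure is explicit. Using substitution names of our own
(introduced only for illustration), let y′ = ln(y), u1 = ln(1 − x), u2 = x·ln(1 − x), u3 = x²·ln(1 − x),
and u4 = x³·ln(1 − x); then the model reads

    y′ = b0 + b1·u1 + b2·u2 + b3·u3 + b4·u4,   with b0 = ln(C1), b1 = C2, b2 = C3, b3 = C4, b4 = C5.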

To illustrate the statistical analysis applicable to multiple linear regression, let us use the following data:

n := 13        i := 1 .. n

data :=   (13 × 4 matrix; columns 1–4 hold Y, x1, x2, x3)

      25.5     1.74     5.30    10.80
      31.2     6.32     5.42     9.40
      25.9     6.22     8.41     7.20
      38.4    10.52     4.63     8.50
      18.4     1.19    11.60     9.40
      26.7     1.22     5.85     9.90
      26.4     4.10     6.62     8.00
      25.9     6.32     8.72     9.10
      32.0     4.08     4.42     8.70
      25.2     4.15     7.60     9.20
      39.7    10.15     4.83     9.40
      35.7     1.72     3.12     7.60
      26.5     1.70     5.30     8.20

Y := data⟨1⟩        X := submatrix(data, 1, 13, 2, 4)

x1 := data⟨2⟩       x2 := data⟨3⟩       x3 := data⟨4⟩

We can quickly arrive at the values of the coefficients in MATHCAD using the REGRESS function:

z := regress(X, Y, 1)        coeffs := submatrix(z, 4, length(z), 1, 1)

coeffs = (1.016  −1.862  −0.343  39.157)ᵀ

Yc_i := coeffs_4 + coeffs_1·x1_i + coeffs_2·x2_i + coeffs_3·x3_i

Note that in MATHCAD, the last element of the coeffs vector is the constant term in the regression.
However, this is ONLY TRUE for more than 2 coefficients.
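
As a cross-check outside MATHCAD, the same multiple linear fit can be sketched with NumPy; this is our
sketch, not part of the worksheet, with np.linalg.lstsq standing in for regress:

    # Ordinary least-squares fit of Y = b0 + b1*x1 + b2*x2 + b3*x3 to the data above.
    import numpy as np

    data = np.array([
        [25.5,  1.74,  5.30, 10.80],
        [31.2,  6.32,  5.42,  9.40],
        [25.9,  6.22,  8.41,  7.20],
        [38.4, 10.52,  4.63,  8.50],
        [18.4,  1.19, 11.60,  9.40],
        [26.7,  1.22,  5.85,  9.90],
        [26.4,  4.10,  6.62,  8.00],
        [25.9,  6.32,  8.72,  9.10],
        [32.0,  4.08,  4.42,  8.70],
        [25.2,  4.15,  7.60,  9.20],
        [39.7, 10.15,  4.83,  9.40],
        [35.7,  1.72,  3.12,  7.60],
        [26.5,  1.70,  5.30,  8.20],
    ])
    Y = data[:, 0]
    X = data[:, 1:4]

    # Design matrix with a leading column of ones for the constant term b0
    A = np.column_stack([np.ones(len(Y)), X])

    # coeffs_ls = [b0, b1, b2, b3]; this should reproduce the regress() result above,
    # i.e. roughly [39.157, 1.016, -1.862, -0.343] (constant listed first here)
    coeffs_ls, *_ = np.linalg.lstsq(A, Y, rcond=None)
    print(coeffs_ls)

    Yc = A @ coeffs_ls    # fitted values, analogous to Yc in the worksheet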