Regression Analysis 1-1-11

Regression Analysis
Least-Squares Linear Regression

Least-squares regression fits a linear or exponential function to data. The goal of regression analysis is to develop a statistical model that predicts the value of a dependent (response) variable from the values of one or more independent variables.
Linear fits are the most common. To fit an exponential function, the data must first be transformed.
Method of Least Squares
If we have N pairs of data (x_i, y_i), we seek to fit a straight line through the data of the form

$$y = a_0 + a_1 x$$

Determine the constants a_0 and a_1 such that the distance between the actual y data and the fitted (predicted) line is minimized:

$$a_0 = \frac{\sum x_i \sum x_i y_i - \sum x_i^2 \sum y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}$$

$$a_1 = \frac{\sum x_i \sum y_i - N \sum x_i y_i}{\left(\sum x_i\right)^2 - N \sum x_i^2}$$

Each x_i is assumed to be error free. All the error is assumed to be in the y values.
Manual Calculation Method

Raw data:

         y_i     x_i     x_i*y_i    x_i^2
         1.2     1.0      1.20       1.00
         2.0     1.6      3.20       2.56
         2.4     3.4      8.16      11.56
         3.5     4.0     14.00      16.00
         3.5     5.2     18.20      27.04
Sum     12.6    15.2     44.76      58.16

Seeking an equation of the form y = a_0 + a_1 x, with N = 5:

$$a_0 = \frac{(15.2)(44.76) - (58.16)(12.6)}{(15.2)^2 - (5)(58.16)} = 0.878$$

$$a_1 = \frac{(15.2)(12.6) - (5)(44.76)}{(15.2)^2 - (5)(58.16)} = 0.540$$

Result: y = 0.878 + 0.540x
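The manual calculation can be checked with a few lines of code. The following is a minimal sketch that applies the summation formulas for a0 and a1 to the raw data of this slide; the function and variable names are illustrative and not part of the original material.

```python
# Raw data from the manual calculation table
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def least_squares_line(x, y):
    """Return (a0, a1) for y = a0 + a1*x using the summation formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = sum_x ** 2 - n * sum_x2  # shared denominator of both formulas
    a0 = (sum_x * sum_xy - sum_x2 * sum_y) / denom
    a1 = (sum_x * sum_y - n * sum_xy) / denom
    return a0, a1


a0, a1 = least_squares_line(x, y)
print(f"y = {a0:.3f} + {a1:.3f} x")
```

Both formulas share the same denominator, so it is computed once. The printed coefficients can be compared with the hand-calculated values.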

How good is the fit?

The coefficient of determination (R^2) measures the goodness of fit: the proportion of the variation in the y values that is associated with the variation in the x variable in the regression. It is the ratio of the explained variation to the total variation.

R^2 = 1: perfect fit (good prediction).
R^2 = 0: no correlation between x and y.
For engineering data, R^2 will normally be quite high (0.8 to 0.9 or higher).
A low value might indicate that some important variable was not considered but is affecting the results.

$$R^2 = 1 - \frac{\sum (a_0 + a_1 x_i - y_i)^2}{\sum (y_i - \bar{y})^2}$$

R^2 = Excel function RSQ(y_i's, x_i's)
where \bar{y} is the average of the y_i's.
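As an illustration of the R^2 formula, the sketch below evaluates it for the five-point data set used in the manual calculation. The helper names are hypothetical, and the line is fitted with the mean-centered form of the least-squares equations, which is algebraically equivalent to the summation formulas.

```python
# Five-point data set from the manual calculation
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def fit_line(x, y):
    """Least-squares (a0, a1) for y = a0 + a1*x, mean-centered form."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = sxy / sxx
    return y_mean - a1 * x_mean, a1


def r_squared(x, y):
    """R^2 = 1 - (unexplained variation) / (total variation)."""
    a0, a1 = fit_line(x, y)
    y_mean = sum(y) / len(y)
    ss_resid = sum((a0 + a1 * xi - yi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_resid / ss_total


r2 = r_squared(x, y)
print(f"R^2 = {r2:.3f}")
```

Data lying exactly on a straight line, such as x = [0, 1, 2] and y = [1, 3, 5], gives R^2 = 1 with the same function.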

Standard Error of Estimate SEE
The standard error of estimate (SEE or Syx) is a statistical measure of how well the best-fit line represents the data. It is, effectively, the standard deviation of the differences between the data points and the best-fit line.
It provides an estimate of the scatter (random error) in the data about the fitted line, analogous to the standard deviation for sample data. It has the same units as y. Two degrees of freedom are lost in calculating the coefficients a0 and a1.

$$s_{ey} = SEE = S_{yx} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{N - 2}}$$

SEE = Excel function STEYX(y_i's, x_i's)
where y_i = actual value of y for a given x_i, and \hat{y}_i = predicted value of y for a given x_i.
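A short sketch of the SEE calculation follows, again using the five-point data set from the manual calculation. The function names are illustrative only; the N - 2 divisor reflects the two degrees of freedom lost to the fitted coefficients.

```python
import math

# Five-point data set from the manual calculation
x = [1.0, 1.6, 3.4, 4.0, 5.2]
y = [1.2, 2.0, 2.4, 3.5, 3.5]


def fit_line(x, y):
    """Least-squares (a0, a1) for y = a0 + a1*x, mean-centered form."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = sxy / sxx
    return y_mean - a1 * x_mean, a1


def standard_error_of_estimate(x, y):
    """SEE = sqrt( sum (y_i - y_hat_i)^2 / (N - 2) )."""
    a0, a1 = fit_line(x, y)
    ss_resid = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ss_resid / (len(x) - 2))


see = standard_error_of_estimate(x, y)
print(f"SEE = {see:.3f} (same units as y)")
```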

Linear Regression Assumptions

Variation in the data is assumed to be normally distributed and due to random causes.
Random variation is assumed to exist in the y values, while the x values are error free.
Since error has been minimized in the y direction, an erroneous conclusion may be made if x is estimated from a value of y.
For power-law or exponential relationships, the data need to be transformed before carrying out linear regression analysis.
(As we will discuss later, the method of least squares can also be applied to nonlinear functional relationships.)
Linear Regression Example
Use Excel Chart > Add Trendline to obtain the coefficients, and the functions RSQ() and STEYX() to determine R^2 and SEE.
[Figure: Output (volts) versus Length (cm), both axes from 0.00 to 3.00, with linear trendline y = 0.9977x + 0.0295, R^2 = 0.9993]
Regression Analysis using Excel Analysis Tools
Linear regression is a standard feature of statistical programs and most spreadsheet programs. It is only necessary to input the x and y data. The remaining calculations are performed immediately.
Excel Regression Analysis macro

Performs linear regression only.
Non-linear relationships must be transformed.
Calculates the slope, intercept, SEE, and the upper and lower confidence intervals for the slope and intercept.
Does not produce any graphical output on the user's plot.
Does not update automatically.
The user must interpret the results.
Linear Regression in Excel 2008

Y = m1*X + b

Torque, N-m (Y)   RPM (X)   Y Predicted    Residual        Residual/SEE = Residual/sey
4.89              100       4.998433207     0.108433207     0.17558474
4.77              201       4.559896053    -0.210103947    -0.340219088
3.79              298       4.138726707     0.348726707     0.564689451
3.76              402       3.687163697    -0.072836303    -0.117943051
2.84              500       3.261652399     0.421652399     0.682777249
4.12              601       2.823115245    -1.296884755    -2.100031702   (Outlier)
2.05              699       2.397603947     0.347603947     0.562871377
1.61              799       1.963408745     0.353408745     0.572271025

=LINEST(A2:A9,B2:B9,TRUE,TRUE) returns:

m1     -0.004341952     b      5.432628409
se1     0.000954031     seb    0.481645161
r^2     0.775391233     sey    0.617554846
F       20.71311576     df     6
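For readers working outside Excel, the statistics that LINEST reports for a single independent variable can be reproduced with the standard simple-regression formulas. The sketch below uses the eight torque and RPM points of this slide. It is not Excel's implementation; the function name and dictionary keys are illustrative.

```python
import math

# Torque (Y) and RPM (X) data from the table above
y = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]
x = [100, 201, 298, 402, 500, 601, 699, 799]


def linest_stats(x, y):
    """Slope, intercept, and fit statistics for Y = m1*X + b."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m1 = sxy / sxx
    b = y_mean - m1 * x_mean
    df = n - 2  # two degrees of freedom lost to m1 and b
    ss_resid = sum((yi - (m1 * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    ss_reg = ss_total - ss_resid
    sey = math.sqrt(ss_resid / df)  # standard error of estimate
    se1 = sey / math.sqrt(sxx)  # standard error of the slope
    seb = sey * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return {
        "m1": m1, "b": b, "se1": se1, "seb": seb,
        "r2": ss_reg / ss_total, "sey": sey,
        "F": ss_reg / (ss_resid / df), "df": df,
    }


stats = linest_stats(x, y)
for name, value in stats.items():
    print(f"{name:>4} = {value:.6g}")
```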
Linear Regression Example: Omit Outlier

Torque, N-m (Y)   RPM (X)   Y Predicted    Residual        Residual/SEE = Residual/sey
4.89              100       5.000219168     0.110219168     0.504559919
4.77              201       4.504157858    -0.265842142    -1.21696881
3.79              298       4.02774254      0.23774254      1.088334807
3.76              402       3.516946736    -0.243053264    -1.112646171
2.84              500       3.03561992      0.19561992      0.895506407
2.05              699       2.058231795     0.008231795     0.037683406
1.61              799       1.567081983    -0.042918017    -0.196469559

LINEST output:

m1       -0.004911498     b          5.49136898
se1       0.000348477     seb        0.170606738
r^2       0.975447633     sey        0.218446143
F         198.6463557     df         5
ssreg     9.479149271     ssresid    0.238593586
Uncertainties on Regression
SEE = sey = 0.218446143
TINV(α = 0.05, ν = 5) = 2.570581835
Confidence interval for regression line: 95% C.I. = TINV(α = 0.05, ν = 5) * SEE / SQRT(7) = 0.212239784
Prediction band for regression line: 95% P.I. = TINV(α = 0.05, ν = 5) * SEE = 0.561533687
Uncertainty in slope: Δm1 = TINV(0.05, 5) * se1 = 0.000895789
Uncertainty in intercept: Δb = TINV(0.05, 5) * seb = 0.438558582
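The four uncertainties on this slide are simple products, as the following sketch shows. The input values are the ones quoted for the seven-point torque fit; the t-value is entered as a constant because the Python standard library has no equivalent of TINV. The variable names are illustrative.

```python
import math

# Values quoted in the slides for the seven-point torque fit
SEY = 0.218446143  # standard error of estimate, SEE
SE1 = 0.000348477  # standard error of the slope
SEB = 0.170606738  # standard error of the intercept
N = 7  # number of data points
T_VALUE = 2.570581835  # TINV(0.05, 5), two-sided, nu = N - 2

ci_95 = T_VALUE * SEY / math.sqrt(N)  # confidence interval on the line
pi_95 = T_VALUE * SEY  # prediction band
delta_m1 = T_VALUE * SE1  # uncertainty in slope
delta_b = T_VALUE * SEB  # uncertainty in intercept

print(f"95% C.I. = {ci_95:.6f}")
print(f"95% P.I. = {pi_95:.6f}")
print(f"slope uncertainty = {delta_m1:.9f}")
print(f"intercept uncertainty = {delta_b:.6f}")
```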
Regression Line Confidence Intervals & Prediction Band
Not only do you want to obtain a curve-fit relationship, you also want to establish a confidence interval for the equation, that is, a measure of the random uncertainty in the curve fit.
Use ν = N - 2 in determining the t-value. Two degrees of freedom are lost because m1 and b are determined from the data.

$$CI = \Delta y \approx t_{\alpha,\nu} \frac{S_{yx}}{\sqrt{N}} = t_{\alpha,\nu} \frac{S_{ey}}{\sqrt{N}} = t_{\alpha,\nu} \frac{SEE}{\sqrt{N}}$$

$$PB \approx t_{\alpha,\nu}\, SEE = t_{\alpha,\nu}\, S_{yx} = t_{\alpha,\nu}\, S_{ey}$$

where t_{α,ν} = TINV(α, ν) (two-sided t-table) and α = 1 - P.

[Figure: Torque (N-m) versus RPM (0 to 1000), showing the data, the least-squares fit, the -95% and +95% confidence interval, and the -95% and +95% prediction band]
Regression Line Confidence Interval & Prediction Band
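$$CI_{\text{curve fit}} = t_{\alpha/2,\,n-2}\; s_{ey} \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \approx t_{\alpha/2,\,n-2}\, \frac{s_{ey}}{\sqrt{n}}$$

$$\Delta y_{\text{prediction band}} = t_{\alpha/2,\,n-2}\; s_{ey} \sqrt{\frac{n+1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \approx t_{\alpha/2,\,n-2}\; s_{ey}$$

In each equation the first expression is more accurate: the band is at its minimum at the mean of x and flares out at the low and high extremes. The second expression is approximate.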
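To see how the more accurate expressions behave, the sketch below evaluates both half-widths at a chosen x* for the seven-point torque data (outlier omitted). The t-value is the TINV(0.05, 5) figure quoted in the slides, and the function names are hypothetical.

```python
import math

# Seven-point torque data (outlier omitted)
x = [100, 201, 298, 402, 500, 699, 799]
y = [4.89, 4.77, 3.79, 3.76, 2.84, 2.05, 1.61]
T_VALUE = 2.570581835  # TINV(0.05, 5) as quoted in the slides

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
m1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
b = y_mean - m1 * x_mean
sey = math.sqrt(sum((yi - (m1 * xi + b)) ** 2 for xi, yi in zip(x, y)) / (n - 2))


def ci_half_width(x_star):
    """Confidence interval of the fitted line at x_star."""
    return T_VALUE * sey * math.sqrt(1 / n + (x_star - x_mean) ** 2 / sxx)


def pb_half_width(x_star):
    """Prediction band for a new measurement at x_star."""
    return T_VALUE * sey * math.sqrt((n + 1) / n + (x_star - x_mean) ** 2 / sxx)


for x_star in (100, x_mean, 799):
    print(f"x* = {x_star:7.1f}  CI = {ci_half_width(x_star):.3f}  "
          f"PB = {pb_half_width(x_star):.3f}")
```

At x* equal to the mean of x, the confidence interval reduces to the approximate form t * sey / sqrt(n); at the ends of the data range both bands are wider.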
Summations Used in Statistics & Regression

Variable and expression:

Sample standard deviation:
$$S_x = \left[ \frac{1}{N-1} \sum (x_i - \bar{x})^2 \right]^{1/2}$$

Expressions used in regression analysis:

Sum of squares for evaluating CI & PI:
$$S_{xx} = \sum (x_i - \bar{x})^2$$

Standard error of estimate:
$$s_{ey} = SEE = S_{yx} = \left[ \frac{\sum \left( y_i - y_{\text{predicted at } x = x_i} \right)^2}{N - 2} \right]^{1/2}$$
CI in slope and intercept

Slope, m:
$$CI_{\text{slope}} = t_{\alpha/2,\,\nu} \cdot se_1$$

Intercept, b:
$$CI_{\text{intercept}} = t_{\alpha/2,\,\nu} \cdot se_b$$

Note 1: ν = n - 2.
Note 2: m and b are not independent variables. Therefore, do not apply RSS (root-sum-square) to y = mx + b to determine Δy. Instead, use the CI for the curve fit.
Outliers in x-y Data Sets
The method involves computing the ratio of the residuals (predicted - actual) to the standard error of estimate (sey = SEE).
1. Compute the residuals, y_predicted - y_actual, at each x_i.
2. Plot the ratio residual/SEE for each x_i. These are the standardized residuals.
3. Standardized residuals exceeding ±2 may be considered outliers. Assuming the residuals are normally distributed, you can expect 95% of them to fall in the range ±2 (that is, within 2 standard deviations of the best-fit line).
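The three steps can be sketched in code using the eight-point torque data from the Excel example. This is an illustration only; the function name and the default threshold argument are assumptions, with the ±2 limit taken from step 3.

```python
import math

# Eight-point torque (Y) and RPM (X) data from the Excel example
x = [100, 201, 298, 402, 500, 601, 699, 799]
y = [4.89, 4.77, 3.79, 3.76, 2.84, 4.12, 2.05, 1.61]


def standardized_residuals(x, y):
    """Return residual/SEE for each point, with residual = predicted - actual."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    m1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    b = y_mean - m1 * x_mean
    residuals = [(m1 * xi + b) - yi for xi, yi in zip(x, y)]
    see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [r / see for r in residuals]


def find_outliers(x, y, limit=2.0):
    """Return (x_i, ratio) pairs whose standardized residual exceeds +/- limit."""
    ratios = standardized_residuals(x, y)
    return [(xi, ratio) for xi, ratio in zip(x, ratios) if abs(ratio) > limit]


outliers = find_outliers(x, y)
for xi, ratio in outliers:
    print(f"possible outlier at X = {xi}: residual/SEE = {ratio:.3f}")
```

A flagged point is only a candidate for removal; whether to omit it remains an engineering judgment.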
Linear Regression with Data Transformation
Data Transformation
Test data commonly do not show an approximately linear relationship between the dependent (Y) and independent (X) variables, so a direct linear regression is not useful.
The form of the relationship expected between the dependent and independent variables is often known. In that case, the data need to be transformed before performing a linear regression. The transformation can often be accomplished by taking the common (base-10) or natural logarithm of one or both sides of the equation.
Common Transformations
Relationship    Plot method                       Linearized equation              Transformed intercept, b    Transformed slope, m1
y = αx^β        Log y vs. Log x (log-log plot)    Log(y) = Log(α) + β Log(x)       Log(α)                      β
y = αx^β        Ln y vs. Ln x (log-log paper)     Ln(y) = Ln(α) + β Ln(x)          Ln(α)                       β
y = αe^(βx)     Log y vs. x (semi-log plot)       Log(y) = Log(α) + β Log(e) x     Log(α)                      β Log(e)
y = αe^(βx)     Ln y vs. x (semi-log plot)        Ln(y) = Ln(α) + βx               Ln(α)                       β
Regression with Transformation
Example
A velocity probe provides a voltage output, E, that is related to velocity, U, by the form E = a + b*U^m, where a, b, and m are constants.

U (ft/s)    E_i (V)
0           3.19
10          3.99
20          4.30
30          4.48
40          4.65

[Figure: Output voltage (VDC) versus velocity (ft/s), plotted on linear axes and on log-log axes]
Data Relationship Transformation

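E = a + b*U^m (E = a = 3.19 at U = 0)
Log(E - 3.19) = Log(b*U^m)
Log(E - 3.19) = Log(b) + Log(U^m) = Log(b) + m*Log(U)
This has the linear form Y = B + m1*X, with Y = Log(E - 3.19), X = Log(U), B = Log(b), and m1 = m.

Let's transform the data:

U (ft/s)    E_i (V)    X = Log(U)    Y = Log(E - 3.19)
0           3.19       -             -
10          3.99       1.00          -0.097
20          4.30       1.30           0.045
30          4.48       1.48           0.111
40          4.65       1.60           0.164

Perform the regression on the transformed data.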

Solution (Excel 2004 Output)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.998723855
R Square             0.997449339
Adjusted R Square    0.996174009
Standard Error       0.01
Observations         4

ANOVA
              df    SS             MS          F           Significance F
Regression    1     0.038118269    0.038118    782.1106    0.00127614
Residual      2     9.74754E-05    4.87E-05
Total         3     0.038215745

                Coefficients    Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept       -0.525          0.021056315       -24.9274    0.001605    -0.61547736     -0.4342812
X Variable 1     0.432          0.015438034       27.96624    0.001276     0.36531922      0.49816831

Annotations on the output (labels t, t*SEE, t-value): TINV(0.05, 2) = 4.3026; SEE = 0.0070; the values 3.18 and 0.02 also appear.
Y=-0.525+0.432X
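The transformed regression can be reproduced in a few lines. The sketch below takes the four nonzero-velocity points, forms X = Log(U) and Y = Log(E - 3.19), and fits a straight line. The offset 3.19 is the zero-velocity voltage from the data table, and the variable names are illustrative.

```python
import math

# Probe data for U > 0 (the U = 0 point fixes the offset at 3.19 V)
u = [10.0, 20.0, 30.0, 40.0]
e = [3.99, 4.30, 4.48, 4.65]
E_ZERO = 3.19  # voltage at U = 0

# Transform: X = Log(U), Y = Log(E - 3.19), base-10 logarithms
big_x = [math.log10(ui) for ui in u]
big_y = [math.log10(ei - E_ZERO) for ei in e]

# Ordinary least-squares line through the transformed data
n = len(big_x)
x_mean = sum(big_x) / n
y_mean = sum(big_y) / n
sxx = sum((xi - x_mean) ** 2 for xi in big_x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(big_x, big_y))
slope = sxy / sxx
intercept = y_mean - slope * x_mean

print(f"Y = {intercept:.3f} + {slope:.3f} X")
```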
Regression with Transformation & Uncertainty

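U (ft/s)    Y predicted    Y+         Y-         E       E+      E-
0           -              -          -          3.19    3.19    3.19
10          -0.0931        -0.0781    -0.1082    4.00    4.03    3.97
20           0.0368         0.0519     0.0218    4.28    4.32    4.24
30           0.1129         0.1279     0.0978    4.49    4.53    4.44
40           0.1668         0.1818     0.1518    4.66    4.71    4.61

The E, E+, and E- columns are obtained by transforming Y, Y+, and Y- back again.

B = Log(b)
-0.525 = Log(b)
b = 0.298

E = 3.19 + 0.298*U^0.432

[Figure: Example 4.10, E (V) versus U (ft/s) from 0 to 50 ft/s, showing the fitted curve E = 3.19 + 0.298*U^0.432]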
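The back-transformation is the inverse of the logarithm step: E = 3.19 + 10^Y. The sketch below applies it to the tabulated Y, Y+, and Y- values and recovers b from the fitted intercept. The names are illustrative, and base-10 logarithms are assumed, consistent with b = 0.298.

```python
E_ZERO = 3.19  # voltage at U = 0
INTERCEPT_B = -0.525  # fitted intercept, B = Log(b)

# Tabulated transformed values for U = 10, 20, 30, 40 ft/s
y_pred = [-0.0931, 0.0368, 0.1129, 0.1668]
y_plus = [-0.0781, 0.0519, 0.1279, 0.1818]
y_minus = [-0.1082, 0.0218, 0.0978, 0.1518]


def back_transform(values):
    """Invert Y = Log(E - 3.19): E = 3.19 + 10**Y."""
    return [E_ZERO + 10.0 ** value for value in values]


b = 10.0 ** INTERCEPT_B  # coefficient multiplying U**0.432
e_pred = back_transform(y_pred)
e_plus = back_transform(y_plus)
e_minus = back_transform(y_minus)

print(f"b = {b:.3f}")
for row in zip(e_minus, e_pred, e_plus):
    print("  ".join(f"{value:.2f}" for value in row))
```

Because the transformation is nonlinear, a band that is symmetric in Y becomes slightly asymmetric in E.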
Multiple and Polynomial Regression
Regression analysis can also be performed in situations where there is more than one independent variable (multiple regression) or for polynomials of an independent variable (polynomial regression).
Polynomial regression seeks a function of the form

$$Y = b + m_1 x + m_2 x^2 + \dots + m_k x^k$$

Multiple regression seeks a function of the form

$$Y = b + m_1 x_1 + m_2 x_2 + m_3 x_3 + \dots + m_k x_k$$

where the x_j may represent several independent variables. For example: x_1 = x_1, x_2 = x_2, x_3 = x_1 \cdot x_2.
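Both forms are linear in the coefficients, so the same least-squares machinery applies once each term is treated as its own column. The sketch below solves the normal equations with a small Gaussian elimination; it is one possible approach, not the method Excel uses, and the data are made up (generated from Y = 1 + 2x + 3x^2) purely for illustration.

```python
def solve_linear_system(a, rhs):
    """Solve a*z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_linear_model(columns, y):
    """Fit Y = b + m1*x1 + ... + mk*xk; returns [b, m1, ..., mk]."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    ata = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    aty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve_linear_system(ata, aty)


# Made-up data from Y = 1 + 2x + 3x^2 (illustration only)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]

# Polynomial regression: x and x^2 are passed as two separate columns
b, m1, m2 = fit_linear_model([x, [xi ** 2 for xi in x]], y)
print(f"Y = {b:.3f} + {m1:.3f} x + {m2:.3f} x^2")
```

For multiple regression, the columns passed to the hypothetical `fit_linear_model` would be the separate independent variables, or products such as x1*x2.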
Linear Regression in Excel 2004
Input the result values
Input the independent variable
Input desired confidence level
Excel 2004 Linear Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.99964308
R Square             0.99928628     (R^2)
Adjusted R Square    0.99910785
Standard Error       0.02788582     (SEE = sey)
Observations         6              (N)

ANOVA
              df    SS            MS            F             Significance F
Regression    1     4.35502286    4.35502286    5600.45805    1.9107E-07
Residual      4     0.00311048    0.00077762
Total         5     4.35813333

                Coefficients    Standard Error    t Stat        P-value       Lower 95%      Upper 95%
Intercept       0.02952381      0.02018228        1.46285828    0.21733392    -0.02651117    0.08555879    (intercept b)
X Variable 1    0.99771429      0.01333197        74.8362082    1.9107E-07     0.9606988     1.03472978    (slope m1)

The Lower 95% and Upper 95% columns give the lower and upper bounds for the coefficients. To obtain the ± bound, subtract the lower bound from the upper bound and divide by two.
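As a small worked example of that last step, the sketch below computes the ± bound for the intercept and slope from the Lower 95% and Upper 95% values in the output above. The dictionary and function names are illustrative.

```python
# (Lower 95%, Upper 95%) values from the Excel output above
bounds = {
    "intercept b": (-0.02651117, 0.08555879),
    "slope m1": (0.9606988, 1.03472978),
}


def half_width(lower, upper):
    """The +/- bound is half the distance between the two limits."""
    return (upper - lower) / 2.0


plus_minus = {name: half_width(lo, hi) for name, (lo, hi) in bounds.items()}
for name, value in plus_minus.items():
    print(f"{name}: +/- {value:.8f}")
```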
