Simple Linear Regression Model

The population regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where $\beta_0$ is the population y-intercept, $\beta_1$ is the population slope coefficient, $x$ is the independent variable, and $\varepsilon$ is the random error term (residual). The model has a linear component, $\beta_0 + \beta_1 x$, and a random error component, $\varepsilon$.
Simple Linear Regression Model (continued)

(Figure: the population regression line, marking the observed value of y for $x_i$, the predicted value of y for $x_i$, the slope $\beta_1$, the intercept $\beta_0$, and the random error for this x value.)
Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x$$

where $\hat{y}_i$ is the predicted value of y, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x$ is the independent variable.
Least Squares Criterion

$b_0$ and $b_1$ are obtained by finding the values that minimize the sum of the squared residuals:

$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \bigl(y - (b_0 + b_1 x)\bigr)^2$$

The least-squares slope estimate is

$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

or, in algebraically equivalent form,

$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

and the intercept estimate is

$$b_0 = \bar{y} - b_1 \bar{x}$$
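A small R sketch of these formulas, applied to the house-price data from the example below (price in $1000s, square feet):

x = c(1400,1600,1700,1875,1100,1550,2350,2450,1425,1700)  # square feet
y = c(245,312,279,308,199,219,405,324,319,255)            # house price ($1000s)
b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 = mean(y) - b1 * mean(x)                                     # intercept
c(b0, b1)   # approximately 98.24833 and 0.10977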
Interpretation of the
Slope and the Intercept
b0 is the estimated average value of
y when the value of x is zero
b1 is the estimated change in the
average value of y as a result of a
one-unit change in x
Simple Linear Regression Example

A sample of 10 houses, with house price (in $1000s) as the dependent variable (y) and square feet as the independent variable (x):

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df            SS           MS         F   Significance F
Regression    1    18934.9348   18934.9348   11.0848          0.01039
Residual      8    13665.5652    1708.1957
Total         9    32600.5000

              Coefficients   Standard Error    t Stat   P-value   Lower 95%   Upper 95%
Intercept         98.24833         58.03348   1.69296   0.12892   -35.57720   232.07386
Square Feet        0.10977          0.03297   3.32938   0.01039     0.03374     0.18580
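The same coefficient table can be reproduced in R (a sketch, assuming the reg1 fit from the code at the end of this section); the Lower/Upper 95% columns correspond to confint():

coef(summary(reg1))   # estimates, standard errors, t stats, p-values
confint(reg1)         # 95% limits, e.g. 0.03374 to 0.18580 for sqft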
Graphical Presentation

House price model: scatter plot and regression line (figure), with intercept = 98.248 and slope = 0.10977:

$$\widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet})$$
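As an illustration of prediction from the fitted line (the 2,000-square-foot house here is a hypothetical value, not one of the ten observations):

98.24833 + 0.10977 * 2000   # predicted price for a hypothetical 2000 sq. ft.
                            # house: 317.79, i.e. about $317,800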
Measures of Variation

Total variation is made up of two parts, SST = SSR + SSE:

SST (total sum of squares): $SST = \sum (y_i - \bar{y})^2$
SSE (sum of squares error): $SSE = \sum (y_i - \hat{y}_i)^2$
SSR (sum of squares regression): $SSR = \sum (\hat{y}_i - \bar{y})^2$

where $\bar{y}$ is the average value of the dependent variable, $y_i$ is the observed value, and $\hat{y}_i$ is the predicted value of y for the given $x_i$.

(Figure: at $x_i$, the deviation of $y_i$ from $\bar{y}$ decomposes into an SSE part, $y_i - \hat{y}_i$, and an SSR part, $\hat{y}_i - \bar{y}$.)
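A short R sketch of these three sums of squares, assuming hprice and sqft are defined as in the code at the end of this section:

fit = lm(hprice ~ sqft)
yhat = fitted(fit)                     # predicted values
SST = sum((hprice - mean(hprice))^2)   # total sum of squares: 32600.5
SSE = sum((hprice - yhat)^2)           # sum of squares error: 13665.5652
SSR = sum((yhat - mean(hprice))^2)     # sum of squares regression: 18934.9348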
Coefficient of Determination, R²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted $R^2$:

$$R^2 = \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
Coefficient of Determination, R² (continued)

$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$$

Note: in the simple linear regression case, $R^2 = r^2$, where $R^2$ is the coefficient of determination and $r$ is the simple correlation coefficient.
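A quick R check of both identities (a sketch, assuming hprice and sqft from the code at the end of this section):

summary(lm(hprice ~ sqft))$r.squared   # coefficient of determination: 0.58082
cor(hprice, sqft)^2                    # squared simple correlation, also 0.58082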
Examples of Approximate R² Values

(Figure: scatter plot panels illustrating each case.)

$R^2 = 1$: perfect linear relationship between x and y ($r = +1$ or $r = -1$); all of the variation in y is explained by variation in x.
$0 < R^2 < 1$: a weaker linear relationship between x and y; part of the variation in y is explained by variation in x.
$R^2 = 0$: no linear relationship between x and y.
Excel Output: Coefficient of Determination

From the regression statistics and ANOVA table shown earlier,

$$R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$$

so about 58.08% of the variation in house prices is explained by variation in square feet.
Comparing Standard Errors

(Figure: regression panels contrasting a small vs. a large standard error of the estimate, s, and a small vs. a large standard error of the slope, s_b1.)
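Both standard errors can be read off the R fit (a sketch, assuming reg1 from the code at the end of this section), matching the Excel output above:

summary(reg1)$sigma                         # s = 41.33032 (std. error of the estimate)
coef(summary(reg1))["sqft", "Std. Error"]   # s_b1 = 0.03297 (std. error of the slope)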
Residual Analysis

Purposes:
Examine the linearity assumption
Examine whether the variance is constant for all levels of x
Evaluate the normal distribution assumption

Graphical analysis of residuals (see the sketch after this list):
Plot the residuals vs. x
Create a histogram of the residuals to check for normality
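A minimal sketch of both plots in R, assuming reg1 and sqft from the code at the end of this section:

res = resid(reg1)                # residuals e_i = y_i - yhat_i
plot(sqft, res); abline(h = 0)   # residuals vs. x: look for patterns
hist(res)                        # histogram: check approximate normality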
(Figure: residual plots contrasting a non-linear vs. a linear pattern, and non-constant vs. constant variance.)
Excel Output

RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
 1                       251.92316    -6.923162
 2                       273.87671    38.12329
 3                       284.85348    -5.853484
 4                       304.06284     3.937162
 5                       218.99284   -19.99284
 6                       268.38832   -49.38832
 7                       356.20251    48.79749
 8                       367.17929   -43.17929
 9                       254.66740    64.33264
10                       284.85348   -29.85348
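The same table can be reproduced from the R fit (a sketch, assuming reg1 from the code at the end of this section):

data.frame(Observation = 1:10,
           Predicted = fitted(reg1),
           Residuals = resid(reg1))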
Summary

Introduced correlation analysis
Discussed correlation to measure the strength of a linear association
Introduced simple linear regression analysis
Calculated the coefficients for the simple linear regression equation
Described measures of variation (R² and s)
Addressed assumptions of regression and correlation
Summary
(continued)
R software regression
# House-price data entered as (price, sqft) pairs
yx = c(245,1400, 312,1600, 279,1700, 308,1875, 199,1100,
       219,1550, 405,2350, 324,2450, 319,1425, 255,1700)
mx = matrix(yx, 10, 2, byrow = T)   # 10 rows; price in column 1, sqft in column 2
hprice = mx[, 1]                    # house price ($1000s)
sqft = mx[, 2]                      # square feet
reg1 = lm(hprice ~ sqft)            # fit the simple linear regression
summary(reg1)                       # coefficients, standard errors, R-squared
plot(reg1)                          # residual diagnostic plots