
What is Modeling?

Suppose we measure pairs (m, l) and want a model that relates them:

    m      l
   1.0   15.0
   1.5   17.0
   2.0   18.0
   2.5   19.5
   3.0   21.0

We posit the linear model m = a + b·l and ask: what are a and b?
Fitting a line to the data (scatter plot omitted here) gives approximately
a = -4 and b = 0.33. The fitted model can then be used for prediction:
for a new observation with l = 20.7,

    m = a + b·l = -4 + 0.33(20.7) ≈ 2.9
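As a minimal illustration (a sketch assuming Python with numpy, treating m as the response and l as the predictor as on the slide), the fit and the prediction above can be reproduced with an ordinary least-squares line:

```python
import numpy as np

# Data from the slide: response m and predictor l
m = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
l = np.array([15.0, 17.0, 18.0, 19.5, 21.0])

# Fit m = a + b*l by least squares; polyfit returns [slope, intercept]
b, a = np.polyfit(l, m, deg=1)
print(f"a = {a:.2f}, b = {b:.2f}")      # roughly a = -4, b = 0.33

# Predict m for a new observation with l = 20.7
print(f"m(20.7) = {a + b * 20.7:.1f}")  # roughly 2.9
```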
Simple Linear Regression

The equation that describes how y is related to x and an error
term is called the regression model.

The simple linear regression model is:  y = β0 + β1x + ε

where:
β0 and β1 are called parameters of the model, and ε is a
random variable called the error term.
Simple Linear Regression

The simple linear regression equation is:  E(y) = β0 + β1x

The graph of the regression equation is a straight line.
β0 is the y-intercept of the regression line.
β1 is the slope of the regression line.
E(y) is the expected value of y for a given x value.
Simple Linear Regression

[Figure: Positive linear relationship — E(y) plotted against x; the regression line has intercept β0 and positive slope β1.]
Simple Linear Regression

[Figure: Negative linear relationship — E(y) plotted against x; the regression line has intercept β0 and negative slope β1.]
Simple Linear Regression

[Figure: No relationship — E(y) plotted against x; the regression line is horizontal with intercept β0 and slope β1 = 0.]
Simple Linear Regression

The estimated simple linear regression equation:

    ŷ = b0 + b1x

The graph is called the estimated regression line.
b0 is the y-intercept of the line.
b1 is the slope of the line.
ŷ is the estimated value of y for a given x value.
Simple Linear Regression

Regression model (unknown parameters β0, β1):
    y = β0 + β1x + ε
Regression equation:
    E(y) = β0 + β1x

Sample data:
    x:  x1, ..., xn
    y:  y1, ..., yn

Estimated regression equation:
    ŷ = b0 + b1x
The sample statistics b0 and b1 provide estimates of β0 and β1.
Simple Linear Regression

Least Squares Criterion

    min Σ(yi - ŷi)²

where:
yi = observed value of the dependent variable for the ith observation
ŷi = estimated value of the dependent variable for the ith observation
Simple Linear Regression

Slope for the Estimated Regression Equation

    b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²

y-Intercept for the Estimated Regression Equation

    b0 = ȳ - b1x̄

where:
xi = value of the independent variable for the ith observation
yi = value of the dependent variable for the ith observation
x̄ = mean value of the independent variable
ȳ = mean value of the dependent variable
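As a minimal sketch (assuming Python with numpy; the helper name is chosen for illustration), these two formulas translate directly into a small function:

```python
import numpy as np

def least_squares_line(x, y):
    """Return (b0, b1) for the estimated regression equation y-hat = b0 + b1*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared x-deviations
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the line through (x-bar, y-bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1
```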
Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of
the advertising campaign, Reed runs one or more television
commercials during the weekend preceding the sale. Data from a
sample of 5 previous sales are:

    Number of TV Ads    Number of Cars Sold
           1                    14
           3                    24
           2                    18
           1                    17
           3                    27
Simple Linear Regression

Slope for the Estimated Regression Equation

    b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation

    b0 = ȳ - b1x̄ = 20 - 5(2) = 10

Estimated Regression Equation

    ŷ = 10 + 5x
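As a quick check (a sketch assuming Python with scipy available), scipy.stats.linregress reproduces the slope and intercept above from the Reed Auto data:

```python
from scipy import stats

ads = [1, 3, 2, 1, 3]          # number of TV ads
cars = [14, 24, 18, 17, 27]    # number of cars sold

result = stats.linregress(ads, cars)
print(result.slope)       # 5.0  -> b1
print(result.intercept)   # 10.0 -> b0
print(result.rvalue)      # about 0.937, the sample correlation rxy
print(result.stderr)      # about 1.08, the standard error of b1 (used later in the t test)
```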
Simple Linear Regression

Relationship Among SST, SSR, SSE

    SST = SSR + SSE

    Σ(yi - ȳ)² = Σ(ŷi - ȳ)² + Σ(yi - ŷi)²

where:
SST = Total Sum of Squares
SSR = Sum of Squares due to Regression
SSE = Sum of Squares due to Error
Simple Linear Regression

The coefficient of determination is:  r² = SSR/SST


where:
SSR = sum of squares due to regression
SST = total sum of squares

r² = SSR/SST = 100/114 = 0.8772

The regression relationship is very strong; 88% of the


variability in the number of cars sold can be explained
by the linear relationship between the number of TV
ads and the number of cars sold.
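As a minimal sketch (assuming Python with numpy), the sums of squares and r² for the Reed Auto data can be computed directly from the fitted line ŷ = 10 + 5x:

```python
import numpy as np

x = np.array([1, 3, 2, 1, 3], dtype=float)       # TV ads
y = np.array([14, 24, 18, 17, 27], dtype=float)  # cars sold
y_hat = 10 + 5 * x                                # fitted values from y-hat = 10 + 5x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares            -> 114
ssr = np.sum((y_hat - y.mean()) ** 2)   # sum of squares due to regression -> 100
sse = np.sum((y - y_hat) ** 2)          # sum of squares due to error      -> 14

r2 = ssr / sst                      # coefficient of determination -> about 0.8772
r = np.sign(5.0) * np.sqrt(r2)      # sample correlation, sign from b1 -> about +0.9366
print(sst, ssr, sse, r2, r)
```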
Simple Linear Regression

Sample Correlation Coefficient

    rxy = (sign of b1) √(coefficient of determination)
    rxy = (sign of b1) √r²

where b1 is the slope of the estimated regression equation ŷ = b0 + b1x.

The sign of b1 in the equation ŷ = 10 + 5x is +, so

    rxy = +√0.8772
Hence, rxy = +0.9366.
Simple Linear Regression

Assumptions about the error term ε:
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all
   values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Simple Linear Regression

To test for a significant regression relationship, we must
conduct a hypothesis test to determine whether the
value of β1 is zero.
Two tests, namely the t test and the F test, are commonly used.
Both the t test and the F test require an estimate of σ², the
variance of ε in the regression model.
Simple Linear Regression

An Estimate of σ²
The mean square error (MSE) provides the estimate
of σ², and the notation s² is also used.

    s² = MSE = SSE/(n - 2)

where:  SSE = Σ(yi - ŷi)² = Σ(yi - b0 - b1xi)²
Simple Linear Regression

An Estimate of σ
To estimate σ we take the square root of s².
The resulting s is called the standard error of the estimate.

    s = √MSE = √(SSE/(n - 2))
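For the Reed Auto data (a brief sketch assuming Python; SSE = 14 and n = 5 as computed above), these estimates are:

```python
import math

sse = 14.0   # sum of squares due to error from the Reed Auto fit
n = 5        # number of observations

mse = sse / (n - 2)   # s^2 = MSE -> about 4.667
s = math.sqrt(mse)    # standard error of the estimate -> about 2.16
print(mse, s)
```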
Simple Linear Regression

Hypotheses
    H0: β1 = 0
    H1: β1 ≠ 0

Test Statistic
    t = b1 / sb1

where sb1 = s / √Σ(xi - x̄)² is the estimated standard deviation of b1.
Simple Linear Regression

Rejection Rule

    Reject H0 if p-value < α
    or t < -tα/2 or t > tα/2

where:
tα/2 is based on a t distribution
with n - 2 degrees of freedom
Simple Linear Regression
1. Determine the hypotheses.
    H0: β1 = 0
    H1: β1 ≠ 0

2. Specify the level of significance:  α = 0.05

3. Select the test statistic.
    t = b1 / sb1

4. State the rejection rule.
   Reject H0 if p-value < 0.05 or |t| > 3.182 (with 3
   degrees of freedom)
Simple Linear Regression

5. Compute the value of the test statistic.

    t = b1 / sb1 = 5 / 1.08 = 4.63

6. Determine whether to reject H0.

   t = 4.541 provides an area of 0.01 in the upper tail.
   Hence, the two-tailed p-value is less than 2(0.01) = 0.02.
   (Also, t = 4.63 > 3.182.) We can reject H0.
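As a sketch (assuming Python with scipy), the standard error of b1, the t statistic, the critical value, and the exact two-tailed p-value can be computed as follows; sb1 = s / √Σ(xi - x̄)² is the usual formula for the standard error of the slope used here:

```python
import numpy as np
from scipy import stats

x = np.array([1, 3, 2, 1, 3], dtype=float)   # TV ads
s = np.sqrt(14.0 / 3.0)                       # standard error of the estimate, about 2.16

s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))   # about 1.08
t_stat = 5.0 / s_b1                                # about 4.63
t_crit = stats.t.ppf(0.975, df=3)                  # about 3.182
p_value = 2 * stats.t.sf(abs(t_stat), df=3)        # two-tailed p-value, about 0.019
print(s_b1, t_stat, t_crit, p_value)
```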
Simple Linear Regression
We can use a 95% confidence interval for β1 to test
the hypotheses just used in the t test.
H0 is rejected if the hypothesized value of β1 is not
included in the confidence interval for β1.

The form of a confidence interval for β1 is:

    b1 ± tα/2 sb1

where b1 is the point estimator, tα/2 sb1 is the margin of error,
and tα/2 is the t value providing an area of α/2 in the
upper tail of a t distribution with n - 2 degrees of
freedom.
Simple Linear Regression
Rejection Rule
    Reject H0 if 0 is not included in the confidence interval for β1.

95% Confidence Interval for β1
    b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44
    or 1.56 to 8.44

Conclusion
    0 is not included in the confidence interval. Reject H0.
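The same interval, as a brief sketch assuming Python with scipy:

```python
from scipy import stats

b1, s_b1, df = 5.0, 1.08, 3
t_half_alpha = stats.t.ppf(0.975, df)   # about 3.182 for a 95% interval
margin = t_half_alpha * s_b1            # about 3.44
print(b1 - margin, b1 + margin)         # about (1.56, 8.44); 0 is outside, so reject H0
```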
Simple Linear Regression

Hypotheses
    H0: β1 = 0
    H1: β1 ≠ 0

Test Statistic
    F = MSR/MSE
Simple Linear Regression

Rejection Rule
    Reject H0 if p-value < α or F > Fα

where:
Fα is based on an F distribution with 1 degree of
freedom in the numerator and n - 2 degrees of freedom
in the denominator
Simple Linear Regression

1. Determine the hypotheses.
    H0: β1 = 0
    H1: β1 ≠ 0

2. Specify the level of significance:  α = 0.05

3. Select the test statistic.
    F = MSR/MSE

4. State the rejection rule.
   Reject H0 if p-value < 0.05 or F > 10.13 (with 1 d.f. in the
   numerator and 3 d.f. in the denominator)
Simple Linear Regression

5. Compute the value of the test statistic.

    F = MSR/MSE = 100/4.667 = 21.43

   (MSR = SSR/1 = 100, since there is one independent variable.)

6. Determine whether to reject H0.

   F = 17.44 provides an area of 0.025 in the upper tail, so the
   p-value corresponding to F = 21.43 is less than 0.025, which is
   below α = 0.05. Hence, we reject H0.

   The statistical evidence is sufficient to conclude that we
   have a significant relationship between the number of TV
   ads aired and the number of cars sold.
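A brief sketch (assuming Python with scipy) of the same F test:

```python
from scipy import stats

ssr, sse, n = 100.0, 14.0, 5
msr = ssr / 1            # one independent variable -> MSR = 100
mse = sse / (n - 2)      # about 4.667
f_stat = msr / mse       # about 21.43

f_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)    # about 10.13
p_value = stats.f.sf(f_stat, dfn=1, dfd=n - 2)  # about 0.019
print(f_stat, f_crit, p_value)                  # F > 10.13 and p < 0.05 -> reject H0
```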
Simple Linear Regression
If the assumptions about the error term ε appear
questionable, the hypothesis tests about the significance
of the regression relationship and the interval estimation
results may not be valid.
The residuals provide the best information about ε.

Residual for Observation i

    yi - ŷi

Much of the residual analysis is based on an examination of
graphical plots.
Simple Linear Regression

If the assumption that the variance of ε is the same for all
values of x is valid, and the assumed regression model is
an adequate representation of the relationship between
the variables, then the residual plot should give an overall
impression of a horizontal band of points.
Simple Linear Regression

[Figure: Good pattern — residuals (y - ŷ) plotted against x form a horizontal band around zero.]
Simple Linear Regression

[Figure: Non-constant variance — residuals (y - ŷ) plotted against x; the spread of the residuals changes with x.]
Simple Linear Regression

[Figure: Model form not adequate — residuals (y - ŷ) plotted against x show a systematic, curved pattern.]
Simple Linear Regression
Residuals

    Observation   Predicted Cars Sold   Residual
         1                15               -1
         2                25               -1
         3                20               -2
         4                15                2
         5                25                2
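These fitted values and residuals come straight from ŷ = 10 + 5x (a sketch assuming Python with numpy; the plotting lines assume matplotlib is available):

```python
import numpy as np

ads = np.array([1, 3, 2, 1, 3], dtype=float)
cars = np.array([14, 24, 18, 17, 27], dtype=float)

predicted = 10 + 5 * ads          # [15, 25, 20, 15, 25]
residuals = cars - predicted      # [-1, -1, -2, 2, 2]
print(predicted, residuals)

# A residual plot (residuals against the independent variable) would use these arrays:
# import matplotlib.pyplot as plt
# plt.scatter(ads, residuals); plt.axhline(0); plt.show()
```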
Simple Linear Regression
Multiple Regression

The simple linear regression model was used to analyze how one
variable (the dependent variable y) is related to one other
variable (the independent variable x).
Multiple regression allows for any number of independent
variables.
We expect to develop models that fit the data better than would
a simple linear regression model.
Simple regression considers the relation between a single
explanatory variable and a response variable.
Multiple regression simultaneously considers the influence of
multiple explanatory variables on a response variable Y.

The intent is to look at the


independent effect of each
variable while adjusting out
the influence of potential
confounders
The Model
We now assume we have k independent variables potentially
related to the one dependent variable. This relationship is
represented in this first-order linear equation:

    y = β0 + β1x1 + β2x2 + ... + βkxk + ε

where y is the dependent variable, x1, ..., xk are the independent
variables, β0, β1, ..., βk are the coefficients, and ε is the error variable.
In the one-variable, two-dimensional case we drew a regression
line; here we imagine a response surface.
Regression Modeling

A simple regression model (one independent variable) fits a
regression line in 2-dimensional space.

A multiple regression model with two explanatory variables fits a
regression plane in 3-dimensional space.
Required Conditions

For these regression methods to be valid, the following four
conditions for the error variable (ε) must be met:
    The probability distribution of the error variable (ε) is normal.
    The mean of the error variable is 0.
    The standard deviation of ε is σ, which is a constant.
    The errors are independent.
Estimating the Coefficients
The sample regression equation is expressed as (a fitting sketch
follows this list):

    ŷ = b0 + b1x1 + b2x2 + ... + bkxk

We will check the following:

Assess the model
    How well does it fit the data?
    Is it useful?
    Are any required conditions violated?
Employ the model
    Interpreting the coefficients
    Predictions using the prediction equation
    Estimating the expected value of the dependent variable
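As a minimal sketch of coefficient estimation (assuming Python with numpy; the data values here are made up purely for illustration), ordinary least squares with several explanatory variables can be computed from a design matrix that includes a column of ones for the intercept:

```python
import numpy as np

# Hypothetical sample data: two explanatory variables and one response
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([5.1, 6.9, 11.2, 12.8, 17.1, 18.9])

# Design matrix with an intercept column: y-hat = b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of (b0, b1, b2)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)

# Prediction at a new point (x1, x2) = (3.5, 4.0)
print(b0 + b1 * 3.5 + b2 * 4.0)
```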
