
Ali Jenzarli, Ph.D.

All Rights Reserved

Copyright

Linear Regression Models


Simple Linear Regression
Simple linear regression is the study of the linear relationship between two random
variables X and Y. We call X the independent or explanatory variable and we call Y the
dependent, the predicted or the forecast variable. We assume that we can represent the
relationship between population values of X and Y using the equation of the linear model
Yi = β0 + β1 Xi + εi.
We call β0 the Y-intercept and it represents the expected value of Y that is independent
of X or the expected value of Y when X equals zero, if appropriate. We call β1 the slope
and it represents the expected change in Y per unit change in X, i.e., the expected
marginal change in Y with respect to X. Finally, we call εi the random error in Y for
each observation i.
Given a sample of X and Y values, we use the method of least squares to estimate sample
values for β0 and β1, which we call b0 and b1, respectively. We represent the predicted
value of Y using the prediction line equation or the simple linear regression equation
Ŷ = b0 + b1 X.
We call b0 the sample Y-intercept and it represents the expected value of Y that is
independent of X (or the expected value of Y when X = 0, if appropriate), and we call b1
the sample slope and it represents the expected change in Y per unit change in X, i.e., the
expected marginal change in Y with respect to X. Finally, we call Ŷ the predicted value
of Y.
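As an illustration, the least-squares estimates can be computed directly from the textbook formulas b1 = Sxy / Sxx and b0 = Ȳ − b1 X̄. The data below are hypothetical, assumed purely for this sketch:

```python
import numpy as np

# Hypothetical sample of n = 5 (X, Y) pairs, assumed for illustration.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates: b1 = Sxy / Sxx, b0 = Ybar - b1 * Xbar.
x_bar, y_bar = X.mean(), Y.mean()
b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

# Predicted values from the prediction line Y-hat = b0 + b1 * X.
Y_hat = b0 + b1 * X
```

For these numbers the fitted line is Ŷ = 0.14 + 1.96 X, so each unit increase in X raises the predicted Y by about 1.96.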
The Coefficient of Determination and the Correlation Coefficient
The coefficient of determination is the statistic r² and it measures the proportion of the
linear variation in Y that is explained by X using the regression model.
The correlation coefficient is the statistic r and it measures the strength of the linear
association between X and Y.
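Both statistics can be computed from the fitted line. The sketch below reuses the same hypothetical data as above and obtains r² as SSR/SST, with r carrying the sign of the slope:

```python
import numpy as np

# Same hypothetical data as in the earlier sketch (assumed for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

# r^2 = SSR / SST: the share of the linear variation in Y explained by X.
sst = np.sum((Y - Y.mean()) ** 2)       # total variation
ssr = np.sum((Y_hat - Y.mean()) ** 2)   # variation explained by the line
r2 = ssr / sst

# The correlation coefficient takes the sign of the slope.
r = np.sign(b1) * np.sqrt(r2)
```

Here r² is close to 1, meaning nearly all of the variation in Y is explained by the line, and r is close to +1, indicating a strong positive linear association.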
t Test for the Slope β1
H0: β1 = 0 (there is no linear relationship)
H1: β1 ≠ 0 (there is a linear relationship)
For α = .05, the p-value for the slope (the coefficient of the explanatory variable) should
be less than .05 for the sample slope b1 to be statistically significantly different from zero,
indicating the presence of a statistically significant linear relationship between X and Y.
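A sketch of the test, again on the hypothetical data from above: the statistic is t = b1 / SE(b1) with SE(b1) = s / √Sxx and s² = SSE / (n − 2), and the two-sided p-value comes from the t distribution with n − 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical data (assumed for illustration).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

# Standard error of b1: s / sqrt(Sxx), with s^2 = SSE / (n - 2).
s = np.sqrt(np.sum(resid ** 2) / (n - 2))
se_b1 = s / np.sqrt(np.sum((X - X.mean()) ** 2))

# Two-sided t test of H0: beta1 = 0 against H1: beta1 != 0.
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```

For this sample the p-value is far below .05, so b1 is statistically significantly different from zero.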

Important Notice: The p-value for the slope and the Significance F are the same in
Simple Linear Regression, leading to the same conclusion. Acceptable values of α are
less than 0.10, with preferred values being less than or equal to 0.05.
Multiple-Linear Regression
Multiple-linear regression or multiple regression is the study of the linear relationship
between more than two random variables X1, X2, …, Xk, and Y. We call the Xi,
i = 1, 2, …, k, the independent or explanatory variables and we call Y the dependent, the
predicted or the forecast variable. We assume that we can represent the relationship
between population values of the Xi and Y using the equation of the linear model
Yi = β0 + β1 X1 + β2 X2 + … + βk Xk + εi.
We call β0 the Y-intercept and it represents the expected value of Y that is independent
of the Xi or the expected value of Y when each Xi equals zero, if appropriate. We call βi the
slope of Y with variable Xi, holding each Xj, j ≠ i, constant; and it represents the expected
change in Y per unit change in Xi, i.e., the expected marginal change in Y with respect to
Xi, holding each Xj, j ≠ i, constant. Finally, we call εi the random error in Y for each
observation i.
Given a sample of X and Y values, we use the method of least squares to estimate sample
values for β0 and the βi, which we call b0 and bi for all i = 1, …, k, respectively. We
represent the predicted value of Y using the prediction line equation or the multiple linear
regression equation
Ŷ = b0 + b1 X1 + … + bk Xk.
We call b0 the sample Y-intercept and it represents the expected value of Y that is
independent of all Xi (i = 1, 2, …, k) in the model or the expected value of Y when each Xi
equals zero, if appropriate. We call bi the sample slope of Y with variable Xi, holding
each Xj, j ≠ i, constant; and it represents the expected change in Y per unit change in Xi,
i.e., the expected marginal change in Y with respect to Xi, holding each Xj, j ≠ i, constant.
Finally, we call Ŷ the predicted value of Y.
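A least-squares fit with several explanatory variables can be sketched with NumPy by stacking the Xi into a design matrix whose first column of ones carries the intercept. The data below are hypothetical, constructed (for illustration only) so that Y = 1 + 2 X1 + 0.5 X2 exactly:

```python
import numpy as np

# Hypothetical data with k = 2 explanatory variables, constructed so that
# Y = 1 + 2*X1 + 0.5*X2 exactly (assumed for illustration).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y = 1 + 2 * X1 + 0.5 * X2

# Design matrix with a leading column of ones for the intercept b0.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares estimates b0, b1, b2 in a single call.
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ b
```

Because the data were built from the model without noise, the estimates recover the intercept and both slopes exactly (up to floating-point precision).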
The Adjusted r²
The adjusted r² measures the proportion of the linear variation in Y that is explained by
all Xi (i = 1, 2, …, k) in the multiple-regression model, adjusted for the number of
independent variables (Xi) and the sample size.
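The adjustment uses the standard formula adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1); the sample values plugged in below are assumed for illustration only:

```python
# Adjusted r^2 from r^2, sample size n, and number of explanatory variables k.
def adjusted_r2(r2, n, k):
    # Penalizes r^2 for each extra explanatory variable relative to n.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: r^2 = 0.90 from a model with k = 3 explanatory
# variables fitted to n = 25 observations.
adj = adjusted_r2(0.90, 25, 3)
```

The adjusted value is always at or below the unadjusted r² (here about 0.886 versus 0.90), and the gap widens as more variables are added without a matching gain in explained variation.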
The Coefficient of Partial Determination
The coefficient of partial determination measures the proportion of the linear variation in
Y that is explained by a particular Xi, holding each Xj, j ≠ i, constant.
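One common way to compute it is from error sums of squares: fit the model with and without Xi, and take the drop in SSE as a share of the SSE without Xi. The sketch below assumes hypothetical data built mostly from X1 and X2:

```python
import numpy as np

def sse(A, y):
    # Error sum of squares of the least-squares fit of y on the columns of A.
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ b
    return r @ r

# Hypothetical data (assumed for illustration).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
Y = 1 + 2 * X1 + 0.5 * X2 + 0.1 * np.sin(2 * X1)

ones = np.ones_like(X1)
full = np.column_stack([ones, X1, X2])   # model with X1 and X2
reduced = np.column_stack([ones, X2])    # model without X1

# Partial determination of X1: extra variation explained by adding X1,
# as a share of the variation left unexplained without it.
sse_full = sse(full, Y)
sse_red = sse(reduced, Y)
r2_partial = (sse_red - sse_full) / sse_red
```

Because Y here depends strongly on X1, the coefficient of partial determination for X1 is close to 1 even though X2 stays in the model.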

Significance Testing
In a multiple-regression model with two or more independent variables, we recommend
removing those independent variables that fail the t-test of significance for their slope
coefficients, one independent variable at a time, and running the model again without
them. Remove first the independent variable with the highest insignificant p-value, then
run the model again without this variable. Repeat this process until only those
independent variables that pass the significance test remain. Recall that this t-test of
significance is the same as the slope test in a simple linear regression model (outlined
above).
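The procedure above (backward elimination, one variable per pass) can be sketched as follows. The helper computes slope p-values from the usual OLS t statistics; the demo data are hypothetical, built so that Y depends only on X1 and the result is deterministic:

```python
import numpy as np
from scipy import stats

def slope_p_values(A, y):
    """Two-sided p-values for each coefficient of an OLS fit.
    A must already contain the column of ones for the intercept."""
    n, p = A.shape
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    s2 = resid @ resid / (n - p)           # residual variance
    cov = s2 * np.linalg.inv(A.T @ A)      # covariance of the estimates
    t = b / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t), df=n - p)

def backward_eliminate(X_cols, names, y, alpha=0.05):
    """Drop the variable with the highest insignificant p-value, one at a
    time, refitting after each removal, until all remaining slopes pass."""
    cols, kept = list(X_cols), list(names)
    while cols:
        A = np.column_stack([np.ones(len(y))] + cols)
        pvals = slope_p_values(A, y)[1:]   # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                          # every remaining slope significant
        cols.pop(worst)
        kept.pop(worst)
    return kept

# Hypothetical demo: Y depends only on x1; x2 is irrelevant.  The "noise" is
# projected to be orthogonal to every column, so the outcome is deterministic.
n = 12
x1 = np.arange(n, dtype=float)
x2 = np.cos(x1)
A_full = np.column_stack([np.ones(n), x1, x2])
v = np.sin(3 * x1)
e = v - A_full @ np.linalg.lstsq(A_full, v, rcond=None)[0]
y = 2.0 + 3.0 * x1 + 0.05 * e

kept = backward_eliminate([x1, x2], ["x1", "x2"], y)
```

In the demo, x2 has an insignificant p-value and is removed first; x1 survives the refit, matching the one-at-a-time rule in the text.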
Caution: Removing all independent variables that fail the t-test after the first run is not
advisable, because two or more of these variables might be collinear or highly correlated,
i.e., they explain the same variability in the dependent variable Y.
Interactions
Interaction terms, or cross-product terms, are introduced into a multiple-regression model
when the effect of an independent variable Xi on the dependent variable Y changes
according to the values of other independent variables Xj, j ≠ i. In such cases we
recommend running the model with all the relevant interaction terms, then removing those
interaction terms that are not statistically significant, one term at a time, starting with the
one that has the highest insignificant p-value, while keeping those that are statistically
significant. Run the model again after each removal, with significance testing to confirm
the effects of all remaining interaction terms.
