Assumptions
Variables are normally distributed.
Continuous nature
Assumption of a linear relationship
between the independent and
dependent variables.
Assumption of homoscedasticity.
Linear Regression
Model Type X=size of house, Y=cost of house
Deterministic Model: an equation or set of equations
that allow us to fully determine the value of the dependent
variable from the values of the independent variables.
Dirty Interpretation
A quick example
.
.
.
.
. . .. . . .
. . ..
. . .
.
. .
$500
Predicted values
if perfect relationship
5 years
10 years
20 years
How It Is Applied
Analysts collect data over the past two years and
crunch it. The computer gives these results:
Y = 100 and .020X
The constant is 100:
If they do not fly at all, the computer estimates
there is still a cost of $100
The .020 is the regression coefficient:
This gets interpreted as: for every mile flown, there
is $.02 change in maintenance costs.
How It Is Applied
Y = 100 and .020X
Interpreting the regression coefficient:
For every mile flown, the maintenance costs
goes up by 2 cents.
For every 100 miles flown, costs are $2
For every 1,000 miles, the costs are $20
For every 100,000 miles, the costs are
$20,000
Practicality
Even More
Interpretation?
Regression coefficient: For every increase in the
percent of children in poverty within a school, the
average test score goes down by .04
R-squared: 25% of the test scores are explained
by the percent of children in poverty in the school
Researchers will ask: what other factors might
explain differences in test scores in the schools?
controlling
Multiple Regression:
An Example
Hypothesis: Income is a function of education
and seniority?
We suggest that income (the dependent
variable) will increase as both education and
seniority increases (two independent
variables)
Y (Income) = a + education + seniority +
error
17
12
Information
confounding
Overfitting
Multicollinearity
Multicollinearity
is bad
Tolerance: low is bad
Residuals Diagnostics
You cannot completely wipe out confounding simply
by adjusting for variables in multiple regression
unless variables are measured with zero error (which
is usually impossible).
Example: meat eating and mortality
References
Kleinbaum, D. G. Applied
Regression Analysis and
Multivariable Methods.
Third Edition (2011)