Nora
Breakout: Thursday 11:00-12:30pm
Connie
Office Hours: Tuesday 5:30 - 7:30pm
Any Questions?
The Linear Regression Model
Approach to Research
Otherwise known as……
[Figure: plot of Y against X, with X running from 1 to 10 and Y from 0 to 12]
…but life is full of errors…
$Y = \beta_1 X + u$
Simple Linear Regression
[Figure: plot of Y against X, with X running from 1 to 10 and Y from 0 to 12]
The Error Term
Our models do not predict behavior
perfectly.
So we add a term to adjust or compensate
for the errors in prediction (u).
Much of our ability to estimate β1 depends
upon the assumptions we make about the
errors (u).
Sometimes u is called the “Disturbance”
The 'Goal' of Ordinary Least
Squares
Ordinary Least Squares (OLS) is a
method of finding the linear model which
minimizes the sum of the squared errors.
In the least-squares sense, such a model provides the best
linear explanation/prediction of the data.
Under the Gauss-Markov assumptions, it is the “Best Linear Unbiased Estimator”
It’s BLUE
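As a rough illustration of "minimizing the sum of the squared errors", the sketch below fits a line to a small invented data set using the standard closed-form OLS formulas and then checks that nudging either parameter away from the solution raises the sum of squared residuals. The data and the helper names `ols_fit` and `sum_squared_residuals` are assumptions made for this example; they are not from the lecture.

```python
# Toy data, invented purely for illustration (not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.1, 9.4, 9.7]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared gaps between observed y and the line's prediction."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = ols_fit(x, y)
best = sum_squared_residuals(x, y, b0, b1)

# Moving either parameter away from the OLS values makes the fit worse.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_residuals(x, y, b0 + d0, b1 + d1) > best

print(f"intercept={b0:.3f}, slope={b1:.3f}, sum of squared residuals={best:.3f}")
```

The perturbation check is only a spot check in four directions, not a proof of optimality.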
Other Goals are Possible
$= \sum_{i=1}^{n} \hat{u}_i^2$
Picking the Parameters
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$SSR = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
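The three sums can be computed directly, which also makes it easy to confirm that the total variation splits into an explained and a residual part (SST = SSE + SSR) for an OLS fit with an intercept. This is a minimal sketch on invented data; the variable names are chosen for this example only.

```python
# Toy data, invented purely for illustration (not from the lecture).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3, 6.8, 8.1, 9.4, 9.7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for a single regressor.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
sse = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained
ssr = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))    # residual

explained_share = sse / sst  # commonly reported as R-squared

print(f"SST={sst:.4f}  SSE={sse:.4f}  SSR={ssr:.4f}")
print(f"SSE + SSR = {sse + ssr:.4f}, explained share = {explained_share:.4f}")
```

Note that the naming here follows the slides: SSE is the explained sum of squares and SSR the residual sum of squares.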
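“Explained” and “Unexplained”
Variation
[Figure: fitted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ with intercept $\hat{\beta}_0$ and slope $\hat{\beta}_1$, plotted on X and Y axes. For an observation at $X_i$, the figure marks the observed value $y_i$, the fitted value $\hat{y}_i$, the residual $\hat{u}_i$, and the total deviation $(y_i - \bar{y})$.]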
“Explained” and “Unexplained”
Variation
[Figure: the same fitted-line plot (axes X and Y, observation at $X_i$ with $y_i$ and $\hat{y}_i$ marked), annotated as follows]
$(y_i - \bar{y})$: Square this quantity and sum across all observations and we have our SST (Total Sum of Squares)
$\hat{u}_i$: Square this quantity and sum across all observations and we have our SSR (Residual Sum of Squares)
$(\hat{y}_i - \bar{y})$: Square this quantity and sum across all observations and we have our SSE (Explained Sum of Squares)
Some Confusing Terminology
Occasionally you may see people refer to the SSR
instead as USS (Unexplained Sum of Squares) or ESS
(Error Sum of Squares).
These terms are interchangeable, but…
ESS can be confused with explained sum
of squares
USS is not confused with any
mathematical jargon, but does pose
issues for statistical work on the US Navy.
Let’s Test Some “Theories”
Presidential approval depends upon the
performance of the US economy
The development of US military power
was a response to America’s threatening
environment
Plotting Approval and Inflation
[Figure: scatter plot of (mean) approve (vertical axis, 28.3333 to 76.2143) against (mean) inflat (horizontal axis, -.263876 to 12.8595)]
Regressing Approval on
Inflation
. reg approve inflat
------------------------------------------------------------------------------
 approve |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  inflat |  -2.213684   .5337539    -4.147    0.000      -3.289394   -1.137973
   _cons |   63.80565   2.711964    23.527    0.000       58.34004    69.27125
------------------------------------------------------------------------------
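To see how the columns of this output relate to one another, the sketch below recomputes the t statistic for `inflat` as coefficient divided by standard error and backs out the critical value implied by the reported 95% interval. The numbers are copied from the output; the degrees of freedom are not shown in the excerpt, so the critical value is inferred rather than looked up. The inflation value of 5 used for the prediction is a hypothetical input chosen for illustration.

```python
# Values copied from the Stata output for `reg approve inflat`.
coef = -2.213684
std_err = 0.5337539
ci_low, ci_high = -3.289394, -1.137973
cons = 63.80565

# t statistic for H0: coefficient = 0.
t_stat = coef / std_err

# The interval is coef +/- c * std_err, so c can be recovered from its width.
half_width = (ci_high - ci_low) / 2
implied_critical = half_width / std_err

# Predicted approval at a hypothetical inflation rate of 5.
hypothetical_inflation = 5.0
predicted_approval = cons + coef * hypothetical_inflation

print(f"t = {t_stat:.3f}")
print(f"implied 95% critical value = {implied_critical:.3f}")
print(f"predicted approval at inflat=5: {predicted_approval:.2f}")
```

The recomputed t matches the reported -4.147 to three decimals, and the interval is centered on the coefficient, as expected.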
Fitting Inflation to Approval
[Figure: (mean) approve and fitted values (vertical axis, 28.3333 to 76.2143) plotted against (mean) inflat (horizontal axis, -.263876 to 12.8595)]
Plotting US Power & Disputes
[Figure: scatter plot of uscapbl (vertical axis, .03 to .38) against numtargt (horizontal axis, 0 to 7)]
Regress US Power on Disputes
. reg uscapbl numtargt
------------------------------------------------------------------------------
 uscapbl |      Coef.   Std. Err.        t    P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
numtargt |   .0201142   .0046621     4.314    0.000        .010913    .0293155
   _cons |   .1455665   .0067132    21.684    0.000       .1323172    .1588157
------------------------------------------------------------------------------
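The fitted line implied by this output can be traced out by hand. The sketch below plugs the reported intercept and slope into the line and evaluates it over the 0 to 7 range of `numtargt` shown on the plot axis. The function name `fitted_uscapbl` is made up for this example, and the units of `uscapbl` are not stated in the slides.

```python
# Coefficients copied from the Stata output for `reg uscapbl numtargt`.
intercept = 0.1455665
slope = 0.0201142


def fitted_uscapbl(numtargt):
    """Fitted value of uscapbl for a given value of numtargt."""
    return intercept + slope * numtargt


# Evaluate the fitted line across the range shown on the plot axis (0 to 7).
for k in range(0, 8):
    print(f"numtargt={k}: fitted uscapbl={fitted_uscapbl(k):.4f}")

# Each one-unit increase in numtargt shifts the fitted value by the slope.
step = fitted_uscapbl(4) - fitted_uscapbl(3)
print(f"change per additional unit of numtargt: {step:.7f}")
```

The constant step between consecutive fitted values is the marginal effect of `numtargt` in this linear model.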
Fitting Disputes to US Power
[Figure: uscapbl and fitted values (vertical axis, .03 to .38) plotted against numtargt (horizontal axis, 0 to 7)]
A Brief Review of Critical Concepts
Measures of Central Tendency (Mean, Median, Mode): $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$
Population Variance: $Var(X) = E[(X - E(X))^2]$; $(1/n) \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sigma^2$
Standard Deviation: $sd(X) = \sqrt{\sigma^2}$
Covariance: $Cov(X, Y) = E[(X - E(X))(Y - E(Y))]$
Correlation: $Corr(X, Y) = \frac{Cov(X, Y)}{sd(X) \cdot sd(Y)} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
Marginal Effect: $\Delta y = \beta_1 \Delta x$
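The review formulas translate almost line for line into code. The following sketch computes the mean, population variance, standard deviation, covariance, and correlation for two short invented series, using the 1/n (population) versions from the slide. Function names such as `pop_variance` are assumptions made for this example.

```python
import math

# Toy data, invented purely for illustration (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def pop_variance(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pop_sd(values):
    return math.sqrt(pop_variance(values))


def pop_covariance(a, b):
    """Population covariance: average product of paired deviations."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def correlation(a, b):
    return pop_covariance(a, b) / (pop_sd(a) * pop_sd(b))


print(f"mean(x)={mean(x):.3f}, var(x)={pop_variance(x):.3f}, sd(x)={pop_sd(x):.3f}")
print(f"cov(x, y)={pop_covariance(x, y):.3f}, corr(x, y)={correlation(x, y):.4f}")
```

Because correlation divides the covariance by both standard deviations, it is unit-free and always lies between -1 and 1.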
Distributions – The Usual Suspects
Normal Distribution
Standard Normal
Chi-Square
t
F
The Normal Distribution
(Probability Density Function)
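$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp[-(x - \mu)^2 / 2\sigma^2]$
$X \sim Normal(\mu, \sigma^2)$
[Figure: bell-shaped density curve over x, centered at µ]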
The Standard Normal
Distribution (PDF)
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp[-z^2 / 2]$
$Z \sim Normal(0, 1)$
[Figure: bell-shaped density curve over Z, centered at 0]
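The two density formulas are closely linked: the general normal density is the standard normal density evaluated at the standardized value and rescaled by the standard deviation. The sketch below writes both out and compares them against Python's built-in `statistics.NormalDist`. The function names and the test point (x = 1.3, mean 2.0, sd 0.5) are arbitrary choices for this example.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma^2) variable at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )


def standard_normal_pdf(z):
    """Density of a Normal(0, 1) variable at z."""
    return math.exp(-(z ** 2) / 2) / math.sqrt(2 * math.pi)


x, mu, sigma = 1.3, 2.0, 0.5  # arbitrary illustration values
z = (x - mu) / sigma          # standardized value

direct = normal_pdf(x, mu, sigma)
via_standard = standard_normal_pdf(z) / sigma
library = NormalDist(mu, sigma).pdf(x)

print(f"direct={direct:.6f}, via standard normal={via_standard:.6f}, library={library:.6f}")
```

All three printed values should agree up to floating-point rounding.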
Chi-Square Distribution
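Let $Z_i$, $i = 1, 2, ..., n$ be independent random variables, each distributed
standard normal.
$\chi^2_n = \sum_{i=1}^{n} Z_i^2$
$\chi^2_n \sim (\mu = n,\ \sigma^2 = 2n)$
[Figure: chi-square densities f(x) over x for df = 2, 4, and 6]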
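The definition can be checked by simulation: summing n squared standard normal draws many times should give a sample with mean near n and variance near 2n. This is a minimal sketch using only the standard library; the seed, the choice of 4 degrees of freedom, and the 20,000 replications are arbitrary assumptions for the example.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)  # fixed seed so the run is reproducible


def chi_square_draw(df, rng):
    """One chi-square draw built as a sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


df = 4
draws = [chi_square_draw(df, rng) for _ in range(20000)]

sample_mean = fmean(draws)
sample_var = pvariance(draws)

print(f"df={df}: sample mean={sample_mean:.3f} (theory {df})")
print(f"df={df}: sample variance={sample_var:.3f} (theory {2 * df})")
```

The simulated values will not match the theory exactly; they should only land close to it, and closer as the number of replications grows.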
t-distribution:
The Statistical Workhorse
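Let $\chi^2_n$ have a chi-square distribution
with n degrees of freedom.
$T = \frac{Z}{\sqrt{\chi^2_n / n}}$
$t_n \sim (\mu = 0,\ \sigma^2 = n/(n-2))$
As the degrees of freedom increase, the t-distribution
approaches the normal distribution.
[Figure: t densities for df = 2, 4, and 6, plotted from -3 to 3 and centered at 0]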
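The construction of T can also be simulated: draw a standard normal, divide by the square root of an independent chi-square over its degrees of freedom, and compare the sample variance with n/(n-2). As n grows that ratio moves toward 1, the variance of the standard normal. The seed, the degrees of freedom (10 and 30), and the replication count below are arbitrary choices for this sketch.

```python
import math
import random
from statistics import pvariance

rng = random.Random(1)  # fixed seed so the run is reproducible


def t_draw(n, rng):
    """One t draw: standard normal over sqrt(independent chi-square / n)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z / math.sqrt(chi2 / n)


results = {}
for n in (10, 30):
    draws = [t_draw(n, rng) for _ in range(20000)]
    results[n] = pvariance(draws)
    print(f"df={n}: sample variance={results[n]:.3f}, theory={n / (n - 2):.3f}")
```

The formula n/(n-2) only applies for n greater than 2, which is why the sketch does not try df = 2.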
Quick Review:
Hypothesis Testing
H0: βj=0
Quick Review:
Hypothesis Testing
To test the hypothesis, I need a rejection rule. That
is, I will reject the null hypothesis if the absolute value of t is
greater than some critical value (c):
$|t| > c$
To some extent c is up to me: I must decide what level of
significance I am willing to accept. For instance, suppose my
t-value is 1.85 with 40 df. If I am willing to reject only at the
5% level (two-tailed), c equals 2.021 and I do not reject the
null. If instead I am willing to reject at the 10% level, c is
1.684 and I reject the null hypothesis.
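The rejection rule is a one-line comparison. The sketch below applies it to the example from the slide, a t-value of 1.85 with 40 df, using the two critical values quoted there. The function name `reject_null` is made up for this example.

```python
def reject_null(t_value, critical_value):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > critical_value


t_value = 1.85  # example t-value from the slide (40 df)

reject_at_5 = reject_null(t_value, 2.021)   # 5% critical value from the slide
reject_at_10 = reject_null(t_value, 1.684)  # 10% critical value from the slide

print(f"reject at 5% level:  {reject_at_5}")
print(f"reject at 10% level: {reject_at_10}")
```

Taking the absolute value means a t-value of -1.85 would lead to the same two decisions.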
t-distribution:
5% rejection rule for the null hypothesis H0: βj=0
with 25 degrees of freedom
Looking at table G-2, I find the critical
value for a two-tailed test is 2.06.
[Figure: t density centered at 0, with rejection regions beyond -2.06 and 2.06]
F Distribution
$F = \frac{\chi^2_{k_1} / k_1}{\chi^2_{k_2} / k_2}$
F and Chi-Square testing involves only a one-tailed test of the area
underneath the right portion of the curve.
[Figure: F densities f(x) for df = (2, 8), (6, 20), and (6, 8)]
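To make the one-tailed idea concrete, the sketch below simulates F draws as a ratio of two scaled chi-squares and estimates the area to the right of a cutoff. It uses df = (2, 8), one of the cases in the figure, because with 2 numerator degrees of freedom the right-tail area has a simple closed form, $(1 + 2x/k_2)^{-k_2/2}$, that the simulation can be compared against. The cutoff of 4.46, the seed, and the replication count are assumptions chosen for illustration.

```python
import random

rng = random.Random(2)  # fixed seed so the run is reproducible


def chi_square_draw(df, rng):
    """One chi-square draw built as a sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def f_draw(k1, k2, rng):
    """One F draw: ratio of independent chi-squares, each over its df."""
    return (chi_square_draw(k1, rng) / k1) / (chi_square_draw(k2, rng) / k2)


k1, k2 = 2, 8
cutoff = 4.46  # illustrative cutoff, not taken from the slides

draws = [f_draw(k1, k2, rng) for _ in range(20000)]
simulated_tail = sum(d > cutoff for d in draws) / len(draws)

# Closed-form right-tail area; this expression is valid only when k1 == 2.
exact_tail = (1 + 2 * cutoff / k2) ** (-k2 / 2)

print(f"simulated P(F > {cutoff}) = {simulated_tail:.4f}")
print(f"exact     P(F > {cutoff}) = {exact_tail:.4f}")
```

Only the right tail is examined because large F values are the ones that count against the null; the left portion of the curve plays no role in the test.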