
INTRODUCTION

The Objective: To determine whether a change in w causes a change in y.

E.g., if years of schooling goes up by one unit, does that lead to an increase in the average starting salary of a person?

To study a causal relationship it is important to hold the effect of all other relevant factors on y fixed, in other words, to invoke the notion of ceteris paribus.

Thus we want to estimate E(y|w, c), where c is a vector of control variables that we want to hold fixed while studying the effect of a change in w on E(y|·), i.e., the average value of y conditional on the values of w and c. If w is continuous, this effect may be captured by ∂E(y|w, c)/∂w, the partial effect of w on E(y|·).
E.g., the starting salary of a person may be affected by many factors, such as intelligence and skill, apart from schooling. So while studying whether an increase in years of schooling causes starting salary to increase, we need to hold the effect of these other factors constant. In practice this is not easy: skill may not be observed, or measurable at all, so there may be no data on skill with which to hold it constant in the above exercise.
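As a concrete illustration of the partial effect, here is a minimal Python sketch assuming a hypothetical linear conditional mean E(y|w, c) = β_0 + β_1 w + β_2 c; the functional form and all coefficients are made up for illustration only. A finite difference in w, with the control c held fixed (ceteris paribus), recovers β_1:

# Hypothetical conditional mean E(y|w, c); the functional form and
# coefficients are illustrative assumptions, not from the text.
def cond_mean(w, c):
    beta0, beta1, beta2 = 10.0, 1.5, 0.8   # made-up parameters
    return beta0 + beta1 * w + beta2 * c

# Partial effect of w on E(y|w, c): finite-difference approximation of
# dE(y|w, c)/dw with the control c held fixed.
h, w0, c0 = 1e-6, 12.0, 5.0
print((cond_mean(w0 + h, c0) - cond_mean(w0, c0)) / h)   # ~1.5 = beta1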

Data Structure: We assume that the population is well specified and that we know the population model (this may be arrived at through insights provided by economic theory or informal reasoning).

We also assume that a random sample can be drawn from the population and that the sample observations are independently and identically distributed (iid).

We drop the fixed-regressors assumption common in undergraduate textbooks because it does not allow us to study cases where one or more explanatory variables may be correlated with the error term; such variables are called endogenous explanatory variables in the parlance of econometrics.

Sometimes, complications in estimation arise because either we are unable to collect a random sample (motivating example: self-selection) or we cannot obtain data on some of the explanatory variables (unobserved explanatory variables), so that we are not able to control for them (i.e., hold their values fixed while studying the effect of w on y).

In such cases we may not be able to estimate the structural model, i.e., E(y|w, c), directly. But with the help of auxiliary assumptions and algebraic manipulations we may arrive at an estimable model. These assumptions are called identifying assumptions: they help us recover the parameters of the original structural model from the model we estimate.

Conditional expectations, i.e., E(·|·), are of immense importance not only because they provide us with a setting for specifying economic (structural) models, but also because they are a useful tool for manipulating structural equations into estimable equations. For these reasons we study the properties of conditional expectations closely.
Properties of Conditional Expectation

1. Let a_1(x), ..., a_G(x) and b(x) be scalar functions of x, and let y_1, ..., y_G be random scalars. Then,

E(Σ_{j=1}^{G} a_j(x) y_j + b(x) | x) = Σ_{j=1}^{G} a_j(x) E(y_j|x) + b(x),

provided E[|y_j|] < ∞, E[|a_j(x) y_j|] < ∞, and E[|b(x)|] < ∞.

Proof: Since we are conditioning on x, hold x invariant (constant) throughout the exercise. Then a_j(x) and b(x) are also constants for all j, and the property follows from the linearity of ordinary expectation.

It means E(·|x) is a linear operator.
2. E(y) = E[E(y|x)] → Law of Iterated Expectations (LIE).

Proof: Let x be a discrete random vector taking values c_1, c_2, ..., c_M with (marginal) probabilities p_1, p_2, ..., p_M. By definition,

E[E(y|x)] = E(y|x = c_1)p_1 + E(y|x = c_2)p_2 + ... + E(y|x = c_M)p_M,

and we verify that this equals E(y).

In other words, if x is also a scalar, say x, consider the joint distribution of (x, y), where y takes values y_1, ..., y_N; here p_ij = P(x = c_i, y = y_j), with row marginals p_i and column marginals q_j:

  x \ y |  y_1   ...   y_N  |
  ------+-------------------+------
   c_1  |  p_11  ...   p_1N |  p_1
   ...  |  ...   ...   ...  |  ...
   c_M  |  p_M1  ...   p_MN |  p_M
  ------+-------------------+------
        |  q_1   ...   q_N  |  Σ_i p_i = Σ_j q_j = 1

E(y|x = c_1) = Σ_{j=1}^{N} y_j (p_1j / p_1)
...
E(y|x = c_M) = Σ_{j=1}^{N} y_j (p_Mj / p_M)

Therefore,

E[E(y|x)] = [Σ_{j=1}^{N} y_j (p_1j / p_1)] p_1 + ... + [Σ_{j=1}^{N} y_j (p_Mj / p_M)] p_M
          = y_1 p_11 + y_2 p_12 + ... + y_N p_1N
          + y_1 p_21 + y_2 p_22 + ... + y_N p_2N
          + ...
          + y_1 p_M1 + y_2 p_M2 + ... + y_N p_MN
          = y_1 (p_11 + p_21 + ... + p_M1) + y_2 (p_12 + p_22 + ... + p_M2)
          + ... + y_N (p_1N + p_2N + ... + p_MN)
          = y_1 q_1 + y_2 q_2 + ... + y_N q_N
          = E(y)
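As a quick numerical sanity check of the LIE, the following simulation sketch (with a made-up discrete distribution for x) compares E[E(y|x)], computed group by group, with E(y):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])   # made-up pmf for x
y = 2.0 * x + rng.normal(size=n)                       # so E(y|x) = 2x

# E[E(y|x)]: within-group means weighted by P(x = c_i)
inner = sum(y[x == c].mean() * (x == c).mean() for c in [0, 1, 2])
print(inner, y.mean())   # the two agree up to simulation noise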
3. E(y|x) = E[E(y|w)|x], where x = f(w), f being a non-stochastic vector-valued function of w → General Law of Iterated Expectations (GLIE).

There is another result, similar to the above but simpler to verify:

E(y|x) = E[E(y|x)|w]

This result follows from the conditioning itself. Since x is a function of w, knowing w implies knowing x; and since E(y|x) is a function of x, the expected value of E(y|x) given w is just E(y|x).

To summarize both of the above results: “the smaller information set always dominates”. Here x has less information than w, since knowing w implies knowing x, but not vice versa.

As a special case of the GLIE (take w = (x, z), with f(w) = x),

E(y|x) = E[E(y|x, z)|x]

Proof: The exercise is similar to the previous one. Consider E(y|x) = E[E(y|x, z)|x], where x and z are scalar random variables. Since we hold x invariant throughout at x = x_i, the joint distribution looks as follows:

Fixing x = x_i, write p_ikj = P(x = x_i, z = z_k, y = y_j), with marginals p_ik· = Σ_j p_ikj, p_i·j = Σ_k p_ikj, and p_i·· = P(x = x_i):

  z \ y |  y_1    ...   y_N   |
  ------+---------------------+-------
   z_1  |  p_i11  ...   p_i1N |  p_i1·
   ...  |  ...    ...   ...   |  ...
   z_M  |  p_iM1  ...   p_iMN |  p_iM·
  ------+---------------------+-------
        |  p_i·1  ...   p_i·N |  Σ_k p_ik· = Σ_j p_i·j = p_i··
E(y|z = z_1, x = x_i) = Σ_{j=1}^{N} y_j (p_i1j / p_i1·)
...
E(y|z = z_M, x = x_i) = Σ_{j=1}^{N} y_j (p_iMj / p_iM·)

Therefore,

E[E(y|x = x_i, z)|x = x_i]
  = [Σ_{j=1}^{N} y_j (p_i1j / p_i1·)] (p_i1· / p_i··) + ... + [Σ_{j=1}^{N} y_j (p_iMj / p_iM·)] (p_iM· / p_i··)
  = Σ_{j=1}^{N} y_j (p_i1j / p_i··) + ... + Σ_{j=1}^{N} y_j (p_iMj / p_i··)
  = y_1 (p_i·1 / p_i··) + ... + y_N (p_i·N / p_i··)
  = E(y|x = x_i)
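The same special case can be checked by simulation (a sketch; the discrete distributions of x and z below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.choice([0, 1], size=n)
z = rng.choice([0, 1], size=n)
y = x + 0.5 * z + rng.normal(size=n)   # so E(y|x, z) = x + 0.5z

xi = 1
mask = x == xi
# E[E(y|x, z)|x = xi]: cell means weighted by P(z = z_k | x = xi)
inner = sum(y[mask & (z == zk)].mean() * (z[mask] == zk).mean() for zk in [0, 1])
print(inner, y[mask].mean())           # both estimate E(y|x = 1)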

4. If f(x) ∈ ℝ^J is a function of x such that E(y|x) = g[f(x)] for some scalar function g(·), then E[y|f(x)] = E(y|x).

Proof: Let us consider scalar x. Thus,

E[y|f(x)] = E[E(y|x)|f(x)]   (by property 3)
          = E[g[f(x)]|f(x)]
          = g[f(x)]
          = E(y|x)
5. If (u, v) is independent of x, then E(u|x, v) = E(u|v).

Proof: Consider scalar u, v, x. Hold v constant at v = v_j. Thus,

Write p_kjl = P(x = x_k, v = v_j, u = u_l). Since x is independent of (u, v) (and, for simplicity, is taken uniform on {x_1, ..., x_M}), every row of the table is identical:

  x \ u |  u_1      ...   u_N     |
  ------+-------------------------+---------
   x_1  |  p_1j1    ...   p_1jN   |  p_1j·
   ...  |  ...      ...   ...     |  ...
   x_M  |  p_1j1    ...   p_1jN   |  p_1j·
  ------+-------------------------+---------
        |  M·p_1j1  ...   M·p_1jN |  M·p_1j·

Note that the conditional probabilities p_kjl / p_kj· do not depend on where the value of x is fixed, since x is independent of (u, v). Hence,
E(u|x = x_1, v = v_j) = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)
...
E(u|x = x_M, v = v_j) = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)
Therefore,

E[E(u|x, v = v_j)|v = v_j]
  = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)(1/M) + ... + Σ_{k=1}^{N} u_k (p_1jk / p_1j·)(1/M)   (M terms, since P(x = x_m|v = v_j) = 1/M)
  = u_1 (p_1j1 / p_1j·) + ... + u_N (p_1jN / p_1j·)
  = E(u|v = v_j)
  = E(u|x = x_i, v = v_j) for every i

⇒ E(u|v) = E(u|x, v).
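Property 5 is also easy to see in a simulation sketch (made-up distributions; u depends on v, but (u, v) is independent of x):

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
v = rng.choice([0, 1], size=n)
u = v + rng.normal(size=n)        # u depends on v ...
x = rng.choice([0, 1], size=n)    # ... but x is independent of (u, v)

vj = 1
for xi in [0, 1]:
    print(u[(x == xi) & (v == vj)].mean())   # E(u|x = xi, v = 1) ~ 1 for each xi
print(u[v == vj].mean())                     # equals E(u|v = 1) ~ 1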


6. If u ≡ y − E(y|x), then E[g(x)u] = 0 for any function g(x) ∈ ℝ^J, provided E[|g_j(x)u|] < ∞, j = 1, ..., J, and E[|u|] < ∞. In particular, E(u) = 0 and cov(x_j, u) = 0, j = 1, ..., K.

Proof:

E(u|x) = E[(y − E(y|x))|x]
       = E(y|x) − E[E(y|x)|x]
       = E(y|x) − E(y|x) = 0

Now, by property 2,

E[g(x)u] = E[E(g(x)u|x)]
         = E[g(x)E(u|x)]   (by property 1)
         = 0               (as E(u|x) = 0)

For the special cases, if J = 1 and g(x) = 1, then E(u) = 0.
Also, if g(x) = x,

E[g(x)u] = E(xu) = 0
⇒ E(xu) − E(x)E(u) = 0   (as E(u) = 0)
⇒ cov(x, u) = 0
⇒ cov(x_j, u) = 0, ∀ j = 1, ..., K.
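A simulation sketch of property 6 (the conditional mean 1 + 2x^2 is a made-up choice): with u ≡ y − E(y|x), both E(u) and the covariance of u with any function of x vanish:

import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x**2 + rng.normal(size=n)   # so E(y|x) = 1 + 2x^2

u = y - (1.0 + 2.0 * x**2)                  # the CE error
print(u.mean())                             # ~0: E(u) = 0
print(np.cov(x, u)[0, 1])                   # ~0: cov(x, u) = 0
print(np.cov(x**3, u)[0, 1])                # ~0 for g(x) = x^3 as well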

7. If c : ℝ → ℝ is a convex function defined on ℝ and E[|y|] < ∞, then c[E(y|x)] ≤ E[c(y)|x] → Conditional Jensen's Inequality.

We do not prove this property.


As examples, consider

[E(y)]^2 ≤ E[y^2]

Also, if y > 0, then

−log[E(y)] ≤ E[−log(y)], or, log[E(y)] ≥ E[log(y)]
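Both examples can be sanity-checked numerically (a sketch; the lognormal distribution for y is an arbitrary positive choice):

import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)   # y > 0

print(y.mean() ** 2, (y ** 2).mean())      # [E(y)]^2 <= E[y^2]
print(np.log(y.mean()), np.log(y).mean())  # log E(y) >= E[log(y)]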

8. If E(y^2) < ∞ and µ(x) ≡ E(y|x), then µ is a solution to

min_{m ∈ M} E[(y − m(x))^2],

where M is the set of all functions m : ℝ^K → ℝ such that E[m(x)^2] < ∞.

Proof: By property 7, [E(y|x)]^2 ≤ E(y^2|x), so E[{E(y|x)}^2] ≤ E[E(y^2|x)] = E(y^2). Since E(y^2) < ∞, it follows that E[µ(x)^2] < ∞, so that µ ∈ M.

Next, for any m ∈ M,

E[(y − m(x))^2] = E[(y − µ(x) + µ(x) − m(x))^2]
               = E[(y − µ(x))^2] + E[(µ(x) − m(x))^2]
               + 2E[(µ(x) − m(x))u],

where u ≡ y − µ(x).

The cross term is zero by property 6 (take g(x) = µ(x) − m(x)), so

E[(y − m(x))^2] = E(u^2) + E[(µ(x) − m(x))^2].

The RHS of the above expression is clearly minimized when m ≡ µ.
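The least-squares characterization shows up clearly in simulation (a sketch; the conditional mean sin(x) and the rival predictors are arbitrary choices): µ attains the smallest mean squared error among the candidates.

import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.normal(size=n)
y = np.sin(x) + rng.normal(size=n)    # so mu(x) = E(y|x) = sin(x)

mu = np.sin(x)
for m in (mu, 0.9 * x, np.zeros(n), x - x**3 / 6):
    print(((y - m) ** 2).mean())      # mu gives the smallest MSE (~1.0)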


Properties of Conditional Variance

Definition: var(y|x) ≡ σ^2(x) ≡ E[{y − E(y|x)}^2|x] = E(y^2|x) − [E(y|x)]^2.

Properties:

1.

var[a(x)y + b(x)|x] = [a(x)]^2 var(y|x),

where a(x) and b(x) are scalar functions of x.

2.

var(y) = E[var(y|x)] + var[E(y|x)]


Proof:

var(y) = E[{y − E(y)}^2]
       = E[{y − E(y|x) + E(y|x) − E(y)}^2]
       = E[{y − E(y|x)}^2] + E[{E(y|x) − E(y)}^2]
       + 2E[{y − E(y|x)}{E(y|x) − E(y)}]

Now, y − E(y|x) ≡ u. Also, E(y) is a population constant and E(y|x) is a function of x, so that E(y|x) − E(y) is a function of x, say g(x).

By property 6 of conditional expectations, E[ug(x)] = 0. Therefore,

var(y) = E[{y − E(y|x)}^2] + E[{E(y|x) − E(y)}^2]
       = E[E[{y − E(y|x)}^2|x]] + E[{E(y|x) − E[E(y|x)]}^2]
       = E[var(y|x)] + var[E(y|x)].
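The decomposition can be verified by simulation (a sketch with a made-up distribution in which both the conditional mean and the conditional variance depend on x):

import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.choice([0, 1, 2], size=n)
y = 3.0 * x + rng.normal(scale=1.0 + x, size=n)   # mean and spread vary with x

# E[var(y|x)] and var[E(y|x)], estimated group by group
ev = sum(y[x == c].var() * (x == c).mean() for c in [0, 1, 2])
ve = sum((y[x == c].mean() - y.mean()) ** 2 * (x == c).mean() for c in [0, 1, 2])
print(ev + ve, y.var())   # the two agree up to simulation noise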
3.

var(y|x) = E[var(y|x, z)|x] + var[E(y|x, z)|x]

The proof parallels that of property 2, carried out conditionally on x, with CE property 3 playing the role of the LIE.

4.

E[var(y|x)] ≥ E[var(y|x, z)]

This follows by applying E[·] to both sides of property 3 and using the LIE: E[var(y|x)] = E[var(y|x, z)] + E[var(E(y|x, z)|x)], where the last term is nonnegative since var(·) ≥ 0 implies E[var(·)] ≥ 0.
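A final simulation sketch of property 4 (made-up distribution): conditioning on the extra variable z weakly lowers the expected conditional variance.

import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
x = rng.choice([0, 1], size=n)
z = rng.choice([0, 1], size=n)
y = x + 2.0 * z + rng.normal(size=n)

ev_x = sum(y[x == a].var() * (x == a).mean() for a in [0, 1])
ev_xz = sum(y[(x == a) & (z == b)].var() * ((x == a) & (z == b)).mean()
            for a in [0, 1] for b in [0, 1])
print(ev_x, ev_xz)        # E[var(y|x)] >= E[var(y|x, z)]: ~2.0 vs ~1.0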
