
INTRODUCTION

The Objective: To determine whether a change in w causes a change in y.

E.g., if years of schooling goes up by one unit, does that lead to an increase in the average starting salary of a person?

To study a causal relationship it is important to hold the effect of all other relevant factors on y fixed, in other words, to invoke the notion of ceteris paribus.

Thus we want to estimate E(y|w, c), where c is a vector of control variables that we want to hold fixed while studying the effect of a change in w on E(y|·), i.e., the average value of y conditional on the values of w and c. If w is continuous, this effect may be captured by ∂E(y|w, c)/∂w, the partial effect of w on E(y|·).
E.g., the starting salary of a person may be affected by many factors, such as intelligence and skill, apart from schooling. So while studying whether an increase in years of schooling causes starting salary to increase, we need to hold the effect of these other factors constant. In practice this is not easy: skill may not be observed, or measurable at all, so there may be no data on skill with which to hold it constant in the above exercise.
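As a concrete illustration of the partial effect, here is a minimal Python sketch assuming a hypothetical linear conditional mean E(y|w, c) = β_0 + β_1 w + β_2 c; the functional form and all coefficients are made up for illustration only. A finite difference in w, with the control c held fixed (ceteris paribus), recovers β_1:

# Hypothetical conditional mean E(y|w, c); the functional form and
# coefficients are illustrative assumptions, not from the text.
def cond_mean(w, c):
    beta0, beta1, beta2 = 10.0, 1.5, 0.8   # made-up parameters
    return beta0 + beta1 * w + beta2 * c

# Partial effect of w on E(y|w, c): finite-difference approximation of
# dE(y|w, c)/dw with the control c held fixed.
h, w0, c0 = 1e-6, 12.0, 5.0
print((cond_mean(w0 + h, c0) - cond_mean(w0, c0)) / h)   # ~1.5 = beta1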

Data Structure: We assume that the population is well specified and that we know the population model (this may be arrived at through insights provided by economic theory or informal reasoning).

We also assume that a random sample can be drawn from the population and that the sample observations are independently and identically distributed (iid).

We drop the fixed-regressors assumption common in undergraduate textbooks because it does not allow us to study cases where one or more explanatory variables may be correlated with the error term; such variables are called endogenous explanatory variables in the parlance of econometrics.

Sometimes, complications in estimation arise because either we are unable to collect a random sample (motivating example: self-selection) or we cannot obtain data on some of the explanatory variables (unobserved explanatory variables), so that we are not able to control for them (i.e., hold their values fixed while studying the effect of w on y).

In such cases we may not be able to estimate the structural model, i.e., E(y|w, c), directly. But with the help of auxiliary assumptions and algebraic manipulations we may arrive at an estimable model. These assumptions are called identifying assumptions: they help us recover the parameters of the original structural model from the model we estimate.

Conditional expectations, i.e., E(·|·), are of immense importance not only because they provide us with a setting for specifying economic (structural) models, but also because they are a useful tool for manipulating structural equations into estimable equations. For these reasons we study the properties of conditional expectations closely.
Properties of Conditional Expectation

1. Let a_1(x), ..., a_G(x) and b(x) be scalar functions of x, and let y_1, ..., y_G be random scalars. Then,

E(Σ_{j=1}^{G} a_j(x) y_j + b(x) | x) = Σ_{j=1}^{G} a_j(x) E(y_j|x) + b(x),

provided E[|y_j|] < ∞, E[|a_j(x) y_j|] < ∞, and E[|b(x)|] < ∞.

Proof: Since we are conditioning on x, hold x invariant (constant) throughout the exercise. Then a_j(x) and b(x) are also constants for all j, and the property follows from the linearity of ordinary expectation.

It means E(·|x) is a linear operator.
2. E(y) = E[E(y|x)] → Law of Iterated Expectations (LIE).

Proof: Let x be a discrete random vector taking values c_1, c_2, ..., c_M with (marginal) probabilities p_1, p_2, ..., p_M. By definition,

E[E(y|x)] = E(y|x = c_1)p_1 + E(y|x = c_2)p_2 + ... + E(y|x = c_M)p_M,

and we verify that this equals E(y).

In other words, if x is also a scalar, say x, consider the joint distribution of (x, y), where y takes values y_1, ..., y_N; here p_ij = P(x = c_i, y = y_j), with row marginals p_i and column marginals q_j:

  x \ y |  y_1   ...   y_N  |
  ------+-------------------+------
   c_1  |  p_11  ...   p_1N |  p_1
   ...  |  ...   ...   ...  |  ...
   c_M  |  p_M1  ...   p_MN |  p_M
  ------+-------------------+------
        |  q_1   ...   q_N  |  Σ_i p_i = Σ_j q_j = 1

E(y|x = c_1) = Σ_{j=1}^{N} y_j (p_1j / p_1)
...
E(y|x = c_M) = Σ_{j=1}^{N} y_j (p_Mj / p_M)

Therefore,

E[E(y|x)] = [Σ_{j=1}^{N} y_j (p_1j / p_1)] p_1 + ... + [Σ_{j=1}^{N} y_j (p_Mj / p_M)] p_M
          = y_1 p_11 + y_2 p_12 + ... + y_N p_1N
          + y_1 p_21 + y_2 p_22 + ... + y_N p_2N
          + ...
          + y_1 p_M1 + y_2 p_M2 + ... + y_N p_MN
          = y_1 (p_11 + p_21 + ... + p_M1) + y_2 (p_12 + p_22 + ... + p_M2)
          + ... + y_N (p_1N + p_2N + ... + p_MN)
          = y_1 q_1 + y_2 q_2 + ... + y_N q_N
          = E(y)
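As a quick numerical sanity check of the LIE, the following simulation sketch (with a made-up discrete distribution for x) compares E[E(y|x)], computed group by group, with E(y):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2])   # made-up pmf for x
y = 2.0 * x + rng.normal(size=n)                       # so E(y|x) = 2x

# E[E(y|x)]: within-group means weighted by P(x = c_i)
inner = sum(y[x == c].mean() * (x == c).mean() for c in [0, 1, 2])
print(inner, y.mean())   # the two agree up to simulation noise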
3. E(y|x) = E[E(y|w)|x], where x = f(w), f being a non-stochastic vector-valued function of w → General Law of Iterated Expectations (GLIE).

There is another result, similar to the above but simpler to verify:

E(y|x) = E[E(y|x)|w]

This result follows from the conditioning itself. Since x is a function of w, knowing w implies knowing x; and since E(y|x) is a function of x, the expected value of E(y|x) given w is just E(y|x).

To summarize both of the above results: “the smaller information set always dominates”. Here x has less information than w, since knowing w implies knowing x, but not vice versa.

As a special case of the GLIE (take w = (x, z), with f(w) = x),

E(y|x) = E[E(y|x, z)|x]

Proof: The exercise is similar to the previous one. Consider E(y|x) = E[E(y|x, z)|x], where x and z are scalar random variables. Since we hold x invariant throughout at x = x_i, the joint distribution looks as follows:

Fixing x = x_i, write p_ikj = P(x = x_i, z = z_k, y = y_j), with marginals p_ik· = Σ_j p_ikj, p_i·j = Σ_k p_ikj, and p_i·· = P(x = x_i):

  z \ y |  y_1    ...   y_N   |
  ------+---------------------+-------
   z_1  |  p_i11  ...   p_i1N |  p_i1·
   ...  |  ...    ...   ...   |  ...
   z_M  |  p_iM1  ...   p_iMN |  p_iM·
  ------+---------------------+-------
        |  p_i·1  ...   p_i·N |  Σ_k p_ik· = Σ_j p_i·j = p_i··
E(y|z = z_1, x = x_i) = Σ_{j=1}^{N} y_j (p_i1j / p_i1·)
...
E(y|z = z_M, x = x_i) = Σ_{j=1}^{N} y_j (p_iMj / p_iM·)

Therefore,

E[E(y|x = x_i, z)|x = x_i]
  = [Σ_{j=1}^{N} y_j (p_i1j / p_i1·)] (p_i1· / p_i··) + ... + [Σ_{j=1}^{N} y_j (p_iMj / p_iM·)] (p_iM· / p_i··)
  = Σ_{j=1}^{N} y_j (p_i1j / p_i··) + ... + Σ_{j=1}^{N} y_j (p_iMj / p_i··)
  = y_1 (p_i·1 / p_i··) + ... + y_N (p_i·N / p_i··)
  = E(y|x = x_i)
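The same special case can be checked by simulation (a sketch; the discrete distributions of x and z below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.choice([0, 1], size=n)
z = rng.choice([0, 1], size=n)
y = x + 0.5 * z + rng.normal(size=n)   # so E(y|x, z) = x + 0.5z

xi = 1
mask = x == xi
# E[E(y|x, z)|x = xi]: cell means weighted by P(z = z_k | x = xi)
inner = sum(y[mask & (z == zk)].mean() * (z[mask] == zk).mean() for zk in [0, 1])
print(inner, y[mask].mean())           # both estimate E(y|x = 1)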

4. If f(x) ∈ ℝ^J is a function of x such that E(y|x) = g[f(x)] for some scalar function g(·), then E[y|f(x)] = E(y|x).

Proof: Let us consider scalar x. Thus,

E[y|f(x)] = E[E(y|x)|f(x)]   (by property 3)
          = E[g[f(x)]|f(x)]
          = g[f(x)]
          = E(y|x)
5. If (u, v) is independent of x, then E(u|x, v) = E(u|v).

Proof: Consider scalar u, v, x. Hold v constant at v = v_j. Thus,

Write p_kjl = P(x = x_k, v = v_j, u = u_l). Since x is independent of (u, v) (and, for simplicity, is taken uniform on {x_1, ..., x_M}), every row of the table is identical:

  x \ u |  u_1      ...   u_N     |
  ------+-------------------------+---------
   x_1  |  p_1j1    ...   p_1jN   |  p_1j·
   ...  |  ...      ...   ...     |  ...
   x_M  |  p_1j1    ...   p_1jN   |  p_1j·
  ------+-------------------------+---------
        |  M·p_1j1  ...   M·p_1jN |  M·p_1j·

Note that the conditional probabilities p_kjl / p_kj· do not depend on where the value of x is fixed, since x is independent of (u, v). Hence,
E(u|x = x_1, v = v_j) = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)
...
E(u|x = x_M, v = v_j) = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)
Therefore,

E[E(u|x, v = v_j)|v = v_j]
  = Σ_{k=1}^{N} u_k (p_1jk / p_1j·)(1/M) + ... + Σ_{k=1}^{N} u_k (p_1jk / p_1j·)(1/M)   (M terms, since P(x = x_m|v = v_j) = 1/M)
  = u_1 (p_1j1 / p_1j·) + ... + u_N (p_1jN / p_1j·)
  = E(u|v = v_j)
  = E(u|x = x_i, v = v_j) for every i

⇒ E(u|v) = E(u|x, v).
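Property 5 is also easy to see in a simulation sketch (made-up distributions; u depends on v, but (u, v) is independent of x):

import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
v = rng.choice([0, 1], size=n)
u = v + rng.normal(size=n)        # u depends on v ...
x = rng.choice([0, 1], size=n)    # ... but x is independent of (u, v)

vj = 1
for xi in [0, 1]:
    print(u[(x == xi) & (v == vj)].mean())   # E(u|x = xi, v = 1) ~ 1 for each xi
print(u[v == vj].mean())                     # equals E(u|v = 1) ~ 1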


6. If u ≡ y − E(y|x), then E[g(x)u] = 0 for any function g(x) ∈ ℝ^J, provided E[|g_j(x)u|] < ∞, j = 1, ..., J, and E[|u|] < ∞. In particular, E(u) = 0 and cov(x_j, u) = 0, j = 1, ..., K.

Proof:

E(u|x) = E[(y − E(y|x))|x]
       = E(y|x) − E[E(y|x)|x]
       = E(y|x) − E(y|x) = 0

Now, by property 2,

E[g(x)u] = E[E(g(x)u|x)]
         = E[g(x)E(u|x)]   (by property 1)
         = 0               (as E(u|x) = 0)

For the special cases, if J = 1 and g(x) = 1, then E(u) = 0.
Also, if g(x) = x,

E[g(x)u] = E(xu) = 0
⇒ E(xu) − E(x)E(u) = 0   (as E(u) = 0)
⇒ cov(x, u) = 0
⇒ cov(x_j, u) = 0, ∀ j = 1, ..., K.
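A simulation sketch of property 6 (the conditional mean 1 + 2x^2 is a made-up choice): with u ≡ y − E(y|x), both E(u) and the covariance of u with any function of x vanish:

import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x**2 + rng.normal(size=n)   # so E(y|x) = 1 + 2x^2

u = y - (1.0 + 2.0 * x**2)                  # the CE error
print(u.mean())                             # ~0: E(u) = 0
print(np.cov(x, u)[0, 1])                   # ~0: cov(x, u) = 0
print(np.cov(x**3, u)[0, 1])                # ~0 for g(x) = x^3 as well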

7. If c : ℝ → ℝ is a convex function defined on ℝ and E[|y|] < ∞, then c[E(y|x)] ≤ E[c(y)|x] → Conditional Jensen's Inequality.

We do not prove this property.


As examples, consider

[E(y)]^2 ≤ E[y^2]

Also, if y > 0, then

−log[E(y)] ≤ E[−log(y)], or, log[E(y)] ≥ E[log(y)]
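Both examples can be sanity-checked numerically (a sketch; the lognormal distribution for y is an arbitrary positive choice):

import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)   # y > 0

print(y.mean() ** 2, (y ** 2).mean())      # [E(y)]^2 <= E[y^2]
print(np.log(y.mean()), np.log(y).mean())  # log E(y) >= E[log(y)]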

8. If E(y^2) < ∞ and µ(x) ≡ E(y|x), then µ is a solution to

min_{m ∈ M} E[(y − m(x))^2],

where M is the set of all functions m : ℝ^K → ℝ such that E[m(x)^2] < ∞.

Proof: By property 7, [E(y|x)]^2 ≤ E(y^2|x), so E[{E(y|x)}^2] ≤ E[E(y^2|x)] = E(y^2). Since E(y^2) < ∞, it follows that E[µ(x)^2] < ∞, so that µ ∈ M.

Next, for any m ∈ M,

E[(y − m(x))^2] = E[(y − µ(x) + µ(x) − m(x))^2]
               = E[(y − µ(x))^2] + E[(µ(x) − m(x))^2]
               + 2E[(µ(x) − m(x))u],

where u ≡ y − µ(x).

The cross term is zero by property 6 (take g(x) = µ(x) − m(x)), so

E[(y − m(x))^2] = E(u^2) + E[(µ(x) − m(x))^2].

The RHS of the above expression is clearly minimized when m ≡ µ.
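The least-squares characterization shows up clearly in simulation (a sketch; the conditional mean sin(x) and the rival predictors are arbitrary choices): µ attains the smallest mean squared error among the candidates.

import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
x = rng.normal(size=n)
y = np.sin(x) + rng.normal(size=n)    # so mu(x) = E(y|x) = sin(x)

mu = np.sin(x)
for m in (mu, 0.9 * x, np.zeros(n), x - x**3 / 6):
    print(((y - m) ** 2).mean())      # mu gives the smallest MSE (~1.0)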


Properties of Conditional Variance

Definition: var(y|x) ≡ σ^2(x) ≡ E[{y − E(y|x)}^2|x] = E(y^2|x) − [E(y|x)]^2.

Properties:

1.

var[a(x)y + b(x)|x] = [a(x)]^2 var(y|x),

where a(x) and b(x) are scalar functions of x.

2.

var(y) = E[var(y|x)] + var[E(y|x)]


Proof:

var(y) = E[{y − E(y)}^2]
       = E[{y − E(y|x) + E(y|x) − E(y)}^2]
       = E[{y − E(y|x)}^2] + E[{E(y|x) − E(y)}^2]
       + 2E[{y − E(y|x)}{E(y|x) − E(y)}]

Now, y − E(y|x) ≡ u. Also, E(y) is a population constant and E(y|x) is a function of x, so that E(y|x) − E(y) is a function of x, say g(x).

By property 6 of conditional expectations, E[ug(x)] = 0. Therefore,

var(y) = E[{y − E(y|x)}^2] + E[{E(y|x) − E(y)}^2]
       = E[E[{y − E(y|x)}^2|x]] + E[{E(y|x) − E[E(y|x)]}^2]
       = E[var(y|x)] + var[E(y|x)].
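The decomposition can be verified by simulation (a sketch with a made-up distribution in which both the conditional mean and the conditional variance depend on x):

import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x = rng.choice([0, 1, 2], size=n)
y = 3.0 * x + rng.normal(scale=1.0 + x, size=n)   # mean and spread vary with x

# E[var(y|x)] and var[E(y|x)], estimated group by group
ev = sum(y[x == c].var() * (x == c).mean() for c in [0, 1, 2])
ve = sum((y[x == c].mean() - y.mean()) ** 2 * (x == c).mean() for c in [0, 1, 2])
print(ev + ve, y.var())   # the two agree up to simulation noise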
3.

var(y|x) = E[var(y|x, z)|x] + var[E(y|x, z)|x]

The proof parallels that of property 2, carried out conditionally on x, with CE property 3 playing the role of the LIE.

4.

E[var(y|x)] ≥ E[var(y|x, z)]

This follows by applying E[·] to both sides of property 3 and using the LIE: E[var(y|x)] = E[var(y|x, z)] + E[var(E(y|x, z)|x)], where the last term is nonnegative since var(·) ≥ 0 implies E[var(·)] ≥ 0.
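A final simulation sketch of property 4 (made-up distribution): conditioning on the extra variable z weakly lowers the expected conditional variance.

import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
x = rng.choice([0, 1], size=n)
z = rng.choice([0, 1], size=n)
y = x + 2.0 * z + rng.normal(size=n)

ev_x = sum(y[x == a].var() * (x == a).mean() for a in [0, 1])
ev_xz = sum(y[(x == a) & (z == b)].var() * ((x == a) & (z == b)).mean()
            for a in [0, 1] for b in [0, 1])
print(ev_x, ev_xz)        # E[var(y|x)] >= E[var(y|x, z)]: ~2.0 vs ~1.0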
