
Studies in Economic Statistics Jae-Young Kim

1 Introduction to Probability
1.1 Introduction

Definition 1.1 (Probability Space). A probability space is a triple (Ω, F , P)


where,

1. Ω (Sample Space): the set of all possible outcomes of a random experiment.

2. F (σ-field or σ-algebra): a collection of subsets of Ω.

3. P (Probability Measure): a real-valued function defined on F .

Example 1.1 (Tossing a Coin).

• Ω = { H, T }

• F = {∅, { H }, { T }, { H, T }}

• P(∅) = 0

• P({ H }) = P({ T }) = 1/2

• P({ H, T }) = 1
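A minimal Python sketch of this example, assuming a fair coin and an arbitrary simulation size: the relative frequencies approximate the probability measure above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
tosses = rng.choice(["H", "T"], size=n)      # fair coin: P({H}) = P({T}) = 1/2

print("empirical P({H}):", np.mean(tosses == "H"))                  # close to 0.5
print("empirical P({T}):", np.mean(tosses == "T"))                  # close to 0.5
print("empirical P({H,T}):", np.mean(np.isin(tosses, ["H", "T"])))  # exactly 1.0
```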

Definition 1.2 (σ-field (σ-algebra)). A class F of subsets of Ω is called a σ-field or σ-algebra if it satisfies:

1. Ω ∈ F

2. For A ∈ F , Ac ∈ F

3. For Ai ∈ F , i = 1, 2, · · ·, ∪i Ai ∈ F

Remarks

• A σ-field is always a field, but not vice versa.

• An element A ∈ F is called an event.

• An element ω ∈ Ω is called an outcome.


Definition 1.3 (The smallest σ-field generated by A, σ(A)). Let A be a class of subsets of Ω. Consider the class that is the intersection of all the σ-fields containing A; it is called the σ-field generated by A and is denoted by σ(A). σ(A) satisfies:

1. A ⊂ σ(A).

2. σ(A) is a σ-field.

3. If A ⊂ G, and G is a σ-field, then σ(A) ⊂ G.

Example 1.2 (σ(A)).

• Ω = {1,2,3,4,5,6}

• A = {1,3,5}

• A = { A } (the class of subsets of Ω consisting of the single event A)

⇒ σ(A) = {A, Ac , ∅, Ω}
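For a finite Ω the generated σ-field can be computed by brute force. The sketch below (the function name and approach are illustrative, not from the notes) starts from the class { A } and closes it under complements and unions, reproducing σ(A) = {∅, A, Ac , Ω}.

```python
from itertools import combinations

def sigma_field(omega, A):
    """Close the class {A} under complement and union (sufficient on a finite Omega)."""
    omega, A = frozenset(omega), frozenset(A)
    sets = {frozenset(), omega, A, omega - A}
    changed = True
    while changed:
        changed = False
        for S, T in combinations(list(sets), 2):
            for new in (S | T, omega - (S | T)):
                if new not in sets:
                    sets.add(new)
                    changed = True
    return sets

for s in sorted(sigma_field({1, 2, 3, 4, 5, 6}, {1, 3, 5}), key=len):
    print(set(s) if s else "empty set")
# empty set, {1, 3, 5}, {2, 4, 6}, {1, 2, 3, 4, 5, 6}
```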

Definition 1.4 (Probability Measure). A real-valued set function defined on a


σ-field is a probability measure if it satisfies

1. P( A) ≥ 0, ∀ A ∈ F

2. P(Ω) = 1

3. For Ai ∩ Aj = ∅, i ≠ j, P(∪i Ai ) = ∑i P( Ai )

Remarks

• The three properties given above are often referred to as the axioms of
probability.

• A probability (measure) takes values in [0, 1], while a general measure takes values in [0, ∞].

Definition 1.5 (Lebesgue Measure). First we define µ on an open interval in the natural way. Note that any open set in R can be represented as a countable union of disjoint open intervals.

• Outer measure of A:

  µ^∗ ( A) = inf { ∑_k µ(Ck ) : A ⊂ ∪_k Ck , {Ck } an open covering of A }

• Inner measure of A:

  µ_∗ ( A) = 1 − µ^∗ ( Ac )

• Lebesgue measure: when µ^∗ ( A) = µ_∗ ( A), the common value µ( A) = µ^∗ ( A) = µ_∗ ( A) is the Lebesgue measure of A

Theorem 1.1 (Unique Extension). A probability measure on a field F0 has a unique extension to the σ-field generated by F0 .

1. Let P be a probability measure on F0 and let F = σ(F0 ). Then, there exists a probability measure Q on F such that Q( A) = P( A) for A ∈ F0 .

2. Let Q′ be another probability measure on F such that Q′ ( A) = P( A) for A ∈ F0 . Then Q′ ( A) = Q( A) for A ∈ F .

3. For Ai ∈ F with Ai ∩ Aj = ∅ (i ≠ j), ∪_{i=1}^{∞} Ai ∈ F and Q is countably additive.

Theorem 1.2 (Properties of Probability Measure).

1. For A ⊂ B, P(A) ≤ P(B).


Proof
Hint: P(B - A) = P(B) - P(A)

2. P(A ∪ B) = P(A) + P(B) - P(A ∩ B).


Proof
Hint: A ∪ B = A ∪ (B ∩ Ac )

3. P(A ∪ B) ≤ P(A)+ P(B)

• Extension (inclusion-exclusion):

  P( A1 ∪ A2 ∪ · · · ∪ An ) = ∑_{k=1}^{n} P( Ak ) − ∑_{i<j} P( Ai ∩ Aj ) + · · · + (−1)^{n+1} P( A1 ∩ A2 ∩ · · · ∩ An )

• Boole's inequality:

  P(∪_{i=1}^{∞} Ai ) ≤ ∑_{i=1}^{∞} P( Ai )
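Both formulas can be checked exactly for a fair die by computing probabilities as |E|/|Ω|; the events below are arbitrary choices used only for illustration.

```python
from itertools import combinations

omega = set(range(1, 7))                 # fair die: P(E) = |E| / 6
P = lambda E: len(E) / len(omega)
A = [{1, 2}, {2, 3}, {3, 4}]

union = set().union(*A)
incl_excl = sum((-1) ** (k + 1) * P(set.intersection(*sub))
                for k in range(1, len(A) + 1)
                for sub in combinations(A, k))

print(P(union), incl_excl)                   # both 0.666..., so the extension holds
print(P(union) <= sum(P(E) for E in A))      # Boole's inequality: True
```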


1.2 Some Limit Concepts of Probability

Definition 1.6 (Limit of Events for Monotone Sequences). Let { En } be a sequence of events. { En } is monotone when E1 ⊂ E2 ⊂ · · · or E1 ⊃ E2 ⊃ · · · .

1. Monotone increasing sequence of events:

   E1 ⊂ E2 ⊂ · · · ⇒ lim En = ∪_{n=1}^{∞} En

2. Monotone decreasing sequence of events:

   E1 ⊃ E2 ⊃ · · · ⇒ lim En = ∩_{n=1}^{∞} En

Theorem 1.3 (A monotone sequence of events { En }).

P(lim En ) = lim P( En )

Proof.

• E0 = ϕ, En : monotone increasing

• Fn = En − En−1 , P( Fi ) = P( Ei ) − P( Ei−1 )
• P(∪_{i=1}^{n} Fi ) = ∑_{i=1}^{n} P( Fi ) = P( En ) = P(∪_{i=1}^{n} Ei )

Definition 1.7 (Limit Supremum and Limit Infimum of Events). For a sequence of events En , define

lim sup_n En = ∩_{n=1}^{∞} ∪_{k=n}^{∞} Ek (∀n ≥ 1, ∃k ≥ n such that ω ∈ Ek ; En occurs infinitely often)

lim inf_n En = ∪_{n=1}^{∞} ∩_{k=n}^{∞} Ek (∃n ≥ 1 such that ∀k ≥ n, ω ∈ Ek ; En occurs eventually)

When lim sup En = lim inf En , the common value is denoted lim En .

Lemma 1.1 (Borel-Cantelli). Let { En } be a sequence of events.

If ∑_{i=1}^{∞} P( Ei ) < ∞, then P(lim sup En ) = 0


Proof.

P(lim sup En ) = P(∩_{n=1}^{∞} ∪_{k=n}^{∞} Ek ) ≤ P(∪_{k=n}^{∞} Ek ) ≤ ∑_{k=n}^{∞} P( Ek ) → 0 as n → ∞

Remarks
Note that if P( En ) → 0, P(lim inf En ) = 0
Lemma 1.2 (2nd Borel-Cantelli Lemma). Let { En } be an independent sequence of events.

If ∑_{i=1}^{∞} P( Ei ) = ∞, then P(lim sup En ) = 1
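A simulation can make the two lemmas concrete. Below, independent events En are generated with P( En ) = 1/n^2 (summable) and with P( En ) = 1/n (divergent); the particular probabilities, sample size, and seed are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
n = np.arange(1, N + 1)

summable = rng.random(N) < 1.0 / n**2    # sum of P(E_n) converges
divergent = rng.random(N) < 1.0 / n      # sum of P(E_n) diverges

# Borel-Cantelli: with summable probabilities, E_n should occur only finitely often;
# by the 2nd lemma, with divergent probabilities E_n keeps occurring.
print("occurrences with n > 1000 (summable case): ", summable[1000:].sum())   # almost always 0
print("occurrences with n > 1000 (divergent case):", divergent[1000:].sum())  # typically several
```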

1.3 Conditional Probability and Independence

Definition 1.8 (Conditional Probability). For events A, B with P( B) > 0, the conditional probability of A given B is defined as

P( A | B) = P( A ∩ B) / P( B)
Definition 1.9 (Independence: A ⊥ B). Let A, B ∈ F , B ̸= ϕ

• If A ⊥ B, then P( A ∩ B) = P( A) P( B).

• If A ⊥ B, then P( A | B) = P( A).
• P( A | B) = P( A ∩ B)/P( B) = P( A) P( B)/P( B) = P( A)
Remarks
If A or B is empty, then they are always independent.
Definition 1.10 (Pairwise Independence).
• Let Γ be a class of subsets of Ω.

• For any pair A, B ∈ Γ, if P( A ∩ B) = P( A) P( B), then events in Γ are


pairwise independent.
Definition 1.11 (Mutual Independence).
• Let Γ be a class of subsets of Ω.

• For any collection of events ( Ai1 , . . . , Aik ), k = 2, 3, . . . , in Γ, if P( Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = ∏_{j=1}^{k} P( Aij ), then the events in Γ are mutually independent or completely independent.


1.4 Bayes Theorem

Theorem 1.4 (Bayes Theorem). For A, B ∈ F , P( A) > 0, P( B) > 0,

• P( B | A) = P( A ∩ B)/P( A) = P( A | B) P( B) / [ P( A | B) P( B) + P( A | Bc ) P( Bc ) ]

• P( A | B) = P( A ∩ B)/P( B) = P( B | A) P( A) / [ P( B | A) P( A) + P( B | Ac ) P( Ac ) ]

Remarks A Partition {Ai } of Ω

• Ai , i = 1, 2, . . . , n

• {Ai } is a partition of Ω if it satisfies


(i) ∪_{i=1}^{n} Ai = Ω
(ii) Ai ∩ Aj = ∅, i ≠ j

• Let Ai , i = 1, 2, . . . , n, be a partition of Ω with P( Ai ) > 0.

• Then for every B ∈ F with P( B) > 0,

  P( Ai | B) = P( B | Ai ) P( Ai ) / ∑_{j=1}^{n} P( B | Aj ) P( Aj )
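A standard numerical use of this formula is updating over a two-element partition. The sketch below uses hypothetical prior and likelihood values (chosen only to illustrate the computation):

```python
# Hypothetical prior over a partition {A1, A2} and likelihoods P(B | Ai).
prior = [0.01, 0.99]          # P(A1), P(A2)
likelihood = [0.95, 0.05]     # P(B | A1), P(B | A2)

denom = sum(p * l for p, l in zip(prior, likelihood))           # P(B), by the partition formula
posterior = [p * l / denom for p, l in zip(prior, likelihood)]  # P(Ai | B)

print(posterior)   # [0.161..., 0.838...]: even a strong signal from a rare event A1 leaves P(A1 | B) small
```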

Remarks Bayesian Approach

• On a probability space (Ω, F , P)

• Events H ∈ F , P(· | H ) = PH

• Let { Hi } be a partition of Ω consisting of unobservable events.

• Let B ⊂ Ω be observable.

• P( Hi | B) = P( Hi ) P( B | Hi ) / ∑_{j=1}^{n} P( Hj ) P( B | Hj )

Remarks Classical VS Bayesian Approach

Y = Xβ + ε

• Classical (Frequentist) Approach

(a) X, Y are random variables.


(b) Parameters (β) are fixed.

• Bayesian Approach

(a) Unknowns (Unobservable) are regarded as random variables.


(b) β, ε are random variables.


2 Random Variables, Distribution Functions, and Expectation
2.1 Random Variables

Definition 2.1 (Random Variable).


• A finite function X : Ω → R is a random variable (r.v.) if for each B ∈ B , X^{-1}( B) = {ω : X (ω ) ∈ B} ∈ F , where B is the Borel σ-algebra on R.

Remarks

• A random variable is a real-valued measurable function.

• A random variable X : Ω → R defined on (Ω, F , P) is called an F /B -measurable function.
Definition 2.2 (Measurable Mapping).
• Measurable mapping: Generalization of measurable function

• Let (Ω, F ), (Ω′ , F ′ ) be two measurable spaces.

• A mapping T : Ω → Ω′ is said to be F /F ′ -measurable if for any B ∈ F ′ , T^{-1}( B) = {ω ∈ Ω : T (ω ) ∈ B} ∈ F .
Theorem 2.1.
• Let (Ω, F , P) be a probability space.

• Let X be a random variable defined on Ω.

• Then, the random variable X induces a new probability space ( R, B , PX )


where X : Ω → R.
Proof.
For B ∈ B , let PX ( B) = P[ X −1 ( B)] = P[ω : X (ω ) ∈ B].

It is sufficient to show that

1. PX ( R) = 1
2. PX ( B) ≥ 0 for any B ∈ B
3. For Bi , i = 1, 2, . . . , with Bi ∩ Bj = ∅

PX (∪i Bi ) = ∑i PX ( Bi )


2.2 Probability Distribution Function

Definition 2.3 (Distribution Function). Let X be a random variable. Given x,


a real valued function FX (·) defined as FX ( x ) = P[{ω : X (ω ) ≤ x }] is called the
distribution function (DF) of a random variable X.

Definition 2.4 (Cumulative distribution function (cdf)).

FX ( x ) = P[{ω : X (ω ) ≤ x }] = P( X ≤ x ) = PX {(−∞, x ]} = PX [{r : −∞ < r ≤ x }]

FX ( x2 ) − FX ( x1 ) = PX {( x1 , x2 ]}

Theorem 2.2 (Properties of Distribution Function).

1. limx→−∞ FX ( x ) = 0, limx→+∞ FX ( x ) = 1

2. For x1 ≤ x2 , FX ( x1 ) ≤ FX ( x2 ) (Monotone and Non-decreasing)

3. lim_{0<h→0} FX ( x + h) = FX ( x ) (Right Continuity)

Remarks
A distribution function is not necessarily left continuous.

Definition 2.5 (Discrete Random Variable). A random variable X is said to be


discrete if the range of X is countable or if there exists E, a countable set, such
that P( X ∈ E) = 1.

Definition 2.6 (Continuous Random Variable). A random variable X is said to be continuous if there exists a function f X (·) such that FX ( x ) = ∫_{−∞}^{x} f X (t) dt for every real number x.

Remarks Another Characterization of Continuous Random Variable

• Let FX (·) be a distribution function (DF) of a random variable X.

(a) A distribution function FX (·) is absolutely continuous if and only if there exists a non-negative function f such that

    FX ( x ) = ∫_{−∞}^{x} f (t) dt, ∀ x ∈ R

(b) That is, a random variable X is a continuous random variable if and only if FX (·) is absolutely continuous.


Definition 2.7 (Continuity).

• A function f : X → Y is continuous at a point x0 ∈ X if, for any given ϵ > 0, ∃δ > 0 such that

  ρ( x0 , x ) < δ ⇒ ρ′ [ f ( x0 ), f ( x )] < ϵ

where ρ and ρ′ are metrics on X and Y.

• A function f is said to be continuous if it is continuous at each x ∈ X.

Definition 2.8 (Uniform Continuity).

• Let f : X → Y be a mapping from a metric space < X, ρ > to < Y, ρ′ >.

• We say that f is uniformly continuous if for any given ϵ > 0, ∃δ > 0 such
that, for any x1 , x2 ∈ X,

ρ( x1 , x2 ) < δ ⇒ ρ′ ( f ( x1 ), f ( x2 )) < ϵ.

Remarks

Uniformly continuous ⇒ Continuous

When f is defined on a compact set (a closed and bounded set in R^n ), Continuous ⇒ Uniformly Continuous.

Definition 2.9 (Absolute Continuity of a Function on Real Line).

• A real-valued function f defined on [ a, b] is said to be absolutely continuous on [ a, b] if, for any given ϵ > 0, ∃δ > 0 such that

  ∑_{i=1}^{k} (bi − ai ) < δ ⇒ ∑_{i=1}^{k} | f (bi ) − f ( ai )| < ϵ

for ( ai , bi ) pairwise disjoint, i = 1, · · · , k, k being arbitrary.

Remarks

• Absolutely continuous ⇒ Uniformly continuous

• Uniformly continuous ⇏ Absolutely continuous


Definition 2.10 (Absolute Continuity of a Measure: P ≪ Q).

• Let P, Q be two σ-finite measures on F .

  - For any given ϵ > 0, ∃δ > 0 s.t. Q( A) < δ ⇒ P( A) < ϵ.

  - Q( A) = 0 ⇒ P( A) = 0, ∀ A ∈ F .

⇒ P is absolutely continuous with respect to Q, denoted P ≪ Q.

Example 2.1.

• P( A) = ∫_A f dQ, A ∈ F

• FX ( x ) = ∫_{−∞}^{x} f (t) dt

Theorem 2.3 (Radon-Nikodym Theorem). Let P, Q be two σ-finite measures on F . If P ≪ Q, then there exists f ≥ 0 such that P( A) = ∫_A f dQ for any A ∈ F . We write f = dP/dQ and call it the Radon-Nikodym derivative.

Definition 2.11 (Probability Mass Function). If X is a discrete random variable with distinct values x1 , x2 , . . . , xk , then the function, denoted by f X ( xi ) = P[ X = xi ] = P[ω : X (ω ) = xi ], such that

• f X ( xi ) > 0 for x = xi , i = 1, . . . , k

• f X ( x ) = 0 for x ̸= xi

• ∑ f X ( xi ) = 1

is said to be the probability mass function (pmf) of X.

Remarks

• Some other names for the p.m.f. are discrete density function, discrete frequency function, and probability function.

• Note that f X ( xi ) = FX ( xi ) − FX ( xi−1 )

Definition 2.12 (Probability Density Function). If X is a continuous random variable, then the function f X (·) such that FX ( x ) = ∫_{−∞}^{x} f X (t) dt is called the probability density function of X.

• f X ( x ) ≥ 0, ∀ x

• ∫_{−∞}^{∞} f X ( x ) dx = 1


Remarks

• Some other names for the p.d.f. are density function, continuous density function, and integrating density function.

• P[ X = xi ] = 0

• f X ( x ) = dFX ( x )/dx

• P( a < X ≤ b) = FX (b) − FX ( a) = ∫_a^b f X ( x ) dx
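These relations can be checked numerically for a concrete density. The sketch below uses the Exponential(1) density f ( x ) = e^{−x} on [0, ∞), for which FX ( x ) = 1 − e^{−x}; the interval ( a, b] and the grid size are arbitrary choices.

```python
import numpy as np

f = lambda x: np.exp(-x)            # Exponential(1) pdf on [0, infinity)
F = lambda x: 1.0 - np.exp(-x)      # its cdf

a, b = 0.5, 2.0
dx = (b - a) / 100_000
mid = np.arange(a + dx / 2, b, dx)          # midpoint rule for the integral of f over (a, b]
numeric = np.sum(f(mid)) * dx

print(numeric, F(b) - F(a))                 # both approximately 0.4712
```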

Remarks Decomposition of a Distribution Function

• Any cdf FX ( x ) may be represented in the form of a mixed distribution:

  FX ( x ) = p1 FX^D ( x ) + p2 FX^C ( x ), where pi ≥ 0, i = 1, 2, p1 + p2 = 1, D: discrete, C: continuous.

Theorem 2.4 (Function of a Random Variable). Let X be a random variable and g be a Borel measurable function. Then, Y = g( X ) is also a random variable.

Proof. It suffices to show that {Y ≤ y} ∈ F to see that Y = g( X ) is a random variable. That is, {Y ≤ y} = { g( X ) ≤ y} = {ω : X (ω ) ∈ g^{-1}((−∞, y])} ∈ F .

2.3 Expectation and Moments

Definition 2.13 (Expected Value). Let X be a random variable. Then, we define E( X ) as the expected value, (mathematical) expectation, or mean of X.

1. Continuous random variable ⇒ E( X ) = ∫ x f ( x ) dx

2. Discrete random variable ⇒ E( X ) = ∑_i xi f i

Definition 2.14 (Expectation of a Function of a Random Variable). Let Y = g( X ) be a random variable. Suppose that ∫ | g( x )| f ( x ) dx < ∞. Then, we define E[Y ] = E[ g( X )] = ∫ g( x ) f ( x ) dx = ∫ y f (y) dy.

Theorem 2.5 (Preservation of Monotonicity). Let E[ gi ( X )] be an expectation for a real-valued function gi of X. Suppose that E(| gi ( X )|) = ∫ | gi ( x )| f ( x ) dx < ∞. If g1 ( x ) ≤ g2 ( x ) for all x, then E[ g1 ( X )] ≤ E[ g2 ( X )].

Proof.

Suppose that g1 ( x ) ≤ g2 ( x ) for all x.


Then, E[ g1 ( X )] − E[ g2 ( X )] = ∫ g1 ( x ) f ( x ) dx − ∫ g2 ( x ) f ( x ) dx = ∫ [ g1 ( x ) − g2 ( x )] f ( x ) dx ≤ 0.

Remarks

• Suppose that g1 ( x ) ≤ g2 ( x ) for almost every x, with E| g1 ( X )| < ∞ and E| g2 ( X )| < ∞. Then, P[ω : g1 ( X (ω )) ≤ g2 ( X (ω ))] = 1.

• That is, A = {ω : g1 ( X (ω )) ≤ g2 ( X (ω ))} with P( A) = 1 and Ac = {ω : g1 ( X (ω )) > g2 ( X (ω ))} with P( Ac ) = 0.

• Finally, E[ g1 ( X ) − g2 ( X )] = ∫_A [ g1 ( x ) − g2 ( x )] f ( x ) dx + ∫_{Ac} [ g1 ( x ) − g2 ( x )] f ( x ) dx ≤ 0.

Theorem 2.6 (Properties of Expectation).

1. When c is constant, E(c) = c

2. E(cX ) = cE( X ) (cf. E( XY | X ) = XE(Y | X ))

3. Linear Operator: E( X + Y ) = E( X ) + E(Y )

4. If X ⊥ Y, then E( XY ) = E( X ) E(Y )

Proof.
1. ∫ c f ( x ) dx = c ∫ f ( x ) dx = c · 1 = c

2. Trivial.
3. E( X + Y ) = ∫∫ ( x + y) f ( x, y) dx dy = ∫∫ x f ( x, y) dx dy + ∫∫ y f ( x, y) dx dy

   = ∫ x [∫ f ( x, y) dy] dx + ∫ y [∫ f ( x, y) dx ] dy = ∫ x f ( x ) dx + ∫ y f (y) dy = E( X ) + E(Y )

4. It is trivial when we use f ( x, y) = f ( x ) f (y).

Definition 2.15 (Moments).

• rth moment of X ⇒ mr = µ′_r = E( X^r ) = ∫ x^r f ( x ) dx

• rth central moment of X ⇒ µr = E[( X − E( X ))^r ] = ∫ ( x − E( X ))^r f ( x ) dx


Example 2.2.
1. E( X ) = ∑_i xi f i , X̄ = (1/n) ∑_i xi

2. Var ( X ) = E[( X − E( X ))^2 ]

3. Skewness = E[( X − E( X ))^3 ]

4. Kurtosis = E[( X − E( X ))^4 ]

Definition 2.16 (Moment Generating Function). For a continuous random variable X,

• MX (t) = E[e^{tX} ] = ∫ e^{tx} f ( x ) dx for −h < t < h, for some small h > 0

• dMX (t)/dt = ∫ x e^{tx} f ( x ) dx

• d^r MX (t)/dt^r = ∫ x^r e^{tx} f ( x ) dx

• µ′_r = E[ X^r ] = d^r MX (t)/dt^r |_{t=0}

For a discrete random variable X,

• MX (t) = E[e^{tX} ] = ∑_i e^{t xi} f ( xi ), where e^x = ∑_{i=0}^{∞} x^i / i!

• µ′_r = E[ X^r ] = d^r MX (t)/dt^r |_{t=0}
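The sketch below checks these relations for an Exponential(λ) random variable, whose mgf is M(t) = λ/(λ − t) for t < λ: finite-difference derivatives of M at t = 0 are compared with simulated moments. The parameter, step size, and sample size are illustrative choices.

```python
import numpy as np

lam = 2.0
M = lambda t: lam / (lam - t)       # mgf of Exponential(lam), valid for t < lam

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)                 # ~ M'(0)  = E[X]   = 1/lam
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2       # ~ M''(0) = E[X^2] = 2/lam^2

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / lam, size=1_000_000)
print(m1, x.mean())        # both ~ 0.5
print(m2, (x**2).mean())   # both ~ 0.5
```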

Theorem 2.7. For 0 < s < r, if E[| X |r ] exists, then E[| X |s ] < ∞.

Remarks

• There must exist h > 0 such that MX (t) = E[e^{tX} ] = ∫ e^{tx} f ( x ) dx for −h < t < h.

• The moment generating function (mgf) does not always exist for a
random variable X.

Example 2.3.

• Consider the r.v. X having pdf f ( x ) = x^{−2} I_{[1,∞)} ( x ).

⇒ If the mgf of X exists, then it is given by ∫_1^∞ x^{−2} e^{tx} dx by the definition of the mgf. However, it can be shown that the integral does not exist for any t > 0. In fact, E[ X ] = ∞.


• Cauchy distribution: t(1)

⇒ E[| X |] = ∞, and thus the mean and all higher-order moments do not exist.

Definition 2.17 (Characteristic Function).

• ϕX (t) = E[e^{itX} ] = ∫ e^{itx} f ( x ) dx, where i = √(−1)

  cf. e^{iy} = cos(y) + i sin(y)

Remarks

• ϕX (t) ⇔ FX : the characteristic function exists for any random variable X.

• | e^{itx} | = | cos(tx ) + i sin(tx ) | = √(cos^2 (tx ) + sin^2 (tx )) = 1

• d^r ϕX (t)/dt^r |_{t=0} = E[(iX )^r ] = i^r µ′_r

• MX (t) → mr (the mgf, when it exists, yields the moments mr )

• FX ( x ) ⇔ mr for all r (if mr exists for every r)

2.4 Characteristics of Distribution

Location (Representative Value)



1. Expectation: µ = µ1 = E( X ) = ∫ x f ( x ) dx
(a) E(c) = c
(b) E(cX ) = cE( X )
(c) E( X + Y ) = E( X ) + E(Y )
(d) If X ⊥Y, then E( XY ) = E( X ) E(Y ).
2. αth Quantile ξα : the smallest ξ such that FX (ξ ) ≥ α
3. Median: 0.5th quantile
(a) m or Xmed such that P( X < m) ≤ 1/2 and P( X > m) ≤ 1/2
(b) In a symmetric distribution, E( X ) = m.
4. Mode: Xmod
(a) A mode of a distribution of one random variable X is a value
of x that maximizes the pdf or pmf.
(b) There may be more than one mode. Also, there may be no
mode at all.


Measures of Dispersion

1. Variance: µ2 = Var ( X ) = E[( X − µ)^2 ]

   (a) Var (c) = 0
   (b) Var (cX ) = c^2 Var ( X )
   (c) Var ( a + bX ) = b^2 Var ( X )

2. Standard Deviation: SD ( X ) = √Var ( X ) (cf. SD ( a + bX ) = |b| SD ( X ))

3. Interquartile Range: ξ0.75 − ξ0.25

   – This is useful for an asymmetric distribution.

Skewness

1. Skewness: µ3 = E[( X − µ)^3 ]

   (a) µ3 > 0: skewed to the right
   (b) µ3 = 0: symmetric
   (c) µ3 < 0: skewed to the left

2. Skewness Coefficient: unit-free measure

   µ3 / σ^3 = E[( X − µ)^3 ] / ( E[( X − µ)^2 ])^{3/2}

Kurtosis

1. Kurtosis: µ4 = E[( X − µ)^4 ]

   (a) µ4 /σ^4 > 3: long tail (leptokurtic)
   (b) µ4 /σ^4 = 3: normal (mesokurtic)
   (c) µ4 /σ^4 < 3: short tail (platykurtic)

2. Kurtosis Coefficient: unit-free measure

   µ4 / σ^4 = E[( X − µ)^4 ] / ( E[( X − µ)^2 ])^2
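Sample versions of the two unit-free coefficients are easy to compute; the sketch below estimates them for a normal sample (≈ 0 and ≈ 3) and an exponential sample (≈ 2 and ≈ 9). The distributions and sample size are illustrative choices.

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness and kurtosis coefficients (moment estimators, no small-sample correction)."""
    c = x - x.mean()
    s2 = np.mean(c**2)
    return np.mean(c**3) / s2**1.5, np.mean(c**4) / s2**2

rng = np.random.default_rng(3)
print(skew_kurt(rng.normal(size=1_000_000)))       # ~ (0, 3): symmetric, mesokurtic
print(skew_kurt(rng.exponential(size=1_000_000)))  # ~ (2, 9): skewed right, leptokurtic
```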


2.5 Inequalities

Theorem 2.8 (Markov Inequality). Let X be a random variable and g(·) a non-negative Borel measurable function. Then, for every k > 0,

P[ g( X ) ≥ k ] ≤ E[ g( X )] / k
Proof.

E[ g( X )] = ∫ g( x ) f ( x ) dx = ∫_{x: g( x )≥k} g( x ) f ( x ) dx + ∫_{x: g( x )<k} g( x ) f ( x ) dx

≥ ∫_{x: g( x )≥k} g( x ) f ( x ) dx ≥ k ∫_{x: g( x )≥k} f ( x ) dx = k P[ g( X ) ≥ k ]

Example 2.4.

• Apply the Markov inequality to g( X ) = ( X − µ)^2 , k = r^2 σX^2

⇒ Chebyshev's inequality: P[( X − µ)^2 ≥ r^2 σX^2 ] ≤ 1/r^2

• Other choices: g( X ) = | X |, g( X ) = | X |^α
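A Monte Carlo check of Chebyshev's inequality (the distribution and the values of r below are arbitrary; any distribution with finite variance works):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(size=1_000_000)
mu, sigma = x.mean(), x.std()

for r in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= r * sigma)
    print(f"r = {r}: P(|X - mu| >= r*sigma) = {empirical:.4f} <= 1/r^2 = {1 / r**2:.4f}")
```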

Theorem 2.9 (Jensen’s Inequality). Let X be a random variable with mean E[ X ],


and let g(·) be a convex function. Then E[ g( X )] ≥ g( E[ X ]).

Proof. Since g( x ) is continuous and convex, there exists a line l ( x ) satisfying l ( x ) ≤ g( x ) and l ( E[ X ]) = g( E[ X ]). By construction, l ( x ) goes through the point ( E[ X ], g( E[ X ])), and we can let l ( x ) = a + bx. That is,

E[l ( X )] = E[ a + bX ] = a + bE[ X ] = l ( E[ X ])

⇒ g( E[ X ]) = l ( E[ X ]) = E[l ( X )] ≤ E[ g( X )]

Theorem 2.10 (Hölder's Inequality). Let X, Y be two random variables and let p, q be numbers such that p > 1, q > 1, 1/p + 1/q = 1. Then,

E[ XY ] ≤ E[| X |^p ]^{1/p} E[| Y |^q ]^{1/q}


Example 2.5.

Apply Hölder's inequality with p = q = 2:

E[ XY ] ≤ E[ X^2 ]^{1/2} E[Y^2 ]^{1/2} : Cauchy-Schwarz inequality

⇒ Cov( X, Y ) ≤ √Var ( X ) √Var (Y ) (cf. Cov( X, Y ) = E[( X − µX )(Y − µY )])

∴ −1 ≤ ρXY = Cov( X, Y ) / (√Var ( X ) √Var (Y )) ≤ 1

3 Joint and Conditional Distributions, Stochastic Independence and More Expectations
3.1 Joint Distribution

Definition 3.1 (n-dimensional Random Variable).

• Let X (ω ) = ( X1 (ω ), X2 (ω ), · · · , Xn (ω )) for ω ∈ Ω be an n-dimensional function defined on (Ω, F , P) into R^n.

• X (ω ) is called an n-dimensional random variable if the inverse image of every n-dimensional interval in R^n, I = {( x1 , x2 , · · · , xn ) : −∞ < xi ≤ ai , ai ∈ R, i = 1, 2, · · · , n}, is in F .

• i.e. X^{-1}( I ) = {ω : X1 (ω ) ≤ a1 , · · · , Xn (ω ) ≤ an } ∈ F .

Theorem 3.1 (Construction of an n-dimensional Random Variable). Let Xi , i = 1, · · · , n, each be a one-dimensional random variable. Then, X = ( X1 , · · · , Xn ) is an n-dimensional random variable.

Definition 3.2 (Joint Cumulative Distribution Function). Let X be an n-dimensional random variable, X = ( X1 , · · · , Xn ). Then, the joint cumulative distribution function of X is defined as

FX ( x1 , · · · , xn ) = FX1 ,··· ,Xn ( x1 , · · · , xn ) = P[ω : X1 (ω ) ≤ x1 , · · · , Xn (ω ) ≤ xn ]

for each ( x1 , · · · , xn ) ∈ R^n.

Theorem 3.2 (Properties of Joint Cumulative Distribution Function).

1. Non-decreasing with respect to all arguments x1 , · · · , xn


2. Right continuous with respect to all arguments x1 , · · · , xn

cf. lim_{0<h→0} F ( x + h, y) = lim_{0<h→0} F ( x, y + h) = F ( x, y)

3. F (+∞, +∞) = 1, FXY (−∞, y) = FXY ( x, −∞) = 0 for all x, y

4. F ( x2 , y2 ) − F ( x2 , y1 ) − F ( x1 , y2 ) + F ( x1 , y1 ) ≥ 0 (∵ P[ x1 < X ≤ x2 , y1 < Y ≤ y2 ] ≥ 0)

Definition 3.3 (Joint Probability Mass Function). Let X = ( X1 , X2 , . . . , Xn )


be a discrete random vector with distinct values a1 , a2 , . . . , ak ∈ Rn . Then the
function, denoted by f X ( ai ) = P[ X = ai ], such that

• f X ( x ) > 0 for x = ai , i = 1, . . . , k

• f X ( x ) = 0 for x ̸= ai

• ∑i f X ( ai ) = 1

is called the joint probability mass function of X.

Definition 3.4 (Joint Probability Density Function). Let X = ( X1 , X2 , . . . , Xn ) be a continuous random vector and FX1 ,...,Xn be its cumulative distribution function. Then a function f X1 ,...,Xn such that

FX1 ,...,Xn ( x1 , x2 , . . . , xn ) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f (t1 , t2 , . . . , tn ) dt1 · · · dtn

exists, and it is called the joint probability density function of X.

Remarks

• f ( x1 , . . . , xn ) ≥ 0, ∀( x1 , . . . , xn )

• f ( x1 , . . . , xn ) = ∂^n F ( x1 , . . . , xn ) / (∂x1 · · · ∂xn )

• ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f (t1 , t2 , . . . , tn ) dt1 · · · dtn = 1

3.2 Marginal Distribution

Definition 3.5 (Marginal Distribution). Let X, Y be two random variables. Then


the marginal distributions of X and Y are:

FX ( x ) = FXY ( x, +∞) = P[ X ≤ x, Y < +∞]

FY (y) = FXY (+∞, y) = P[ X < +∞, Y ≤ y]


Definition 3.6 (Marginal Probability Density Function). Let X, Y be two random variables and let f X,Y ( x, y) be the joint pdf of X, Y. Then the marginal probability density functions of X and Y are:

• (Discrete case)

  f X ( xi ) = ∑_j f ( xi , yj )

  f Y (yj ) = ∑_i f ( xi , yj )

• (Continuous case)

  f X ( x ) = ∫ f ( x, y) dy

  f Y (y) = ∫ f ( x, y) dx
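In the discrete case the marginals are just row and column sums of the joint pmf table. The 2 x 3 table below is a hypothetical example used only to show the computation.

```python
import numpy as np

# Hypothetical joint pmf f(x_i, y_j): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

f_X = joint.sum(axis=1)    # marginal pmf of X: sum over y
f_Y = joint.sum(axis=0)    # marginal pmf of Y: sum over x

print(f_X, f_X.sum())      # [0.4 0.6], sums to 1
print(f_Y, f_Y.sum())      # [0.25 0.45 0.3], sums to 1
```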

3.3 Conditional Distribution

Definition 3.7 (Conditional Probability Distribution Function). Let X, Y be two random variables. Then the conditional distribution of X given Y is:

FX|Y ( x | y) = P( X ≤ x | Y = y)

and the conditional density of X given Y is:

f X|Y ( x | y) = ∂FX|Y ( x | y)/∂x (Continuous)

f X|Y ( x | y) = P( X = x | Y = y) (Discrete)

FX|Y ( x | y) = ∫_{−∞}^{x} f (u | y) du

Remarks

• FX|Y ( x | y) = ∫_{−∞}^{x} [ f X,Y (u, y)/ f Y (y)] du

• ∂FX|Y ( x | y)/∂x = f X,Y ( x, y)/ f Y (y)

Theorem 3.3 (Alternative Derivation of Conditional Density).

f X|Y ( x | y) = f X,Y ( x, y) / f Y (y), if f Y (y) > 0


Proof. First, consider discrete random variables X, Y. Let Ax = {ω : X (ω ) = x }, By = {ω : Y (ω ) = y}. Then we have

f X|Y ( x | y) = P( X = x | Y = y) = P( Ax | By ) = P( Ax ∩ By )/P( By )

= P({ω : X (ω ) = x, Y (ω ) = y}) / P({ω : Y (ω ) = y}) = f X,Y ( x, y)/ f Y (y)

Next, consider continuous random variables X, Y. Let Ax = {ω : X (ω ) ≤ x } and Bε = {ω : y − ε ≤ Y (ω ) ≤ y + ε}. Define By = limε→0 Bε . Then we have

FX|Y ( x | y) = P( Ax | By ) = limε→0 P({ω : X (ω ) ≤ x, y − ε ≤ Y (ω ) ≤ y + ε}) / P({ω : y − ε ≤ Y (ω ) ≤ y + ε})

= limε→0 (1/2ε) ∫_{y−ε}^{y+ε} ∫_{−∞}^{x} f X,Y (u, v) du dv / [ limε→0 (1/2ε) ∫_{y−ε}^{y+ε} f Y (v) dv ]

= ∫_{−∞}^{x} f X,Y (u, y) du / f Y (y) = ∫_{−∞}^{x} [ f X,Y (u, y)/ f Y (y)] du

Therefore, f X|Y ( x | y) = f X,Y ( x, y)/ f Y (y).

3.4 Independence of Random Variables

Definition 3.8 (Independence of Random Variables). The random variables


X and Y are said to be independent if

f X,Y ( x, y) = f X ( x ) f Y (y) ( P( A x ∩ By ) = P( A x ) P( By ))

Random variables that are not independent are said to be dependent.

Theorem 3.4. X and Y are independent if and only if

FX,Y ( x, y) = FX ( x ) FY (y) ∀( x, y) ∈ R2

Proof.

⇐) By partial differentiation.

⇒) FX,Y ( x, y) = P({ω : X (ω ) ≤ x, Y (ω ) ≤ y}) = P({ω : X (ω ) ≤ x } ∩ {ω : Y (ω ) ≤ y})

= P({ω : X (ω ) ≤ x }) P({ω : Y (ω ) ≤ y}) = FX ( x ) FY (y)


Definition 3.9 (Pairwise and Mutual Independence). Let X1 , X2 , · · · , Xn be


random variables.

• X1 , . . . , Xn are pairwise independent if Xi ⊥ Xj for all i, j = 1, 2, · · · , n, i ≠ j.

• X1 , . . . , Xn are mutually independent if for any collection of k of them, ( Xi1 , Xi2 , . . . , Xik ) ⊂ ( X1 , X2 , . . . , Xn ), k = 2, 3, . . . , n,

  FXi1 ,··· ,Xik ( xi1 , · · · , xik ) = ∏_{j=1}^{k} FXij ( xij )

Theorem 3.5 (Preservation of Independence). Let X, Y be random variables


and g1 , g2 be Borel-measurable functions. If X ⊥Y, then g1 ( X )⊥ g2 (Y ).

Proof.

P({ g1 ( X ) ≤ x, g2 (Y ) ≤ y}) = P({ g1 ( X ) ∈ (−∞, x ], g2 (Y ) ∈ (−∞, y]})

= P({ X ∈ g1^{-1}((−∞, x ]), Y ∈ g2^{-1}((−∞, y])})

= P({ X ∈ g1^{-1}((−∞, x ])}) P({Y ∈ g2^{-1}((−∞, y])})

= P({ g1 ( X ) ∈ (−∞, x ]}) P({ g2 (Y ) ∈ (−∞, y]})

= P({ g1 ( X ) ≤ x }) P({ g2 (Y ) ≤ y})

Definition 3.10 (Identically Distributed Random Variables). Let X, Y be random variables. X and Y are identically distributed if FX ( a) = FY ( a) ∀ a ∈ R, and we denote X =^d Y.

Theorem 3.6. If Xi (i = 1, 2, · · · , n) are independent and identically distributed,

FX1 ,··· ,Xn ( x1 , · · · , xn ) = ∏_{i=1}^{n} FX ( xi )

Definition 3.11 (Moment Generating Function of Joint Distribution). For a random vector X = ( X1 , X2 , · · · , Xn )′ , the moment generating function is

MX (t) = E[e^{t′X} ] = E[e^{t1 X1 + t2 X2 + · · · + tn Xn} ] < ∞, for −hi < ti < hi (i = 1, 2, . . . , n), for some hi > 0


Definition 3.12 (Cross Moments).

µ′_{r1,r2} = E[ X1^{r1} X2^{r2} ]: (r1 , r2 )th cross moment

µ_{r1,r2} = E[( X1 − µ1 )^{r1} ( X2 − µ2 )^{r2} ]: (r1 , r2 )th cross central moment

Remarks

µ′_{r1,r2} = ∂^{r1+r2} MX,Y (t1 , t2 ) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}

i^{r1+r2} µ′_{r1,r2} = ∂^{r1+r2} ϕX,Y (t1 , t2 ) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}

(ϕX,Y : characteristic function)

Theorem 3.7. X1 , X2 , . . . , Xn are mutually independent if and only if

MX1 ,X2 ,··· ,Xn (t1 , t2 , · · · tn ) = MX1 (t1 ) MX2 (t2 ) · · · MXn (tn )

Theorem 3.8. Let X ⊥Y and g1 , g2 be Borel-measurable functions. Then,

E[ g1 ( X ) g2 (Y )] = E[ g1 ( X )] E[ g2 (Y )]

Remarks

• A trivial corollary of the theorem is that X ⊥Y ⇒ Cov( X, Y ) = 0

Theorem 3.9. Let X1 , X2 , . . . , Xn be random variables and let S = ∑_{i=1}^{n} ai Xi . Then,

Var (S) = ∑_{i=1}^{n} ai^2 Var ( Xi ) + ∑_{i≠j} ai aj Cov( Xi , Xj )

If X1 , X2 , . . . , Xn are independent,

Var (S) = ∑_{i=1}^{n} ai^2 Var ( Xi )
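The variance formula for S = ∑ ai Xi can be verified by simulation. In the sketch below, X2 is built to be correlated with X1 (Cov( X1 , X2 ) = 0.6 by construction); the weights, correlation, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
a = np.array([1.0, -2.0, 0.5])

z = rng.normal(size=(n, 3))                          # independent standard normals
X = np.column_stack([z[:, 0],
                     0.6 * z[:, 0] + 0.8 * z[:, 1],  # Var = 1, Cov with X1 = 0.6
                     z[:, 2]])

S = X @ a
cov = np.cov(X, rowvar=False)
formula = sum(a[i] * a[j] * cov[i, j] for i in range(3) for j in range(3))
# the i = j terms give sum a_i^2 Var(X_i); the i != j terms give the covariance part

print(S.var(ddof=1), formula)    # both ~ 1 + 4 + 0.25 + 2*(1)(-2)(0.6) = 2.85
```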


3.5 Conditional Expectation

Definition 3.13 (Conditional Expectation). Let X be an integrable random variable on (Ω, F , P) and let G be a sub-σ-field of F (G ⊂ F ). Then there exists a random variable E[ X | G ], called the conditional expected value of X given G , with the following properties:

(1) E[ X | G ] is G -measurable and integrable.

(2) E[ X | G ] satisfies the functional equation

    ∫_G E[ X | G ] dP = ∫_G X dP, G ∈ G

Definition 3.14 (Conditional Mean). Let X, Y be random variables and h(·) be a Borel-measurable function. Then,

E[h( X ) | Y = y] = ∑_i h( xi ) f ( xi | y) (Discrete)

              = ∫ h( x ) f ( x | y) dx (Continuous)

Remarks

E[h( X )|Y ] is also a random variable.

Theorem 3.10 (Properties of Conditional Expectation).

1. E[c | Y ] = c, c : constant

2. For h1 (), h2 (), Borel-measurable functions

E[c1 h1 ( X ) + c2 h2 ( X )|Y ] = c1 E[h1 ( X )|Y ] + c2 E[h2 ( X )|Y ]

3. P[ X ≥ 0] = 1 ⇒ E [ X |Y ] ≥ 0

4. P[ X1 ≥ X2 ] = 1 ⇒ E [ X1 | Y ] ≥ E [ X 2 | Y ]

5. ϕ(·): A function of X, Y ⇒ E[ϕ( X, Y )|Y = y] = E[ϕ( X, y)|Y = y]

6. Ψ (·): A Borel-measurable function ⇒ E[Ψ ( X )ϕ( X, Y )| X ] = Ψ ( X ) E[ϕ( X, Y )| X ]


Theorem 3.11 (Law of Iterated Expectations). Let X, Y be random variables and suppose E[h( X )] exists. Then,

E[ E[h( X ) | Y ]] = E[h( X )]

Proof.

E[h( X )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h( x ) f X,Y ( x, y) dx dy

= ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} h( x ) ( f X,Y ( x, y)/ f Y (y)) dx ] f Y (y) dy

= ∫_{−∞}^{∞} E[h( X ) | Y = y] f Y (y) dy = E[ E[h( X ) | Y ]]
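A Monte Carlo check of the law of iterated expectations, using a simple hypothetical model in which E[ X | Y ] = Y (Y uniform on {1, 2, 3} and X | Y = y ~ N(y, 1)):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

Y = rng.integers(1, 4, size=n)           # Y uniform on {1, 2, 3}
X = rng.normal(loc=Y, scale=1.0)         # X | Y = y ~ N(y, 1), so E[X | Y] = Y

cond_mean = np.array([X[Y == y].mean() for y in (1, 2, 3)])   # estimates of E[X | Y = y]
outer = cond_mean[Y - 1].mean()                               # E[ E[X | Y] ]

print(outer, X.mean())    # both ~ 2.0 = E[X]
```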

Definition 3.15 (Conditional Variance). Let X, Y be random variables and E[ X |Y ]


be a conditional expectation of X given Y. Then,

Var ( X |Y ) = E[( X − E[ X |Y ])2 |Y ]

Theorem 3.12. Let X, Y be random variables with finite variances. Then,

1. Var ( X |Y ) = E[ X 2 |Y ] − ( E[ X |Y ])2

2. Var ( X ) = E[Var ( X |Y )] + Var ( E[ X |Y ])

Proof.

1. E[( X − E[ X | Y ])^2 | Y ] = E[ X^2 − 2X E[ X | Y ] + ( E[ X | Y ])^2 | Y ]

   = E[ X^2 | Y ] − 2E[ X E[ X | Y ] | Y ] + E[( E[ X | Y ])^2 | Y ]

   = E[ X^2 | Y ] − ( E[ X | Y ])^2

2. E[Var ( X | Y )] = E[ E[ X^2 | Y ] − ( E[ X | Y ])^2 ]

   = E[ X^2 ] − ( E[ X ])^2 − ( E[( E[ X | Y ])^2 ] − ( E[ X ])^2 )

   = Var ( X ) − Var ( E[ X | Y ])

   ∴ Var ( X ) = E[Var ( X | Y )] + Var ( E[ X | Y ])
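The decomposition Var ( X ) = E[Var ( X | Y )] + Var ( E[ X | Y ]) can also be checked by simulation. In the hypothetical model below, Y is Bernoulli(1/2) and X | Y ~ N(3Y, (1 + Y)^2), so the exact value is (1 + 4)/2 + 9/4 = 4.75.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

Y = rng.integers(0, 2, size=n)                    # Y ~ Bernoulli(1/2)
X = rng.normal(loc=3.0 * Y, scale=1.0 + Y)        # X | Y ~ N(3Y, (1 + Y)^2)

cond_mean = np.array([X[Y == y].mean() for y in (0, 1)])
cond_var = np.array([X[Y == y].var() for y in (0, 1)])

lhs = X.var()
# equal weights below are valid because P(Y = 0) = P(Y = 1) = 1/2
rhs = cond_var.mean() + cond_mean.var()           # E[Var(X|Y)] + Var(E[X|Y])

print(lhs, rhs)    # both ~ 4.75
```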
