
Studies in Economic Statistics Jae-Young Kim

1 Introduction to Probability
1.1 Introduction

Definition 1.1 (Probability Space). A probability space is a triple (Ω, F , P)


where,

1. Ω (Sample Space): the set of all possible outcomes of a random experiment.

2. F (σ-field or σ-algebra): a collection of subsets of Ω.

3. P (Probability Measure): a real-valued function defined on F .

Example 1.1 (Tossing a Coin).

• Ω = { H, T }

• F = {∅, { H }, { T }, { H, T }}

• P(∅) = 0

• P({ H }) = P({ T }) = 1/2

• P({ H, T }) = 1
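A minimal Python sketch of this example, assuming a fair coin and an arbitrary simulation size: the relative frequencies approximate the probability measure above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
tosses = rng.choice(["H", "T"], size=n)      # fair coin: P({H}) = P({T}) = 1/2

print("empirical P({H}):", np.mean(tosses == "H"))                  # close to 0.5
print("empirical P({T}):", np.mean(tosses == "T"))                  # close to 0.5
print("empirical P({H,T}):", np.mean(np.isin(tosses, ["H", "T"])))  # exactly 1.0
```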

Definition 1.2 (σ-field (σ-algebra)). A class F of subsets of Ω is called a σ-field or σ-algebra if it satisfies:

1. Ω ∈ F

2. For A ∈ F , Ac ∈ F

3. For Ai ∈ F , i = 1, 2, · · ·, ∪i Ai ∈ F

Remarks

• A σ-field is always a field, but not vice versa.

• An element A ∈ F is called an event.

• An element ω ∈ Ω is called an outcome.


Definition 1.3 (The smallest σ-field generated by A, σ(A)). Let A be a class of subsets of Ω. Consider the class that is the intersection of all the σ-fields containing A; it is called the σ-field generated by A and is denoted by σ(A). σ(A) satisfies:

1. A ⊂ σ(A).

2. σ(A) is a σ-field.

3. If A ⊂ G, and G is a σ-field, then σ(A) ⊂ G.

Example 1.2 (σ(A)).

• Ω = {1,2,3,4,5,6}

• A = {1,3,5}

• A = { A } (the class of subsets of Ω consisting of the single event A)

⇒ σ(A) = {A, Ac , ∅, Ω}
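For a finite Ω the generated σ-field can be computed by brute force. The sketch below (the function name and approach are illustrative, not from the notes) starts from the class { A } and closes it under complements and unions, reproducing σ(A) = {∅, A, Ac , Ω}.

```python
from itertools import combinations

def sigma_field(omega, A):
    """Close the class {A} under complement and union (sufficient on a finite Omega)."""
    omega, A = frozenset(omega), frozenset(A)
    sets = {frozenset(), omega, A, omega - A}
    changed = True
    while changed:
        changed = False
        for S, T in combinations(list(sets), 2):
            for new in (S | T, omega - (S | T)):
                if new not in sets:
                    sets.add(new)
                    changed = True
    return sets

for s in sorted(sigma_field({1, 2, 3, 4, 5, 6}, {1, 3, 5}), key=len):
    print(set(s) if s else "empty set")
# empty set, {1, 3, 5}, {2, 4, 6}, {1, 2, 3, 4, 5, 6}
```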

Definition 1.4 (Probability Measure). A real-valued set function defined on a


σ-field is a probability measure if it satisfies

1. P( A) ≥ 0, ∀ A ∈ F

2. P(Ω) = 1

3. For Ai ∩ Aj = ∅, i ≠ j, P(∪i Ai ) = ∑i P( Ai )

Remarks

• The three properties given above are often referred to as the axioms of
probability.

• A probability (measure) takes values in [0, 1], while a general measure takes values in [0, ∞].

Definition 1.5 (Lebesgue Measure). First we define µ on an open interval in the natural way. Note that any open set in R can be represented as a countable union of disjoint open intervals.

• Outer measure of A:

  µ^∗ ( A) = inf { ∑_k µ(Ck ) : A ⊂ ∪_k Ck , {Ck } an open covering of A }

• Inner measure of A:

  µ_∗ ( A) = 1 − µ^∗ ( Ac )

• Lebesgue measure: when µ^∗ ( A) = µ_∗ ( A), the common value µ( A) = µ^∗ ( A) = µ_∗ ( A) is the Lebesgue measure of A

Theorem 1.1 (Unique Extension). A probability measure on a field F0 has a unique extension to the σ-field generated by F0 .

1. Let P be a probability measure on F0 and let F = σ(F0 ). Then, there exists a probability measure Q on F such that Q( A) = P( A) for A ∈ F0 .

2. Let Q′ be another probability measure on F such that Q′ ( A) = P( A) for A ∈ F0 . Then Q′ ( A) = Q( A) for A ∈ F .

3. For Ai ∈ F with Ai ∩ Aj = ∅ (i ≠ j), ∪_{i=1}^{∞} Ai ∈ F and Q is countably additive.

Theorem 1.2 (Properties of Probability Measure).

1. For A ⊂ B, P(A) ≤ P(B).


Proof
Hint: P(B - A) = P(B) - P(A)

2. P(A ∪ B) = P(A) + P(B) - P(A ∩ B).


Proof
Hint: A ∪ B = A ∪ (B ∩ Ac )

3. P(A ∪ B) ≤ P(A)+ P(B)

• Extension (inclusion-exclusion):

  P( A1 ∪ A2 ∪ · · · ∪ An ) = ∑_{k=1}^{n} P( Ak ) − ∑_{i<j} P( Ai ∩ Aj ) + · · · + (−1)^{n+1} P( A1 ∩ A2 ∩ · · · ∩ An )

• Boole's inequality:

  P(∪_{i=1}^{∞} Ai ) ≤ ∑_{i=1}^{∞} P( Ai )
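Both formulas can be checked exactly for a fair die by computing probabilities as |E|/|Ω|; the events below are arbitrary choices used only for illustration.

```python
from itertools import combinations

omega = set(range(1, 7))                 # fair die: P(E) = |E| / 6
P = lambda E: len(E) / len(omega)
A = [{1, 2}, {2, 3}, {3, 4}]

union = set().union(*A)
incl_excl = sum((-1) ** (k + 1) * P(set.intersection(*sub))
                for k in range(1, len(A) + 1)
                for sub in combinations(A, k))

print(P(union), incl_excl)                   # both 0.666..., so the extension holds
print(P(union) <= sum(P(E) for E in A))      # Boole's inequality: True
```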


1.2 Some Limit Concepts of Probability

Definition 1.6 (Limit of Events for Monotone Sequences). Let { En } be a sequence of events. { En } is monotone when E1 ⊂ E2 ⊂ · · · or E1 ⊃ E2 ⊃ · · · .

1. Monotone increasing sequence of events:

   E1 ⊂ E2 ⊂ · · · ⇒ lim En = ∪_{n=1}^{∞} En

2. Monotone decreasing sequence of events:

   E1 ⊃ E2 ⊃ · · · ⇒ lim En = ∩_{n=1}^{∞} En

Theorem 1.3 (A monotone sequence of events { En }).

P(lim En ) = lim P( En )

Proof.

• E0 = ϕ, En : monotone increasing

• Fn = En − En−1 , P( Fi ) = P( Ei ) − P( Ei−1 )
• P(∪_{i=1}^{n} Fi ) = ∑_{i=1}^{n} P( Fi ) = P( En ) = P(∪_{i=1}^{n} Ei )

Definition 1.7 (Limit Supremum and Limit Infimum of Events). For a sequence of events En , define

lim sup_n En = ∩_{n=1}^{∞} ∪_{k=n}^{∞} Ek (∀n ≥ 1, ∃k ≥ n such that ω ∈ Ek ; En occurs infinitely often)

lim inf_n En = ∪_{n=1}^{∞} ∩_{k=n}^{∞} Ek (∃n ≥ 1 such that ∀k ≥ n, ω ∈ Ek ; En occurs eventually)

When lim sup En = lim inf En , the common value is denoted lim En .

Lemma 1.1 (Borel-Cantelli). Let { En } be a sequence of events.

If ∑_{i=1}^{∞} P( Ei ) < ∞, then P(lim sup En ) = 0


Proof.

P(lim sup En ) = P(∩_{n=1}^{∞} ∪_{k=n}^{∞} Ek ) ≤ P(∪_{k=n}^{∞} Ek ) ≤ ∑_{k=n}^{∞} P( Ek ) → 0 as n → ∞

Remarks
Note that if P( En ) → 0, P(lim inf En ) = 0
Lemma 1.2 (2nd Borel-Cantelli Lemma). Let { En } be an independent sequence of events.

If ∑_{i=1}^{∞} P( Ei ) = ∞, then P(lim sup En ) = 1
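A simulation can make the two lemmas concrete. Below, independent events En are generated with P( En ) = 1/n^2 (summable) and with P( En ) = 1/n (divergent); the particular probabilities, sample size, and seed are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
n = np.arange(1, N + 1)

summable = rng.random(N) < 1.0 / n**2    # sum of P(E_n) converges
divergent = rng.random(N) < 1.0 / n      # sum of P(E_n) diverges

# Borel-Cantelli: with summable probabilities, E_n should occur only finitely often;
# by the 2nd lemma, with divergent probabilities E_n keeps occurring.
print("occurrences with n > 1000 (summable case): ", summable[1000:].sum())   # almost always 0
print("occurrences with n > 1000 (divergent case):", divergent[1000:].sum())  # typically several
```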

1.3 Conditional Probability and Independence

Definition 1.8 (Conditional Probability). For events A, B with P( B) > 0, the conditional probability of A given B is defined as

P( A | B) = P( A ∩ B) / P( B)
Definition 1.9 (Independence: A ⊥ B). Let A, B ∈ F , B ̸= ϕ

• If A ⊥ B, then P( A ∩ B) = P( A) P( B).

• If A ⊥ B, then P( A | B) = P( A).
• P( A | B) = P( A ∩ B)/P( B) = P( A) P( B)/P( B) = P( A)
Remarks
If A or B is empty, then they are always independent.
Definition 1.10 (Pairwise Independence).
• Let Γ be a class of subsets of Ω.

• For any pair A, B ∈ Γ, if P( A ∩ B) = P( A) P( B), then events in Γ are


pairwise independent.
Definition 1.11 (Mutual Independence).
• Let Γ be a class of subsets of Ω.

• For any collection of events ( Ai1 , . . . , Aik ), k = 2, 3, . . . , in Γ, if P( Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = ∏_{j=1}^{k} P( Aij ), then the events in Γ are mutually independent or completely independent.


1.4 Bayes Theorem

Theorem 1.4 (Bayes Theorem). For A, B ∈ F , P( A) > 0, P( B) > 0,

• P( B | A) = P( A ∩ B)/P( A) = P( A | B) P( B) / [ P( A | B) P( B) + P( A | Bc ) P( Bc ) ]

• P( A | B) = P( A ∩ B)/P( B) = P( B | A) P( A) / [ P( B | A) P( A) + P( B | Ac ) P( Ac ) ]

Remarks A Partition {Ai } of Ω

• Ai , i = 1, 2, . . . , n

• {Ai } is a partition of Ω if it satisfies


(i) ∪_{i=1}^{n} Ai = Ω
(ii) Ai ∩ Aj = ∅, i ≠ j

• Let Ai , i = 1, 2, . . . , n, be a partition of Ω with P( Ai ) > 0.

• Then for every B ∈ F with P( B) > 0,

  P( Ai | B) = P( B | Ai ) P( Ai ) / ∑_{j=1}^{n} P( B | Aj ) P( Aj )
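A standard numerical use of this formula is updating over a two-element partition. The sketch below uses hypothetical prior and likelihood values (chosen only to illustrate the computation):

```python
# Hypothetical prior over a partition {A1, A2} and likelihoods P(B | Ai).
prior = [0.01, 0.99]          # P(A1), P(A2)
likelihood = [0.95, 0.05]     # P(B | A1), P(B | A2)

denom = sum(p * l for p, l in zip(prior, likelihood))           # P(B), by the partition formula
posterior = [p * l / denom for p, l in zip(prior, likelihood)]  # P(Ai | B)

print(posterior)   # [0.161..., 0.838...]: even a strong signal from a rare event A1 leaves P(A1 | B) small
```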

Remarks Bayesian Approach

• On a probability space (Ω, F , P)

• Events H ∈ F , P(· | H ) = PH

• Let { Hi } be a partition of Ω consisting of unobservable events.

• Let B ⊂ Ω be observable.

• P( Hi | B) = P( Hi ) P( B | Hi ) / ∑_{j=1}^{n} P( Hj ) P( B | Hj )

Remarks Classical VS Bayesian Approach

Y = Xβ + ε

• Classical (Frequentist) Approach

(a) X, Y are random variables.


(b) Parameters (β) are fixed.

• Bayesian Approach

(a) Unknowns (Unobservable) are regarded as random variables.


(b) β, ε are random variables.


2 Random Variables, Distribution Functions, and Expectation
2.1 Random Variables

Definition 2.1 (Random Variable).


• A finite function X : Ω → R is a random variable (r.v.) if for each B ∈ B , X^{-1}( B) = {ω : X (ω ) ∈ B} ∈ F , where B is the Borel σ-algebra on R.

Remarks

• A random variable is a real-valued measurable function.

• A random variable X : Ω → R defined on (Ω, F , P) is called an F /B -measurable function.
Definition 2.2 (Measurable Mapping).
• Measurable mapping: Generalization of measurable function

• Let (Ω, F ), (Ω′ , F ′ ) be two measurable spaces.

• A mapping T : Ω → Ω′ is said to be F /F ′ -measurable if for any B ∈ F ′ , T^{-1}( B) = {ω ∈ Ω : T (ω ) ∈ B} ∈ F .
Theorem 2.1.
• Let (Ω, F , P) be a probability space.

• Let X be a random variable defined on Ω.

• Then, the random variable X induces a new probability space ( R, B , PX )


where X : Ω → R.
Proof.
For B ∈ B , let PX ( B) = P[ X −1 ( B)] = P[ω : X (ω ) ∈ B].

It is sufficient to show that

1. PX ( R) = 1
2. PX ( B) ≥ 0 for any B ∈ B
3. For Bi , i = 1, 2, . . . , with Bi ∩ Bj = ∅

PX (∪i Bi ) = ∑i PX ( Bi )


2.2 Probability Distribution Function

Definition 2.3 (Distribution Function). Let X be a random variable. Given x,


a real valued function FX (·) defined as FX ( x ) = P[{ω : X (ω ) ≤ x }] is called the
distribution function (DF) of a random variable X.

Definition 2.4 (Cumulative distribution function (cdf)).

FX ( x ) = P[{ω : X (ω ) ≤ x }] = P( X ≤ x ) = PX {(−∞, x ]} = PX [{r : −∞ < r ≤ x }]

FX ( x2 ) − FX ( x1 ) = PX {( x1 , x2 ]}

Theorem 2.2 (Properties of Distribution Function).

1. limx→−∞ FX ( x ) = 0, limx→+∞ FX ( x ) = 1

2. For x1 ≤ x2 , FX ( x1 ) ≤ FX ( x2 ) (Monotone and Non-decreasing)

3. lim_{0<h→0} FX ( x + h) = FX ( x ) (Right Continuity)

Remarks
A distribution function is not necessarily left continuous.

Definition 2.5 (Discrete Random Variable). A random variable X is said to be


discrete if the range of X is countable or if there exists E, a countable set, such
that P( X ∈ E) = 1.

Definition 2.6 (Continuous Random Variable). A random variable X is said to be continuous if there exists a function f X (·) such that FX ( x ) = ∫_{−∞}^{x} f X (t) dt for every real number x.

Remarks Another Characterization of Continuous Random Variable

• Let FX (·) be a distribution function (DF) of a random variable X.

(a) A distribution function FX (·) is absolutely continuous if and only if there exists a non-negative function f such that

    FX ( x ) = ∫_{−∞}^{x} f (t) dt, ∀ x ∈ R

(b) That is, a random variable X is a continuous random variable if and only if FX (·) is absolutely continuous.


Definition 2.7 (Continuity).

• A function f : X → Y is continuous at a point x0 ∈ X if, for any given ϵ > 0, ∃δ > 0 such that

  ρ( x0 , x ) < δ ⇒ ρ′ [ f ( x0 ), f ( x )] < ϵ

where ρ and ρ′ are metrics on X and Y.

• A function f is said to be continuous if it is continuous at each x ∈ X.

Definition 2.8 (Uniform Continuity).

• Let f : X → Y be a mapping from a metric space < X, ρ > to < Y, ρ′ >.

• We say that f is uniformly continuous if for any given ϵ > 0, ∃δ > 0 such
that, for any x1 , x2 ∈ X,

ρ( x1 , x2 ) < δ ⇒ ρ′ ( f ( x1 ), f ( x2 )) < ϵ.

Remarks

Uniformly continuous ⇒ Continuous

When f is defined on a compact set (a closed and bounded set in R^n ), Continuous ⇒ Uniformly Continuous.

Definition 2.9 (Absolute Continuity of a Function on Real Line).

• A real-valued function f defined on [ a, b] is said to be absolutely continuous on [ a, b] if, for any given ϵ > 0, ∃δ > 0 such that

  ∑_{i=1}^{k} (bi − ai ) < δ ⇒ ∑_{i=1}^{k} | f (bi ) − f ( ai )| < ϵ

for ( ai , bi ) pairwise disjoint, i = 1, · · · , k, k being arbitrary.

Remarks

• Absolutely continuous ⇒ Uniformly continuous

• Uniformly continuous ⇏ Absolutely continuous


Definition 2.10 (Absolute Continuity of a Measure: P ≪ Q).

• Let P, Q be two σ-finite measures on F .

  - For any given ϵ > 0, ∃δ > 0 s.t. Q( A) < δ ⇒ P( A) < ϵ.

  - Q( A) = 0 ⇒ P( A) = 0, ∀ A ∈ F .

⇒ P is absolutely continuous with respect to Q, denoted P ≪ Q.

Example 2.1.

• P( A) = ∫_A f dQ, A ∈ F

• FX ( x ) = ∫_{−∞}^{x} f (t) dt

Theorem 2.3 (Radon-Nikodym Theorem). Let P, Q be two σ-finite measures on F . If P ≪ Q, then there exists f ≥ 0 such that P( A) = ∫_A f dQ for any A ∈ F . We write f = dP/dQ and call it the Radon-Nikodym derivative.

Definition 2.11 (Probability Mass Function). If X is a discrete random variable with distinct values x1 , x2 , . . . , xk , then the function, denoted by f X ( xi ) = P[ X = xi ] = P[ω : X (ω ) = xi ], such that

• f X ( xi ) > 0 for x = xi , i = 1, . . . , k

• f X ( x ) = 0 for x ̸= xi

• ∑ f X ( xi ) = 1

is said to be the probability mass function (pmf) of X.

Remarks

• Some other names for the p.m.f. are discrete density function, discrete frequency function, and probability function.

• Note that f X ( xi ) = FX ( xi ) − FX ( xi−1 )

Definition 2.12 (Probability Density Function). If X is a continuous random variable, then the function f X (·) such that FX ( x ) = ∫_{−∞}^{x} f X (t) dt is called the probability density function of X.

• f X ( x ) ≥ 0, ∀ x

• ∫_{−∞}^{∞} f X ( x ) dx = 1


Remarks

• Some other names for the p.d.f. are density function, continuous density function, and integrating density function.

• P[ X = xi ] = 0

• f X ( x ) = dFX ( x )/dx

• P( a < X ≤ b) = FX (b) − FX ( a) = ∫_a^b f X ( x ) dx
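These relations can be checked numerically for a concrete density. The sketch below uses the Exponential(1) density f ( x ) = e^{−x} on [0, ∞), for which FX ( x ) = 1 − e^{−x}; the interval ( a, b] and the grid size are arbitrary choices.

```python
import numpy as np

f = lambda x: np.exp(-x)            # Exponential(1) pdf on [0, infinity)
F = lambda x: 1.0 - np.exp(-x)      # its cdf

a, b = 0.5, 2.0
dx = (b - a) / 100_000
mid = np.arange(a + dx / 2, b, dx)          # midpoint rule for the integral of f over (a, b]
numeric = np.sum(f(mid)) * dx

print(numeric, F(b) - F(a))                 # both approximately 0.4712
```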

Remarks Decomposition of a Distribution Function

• Any cdf FX ( x ) may be represented in the form of a mixed distribution:

  FX ( x ) = p1 FX^D ( x ) + p2 FX^C ( x ), where pi ≥ 0, i = 1, 2, p1 + p2 = 1, D: discrete, C: continuous.

Theorem 2.4 (Function of a Random Variable). Let X be a random variable and g be a Borel measurable function. Then, Y = g( X ) is also a random variable.

Proof. It suffices to show that {Y ≤ y} ∈ F to see that Y = g( X ) is a random variable. That is, {Y ≤ y} = { g( X ) ≤ y} = {ω : X (ω ) ∈ g^{-1}((−∞, y])} ∈ F .

2.3 Expectation and Moments

Definition 2.13 (Expected Value). Let X be a random variable. Then, we define E( X ) as the expected value, (mathematical) expectation, or mean of X.

1. Continuous random variable ⇒ E( X ) = ∫ x f ( x ) dx

2. Discrete random variable ⇒ E( X ) = ∑_i xi f i

Definition 2.14 (Expectation of a Function of a Random Variable). Let Y = g( X ) be a random variable. Suppose that ∫ | g( x )| f ( x ) dx < ∞. Then, we define E[Y ] = E[ g( X )] = ∫ g( x ) f ( x ) dx = ∫ y f (y) dy.

Theorem 2.5 (Preservation of Monotonicity). Let E[ gi ( X )] be an expectation for a real-valued function gi of X. Suppose that E(| gi ( X )|) = ∫ | gi ( x )| f ( x ) dx < ∞. If g1 ( x ) ≤ g2 ( x ) for all x, then E[ g1 ( X )] ≤ E[ g2 ( X )].

Proof.

Suppose that g1 ( x ) ≤ g2 ( x ) for all x.


Then, E[ g1 ( X )] − E[ g2 ( X )] = ∫ g1 ( x ) f ( x ) dx − ∫ g2 ( x ) f ( x ) dx = ∫ [ g1 ( x ) − g2 ( x )] f ( x ) dx ≤ 0.

Remarks

• Suppose that g1 ( x ) ≤ g2 ( x ) for almost every x, with E| g1 ( X )| < ∞ and E| g2 ( X )| < ∞. Then, P[ω : g1 ( X (ω )) ≤ g2 ( X (ω ))] = 1.

• That is, A = {ω : g1 ( X (ω )) ≤ g2 ( X (ω ))} with P( A) = 1 and Ac = {ω : g1 ( X (ω )) > g2 ( X (ω ))} with P( Ac ) = 0.

• Finally, E[ g1 ( X ) − g2 ( X )] = ∫_A [ g1 ( x ) − g2 ( x )] f ( x ) dx + ∫_{Ac} [ g1 ( x ) − g2 ( x )] f ( x ) dx ≤ 0.

Theorem 2.6 (Properties of Expectation).

1. When c is constant, E(c) = c

2. E(cX ) = cE( X ) (cf. E( XY | X ) = XE(Y | X ))

3. Linear Operator: E( X + Y ) = E( X ) + E(Y )

4. If X ⊥ Y, then E( XY ) = E( X ) E(Y )

Proof.
1. ∫ c f ( x ) dx = c ∫ f ( x ) dx = c · 1 = c

2. Trivial.
3. E( X + Y ) = ∫∫ ( x + y) f ( x, y) dx dy = ∫∫ x f ( x, y) dx dy + ∫∫ y f ( x, y) dx dy

   = ∫ x [∫ f ( x, y) dy] dx + ∫ y [∫ f ( x, y) dx ] dy = ∫ x f ( x ) dx + ∫ y f (y) dy = E( X ) + E(Y )

4. It is trivial when we use f ( x, y) = f ( x ) f (y).

Definition 2.15 (Moments).

• rth moment of X ⇒ mr = µ′_r = E( X^r ) = ∫ x^r f ( x ) dx

• rth central moment of X ⇒ µr = E[( X − E( X ))^r ] = ∫ ( x − E( X ))^r f ( x ) dx


Example 2.2.
1. E( X ) = ∑_i xi f i , X̄ = (1/n) ∑_i xi

2. Var ( X ) = E[( X − E( X ))^2 ]

3. Skewness = E[( X − E( X ))^3 ]

4. Kurtosis = E[( X − E( X ))^4 ]

Definition 2.16 (Moment Generating Function). For a continuous random variable X,

• MX (t) = E[e^{tX} ] = ∫ e^{tx} f ( x ) dx for −h < t < h, for some small h > 0

• dMX (t)/dt = ∫ x e^{tx} f ( x ) dx

• d^r MX (t)/dt^r = ∫ x^r e^{tx} f ( x ) dx

• µ′_r = E[ X^r ] = d^r MX (t)/dt^r |_{t=0}

For a discrete random variable X,

• MX (t) = E[e^{tX} ] = ∑_i e^{t xi} f ( xi ), where e^x = ∑_{i=0}^{∞} x^i / i!

• µ′_r = E[ X^r ] = d^r MX (t)/dt^r |_{t=0}
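The sketch below checks these relations for an Exponential(λ) random variable, whose mgf is M(t) = λ/(λ − t) for t < λ: finite-difference derivatives of M at t = 0 are compared with simulated moments. The parameter, step size, and sample size are illustrative choices.

```python
import numpy as np

lam = 2.0
M = lambda t: lam / (lam - t)       # mgf of Exponential(lam), valid for t < lam

h = 1e-4
m1 = (M(h) - M(-h)) / (2 * h)                 # ~ M'(0)  = E[X]   = 1/lam
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2       # ~ M''(0) = E[X^2] = 2/lam^2

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / lam, size=1_000_000)
print(m1, x.mean())        # both ~ 0.5
print(m2, (x**2).mean())   # both ~ 0.5
```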

Theorem 2.7. For 0 < s < r, if E[| X |r ] exists, then E[| X |s ] < ∞.

Remarks

• There must exist h > 0 such that MX (t) = E[e^{tX} ] = ∫ e^{tx} f ( x ) dx for −h < t < h.

• The moment generating function (mgf) does not always exist for a
random variable X.

Example 2.3.

• Consider the r.v. X having pdf f ( x ) = x^{−2} I_{[1,∞)} ( x ).

⇒ If the mgf of X exists, then it is given by ∫_1^∞ x^{−2} e^{tx} dx by the definition of the mgf. However, it can be shown that the integral does not exist for any t > 0. In fact, E[ X ] = ∞.


• Cauchy distribution: t(1)

⇒ E[| X |] = ∞, and thus the mean and all higher-order moments do not exist.

Definition 2.17 (Characteristic Function).

• ϕX (t) = E[e^{itX} ] = ∫ e^{itx} f ( x ) dx, where i = √(−1)

  cf. e^{iy} = cos(y) + i sin(y)

Remarks

• ϕX (t) ⇔ FX : the characteristic function exists for any random variable X.

• | e^{itx} | = | cos(tx ) + i sin(tx ) | = √(cos^2 (tx ) + sin^2 (tx )) = 1

• d^r ϕX (t)/dt^r |_{t=0} = E[(iX )^r ] = i^r µ′_r

• MX (t) → mr (the mgf, when it exists, yields the moments mr )

• FX ( x ) ⇔ mr for all r (if mr exists for every r)

2.4 Characteristics of Distribution

Location (Representative Value)



1. Expectation: µ = µ1 = E( X ) = ∫ x f ( x ) dx
(a) E(c) = c
(b) E(cX ) = cE( X )
(c) E( X + Y ) = E( X ) + E(Y )
(d) If X ⊥Y, then E( XY ) = E( X ) E(Y ).
2. αth Quantile ξα : the smallest ξ such that FX (ξ ) ≥ α
3. Median: 0.5th quantile
(a) m or Xmed such that P( X < m) ≤ 1/2 and P( X > m) ≤ 1/2
(b) In a symmetric distribution, E( X ) = m.
4. Mode: Xmod
(a) A mode of a distribution of one random variable X is a value
of x that maximizes the pdf or pmf.
(b) There may be more than one mode. Also, there may be no
mode at all.


Measures of Dispersion

1. Variance: µ2 = Var ( X ) = E[( X − µ)^2 ]

   (a) Var (c) = 0
   (b) Var (cX ) = c^2 Var ( X )
   (c) Var ( a + bX ) = b^2 Var ( X )

2. Standard Deviation: SD ( X ) = √Var ( X ) (cf. SD ( a + bX ) = |b| SD ( X ))

3. Interquartile Range: ξ0.75 − ξ0.25

   – This is useful for an asymmetric distribution.

Skewness

1. Skewness: µ3 = E[( X − µ)^3 ]

   (a) µ3 > 0: skewed to the right
   (b) µ3 = 0: symmetric
   (c) µ3 < 0: skewed to the left

2. Skewness Coefficient: unit-free measure

   µ3 / σ^3 = E[( X − µ)^3 ] / ( E[( X − µ)^2 ])^{3/2}

Kurtosis

1. Kurtosis: µ4 = E[( X − µ)^4 ]

   (a) µ4 /σ^4 > 3: long tail (leptokurtic)
   (b) µ4 /σ^4 = 3: normal (mesokurtic)
   (c) µ4 /σ^4 < 3: short tail (platykurtic)

2. Kurtosis Coefficient: unit-free measure

   µ4 / σ^4 = E[( X − µ)^4 ] / ( E[( X − µ)^2 ])^2
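Sample versions of the two unit-free coefficients are easy to compute; the sketch below estimates them for a normal sample (≈ 0 and ≈ 3) and an exponential sample (≈ 2 and ≈ 9). The distributions and sample size are illustrative choices.

```python
import numpy as np

def skew_kurt(x):
    """Sample skewness and kurtosis coefficients (moment estimators, no small-sample correction)."""
    c = x - x.mean()
    s2 = np.mean(c**2)
    return np.mean(c**3) / s2**1.5, np.mean(c**4) / s2**2

rng = np.random.default_rng(3)
print(skew_kurt(rng.normal(size=1_000_000)))       # ~ (0, 3): symmetric, mesokurtic
print(skew_kurt(rng.exponential(size=1_000_000)))  # ~ (2, 9): skewed right, leptokurtic
```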


2.5 Inequalities

Theorem 2.8 (Markov Inequality). Let X be a random variable and g(·) a non-negative Borel measurable function. Then, for every k > 0,

P[ g( X ) ≥ k ] ≤ E[ g( X )] / k
Proof.

E[ g( X )] = ∫ g( x ) f ( x ) dx = ∫_{x: g( x )≥k} g( x ) f ( x ) dx + ∫_{x: g( x )<k} g( x ) f ( x ) dx

≥ ∫_{x: g( x )≥k} g( x ) f ( x ) dx ≥ k ∫_{x: g( x )≥k} f ( x ) dx = k P[ g( X ) ≥ k ]

Example 2.4.

• Apply the Markov inequality to g( X ) = ( X − µ)^2 , k = r^2 σX^2

⇒ Chebyshev's inequality: P[( X − µ)^2 ≥ r^2 σX^2 ] ≤ 1/r^2

• Other choices: g( X ) = | X |, g( X ) = | X |^α
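A Monte Carlo check of Chebyshev's inequality (the distribution and the values of r below are arbitrary; any distribution with finite variance works):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(size=1_000_000)
mu, sigma = x.mean(), x.std()

for r in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= r * sigma)
    print(f"r = {r}: P(|X - mu| >= r*sigma) = {empirical:.4f} <= 1/r^2 = {1 / r**2:.4f}")
```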

Theorem 2.9 (Jensen’s Inequality). Let X be a random variable with mean E[ X ],


and let g(·) be a convex function. Then E[ g( X )] ≥ g( E[ X ]).

Proof. Since g( x ) is continuous and convex, there exists a line l ( x ) satisfying l ( x ) ≤ g( x ) and l ( E[ X ]) = g( E[ X ]). By construction, l ( x ) goes through the point ( E[ X ], g( E[ X ])), and we can let l ( x ) = a + bx. That is,

E[l ( X )] = E[ a + bX ] = a + bE[ X ] = l ( E[ X ])

⇒ g( E[ X ]) = l ( E[ X ]) = E[l ( X )] ≤ E[ g( X )]

Theorem 2.10 (Hölder's Inequality). Let X, Y be two random variables and let p, q be numbers such that p > 1, q > 1, 1/p + 1/q = 1. Then,

E[ XY ] ≤ E[| X |^p ]^{1/p} E[| Y |^q ]^{1/q}


Example 2.5.

Apply Hölder's inequality with p = q = 2:

E[ XY ] ≤ E[ X^2 ]^{1/2} E[Y^2 ]^{1/2} : Cauchy-Schwarz inequality

⇒ Cov( X, Y ) ≤ √Var ( X ) √Var (Y ) (cf. Cov( X, Y ) = E[( X − µX )(Y − µY )])

∴ −1 ≤ ρXY = Cov( X, Y ) / (√Var ( X ) √Var (Y )) ≤ 1

3 Joint and Conditional Distributions, Stochastic Independence and More Expectations
3.1 Joint Distribution

Definition 3.1 (n-dimensional Random Variable).

• Let X (ω ) = ( X1 (ω ), X2 (ω ), · · · , Xn (ω )) for ω ∈ Ω be an n-dimensional function defined on (Ω, F , P) into R^n.

• X (ω ) is called an n-dimensional random variable if the inverse image of every n-dimensional interval in R^n, I = {( x1 , x2 , · · · , xn ) : −∞ < xi ≤ ai , ai ∈ R, i = 1, 2, · · · , n}, is in F .

• i.e. X^{-1}( I ) = {ω : X1 (ω ) ≤ a1 , · · · , Xn (ω ) ≤ an } ∈ F .

Theorem 3.1 (Construction of an n-dimensional Random Variable). Let Xi , i = 1, · · · , n, each be a one-dimensional random variable. Then, X = ( X1 , · · · , Xn ) is an n-dimensional random variable.

Definition 3.2 (Joint Cumulative Distribution Function). Let X be an n-dimensional random variable, X = ( X1 , · · · , Xn ). Then, the joint cumulative distribution function of X is defined as

FX ( x1 , · · · , xn ) = FX1 ,··· ,Xn ( x1 , · · · , xn ) = P[ω : X1 (ω ) ≤ x1 , · · · , Xn (ω ) ≤ xn ]

for each ( x1 , · · · , xn ) ∈ R^n.

Theorem 3.2 (Properties of Joint Cumulative Distribution Function).

1. Non-decreasing with respect to all arguments x1 , · · · , xn


2. Right continuous with respect to all arguments x1 , · · · , xn

cf. lim_{0<h→0} F ( x + h, y) = lim_{0<h→0} F ( x, y + h) = F ( x, y)

3. F (+∞, +∞) = 1, FXY (−∞, y) = FXY ( x, −∞) = 0 for all x, y

4. F ( x2 , y2 ) − F ( x2 , y1 ) − F ( x1 , y2 ) + F ( x1 , y1 ) ≥ 0 (∵ P[ x1 < X ≤ x2 , y1 < Y ≤ y2 ] ≥ 0)

Definition 3.3 (Joint Probability Mass Function). Let X = ( X1 , X2 , . . . , Xn )


be a discrete random vector with distinct values a1 , a2 , . . . , ak ∈ Rn . Then the
function, denoted by f X ( ai ) = P[ X = ai ], such that

• f X ( x ) > 0 for x = ai , i = 1, . . . , k

• f X ( x ) = 0 for x ̸= ai

• ∑i f X ( ai ) = 1

is called the joint probability mass function of X.

Definition 3.4 (Joint Probability Density Function). Let X = ( X1 , X2 , . . . , Xn ) be a continuous random vector and FX1 ,...,Xn be its cumulative distribution function. Then a function f X1 ,...,Xn such that

FX1 ,...,Xn ( x1 , x2 , . . . , xn ) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} f (t1 , t2 , . . . , tn ) dt1 · · · dtn

exists, and it is called the joint probability density function of X.

Remarks

• f ( x1 , . . . , xn ) ≥ 0, ∀( x1 , . . . , xn )

• f ( x1 , . . . , xn ) = ∂^n F ( x1 , . . . , xn ) / (∂x1 · · · ∂xn )

• ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f (t1 , t2 , . . . , tn ) dt1 · · · dtn = 1

3.2 Marginal Distribution

Definition 3.5 (Marginal Distribution). Let X, Y be two random variables. Then


the marginal distributions of X and Y are:

FX ( x ) = FXY ( x, +∞) = P[ X ≤ x, Y < +∞]

FY (y) = FXY (+∞, y) = P[ X < +∞, Y ≤ y]


Definition 3.6 (Marginal Probability Density Function). Let X, Y be two random variables and let f X,Y ( x, y) be the joint pdf of X, Y. Then the marginal probability density functions of X and Y are:

• (Discrete case)

  f X ( xi ) = ∑_j f ( xi , yj )

  f Y (yj ) = ∑_i f ( xi , yj )

• (Continuous case)

  f X ( x ) = ∫ f ( x, y) dy

  f Y (y) = ∫ f ( x, y) dx
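In the discrete case the marginals are just row and column sums of the joint pmf table. The 2 x 3 table below is a hypothetical example used only to show the computation.

```python
import numpy as np

# Hypothetical joint pmf f(x_i, y_j): rows index x, columns index y.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

f_X = joint.sum(axis=1)    # marginal pmf of X: sum over y
f_Y = joint.sum(axis=0)    # marginal pmf of Y: sum over x

print(f_X, f_X.sum())      # [0.4 0.6], sums to 1
print(f_Y, f_Y.sum())      # [0.25 0.45 0.3], sums to 1
```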

3.3 Conditional Distribution

Definition 3.7 (Conditional Probability Distribution Function). Let X, Y be two random variables. Then the conditional distribution of X given Y is:

FX|Y ( x | y) = P( X ≤ x | Y = y)

and the conditional density of X given Y is:

f X|Y ( x | y) = ∂FX|Y ( x | y)/∂x (Continuous)

f X|Y ( x | y) = P( X = x | Y = y) (Discrete)

FX|Y ( x | y) = ∫_{−∞}^{x} f (u | y) du

Remarks

• FX|Y ( x | y) = ∫_{−∞}^{x} [ f X,Y (u, y)/ f Y (y)] du

• ∂FX|Y ( x | y)/∂x = f X,Y ( x, y)/ f Y (y)

Theorem 3.3 (Alternative Derivation of Conditional Density).

f X|Y ( x | y) = f X,Y ( x, y) / f Y (y), if f Y (y) > 0


Proof. First, consider discrete random variables X, Y. Let Ax = {ω : X (ω ) = x }, By = {ω : Y (ω ) = y}. Then we have

f X|Y ( x | y) = P( X = x | Y = y) = P( Ax | By ) = P( Ax ∩ By )/P( By )

= P({ω : X (ω ) = x, Y (ω ) = y}) / P({ω : Y (ω ) = y}) = f X,Y ( x, y)/ f Y (y)

Next, consider continuous random variables X, Y. Let Ax = {ω : X (ω ) ≤ x } and Bε = {ω : y − ε ≤ Y (ω ) ≤ y + ε}. Define By = limε→0 Bε . Then we have

FX|Y ( x | y) = P( Ax | By ) = limε→0 P({ω : X (ω ) ≤ x, y − ε ≤ Y (ω ) ≤ y + ε}) / P({ω : y − ε ≤ Y (ω ) ≤ y + ε})

= limε→0 (1/2ε) ∫_{y−ε}^{y+ε} ∫_{−∞}^{x} f X,Y (u, v) du dv / [ limε→0 (1/2ε) ∫_{y−ε}^{y+ε} f Y (v) dv ]

= ∫_{−∞}^{x} f X,Y (u, y) du / f Y (y) = ∫_{−∞}^{x} [ f X,Y (u, y)/ f Y (y)] du

Therefore, f X|Y ( x | y) = f X,Y ( x, y)/ f Y (y).

3.4 Independence of Random Variables

Definition 3.8 (Independence of Random Variables). The random variables


X and Y are said to be independent if

f X,Y ( x, y) = f X ( x ) f Y (y) ( P( A x ∩ By ) = P( A x ) P( By ))

Random variables that are not independent are said to be dependent.

Theorem 3.4. X and Y are independent if and only if

FX,Y ( x, y) = FX ( x ) FY (y) ∀( x, y) ∈ R2

Proof.

⇐) By partial differentiation.

⇒) FX,Y ( x, y) = P({ω : X (ω ) ≤ x, Y (ω ) ≤ y}) = P({ω : X (ω ) ≤ x } ∩ {ω : Y (ω ) ≤ y})

= P({ω : X (ω ) ≤ x }) P({ω : Y (ω ) ≤ y}) = FX ( x ) FY (y)


Definition 3.9 (Pairwise and Mutual Independence). Let X1 , X2 , · · · , Xn be


random variables.

• X1 , . . . , Xn are pairwise independent if Xi ⊥ Xj for all i, j = 1, 2, · · · , n, i ≠ j.

• X1 , . . . , Xn are mutually independent if for any collection of k of them, ( Xi1 , Xi2 , . . . , Xik ) ⊂ ( X1 , X2 , . . . , Xn ), k = 2, 3, . . . , n,

  FXi1 ,··· ,Xik ( xi1 , · · · , xik ) = ∏_{j=1}^{k} FXij ( xij )

Theorem 3.5 (Preservation of Independence). Let X, Y be random variables


and g1 , g2 be Borel-measurable functions. If X ⊥Y, then g1 ( X )⊥ g2 (Y ).

Proof.

P({ g1 ( X ) ≤ x, g2 (Y ) ≤ y}) = P({ g1 ( X ) ∈ (−∞, x ], g2 (Y ) ∈ (−∞, y]})

= P({ X ∈ g1^{-1}((−∞, x ]), Y ∈ g2^{-1}((−∞, y])})

= P({ X ∈ g1^{-1}((−∞, x ])}) P({Y ∈ g2^{-1}((−∞, y])})

= P({ g1 ( X ) ∈ (−∞, x ]}) P({ g2 (Y ) ∈ (−∞, y]})

= P({ g1 ( X ) ≤ x }) P({ g2 (Y ) ≤ y})

Definition 3.10 (Identically Distributed Random Variables). Let X, Y be random variables. X and Y are identically distributed if FX ( a) = FY ( a) ∀ a ∈ R, and we denote X =^d Y.

Theorem 3.6. If Xi (i = 1, 2, · · · , n) are independent and identically distributed,

FX1 ,··· ,Xn ( x1 , · · · , xn ) = ∏_{i=1}^{n} FX ( xi )

Definition 3.11 (Moment Generating Function of Joint Distribution). For a random vector X = ( X1 , X2 , · · · , Xn )′ , the moment generating function is

MX (t) = E[e^{t′X} ] = E[e^{t1 X1 + t2 X2 + · · · + tn Xn} ] < ∞, for −hi < ti < hi (i = 1, 2, . . . , n), for some hi > 0


Definition 3.12 (Cross Moments).

µ′_{r1,r2} = E[ X1^{r1} X2^{r2} ]: (r1 , r2 )th cross moment

µ_{r1,r2} = E[( X1 − µ1 )^{r1} ( X2 − µ2 )^{r2} ]: (r1 , r2 )th cross central moment

Remarks

µ′_{r1,r2} = ∂^{r1+r2} MX,Y (t1 , t2 ) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}

i^{r1+r2} µ′_{r1,r2} = ∂^{r1+r2} ϕX,Y (t1 , t2 ) / (∂t1^{r1} ∂t2^{r2}) |_{t1=t2=0}

(ϕX,Y : characteristic function)

Theorem 3.7. X1 , X2 , . . . , Xn are mutually independent if and only if

MX1 ,X2 ,··· ,Xn (t1 , t2 , · · · tn ) = MX1 (t1 ) MX2 (t2 ) · · · MXn (tn )

Theorem 3.8. Let X ⊥Y and g1 , g2 be Borel-measurable functions. Then,

E[ g1 ( X ) g2 (Y )] = E[ g1 ( X )] E[ g2 (Y )]

Remarks

• A trivial corollary of the theorem is that X ⊥Y ⇒ Cov( X, Y ) = 0

Theorem 3.9. Let X1 , X2 , . . . , Xn be random variables and let S = ∑_{i=1}^{n} ai Xi . Then,

Var (S) = ∑_{i=1}^{n} ai^2 Var ( Xi ) + ∑_{i≠j} ai aj Cov( Xi , Xj )

If X1 , X2 , . . . , Xn are independent,

Var (S) = ∑_{i=1}^{n} ai^2 Var ( Xi )
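The variance formula for S = ∑ ai Xi can be verified by simulation. In the sketch below, X2 is built to be correlated with X1 (Cov( X1 , X2 ) = 0.6 by construction); the weights, correlation, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000
a = np.array([1.0, -2.0, 0.5])

z = rng.normal(size=(n, 3))                          # independent standard normals
X = np.column_stack([z[:, 0],
                     0.6 * z[:, 0] + 0.8 * z[:, 1],  # Var = 1, Cov with X1 = 0.6
                     z[:, 2]])

S = X @ a
cov = np.cov(X, rowvar=False)
formula = sum(a[i] * a[j] * cov[i, j] for i in range(3) for j in range(3))
# the i = j terms give sum a_i^2 Var(X_i); the i != j terms give the covariance part

print(S.var(ddof=1), formula)    # both ~ 1 + 4 + 0.25 + 2*(1)(-2)(0.6) = 2.85
```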


3.5 Conditional Expectation

Definition 3.13 (Conditional Expectation). Let X be an integrable random variable on (Ω, F , P) and let G be a sub-σ-field of F (G ⊂ F ). Then there exists a random variable E[ X | G ], called the conditional expected value of X given G , with the following properties:

(1) E[ X | G ] is G -measurable and integrable.

(2) E[ X | G ] satisfies the functional equation

    ∫_G E[ X | G ] dP = ∫_G X dP, G ∈ G

Definition 3.14 (Conditional Mean). Let X, Y be random variables and h(·) be a Borel-measurable function. Then,

E[h( X ) | Y = y] = ∑_i h( xi ) f ( xi | y) (Discrete)

              = ∫ h( x ) f ( x | y) dx (Continuous)

Remarks

E[h( X )|Y ] is also a random variable.

Theorem 3.10 (Properties of Conditional Expectation).

1. E[c | Y ] = c, c : constant

2. For h1 (), h2 (), Borel-measurable functions

E[c1 h1 ( X ) + c2 h2 ( X )|Y ] = c1 E[h1 ( X )|Y ] + c2 E[h2 ( X )|Y ]

3. P[ X ≥ 0] = 1 ⇒ E [ X |Y ] ≥ 0

4. P[ X1 ≥ X2 ] = 1 ⇒ E [ X1 | Y ] ≥ E [ X 2 | Y ]

5. ϕ(·): A function of X, Y ⇒ E[ϕ( X, Y )|Y = y] = E[ϕ( X, y)|Y = y]

6. Ψ (·): A Borel-measurable function ⇒ E[Ψ ( X )ϕ( X, Y )| X ] = Ψ ( X ) E[ϕ( X, Y )| X ]


Theorem 3.11 (Law of Iterated Expectations). Let X, Y be random variables and suppose E[h( X )] exists. Then,

E[ E[h( X ) | Y ]] = E[h( X )]

Proof.

E[h( X )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h( x ) f X,Y ( x, y) dx dy

= ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} h( x ) ( f X,Y ( x, y)/ f Y (y)) dx ] f Y (y) dy

= ∫_{−∞}^{∞} E[h( X ) | Y = y] f Y (y) dy = E[ E[h( X ) | Y ]]
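A Monte Carlo check of the law of iterated expectations, using a simple hypothetical model in which E[ X | Y ] = Y (Y uniform on {1, 2, 3} and X | Y = y ~ N(y, 1)):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

Y = rng.integers(1, 4, size=n)           # Y uniform on {1, 2, 3}
X = rng.normal(loc=Y, scale=1.0)         # X | Y = y ~ N(y, 1), so E[X | Y] = Y

cond_mean = np.array([X[Y == y].mean() for y in (1, 2, 3)])   # estimates of E[X | Y = y]
outer = cond_mean[Y - 1].mean()                               # E[ E[X | Y] ]

print(outer, X.mean())    # both ~ 2.0 = E[X]
```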

Definition 3.15 (Conditional Variance). Let X, Y be random variables and E[ X |Y ]


be a conditional expectation of X given Y. Then,

Var ( X |Y ) = E[( X − E[ X |Y ])2 |Y ]

Theorem 3.12. Let X, Y be random variables with finite variances. Then,

1. Var ( X |Y ) = E[ X 2 |Y ] − ( E[ X |Y ])2

2. Var ( X ) = E[Var ( X |Y )] + Var ( E[ X |Y ])

Proof.

1. E[( X − E[ X | Y ])^2 | Y ] = E[ X^2 − 2X E[ X | Y ] + ( E[ X | Y ])^2 | Y ]

   = E[ X^2 | Y ] − 2E[ X E[ X | Y ] | Y ] + E[( E[ X | Y ])^2 | Y ]

   = E[ X^2 | Y ] − ( E[ X | Y ])^2

2. E[Var ( X | Y )] = E[ E[ X^2 | Y ] − ( E[ X | Y ])^2 ]

   = E[ X^2 ] − ( E[ X ])^2 − ( E[( E[ X | Y ])^2 ] − ( E[ X ])^2 )

   = Var ( X ) − Var ( E[ X | Y ])

   ∴ Var ( X ) = E[Var ( X | Y )] + Var ( E[ X | Y ])
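The decomposition Var ( X ) = E[Var ( X | Y )] + Var ( E[ X | Y ]) can also be checked by simulation. In the hypothetical model below, Y is Bernoulli(1/2) and X | Y ~ N(3Y, (1 + Y)^2), so the exact value is (1 + 4)/2 + 9/4 = 4.75.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

Y = rng.integers(0, 2, size=n)                    # Y ~ Bernoulli(1/2)
X = rng.normal(loc=3.0 * Y, scale=1.0 + Y)        # X | Y ~ N(3Y, (1 + Y)^2)

cond_mean = np.array([X[Y == y].mean() for y in (0, 1)])
cond_var = np.array([X[Y == y].var() for y in (0, 1)])

lhs = X.var()
# equal weights below are valid because P(Y = 0) = P(Y = 1) = 1/2
rhs = cond_var.mean() + cond_mean.var()           # E[Var(X|Y)] + Var(E[X|Y])

print(lhs, rhs)    # both ~ 4.75
```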
