
BST 401 Probability Theory

Xing Qiu Ha Youn Lee

Department of Biostatistics and Computational Biology


University of Rochester

September 30, 2009

Qiu, Lee BST 401


Outline

1 Basic Properties of Integrals

2 Useful Inequalities

3 Convergence Theorems



Review

Random variables and Borel-measurable functions.
Simple functions.
Two ingredients define a Lebesgue-Stieltjes integral (for simple functions): the integrand and the measure.
Example of changing either one of them: one measure is Lebesgue measure, the other a probability measure.
You can say that the Lebesgue integral w.r.t. $\mu$ is just a weighted Riemann integral/summation.
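The "weighted summation" view can be sketched numerically. The snippet below is only an illustration under assumptions that are not in the slides: the simple function, its interval pieces, and the choice of the Exponential(1) distribution as the probability measure are all hypothetical. It integrates one simple function against two different measures using $\int h \, d\mu = \sum_i a_i \mu(A_i)$.

```python
import math

# A simple function h = sum_i a_i * 1_{A_i}, stored as (a_i, (lo_i, hi_i)) pairs,
# where each A_i = (lo_i, hi_i] is an interval. The values are hypothetical.
simple_function = [(1.0, (0.0, 1.0)), (3.0, (1.0, 2.0)), (2.0, (2.0, 4.0))]

def lebesgue_measure(interval):
    """Length of the interval."""
    lo, hi = interval
    return hi - lo

def exponential_cdf(x):
    """CDF of the Exponential(1) distribution (an assumed probability measure)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exponential_measure(interval):
    """Probability that an Exponential(1) variable falls in (lo, hi]."""
    lo, hi = interval
    return exponential_cdf(hi) - exponential_cdf(lo)

def integrate_simple(h, measure):
    """Integral of a simple function: sum of value * measure(piece)."""
    return sum(a * measure(A) for a, A in h)

# Same integrand, two measures: only the weights mu(A_i) change.
lebesgue_value = integrate_simple(simple_function, lebesgue_measure)
probability_value = integrate_simple(simple_function, exponential_measure)
print(lebesgue_value, probability_value)
```

Keeping the measure as a function argument makes the point of the slide explicit: the integrand is untouched, and swapping the measure only reweights the pieces.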



Linearity

Assume $h$, $g$ are two functions, both integrable w.r.t. the same measure $\mu$.
$l(\omega) = c_1 h(\omega) + c_2 g(\omega)$ is called a linear combination of $h$ and $g$ ($c_1$, $c_2$ are two constants).
$\int_\Omega (c_1 h + c_2 g) \, d\mu = c_1 \int_\Omega h \, d\mu + c_2 \int_\Omega g \, d\mu$.



Other properties

Theorem (4.7), page 458.


How to prove? First prove these properties are true for
simple functions. Then take limits to generalize them to
measurable functions.
All these equalities/inequalities can be replaced by their
“almost everywhere” counterparts.



Jensen’s inequality

A function $\varphi(x)$ is convex if for all $\lambda \in (0, 1)$ and $x, y \in \mathbb{R}$,

\[ \lambda \varphi(x) + (1 - \lambda) \varphi(y) \ge \varphi(\lambda x + (1 - \lambda) y). \]

Show students a graph and explain why this definition is more general than a simpler definition via the 2nd derivative.
Jensen’s inequality. Let $X = X(\omega)$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, \mu)$, and let $E(X) := \int_\Omega X(\omega) \, d\mu(\omega)$, which is called the mathematical expectation of $X$. If $\varphi$ is convex and $X$ is integrable ($E|X| < \infty$), then

\[ \varphi(E(X)) \le E(\varphi(X)). \]

If $\varphi$ is concave, then we have the opposite inequality.
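A quick numerical check of both directions of Jensen’s inequality, as a minimal sketch: the discrete random variable below (its values and probabilities) is a made-up example, and the convex and concave functions are chosen as $x^2$ and $\log x$ purely for illustration.

```python
import math

# Hypothetical discrete random variable: P(X = values[i]) = probs[i].
values = [1.0, 2.0, 6.0]
probs = [0.5, 0.3, 0.2]

def expectation(g, values, probs):
    """E[g(X)] for a discrete random variable."""
    return sum(p * g(x) for x, p in zip(values, probs))

mean = expectation(lambda x: x, values, probs)

# Convex case, phi(x) = x^2: phi(E X) <= E phi(X).
convex_lhs = mean ** 2
convex_rhs = expectation(lambda x: x * x, values, probs)

# Concave case, phi(x) = log x: the inequality flips.
concave_lhs = math.log(mean)
concave_rhs = expectation(math.log, values, probs)

print(convex_lhs, "<=", convex_rhs)
print(concave_lhs, ">=", concave_rhs)
```

For $\varphi(x) = x^2$ the gap between the two sides is exactly the variance of $X$, which is one way to see why the inequality must hold in that case.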



Heuristics of Jensen’s inequality

A graphical proof based on $\varphi(x) = x^2$, $\Omega = [a, b]$.
A convex transformation “accentuates” the extreme values of $X$ and a concave transformation “attenuates” these extreme values.
Modern microeconomics depends on several assumptions, one of which is the famous “law” of diminishing marginal returns of virtually everything. One example: you may choose between two investment portfolios. One is more aggressive (high return, high risk) and the other more conservative (low return, low risk).
Diminishing marginal returns means the utility function is concave, so by Jensen’s inequality the expected utility of a risky payoff is at most the utility of its expected value.
In this context, Jensen’s inequality is the foundation of pricing theory in the insurance industry and financial markets.
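The portfolio example can be made concrete with a small sketch. Everything in it is an assumption for illustration: the square-root utility (a standard concave choice), the two lotteries, and the helper names are not from the lecture. It compares each portfolio's expected payoff with its certainty equivalent, the sure amount that has the same utility as the risky payoff.

```python
import math

def expected_value(lottery):
    """Mean payoff of a lottery given as (payoff, probability) pairs."""
    return sum(p * x for x, p in lottery)

def expected_utility(lottery, utility):
    """Mean utility of the payoff."""
    return sum(p * utility(x) for x, p in lottery)

# Assumed concave utility u(w) = sqrt(w), with inverse u^{-1}(v) = v^2.
utility = math.sqrt

def certainty_equivalent(lottery):
    """Sure amount whose utility equals the lottery's expected utility."""
    return expected_utility(lottery, utility) ** 2

# Hypothetical portfolios: (payoff, probability).
aggressive = [(25.0, 0.5), (225.0, 0.5)]      # high return, high risk
conservative = [(100.0, 0.5), (144.0, 0.5)]   # low return, low risk

for name, lottery in [("aggressive", aggressive), ("conservative", conservative)]:
    mean = expected_value(lottery)
    ce = certainty_equivalent(lottery)
    # Jensen with a concave utility gives ce <= mean; the gap is the risk premium.
    print(name, "mean:", mean, "certainty equivalent:", ce, "risk premium:", mean - ce)
```

In this made-up example the aggressive portfolio has the larger expected payoff but the smaller certainty equivalent, which is the kind of trade-off an insurance premium prices.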



Hölder’s inequality

We use $\|f\|_p$, $p \in [1, \infty)$, to denote $\left( \int_\Omega |f(x)|^p \, d\mu \right)^{1/p}$, the $L^p$ norm of $f$ w.r.t. measure $\mu$.
Later we will learn that this norm can be considered as the “length” of a measurable function $f$.
For $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$, we have

\[ \int |fg| \, d\mu \le \|f\|_p \, \|g\|_q. \]

A special case: $p = q = 2$. It’s called the Cauchy-Schwarz inequality.
Hölder’s inequality is a very important inequality in many different branches of mathematics. As an example, in probability theory, it can be used to show that a finite second moment (and hence finite variance) implies finite expectation: taking $g = 1$ and $p = q = 2$ gives $E|X| \le (E X^2)^{1/2}$.
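A small numerical check of Hölder’s inequality on a finite probability space, offered as a sketch only: the weights and the two functions are hypothetical, and the exponent $p = 3$ (so $q = 3/2$) is an arbitrary choice. The last two lines take $g = 1$ with $p = q = 2$ to bound the first absolute moment by the root of the second moment.

```python
def lp_norm(f, weights, p):
    """L^p norm of f w.r.t. the discrete measure putting mass weights[i] on point i."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)

def integral_abs_product(f, g, weights):
    """Integral of |f g| w.r.t. the same discrete measure."""
    return sum(w * abs(a * b) for a, b, w in zip(f, g, weights))

# Hypothetical probability weights and functions on a four-point space.
weights = [0.1, 0.2, 0.3, 0.4]
f = [1.0, -2.0, 0.5, 3.0]
g = [4.0, 1.0, -1.5, 0.2]

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

lhs = integral_abs_product(f, g, weights)
holder_rhs = lp_norm(f, weights, p) * lp_norm(g, weights, q)
cauchy_schwarz_rhs = lp_norm(f, weights, 2.0) * lp_norm(g, weights, 2.0)

# With g = 1 on a probability space: E|f| <= (E f^2)^(1/2).
first_abs_moment = lp_norm(f, weights, 1.0)
root_second_moment = lp_norm(f, weights, 2.0)

print(lhs, holder_rhs, cauchy_schwarz_rhs)
print(first_abs_moment, root_second_moment)
```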
Minkowski’s inequality

See the book. Left as homework.



Why are inequalities important?

Sometimes an equality is impossible to derive, so an estimate by inequality is the next best thing.
Inequalities lay the foundation of function spaces, such as the $L^p$ spaces, which will be introduced later.
All different types of convergence of random variables are described by inequalities (the $\epsilon$-$\delta$ definition).



Bounded convergence theorem

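$E_n := \{x \mid f_n(x) \neq 0\}$, $E = \bigcup_n E_n$, $\mu(E) < \infty$ (bounded measure).
$|f_n(x)| \le M < \infty$ (uniformly bounded range).
If, in addition, $f_n$ converges pointwise, then we have

\[ \lim_{n \to \infty} \int f_n \, d\mu = \int \lim_{n \to \infty} f_n \, d\mu. \]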
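As an illustration (the example and the numerical method are assumptions, not part of the slides), take $f_n(x) = x^n$ on $[0, 1]$ with Lebesgue measure. The support has measure 1, $|f_n| \le 1$, and $f_n \to 0$ for every $x < 1$, so the theorem predicts $\int f_n \, d\mu \to 0$. The sketch below approximates the integrals with a plain midpoint rule and compares them with the exact value $1/(n+1)$.

```python
def midpoint_integral(f, a, b, cells=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / cells
    return sum(f(a + (i + 0.5) * width) for i in range(cells)) * width

# Integrals of f_n(x) = x**n over [0, 1] for a few n.
# The default argument n=n freezes the current n inside each lambda.
integrals = {
    n: midpoint_integral(lambda x, n=n: x ** n, 0.0, 1.0)
    for n in (1, 10, 100, 1000)
}

for n, value in integrals.items():
    print(n, value, 1.0 / (n + 1))  # numerical value vs exact 1/(n+1)
```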



Monotone convergence theorem

Motivation: we already have monotone convergence for numbers, for sets, for measurable functions. This is just another version of the same trick for the integrals.
$f_1, f_2, \ldots$ is a monotone increasing sequence of nonnegative measurable functions. $f_n \to f$ pointwise. Then $\int_B f_n \, d\mu \to \int_B f \, d\mu$ for all $B \in \mathcal{B}$, in particular, $\int_\Omega f_n \, d\mu \to \int_\Omega f \, d\mu$.
Remember the definition of the integral by a monotone sequence of simple functions? This theorem says the integral of the limit of measurable functions, not necessarily just simple functions, is the limit of integrals.
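A worked example may help here; the function chosen is an assumption for illustration only. Take Lebesgue measure on $(0, 1]$, the unbounded function $f(x) = x^{-1/2}$, and its truncations $f_n = \min(f, n)$, which are nonnegative and increase pointwise to $f$.

```latex
\[
\int_0^1 f_n \, dx
  = \int_0^{1/n^2} n \, dx + \int_{1/n^2}^{1} x^{-1/2} \, dx
  = \frac{1}{n} + 2\left(1 - \frac{1}{n}\right)
  = 2 - \frac{1}{n}
  \;\longrightarrow\; 2
  = \int_0^1 x^{-1/2} \, dx .
\]
```

The integrals of the truncations increase to the integral of the limit, even though $f$ itself is unbounded.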



Corollaries

$\int_\Omega \left( \sum_{n=1}^{\infty} f_n \right) d\mu = \sum_{n=1}^{\infty} \int_\Omega f_n \, d\mu$ holds for nonnegative $f_n$.
Can we exchange limit/integral in general? The answer is
no.
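As a worked instance of the series corollary (the choice of functions is an assumption for illustration), take Lebesgue measure on $[0, 1]$ and the nonnegative functions $f_n(x) = x^n / n!$ for $n \ge 1$. Both sides can be computed in closed form and agree.

```latex
\[
\int_0^1 \Bigl( \sum_{n=1}^{\infty} \frac{x^n}{n!} \Bigr) dx
  = \int_0^1 \bigl( e^x - 1 \bigr) \, dx
  = e - 2,
\qquad
\sum_{n=1}^{\infty} \int_0^1 \frac{x^n}{n!} \, dx
  = \sum_{n=1}^{\infty} \frac{1}{(n+1)!}
  = e - 2 .
\]
```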



Counterexamples

A moving block: $A_n = (n, n + 1)$, $f_n = 1_{A_n}$.
With Lebesgue measure, this sequence “escapes” to x-infinity: every integral equals 1, yet $f_n \to 0$ pointwise.
With a probability measure, the above example won’t be a counterexample. Why?
A sequence which leads to the Dirac $\delta$ function (escapes to y-infinity), e.g., $f_n = n \, 1_{(0, 1/n)}$.
The above observation suggests that for a probability measure $\mu$, the only way to “break” the interchangeability of lim/integral is to escape to y-infinity.
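The two escape routes can be checked directly. The sketch below makes several assumptions for illustration: the tall spike $f_n = n \, 1_{(0, 1/n)}$ is one standard choice of a sequence escaping to y-infinity, and the Exponential(1) distribution stands in for "a probability measure". Integrals are computed exactly as height times measure of the interval.

```python
import math

def moving_block(n, x):
    """f_n = indicator of the open interval (n, n + 1)."""
    return 1.0 if n < x < n + 1 else 0.0

def tall_spike(n, x):
    """f_n = n times the indicator of (0, 1/n)."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0

def lebesgue_integral_moving_block(n):
    """Height 1 times the length of (n, n + 1)."""
    return 1.0 * ((n + 1) - n)

def lebesgue_integral_tall_spike(n):
    """Height n times the length of (0, 1/n)."""
    return n * (1.0 / n)

def exponential_integral_moving_block(n):
    """Height 1 times the Exponential(1) probability of (n, n + 1)."""
    return math.exp(-n) - math.exp(-(n + 1))

# Pointwise, both sequences are eventually 0 at any fixed x ...
sample_points = [0.001, 0.5, 3.7, 42.0]
values_at_large_n = [
    (moving_block(10**6, x), tall_spike(10**6, x)) for x in sample_points
]
print(values_at_large_n)

# ... yet the Lebesgue integrals stay at 1, so lim and integral do not commute.
for n in (1, 10, 100):
    print(n, lebesgue_integral_moving_block(n), lebesgue_integral_tall_spike(n))

# Under a probability measure the moving block's integrals do tend to 0.
print([exponential_integral_moving_block(n) for n in (1, 10, 50)])
```

Under any probability measure the disjoint intervals $(n, n+1)$ carry total mass at most 1, so their individual masses must tend to 0; the exponential case only makes that visible with numbers.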



Dominated convergence theorem

This is perhaps the single most useful convergence theorem.
The two counterexamples prompt an idea: find a $\mu$-integrable function $g$ (i.e., the integral $\int_\Omega |g| \, d\mu$ is finite) that “dominates” the sequence $f_n$.
$f_1, f_2, \ldots$, $f$, $g$ are all measurable functions.
$|f_n| \le g$ for all $n$ (in other words, $g$ dominates $|f_n|$).
$g$ is a $\mu$-integrable function.
Conclusion: $f_n \to f$ implies $\int_\Omega f_n \, d\mu \to \int_\Omega f \, d\mu$; in other words, you can swap limit and integral.
When the limit $f$ is integrable, the monotone convergence theorem for increasing sequences is just a special case of this theorem (take $g = f$).
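A worked example of the theorem, with all choices made here for illustration: take Lebesgue measure on $[0, \infty)$, $f_n(x) = e^{-x} \cos(x/n)$, and the dominating function $g(x) = e^{-x}$, which is integrable with $\int_0^\infty g \, dx = 1$. Then $|f_n| \le g$ and $f_n \to f = e^{-x}$ pointwise, and the integrals can be computed exactly.

```latex
\[
\int_0^\infty e^{-x} \cos\!\left(\frac{x}{n}\right) dx
  = \frac{1}{1 + 1/n^2}
  = \frac{n^2}{n^2 + 1}
  \;\longrightarrow\; 1
  = \int_0^\infty e^{-x} \, dx .
\]
```

The closed form uses the standard identity $\int_0^\infty e^{-x} \cos(ax) \, dx = 1/(1 + a^2)$ with $a = 1/n$.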



Corollary

This corollary is the foundation of $L^p$ convergence.
Conditions just as above, plus:
$|g|^p$ is $\mu$-integrable ($p > 0$ is a fixed constant).
Then: a) $|f|^p$ is integrable; b) $\int_\Omega |f_n - f|^p \, d\mu \to 0$.
In practice, the most popular choices of $p$ are 1 and 2.
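For a concrete case with $p = 1$ and $p = 2$ (the functions are chosen here only as an illustration), take Lebesgue measure on $[0, 1]$, $f_n(x) = x^{1/n}$, $f = 1_{(0, 1]}$, and $g = 1$. Then $|f_n| \le g$, $|g|^p$ is integrable, and $f_n \to f$ pointwise.

```latex
\[
\int_0^1 |f_n - f| \, dx
  = 1 - \frac{n}{n+1}
  = \frac{1}{n+1} \to 0,
\qquad
\int_0^1 |f_n - f|^2 \, dx
  = 1 - \frac{2n}{n+1} + \frac{n}{n+2}
  = \frac{2}{(n+1)(n+2)} \to 0 .
\]
```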
