Moreover, if f is strictly convex, then E[f(X)] = f(E[X]) holds if and only if
X = E[X] with probability 1 (i.e., X is almost surely a constant).
Jensen’s inequality also holds for concave functions f, but with the direction of all the
inequalities reversed (E[f(X)] ≤ f(E[X]), etc.).
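As a quick sanity check (not part of the original notes), the following sketch verifies both directions numerically; the discrete distribution and the choices f(x) = x² (strictly convex) and log x (concave) are arbitrary illustrations:

```python
# Numerical sanity check of Jensen's inequality (a minimal sketch; the
# distribution and the functions below are arbitrary illustrative choices).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])      # support of X
p = np.array([0.1, 0.2, 0.3, 0.4])      # probabilities, summing to 1

f = lambda t: t ** 2                     # strictly convex
g = np.log                               # concave

E_X = np.dot(p, x)
print(np.dot(p, f(x)) >= f(E_X))         # True: E[f(X)] >= f(E[X])
print(np.dot(p, g(x)) <= g(E_X))         # True: E[g(X)] <= g(E[X]) (reversed)

# For a constant random variable the strict-convexity equality case holds.
x_const = np.full(4, 2.5)
print(np.isclose(np.dot(p, f(x_const)), f(np.dot(p, x_const))))  # True
```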
If the values x_i and f(x_i) both lie in the domain of Φ, we can replace x_i with f(x_i) and
still get

Φ(∑_{i=1}^{n} p_i f(x_i)) ≤ ∑_{i=1}^{n} p_i Φ(f(x_i))
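A minimal numerical check of this composed form, assuming the illustrative choices Φ(y) = eʸ and f(x) = sin x (neither is prescribed by the notes):

```python
# Check of the composed finite form (a sketch; Phi and f are assumptions).
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0])
p = np.array([0.25, 0.25, 0.25, 0.25])   # weights p_i, summing to 1

f = np.sin                                # maps x_i into the domain of Phi
Phi = np.exp                              # convex on the whole real line

lhs = Phi(np.dot(p, f(x)))                # Phi(sum_i p_i f(x_i))
rhs = np.dot(p, Phi(f(x)))                # sum_i p_i Phi(f(x_i))
print(lhs <= rhs)                         # True
```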
For the continuous case with ∫_{x∈S} p(x) dx = 1, if the values x and f(x) both lie in the
domain of Φ, we similarly get

Φ(∫_{x∈S} f(x) p(x) dx) ≤ ∫_{x∈S} Φ(f(x)) p(x) dx
Suppose Ω is a measurable subset of the real line and f(x) is a non-negative function
such that

∫_{−∞}^{∞} f(x) dx = 1
Then Jensen’s inequality becomes the following statement about convex integrals:
If g is any real-valued measurable function and φ is convex over the range of g, then

φ(∫_{−∞}^{∞} g(x) f(x) dx) ≤ ∫_{−∞}^{∞} φ(g(x)) f(x) dx.
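The integral form can be spot-checked with numerical quadrature; in this sketch the density f, the function g, and φ(y) = y² are all assumed for illustration (taking g(x) = x here also matches the special case noted just below):

```python
# Checking the integral form numerically (a sketch under assumed choices of
# f, g, and phi; f is the standard normal density, which integrates to 1).
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)  # density
g = lambda x: x
phi = lambda y: y ** 2                                    # convex

lhs = phi(quad(lambda x: g(x) * f(x), -np.inf, np.inf)[0])  # phi(E[g(X)]) = 0
rhs = quad(lambda x: phi(g(x)) * f(x), -np.inf, np.inf)[0]  # E[phi(g(X))] = 1
print(lhs <= rhs)                                           # True: 0 <= 1
```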
If g(x) = x, then this form of the inequality reduces to a commonly used special case:
φ(∫_{−∞}^{∞} x f(x) dx) ≤ ∫_{−∞}^{∞} φ(x) f(x) dx.
Let Ω = {x_1, …, x_n} and take µ to be the counting measure on Ω; then the general
form reduces to a statement about sums:

φ(∑_{i=1}^{n} g(x_i) f(x_i)) ≤ ∑_{i=1}^{n} φ(g(x_i)) f(x_i)

provided that the weights λ_i = f(x_i) satisfy λ_i ≥ 0 and λ_1 + ⋯ + λ_n = 1.
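The same kind of check for the counting-measure (sum) form, with made-up weights λ_i and illustrative choices of g and φ:

```python
# Counting-measure / sum form of Jensen's inequality (a sketch; the weights
# lambda_i, the points x_i, g, and phi are all illustrative assumptions).
import numpy as np

lam = np.array([0.2, 0.3, 0.5])          # lambda_i >= 0, summing to 1
x = np.array([-1.0, 0.0, 2.0])
g = lambda t: 2 * t + 1
phi = np.abs                              # convex

print(phi(np.dot(lam, g(x))) <= np.dot(lam, phi(g(x))))  # True
```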
Gibbs’ inequality
If p(x) is the true probability distribution for x, and q(x) is another distribution, then
applying Jensen’s inequality for the random variable Y(x) = q(x)/p(x) and the
function φ(y) = −log(y) gives
E[φ(Y)] ≥ φ(E[Y])

Therefore:

D(p‖q) = ∫ p(x) log(p(x)/q(x)) dx = E[−log(Y)]
       ≥ −log(E[Y]) = −log(∫ p(x) · q(x)/p(x) dx) = −log(∫ q(x) dx) = 0
This shows that the average message length is minimized when codes are assigned on the
basis of the true probabilities p rather than any other distribution q. The non-negative
quantity D(p‖q) is called the Kullback–Leibler divergence of q from p.
Since −log(x) is a strictly convex function for x > 0, equality holds if and only if
p(x) equals q(x) almost everywhere.
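A small numerical illustration of Gibbs’ inequality; the two discrete distributions p and q below are made-up examples, not from the notes:

```python
# Quick check of Gibbs' inequality / non-negativity of KL divergence
# (a sketch; p and q are arbitrary example distributions).
import numpy as np

p = np.array([0.5, 0.3, 0.2])             # "true" distribution
q = np.array([0.4, 0.4, 0.2])             # any other distribution

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_x p(x) log(p(x)/q(x))."""
    return np.sum(p * np.log(p / q))

print(kl(p, q) >= 0)                      # True: D(p || q) is non-negative
print(np.isclose(kl(p, p), 0.0))          # True: equality when q equals p

# Average code length (cross entropy) is minimized by coding with p itself:
print(-np.dot(p, np.log(q)) >= -np.dot(p, np.log(p)))  # True
```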
Notes:
Compared with other notes, the notes from Richard Yida Xu and Wikipedia seem to
better match the derivation of EM and the KL-divergence.
Reference