
Differential Entropy

Definition
Let $X$ be a random variable with cumulative distribution function $F(x) = \Pr(X \le x)$. If $F(x)$ is continuous, the random variable is said to be continuous. Let $f(x) = F'(x)$ when the derivative is defined. If $\int f(x)\,dx = 1$, then $f(x)$ is called the probability density function (pdf) of $X$.

The set where $f(x) > 0$ is called the support set of $X$.



Definition
The differential entropy $h(X)$ of a continuous r.v. $X$ with density function $f(x)$ is defined as

$h(X) = -\int_S f(x) \log f(x)\,dx$   (1)

where $S$ is the support set of the r.v. Since $h(X)$ depends only on $f(x)$, the differential entropy is sometimes written as $h(f)$ rather than $h(X)$.

Ex. 1: (Uniform distribution)

$f(x) = \begin{cases} 1/a, & 0 \le x < a \\ 0, & \text{otherwise} \end{cases}$

$h(X) = -\int_0^a \frac{1}{a} \log \frac{1}{a}\,dx = \log a$

Note: For $a < 1$, $\log a < 0$, and $h(X) = \log a < 0$.

However, $2^{h(X)} = 2^{\log a} = a$ is the volume of the support set, which is always non-negative.
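As a quick numerical sanity check of Example 1 (a sketch, not part of the original slides; it assumes NumPy, and $a = 0.5$ is an arbitrary illustrative choice), the Riemann sum of $-f \log_2 f$ recovers $\log_2 a$, and $2^{h(X)}$ recovers the support volume $a$:

import numpy as np

# Numerical check of Example 1: h(X) = log2(a) for X ~ Uniform[0, a),
# and 2^h(X) equals the volume (length) a of the support set.
a = 0.5                              # arbitrary choice with a < 1, so h(X) < 0
x = np.linspace(0.0, a, 100_000, endpoint=False)
f = np.full_like(x, 1.0 / a)         # uniform density on [0, a)
dx = a / x.size
h = -np.sum(f * np.log2(f)) * dx     # Riemann sum for -integral of f log2 f dx

print(h, np.log2(a))                 # both = -1 bit for a = 0.5
print(2.0 ** h, a)                   # 2^h = a, the support volume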

Ex. 2: (Normal distribution)

Let $X \sim \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$. Then

$h(\phi) = -\int \phi \ln \phi\,dx = -\int \phi(x)\left[-\frac{x^2}{2\sigma^2} - \ln\sqrt{2\pi\sigma^2}\right] dx$
$\qquad = \frac{E[X^2]}{2\sigma^2} + \frac{1}{2}\ln\left(2\pi\sigma^2\right)$
$\qquad = \frac{1}{2} + \frac{1}{2}\ln\left(2\pi\sigma^2\right)$
$\qquad = \frac{1}{2}\ln e + \frac{1}{2}\ln\left(2\pi\sigma^2\right)$
$\qquad = \frac{1}{2}\ln\left(2\pi e \sigma^2\right)$ nats.

Changing the base of the logarithm, we have

$h(\phi) = \frac{1}{2}\log\left(2\pi e \sigma^2\right)$ bits.
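A Monte Carlo check of Example 2 (a sketch, not from the slides; it assumes NumPy, and $\sigma = 2$ and the sample size are arbitrary illustrative choices): the sample average of $-\ln \phi(X)$ over draws $X \sim N(0, \sigma^2)$ should approach $\frac{1}{2}\ln(2\pi e \sigma^2)$ nats.

import numpy as np

# Monte Carlo check of Example 2: for X ~ N(0, sigma^2),
# h(X) = E[-ln f(X)] should match 0.5 * ln(2*pi*e*sigma^2) nats.
rng = np.random.default_rng(0)
sigma = 2.0                          # arbitrary illustrative choice, variance sigma^2 = 4
x = rng.normal(0.0, sigma, size=1_000_000)

log_f = -0.5 * (x / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)
h_mc = -log_f.mean()                              # sample average of -ln f(X)
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(h_mc, h_exact)                 # both approx. 2.112 nats
print(h_exact / np.log(2))           # approx. 3.047 bits after changing the log base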

Theorem 1
Let $X_1, X_2, \ldots, X_n$ be a sequence of r.v.s drawn i.i.d. according to the density $f(x)$. Then

$-\frac{1}{n} \log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X)$ in probability.

proof: The proof follows directly from the weak law of large numbers.
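A small simulation illustrating Theorem 1 (a sketch, not from the slides; it assumes NumPy, $X_i \sim N(0,1)$, and arbitrary trial counts): the per-trial values of $-\frac{1}{n}\sum_i \log f(X_i)$ concentrate around $h(X)$ as $n$ grows.

import numpy as np

# Sketch of Theorem 1 (AEP): for X_1,...,X_n i.i.d. ~ N(0,1),
# -(1/n) log f(X_1,...,X_n) = -(1/n) sum_i log f(X_i) -> h(X) in probability.
rng = np.random.default_rng(1)
h_true = 0.5 * np.log(2 * np.pi * np.e)          # h(X) in nats for N(0, 1), approx. 1.419

def neg_avg_log_density(n, trials=1000):
    x = rng.normal(size=(trials, n))
    log_f = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    return -log_f.sum(axis=1) / n                # one value per trial

for n in (10, 100, 10_000):
    est = neg_avg_log_density(n)
    print(n, est.mean(), est.std(), h_true)      # spread shrinks as n grows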

Def: For $\epsilon > 0$ and any $n$, we define the typical set $A_\epsilon^{(n)}$ w.r.t. $f(x)$ as follows:

$A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\},$

where $f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i)$.

Def: The volume $\mathrm{Vol}(A)$ of a set $A \subset \mathbb{R}^n$ is defined as

$\mathrm{Vol}(A) = \int_A dx_1\, dx_2 \cdots dx_n.$

Thm: The typical set $A_\epsilon^{(n)}$ has the following properties:

1. $\Pr\left(A_\epsilon^{(n)}\right) > 1 - \epsilon$ for $n$ sufficiently large.
2. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \le 2^{n(h(X)+\epsilon)}$ for all $n$.
3. $\mathrm{Vol}\left(A_\epsilon^{(n)}\right) \ge (1-\epsilon)\, 2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large.

Thm: The set $A_\epsilon^{(n)}$ is the smallest-volume set with probability $\ge 1-\epsilon$, to first order in the exponent.

The volume of the smallest set that contains most of the probability is approximately $2^{nh}$. This is an $n$-dimensional volume, so the corresponding side length is $(2^{nh})^{1/n} = 2^h$. The differential entropy is therefore the logarithm of the equivalent side length of the smallest set that contains most of the probability: low entropy implies that the r.v. is confined to a small effective volume, and high entropy indicates that the r.v. is widely dispersed.
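A rough Monte Carlo illustration of property 1 above (a sketch, not from the slides; it assumes NumPy, $X_i \sim N(0,1)$, and an arbitrary $\epsilon = 0.1$): the fraction of sampled sequences that fall in $A_\epsilon^{(n)}$ approaches 1 as $n$ grows.

import numpy as np

# Sketch: estimate Pr(A_eps^(n)) for i.i.d. N(0,1) samples by checking how often
# |-(1/n) sum_i log2 f(X_i) - h(X)| <= eps.  Property 1 says this -> 1 as n grows.
rng = np.random.default_rng(2)
h = 0.5 * np.log2(2 * np.pi * np.e)              # h(X) in bits for N(0, 1)
eps = 0.1                                        # arbitrary illustrative choice

for n in (10, 100, 1000):
    x = rng.normal(size=(10_000, n))
    log2_f = (-0.5 * x**2 - 0.5 * np.log(2 * np.pi)) / np.log(2)
    dev = np.abs(-log2_f.sum(axis=1) / n - h)
    print(n, (dev <= eps).mean())                # fraction of sequences in the typical set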

Relation of Differential Entropy to Discrete Entropy


[Figure: a density $f(x)$ partitioned into bins of width $\Delta$ — quantization of a continuous r.v.]

Suppose we divide the range of $X$ into bins of length $\Delta$. Let us assume that the density is continuous within the bins.

By the mean value theorem, there is a value $x_i$ within each bin such that

$f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx$

Consider the quantized r.v. $X^\Delta$, which is defined by

$X^\Delta = x_i, \quad \text{if } i\Delta \le X < (i+1)\Delta$

Then the probability that $X^\Delta = x_i$ is

$p_i = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx = f(x_i)\,\Delta$

The entropy of the quantized version is

$H(X^\Delta) = -\sum_i p_i \log p_i$
$\qquad = -\sum_i f(x_i)\Delta \log\left(f(x_i)\Delta\right)$
$\qquad = -\sum_i f(x_i)\Delta \log f(x_i) - \sum_i f(x_i)\Delta \log \Delta$
$\qquad = -\sum_i \Delta f(x_i) \log f(x_i) - \log \Delta,$

since $\sum_i f(x_i)\Delta = \int f(x)\,dx = 1$.

If $f(x)\log f(x)$ is Riemann integrable, then

$-\sum_i \Delta f(x_i)\log f(x_i) \to -\int f(x)\log f(x)\,dx, \quad \text{as } \Delta \to 0.$

This proves the following
Thm: If the density $f(x)$ of the r.v. $X$ is Riemann integrable, then

$H(X^\Delta) + \log \Delta \to h(f) = h(X), \quad \text{as } \Delta \to 0.$

Thus the entropy of an $n$-bit quantization of a continuous r.v. $X$ is approximately $h(X) + n$, since $\Delta = 2^{-n}$ for an $n$-bit uniform quantizer.
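A numerical illustration of this theorem (a sketch, not from the slides; it assumes NumPy/SciPy and the choice $X \sim N(0,1)$): the quantity $H(X^\Delta) + \log_2 \Delta$ approaches $h(X) \approx 2.047$ bits as $\Delta$ shrinks.

import numpy as np
from scipy.stats import norm

# Numerical illustration: H(X^Delta) + log2(Delta) -> h(X) as Delta -> 0,
# for X ~ N(0, 1), where h(X) = 0.5 * log2(2*pi*e) bits.
h_exact = 0.5 * np.log2(2 * np.pi * np.e)

for delta in (1.0, 0.1, 0.01):
    edges = np.arange(-10.0, 10.0 + delta, delta)   # bins covering essentially all the mass
    p = np.diff(norm.cdf(edges))                    # p_i = integral of f over bin i
    p = p[p > 0]                                    # drop empty bins before taking logs
    H = -np.sum(p * np.log2(p))                     # entropy of the quantized r.v. X^Delta
    print(delta, H + np.log2(delta), h_exact)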


Joint and Conditional Differential Entropy:

$h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n)\,dx^n$

$h(X \mid Y) = -\int f(x, y) \log f(x \mid y)\,dx\,dy$

$h(X \mid Y) = h(X, Y) - h(Y)$


Theorem (Entropy of a multivariate normal distribution)

Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with mean $\mu$ and covariance matrix $K$. Then

$h(X_1, X_2, \ldots, X_n) = h\left(\mathcal{N}_n(\mu, K)\right) = \frac{1}{2}\log\left((2\pi e)^n |K|\right)$ bits,

where $|K|$ denotes the determinant of $K$.


pf: Let $(X_1, X_2, \ldots, X_n) \sim \mathcal{N}_n(\mu, K)$, so

$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x}-\mu)^T K^{-1} (\mathbf{x}-\mu) \right).$

Then

$h(f) = -\int f(\mathbf{x}) \left[ -\frac{1}{2}(\mathbf{x}-\mu)^T K^{-1} (\mathbf{x}-\mu) - \ln\left((2\pi)^{n/2}|K|^{1/2}\right) \right] d\mathbf{x}$
$\quad = \frac{1}{2} E\left[ \sum_{i,j} (X_i - \mu_i)\left(K^{-1}\right)_{ij} (X_j - \mu_j) \right] + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{1}{2} \sum_{i,j} E\left[ (X_j - \mu_j)(X_i - \mu_i) \right] \left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{1}{2} \sum_j \sum_i K_{ji} \left(K^{-1}\right)_{ij} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{1}{2} \sum_j \left(K K^{-1}\right)_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{1}{2} \sum_j I_{jj} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{n}{2} + \frac{1}{2}\ln\left((2\pi)^n |K|\right)$
$\quad = \frac{1}{2}\ln\left((2\pi e)^n |K|\right)$ nats
$\quad = \frac{1}{2}\log\left((2\pi e)^n |K|\right)$ bits.

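A Monte Carlo check of this theorem (a sketch, not from the slides; it assumes NumPy, and the $2\times 2$ covariance $K$ below is an arbitrary illustrative choice): the sample mean of $-\log_2 f(X)$ for $X \sim \mathcal{N}_2(0, K)$ should match $\frac{1}{2}\log_2\left((2\pi e)^2 |K|\right)$.

import numpy as np

# Monte Carlo check: for X ~ N_n(0, K), h(X) = 0.5 * log2((2*pi*e)^n |K|) bits,
# estimated here as the sample average of -log2 f(X).
rng = np.random.default_rng(3)
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])                           # arbitrary illustrative covariance
n = K.shape[0]
Kinv, detK = np.linalg.inv(K), np.linalg.det(K)

x = rng.multivariate_normal(np.zeros(n), K, size=500_000)
quad = np.einsum('ij,jk,ik->i', x, Kinv, x)          # x^T K^{-1} x for each sample
log2_f = (-0.5 * quad - 0.5 * np.log((2 * np.pi) ** n * detK)) / np.log(2)

h_mc = -log2_f.mean()
h_exact = 0.5 * np.log2((2 * np.pi * np.e) ** n * detK)
print(h_mc, h_exact)                                 # both approx. 4.50 bits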

Relative Entropy and Mutual Information

$D(f \,\|\, g) = \int f \log \frac{f}{g}$

$I(X; Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\,dx\,dy$

$I(X; Y) = h(X) - h(X \mid Y) = h(Y) - h(Y \mid X) = h(X) + h(Y) - h(X, Y)$

$I(X; Y) = D\left( f(x, y) \,\|\, f(x) f(y) \right)$
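As a worked example (not from the slides), consider a jointly Gaussian pair $(X, Y)$ with unit variances and correlation coefficient $\rho$; combining the identities above with the Gaussian entropy formulas gives a closed form for the mutual information:

$I(X; Y) = h(X) + h(Y) - h(X, Y)$
$\quad = \frac{1}{2}\log(2\pi e) + \frac{1}{2}\log(2\pi e) - \frac{1}{2}\log\left((2\pi e)^2 |K|\right), \qquad K = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$
$\quad = -\frac{1}{2}\log|K| = -\frac{1}{2}\log\left(1 - \rho^2\right).$

So $I(X; Y) = 0$ when $\rho = 0$ (independence) and $I(X; Y) \to \infty$ as $|\rho| \to 1$.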


Remark: The mutual information between two continuous r.v.s is the limit of the mutual information between their quantized versions:

$I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta \mid Y^\Delta) \approx \left(h(X) - \log\Delta\right) - \left(h(X \mid Y) - \log\Delta\right) = I(X; Y).$


Properties of $h(X)$, $D(f \,\|\, g)$, $I(X; Y)$

$D(f \,\|\, g) \ge 0$

pf: $-D(f \,\|\, g) = \int_S f \log \frac{g}{f} \le \log \int_S f \cdot \frac{g}{f}$ (Jensen's inequality) $= \log \int_S g \le \log 1 = 0.$

Hence $I(X; Y) \ge 0$ and $h(X \mid Y) \le h(X)$.

$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n h(X_i \mid X_1, X_2, \ldots, X_{i-1})$

$h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n h(X_i)$

Theorems:
$h(X + c) = h(X)$ : translation does not change the differential entropy.
$h(aX) = h(X) + \log|a|$


pf: Let $Y = aX$. Then $f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right)$, and

$h(aX) = -\int f_Y(y) \log f_Y(y)\,dy$
$\quad = -\int \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right) \log\left( \frac{1}{|a|} f_X\!\left(\frac{y}{a}\right) \right) dy$
$\quad = -\int f_X(x) \log f_X(x)\,dx + \log|a|$
$\quad = h(X) + \log|a|.$

Corollary: $h(AX) = h(X) + \log\left|\det(A)\right|$
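A quick sanity check of the scaling law (a worked example, not from the slides): for $X \sim N(0, \sigma^2)$ we have $aX \sim N(0, a^2\sigma^2)$, so applying Example 2 directly,

$h(aX) = \frac{1}{2}\log\left(2\pi e\, a^2 \sigma^2\right) = \frac{1}{2}\log\left(2\pi e\, \sigma^2\right) + \log|a| = h(X) + \log|a|,$

in agreement with the theorem.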

Theorem: The multivariate normal distribution maximizes the entropy over all distributions with the same covariance.
Let the random vector $X \in \mathbb{R}^n$ have zero mean and covariance $K = E\left[XX^T\right]$ (i.e., $K_{ij} = E[X_i X_j]$, $1 \le i, j \le n$). Then

$h(X) \le \frac{1}{2}\log\left((2\pi e)^n |K|\right)$, with equality iff $X \sim \mathcal{N}_n(0, K)$.


Pf: Let $g(\mathbf{x})$ be any density satisfying $\int g(\mathbf{x})\, x_i x_j\, d\mathbf{x} = K_{ij}$ for all $i, j$. Let $\phi_K$ be the density of a $\mathcal{N}_n(0, K)$ vector. Note that $\log \phi_K(\mathbf{x})$ is (up to a constant) a quadratic form in $\mathbf{x}$, and $\int \phi_K(\mathbf{x})\, x_i x_j\, d\mathbf{x} = K_{ij}$. Then

$0 \le D(g \,\|\, \phi_K) = \int g \log\frac{g}{\phi_K}$
$\quad = -h(g) - \int g \log \phi_K$
$\quad = -h(g) - \int \phi_K \log \phi_K$
$\quad = -h(g) + h(\phi_K),$

where the substitution $\int g \log \phi_K = \int \phi_K \log \phi_K$ follows from the fact that $g$ and $\phi_K$ yield the same moments of the quadratic form $\log \phi_K(\mathbf{x})$. Hence $h(g) \le h(\phi_K)$: the Gaussian distribution maximizes the entropy over all distributions with the same covariance.
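As a concrete illustration (a worked example, not from the slides), compare the unit-variance Gaussian with a unit-variance Laplace density $f(x) = \frac{1}{2b} e^{-|x|/b}$, taking $b = 1/\sqrt{2}$ so that $\mathrm{Var} = 2b^2 = 1$:

$h_{\mathrm{Laplace}} = \ln(2be) = 1 + \frac{1}{2}\ln 2 \approx 1.347$ nats, while $h_{\mathrm{Gaussian}} = \frac{1}{2}\ln(2\pi e) \approx 1.419$ nats,

so the Gaussian indeed has the larger differential entropy at equal variance.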


Let $X$ be a random variable with differential entropy $h(X)$. Let $\hat{X}$ be an estimate of $X$, and let $E(X - \hat{X})^2$ be the expected prediction error. Let $h(X)$ be in nats.

Theorem: For any r.v. $X$ and estimator $\hat{X}$,

$E(X - \hat{X})^2 \ge \frac{1}{2\pi e}\, e^{2h(X)},$

with equality iff $X$ is Gaussian and $\hat{X}$ is the mean of $X$.


pf: Let $\hat{X}$ be any estimator of $X$. Then

$E(X - \hat{X})^2 \ge \min_{\hat{X}} E(X - \hat{X})^2 \qquad (1)$
$\quad = E\left(X - E(X)\right)^2 \quad$ [the mean of $X$ is the best estimator of $X$]
$\quad = \mathrm{Var}(X)$
$\quad \ge \frac{1}{2\pi e}\, e^{2h(X)} \qquad (2)$

[the Gaussian distribution has the maximum entropy for a given variance, i.e., $h(X) \le \frac{1}{2}\ln\left(2\pi e\, \mathrm{Var}(X)\right)$].


We have equality in (1) only if $\hat{X}$ is the best estimator (i.e., $\hat{X}$ is the mean of $X$), and equality in (2) only if $X$ is Gaussian.

Corollary: Given side information $Y$ and estimator $\hat{X}(Y)$, it follows that

$E\left(X - \hat{X}(Y)\right)^2 \ge \frac{1}{2\pi e}\, e^{2h(X \mid Y)}$ (cf. Fano's inequality).
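A sanity check of the equality case (a worked example, not from the slides): if $X \sim N(\mu, \sigma^2)$, then $h(X) = \frac{1}{2}\ln\left(2\pi e \sigma^2\right)$ nats, and

$\frac{1}{2\pi e}\, e^{2h(X)} = \frac{1}{2\pi e}\, e^{\ln\left(2\pi e \sigma^2\right)} = \sigma^2,$

which is exactly the error $E(X - \hat{X})^2$ achieved by $\hat{X} = \mu$, so the bound holds with equality, as the theorem states.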

