
Asymptotic Equipartition Property

Accepted by: Dr.-Ing. Julian Hoxha Worked by: Klea Xhixho


AEP is formalized in the following theorem:

If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{in probability.}$$
Proof:
• Functions of independent random variables are also independent random variables. Thus, since the $X_i$ are i.i.d., so are the $\log p(X_i)$. Hence, by the weak law of large numbers,

$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n} \sum_i \log p(X_i) \to -E[\log p(X)] = H(X) \quad \text{in probability,}$$

which proves the theorem.
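The convergence in the theorem can be observed numerically. The following is a minimal sketch, assuming a small three-symbol pmf chosen only for illustration (it is not from the slides); the function names `entropy_bits` and `sample_entropy_rate` are hypothetical. It draws n i.i.d. symbols and evaluates $-\frac{1}{n} \log_2 p(X_1, \ldots, X_n)$ as a sum of per-symbol terms, exactly as in the proof.

```python
import math
import random

def entropy_bits(pmf):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sample_entropy_rate(pmf, n, seed=0):
    """Draw n i.i.d. symbols from pmf and return -(1/n) log2 p(X_1, ..., X_n)."""
    rng = random.Random(seed)
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    draws = rng.choices(symbols, weights=weights, k=n)
    # By independence, log p(x_1, ..., x_n) is the sum of the per-symbol log-probabilities.
    return -sum(math.log2(pmf[x]) for x in draws) / n

# Illustrative pmf (an assumption for this sketch); its entropy is 1.5 bits.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
for n in (10, 100, 10_000, 100_000):
    print(n, round(sample_entropy_rate(pmf, n), 4), "H =", entropy_bits(pmf))
```

For small n the printed value can be far from H(X); the weak law of large numbers only promises that large deviations become improbable as n grows.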
Definition:

The typical set $A_\epsilon^{(n)}$ with respect to $p(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property

$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$
As a consequence of the AEP, we can show that the set $A_\epsilon^{(n)}$ has the properties stated in the theorem below:
Theorem:
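1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then $H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$.
2. $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for n sufficiently large.
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$, where $|A|$ denotes the number of elements in the set A.
4. $|A_\epsilon^{(n)}| \ge (1 - \epsilon)\, 2^{n(H(X)-\epsilon)}$ for n sufficiently large.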
Thus, the typical set has probability nearly 1, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is nearly $2^{nH}$.
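The properties of the typical set can be checked for a concrete source. The sketch below assumes an i.i.d. Bernoulli(p) source (a choice made here for illustration, not taken from the slides), where every sequence with k ones has the same probability, so the typical set can be counted class by class with binomial coefficients. The name `typical_set_stats` and the parameter values are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) random variable, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def typical_set_stats(p, n, eps):
    """Return (size, probability) of the typical set for an i.i.d. Bernoulli(p) source."""
    H = binary_entropy(p)
    size, prob = 0, 0.0
    for k in range(n + 1):
        # Every sequence with k ones has probability p^k (1-p)^(n-k); work in log2.
        log_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_p_seq / n - H) <= eps:       # membership test from the definition
            count = math.comb(n, k)              # number of sequences with k ones
            size += count
            prob += 2.0 ** (math.log2(count) + log_p_seq)
    return size, prob

p, n, eps = 0.3, 1000, 0.1                       # illustrative values
H = binary_entropy(p)
size, prob = typical_set_stats(p, n, eps)
print("Pr(A) =", prob)
print("(1/n) log2 |A| =", math.log2(size) / n, " H =", H)
```

Sizes are compared on the $\log_2$ scale because $|A_\epsilon^{(n)}|$ is far too large to handle as a floating-point number for large n, while Python integers hold it exactly.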
Consequences of the AEP: Data compression
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables drawn from the probability mass function $p(x)$. Our intention is to find short descriptions for such sequences of random variables. To do so, we divide all sequences in $\mathcal{X}^n$ into two sets: the typical set $A_\epsilon^{(n)}$ and its complement.
• We order all elements in each set according to some order.
• Then we can represent each sequence of $A_\epsilon^{(n)}$ by giving the index of the sequence in the set.
• Since there are $\le 2^{n(H+\epsilon)}$ sequences in $A_\epsilon^{(n)}$, the indexing requires no more than $n(H+\epsilon) + 1$ bits. (The extra bit may be necessary because $n(H+\epsilon)$ may not be an integer.)
• Prefix all these sequences by 0, giving a total length of $\le n(H+\epsilon) + 2$ bits to represent each sequence in $A_\epsilon^{(n)}$ (Fig. 3.2).
• Similarly, we can index each sequence not in $A_\epsilon^{(n)}$ by using not more than $n \log|\mathcal{X}| + 1$ bits.
• Prefixing these indices by 1, we have a code for all the sequences in $\mathcal{X}^n$.
Features of the above coding scheme:

• The code is one-to-one and easily decodable. The initial bit acts
as a flag bit to indicate the length of the codeword that follows.
• We have used a brute-force enumeration of the atypical set ${A_\epsilon^{(n)}}^c$ without taking into account the fact that the number of elements in ${A_\epsilon^{(n)}}^c$ is less than the number of elements in $\mathcal{X}^n$. Surprisingly, this is good enough to yield an efficient description.
• The typical sequences have short descriptions of length ≈ 𝑛𝐻.
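The coding scheme described above can be sketched for a small binary example. This is an illustration under stated assumptions, not the construction from the slides' figure: the source is taken to be i.i.d. Bernoulli(p), sequences are ordered lexicographically, and `build_code` is a hypothetical helper that enumerates all $2^n$ sequences (so it is only practical for small n).

```python
import itertools
import math

def build_code(p, n, eps):
    """Flag-bit code for binary sequences of length n from an i.i.d. Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    typical, atypical = [], []
    for seq in itertools.product((0, 1), repeat=n):   # lexicographic order within each set
        k = sum(seq)
        rate = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        (typical if abs(rate - H) <= eps else atypical).append(seq)
    bits_typ = math.ceil(n * (H + eps))   # at most n(H + eps) + 1 index bits
    bits_atyp = n                         # n log2|X| index bits, with |X| = 2
    encode = {}
    for i, seq in enumerate(typical):
        encode[seq] = "0" + format(i, f"0{bits_typ}b")    # flag 0: typical
    for i, seq in enumerate(atypical):
        encode[seq] = "1" + format(i, f"0{bits_atyp}b")   # flag 1: atypical
    decode = {word: seq for seq, word in encode.items()}
    return encode, decode, bits_typ, bits_atyp

encode, decode, bits_typ, bits_atyp = build_code(p=0.1, n=12, eps=0.1)
x = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
print(encode[x], len(encode[x]), decode[encode[x]] == x)
```

The decoder reads the first bit to learn how many index bits follow, which is why the code is easily decodable even though the two codeword lengths differ.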
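Definition: We use the notation $x^n$ to denote a sequence $x_1, x_2, \ldots, x_n$. Let $l(x^n)$ be the length of the codeword corresponding to $x^n$. If n is sufficiently large so that $\Pr\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$, the expected length of the codeword is

$$
\begin{aligned}
E[l(X^n)] &= \sum_{x^n} p(x^n)\, l(x^n) \\
&= \sum_{x^n \in A_\epsilon^{(n)}} p(x^n)\, l(x^n) + \sum_{x^n \in {A_\epsilon^{(n)}}^c} p(x^n)\, l(x^n) \\
&\le \sum_{x^n \in A_\epsilon^{(n)}} p(x^n)\, \bigl(n(H+\epsilon) + 2\bigr) + \sum_{x^n \in {A_\epsilon^{(n)}}^c} p(x^n)\, \bigl(n \log|\mathcal{X}| + 2\bigr) \\
&= \Pr\{A_\epsilon^{(n)}\}\, \bigl(n(H+\epsilon) + 2\bigr) + \Pr\{{A_\epsilon^{(n)}}^c\}\, \bigl(n \log|\mathcal{X}| + 2\bigr) \\
&\le n(H+\epsilon) + \epsilon n \log|\mathcal{X}| + 2 \\
&= n(H + \epsilon'),
\end{aligned}
$$

where $\epsilon' = \epsilon + \epsilon \log|\mathcal{X}| + \frac{2}{n}$ can be made arbitrarily small by an appropriate choice of $\epsilon$ followed by an appropriate choice of n. Hence we have proved the following theorem.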
Theorem:
Let $X^n$ be i.i.d. $\sim p(x)$. Let $\epsilon > 0$. Then there exists a code that maps sequences $x^n$ of length n into binary strings such that the mapping is one-to-one (and therefore invertible) and

$$E\left[\frac{1}{n}\, l(X^n)\right] \le H(X) + \epsilon$$

for n sufficiently large.
Thus, we can represent sequences $X^n$ using $nH(X)$ bits on average.
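The bound on the expected codeword length can be evaluated numerically for one concrete case. The sketch below assumes a binary i.i.d. Bernoulli(p) source (so $\log_2|\mathcal{X}| = 1$) and the flag-bit code with $\lceil n(H+\epsilon) \rceil$ index bits for typical sequences and n index bits for atypical ones; the function name and parameter values are illustrative assumptions, not taken from the slides.

```python
import math

def expected_length_and_bound(p, n, eps):
    """Return (E[l(X^n)], n(H + eps'), Pr(A)) for the flag-bit code on a Bernoulli(p) source."""
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    len_typ = math.ceil(n * (H + eps)) + 1    # index bits plus the flag bit
    len_atyp = n + 1                          # n log2|X| = n index bits plus the flag bit
    pr_typ = 0.0
    for k in range(n + 1):
        log_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_p_seq / n - H) <= eps:
            # Probability of all sequences with k ones, computed in the log domain.
            pr_typ += 2.0 ** (math.log2(math.comb(n, k)) + log_p_seq)
    pr_typ = min(pr_typ, 1.0)                 # guard against floating-point rounding
    expected = pr_typ * len_typ + (1 - pr_typ) * len_atyp
    eps_prime = eps + eps * math.log2(2) + 2 / n
    return expected, n * (H + eps_prime), pr_typ

expected, bound, pr_typ = expected_length_and_bound(p=0.3, n=1000, eps=0.1)
print("Pr(A) =", pr_typ)
print("E[l] =", expected, "<= n(H + eps') =", bound)
```

The comparison is only meaningful when n is large enough that the typical set has probability at least $1 - \epsilon$, which is the hypothesis used in the derivation; the function returns that probability so the condition can be inspected.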
High-probability sets and the typical set

Definition: For each n = 1, 2, …, let $B_\delta^{(n)} \subset \mathcal{X}^n$ be the smallest set with

$$\Pr\{B_\delta^{(n)}\} \ge 1 - \delta.$$

We argue that $B_\delta^{(n)}$ must have significant intersection with $A_\epsilon^{(n)}$ and therefore must have about as many elements.
Theorem

Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim p(x)$. For $\delta < \frac{1}{2}$ and any $\delta' > 0$, if $\Pr\{B_\delta^{(n)}\} > 1 - \delta$, then

$$\frac{1}{n} \log |B_\delta^{(n)}| > H - \delta' \quad \text{for n sufficiently large.}$$

Thus, $B_\delta^{(n)}$ must have at least $2^{nH}$ elements, to first order in the exponent. But $A_\epsilon^{(n)}$ has $2^{n(H \pm \epsilon)}$ elements. Therefore, $A_\epsilon^{(n)}$ is about the same size as the smallest high-probability set.
Definition:
The notation $a_n \doteq b_n$ means that $\frac{1}{n} \log \frac{a_n}{b_n} \to 0$ as $n \to \infty$.

Smallest probable set:


Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim p(x)$, and for $\delta < \frac{1}{2}$, let $B_\delta^{(n)} \subset \mathcal{X}^n$ be the smallest set such that $\Pr\{B_\delta^{(n)}\} \ge 1 - \delta$.
Then

$$|B_\delta^{(n)}| \doteq 2^{nH}.$$
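The growth rate of the smallest high-probability set can be examined for a binary source. The sketch below assumes an i.i.d. Bernoulli(p) source with p < 1/2, for which sequences with fewer ones are more probable, so the smallest set is built greedily by adding sequences in order of increasing number of ones until the probability reaches $1 - \delta$. The function name is hypothetical, and the number of sequences taken from the last, partially used class is rounded, so the result is an approximation.

```python
import math

def smallest_set_rate(p, n, delta):
    """Approximate (1/n) log2 of the size of the smallest set with probability >= 1 - delta."""
    assert 0 < p < 0.5, "greedy order by number of ones assumes p < 1/2"
    size, covered = 0, 0.0
    for k in range(n + 1):                    # fewer ones means a more probable sequence
        count = math.comb(n, k)
        log_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        class_prob = 2.0 ** (math.log2(count) + log_p_seq)
        remaining = (1 - delta) - covered
        if class_prob < remaining:
            size += count                     # take every sequence with k ones
            covered += class_prob
        else:
            # Only part of this class is needed; the count is rounded (approximation).
            fraction = remaining / class_prob
            size += min(count, count * int(fraction * 10**9) // 10**9 + 1)
            break
    return math.log2(size) / n

p, delta = 0.3, 0.1                           # illustrative values
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
for n in (250, 1000, 4000):
    print(n, round(smallest_set_rate(p, n, delta), 4), "H =", round(H, 4))
```

In this example the printed rate approaches H as n grows, in line with the statement that the size of the smallest high-probability set equals $2^{nH}$ to first order in the exponent.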
