
Lecture 2

Information Sources and Source Coding: 3
  Mutual information and differential entropy: 3.2.1-2
  Lossless source coding: 3.3
    Definitions... The source coding theorem... Huffman codes: 3.3.1, Lempel-Ziv codes: 3.3.3

[Figure: Venn diagram of the entropies of a pair (X, Y): H(X, Y) is the union of H(X) and H(Y); H(X) splits into H(X|Y) and I(X;Y), H(Y) splits into H(Y|X) and I(X;Y), and the overlap is I(X;Y) = I(Y;X)]

I(X;Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)
I(X;Y) = H(X) + H(Y) - H(X,Y)
I(X;X) = H(X)
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
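As a quick sanity check of these relations, here is a small Python sketch (the joint pmf is an arbitrary example, not from the lecture) that computes the entropies of a discrete pair (X, Y) and verifies the identities numerically, in bits.

```python
# Numerical check of the entropy / mutual-information relations above.
import numpy as np

p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])          # example joint pmf; rows: x, columns: y

def H(p):
    """Entropy in bits of a pmf given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy.ravel())

H_Y_given_X = H_XY - H_X                 # chain rule: H(X,Y) = H(X) + H(Y|X)
H_X_given_Y = H_XY - H_Y
I_XY = H_X + H_Y - H_XY                  # I(X;Y) = H(X) + H(Y) - H(X,Y)

assert np.isclose(I_XY, H_Y - H_Y_given_X)
assert np.isclose(I_XY, H_X - H_X_given_Y)
print(f"H(X)={H_X:.3f}  H(Y)={H_Y:.3f}  H(X,Y)={H_XY:.3f}  I(X;Y)={I_XY:.3f}")
```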


Mutual Information

Consider a pair (X, Y) of discrete variables.
  H(X)   = average information in observing X
  H(Y)   = average information in observing Y
  H(X,Y) = average information in observing (X, Y)
  H(Y|X) = average information in observing Y when X is known
  H(X|Y) = average information in observing X when Y is known
The mutual information between X and Y is a measure of the information about X obtained when observing Y:
    I(X;Y) ≜ H(X) - H(X|Y)

Continuous Variables

A continuous random variable, X, with pdf f(x). Define the differential entropy h(X) of X as
    h(X) ≜ -∫ f(x) log f(x) dx = E[-log f(X)]
h(X) is not an entropy in the sense of a measure of uncertainty/information... The amount of uncertainty removed/information gained when observing X = x is always infinite for a continuous X...
Mutual information for continuous variables X and Y:
    I(Y;X) = h(X) - h(X|Y) = h(Y) - h(Y|X) = I(X;Y)
Still valid as a measure of information!
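As an illustration (not part of the lecture), the sketch below estimates the differential entropy of a Gaussian X ~ N(0, σ²) by Monte Carlo and compares it with the closed-form value 0.5·log2(2πeσ²) bits; the distribution and parameter values are assumptions chosen for the example.

```python
# Monte Carlo estimate of h(X) = E[-log2 f(X)] for a Gaussian X.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=200_000)

# log2 of the Gaussian pdf evaluated at the samples
log2_f = -0.5 * np.log2(2 * np.pi * sigma**2) - (x**2) / (2 * sigma**2) * np.log2(np.e)
h_est = -np.mean(log2_f)

h_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)   # closed form for N(0, sigma^2)
print(f"estimate {h_est:.4f} bits, closed form {h_exact:.4f} bits")
```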


A Digital Communication System

[Block diagram: source → source encoder → channel encoder → modulator → channel (noise added) → demodulator → channel decoder → source decoder → user; the modulator, channel and demodulator together appear to the coding layers as a discrete channel]

Lossless Source Coding

Definition: A d-ary (lossless) source code for a discrete random variable X maps a realization x of X into a finite-length string c(x) of symbols from a discrete alphabet D of size |D| = d.
The expected length L of the source code is
    L ≜ Σ_{x∈X} p(x) ℓ(x)
where ℓ(x) is the length of the string c(x) in symbols. The (average) rate R of the source code is
    R ≜ L log d    [bits per source symbol]
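A minimal sketch of these two definitions; the source pmf and codewords below are assumed example values, not taken from the lecture.

```python
# Expected length L and rate R of a d-ary source code.
import math

d = 2                                                 # binary code alphabet
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}     # source pmf p(x)
c = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}     # codewords c(x)

L = sum(p[x] * len(c[x]) for x in p)                  # L = sum_x p(x) * l(x), in code symbols
R = L * math.log2(d)                                  # rate in bits per source symbol
H = -sum(px * math.log2(px) for px in p.values())     # source entropy for comparison

print(f"L = {L} code symbols, R = {R} bits/symbol, H(X) = {H} bits")
```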


Source Coding

[Figure: X → source encoder → bits → channel (transmit/store) → bits → source decoder → X̂]

Represent source data efficiently in digital form for transmission or storage...
We concentrate on the source code and assume there are no transmission errors; the channel is noiseless.
The source code is
  lossless if X̂ = X: compression, Huffman, Lempel-Ziv, ...
  lossy if X̂ ≠ X: rate-distortion, quantization, waveform coding, ...

Desired properties of a source code. A code is
  non-singular if c(x1) ≠ c(x2) whenever x1 ≠ x2.
  uniquely decodable if any encoded string has only one possible source string producing it.
  instantaneous if c(x) can be decoded into x without looking at future codewords, i.e. if no codeword is a prefix of any other codeword.

X   Singular   Non-singular,     Uniquely decodable,    Instantaneous
               not UD            not instantaneous
1   0          0                 10                     0
2   0          010               00                     10
3   0          01                11                     110
4   0          10                110                    111
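The sketch below (not from the lecture) checks non-singularity and the prefix condition for the four example codes in the table above; unique decodability in general needs a more involved test (e.g. the Sardinas-Patterson algorithm), which is omitted here.

```python
# Check two code properties for the example codes in the table.
def non_singular(code):
    """c(x1) != c(x2) whenever x1 != x2."""
    return len(set(code.values())) == len(code)

def is_prefix_free(code):
    """No codeword is a prefix of another codeword (=> instantaneous)."""
    words = list(code.values())
    return not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

codes = {
    "singular":              {1: '0',  2: '0',   3: '0',   4: '0'},
    "non-singular, not UD":  {1: '0',  2: '010', 3: '01',  4: '10'},
    "UD, not instantaneous": {1: '10', 2: '00',  3: '11',  4: '110'},
    "instantaneous":         {1: '0',  2: '10',  3: '110', 4: '111'},
}
for name, code in codes.items():
    print(f"{name:24s} non-singular={non_singular(code)} prefix-free={is_prefix_free(code)}")
```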


Instantaneous Codes: The Prefix Condition

[Figure: binary code tree with nodes 0, 1 at depth one, 00, 01, 10, 11 at depth two, and 000, ..., 111 at depth three; an instantaneous code corresponds to a set of tree nodes in which no codeword is an ancestor (prefix) of another]

Shannon's Source Coding Theorem

A discrete source with entropy rate H [bits per source symbol], and a lossless source code of rate R [bits per source symbol].
In 1948 Claude Shannon showed, based on typical sequences, that a lossless (error-free) code exists as long as R > H. A lossless code does not exist for any R < H.
Shannon's result is an existence/non-existence result (as are many results in information theory). The theorem does not say how to design practical coding schemes.
Now we first look at a more concrete coding theorem (for lossless uniquely decodable codes) and then we study two code design algorithms (Huffman and Lempel-Ziv)...


Classes of Codes

[Figure: nested sets of codes: all codes ⊃ non-singular ⊃ uniquely decodable ⊃ instantaneous]

Other classifications:
  fixed-length to fixed-length
  fixed-length to variable-length
  variable-length to fixed-length
  variable-length to variable-length

Inequalities

The codeword lengths of a uniquely decodable d-ary code for a discrete variable X must satisfy the Kraft inequality
    Σ_{x∈X} d^(-ℓ(x)) ≤ 1
Conversely, given a set of codeword lengths that satisfy the Kraft inequality, it is possible to construct a uniquely decodable code with these codeword lengths.

Any two sets {a_1, ..., a_N} and {b_1, ..., b_N} of N positive numbers must satisfy the log-sum inequality
    Σ_{n=1}^{N} a_n log(a_n / b_n) ≥ (Σ_{n=1}^{N} a_n) log( (Σ_{n=1}^{N} a_n) / (Σ_{n=1}^{N} b_n) )
with equality iff a_n / b_n = constant.
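Returning to the Kraft inequality, here is a small sketch (not from the lecture) that evaluates the Kraft sum for the codeword lengths of the instantaneous example code {0, 10, 110, 111}, and for one set of lengths that violates the inequality.

```python
# Kraft inequality: sum over codewords of d^(-l(x)) must be <= 1.
def kraft_sum(lengths, d=2):
    return sum(d ** (-l) for l in lengths)

print(kraft_sum([1, 2, 3, 3]))     # lengths of 0, 10, 110, 111: 0.5+0.25+0.125+0.125 = 1.0 <= 1
print(kraft_sum([1, 2, 2, 2]))     # 1.25 > 1: no uniquely decodable binary code
                                   # with these codeword lengths can exist
```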



The Source Coding Theorem for Uniquely Decodable Codes

The rate R of a uniquely decodable code for a random variable X satisfies R ≥ H(X) [bits].
A uniquely decodable code for a random variable X exists that satisfies R < H(X) + log d [bits].
A stationary discrete source {X_n}: code a block X_1^N = (X_1, ..., X_N) of source symbols; then
    (1/N) H(X_1^N) ≤ R < (1/N) H(X_1^N) + (log d)/N
where R is the rate in bits per source symbol.
A uniquely decodable code exists such that R → H as N → ∞, where H is the entropy rate. No such code with R < H can be found.
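To see what the block bound buys, the sketch below (the source pmf is an assumption, not from the lecture) evaluates the per-symbol bound for an i.i.d. binary source and a binary code, where H(X_1^N) = N·H(X) and log d = 1 bit.

```python
# Per-symbol rate bounds H(X) <= R < H(X) + 1/N for blocks of N i.i.d. symbols.
import math

p = [0.9, 0.1]                                   # i.i.d. binary source
H = -sum(pi * math.log2(pi) for pi in p)         # entropy per source symbol

for N in (1, 2, 4, 8, 16):
    # The guaranteed per-symbol rate of a uniquely decodable binary code
    # for blocks of N symbols approaches H(X) as N grows.
    print(f"N={N:2d}: {H:.3f} <= R < {H + 1/N:.3f} bits/symbol")
```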

Properties of Huffman codes (construction described below):
+ Optimal in the sense that no other (fixed-length to variable-length) code can provide a lower average rate!
+ Works well also for symbol-by-symbol encoding.
+ In principle it is straightforward to obtain better compression by grouping blocks of source symbols together and letting these define new symbols.
- The source statistics have to be known; in practice a coding in two passes is needed...
- Coding blocks significantly increases the complexity; the complexity of a code that achieves the entropy rate of the source is generally very high...


Huffman Codes

A simple way of constructing a fixed-length to variable-length instantaneous code: merge the two least probable nodes at each level...

[Figure: Huffman tree for source probabilities 0.30, 0.25, 0.25, 0.10, 0.10; the merges create intermediate nodes 0.20, 0.45, 0.55, 1.0 and yield the codewords 00, 01, 10, 110, 111]
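A sketch of the binary Huffman construction just described, applied to the probabilities in the figure; ties may be broken differently than in the lecture's tree, so only the codeword lengths (and hence the average length) are guaranteed to match.

```python
# Binary Huffman code construction: repeatedly merge the two least probable nodes.
import heapq
from itertools import count

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary codeword."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ''}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)                  # two least probable nodes...
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))   # ...are merged
    return heap[0][2]

probs = {'a': 0.30, 'b': 0.25, 'c': 0.25, 'd': 0.10, 'e': 0.10}
code = huffman(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, f"average length = {avg_len:.2f} bits")      # 2.20 bits for this example
```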

The Lempel-Ziv Algorithm

A universal variable-length to fixed-length scheme.
A given string of source symbols is parsed into phrases of varying length: each new phrase is the shortest string that has not appeared before.
A dictionary is built; indices of dictionary entries and new bits are transmitted/stored.
Does not work well for short strings (it can even expand the size of the source string), but it is asymptotically optimal in the sense that the entropy rate of the source is achieved for very long strings!
Often used in practice (compress, .ZIP, .ARJ, ...)



Example: the source string 010000110000101 is parsed into the phrases 0, 1, 00, 001, 10, 000, 101, which results in the dictionary and codewords

Index   Contents   Codeword
000     (empty)
001     0          000 0
010     1          000 1
011     00         001 0
100     001        011 1
101     10         010 0
110     000        011 0
111     101        101 1

Inefficient, since the whole string must be visited to determine the number of bits spent on each index. Several modifications exist...
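A sketch of the parsing and dictionary-building step described above, in the LZ78-style form used in the example (the function name and output format are mine); it reproduces the phrases, the dictionary contents, and the (index, new symbol) pairs of the example.

```python
# LZ78-style parsing: each phrase = (index of its longest previous phrase, one new symbol).
def lz_parse(s):
    dictionary = {'': 0}                 # index 0 is the empty phrase
    codewords, phrase = [], ''
    for symbol in s:
        if phrase + symbol in dictionary:
            phrase += symbol             # keep extending until the phrase is new
        else:
            codewords.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)
            phrase = ''
    if phrase:                           # leftover phrase already in the dictionary
        codewords.append((dictionary[phrase[:-1]], phrase[-1]))
    return dictionary, codewords

dictionary, codewords = lz_parse("010000110000101")
print(list(dictionary))   # ['', '0', '1', '00', '001', '10', '000', '101']
print(codewords)          # [(0,'0'), (0,'1'), (1,'0'), (3,'1'), (2,'0'), (3,'0'), (5,'1')]
```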

