Lecture 2
Information Sources and Source Coding
- Mutual information and differential entropy
- Lossless source coding
  - Definitions
  - The source coding theorem
  - Huffman codes
  - Lempel-Ziv codes
I(X;Y) = H(Y) − H(Y|X) = H(X) − H(X|Y)
I(X;Y) = H(X) + H(Y) − H(X,Y)
I(X;X) = H(X)
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
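A quick numerical check of these identities; a minimal sketch, with a joint pmf invented for illustration (not from the lecture):

```python
import numpy as np

# Joint pmf of (X, Y); rows index x, columns index y (values invented).
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.50]])

def H(p):
    """Entropy in bits of a pmf given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)  # marginal pmf of X
p_y = p_xy.sum(axis=0)  # marginal pmf of Y

H_xy = H(p_xy.ravel())          # joint entropy H(X, Y)
I_a = H(p_x) + H(p_y) - H_xy    # H(X) + H(Y) - H(X, Y)
I_b = H(p_x) - (H_xy - H(p_y))  # H(X) - H(X|Y), via H(X|Y) = H(X, Y) - H(Y)
I_c = H(p_y) - (H_xy - H(p_x))  # H(Y) - H(Y|X)
print(I_a, I_b, I_c)            # all three expressions agree
```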
Differential entropy of a continuous random variable X with density f: h(X) = −∫ f(x) log f(x) dx

Measure of the information about X obtained when observing Y: I(X;Y) = H(X) − H(X|Y)

Mutual information for continuous variables X and Y:
I(Y;X) = h(X) − h(X|Y) = h(Y) − h(Y|X) = I(X;Y)
Still valid as a measure of information!
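For concreteness, a sketch that numerically integrates h(X) for a Gaussian density and compares it with the known closed form ½ log₂(2πeσ²); the value of σ is an arbitrary choice:

```python
import numpy as np

# Differential entropy h(X) = -∫ f(x) log2 f(x) dx of a Gaussian,
# integrated numerically and compared with 0.5*log2(2*pi*e*sigma^2).
sigma = 1.5                      # arbitrary standard deviation
x = np.linspace(-12, 12, 200001)
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
h_num = -np.trapz(f * np.log2(f), x)
h_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_num, h_exact)            # both ≈ 2.63 bits
```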
[Figure: system block diagram — source → source encoder → channel encoder → modulator → discrete channel (noise) → demodulator → channel decoder → source decoder → user]

Definition: A d-ary (lossless) source code for a discrete random variable X maps a realization x of X into a finite-length string c(x) of symbols from a discrete alphabet D of size |D| = d. The expected length L of the source code is

L = Σ_{x ∈ X} p(x) ℓ(x)

where ℓ(x) is the length of the string c(x) in symbols. The (average) rate R of the source code is

R = L log d [bits per source symbol]
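A minimal sketch of these two definitions; the ternary code and probabilities below are invented for illustration:

```python
import math

# Expected length L = sum_x p(x) * l(x) and rate R = L * log2(d) for a
# toy ternary code (d = 3); code and probabilities invented.
d = 3
code = {"a": "0", "b": "1", "c": "20", "d": "21"}  # strings over {0, 1, 2}
p    = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

L = sum(p[x] * len(code[x]) for x in code)  # ternary symbols per source symbol
R = L * math.log2(d)                        # bits per source symbol
print(L, R)  # L = 1.25, R ≈ 1.98
```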
Source Coding
[Figure: X → source encoder → bits → channel (transmit/store)]
A source code is
- non-singular if c(x1) ≠ c(x2) whenever x1 ≠ x2.
- uniquely decodable if any encoded string has only one possible source string producing it.
- instantaneous if c(x) can be decoded into x without looking at future codewords, i.e., no codeword is a prefix of any other codeword.
X    Singular    Non-singular, but not uniquely decodable    Uniquely decodable, but not instantaneous
1    0           0                                           10
2    0           010                                         00
3    0           01                                          11
4    0           10                                          110
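The singularity and prefix conditions are easy to test mechanically; a sketch applied to the codes in the table (unique decodability itself needs a full test such as Sardinas–Patterson, not implemented here):

```python
# Mechanical tests for two of the three conditions; codes from the table.
def non_singular(code):
    return len(set(code)) == len(code)

def prefix_free(code):  # the condition for an instantaneous code
    return not any(a != b and b.startswith(a) for a in code for b in code)

codes = {
    "singular":           ["0", "0", "0", "0"],
    "non-singular":       ["0", "010", "01", "10"],
    "uniquely decodable": ["10", "00", "11", "110"],
}
for name, c in codes.items():
    print(f"{name}: non-singular={non_singular(c)}, prefix-free={prefix_free(c)}")
# Only the first is singular; none is prefix-free ("0" prefixes "010",
# and "11" prefixes "110"), so none is instantaneous.
```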
Represent source data efficiently in digital form for transmission or storage... We concentrate on the source code and assume there are no transmission errors; the channel is noiseless. The source code is
- lossless if X̂ = X: compression, Huffman, Lempel-Ziv, ...
- lossy if X̂ ≠ X: rate-distortion, quantization, waveform coding, ...
[Figure: binary code tree with nodes 0, 1; 00, 01, 10, 11; 000, ..., 111 — codewords as nodes of a tree, illustrating the prefix condition]
[Figure: nested classes of codes — all codes ⊃ non-singular ⊃ uniquely decodable ⊃ instantaneous]
Inequalities
The codeword lengths ℓ(x) of a uniquely decodable d-ary code for a discrete variable X must satisfy the Kraft inequality

Σ_{x ∈ X} d^(−ℓ(x)) ≤ 1
Conversely, given a set of codeword lengths that satisfy the Kraft inequality, it is possible to construct a uniquely decodable code with these codeword lengths.
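Both directions can be illustrated in code: the Kraft sum, and the standard constructive converse that assigns the k-th codeword from the binary expansion of the running sum; a sketch for the binary case, with an example length set:

```python
# Kraft sum, and the constructive converse: sort the lengths and give
# the k-th codeword as the first l_k bits of the binary expansion of
# the running sum of 2^(-l) (binary case, d = 2).
def kraft_sum(lengths, d=2):
    return sum(d ** (-l) for l in lengths)

def prefix_code_from_lengths(lengths):
    assert kraft_sum(lengths) <= 1, "Kraft inequality violated"
    code, acc = [], 0.0
    for l in sorted(lengths):
        code.append(format(int(acc * 2 ** l), f"0{l}b"))  # acc truncated to l bits
        acc += 2.0 ** (-l)  # dyadic increments, so exact in floating point
    return code

lengths = [1, 2, 3, 3]                    # example length set
print(kraft_sum(lengths))                 # 1.0, so a prefix code exists
print(prefix_code_from_lengths(lengths))  # ['0', '10', '110', '111']
```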
Other classifications:
- fixed-length to fixed-length
- fixed-length to variable-length
- variable-length to fixed-length
- variable-length to variable-length
Any two sets {a1, ..., aN} and {b1, ..., bN} of N positive numbers must satisfy the log-sum inequality

Σ_{n=1}^{N} a_n log(a_n / b_n) ≥ (Σ_{n=1}^{N} a_n) log(Σ_{n=1}^{N} a_n / Σ_{n=1}^{N} b_n)
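A numerical spot-check; the positive numbers below are arbitrary:

```python
import math

# Spot-check of the log-sum inequality for arbitrary positive numbers.
a = [0.2, 1.5, 3.0]
b = [1.0, 0.4, 2.2]

lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log2(sum(a) / sum(b))
print(lhs, rhs, lhs >= rhs)  # equality iff a_n / b_n is constant in n
```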
The source coding theorem for uniquely decodable codes:
- The rate R of a uniquely decodable code for a random variable X satisfies R ≥ H(X) [bits].
- A uniquely decodable code for a random variable X exists that satisfies R < H(X) + log d [bits].
- For a stationary discrete source {Xn}: code a block X_1^N = (X1, ..., XN) of source symbols; then

  (1/N) H(X_1^N) ≤ R < (1/N) H(X_1^N) + (log d)/N

  where R is the rate in bits per source symbol. A uniquely decodable code exists such that R → H∞, the entropy rate of the source, as N → ∞. No such code with R < H∞ can be found.
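For an i.i.d. source, H(X_1^N) = N·H(X), so the bounds above reduce to H(X) ≤ R < H(X) + (log d)/N; a small sketch showing the overhead term vanish with block length (the source bias P(1) = 0.1 is an arbitrary choice):

```python
import math

# i.i.d. binary source with P(1) = p: H(X_1^N) = N * H(X), so the block
# bounds read H(X) <= R < H(X) + (log2 d)/N with d = 2. The overhead
# (log2 d)/N vanishes as the block length N grows.
p = 0.1  # arbitrary source bias
Hx = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
for N in (1, 2, 4, 16, 64):
    print(f"N={N:3d}: {Hx:.4f} <= R < {Hx + 1.0 / N:.4f} bits/symbol")
```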
Properties of Huffman codes:
+ Optimal in the sense that no other (fixed-length to variable-length) code can provide a lower average rate!
+ Works well also for symbol-by-symbol encoding.
+ In principle it is straightforward to obtain better compression by grouping blocks of source symbols together and letting these define new symbols.
− The source statistics have to be known; in practice, coding in two passes is needed...
− Coding blocks significantly increases the complexity; the complexity of a code that achieves the entropy rate of the source is generally very high...
Huffman Codes
A simple way of constructing a fixed-length to variable-length instantaneous code. Merge the two least probable nodes at each level...
[Figure: Huffman construction — probabilities 0.30, 0.25, 0.25, 0.10, 0.10 are assigned codewords 00, 01, 10, 110, 111; merges: 0.10 + 0.10 = 0.20, 0.20 + 0.25 = 0.45, 0.30 + 0.25 = 0.55, 0.55 + 0.45 = 1.0]
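The merge rule is easy to implement with a heap; a sketch that reproduces the codeword lengths of the example above (the 0/1 labels may differ from the figure, but the expected length L = 2.2 bits/symbol is the same):

```python
import heapq
import itertools

# Huffman construction: repeatedly merge the two least probable nodes,
# prefixing '0'/'1' to the codewords inside each merged node.
def huffman(probs):
    tie = itertools.count()  # breaks ties so dicts are never compared
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable node
        p1, _, c1 = heapq.heappop(heap)  # second least probable node
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

code = huffman({"a": 0.30, "b": 0.25, "c": 0.25, "d": 0.10, "e": 0.10})
print(code)  # codeword lengths 2, 2, 2, 3, 3 -> L = 2.2 bits/symbol
```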
Lempel-Ziv Codes

Example: the source string 010000110000101 is parsed into the phrases 0, 1, 00, 001, 10, 000, 101, which results in the dictionary below; each phrase is encoded as a codeword consisting of the 3-bit index of its longest proper prefix in the dictionary followed by the new source symbol.

Index   Contents   Codeword
000     empty
001     0          000 0
010     1          000 1
011     00         001 0
100     001        011 1
101     10         010 0
110     000        011 0
111     101        101 1
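A sketch of the parsing rule that reproduces the phrases and (index, bit) pairs in the table; LZ78-style, assuming the string ends exactly on a phrase boundary as in this example:

```python
# LZ78-style parsing: grow a phrase until it is no longer in the
# dictionary, then emit (index of its longest proper prefix, new bit).
def lz78_parse(s):
    dictionary = {"": 0}  # index 0 is the empty phrase
    phrases, codewords = [], []
    phrase = ""
    for bit in s:
        if phrase + bit in dictionary:
            phrase += bit  # keep extending a known phrase
        else:
            codewords.append((dictionary[phrase], bit))
            dictionary[phrase + bit] = len(dictionary)
            phrases.append(phrase + bit)
            phrase = ""
    return phrases, codewords

phrases, codewords = lz78_parse("010000110000101")
print(phrases)    # ['0', '1', '00', '001', '10', '000', '101']
print(codewords)  # [(0,'0'), (0,'1'), (1,'0'), (3,'1'), (2,'0'), (3,'0'), (5,'1')]
# In the table, each index is written as a 3-bit string: 000 0, 000 1, ...
```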
Inefficient since the whole string must be visited to determine the number of bits spent on each index. Several modifications exist...