
Greedy

Huffman codes

R. Inkulu
http://www.iitg.ac.in/rinkulu/

(Huffman codes)

1 / 15

Encoding symbols using bits

Given a set of symbols S, a code for S is a one-to-one function $\gamma : S \to N$, where each element of N is a binary number. The codeword of a symbol $x \in S$ is $\gamma(x)$.

A fixed-length code does not take the frequency of occurrence of individual symbols into account; hence it is not space-efficient.

A variable-length code helps improve space efficiency: assign longer codewords to less frequently used symbols and shorter codewords to more frequently used ones.


Prefix code

Decoding text encoded with an arbitrary variable-length code can be ambiguous:
ex. how to decode 01 when $\gamma(a) = 0$, $\gamma(b) = 1$, $\gamma(c) = 01$? It could be read as either ab or c.

A variable-length code in which no codeword is a prefix of another is termed a prefix code.
ex. with the code $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$, decoding 0010000011101 yields cecab.
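The decoding example above can be reproduced with a short sketch. The function name `decode` and the dictionary representation of the code are assumptions made for illustration; the idea is only that, with a prefix code, a symbol can be emitted as soon as the bits read so far form a codeword.

```python
def decode(bits, code):
    """Decode a bit string with a prefix code given as {symbol: codeword}."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # a complete codeword has been read
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)


gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(decode("0010000011101", gamma1))  # cecab
```

Because no codeword is a prefix of another, the first match is always the right one, so no backtracking is needed.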


Optimal prefix codes

Given a set of symbols S with frequencies of occurrence $f_x$ for every $x \in S$, determine a space-efficient prefix code that assigns a unique codeword to each symbol x in S.

The average number of bits required per letter (ABL) is
$$\mathrm{ABL}(\gamma) = \sum_{x \in S} f_x \cdot |\gamma(x)|,$$
where $|\gamma(x)|$ is the length of the codeword of x.

Hence, the objective is to choose a prefix code that minimizes ABL.


Optimal prefix code example

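For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$:

with a fixed-length code, ABL is 3
with $\gamma_1(a) = 11$, $\gamma_1(b) = 01$, $\gamma_1(c) = 001$, $\gamma_1(d) = 10$, $\gamma_1(e) = 000$, $\mathrm{ABL}(\gamma_1) = 2.25$
with $\gamma_2(a) = 11$, $\gamma_2(b) = 10$, $\gamma_2(c) = 01$, $\gamma_2(d) = 001$, $\gamma_2(e) = 000$, $\mathrm{ABL}(\gamma_2) = 2.23$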
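The two ABL values can be checked with a few lines of Python. This is a sketch: the helper name `abl` is an assumption, and the frequency of e is taken as 0.05, the value for which the frequencies sum to 1 and the stated ABL figures are reproduced.

```python
def abl(freq, code):
    """Average bits per letter: sum of f_x * |code(x)| over all symbols."""
    return sum(freq[x] * len(code[x]) for x in freq)


freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
gamma2 = {"a": "11", "b": "10", "c": "01", "d": "001", "e": "000"}

print(round(abl(freq, gamma1), 2))  # 2.25
print(round(abl(freq, gamma2), 2))  # 2.23
```

The code $\gamma_2$ does better because it gives the 3-bit codewords to the two least frequent symbols, d and e.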


Representing prefix codes using binary trees


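[Figure: three binary trees with leaves labeled a to e and edges labeled 0 (left child) and 1 (right child), representing the prefix codes
$\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$;
$\gamma(a) = 1$, $\gamma(b) = 011$, $\gamma(c) = 010$, $\gamma(d) = 001$, $\gamma(e) = 000$;
$\gamma(a) = 11$, $\gamma(b) = 01$, $\gamma(c) = 001$, $\gamma(d) = 10$, $\gamma(e) = 000$.]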

Consider a binary tree T in which each leaf is labeled with a distinct letter in S. For each symbol $x \in S$, follow the path from the root to the leaf labeled x; each time the path goes from a node to its left (resp. right) child, write down a 0 (resp. 1). The resulting bit string is the encoding of x.

The encoding of S constructed from T is a prefix code.
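As a minimal sketch of the construction above, a tree can be modeled as nested pairs, where a pair `(left, right)` is an internal node and anything else is a leaf symbol. This representation and the names `codes_from_tree` and `is_prefix_free` are assumptions made for illustration.

```python
def codes_from_tree(tree, prefix=""):
    """Read codewords off a tree: 0 for a left edge, 1 for a right edge."""
    if not isinstance(tree, tuple):      # leaf: the path so far is the codeword
        return {tree: prefix}
    left, right = tree
    codes = codes_from_tree(left, prefix + "0")
    codes.update(codes_from_tree(right, prefix + "1"))
    return codes


def is_prefix_free(code):
    """In sorted order, a codeword that is a prefix of any other codeword
    is also a prefix of its immediate successor."""
    words = sorted(code.values())
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


# Tree for a = 11, b = 10, c = 01, d = 001, e = 000
tree = ((("e", "d"), "c"), ("b", "a"))
codes = codes_from_tree(tree)
print(codes["a"], codes["d"])   # 11 001
print(is_prefix_free(codes))    # True
```

The result is prefix-free because symbols sit only at leaves: the path to one leaf never passes through another leaf.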


Representing prefix codes using binary trees (cont)


Given a prefix code, we can build a binary tree recursively.
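One way to sketch the recursive construction: split the codewords by their first bit, strip that bit, and recurse on each half. The nested-pair representation (a pair is an internal node, `None` marks a missing child) and the name `tree_from_code` are assumptions for illustration, and symbols are assumed to be neither tuples nor `None`.

```python
def tree_from_code(code):
    """Build a nested-pair tree from a prefix code given as {symbol: codeword}."""
    if not code:
        return None                       # no leaf below this position
    if "" in code.values():
        if len(code) > 1:                 # a codeword ends at an internal node
            raise ValueError("not a prefix code")
        return next(iter(code))           # exactly one symbol ends here: a leaf
    zero = {s: w[1:] for s, w in code.items() if w[0] == "0"}
    one = {s: w[1:] for s, w in code.items() if w[0] == "1"}
    return (tree_from_code(zero), tree_from_code(one))


gamma1 = {"a": "11", "b": "01", "c": "001", "d": "10", "e": "000"}
print(tree_from_code(gamma1))  # ((('e', 'c'), 'b'), ('d', 'a'))
```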


Objective in terms of binary trees


Constructing an optimal prefix code involves
searching for a binary tree T
labeling the leaves of T

so that together they minimize $\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.

For $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$, the tree of the code $\gamma(a) = 11$, $\gamma(b) = 10$, $\gamma(c) = 01$, $\gamma(d) = 001$, $\gamma(e) = 000$ gives an optimal prefix code (ABL = 2.23).


Optimal binary tree is full

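The binary tree corresponding to an optimal prefix code is full: every internal node has exactly two children. (If some internal node had only one child, replacing that node by its child would shorten the codewords of all leaves below it while keeping the code prefix-free.)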


Objective in terms of full binary trees

Constructing an optimal prefix code involves


searching for a full binary tree T
labeling the leaves of T

so that together they minimize $\mathrm{ABL} = \sum_{x \in S} f_x \cdot |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x)$.

Labeling leaves of a given optimal full binary tree

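For any two leaves u and v with depth(u) < depth(v) in an optimal full binary tree T, the symbol associated with u must be at least as frequent as the symbol associated with v.
- proof using an exchange argument: if the symbol at u had frequency $f_u$ strictly less than the frequency $f_v$ of the symbol at v, swapping the two labels would change ABL by $(\mathrm{depth}(v) - \mathrm{depth}(u))(f_u - f_v) < 0$, contradicting optimality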

With the above in place, the choice of how symbols are assigned among leaves of the same depth does not affect the ABL.


Algorithm to label the leaves of a given optimal full binary tree

take the leaves of least depth and label them with the highest-frequency symbols, in any order

take the leaves of the next least depth and label them with the highest-frequency symbols among those remaining, in any order

etc.
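The labeling procedure above amounts to sorting leaves by depth and symbols by frequency, then pairing them up. The sketch below assumes the tree shape is given only through its leaf depths; the leaf identifiers `u1` to `u5` and the function name `label_leaves` are hypothetical.

```python
def label_leaves(leaf_depths, freq):
    """Pair shallow leaves with frequent symbols.

    leaf_depths: {leaf_id: depth}; freq: {symbol: frequency}.
    Returns {leaf_id: symbol}.
    """
    leaves = sorted(leaf_depths, key=leaf_depths.get)        # shallowest first
    symbols = sorted(freq, key=freq.get, reverse=True)       # most frequent first
    return dict(zip(leaves, symbols))


# Leaf depths of a full binary tree with three leaves at depth 2 and two at depth 3
leaf_depths = {"u1": 2, "u2": 2, "u3": 2, "u4": 3, "u5": 3}
freq = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}

labels = label_leaves(leaf_depths, freq)
cost = sum(freq[sym] * leaf_depths[leaf] for leaf, sym in labels.items())
print(labels["u4"], labels["u5"])  # d e
print(round(cost, 2))              # 2.23
```

Ties among leaves of equal depth are broken arbitrarily here, which is harmless because such swaps leave the ABL unchanged.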


Observations to construct an optimal full binary tree

There is an optimal prefix code, with corresponding tree T, in which the two lowest-frequency letters, say y and z, are assigned to leaves that are siblings in T.

Let y and z be the two lowest-frequency letters. Let $T'$ be a full binary tree corresponding to an optimal prefix code for $S' = (S \setminus \{y, z\}) \cup \{w\}$ with $f_w = f_y + f_z$. Also, let T be the tree obtained by attaching leaves y and z as children of node w of $T'$.
Then $\mathrm{ABL}(T) = \mathrm{ABL}(T') + f_w$.
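The relation between the two ABL values can be verified term by term. The derivation below is a sketch that writes $T'$ for the tree on the reduced alphabet (with leaf w) and T for the tree obtained by attaching y and z below w; it uses only that y and z sit one level below w in T and that every other symbol has the same depth in both trees.

```latex
\begin{aligned}
\mathrm{ABL}(T)
  &= \sum_{x \in S} f_x \cdot \mathrm{depth}_T(x) \\
  &= f_y \cdot \mathrm{depth}_T(y) + f_z \cdot \mathrm{depth}_T(z)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_T(x) \\
  &= (f_y + f_z)\,\bigl(1 + \mathrm{depth}_{T'}(w)\bigr)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + f_w \cdot \mathrm{depth}_{T'}(w)
     + \sum_{x \in S \setminus \{y, z\}} f_x \cdot \mathrm{depth}_{T'}(x) \\
  &= f_w + \mathrm{ABL}(T').
\end{aligned}
```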

Huffman algorithm
Recursively find the two symbols y, z with the lowest frequencies and make them siblings in the binary tree T to be constructed, then replace S with $(S \setminus \{y, z\}) \cup \{w\}$, where $f_w = f_y + f_z$. The resulting codewords for all the symbols together are known as the Huffman code.

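[Figure: Huffman tree for the frequencies a: 1/21, b: 2/21, c: 3/21, d: 6/21, e: 4/21, f: 5/21, with left edges labeled 0 and right edges labeled 1. Internal node weights: 3/21 (a, b), 6/21 (a, b, c), 12/21 (a, b, c, d), 9/21 (e, f), 21/21 (root).]

$\gamma(a) = 0000$, $\gamma(b) = 0001$, $\gamma(c) = 001$, $\gamma(d) = 01$, $\gamma(e) = 10$, $\gamma(f) = 11$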


Using a priority queue, the algorithm takes O(|S| lg |S|) time.
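A minimal sketch of the algorithm with a priority queue, written iteratively with Python's `heapq` rather than recursively. The function name `huffman_code`, the nested-pair tree representation, and the integer weights (the numerators of the frequencies over 21) are choices made for illustration; symbols are assumed not to be tuples.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Return {symbol: codeword} for a dict of symbol frequencies."""
    if not freq:
        return {}
    if len(freq) == 1:                       # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    tie = count()                            # tie-breaker: subtrees are never compared
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, y = heapq.heappop(heap)       # the two lowest-frequency nodes...
        fz, _, z = heapq.heappop(heap)
        heapq.heappush(heap, (fy + fz, next(tie), (y, z)))  # ...become siblings under w
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(root, "")
    return codes


weights = {"a": 1, "b": 2, "c": 3, "d": 6, "e": 4, "f": 5}
codes = huffman_code(weights)
print({s: len(w) for s, w in sorted(codes.items())})
# {'a': 4, 'b': 4, 'c': 3, 'd': 2, 'e': 2, 'f': 2}
```

The individual bits may differ from the codewords on the slide, since the left/right order of siblings is arbitrary, but the codeword lengths, and hence the ABL, agree. Each of the |S| - 1 merges performs a constant number of heap operations, which gives the O(|S| lg |S|) bound.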



Correctness

feasibility: every symbol gets a codeword



optimality: induction on the size of the alphabet
