Information Theory
What is Information?
It is hard to measure the “semantic” (meaning-based) information!
Consider the following two sentences
What is Information?
Let’s attempt a different definition of information.
How about counting the number of letters in the two sentences?
What is Information?
It is interesting to know that log is the only function $f$ that satisfies $f(s^l) = l \, f(s)$.
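As a quick worked check of this property (an added illustration, not from the slide): taking $f = \log_2$ and $s = 2$, $\log_2(2^4) = 4 \log_2 2 = 4$, which is exactly the number of yes/no questions needed to single out one item among $2^4 = 16$ equally likely possibilities, as the next slide illustrates.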
How many questions?
[Figure: a 4×4 grid of the numbers 1–16. Identifying one of the 16 equally likely items takes 4 yes/no questions, since $\log_2 16 = 4$.]
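Below is a small Python sketch (an added illustration, not part of the slides; the helper guess_number is made up) showing that binary-search style yes/no questions pin down one of 16 items in exactly $\log_2 16 = 4$ questions:

```python
import math

def guess_number(secret, candidates):
    """Identify `secret` among `candidates` by asking yes/no questions
    of the form "is it in the lower half?" (binary search)."""
    questions = 0
    lo, hi = 0, len(candidates)           # search over candidates[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1                    # one yes/no question asked
        if secret in candidates[lo:mid]:  # "is it in the lower half?"
            hi = mid
        else:
            lo = mid
    return candidates[lo], questions

items = list(range(1, 17))                # the 16 numbers from the grid
answer, asked = guess_number(11, items)
print(answer, asked)                      # -> 11 4
print(math.log2(len(items)))              # -> 4.0 questions needed
```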
Information theory
Information theory provides a mathematical basis for
measuring the information content.
To understand the notion of information, think about it as
providing the answer to a question, for example, whether
a coin will come up heads.
If one already has a good guess about the answer, then the
actual answer is less informative.
If one already knows that the coin is rigged so that it will come up heads with probability 0.99, then a message (advance information) about the actual outcome of a flip is worth less than it would be for an honest coin (50-50).
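To make the comparison concrete, here is a minimal Python sketch (an added illustration; the helper entropy is not from the slides) computing how many bits of uncertainty the answer resolves for the rigged coin versus the honest one:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair   = [0.5, 0.5]      # honest coin
rigged = [0.99, 0.01]    # coin known to land heads 99% of the time

print(entropy(fair))     # 1.0 bit     -> the answer is maximally informative
print(entropy(rigged))   # ~0.0808 bits -> advance knowledge is worth little
```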
Information theory (cont …)
For a fair (honest) coin, you have no information, and you are willing to pay more (say, in terms of $) for advance information: the less you know, the more valuable the information.
Information theory uses this same intuition, but instead of measuring the value of information in dollars, it measures information content in bits.
One bit of information is enough to answer a yes/no question about which one has no idea, such as the flip of a fair coin.
Shannon’s Information Theory
Claude Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, 1948.
[Figure: a memoryless source emitting the symbols a and b, each with probability 0.5, producing a sequence such as abaabaababbbaabbabab…]
Intuition on Shannon’s Entropy
Why $H = -\sum_{i=1}^{n} p_i \log(p_i)$?
Suppose you have a long random string of two binary symbols 0 and 1, and the probabilities of symbols 1 and 0 are $p_1$ and $p_0$.
Ex: 00100100101101001100001000100110001 …
If any string is long enough, say of length $N$, it is likely to contain $Np_0$ 0’s and $Np_1$ 1’s.
The probability that this string pattern occurs is $p = p_0^{Np_0} \, p_1^{Np_1}$.
Hence, the # of possible patterns is $1/p = p_0^{-Np_0} \, p_1^{-Np_1}$.
The # of bits to represent all possible patterns is $\log\left(p_0^{-Np_0} p_1^{-Np_1}\right) = -\sum_{i=0}^{1} N p_i \log p_i$.
The average # of bits to represent one symbol is therefore $-\sum_{i=0}^{1} p_i \log p_i$.
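A quick numerical check of this counting argument (a sketch, not from the slides; the names entropy and typical_patterns are my own): count the binary strings with the typical number of 1’s and compare the bits-per-symbol needed to index them with the entropy formula.

```python
import math

def entropy(p1):
    """Binary entropy in bits."""
    p0 = 1 - p1
    return -(p0 * math.log2(p0) + p1 * math.log2(p1))

N, p1 = 10_000, 0.3
k = round(N * p1)                 # typical number of 1's in a length-N string

# Number of length-N strings containing exactly k ones:
typical_patterns = math.comb(N, k)

bits_per_symbol = math.log2(typical_patterns) / N
print(bits_per_symbol)            # ~0.8806
print(entropy(p1))                # ~0.8813  (the two agree as N grows)
```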
More Intuition on Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How
much information do we receive when we are told that the
outcome is heads?
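For a fair coin the answer is one bit; as a worked check (standard computation, added here): $I(\text{heads}) = -\log_2 P(\text{heads}) = -\log_2(1/2) = 1$ bit, whereas for the rigged coin above, $-\log_2(0.99) \approx 0.0145$ bits.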
Self Information
So, let’s look at it the way Shannon did.
Assume a memoryless source with
alphabet A = (a1, …, an)
symbol probabilities (p1, …, pn).
How much information do we get when we find out that the next symbol is $a_i$?
According to Shannon, the self-information of $a_i$ is $I(a_i) = -\log p_i = \log(1/p_i)$.
Why?
Assume two independent events A and B, with probabilities $P(A) = p_A$ and $P(B) = p_B$.
Then $P(A \cap B) = p_A p_B$, and the information from observing both events should be the sum of the individual informations. The logarithm is exactly what achieves this: $-\log(p_A p_B) = -\log p_A - \log p_B$, i.e., $I(A \cap B) = I(A) + I(B)$.
Example 2:
Which logarithm? Pick the one you like! If you pick the natural log, you’ll measure in nats; if you pick the 10-log, you’ll get Hartleys; if you pick the 2-log (like everyone else), you’ll get bits.
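A short sketch (illustration only, with an assumed event probability of 0.25) expressing the same self-information in all three units:

```python
import math

p = 0.25                                  # probability of the observed event

bits     = -math.log2(p)                  # base-2 logarithm
nats     = -math.log(p)                   # natural logarithm
hartleys = -math.log10(p)                 # base-10 logarithm

print(bits, nats, hartleys)               # 2.0, ~1.386, ~0.602
print(nats / math.log(2))                 # 2.0 -> converting nats back to bits
```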
Self Information
Entropy
Example: a Binary Memoryless Source (BMS) emitting a sequence such as 01101000…
Let $P(1) = p$ and $P(0) = 1 - p$.
Then $H(p) = -p \log_2 p - (1 - p)\log_2(1 - p)$.
[Figure: plot of $H(p)$ for $0 \le p \le 1$, with a maximum of 1 bit.]
The uncertainty (information) is greatest when $p = 0.5$.
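A minimal Python sketch (not from the slides) evaluating this binary entropy function at a few values of p; it peaks at 1 bit when p = 0.5, matching the plot:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.99, 1.0):
    print(p, round(binary_entropy(p), 4))
# p = 0.5 gives the maximum, 1 bit; p = 0.99 gives only ~0.0808 bits
```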
Example
Three symbols a, b, c with corresponding probabilities:
What is H(P)?
What is H(Q)?
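The slide’s actual distributions P and Q are not reproduced above; purely as a hypothetical illustration, take P uniform and Q skewed over {a, b, c}:

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions, NOT the ones on the original slide:
P = {"a": 1/3, "b": 1/3, "c": 1/3}   # uniform
Q = {"a": 1/2, "b": 1/4, "c": 1/4}   # skewed

print(entropy(P.values()))   # log2(3) ~ 1.585 bits
print(entropy(Q.values()))   # 1.5 bits -> less uncertainty than the uniform P
```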
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
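A quick numeric check of this bound (a sketch under the assumption of an 8-symbol alphabet): H = 0 when one symbol is certain, and H = log2 N when all N symbols are equally likely.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 8
certain = [1.0] + [0.0] * (N - 1)     # one symbol is certain
uniform = [1.0 / N] * N               # all symbols equally likely

print(entropy(certain))               # -0.0 -> zero: a certain outcome carries no information
print(entropy(uniform), math.log2(N)) # 3.0 3.0 -> the upper bound log2(N) is attained
```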
THANKS