1. Shannon's Information Theory
2. Source Coding Theorem
3. Channel Coding Theorem
4. Information Capacity Theorem
5. Introduction to Error Control Coding
Appendix A: Historical Notes
2. Source Coding Theorem
This theorem provides the mathematical tool for assessing data compaction, i.e. lossless data compression, of data generated by a discrete memoryless source. The entropy of a source is a function of the probabilities of the source symbols that constitute the alphabet of the source.

Entropy of a Discrete Memoryless Source
Assume that the source output is modeled as a discrete random variable, S, which takes on symbols from a fixed finite alphabet
S = {s_0, s_1, ..., s_{K-1}}

with probabilities

P(S = s_k) = p_k,  k = 0, 1, ..., K-1,  with  Σ_{k=0}^{K-1} p_k = 1
Define the amount of information gained after observing the event S = s_k as the logarithmic function

I(s_k) = log2(1/p_k) bits
The entropy of the source is defined as the mean of I(s_k) over the source alphabet S, given by

H(S) = E[I(s_k)] = Σ_{k=0}^{K-1} p_k I(s_k) = Σ_{k=0}^{K-1} p_k log2(1/p_k) bits
The entropy is a measure of the average information content per source symbol. The source coding theorem is also known as the "noiseless coding theorem" in the sense that it establishes the condition for error-free encoding to be possible.
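The entropy formula above is easy to check numerically. A minimal sketch in Python (the example distributions are illustrative, not from the notes):

```python
import math

def entropy(probs):
    """H(S) = sum over k of p_k * log2(1/p_k), in bits per source symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A uniform K-symbol source attains the maximum entropy log2(K):
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
# A skewed source carries less information per symbol:
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```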
3. Channel Coding Theorem
The theorem can be stated in two parts:
(a) If

H(S)/T_S ≤ C/T_C

there exists a coding scheme for which the source output can be transmitted over the channel and be reconstructed with an arbitrarily small probability of error. Here T_S is the source symbol duration and T_C is the channel symbol duration.
(b) Conversely, if

H(S)/T_S > C/T_C

it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.
The theorem specifies the channel capacity C as a fundamental limit on the rate at which the transmission of reliable, error-free messages can take place over a discrete memoryless channel.

Mutual Information
[Figure: a discrete memoryless channel with input alphabet X and output alphabet Y]

Channel input x is selected from the alphabet X = {x_0, x_1, ..., x_{J-1}}, with entropy H(X). Channel output y is selected from the alphabet Y = {y_0, y_1, ..., y_{K-1}}.
Define the conditional entropy of X, given that Y = y_k, as

H(X | Y = y_k) = Σ_{j=0}^{J-1} P(x_j | y_k) log2[1/P(x_j | y_k)]

Averaging H(X | Y = y_k) over the output alphabet gives

H(X | Y) = Σ_{k=0}^{K-1} H(X | Y = y_k) P(y_k) = Σ_{k=0}^{K-1} Σ_{j=0}^{J-1} P(x_j, y_k) log2[1/P(x_j | y_k)]
The conditional entropy represents the amount of uncertainty remaining about the channel input after the channel output has been observed. The difference H(X) − H(X | Y) is called the mutual information, which represents the uncertainty about the channel input that is resolved by observing the channel output. Denoting the mutual information by I(X; Y), we may thus write

I(X; Y) = H(X) − H(X | Y)
Note that I(X; Y) = I(Y; X). The mutual information can also be written as

I(X; Y) = Σ_{k=0}^{K-1} Σ_{j=0}^{J-1} P(x_j, y_k) log2[P(x_j | y_k)/P(x_j)]
The input probability distribution {P(x_j)} is obviously independent of the channel. We can then maximize I(X; Y) with respect to {P(x_j)}. Hence we define the channel capacity C as

C = max_{ {P(x_j)} } I(X; Y)

The channel capacity C is measured in bits per channel use.
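As a sketch of this definition, I(X; Y) can be computed from {P(x_j)} and the channel transition probabilities, and C found by searching over the input distribution. The binary symmetric channel matrix and the coarse grid search below are illustrative assumptions, not part of the notes:

```python
import math

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) = sum over j,k of P(x_j, y_k) * log2[P(x_j, y_k) / (P(x_j) P(y_k))]."""
    J, K = len(p_x), len(p_y_given_x[0])
    joint = [[p_x[j] * p_y_given_x[j][k] for k in range(K)] for j in range(J)]
    p_y = [sum(joint[j][k] for j in range(J)) for k in range(K)]
    return sum(joint[j][k] * math.log2(joint[j][k] / (p_x[j] * p_y[k]))
               for j in range(J) for k in range(K) if joint[j][k] > 0)

# Capacity of a BSC with transition probability 0.1, found by a coarse grid
# search over the input distribution {P(x_0), 1 - P(x_0)}:
bsc = [[0.9, 0.1], [0.1, 0.9]]
C = max(mutual_information([q, 1.0 - q], bsc) for q in [i / 1000 for i in range(1, 1000)])
print(C)  # close to 1 - H(0.1), about 0.531 bits per channel use
```

For the symmetric channel the maximum is attained at the uniform input distribution, matching the closed-form BSC capacity 1 − H(p).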
4. Information Capacity Theorem (also known as Shannon-Hartley law or Shannon's 3rd theorem)
It can be stated as follows: The information capacity of a continuous channel of bandwidth B Hz, perturbed by additive white Gaussian noise of power spectral density N_0/2 and limited in bandwidth to B, is given by

C = B log2(1 + P/(N_0 B)) bits/second
where P is the average transmitted power. This theorem implies that, for given average transmitted power P and channel bandwidth B , we can transmit information at the rate C bits per second, with arbitrarily small probability of error by employing sufficiently complex encoding systems.
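A quick numerical reading of the formula (the bandwidth and SNR values below are illustrative assumptions, not from the notes):

```python
import math

B = 3000.0    # channel bandwidth in Hz (telephone-grade, an assumed example)
snr = 1000.0  # P / (N0 * B), i.e. a 30 dB signal-to-noise ratio

C = B * math.log2(1.0 + snr)  # C = B log2(1 + P/(N0 B))
print(round(C), "bits/second")  # 29902 bits/second
```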
Shannon limit: For an ideal system that transmits data at a rate R_b = C, the average transmitted power is P = E_b R_b = E_b C, where E_b is the energy per bit. Then

C/B = log2(1 + (E_b/N_0)(C/B))

which can be solved for the required energy per bit to noise density ratio:

E_b/N_0 = (2^{C/B} − 1)/(C/B)

As C/B → 0, E_b/N_0 approaches ln 2 ≈ 0.693, i.e. −1.59 dB, known as the Shannon limit.
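The required E_b/N_0 as a function of the spectral efficiency C/B can be tabulated; as C/B → 0 it tends to ln 2 (−1.59 dB). A small sketch (the sample efficiencies are illustrative):

```python
import math

def ebn0_required(r):
    """Minimum Eb/N0 = (2**(C/B) - 1) / (C/B) for spectral efficiency r = C/B."""
    return (2.0 ** r - 1.0) / r

for r in [2.0, 1.0, 0.5, 0.001]:
    db = 10.0 * math.log10(ebn0_required(r))
    print(f"C/B = {r}: Eb/N0 >= {db:.2f} dB")
# The dB values fall toward -1.59 dB (ln 2) as C/B -> 0.
```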
Appendix A: Historical Notes
The seeds of Shannon's information theory appeared first in a Bell Labs classified memorandum dated 1 September 1945, A Mathematical Theory of Cryptography (a revised form was published in BSTJ, 1949). Shannon's work on the cryptography problem at Bell Labs brought him into contact with two scholars, R.V.L. Hartley and H. Nyquist, who were major consultants to the cryptography research. Nyquist (1924) had shown that a certain bandwidth was necessary in order to send telegraph signals at a definite rate. Hartley (1928) had attempted to quantify information as the logarithm of the number of possible messages built from a pool of symbols.
Shannon was aware of Norbert Wiener's work on cybernetics (he cited Wiener's book in his two 1948 articles on information theory). Wiener had recognized that the communication of information was a problem in statistics. Shannon had taken a mathematics course from Norbert Wiener when he was studying at MIT, and Shannon had access to the "Yellow Peril" report (Wiener's book before publication) while he worked on cryptographic research.

Major Publications of Shannon
A Mathematical Theory of Communication, I, II, BSTJ, 1948, pp. 379-423, 623-656.
Communication Theory of Secrecy Systems, BSTJ, 1949, pp. 656-715.
A paperback edition of "The Mathematical Theory of Communication" was published by the University of Illinois Press in 1963; 32,000 copies had been sold by 1964. In a review of the accomplishments of Bell Labs in communication science, Millman, in his edited book (1984) A History of Engineering and Science in the Bell System: Communication Science (1925-1980), says: Probably the most spectacular development in communication mathematics to take place at Bell Laboratories was the formulation in the 1940s of information theory by C. E. Shannon. In his 1948 articles on information theory, Shannon credits J.W. Tukey (a professor at Princeton Univ.) with suggesting the word "bit" as a measure of information.
In 1984 J.L. Massey wrote a paper, Information Theory: The Copernican System of Communications, published in IEEE Communications Magazine, vol. 22, no. 12, pp. 26-28: Shannon's theory of information is the scientific basis of communications in the same sense that the Copernican heliocentric theory is the scientific basis of astronomy. See also IEEE Transactions on Information Theory, Oct. 1998, vol. 44, no. 6: Commemorative Issue (1948-1998).
5. Introduction to Error Control Coding
5.1 Purpose of Error Control Coding
In data communications, coding is used for controlling transmission errors induced by channel noise or other impairments, such as fading and interference, so that error-free communication can be achieved. In data storage systems, coding is used for controlling storage errors (during retrieval) caused by storage medium defects, dust particles, and radiation, so that error-free storage can be achieved.
[Block diagram of a coded system: Information source → Source encoder → Channel encoder → Modulator (writing unit) → channel subject to noise and interference → Channel decoder → Source decoder → Destination]
5.2 Coding Principle
Coding is achieved by adding properly designed redundant digits (bits) to each message. These redundant digits are used for detecting and/or correcting transmission (or storage) errors.

5.3 Types of Coding
Block coding and convolutional coding.

Block coding: A message of k digits is mapped into a structured sequence of n digits, called a codeword.

k digits → n digits,  code rate = k/n

The mapping operation is called encoding. Each encoding operation is independent of past encodings, i.e. the encoder is memoryless. The collection of all codewords is called a "block code".
Convolutional coding: An information sequence is divided into (short) blocks of k digits each. Each k-digit message block is encoded into an n-digit coded block. The n-digit coded block depends not only on the corresponding k-digit message block but also on the m (≥ 1) previous message blocks; that is, the encoder has memory of order m. The encoder has k inputs and n outputs. An information sequence is encoded into a coded sequence. The collection of all possible code sequences is called an (n, k, m) convolutional code. Normally,

1 ≤ k ≤ 8,  2 ≤ n ≤ 9,  code rate = k/n
5.4 Types of Errors and Channels
Types of errors: random errors and burst errors.
Types of channels:
Random-error channels: deep-space channels, satellite channels, line-of-sight transmission channels, etc.
Burst-error channels: radio links, terrestrial microwave links, wire and cable transmission channels, etc.
5.5 Decoding
Suppose a codeword corresponding to a message is transmitted over a noisy channel. Let r be the received sequence. Based on r, the encoding rules, and the noise characteristics of the channel, the receiver (or decoder) decides which message was actually transmitted. This decision-making operation is called decoding. The device that performs the decoding operation is called a decoder.

Two types of decoding: hard-decision decoding and soft-decision decoding.

Hard-decision decoding: When binary coding is used, the modulator has only binary inputs. If binary demodulator output quantization is used, the decoder has only binary inputs. In this case, the demodulator is said to make hard decisions. Decoding based on the hard decisions made by the demodulator is called hard-decision decoding.

Soft-decision decoding: If the output of the demodulator consists of more than two quantization levels, or is left unquantized, the demodulator is said to make soft decisions. Decoding based on the soft decisions made by the demodulator is called soft-decision decoding.

Hard-decision decoding is much easier to implement than soft-decision decoding. However, soft-decision decoding offers much better performance.
p: transition probability of the resulting binary symmetric channel.

When BPSK modulation is used over the AWGN channel with optimum coherent detection and binary output quantization, the bit-error probability for equally likely signals is given by

p = Q(√(2E/N_0))

where

Q(x) = (1/√(2π)) ∫_x^∞ exp(−y²/2) dy

E is the signal energy per transmitted bit, and N_0/2 is the noise power spectral density.
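The transition probability is easy to evaluate with the complementary error function, since Q(x) = erfc(x/√2)/2. A sketch (the E/N_0 value is an illustrative assumption):

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = (1/sqrt(2*pi)) * integral from x to
    infinity of exp(-y**2 / 2) dy, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bsc_transition_probability(e_over_n0):
    """p = Q(sqrt(2E/N0)) for coherent BPSK with hard decisions on AWGN."""
    return Q(math.sqrt(2.0 * e_over_n0))

# At E/N0 = 9.6 dB the crossover probability is on the order of 1e-5:
print(bsc_transition_probability(10.0 ** 0.96))
```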
An optimum decoding rule is one that minimizes the probability of a decoding error; that is, P(ĉ ≠ c | r) is minimized, or, equivalently, P(ĉ = c | r) is maximized.
The decoding error probability is minimized for a given r by choosing the estimate ĉ to be the codeword c that maximizes

P(c | r) = P(r | c) P(c) / P(r)

That is, ĉ is chosen to be the most likely codeword, given that r is received.
If knowledge or an estimate of P(c) is used for decoding, the technique is called maximum a posteriori probability (MAP) decoding.
5.9 Maximum Likelihood Decoding (MLD) Suppose all the messages are equally likely. An optimum decoding can be done as follows: For every codeword c j , compute the conditional probability
P(r | c j )
The codeword c_j with the largest conditional probability is chosen as the estimate ĉ of the transmitted codeword c. This decoding rule is called maximum likelihood decoding (MLD).
MLD for a BSC
Let a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n) be two binary sequences of n components. The Hamming distance between a and b, denoted d_H(a, b), is defined as the number of places where a and b differ. In coding for a BSC, every codeword and every received sequence are binary sequences. Suppose some codeword is transmitted and the received sequence is

r = (r_1, r_2, ..., r_n)

For a BSC with transition probability p,

P(r | c_i) = p^{d_H(r, c_i)} (1 − p)^{n − d_H(r, c_i)}

Since p < 1/2, P(r | c_i) is maximized by making d_H(r, c_i) as small as possible.
The MLD is accomplished by the following steps:
(i) Compute d_H(r, c_i) for all codewords c_i.
(ii) Take c_i as the transmitted codeword if d_H(r, c_i) < d_H(r, c_j) for all j ≠ i.
(iii) Decode c_i into the message m_i.
That is, the received vector r is decoded into the closest codeword.
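The minimum-distance rule can be sketched directly (the repetition code and received vector below are hypothetical examples, not from the notes):

```python
def hamming_distance(a, b):
    """d_H(a, b): number of places where the two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mld_decode(r, codewords):
    """MLD over a BSC with p < 1/2: return the codeword closest to r."""
    return min(codewords, key=lambda c: hamming_distance(r, c))

# A (5, 1) repetition code: two codewords for the two one-bit messages.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
r = (0, 1, 0, 0, 1)          # received sequence with two bit errors
print(mld_decode(r, code))   # (0, 0, 0, 0, 0)
```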
5.10 Bounded Distance Decoding
Given a received word r, a t-error-correcting bounded distance decoder selects the codeword c that minimizes d_H(r, c) if and only if there exists a c such that d_H(r, c) ≤ t. If no such c exists, a decoder failure is declared. All practical bounded distance decoders use some form of syndrome decoding. Bounded distance decoding is usually incomplete decoding, since it decodes only those received words lying in a radius-t sphere about some codeword.
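A minimal sketch of the decision rule, with exhaustive search standing in for the syndrome decoding a practical decoder would use (the code and the choice t = 1 are hypothetical examples):

```python
def bounded_distance_decode(r, codewords, t):
    """Return the codeword within Hamming distance t of r, or None on decoder failure."""
    def d_h(a, b):
        return sum(x != y for x, y in zip(a, b))
    best = min(codewords, key=lambda c: d_h(r, c))
    return best if d_h(r, best) <= t else None

code = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
print(bounded_distance_decode((0, 1, 0, 0, 0), code, 1))  # (0, 0, 0, 0, 0)
print(bounded_distance_decode((0, 1, 0, 0, 1), code, 1))  # None: r lies outside every radius-1 sphere
```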