
Chapter 1 Introduction

1. Shannon's Information Theory
2. Source Coding Theorem
3. Channel Coding Theorem
4. Information Capacity Theorem
5. Introduction to Error Control Coding
Appendix A: Historical Notes

1. Shannon's Information Theory


The theory provides answers to two fundamental questions (among others):
(a) What is the irreducible complexity below which a signal cannot be compressed?
(b) What is the ultimate transmission rate for reliable communication over a noisy channel?

2. Source Coding Theorem (Shannon's first theorem)


The theorem can be stated as follows: Given a discrete memoryless source of entropy H(S), the average code-word length \bar{L} for any distortionless source coding is bounded as

\bar{L} \ge H(S)

This theorem provides the mathematical tool for assessing data compaction, i.e. lossless data compression, of data generated by a discrete memoryless source. The entropy of a source is a function of the probabilities of the source symbols that constitute the alphabet of the source.

Entropy of Discrete Memoryless Source
Assume that the source output is modeled as a discrete random variable, S, which takes on symbols from a fixed finite alphabet

S = \{ s_0, s_1, \ldots, s_{K-1} \}

with probabilities

P(S = s_k) = p_k, \quad k = 0, 1, 2, \ldots, K-1, \qquad \text{with} \quad \sum_{k=0}^{K-1} p_k = 1

Define the amount of information gain after observing the event S = s_k as the logarithmic function

I(s_k) = \log_2\left(\frac{1}{p_k}\right) \ \text{bits}

The entropy of the source is defined as the mean of I(s_k) over the source alphabet S, given by

H(S) = E[I(s_k)] = \sum_{k=0}^{K-1} p_k I(s_k) = \sum_{k=0}^{K-1} p_k \log_2\left(\frac{1}{p_k}\right) \ \text{bits}

The entropy is a measure of the average information content per source symbol. The source coding theorem is also known as the "noiseless coding theorem" in the sense that it establishes the condition for error-free encoding to be possible.
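As a quick numerical check of this definition, the sketch below (illustrative Python; the two example probability vectors are assumptions, not from the text) computes H(S) directly from the symbol probabilities.

```python
import math

def entropy(probs):
    """H(S) in bits for a discrete memoryless source with symbol probabilities `probs`."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    # Symbols with p = 0 contribute nothing to the sum, so they are skipped.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits per symbol
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits per symbol (equally likely symbols)
```

The equally likely case attains the maximum entropy, log_2 K, for a K-symbol alphabet.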

3. Channel Coding Theorem (Shannon's 2nd theorem)


The channel coding theorem for a discrete memoryless channel is stated in two parts as follows:
(a) Let a discrete memoryless source with an alphabet S have entropy H(S) and produce symbols once every T_S seconds. Let a discrete memoryless channel have capacity C and be used once every T_C seconds. Then if

\frac{H(S)}{T_S} \le \frac{C}{T_C}

There exists a coding scheme for which the source output can be transmitted over the channel and be reconstructed with an arbitrarily small probability of error.

(b) Conversely, if
\frac{H(S)}{T_S} > \frac{C}{T_C}

It is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error. The theorem specifies the channel capacity C as a fundamental limit on the rate at which the transmission of reliable, error-free messages can take place over a discrete memoryless channel.
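For a concrete check of condition (a) (with numbers assumed purely for illustration, not taken from the text), let the source have entropy H(S) = 1.75 bits and emit one symbol every T_S = 1 ms, and let the channel have capacity C = 2 bits per use with T_C = 1 ms. Then

\frac{H(S)}{T_S} = 1750 \ \text{bits/s} \ \le \ \frac{C}{T_C} = 2000 \ \text{bits/s}

so reliable transmission is possible with a suitable code. If the source instead emitted a symbol every 0.8 ms, the left-hand side would become 2187.5 bits/s, the inequality would reverse, and part (b) would rule out arbitrarily reliable transmission.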
Mutual Information

[Channel diagram: input x, drawn from alphabet X, enters the channel; output y, drawn from alphabet Y, leaves it.]

The channel input x is selected from the alphabet X = \{ x_0, x_1, \ldots, x_{J-1} \} with entropy H(X). The channel output y is selected from the alphabet

Y = \{ y_0, y_1, \ldots, y_{K-1} \}

How can we measure the uncertainty about x after observing y?

Define the conditional entropy of X, selected from alphabet X, given that Y = y_k, as

H(X \mid Y = y_k) = \sum_{j=0}^{J-1} P(x_j \mid y_k) \log_2\left[\frac{1}{P(x_j \mid y_k)}\right]

The mean of the entropy H(X \mid Y = y_k) over the output alphabet Y is given by

H(X \mid Y) = \sum_{k=0}^{K-1} H(X \mid Y = y_k) P(y_k) = \sum_{k=0}^{K-1} \sum_{j=0}^{J-1} P(x_j, y_k) \log_2\left[\frac{1}{P(x_j \mid y_k)}\right]

The conditional entropy represents the amount of uncertainty remaining about the channel input after the channel output has been observed. The difference H(X) - H(X \mid Y) is called the mutual information, which represents the uncertainty about the channel input that is resolved by observing the channel output. Denoting the mutual information by I(X; Y), we may thus write

I(X; Y) = H(X) - H(X \mid Y)

Note that I(X; Y) = I(Y; X).

Channel Capacity

The mutual information can be expressed as

I(X; Y) = \sum_{k=0}^{K-1} \sum_{j=0}^{J-1} P(x_j, y_k) \log_2\left[\frac{P(x_j \mid y_k)}{P(x_j)}\right]

The input probability distribution \{P(x_j)\} is obviously independent of the channel. We can then maximize I(X; Y) with respect to \{P(x_j)\}. Hence we define the channel capacity C as

C = \max_{\{P(x_j)\}} I(X; Y)

The channel capacity is measured in bits per channel use.
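To make the maximization concrete, the sketch below (illustrative Python; the crossover probability p = 0.1 and the grid search over input distributions are assumptions) computes I(X; Y) for a binary symmetric channel using the equivalent form I(X; Y) = H(Y) - H(Y | X) and maximizes it over P(X = 1), recovering the familiar closed form C = 1 - H_b(p).

```python
import math

def h2(p):
    """Binary entropy function H_b(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(q, p):
    """I(X;Y) for a BSC with crossover probability p and input distribution P(X=1) = q,
    computed as H(Y) - H(Y|X); for a BSC, H(Y|X) = H_b(p)."""
    p_y1 = q * (1 - p) + (1 - q) * p          # P(Y = 1)
    return h2(p_y1) - h2(p)

p = 0.1                                        # assumed crossover probability
# Brute-force maximization of I(X;Y) over the input distribution {P(x_j)}
capacity = max(bsc_mutual_information(q / 1000, p) for q in range(1001))
print(capacity)        # about 0.531 bits per channel use
print(1 - h2(p))       # closed-form BSC capacity, same value
```

The maximum is attained at the uniform input distribution P(X = 1) = 1/2, as the channel's symmetry suggests.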

The Channel Coding Theorem is also known as the noisy coding theorem.

4. Information Capacity Theorem (also known as Shannon-Hartley law or Shannon's 3rd theorem)
It can be stated as follows: The information capacity of a continuous channel of bandwidth B Hz, perturbed by additive white Gaussian noise of power spectral density N_0/2 and limited in bandwidth to B, is given by

C = B \log_2\left(1 + \frac{P}{N_0 B}\right) \ \text{bits/second}
where P is the average transmitted power. This theorem implies that, for given average transmitted power P and channel bandwidth B , we can transmit information at the rate C bits per second, with arbitrarily small probability of error by employing sufficiently complex encoding systems.
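As an illustration (the bandwidth and signal-to-noise ratio are assumed values, not from the text), a channel with B = 3000 Hz and P/(N_0 B) = 1000 (i.e. 30 dB) has capacity

C = 3000 \log_2(1 + 1000) \approx 3000 \times 9.97 \approx 2.99 \times 10^4 \ \text{bits/second}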

Shannon limit: For an ideal system that transmits data at a rate R_b = C, the average transmitted power is P = E_b R_b = E_b C. Then

\frac{C}{B} = \log_2\left(1 + \frac{E_b}{N_0} \cdot \frac{C}{B}\right)

or, solving for E_b/N_0,

\frac{E_b}{N_0} = \frac{2^{C/B} - 1}{C/B}

For infinite bandwidth, E_b/N_0 approaches the limiting value

\left(\frac{E_b}{N_0}\right)_{\infty} = \lim_{B \to \infty} \frac{E_b}{N_0} = \ln 2 = 0.693 = -1.6 \ \text{dB}

This value is called the Shannon limit.
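A short numerical sketch (illustrative Python, using only the E_b/N_0 relation just derived; the list of spectral efficiencies is an assumption) shows the required E_b/N_0 falling toward ln 2, i.e. about -1.6 dB, as C/B shrinks, that is, as the bandwidth grows for a fixed data rate.

```python
import math

def min_ebn0(r):
    """Minimum Eb/N0 (linear scale) for reliable transmission at spectral efficiency r = C/B,
    from Eb/N0 = (2**r - 1) / r."""
    return (2 ** r - 1) / r

for r in [4.0, 2.0, 1.0, 0.5, 0.1, 0.01]:
    ebn0 = min_ebn0(r)
    print(f"C/B = {r:5.2f}  ->  Eb/N0 = {ebn0:.4f} ({10 * math.log10(ebn0):6.2f} dB)")

# Limiting value as C/B -> 0 (infinite bandwidth): the Shannon limit
print(math.log(2), 10 * math.log10(math.log(2)))   # 0.693..., about -1.59 dB
```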

Appendix A. Historical Notes

The seeds of Shannon's information theory appeared first in a Bell Labs classified memorandum dated 1 September 1945, A Mathematical Theory of Cryptography (a revised form was published in BSTJ, 1949). Shannon's work on the cryptography problem at Bell Labs brought him into contact with two scholars, R.V.L. Hartley and H. Nyquist, who were major consultants to the cryptography research. Nyquist (1924) had shown that a certain bandwidth was necessary in order to send telegraph signals at a definite rate. Hartley (1928) had attempted to quantify information as the logarithm of the number of possible messages built from a pool of symbols.

Shannon was aware of Norbert Wiener's work on cybernetics (he cited Wiener's book in his two 1948 articles on information theory). Wiener had recognized that the communication of information was a problem in statistics. Shannon had taken a mathematics course from Norbert Wiener when he was studying at MIT, and Shannon had access to the "Yellow Peril" report (Wiener's book before publication) while he worked on cryptographic research.

Major Publications of Shannon
A Mathematical Theory of Communication, I, II, published in BSTJ, 1948, 379-423, 623-656.
Communication Theory of Secrecy Systems, published in BSTJ, 1949, 656-715.

A paperback edition of "The Mathematical Theory of Communication" was published by the University of Illinois Press in 1963; 32,000 copies had been sold by 1964. In a review of the accomplishments of Bell Labs in communication science, Millman, in his edited book (1984) A History of Engineering and Science in the Bell System: Communication Science (1925-1980), says: "Probably the most spectacular development in communication mathematics to take place at Bell Laboratories was the formulation in the 1940's of information theory by C. E. Shannon." In his 1948 articles on information theory, Shannon credits J.W. Tukey (a professor at Princeton Univ.) with suggesting the word "bit" as a measure of information.

In 1984 J.L. Massey wrote a paper, Information Theory: the Copernican System of Communications, published in IEEE Communications Magazine, vol. 22, no. 12, pp. 26-28: Shannon's theory of information is the scientific basis of communications in the same sense that the Copernican heliocentric theory is the scientific basis of astronomy. See also the IEEE Transactions on Information Theory, Oct. 1998, vol. 44, no. 6, Commemorative Issue (1948-1998).

5. Introduction to Error Control Coding

5.1 Purpose of Error Control Coding
In data communications, coding is used for controlling transmission errors induced by channel noise or other impairments, such as fading and interference, so that error-free communication can be achieved. In data storage systems, coding is used for controlling storage errors (during retrieval) caused by storage-medium defects, dust particles and radiation, so that error-free storage can be achieved.
[Block diagram: Information source → Source encoder → Channel encoder → Modulator (writing unit) → Transmission channel (storage medium), with noise and interference entering the channel; then Demodulator (read unit) → Channel decoder → Source decoder → Destination.]

5.2 Coding Principle
Coding is achieved by adding properly designed redundant digits (bits) to each message. These redundant digits (bits) are used for detecting and/or correcting transmission (or storage) errors.

5.3 Types of Coding
Block coding and convolutional coding.

Block coding: A message of k digits is mapped into a structured sequence of n digits, called a codeword.
k: message length
n: code length
k/n: code rate

The mapping operation is called encoding. Each encoding operation is independent of past encodings, i.e. memoryless. The collection of all codewords is called a "block code".
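As a toy illustration of adding redundant digits (the single-parity-check code below is an assumed example chosen for brevity, not a code discussed in the text), the sketch maps a k-bit message into an (n, k) = (k+1, k) codeword of rate k/(k+1); the added bit lets the receiver detect any single error (indeed any odd number of errors).

```python
def spc_encode(message):
    """Append an even-parity bit: a k-bit message becomes an n = k+1 bit codeword."""
    return message + [sum(message) % 2]

def spc_error_detected(word):
    """The overall parity of a valid codeword is even; odd parity flags an error."""
    return sum(word) % 2 != 0

c = spc_encode([1, 0, 1])            # k = 3, n = 4, code rate 3/4
print(c, spc_error_detected(c))      # [1, 0, 1, 0] False (no error)
c[1] ^= 1                            # introduce one transmission error
print(c, spc_error_detected(c))      # [1, 1, 1, 0] True (error detected)
```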

Convolutional coding: An information sequence is divided into (short) blocks of k digits each. Each k-digit message block is encoded into an n-digit coded block. The n-digit coded block depends not only on the corresponding k-digit message block but also on m (≥ 1) previous message blocks. That is, the encoder has memory of order m. The encoder has k inputs and n outputs. An information sequence is encoded into a code sequence. The collection of all possible code sequences is called an (n, k, m) convolutional code. Normally,

1 ≤ k ≤ 8,  2 ≤ n ≤ 9,  k/n: code rate
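The sketch below (illustrative Python) is a minimal rate-1/2 convolutional encoder with memory order m = 2; the generator polynomials (7, 5) in octal are a common textbook choice assumed here, not taken from the text. It shows how each pair of output digits depends on the current input bit and the m previous ones.

```python
def conv_encode(bits):
    """Rate-1/2 convolutional encoder, memory order m = 2, generators (7, 5) octal (assumed)."""
    state = [0, 0]                            # the two previous input bits held in the encoder memory
    out = []
    for b in bits:
        out.append(b ^ state[0] ^ state[1])   # generator 111 (7 octal)
        out.append(b ^ state[1])              # generator 101 (5 octal)
        state = [b] + state[:-1]              # shift the new bit into the memory
    return out

print(conv_encode([1, 0, 1, 1]))              # 4 message bits -> 8 coded bits (k = 1, n = 2)
```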

5.4 Types of Errors and Channels
Random errors and burst errors.
Types of channels:
Random error channels: deep space channels, satellite channels, line-of-sight transmission channels, etc.
Burst error channels: radio links, terrestrial microwave links, wire and cable transmission channels, etc.

5.5 Decoding
Suppose a codeword corresponding to a message is transmitted over a noisy channel. Let r be the received sequence. Based on r, the encoding rules and the noise characteristics of the channel, the receiver (or decoder) makes a decision about which message was actually transmitted. This decision-making operation is called decoding. The device that performs the decoding operation is called a decoder.

Two types of decoding: hard-decision decoding and soft-decision decoding.

Hard-decision decoding: When binary coding is used, the modulator has only binary inputs. If binary demodulator output quantization is used, the decoder has only binary inputs. In this case, the demodulator is said to make hard decisions. Decoding based on hard decisions made by the demodulator is called hard-decision decoding.

Soft-decision decoding: If the output of the demodulator consists of more than two quantization levels or is left unquantized, the demodulator is said to make soft decisions. Decoding based on soft decisions made by the demodulator is called soft-decision decoding. Hard-decision decoding is much easier to implement than soft-decision decoding. However, soft-decision decoding offers much better performance.

5.6 Some Channel Models
Binary Symmetric Channel (BSC)

[BSC transition diagram: input 0 → output 0 and input 1 → output 1 each with probability 1-p; input 0 → output 1 and input 1 → output 0 each with probability p. p: transition (crossover) probability.]

When BPSK modulation is used over the AWGN channel with optimum coherent detection and binary output quantization, the bit-error probability for equally likely signals is given by

p = Q\left(\sqrt{\frac{2E}{N_0}}\right)

where

Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2} \, dy

E: bit energy
N_0/2: power spectral density of the AWGN
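The sketch below (illustrative Python; the list of E/N_0 values is an assumption) evaluates this transition probability using the identity Q(x) = (1/2) erfc(x/sqrt(2)).

```python
import math

def Q(x):
    """Gaussian tail probability Q(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bsc_transition_probability(e_n0_db):
    """p = Q(sqrt(2E/N0)) for coherently detected BPSK with hard decisions."""
    e_n0 = 10 ** (e_n0_db / 10)
    return Q(math.sqrt(2 * e_n0))

for snr_db in [0, 2, 4, 6, 8, 10]:
    print(f"E/N0 = {snr_db:2d} dB  ->  p = {bsc_transition_probability(snr_db):.3e}")
```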

Binary-input, 8-ary Output Discrete Channel

[Transition diagram: binary input {0, 1}, eight output levels {0, 1, 2, 3, 4, 5, 6, 7}.]
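Such a channel arises when the demodulator output is quantized to eight levels rather than two. The sketch below (illustrative Python; the BPSK mapping 0 → +1, 1 → -1, the clipping range and the uniform quantizer are all assumptions) contrasts a hard decision with a 3-bit soft decision on the same demodulator sample.

```python
def hard_decision(y):
    """Two-level quantization: decide bit 0 if y > 0, else bit 1 (assumes 0 -> +1, 1 -> -1)."""
    return 0 if y > 0 else 1

def soft_decision(y, levels=8, clip=1.0):
    """Uniform 8-level (3-bit) quantization of the demodulator output on [-clip, +clip]."""
    y = max(-clip, min(clip, y))
    index = int((y + clip) / (2 * clip / levels))
    return min(index, levels - 1)     # map +clip to the top level

for y in [0.9, 0.3, 0.05, -0.05, -0.7]:
    print(f"y = {y:5.2f}  hard: {hard_decision(y)}  soft level: {soft_decision(y)}")
```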

5.7 Optimum Decoding


Suppose the codeword c corresponding to a certain message m is transmitted. Let r be the corresponding output of the demodulator.

The decoder produces an estimate of the message based on r .

An optimum decoding rule is one that minimizes the probability of a decoding error, i.e. P(\hat{c} \ne c \mid r) is minimized or, equivalently, P(\hat{c} = c \mid r) is maximized.

The probability of decoding error is minimized for a given r by choosing the estimate \hat{c} to be the codeword c that maximizes

P(c \mid r) = \frac{P(r \mid c) P(c)}{P(r)}

That is, \hat{c} is chosen to be the most likely codeword, given that r is received.

5.8 Maximum A Posteriori Decoding (MAP Decoding)
In general, we have

P(c \mid r) = \frac{P(r \mid c) P(c)}{P(r)}

If knowledge or an estimate of P(c) is used for decoding, the technique is called MAP decoding.

5.9 Maximum Likelihood Decoding (MLD)
Suppose all the messages are equally likely. An optimum decoding can be done as follows: For every codeword c_j, compute the conditional probability

P(r \mid c_j)

The codeword c_j with the largest conditional probability is chosen as the estimate \hat{c} of the transmitted codeword c. This decoding rule is called maximum likelihood decoding (MLD).

MLD for a BSC
Let a = (a_1, a_2, \ldots, a_n) and b = (b_1, b_2, \ldots, b_n) be two binary sequences of n components. The Hamming distance between a and b, denoted as d_H(a, b), is defined as the number of places where a and b differ. In coding for a BSC, every codeword and every received sequence are binary sequences. Suppose some codeword is transmitted and the received sequence is

r = (r_1, r_2, \ldots, r_n)

For a codeword c_i, the conditional probability P(r \mid c_i) is

P(r \mid c_i) = p^{d_H(r, c_i)} (1 - p)^{n - d_H(r, c_i)}

For p < 1/2, P(r \mid c_i) is a monotonically decreasing function of d_H(r, c_i). Then P(r \mid c_i) > P(r \mid c_j) if and only if

d_H(r, c_i) < d_H(r, c_j)

The MLD is accomplished by the following steps:
(i) Compute d_H(r, c_i) for all codewords c_i.
(ii) c_i is taken as the transmitted codeword if d_H(r, c_i) < d_H(r, c_j) for all j ≠ i.
(iii) Decode c_i into the message m_i.
That is, the received vector r is decoded into the closest codeword. This is also called minimum distance (nearest neighbor) decoding.
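A minimal sketch of minimum distance decoding (illustrative Python; the (3, 1) repetition code and the received vector are assumptions chosen for brevity):

```python
def hamming_distance(a, b):
    """Number of positions in which the binary sequences a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def mld_decode(r, codewords):
    """Minimum distance decoding: return the codeword closest to r in Hamming distance
    (maximum likelihood for a BSC with p < 1/2)."""
    return min(codewords, key=lambda c: hamming_distance(r, c))

code = [(0, 0, 0), (1, 1, 1)]        # toy (3, 1) repetition code
r = (1, 0, 1)                        # received sequence with one bit in error
print(mld_decode(r, code))           # (1, 1, 1)
```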

5.10 Bounded Distance Decoding
Given a received word r, a t-error-correcting bounded distance decoder selects the codeword c that minimizes d_H(r, c) if and only if there exists a codeword c such that d_H(r, c) ≤ t. If no such c exists, a decoder failure is declared. All practical bounded distance decoders use some form of syndrome decoding. Bounded distance decoding is usually an incomplete decoding, since it decodes only those received words lying in a radius-t sphere about some codeword.
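A sketch of the bounded distance rule (illustrative Python; the length-6 repetition code with t = 2 and the received words are assumptions) shows both a successful correction and a declared decoder failure when the received word lies outside every radius-t sphere.

```python
def hamming_distance(a, b):
    """Number of positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def bounded_distance_decode(r, codewords, t):
    """Return the codeword c with d_H(r, c) <= t if one exists, else None (decoder failure)."""
    best = min(codewords, key=lambda c: hamming_distance(r, c))
    return best if hamming_distance(r, best) <= t else None

code = [(0,) * 6, (1,) * 6]                                   # toy repetition code, d_min = 6, t = 2
print(bounded_distance_decode((1, 1, 0, 0, 0, 0), code, 2))   # all-zero codeword: 2 errors corrected
print(bounded_distance_decode((1, 1, 1, 0, 0, 0), code, 2))   # None: distance 3 to both codewords
```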
