$Y_2$), we have to find the distribution $p(x_1, x_2)$ on the input alphabet $\mathcal{X}_1 \times \mathcal{X}_2$ that maximizes $I(X_1, X_2; Y_1, Y_2)$. Since the transition probabilities are given as $p(y_1, y_2 \mid x_1, x_2) = p(y_1 \mid x_1)\, p(y_2 \mid x_2)$,
$$p(x_1, x_2, y_1, y_2) = p(x_1, x_2)\, p(y_1, y_2 \mid x_1, x_2) = p(x_1, x_2)\, p(y_1 \mid x_1)\, p(y_2 \mid x_2).$$
Therefore, $Y_1 \to X_1 \to X_2 \to Y_2$ forms a Markov chain and
$$
\begin{aligned}
I(X_1, X_2; Y_1, Y_2) &= H(Y_1, Y_2) - H(Y_1, Y_2 \mid X_1, X_2) \\
&= H(Y_1, Y_2) - H(Y_1 \mid X_1, X_2) - H(Y_2 \mid X_1, X_2) && (1) \\
&= H(Y_1, Y_2) - H(Y_1 \mid X_1) - H(Y_2 \mid X_2) && (2) \\
&\le H(Y_1) + H(Y_2) - H(Y_1 \mid X_1) - H(Y_2 \mid X_2) && (3) \\
&= I(X_1; Y_1) + I(X_2; Y_2),
\end{aligned}
$$
where Eqs. (1) and (2) follow from Markovity, and Eq. (3) is met with equality if $X_1$ and $X_2$ are independent and hence $Y_1$ and $Y_2$ are independent. Therefore
$$
\begin{aligned}
C &= \max_{p(x_1, x_2)} I(X_1, X_2; Y_1, Y_2) \\
&\le \max_{p(x_1, x_2)} I(X_1; Y_1) + \max_{p(x_1, x_2)} I(X_2; Y_2) \\
&= \max_{p(x_1)} I(X_1; Y_1) + \max_{p(x_2)} I(X_2; Y_2) \\
&= C_1 + C_2,
\end{aligned}
$$
with equality iff $p(x_1, x_2) = p^*(x_1)\, p^*(x_2)$, where $p^*(x_1)$ and $p^*(x_2)$ are the distributions that maximize $C_1$ and $C_2$ respectively.
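As a numerical sanity check on $C = C_1 + C_2$, the sketch below (an illustrative script, not part of the original solution; the two crossover probabilities are arbitrary) builds the product channel $p(y_1 \mid x_1)\, p(y_2 \mid x_2)$ for two BSCs and evaluates the mutual information under the product of the two capacity-achieving (uniform) inputs:

```python
import itertools
from math import log2

def bsc(eps):
    # Transition matrix p(y|x) of a binary symmetric channel.
    return [[1 - eps, eps], [eps, 1 - eps]]

def mutual_information(px, pyx):
    # I(X;Y) in bits for input distribution px and channel matrix pyx[x][y].
    ny = len(pyx[0])
    py = [sum(px[x] * pyx[x][y] for x in range(len(px))) for y in range(ny)]
    return sum(px[x] * pyx[x][y] * log2(pyx[x][y] / py[y])
               for x in range(len(px)) for y in range(ny)
               if px[x] > 0 and pyx[x][y] > 0)

eps1, eps2 = 0.1, 0.2              # illustrative crossover probabilities
ch1, ch2 = bsc(eps1), bsc(eps2)

# Product channel p(y1, y2 | x1, x2) = p1(y1|x1) p2(y2|x2), driven by the
# product of the two capacity-achieving (uniform) marginal inputs.
pairs = list(itertools.product((0, 1), repeat=2))
px = [0.25] * 4
pyx = [[ch1[x1][y1] * ch2[x2][y2] for (y1, y2) in pairs] for (x1, x2) in pairs]

joint = mutual_information(px, pyx)
single = mutual_information([0.5, 0.5], ch1) + mutual_information([0.5, 0.5], ch2)
print(joint, single)
```

With independent inputs the joint mutual information splits exactly into the two single-channel terms, matching $C_1 + C_2 = (1 - H(\varepsilon_1)) + (1 - H(\varepsilon_2))$.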
5. A channel with two independent looks at Y.
Let $Y_1$ and $Y_2$ be conditionally independent and conditionally identically distributed given $X$. Thus $p(y_1, y_2 \mid x) = p(y_1 \mid x)\, p(y_2 \mid x)$.
(a) Show $I(X; Y_1, Y_2) = 2 I(X; Y_1) - I(Y_1; Y_2)$.
(b) Conclude that the capacity of the channel $X \to (Y_1, Y_2)$ is less than twice the capacity of the channel $X \to Y_1$.
Solution: A channel with two independent looks at Y.
(a)
$$
\begin{aligned}
I(X; Y_1, Y_2) &= H(Y_1, Y_2) - H(Y_1, Y_2 \mid X) && (4) \\
&= H(Y_1) + H(Y_2) - I(Y_1; Y_2) - H(Y_1 \mid X) - H(Y_2 \mid X) && (5) \\
&\quad \text{(since } Y_1 \text{ and } Y_2 \text{ are conditionally independent given } X\text{)} && (6) \\
&= I(X; Y_1) + I(X; Y_2) - I(Y_1; Y_2) && (7) \\
&= 2 I(X; Y_1) - I(Y_1; Y_2) \quad \text{(since } Y_1 \text{ and } Y_2 \text{ are conditionally identically distributed).} && (8)
\end{aligned}
$$
(b) The capacity of the single-look channel $X \to Y_1$ is
$$C_1 = \max_{p(x)} I(X; Y_1). \qquad (9)$$
The capacity of the channel $X \to (Y_1, Y_2)$ is
$$
\begin{aligned}
C_2 &= \max_{p(x)} I(X; Y_1, Y_2) && (10) \\
&= \max_{p(x)} \big[\, 2 I(X; Y_1) - I(Y_1; Y_2) \,\big] && (11) \\
&\le \max_{p(x)} 2 I(X; Y_1) && (12) \\
&= 2 C_1. && (13)
\end{aligned}
$$
Hence, two independent looks cannot be more than twice as good
as one look.
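To make Eq. (8) and the bound concrete, here is a small numerical check (illustrative only; the crossover probability and input bias are arbitrary choices, not from the problem) that builds $p(x, y_1, y_2) = p(x)\, p(y_1 \mid x)\, p(y_2 \mid x)$ for two looks at a BSC and verifies both $I(X; Y_1, Y_2) = 2 I(X; Y_1) - I(Y_1; Y_2)$ and $I(X; Y_1, Y_2) \le 2 I(X; Y_1)$:

```python
from math import log2

def H(probs):
    # Shannon entropy (bits) of a distribution given as a list of probabilities.
    return -sum(p * log2(p) for p in probs if p > 0)

eps, q = 0.1, 0.3            # illustrative BSC crossover and P(X = 1)
px = [1 - q, q]
pyx = [[1 - eps, eps], [eps, 1 - eps]]

# Joint p(x, y1, y2) = p(x) p(y1|x) p(y2|x): two conditionally i.i.d. looks.
pxyy = {(x, y1, y2): px[x] * pyx[x][y1] * pyx[x][y2]
        for x in (0, 1) for y1 in (0, 1) for y2 in (0, 1)}
py1y2 = {(y1, y2): pxyy[0, y1, y2] + pxyy[1, y1, y2]
         for y1 in (0, 1) for y2 in (0, 1)}
py1 = [sum(v for (a, _), v in py1y2.items() if a == y) for y in (0, 1)]
py2 = [sum(v for (_, b), v in py1y2.items() if b == y) for y in (0, 1)]

Heps = H([eps, 1 - eps])
I_joint = H(list(py1y2.values())) - 2 * Heps   # H(Y1,Y2) - H(Y1,Y2|X)
I_single = H(py1) - Heps                       # I(X; Y1)
I_y1y2 = H(py1) + H(py2) - H(list(py1y2.values()))
print(I_joint, 2 * I_single - I_y1y2)
```

The two printed values coincide, and the joint information never exceeds $2 I(X; Y_1)$, as the derivation requires.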
6. Choice of channels.
Find the capacity $C$ of the union of two channels $(\mathcal{X}_1, p_1(y_1 \mid x_1), \mathcal{Y}_1)$ and $(\mathcal{X}_2, p_2(y_2 \mid x_2), \mathcal{Y}_2)$ where, at each time, one can send a symbol over channel 1 or over channel 2 but not both. Assume the output alphabets are distinct and do not intersect.
(a) Show $2^C = 2^{C_1} + 2^{C_2}$.
(b) What is the capacity of this channel?
[Figure: input symbols 1 and 2 form a BSC with crossover probability $p$ (transition probabilities $1 - p$ and $p$); input symbol 3 is received noiselessly.]
Solution: Choice of channels.
(a) Let
$$\Theta = \begin{cases} 1, & \text{if the signal is sent over channel 1} \\ 2, & \text{if the signal is sent over channel 2.} \end{cases}$$
Consider the following communication scheme: the sender chooses between the two channels according to a Bern($\alpha$) coin flip. Then the channel input is $X = (\Theta, X_\Theta)$.
Since the output alphabets $\mathcal{Y}_1$ and $\mathcal{Y}_2$ are disjoint, $\Theta$ is a function of $Y$.
Therefore,
$$
\begin{aligned}
I(X; Y) &= I(X; Y, \Theta) \\
&= I(X_\Theta, \Theta; Y, \Theta) \\
&= I(\Theta; Y, \Theta) + I(X_\Theta; Y, \Theta \mid \Theta) \\
&= I(\Theta; Y, \Theta) + I(X_\Theta; Y \mid \Theta) \\
&= H(\Theta) + \alpha\, I(X_\Theta; Y \mid \Theta = 1) + (1 - \alpha)\, I(X_\Theta; Y \mid \Theta = 2) \\
&= H(\alpha) + \alpha\, I(X_1; Y_1) + (1 - \alpha)\, I(X_2; Y_2).
\end{aligned}
$$
Thus, it follows that
$$C = \sup_{\alpha} \left\{ H(\alpha) + \alpha C_1 + (1 - \alpha) C_2 \right\},$$
which is a strictly concave function of $\alpha$. Hence the maximum exists, and by elementary calculus one can easily show $C = \log_2\left(2^{C_1} + 2^{C_2}\right)$, which is attained with $\alpha = 2^{C_1} / \left(2^{C_1} + 2^{C_2}\right)$.
If one interprets $M = 2^C$ as the effective number of noise-free symbols, then the above result follows in a rather intuitive manner: we have $M_1 = 2^{C_1}$ noise-free symbols from channel 1, and $M_2 = 2^{C_2}$ noise-free symbols from channel 2. Since at each step we get to choose which channel to use, we essentially have $M_1 + M_2 = 2^{C_1} + 2^{C_2}$ noise-free symbols for the new channel. Therefore, the capacity of this channel is $C = \log_2\left(2^{C_1} + 2^{C_2}\right)$.
This argument is very similar to the effective alphabet argument given in Problem 19, Chapter 2 of the text.
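The calculus step is easy to confirm numerically; the following sketch (illustrative values $C_1 = 0.7$, $C_2 = 0.3$, not from the problem) grid-maximizes $H(\alpha) + \alpha C_1 + (1 - \alpha) C_2$ and compares the result with the closed form $\log_2(2^{C_1} + 2^{C_2})$ and the maximizing $\alpha$:

```python
from math import log2

def h(a):
    # Binary entropy in bits.
    return 0.0 if a in (0.0, 1.0) else -a * log2(a) - (1 - a) * log2(1 - a)

C1, C2 = 0.7, 0.3   # illustrative capacities for the two channels

# Grid-maximize f(alpha) = H(alpha) + alpha*C1 + (1 - alpha)*C2.
best_a = max(range(10001),
             key=lambda a: h(a / 10000) + (a / 10000) * C1
             + (1 - a / 10000) * C2) / 10000
best = h(best_a) + best_a * C1 + (1 - best_a) * C2

closed_form = log2(2 ** C1 + 2 ** C2)
alpha_star = 2 ** C1 / (2 ** C1 + 2 ** C2)
print(best_a, best, closed_form)
```

The grid maximum matches $\log_2(2^{C_1} + 2^{C_2})$, and the maximizing grid point sits at $\alpha^* = 2^{C_1}/(2^{C_1} + 2^{C_2})$, as claimed.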
(b) From part (a), the capacity is
$$\log_2\left( 2^{1 - H(p)} + 2^0 \right).$$
7. Cascaded BSCs.
Consider the two discrete memoryless channels $(\mathcal{X}, p_1(y \mid x), \mathcal{Y})$ and $(\mathcal{Y}, p_2(z \mid y), \mathcal{Z})$. Let $p_1(y \mid x)$ and $p_2(z \mid y)$ be binary symmetric channels with crossover probabilities $\varepsilon_1$ and $\varepsilon_2$ respectively.
[Figure: the cascade $X \to Y \to Z$ of two binary symmetric channels, the first with crossover probability $\varepsilon_1$ and the second with crossover probability $\varepsilon_2$.]
(a) What is the capacity $C_1$ of $p_1(y \mid x)$?
(b) What is the capacity $C_2$ of $p_2(z \mid y)$?
(c) We now cascade these channels. Thus $p_3(z \mid x) = \sum_y p_1(y \mid x)\, p_2(z \mid y)$. What is the capacity $C_3$ of $p_3(z \mid x)$? Show that $C_3 \le \min\{C_1, C_2\}$.
(d) Now let us actively intervene between channels 1 and 2, rather than passively transmitting $y^n$. What is the capacity of channel 1 followed by channel 2 if you are allowed to decode the output $y^n$ of channel 1 and then re-encode it as $\hat{y}^n$ for transmission over channel 2? (Think $W \to x^n(W) \to y^n \to \hat{y}^n(y^n) \to z^n \to \hat{W}$.)
(e) What is the capacity of the cascade in part (c) if the receiver can view both $Y$ and $Z$?
Solution: Cascaded channels.
(a) Brute force method: let $C_1 = 1 - H(p)$ be the capacity of the BSC with parameter $p$, and $C_2 = 1 - \alpha$ be the capacity of the BEC with parameter $\alpha$. Let $\tilde{Y}$ denote the output of the cascaded channel, and $Y$ the output of the BSC. Then the transition rule for the cascaded channel is simply
$$p(\tilde{y} \mid x) = \sum_{y = 0, 1} p(\tilde{y} \mid y)\, p(y \mid x)$$
for each $(x, \tilde{y})$ pair.
Let $X \sim \mathrm{Bern}(\pi)$ denote the input to the channel. Then
$$H(\tilde{Y}) = H\big( (1 - \alpha)\big((1 - p)\pi + p(1 - \pi)\big),\ \alpha,\ (1 - \alpha)\big(p\pi + (1 - p)(1 - \pi)\big) \big),$$
and also $H(\tilde{Y} \mid X = 0) = H\big( (1 - \alpha)(1 - p),\ \alpha,\ (1 - \alpha)p \big) = H(\tilde{Y} \mid X = 1)$, so $H(\tilde{Y} \mid X)$ does not depend on $\pi$.
Therefore,
$$
\begin{aligned}
C &= \max_{p(x)} I(X; \tilde{Y}) \\
&= \max_{p(x)} \big[ H(\tilde{Y}) - H(\tilde{Y} \mid X) \big] \\
&= \max_{p(x)} H(\tilde{Y}) - H(\tilde{Y} \mid X).
\end{aligned}
$$
A cleaner route: let $E$ be the erasure indicator, $E = 1$ if $\tilde{Y} = e$ and $E = 0$ otherwise, so $E \sim \mathrm{Bern}(\alpha)$ independent of $X$. Since $E$ is a function of $\tilde{Y}$,
$$
\begin{aligned}
H(\tilde{Y}) &= H(\tilde{Y}, E) \\
&= H(E) + H(\tilde{Y} \mid E) \\
&= H(\alpha) + \alpha H(\tilde{Y} \mid E = 1) + (1 - \alpha) H(\tilde{Y} \mid E = 0) \\
&= H(\alpha) + (1 - \alpha) H(Y),
\end{aligned}
$$
where the last equality comes directly from the construction of $E$. Similarly,
$$
\begin{aligned}
H(\tilde{Y} \mid X) &= H(\tilde{Y}, E \mid X) \\
&= H(E \mid X) + H(\tilde{Y} \mid X, E) \\
&= H(E) + \alpha H(\tilde{Y} \mid X, E = 1) + (1 - \alpha) H(\tilde{Y} \mid X, E = 0) \\
&= H(\alpha) + (1 - \alpha) H(Y \mid X),
\end{aligned}
$$
whence
$$I(X; \tilde{Y}) = H(\tilde{Y}) - H(\tilde{Y} \mid X) = (1 - \alpha)\, I(X; Y).$$
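A quick numerical check of $I(X; \tilde{Y}) = (1 - \alpha) I(X; Y)$ (an illustrative script; the parameters $p = 0.1$, $\alpha = 0.25$ and the input distribution are arbitrary) composes the BSC with the BEC and compares the two sides:

```python
from math import log2

def mi(px, pyx):
    # I(X;Y) in bits for input distribution px and channel matrix pyx[x][y].
    ny = len(pyx[0])
    py = [sum(px[x] * pyx[x][y] for x in range(len(px))) for y in range(ny)]
    return sum(px[x] * pyx[x][y] * log2(pyx[x][y] / py[y])
               for x in range(len(px)) for y in range(ny)
               if px[x] > 0 and pyx[x][y] > 0)

p, alpha = 0.1, 0.25        # BSC crossover p, BEC erasure probability alpha
bsc = [[1 - p, p], [p, 1 - p]]
bec = [[1 - alpha, alpha, 0], [0, alpha, 1 - alpha]]   # outputs: 0, e, 1

# Cascade: p(ytilde|x) = sum_y p2(ytilde|y) p1(y|x).
casc = [[sum(bsc[x][y] * bec[y][yt] for y in (0, 1)) for yt in range(3)]
        for x in (0, 1)]

px = [0.6, 0.4]             # arbitrary input distribution
print(mi(px, casc), (1 - alpha) * mi(px, bsc))
```

The two values agree for every input distribution, so in particular the capacity of the cascade is $(1 - \alpha)(1 - H(p))$.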
8. Channel capacity
(a) What is the capacity of the following channel?
[Figure: input alphabet $\mathcal{X} = \{0, 1, 2, 3\}$, output alphabet $\mathcal{Y} = \{0, 1, 2, 3, 4\}$; transitions $0 \to 0$ w.p. $1$; $1 \to 1$ and $1 \to 2$ each w.p. $\tfrac{1}{2}$; $2 \to 3$ and $2 \to 4$ each w.p. $\tfrac{1}{2}$; $3 \to 4$ w.p. $1$.]
(b) Provide a simple scheme that can transmit at rate $R = \log_2 3$ bits per channel use through this channel.
Solution: Channel capacity.
(a) We can use the solution of the previous homework question:
$$C = \log_2\left( 2^{C_1} + 2^{C_2} + 2^{C_3} \right).$$
Now we need to calculate the capacity of each sub-channel:
$$C_1 = \max_{p(x)} I(X; Y) = H(Y) - H(Y \mid X) = 0 - 0 = 0,$$
$$C_2 = \max_{p(x)} I(X; Y) = H(Y) - H(Y \mid X) = 1 - 1 = 0,$$
$$
\begin{aligned}
C_3 &= \max_{p(x)} I(X; Y) = \max_{p(x)} \{ H(Y) - H(Y \mid X) \} \\
&= \max_{p(x)} \left[ -\frac{p_2}{2} \log_2\!\left(\frac{p_2}{2}\right) - \left(\frac{p_2}{2} + p_3\right) \log_2\!\left(\frac{p_2}{2} + p_3\right) - p_2 \right].
\end{aligned}
$$
Assigning $p_3 = 1 - p_2$ and differentiating with respect to $p_2$ (the $\tfrac{1}{2 \ln 2}$ terms from the two entropy derivatives cancel):
$$\frac{dI(X; Y)}{dp_2} = -\frac{1}{2}\log_2\frac{p_2}{2} + \frac{1}{2}\log_2\frac{2 - p_2}{2} - 1 = \frac{1}{2}\log_2\frac{2 - p_2}{p_2} - 1 = 0,$$
and as a result $p_2 = \frac{2}{5}$:
$$C_3 = H\!\left(\tfrac{1}{5}\right) - \tfrac{2}{5} \approx 0.322.$$
And, finally:
$$C = \log_2\left( 2^0 + 2^0 + 2^{0.322} \right) \approx 1.7.$$
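The optimization for $C_3$ is easy to confirm numerically; this illustrative sketch re-evaluates $I(p_2)$ (the expression above with $p_3 = 1 - p_2$, which simplifies to $H(p_2/2) - p_2$) on a fine grid:

```python
from math import log2

def I3(p2):
    # I(X;Y) of the third sub-channel with p3 = 1 - p2; the expression
    # above simplifies to the binary entropy H(p2/2) minus p2.
    a = p2 / 2
    return -a * log2(a) - (1 - a) * log2(1 - a) - p2

# Grid search over p2 confirms the calculus: the maximum is at p2 = 2/5.
p2_star = max((i / 10000 for i in range(1, 10000)), key=I3)
C3 = I3(2 / 5)
C = log2(2 ** 0 + 2 ** 0 + 2 ** C3)
print(p2_star, C3, C)
```

The grid maximum lands at $p_2 = 0.4$, with $C_3 \approx 0.322$ and overall capacity $C \approx 1.70$, matching the values above.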
(b) Here is a simple zero-error code that achieves rate $\log_2 3$.
Encoding: write the message in its ternary representation and send the digits using input symbols 0, 1, 2 but never 3 (or 0, 1, 3 but never 2).
Decoding: map the channel outputs back to the ternary digits of the message.
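The scheme can be sketched as follows (an illustrative implementation; it assumes, per the transition diagram in part (a), that input 0 is received as 0, input 1 as 1 or 2, and input 2 as 3 or 4, so the three used inputs have pairwise disjoint output sets once symbol 3 is never transmitted):

```python
def encode(msg, n):
    # Write the message index as n base-3 digits over inputs {0, 1, 2};
    # the fourth input symbol 3 is never used.
    digits = []
    for _ in range(n):
        digits.append(msg % 3)
        msg //= 3
    return digits[::-1]

# Assumed output structure: each output symbol identifies a unique input.
TO_INPUT = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}

def decode(outputs):
    # Map each received symbol back to the input that produced it,
    # then read off the base-3 digits of the message.
    msg = 0
    for y in outputs:
        msg = 3 * msg + TO_INPUT[y]
    return msg
```

With $n = 5$ channel uses this conveys $3^5 = 243$ messages with zero error, i.e. rate $\log_2 3 \approx 1.585$ bits per use, below the capacity $\approx 1.7$ computed in part (a).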
9. Source-channel coding problem.
The next question deals with the separation of source and channel coding. Please read the lecture note on this topic before answering the question; it is on the website as an appendix to Lecture 6.
Consider the source-channel coding problem given in Fig. 1, where $V, X, Y, \hat{V}$ have a binary alphabet. The source $V$ is i.i.d. Bernoulli($p$), and the channel is shown in Fig. 2.
(a) What is the capacity of the channel given in Fig. 2? [8 points]
(b) Assume that error-free bits can be transmitted through the channel. What is the minimum rate at which the source $V$ can be encoded such that the source decoder can reconstruct the source $V$ losslessly? [4 points]
Figure 1: A source-channel coding problem. [Block diagram: $V \to$ Source Enc. $\to$ Channel Enc. $\to X \to$ Channel $\to Y \to$ Channel Dec. $\to$ Source Dec. $\to \hat{V}$.]
Figure 2: The channel. [Transitions: $p(y{=}0 \mid x{=}0) = p(y{=}1 \mid x{=}0) = 0.5$; $p(y{=}0 \mid x{=}1) = 0.1$, $p(y{=}1 \mid x{=}1) = 0.9$.]
(c) For what values of $p$ can the source $V$ be reconstructed losslessly using the scheme in Fig. 1? (You may use the inverse of $H$, i.e., $H^{-1}(q)$.) [4 points]
(d) Would the answer to 9c change if joint source-channel coding and decoding is allowed? [4 points]
Solution to Source-channel coding:
(a) We first write the mutual information: $I(X; Y) = H(Y) - H(Y \mid X)$. Note that $H(Y \mid X) = q H(0.5) + (1 - q) H(0.1)$, where $q = p(x = 0)$ (written $q$ to avoid confusion with the source parameter $p$). As for $H(Y)$, note that $Y \sim \mathrm{Bern}(0.9 - 0.4q)$. Hence,
$$I(X; Y) = -q H(0.5) - (1 - q) H(0.1) - (0.9 - 0.4q) \log_2(0.9 - 0.4q) - (0.1 + 0.4q) \log_2(0.1 + 0.4q).$$
Solving $\frac{\partial I(X; Y)}{\partial q} = 0$ leaves us with
$$0.4 \log_2 \frac{0.9 - 0.4q}{0.1 + 0.4q} = 1 - H(0.1),$$
or $q = 0.4623$. Thus, $C = 0.1476$.
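As a check on part (a), this illustrative script solves the stationarity condition by bisection, with $q = P(X = 0)$, and recovers the maximizing input probability and capacity stated above:

```python
from math import log2

def hb(x):
    # Binary entropy in bits.
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def I(q):
    # I(X;Y) for the channel of Fig. 2 with q = P(X = 0):
    # H(Y) - H(Y|X), where Y ~ Bern(0.9 - 0.4 q).
    return hb(0.9 - 0.4 * q) - (q * hb(0.5) + (1 - q) * hb(0.1))

# Bisection on 0.4 log2((0.9 - 0.4q)/(0.1 + 0.4q)) = 1 - H(0.1);
# the left-hand side is decreasing in q, so bisection finds the root.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if 0.4 * log2((0.9 - 0.4 * mid) / (0.1 + 0.4 * mid)) > 1 - hb(0.1):
        lo = mid
    else:
        hi = mid
q_star = (lo + hi) / 2
print(q_star, I(q_star))
```

The root lands at $q \approx 0.4623$ with $I \approx 0.148$, consistent with the solution.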
(b) The minimum rate, of course, is $R = H(p)$, since $V \sim \mathrm{Bern}(p)$.
(c) We require that $R \le C$. Recall that $H^{-1}(C)$ has two values, given by $a$ and $1 - a$. Thus, $p \le a$ or $p \ge 1 - a$ is the answer.
(d) No, it would not change. This is due to the fact that for DMCs it does not matter if you do joint source-channel coding and decoding (source-channel separation theorem).