
Boosting Output Distributions in Finite Blocklength

Channel Coding Converse Bounds


Oliver Kosut
School of Electrical, Computer and Energy Engineering
Arizona State University
Tempe, AZ 85287
Email: okosut@asu.edu

Abstract: Point-to-point channel coding is studied in the finite blocklength regime. Many existing converse bounds involve an optimization over a distribution on the channel output. This paper provides a method for generating good, if not optimal, output distributions. In particular, given any candidate output distribution, a boosting procedure is given that constructs a new distribution which improves the converse bound derived from the divergence spectrum. For discrete memoryless channels, it is shown that using the i.i.d. capacity-achieving output distribution as an initial guess in this procedure results in an output distribution that is good enough to derive the third-order coding rate for most channels. The finite blocklength bounds are then applied to the Z channel.

I. INTRODUCTION

Beginning with [1], there has recently been considerable interest in developing good information theoretic bounds in the non-asymptotic finite blocklength regime. This paper focuses on converse bounds for point-to-point channel coding, in particular for discrete memoryless channels.

The tightest known finite blocklength converse bounds for channel coding typically involve optimization over a distribution on the channel output. The problem of finding this optimal distribution was examined in [2], which derived the exact optimal distribution for some channels, including the binary symmetric channel and the binary erasure channel. For the latter, it was shown that even though the channel is memoryless and stationary, the optimal output distribution is not i.i.d. However, the techniques in [2] do not appear to be sufficient to derive the exact optimal distribution for non-symmetric channels.

Another line of work, dating back to Strassen [3], focuses on deriving multi-term asymptotic expressions for the optimal code rate with fixed probability of error. The bounds in [1] were shown to give the exact second-order (or dispersion) term, but not always the third-order term. Using a carefully chosen output distribution, a tighter bound on the third-order term was derived in [4]. This output distribution would in principle result in a finite blocklength bound, but one that is difficult to compute. The third-order bound in [4] is exact for most channels, but it was shown in [5] that for symmetric singular channels, such as the binary erasure channel, the third-order term is smaller. (It is not clear whether our results recover this bound.)

This work was supported in part by the National Science Foundation under grant CCF-1422358.

The present paper presents a technique for finding output distributions that, while not necessarily optimal, are good enough, in that they nearly recover the same fixed-error asymptotic bounds as in [4], while resulting in easily computable finite blocklength bounds. (The only deficiency in our result as compared to [4] is that the remaining term is $O(\sqrt{\log n})$ rather than $O(1)$.) After setting up the problem in Sec. II, in Sec. III we state two existing converse bounds, namely the so-called meta-converse based on hypothesis testing that originated in [1], and a relaxation based on the divergence spectrum from [4]. In Sec. IV we state our main result, which gives a procedure for boosting a candidate output distribution: that is, given a distribution $Q_0$, a new distribution $Q_1$ is found that improves, or at least does not degrade, the resulting divergence spectrum converse bound. An obvious initial guess for $Q_0$ is the i.i.d. capacity-achieving output distribution. We show in Sec. V that using this initial guess and then applying our boosting technique gives a bound tight enough to derive the fixed-error asymptotics up to the third-order term for most channels. We then apply our finite blocklength bounds to the Z channel in Sec. VI, an asymmetric binary channel for which good and computable finite blocklength converse bounds have not previously been found.
II. PRELIMINARIES

Consider a one-shot discrete channel with finite input alphabet $\mathcal{X}$, finite output alphabet $\mathcal{Y}$, and transition matrix $W(y|x)$. An $(M, \epsilon)$ code consists of a (possibly randomized) encoder $f: \{1, \ldots, M\} \to \mathcal{X}$ and a (possibly randomized) decoder $g: \mathcal{Y} \to \{1, \ldots, M\}$ that satisfies the average probability of error constraint
\[ \frac{1}{M} \sum_{m=1}^{M} \mathbb{P}\{g(Y) \neq m \mid X = f(m)\} \le \epsilon. \tag{1} \]
Let
\[ M^\star(\epsilon) = \max\{M : \text{there exists an } (M, \epsilon) \text{ code}\}. \tag{2} \]

We will also consider a stationary memoryless channel, with $n$ independent uses of the same discrete channel. In this scenario, the input alphabet is $\mathcal{X}^n$, the output alphabet is $\mathcal{Y}^n$, and the transition matrix is
\[ W(y^n|x^n) = \prod_{i=1}^{n} W(y_i|x_i). \tag{3} \]
Let $M^\star(\epsilon, n)$ be the analogue of $M^\star(\epsilon)$ in the case of $n$ channel uses.
Notation: Given an alphabet $\mathcal{X}$, let $\mathcal{P}(\mathcal{X})$ be the set of distributions on $\mathcal{X}$. We assume all logarithms and exponentials are to base 2. For a transition matrix $W$ and distribution $Q_Y \in \mathcal{P}(\mathcal{Y})$, define the information density
\[ \imath_{W\|Q_Y}(x; y) = \log \frac{W(y|x)}{Q_Y(y)}. \tag{4} \]
For channel $W$, let $Q^\star(y)$ be the unique capacity-achieving output distribution. We use the shorthand $\imath(x; y)$ without a subscript to denote $\imath_{W\|Q^\star}(x; y)$.

For a sequence $x^n \in \mathcal{X}^n$, let $P_{x^n}$ be the type of $x^n$. Let $\mathcal{P}_n(\mathcal{X})$ be the set of types of sequences in $\mathcal{X}^n$. For $P \in \mathcal{P}_n(\mathcal{X})$ let $T(P)$ be the type class of type $P$, i.e. the set of sequences $x^n$ for which $P_{x^n} = P$. For a pair of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, let $P_{x^n,y^n}$ denote their joint type, and $P_{y^n|x^n}(y|x) = P_{x^n,y^n}(x,y) / P_{x^n}(x)$ the conditional type.
III. EXISTING RESULTS

For two distributions $P$ and $Q$ on the same alphabet, the fundamental limit of hypothesis testing between these two distributions is described by the following quantity, which gives the smallest type-II error for a given type-I error:
\[ \beta_\alpha(P, Q) = \min_{P_{Z|X}:\, \sum_x P(x) P_{Z|X}(1|x) \ge \alpha} \; \sum_x Q(x)\, P_{Z|X}(1|x). \tag{5} \]
This quantity was used in [1] to derive the so-called meta-converse bound, stated as follows.

Theorem 1: For any channel $W(y|x)$,
\[ \frac{1}{M^\star(\epsilon)} \ge \inf_{P_X} \sup_{Q_Y} \beta_{1-\epsilon}(P_X W, P_X \times Q_Y). \tag{6} \]

It was shown in [2] that convexity properties of the function $\beta$ allow the infimum and supremum in (6) to be reversed without changing the value, and moreover that the right-hand side of (6) is equal to
\[ \sup_{Q_Y} \overline{\mathrm{co}}\Big( \alpha \mapsto \inf_x \beta_\alpha(W(\cdot|x), Q_Y) \Big)\Big|_{\alpha = 1-\epsilon} \tag{7} \]
where $\overline{\mathrm{co}}(\cdot)$ denotes taking the convex envelope of a function. Note that the minimization over the distribution $P_X$ is reduced to the simpler process of minimizing over individual input values.

Virtually all known converse bounds can be derived from Thm. 1, including those in this paper. However, even with the simplification in (7) it remains difficult to compute the optimal $Q_Y$ for channels that are not highly symmetric. The main contribution of the present paper is to suggest specific $Q_Y$ distributions that seem to give good results.
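For finite alphabets, $\beta_\alpha$ can be evaluated directly from the Neyman-Pearson lemma by thresholding the likelihood ratio. The following is a minimal numerical sketch, not part of the original paper; the function name and implementation details are my own.

```python
import numpy as np

def beta(alpha, P, Q):
    """Smallest type-II error beta_alpha(P, Q) of Eq. (5).
    By the Neyman-Pearson lemma, the optimal test accepts outcomes in
    decreasing order of the likelihood ratio P(x)/Q(x), randomizing on
    the last outcome so the type-I constraint holds with equality."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    ratio = np.where(Q > 0, P / np.maximum(Q, 1e-300), np.inf)
    order = np.argsort(-ratio)              # largest ratios first
    type2, mass = 0.0, 0.0
    for x in order:
        if mass + P[x] < alpha:             # accept this outcome fully
            mass += P[x]
            type2 += Q[x]
        else:                               # randomize to hit alpha exactly
            frac = (alpha - mass) / P[x] if P[x] > 0 else 0.0
            return type2 + frac * Q[x]
    return type2
```

For a fixed pair $(P_X, Q_Y)$, the quantity inside (6) would then be `beta(1 - eps, PXW, PXQY)` with the joint distributions flattened into vectors over $\mathcal{X} \times \mathcal{Y}$.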
For any two distributions $P$ and $Q$, the divergence spectrum is given by
\[ D_s^\epsilon(P\|Q) = \sup\left\{ R : P\left[ \log \frac{P(X)}{Q(X)} \le R \right] \le \epsilon \right\}. \tag{8} \]
In [4], the fact that for any $\delta > 0$,
\[ \log \frac{1}{\beta_{1-\epsilon}(P, Q)} \le D_s^{\epsilon+\delta}(P\|Q) - \log \delta \tag{9} \]
was applied to Thm. 1 to derive the following result.

Theorem 2: For any channel $W(y|x)$,
\[ \log M^\star(\epsilon) \le \inf_{\delta > 0} \inf_{Q_Y} \sup_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_Y) - \log \delta. \tag{10} \]

Our techniques to find good output distributions $Q_Y$ are tailored to Thm. 2, but the resulting distributions may also be applied to Thm. 1, as we illustrate in Sec. VI.
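For finite alphabets, $D_s^\epsilon(P\|Q)$ is the smallest value of the log-likelihood ratio whose inclusive CDF under $P$ exceeds $\epsilon$ (this is the content of the characterization in (14)-(15) below). A short sketch, with names and structure of my own choosing:

```python
import math

def div_spectrum(eps, P, Q):
    """Divergence spectrum D_s^eps(P||Q) of Eq. (8) for finite alphabets:
    the smallest value R of log2 P(X)/Q(X) whose inclusive CDF under P
    exceeds eps."""
    pairs = sorted(
        (math.log2(p / q) if q > 0 else math.inf, p)
        for p, q in zip(P, Q) if p > 0)
    mass = 0.0
    for r, p in pairs:
        mass += p
        if mass > eps:
            return r
    return math.inf

# Theorem 2 evaluated for one (QY, delta) pair; W is a list of rows W[x].
def thm2_bound(eps, delta, W, QY):
    return max(div_spectrum(eps + delta, Wx, QY) for Wx in W) - math.log2(delta)
```

Optimizing `thm2_bound` over `delta` (and, in principle, over `QY`) gives the bound of (10).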
IV. NEW CONVERSE BOUNDS

Our main result is the following theorem, which addresses the problem of finding a good $Q_Y$ distribution in (10) (or, by proxy, (6)). Given an initial distribution $Q_0$, the theorem provides a new distribution $Q_1$ that is better in the sense that the divergence spectrum is not increased, and thus using it in Thm. 2 can give a tighter bound. In the sequel we will show that for memoryless channels, using the capacity-achieving output distribution as $Q_0$ is a good initial guess, in that the resulting $Q_1$ provides a bound tight enough to give the third-order coding rate.

Theorem 3: Fix $\epsilon \in (0,1)$, $\delta > 0$, a set $T \subseteq \mathcal{X} \times \mathcal{Y}$, and an initial distribution $Q_0 \in \mathcal{P}(\mathcal{Y})$. Let
\[ \lambda = \max_x \sum_{y:\, (x,y) \notin T} W(y|x) \tag{11} \]
and define a new distribution
\[ Q_1(y) = \frac{1}{K} \max_{\substack{x:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0),\\ (x,y) \in T}} W(y|x) \tag{12} \]
where the normalizing constant $K$ is chosen so that $\sum_y Q_1(y) = 1$. Then
\[ \max_x D_s^{\epsilon+\delta-\lambda}(W(\cdot|x)\|Q_1) \le \log K \le \max_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0). \tag{13} \]

Proof: For any pair of discrete distributions $P$ and $Q$, $R = D_s^\epsilon(P\|Q)$ if and only if
\[ P\left[ \log \frac{P(X)}{Q(X)} < R \right] \le \epsilon, \tag{14} \]
\[ P\left[ \log \frac{P(X)}{Q(X)} \le R \right] > \epsilon. \tag{15} \]
For any $x$, applying (15) to $D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)$ gives
\begin{align}
\epsilon + \delta &< \sum_{y:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)} W(y|x) \tag{16} \\
&\le \sum_{\substack{y:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0),\\ (x,y) \in T}} W(y|x) + \sum_{y:\, (x,y) \notin T} W(y|x) \tag{17} \\
&\le \sum_{\substack{y:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0),\\ (x,y) \in T}} W(y|x) + \lambda \tag{18} \\
&\le \sum_{y:\, W(y|x) \le K Q_1(y)} W(y|x) + \lambda \tag{19}
\end{align}
where (18) holds by the definition of $\lambda$, and (19) holds because, by the definition of $Q_1$, if $\imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)$ and $(x,y) \in T$, then $W(y|x) \le K Q_1(y)$. Hence
\[ \sum_{y:\, \log \frac{W(y|x)}{Q_1(y)} \le \log K} W(y|x) > \epsilon + \delta - \lambda \tag{20} \]
which proves that $\log K \ge \max_x D_s^{\epsilon+\delta-\lambda}(W(\cdot|x)\|Q_1)$.
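The boosting step itself is directly computable for small alphabets. Below is a minimal sketch of (11)-(12) under the reconstruction above; it reuses the `div_spectrum` sketch given after Theorem 2, represents $T$ as a Python set of $(x, y)$ pairs, and all names are mine rather than the paper's.

```python
import math

def boost(eps, delta, W, Q0, T):
    """One boosting step (Thm. 3): from Q0, build Q1 per Eqs. (11)-(12).
    W[x][y] is the channel transition matrix and T a set of allowed
    (x, y) pairs.  Returns (Q1, K, lambda)."""
    nx, ny = len(W), len(W[0])
    lam = max(sum(W[x][y] for y in range(ny) if (x, y) not in T)
              for x in range(nx))                       # Eq. (11)
    Ds = [div_spectrum(eps + delta, W[x], Q0) for x in range(nx)]
    Q1 = [0.0] * ny
    for y in range(ny):
        for x in range(nx):
            if W[x][y] == 0 or (x, y) not in T:
                continue
            # information density i_{W||Q0}(x; y)
            dens = math.inf if Q0[y] == 0 else math.log2(W[x][y] / Q0[y])
            if dens <= Ds[x]:
                Q1[y] = max(Q1[y], W[x][y])             # Eq. (12), unnormalized
    K = sum(Q1)
    return [q / K for q in Q1], K, lam
```

Per (13), `math.log2(K)` already upper bounds $\max_x D_s^{\epsilon+\delta-\lambda}(W(\cdot|x)\|Q_1)$, and the step can be iterated with $Q_1$ in place of $Q_0$, as noted after Corollary 4.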


Now we may write
\begin{align}
\log K &= \log \sum_y \max_{\substack{x:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0),\\ (x,y) \in T}} W(y|x) \tag{21} \\
&\le \log \sum_y \max_{x:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)} W(y|x) \tag{22} \\
&\le \log \sum_y \max_x Q_0(y)\, 2^{D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)} \tag{23} \\
&\le \log \sum_y Q_0(y)\, 2^{\max_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)} \tag{24} \\
&= \max_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0) \tag{25}
\end{align}
where (23) holds because $\imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)$ implies $W(y|x) = Q_0(y)\, 2^{\imath_{W\|Q_0}(x;y)} \le Q_0(y)\, 2^{D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_0)}$. This proves the second inequality in (13). ∎

Applying Thm. 3 to Thm. 2 immediately gives the following result.

Corollary 4: For any channel $W(y|x)$, any initial distribution $Q_0 \in \mathcal{P}(\mathcal{Y})$, and any set $T \subseteq \mathcal{X} \times \mathcal{Y}$,
\[ \log M^\star(\epsilon) \le \inf_{\delta > 0} \max_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_1) - \log \delta \tag{26} \]
where $\lambda$ is defined in (11) and
\[ Q_1(y) = \frac{1}{K} \max_{\substack{x:\, \imath_{W\|Q_0}(x;y) \le D_s^{\epsilon+\delta+\lambda}(W(\cdot|x)\|Q_0),\\ (x,y) \in T}} W(y|x). \tag{27} \]

One can derive even tighter bounds by successively applying Thm. 3, using (12) with $Q_1$ in the place of $Q_0$ to find an even better distribution $Q_2$, and so on.

We state one more corollary to Thm. 3, specifically for memoryless channels, by taking $Q_0$ to be the capacity-achieving output distribution. In this corollary, we use the intermediate bound $\log K$ rather than $\max_x D_s^{\epsilon+\delta}(W(\cdot|x)\|Q_1)$ as in Corollary 4. Even though this is a weaker bound, we will prove in Sec. V that it is enough to derive our fixed-error asymptotic bound.

Corollary 5: For a stationary memoryless channel with capacity-achieving i.i.d. output distribution $Q^\star(y^n) = \prod_{i=1}^n Q^\star(y_i)$, and any set $T \subseteq \mathcal{X}^n \times \mathcal{Y}^n$,
\[ \log M^\star(\epsilon, n) \le \inf_{\delta > 0} \log \Bigg[ \frac{1}{\delta} \sum_{y^n} \max_{\substack{x^n:\, \imath(x^n;y^n) \le D_s^{\epsilon+\delta+\lambda}(W(\cdot|x^n)\|Q^\star),\\ (x^n,y^n) \in T}} W(y^n|x^n) \Bigg] \tag{28} \]
where $\lambda$ is as defined in (11).

V. FIXED ERROR ASYMPTOTICS

We first make several definitions that will be necessary for our asymptotic analysis. For two distributions $P$ and $Q$, their relative entropy is given by the mean of the log-likelihood ratio between $P$ and $Q$:
\[ D(P\|Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}. \tag{29} \]
Define the second- and third-order absolute moments of the log-likelihood ratio as
\[ V(P\|Q) = \sum_x P(x) \left( \log \frac{P(x)}{Q(x)} - D(P\|Q) \right)^2, \tag{30} \]
\[ T(P\|Q) = \sum_x P(x) \left| \log \frac{P(x)}{Q(x)} - D(P\|Q) \right|^3. \tag{31} \]
Define the conditional versions of each of these quantities as
\[ D(W\|Q|P) = \sum_x P(x)\, D(W(\cdot|x)\|Q), \tag{32} \]
\[ V(W\|Q|P) = \sum_x P(x)\, V(W(\cdot|x)\|Q), \tag{33} \]
\[ T(W\|Q|P) = \sum_x P(x)\, T(W(\cdot|x)\|Q). \tag{34} \]

The mutual information is given by $I(P, W) = D(W\|PW|P)$ where $PW(y) = \sum_x P(x) W(y|x)$. The capacity of channel $W$ is given by
\[ C = \max_{P \in \mathcal{P}(\mathcal{X})} I(P, W). \tag{35} \]
Let $\Pi$ be the set of capacity-achieving input distributions. There is a unique capacity-achieving output distribution $Q^\star = PW$ for all $P \in \Pi$. Let $V(P, W) = V(W\|PW|P)$. The channel dispersion is given by $V = \min_{P \in \Pi} V(P, W)$. Let $\tilde{\Pi}$ be the set of distributions $P \in \mathcal{P}(\mathcal{X})$ such that, for some conditional distribution $P_{Y|X}$, the output marginal of $P \times P_{Y|X}$ is $Q^\star$ and
\[ \sum_{x,y} P(x)\, P_{Y|X}(y|x) \log \frac{W(y|x)}{Q^\star(y)} = C. \tag{36} \]
Let $\tilde{V} = \min_{P \in \tilde{\Pi}} V(W\|Q^\star|P)$. Note that $\tilde{V} \le V$, but they are sometimes equal, as for the Z channel. Finally, define $Q(\cdot)$ to be the complementary Gaussian CDF, and $Q^{-1}(\cdot)$ its inverse function. It was shown in [4] that for all discrete memoryless channels with $V > 0$,
\[ \log M^\star(\epsilon, n) \le nC - \sqrt{nV}\, Q^{-1}(\epsilon) + \frac{1}{2} \log n + O(1). \tag{37} \]
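The quantities $C$, $Q^\star$, and the conditional variance of Sec. V are all computable for a given transition matrix. The sketch below approximates them with Blahut-Arimoto iterations; this is a standard technique, not a method from the paper, and the function name is mine. Note it returns $V(W\|Q^\star|P)$ at the final iterate $P$, which need not equal the dispersion $V = \min_{P \in \Pi} V(P, W)$ when $\Pi$ is not a singleton.

```python
import numpy as np

def capacity_quantities(W, iters=3000):
    """Approximate the capacity C, the capacity-achieving output
    distribution Q*, and V(W||Q*|P) at the Blahut-Arimoto iterate P.
    W is an |X| x |Y| row-stochastic matrix; logs are base 2 (Sec. II)."""
    W = np.asarray(W, float)
    P = np.full(W.shape[0], 1.0 / W.shape[0])

    def relent_rows(Q):
        # D(W(.|x) || Q) for every x, per Eq. (29), with 0 log 0 = 0.
        with np.errstate(divide="ignore", invalid="ignore"):
            lr = np.where(W > 0, np.log2(W / Q), 0.0)
        return lr, (W * lr).sum(axis=1)

    for _ in range(iters):                    # Blahut-Arimoto update
        _, D = relent_rows(P @ W)
        P = P * np.exp2(D)
        P /= P.sum()
    Q = P @ W
    lr, D = relent_rows(Q)
    C = float(P @ D)                          # Eq. (35) at convergence
    V = float(P @ (W * (lr - D[:, None]) ** 2).sum(axis=1))  # Eqs. (30), (33)
    return C, Q, V

# Z channel of Sec. VI: expect C = log2(5/4) ~ 0.3219 and Q*(0) = 4/5.
C, Qstar, V = capacity_quantities([[1.0, 0.0], [0.5, 0.5]])
```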
2
The following theorem uses Corollary 5 to derive a weaker bound, in which the remainder term is $O(\sqrt{\log n})$ rather than $O(1)$ if $\tilde{V} < V$.

Theorem 6: Consider a discrete memoryless channel with $V > 0$ and any $\epsilon < 1/2$. It follows from Corollary 5 that
\[ \log M^\star(\epsilon, n) \le nC - \sqrt{n\tilde{V}}\, Q^{-1}(\epsilon) + \frac{1}{2} \log n + r(n) \tag{38} \]
where $r(n) = O(1)$ if $\tilde{V} = V$ and $r(n) = O(\sqrt{\log n})$ if $\tilde{V} < V$.


Proof: Define the set of distributions $P_{XY}$ where $P_{Y|X}$ is close to $W$ as
\[ S_n = \left\{ P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) : \|P_{Y|X} - W\|_\infty \le F_1 \sqrt{\frac{\log n}{n}} \right\} \tag{39} \]
for a constant $F_1$ to be defined later. (Here we assume that the probability of error $\epsilon < 1/2$.) We will apply Corollary 5 with $T = \{(x^n, y^n) : P_{x^n,y^n} \in S_n\}$. For a given $x^n$, we have
\begin{align}
&\sum_{y^n:\, (x^n,y^n) \notin T} W(y^n|x^n) \tag{40} \\
&\quad\le \sum_{x,y} \mathbb{P}\left[ |P_{Y^n|x^n}(y|x) - W(y|x)| > F_1 \sqrt{\frac{\log n}{n}} \,\Big|\, x^n \right] \tag{41} \\
&\quad\le |\mathcal{X}|\,|\mathcal{Y}| \left[ Q\big( F_1 F_2 \sqrt{\log n} \big) + \frac{B_0}{\sqrt{n}} \right] \tag{42} \\
&\quad\le |\mathcal{X}|\,|\mathcal{Y}| \left[ \frac{\phi(F_1 F_2 \sqrt{\log n})}{F_1 F_2 \sqrt{\log n}} + \frac{B_0}{\sqrt{n}} \right] \tag{43} \\
&\quad= O\left( \frac{1}{\sqrt{n}} \right) \tag{44}
\end{align}
where in (42) we have applied the Berry-Esseen theorem, with $F_2$ a constant relating to the variance of $P_{Y^n|x^n}(y|x)$; in (43) we have used the fact that $Q(z) \le \phi(z)/z$ where $\phi(z)$ is the standard Gaussian PDF; and (44) holds for sufficiently large $F_1$. Thus $\lambda = O(1/\sqrt{n})$, where $\lambda$ is defined as in (11).

Define for convenience $\epsilon' = \epsilon + \delta + \lambda$. For any $P_X \in \mathcal{P}(\mathcal{X})$ let
\[ D(P_X) = D(W\|Q^\star|P_X) - \sqrt{\frac{V(W\|Q^\star|P_X)}{n}}\, Q^{-1}\left( \epsilon' + \frac{B(P_X)}{\sqrt{n}} \right) \tag{45} \]
where $B(P_X) = \frac{6\, T(W\|Q^\star|P_X)}{V(W\|Q^\star|P_X)^{3/2}}$. The Berry-Esseen theorem gives that
\[ D_s^{\epsilon'}(W(\cdot|x^n)\|Q^\star) \le n\, D(P_{x^n}). \tag{46} \]
Define the set of distributions
\[ R = \left\{ P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) : \sum_{x,y} P_{XY}(x,y)\, \imath(x;y) \le D(P_X) \right\}. \tag{47} \]
Note that $\imath(x^n; y^n) \le n\, D(P_{x^n})$ if and only if $P_{x^n,y^n} \in R$.

Now applying Corollary 5, for any $\delta > 0$ we have
\begin{align}
M^\star(\epsilon, n) &\le \frac{1}{\delta} \sum_{y^n} \max_{\substack{x^n:\, \imath(x^n;y^n) \le n D(P_{x^n}),\\ (x^n,y^n) \in T}} W(y^n|x^n) \tag{48} \\
&= \frac{1}{\delta} \sum_{y^n} \max_{x^n:\, P_{x^n,y^n} \in R \cap S_n} Q^\star(y^n)\, \exp\{\imath(x^n; y^n)\} \tag{49} \\
&= \frac{1}{\delta} \sum_{y^n} \max_{x^n:\, P_{x^n,y^n} \in R \cap S_n} Q^\star(y^n)\, \exp\Big\{ n \sum_{x,y} P_{x^n,y^n}(x,y)\, \imath(x;y) \Big\} \tag{50} \\
&\le \frac{1}{\delta} \sum_{P_Y \in \mathcal{P}_n(\mathcal{Y})} |T(P_Y)| \exp\Big\{ -n\big( H(P_Y) + D(P_Y\|Q^\star) \big) + n \max_{P_{X|Y}:\, P_{XY} \in R \cap S_n} \sum_{x,y} P_{XY}(x,y)\, \imath(x;y) \Big\} \tag{51} \\
&\le \frac{1}{\delta} \sum_{P_Y \in \mathcal{P}_n(\mathcal{Y})} |T(P_Y)| \exp\Big\{ -n\big( H(P_Y) + D(P_Y\|Q^\star) \big) + n \max_{P_{X|Y}:\, P_{XY} \in S_n} D(P_X) \Big\} \tag{52}
\end{align}
where in (51) we have used the fact that if $y^n \in T(P_Y)$ then $Q^\star(y^n) = 2^{-n(H(P_Y) + D(P_Y\|Q^\star))}$, and in (52) we have applied the definition of $R$.

It can be shown (e.g. exercise 1.2.2 in [6]) that for sufficiently small $\mu$, there exists a constant $F_3$ such that for all $P_Y$ for which $D(P_Y\|Q^\star) \le \mu$,
\[ |T(P_Y)| \le F_3\, n^{-(|\mathcal{Y}|-1)/2}\, 2^{n H(P_Y)} \tag{53} \]
and for all $P_Y$, we have the looser bound $|T(P_Y)| \le 2^{n H(P_Y)}$. For any $P_Y$ define
\[ U(P_Y) = \min_{P_{X|Y}:\, P_{XY} \in S_n} \sqrt{\frac{V(W\|Q^\star|P_X)}{n}}\, Q^{-1}\left( \epsilon' + \frac{B(P_X)}{\sqrt{n}} \right). \tag{54} \]
Thus for any $P_Y$,
\begin{align}
\max_{P_{X|Y}:\, P_{XY} \in S_n} D(P_X) &\le \max_{P_{X|Y}} D(W\|Q^\star|P_X) - U(P_Y) \tag{55} \\
&\le \max_x D(W(\cdot|x)\|Q^\star) - U(P_Y) \tag{56} \\
&= C - U(P_Y) \tag{57}
\end{align}
where in (57) we have used the fact that for the capacity-achieving output distribution $Q^\star$, $\max_x D(W(\cdot|x)\|Q^\star) = C$. Combining (53) and (57) into (52) gives
\begin{align}
M^\star(\epsilon, n) &\le \frac{1}{\delta} \exp\{nC\} \sum_{P_Y \in \mathcal{P}_n(\mathcal{Y})} |T(P_Y)|\, 2^{-n H(P_Y)} \exp\big\{ -n\big( D(P_Y\|Q^\star) + U(P_Y) \big) \big\} \tag{58} \\
&\le \frac{1}{\delta} \exp\{nC\} \Bigg[ \sum_{\substack{P_Y \in \mathcal{P}_n(\mathcal{Y}):\\ D(P_Y\|Q^\star) \le \mu}} F_3\, n^{-(|\mathcal{Y}|-1)/2} \exp\big\{ -n\big( D(P_Y\|Q^\star) + U(P_Y) \big) \big\} \tag{59} \\
&\qquad\qquad + \sum_{\substack{P_Y \in \mathcal{P}_n(\mathcal{Y}):\\ D(P_Y\|Q^\star) > \mu}} \exp\{-n\mu\} \Bigg]. \tag{60}
\end{align}

We may approximate the first term in (60) by an integral over $\mathcal{P}(\mathcal{Y})$. In particular, there exists a constant $F_4$ such that the first term in (60) is at most
\[ F_4\, n^{(|\mathcal{Y}|-1)/2} \int_{\mathcal{P}(\mathcal{Y})} \exp\big\{ -n\big( D(P_Y\|Q^\star) + U(P_Y) \big) \big\}\, dP_Y. \tag{61} \]
Now applying Laplace's approximation (see [7], Chap. 9, Thm. 3), for some constant $F_5$, (61) is upper bounded by
\[ F_5 \exp\big\{ -n\big( D(P_Y^{(n)}\|Q^\star) + U(P_Y^{(n)}) \big) \big\} \tag{62} \]
where
\[ P_Y^{(n)} = \arg\min_{P_Y \in \mathcal{P}(\mathcal{Y})} D(P_Y\|Q^\star) + U(P_Y). \tag{63} \]
Since $U(P_Y) = O(1/\sqrt{n})$ for all $P_Y$, $P_Y^{(n)} = Q^\star + O(1/\sqrt{n})$. Thus $D(P_Y^{(n)}\|Q^\star) = O(1/n)$ and $U(P_Y^{(n)}) = U(Q^\star) + O(1/n)$. Hence (62) is at most $F_6 \exp\{-n\, U(Q^\star)\}$ for some constant $F_6$.

Moreover, recall that $P_{XY} \in S_n$ means that $P_{Y|X}$ is close to $W$. In the case that $P_{Y|X} = W$ and $P_Y = Q^\star$, then we need $P_X \in \tilde{\Pi}$. Since $\tilde{V} = \min_{P_X \in \tilde{\Pi}} V(W\|Q^\star|P_X)$, by continuity, for some constant $F_7$,
\begin{align}
U(Q^\star) &\ge \sqrt{\frac{\tilde{V} - F_7 \sqrt{\log n / n}}{n}}\, Q^{-1}\left( \epsilon' + \frac{B(P_X)}{\sqrt{n}} \right) \tag{64} \\
&= \sqrt{\frac{\tilde{V}}{n}}\, Q^{-1}(\epsilon') - O\left( \frac{\sqrt{\log n}}{n} \right). \tag{65}
\end{align}
Putting this all together and setting $\delta = 1/\sqrt{n}$, we have
\begin{align}
\log M^\star(\epsilon, n) &\le nC - n\, U(Q^\star) - \log \delta + O(1) \tag{66} \\
&\le nC - \sqrt{n\tilde{V}}\, Q^{-1}(\epsilon') + \frac{1}{2} \log n + O\big( \sqrt{\log n} \big) \tag{67} \\
&\le nC - \sqrt{n\tilde{V}}\, Q^{-1}(\epsilon) + \frac{1}{2} \log n + O\big( \sqrt{\log n} \big) \tag{68}
\end{align}
where in (66) the second term in (60) is absorbed into the $O(1)$, since it is exponentially small compared with the first, and (68) holds because $\epsilon' = \epsilon + O(1/\sqrt{n})$ implies $Q^{-1}(\epsilon') = Q^{-1}(\epsilon) - O(1/\sqrt{n})$.

In the case that $\tilde{V} = V$, we may similarly assume the optimal $P_Y$ distribution is close to $Q^\star$, but in order for the inequality in (52) to hold with equality, in the limit of large $n$ we need
\[ \sum_{x,y} P_{XY}(x,y) \log \frac{W(y|x)}{Q^\star(y)} = C. \tag{69} \]
Thus $U(Q^\star)$ may be lower bounded by $\sqrt{V/n}\, Q^{-1}(\epsilon') - O(1/n)$, and $r(n) = O(1)$ follows. ∎

VI. THE Z CHANNEL

The Z channel is a discrete memoryless channel with binary inputs and outputs in which $Y = 0$ if $X = 0$, but if $X = 1$, then $Y$ is uniformly distributed between 0 and 1. The capacity is $\log(5/4)$, the capacity-achieving input distribution is $P_X(0) = 3/5$, and the capacity-achieving output distribution is $Q^\star(0) = 4/5$. For any pair $x^n, y^n$ where $y_i = 0$ whenever $x_i = 0$, let $a = n P_{x^n}(1)$ and $b = n P_{y^n}(1)$. These two integers determine the joint type of $(x^n, y^n)$, because $P_{x^n,y^n}(0,1) = 0$. By symmetry, we may assume that any distribution $Q_Y \in \mathcal{P}(\mathcal{Y}^n)$ depends only on the type of $y^n$. Write this distribution as $Q_Y(b)$; it must satisfy $\sum_{k=0}^{n} \binom{n}{k} Q_Y(k) = 1$.

Consider applying Thm. 3 with any distribution $Q_0(b)$ such that, for some $b_0$, $Q_0(b)$ is strictly decreasing in $b$ for $b \le b_0$ and $Q_0(b) = 0$ for $b > b_0$. The capacity-achieving output distribution $Q^\star(b) = (1/5)^b (4/5)^{n-b}$ satisfies this condition. We also take $T = \mathcal{X}^n \times \mathcal{Y}^n$, so that $\lambda = 0$. Given a sequence $x^n$ with $n P_{x^n}(1) = a$, the output sequence $Y^n$ is uniformly distributed over all sequences with $y_i = 0$ if $x_i = 0$; for any such $y^n$, $W(y^n|x^n) = 2^{-a}$. To calculate $D_s^{\epsilon+\delta}(W(\cdot|x^n)\|Q_0) = R$, we note that $\log \frac{W(y^n|x^n)}{Q_0(b)} \le R$ iff $-\log Q_0(b) - a \le R$. Hence
\[ D_s^{\epsilon+\delta}(W(\cdot|x^n)\|Q_0) = -\log Q_0(b^\star_a) - a, \]
where $b^\star_a$ is the unique integer satisfying
\[ \sum_{b < b^\star_a} \binom{a}{b} 2^{-a} \le \epsilon + \delta, \tag{70} \]
\[ \sum_{b \le b^\star_a} \binom{a}{b} 2^{-a} > \epsilon + \delta. \tag{71} \]
Note that $b^\star_a$ is non-decreasing in $a$. Moreover, $\imath_{W\|Q_0}(x^n; y^n) \le D_s^{\epsilon+\delta}(W(\cdot|x^n)\|Q_0)$ if and only if $b \le b^\star_a$. Hence the distribution $Q_1$ is given by
\[ Q_1(b) = \frac{1}{K} \max_{a:\, b \le b^\star_a} 2^{-a} = \frac{1}{K} \exp\big\{ -\min\{a : b \le b^\star_a\} \big\} \tag{72} \]
where $K$ is chosen so that $\sum_b \binom{n}{b} Q_1(b) = 1$. This distribution also satisfies the decreasing condition, so plugging it back into Thm. 3 does not yield a still-better distribution. Moreover, $Q_1$ optimizes Thm. 2 over all distributions $Q_Y(b)$ that are non-increasing in $b$: by continuity, any distribution that is non-increasing may be perturbed slightly into one that is decreasing while changing the bound an infinitesimal amount, and by the above analysis $Q_1$ dominates any such distribution.

In Fig. 1 we show several finite blocklength bounds for the Z channel. One converse bound is Thm. 2 using the distribution $Q_1$ described above. The parameter $\delta$ was roughly optimized using a bisection method. The resulting distribution $Q_1$ was then applied to Thm. 1, resulting in a somewhat tighter bound. Also shown in Fig. 1 is the dependence testing (DT) achievability bound from [1].

[Figure 1: rate vs. blocklength (100 to 800), with curves for the capacity, the Thm. 1 and Thm. 2 converse bounds, and the DT achievability bound.]
Fig. 1. Finite blocklength bounds for the Z channel with probability of error $10^{-3}$. The output distribution $Q_1$ derived by applying the boosting procedure of Thm. 3 to the capacity-achieving output distribution was used to compute the converse bounds from Thms. 2 and 1. Also shown is the dependence testing (DT) achievability bound.
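The Z-channel computation above reduces to integer thresholds of a Binomial$(a, 1/2)$ CDF, so the Thm. 2 curve in Fig. 1 can be reproduced directly. Below is a sketch under the reconstruction above; the names are mine, and $\delta$ is fixed here rather than optimized by bisection as in the paper. It uses the intermediate bound $\log_2 M^\star(\epsilon, n) \le \log_2 K - \log_2 \delta$ from Corollary 5 (with $T = \mathcal{X}^n \times \mathcal{Y}^n$, so $\lambda = 0$).

```python
import math

def z_channel_Q1(n, eps, delta):
    """Boosted output distribution for the Z channel (Sec. VI) and log2 K,
    so that log2 M*(eps, n) <= log2 K - log2 delta.  Q1 is returned as
    q[b], the probability of each individual y^n with b ones."""
    target = eps + delta                  # T = X^n x Y^n, so lambda = 0
    assert target < 1
    # b_star[a]: the Binomial(a, 1/2) threshold of Eqs. (70)-(71).
    b_star = []
    for a in range(n + 1):
        mass, b = 0.0, 0
        while mass + math.comb(a, b) * 2.0 ** -a <= target:
            mass += math.comb(a, b) * 2.0 ** -a
            b += 1
        b_star.append(b)                  # first b with inclusive CDF > target
    # Eq. (72): Q1(b) proportional to 2^{-min{a : b <= b_star[a]}}.
    q = [0.0] * (n + 1)
    for b in range(n + 1):
        a_min = next((a for a in range(n + 1) if b <= b_star[a]), None)
        if a_min is not None:
            q[b] = 2.0 ** -a_min
    K = sum(math.comb(n, b) * q[b] for b in range(n + 1))
    return [qb / K for qb in q], math.log2(K)

q, logK = z_channel_Q1(500, 1e-3, 2.0 ** -10)
print(logK + 10)   # converse bound on log2 M*(eps, n) at n = 500
```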
REFERENCES

[1] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inform. Theory, vol. 56, no. 5, pp. 2307-2359, 2010.
[2] Y. Polyanskiy, "Saddle point in the minimax converse for channel coding," IEEE Trans. Inform. Theory, vol. 59, no. 5, pp. 2576-2595, 2013.
[3] V. Strassen, "Asymptotic approximations in Shannon's information theory," Aug. 2009. English translation of the original Russian article in Trans. Third Prague Conf. on Inform. Th., Statistics, Decision Functions, Random Processes (Liblice, 1962), Prague, 1964. [Online]. Available: http://www.math.cornell.edu/~pmlut/strassen.pdf
[4] M. Tomamichel and V. Tan, "A tight upper bound for the third-order asymptotics for most discrete memoryless channels," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7041-7051, 2013.
[5] Y. Altug and A. B. Wagner, "The third-order term in the normal approximation for singular channels," arXiv:1309.5126v1, 2013.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Orlando, FL: Academic Press, 1982.
[7] R. Wong, Asymptotic Approximations of Integrals. Academic Press, Inc., 1989.
