I. INTRODUCTION
Beginning with [1], there has recently been considerable
interest in developing good information-theoretic bounds in the
non-asymptotic finite blocklength regime. This paper focuses on
converse bounds for point-to-point channel coding, in particular
for discrete memoryless channels.
The tightest known finite blocklength converse bounds for
channel coding typically involve optimization over a distribution
on the channel output. The problem of finding this optimal
distribution was examined in [2], which derived the exact
optimal distribution for some channels, including the binary
symmetric channel and the binary erasure channel. For the latter,
it was shown that even though the channel is memoryless and
stationary, the optimal output distribution is not i.i.d. However,
the techniques in [2] do not appear to be sufficient to derive the
exact optimal distribution for non-symmetric channels.
Another line of work, dating back to Strassen [3], focuses
on deriving multi-term asymptotic expressions for the optimal
code rate with fixed probability of error. The bounds in [1] were
shown to give the exact second-order (or dispersion) term, but
not always the third-order term. Using a carefully chosen output
distribution, a tighter bound on the third-order term was derived
in [4]. This output distribution would in principle result in a
finite blocklength bound, but one that is difficult to compute.
The third-order bound in [4] is exact for most channels, but it
was shown in [5] that for symmetric singular channels, such as
the binary erasure channel, the third-order term is smaller. (It
is not clear whether our results recover this bound.)
This work was supported in part by the National Science Foundation under
grant CCF-1422358.
Let

i(x; y) = log [W(y|x) / QY(y)].  (4)

For distributions P and Q on a common alphabet, and alpha in [0, 1], define

beta_alpha(P, Q) = min_{PZ|X : Sum_x P(x) PZ|X(1|x) >= alpha} Sum_x Q(x) PZ|X(1|x).  (5)
This quantity was used in [1] to derive the so-called meta-converse bound, stated as follows.
Theorem 1: For any channel W(y|x),

1/M*(eps) >= inf_{PX} sup_{QY} beta_{1-eps}(PX W, PX x QY).  (6)
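For finite alphabets, the quantity beta_alpha(P, Q) appearing in the meta-converse can be evaluated exactly via the Neyman-Pearson lemma: sort outcomes by likelihood ratio P/Q in decreasing order, accept until P-mass alpha is reached, and randomize on the boundary outcome. The sketch below is illustrative, not code from the paper:

```python
def beta(alpha, P, Q):
    """Neyman-Pearson evaluation of beta_alpha(P, Q) for finite alphabets.

    P, Q are probability vectors over the same alphabet. Returns the minimum
    Q-probability of acceptance over tests whose P-probability of acceptance
    is at least alpha.
    """
    # Sort outcomes by likelihood ratio P/Q, largest first; outcomes with
    # Q = 0 (infinite ratio) come first since they cost nothing under Q.
    order = sorted(range(len(P)),
                   key=lambda i: P[i] / Q[i] if Q[i] > 0 else float("inf"),
                   reverse=True)
    p_acc = q_acc = 0.0
    for i in order:
        if P[i] > 0 and p_acc + P[i] >= alpha:
            # Randomize on the boundary outcome to hit alpha exactly.
            return q_acc + (alpha - p_acc) / P[i] * Q[i]
        p_acc += P[i]
        q_acc += Q[i]
    return q_acc
```

A quick sanity check is the identity beta_alpha(P, P) = alpha, which any correct implementation must satisfy.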
Given a candidate output distribution Q0, an improved distribution can be formed as

Q1(y) = (1/K) max_x Q0(y) 2^{Ds(W(.|x)||Q0)},  (27)

where K is a normalizing constant. We will also need the conditional divergence variance and third absolute moment,

V(W||Q|P) = Sum_x P(x) V(W(.|x)||Q),  (33)

T(W||Q|P) = Sum_x P(x) T(W(.|x)||Q).  (34)
log M*(eps, n) <= nC - sqrt(nV) Q^{-1}(eps) + (1/2) log n + O(1).  (37)
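Numerically, the approximation in (37) (dropping the O(1) term) is straightforward to evaluate; the sketch below does so for a binary symmetric channel, whose capacity and dispersion have standard closed forms (the channel and parameter values are illustrative, not taken from the paper):

```python
import math
from statistics import NormalDist

def bsc_normal_approx(p, eps, n):
    """Approximate log M*(eps, n) in nats for a BSC(p), per (37) sans O(1)."""
    H = -p * math.log(p) - (1 - p) * math.log(1 - p)   # binary entropy (nats)
    C = math.log(2) - H                                # capacity (nats)
    V = p * (1 - p) * math.log((1 - p) / p) ** 2       # channel dispersion
    q_inv = NormalDist().inv_cdf(1 - eps)              # Gaussian Q^{-1}(eps)
    return n * C - math.sqrt(n * V) * q_inv + 0.5 * math.log(n)

# Rate in bits per channel use at blocklength 500 and error probability 1e-3.
rate = bsc_normal_approx(0.11, 1e-3, 500) / (500 * math.log(2))
```

As expected, the resulting rate sits strictly below the capacity (about 0.5 bits for p = 0.11) and approaches it as n grows.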
The following theorem uses Corollary 5 to derive a weaker bound, in which the remainder term is O(sqrt(log n)) rather than O(1).
Theorem 6: Consider a discrete memoryless channel with V > 0 and any eps < 1/2. It follows from Corollary 5 that

log M*(eps, n) <= nC - sqrt(nV) Q^{-1}(eps) + (1/2) log n + r(n),  (38)
"
x,y
#
log n n
x
(41)
n
p
B0
|X | |Y| Q F1 F2 log n +
n
#
"
0
F1 F2 log n
B
|X | |Y|
+
n
F1 F2 log n
1
=O
n
(43)
(44)
n
n
(45)
6T (W kQ? |PX )
.
V (W kQ? |PX )3/2
The size of each output type class satisfies

|T(PY)| <= 2^{nH(PY)},  (53)

and

max_{PX|Y : PXY in Sn} Sum_{x,y} PXY(x, y) i(x; y) = C - U(PY),  (57)
where in (57) we have used the fact that for the capacity-achieving output distribution Q*, max_x D(W(.|x)||Q*) = C. Combining (53) and (57) into (52) gives
M*(eps, n) <= exp{nC} Sum_{PY in Pn(Y)} |T(PY)| exp{-n(D(PY||Q*) - U(PY))}.  (58)-(59)

Split the sum over output types according to whether D(PY||Q*) <= delta or D(PY||Q*) > delta. The types with D(PY||Q*) > delta contribute at most exp{-n delta} in total (60), while each type with D(PY||Q*) <= delta contributes at most

F5 exp{-n(D(PY||Q*) - U(PY))}  (62)

for some constant F5. Since U(PY) = O(1/sqrt(n)) for all PY, the dominant type satisfies PY = Q* + O(1/sqrt(n)). Thus D(PY||Q*) = O(1/n) and U(PY) = U(Q*) + O(1/n). Hence (62) is at most F6 exp{nU(Q*)} for some constant F6. Moreover, recall that PXY in Sn means that PY|X is close to W.
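The type-counting estimate used around (53) is the standard method-of-types bound |T(PY)| <= 2^{nH(PY)}. A quick self-contained numerical check (the particular type below is illustrative):

```python
import math

def type_class_size(counts):
    """Exact number of length-n sequences with empirical counts `counts`:
    the multinomial coefficient n! / prod(counts[y]!)."""
    n = sum(counts)
    size = math.factorial(n)
    for c in counts:
        size //= math.factorial(c)
    return size

def entropy_upper_bound(counts):
    """Method-of-types bound 2^{n H(P)} on the size of the type class."""
    n = sum(counts)
    H = -sum(c / n * math.log2(c / n) for c in counts if c > 0)
    return 2 ** (n * H)

counts = (3, 5, 2)  # a type on a ternary alphabet with n = 10
assert type_class_size(counts) <= entropy_upper_bound(counts)
```

For the type (3, 5, 2) the exact count is 10!/(3! 5! 2!) = 2520, comfortably below the entropy bound.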
Combining these estimates,

log M*(eps, n) <= nC - sqrt(nV) Q^{-1}(eps) + (1/2) log n + O(sqrt(log n)).  (68)

[Fig. 1. Finite blocklength bounds for the Z channel: capacity, the Thm. 2 and Thm. 1 converse bounds, and the DT achievability bound, shown as rate versus blocklength from 100 to 800.]
where K is chosen so that Sum_b n_b Q1(b) = 1. This distribution
also satisfies the decreasing condition, so plugging it back into
Thm. 3 does not yield a still-better distribution. Moreover, Q1
optimizes Thm. 2 over all distributions QY(b) that are non-increasing
in b: by continuity, any non-increasing distribution may be perturbed
slightly into a decreasing one while changing the bound by only an
infinitesimal amount, and by the above analysis Q1 dominates any such
distribution.
In Fig. 1 we show several finite blocklength bounds for the Z
channel. One converse bound is Thm. 2 using the distribution
Q1 described above. The parameter was roughly optimized
using a bisection method. The resulting distribution Q1 was then
applied to Thm. 1, resulting in a somewhat tighter bound. Also
shown in Fig. 1 is the dependence testing (DT) achievability
bound from [1].
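The rough one-parameter optimization mentioned above can be carried out with any bracketing search; below is a generic golden-section search for a unimodal objective. This is a hypothetical stand-in sketch: the paper's actual objective and bisection details are not reproduced here.

```python
def maximize_unimodal(f, lo, hi, iters=80):
    """Golden-section search: maximize a unimodal function f on [lo, hi]."""
    phi = (5 ** 0.5 - 1) / 2                  # inverse golden ratio, ~0.618
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                           # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
        else:                                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
    return (a + b) / 2

# Example: the maximizer of -(x - 2)^2 on [0, 5] is x = 2.
```

Each iteration shrinks the bracket by the factor 0.618 while reusing one previous function evaluation, which is why golden-section (rather than naive trisection) is the usual choice for such searches.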
REFERENCES
[1] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inform. Theory, vol. 56, no. 5, pp. 2307-2359, 2010.
[2] Y. Polyanskiy, "Saddle point in the minimax converse for channel coding," IEEE Trans. Inform. Theory, vol. 59, no. 5, pp. 2576-2595, 2013.
[3] V. Strassen, "Asymptotic approximations in Shannon's information theory," Aug. 2009, English translation of original Russian article in Trans. Third Prague Conf. on Inform. Th., Statistics, Decision Functions, Random Processes (Liblice, 1962), Prague, 1964. [Online]. Available: http://www.math.cornell.edu/~pmlut/strassen.pdf
[4] M. Tomamichel and V. Tan, "A tight upper bound for the third-order asymptotics for most discrete memoryless channels," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7041-7051, 2013.
[5] Y. Altug and A. B. Wagner, "The third-order term in the normal approximation for singular channels," arXiv:1309.5126v1, 2013.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Orlando, FL: Academic Press, 1982.
[7] R. Wong, Asymptotic Approximations of Integrals. Academic Press, Inc., 1989.