check matrix by an identity matrix and each “0” by an all-zeros matrix). Different code rates are obtained by appending more elements to the matrix in only one dimension (i.e., adding more variable nodes, but no check nodes). Note that the decoder must be flexible enough to support such changes of the code structure. Using such base matrices hence adds flexibility in terms of packet length.

In approximated lower triangular form (Figure 3), the parity check matrix reads

    H = [ A  B  T
          C  D  E ]

with M = m·p rows, N = n·p columns and gap g = γ·p, where T is an (M−g)×(M−g) lower triangular matrix.

Figure 3: Approximated lower triangular H to facilitate near linear-time encoding
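As an illustration, the following is a minimal Python sketch (with hypothetical names and a toy base matrix, not the actual TGnSync matrices) of how a base model matrix of circular shift values is expanded into the binary parity-check matrix H:

    import numpy as np

    def expand_base_matrix(base, p):
        # base[i][j] = -1   -> p x p all-zeros block
        # base[i][j] = s>=0 -> p x p identity, columns cyclically shifted by s
        m, n = base.shape
        H = np.zeros((m * p, n * p), dtype=np.uint8)
        I = np.eye(p, dtype=np.uint8)
        for i in range(m):
            for j in range(n):
                s = base[i, j]
                if s >= 0:
                    H[i*p:(i+1)*p, j*p:(j+1)*p] = np.roll(I, s, axis=1)
        return H

    # toy example: a 2 x 4 base matrix with expansion factor p = 4
    base = np.array([[0, 1, -1, 2],
                     [3, -1, 0, 1]])
    H = expand_base_matrix(base, p=4)   # H is 8 x 16

Changing only p then changes the packet length while leaving the code structure, and hence the decoder architecture, untouched.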
The Block-LDPC codes considered in this paper [27] have exactly the format requested above, and thus make it possible first to estimate the encoding complexity accurately, and then to take advantage of the pipelined processing described hereafter (Fig. 4), where s denotes the systematic part of the codeword and p1, p2 the two parity parts:

(1) compute A·s^T and C·s^T
(2) p1^T = −Φ^-1·(−E·T^-1·A·s^T + C·s^T), with Φ = −E·T^-1·B + D
(3) p2^T = −T^-1·(A·s^T + B·p1^T)

Figure 4: Pipelined encoder structure

The total number of logical gates (NAND) is given in Fig. 5, as a function of both the block length (indexed by its expansion factor Zf) and the code rate Rc. Dimensioning for the worst case, here a rate Rc=1/2 code of codeword length 2304 bits (Zf=96), therefore requires around 11K gates.
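For illustration, here is a minimal GF(2) sketch of these three steps (helper names are hypothetical; dense matrices are used for readability, whereas a real encoder would exploit sparsity and back-substitution; over GF(2) the minus signs above vanish):

    import numpy as np

    def gf2_solve(M, b):
        # Solve M x = b over GF(2) by Gauss-Jordan elimination.
        # Assumes M is square and invertible; b may be a vector or a matrix.
        M = (M % 2).astype(np.uint8)
        x = (b % 2).astype(np.uint8)
        n = M.shape[0]
        for col in range(n):
            pivot = col + int(np.argmax(M[col:, col]))
            M[[col, pivot]] = M[[pivot, col]]
            x[[col, pivot]] = x[[pivot, col]]
            for row in range(n):
                if row != col and M[row, col]:
                    M[row] ^= M[col]
                    x[row] ^= x[col]
        return x

    def ru_encode(A, B, C, D, E, T, s):
        # Steps mirror the pipelined structure of Fig. 4 (all arithmetic mod 2).
        As = A.dot(s) % 2                       # step (1): A s^T and C s^T
        Cs = C.dot(s) % 2
        Phi = (E.dot(gf2_solve(T, B)) + D) % 2  # Phi = E T^-1 B + D
        p1 = gf2_solve(Phi, (E.dot(gf2_solve(T, As)) + Cs) % 2)  # step (2)
        p2 = gf2_solve(T, (As + B.dot(p1)) % 2)                  # step (3)
        return p1, p2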
The code rate ranking is respected here (Fig. 7), and Rc=0.5 is once again the worst case, requiring around 15.6 Kbits of memory (1 Kbit = 1024 bits). As a result, to implement a fully pipelined encoder working with the whole range of coding schemes available in TGnSync, we need 11K gates together with around 16.2 Kbits of memory. Note that for a random-like LDPC code the required amount of memory would be significantly higher, even if the sparsity of the check matrix is retained for encoding, as will be highlighted later when discussing interleaver complexity.
Decoding Complexity
On the receiver side, there are mainly three different topics that have to be investigated: how to decrease the decoder complexity, how to increase the efficiency by means of generic architectures, and/or how to achieve high throughput. We will start by considering decoding complexity. As is obvious from the structure of the message passing process, the average complexity of the LDPC decoding process is the product of three factors:

o the node complexity,
o the average number of iterations, and
o the number of nodes involved in each iteration.

In the following, we will discuss how these three factors can be minimized using state-of-the-art algorithms (a rough sketch of the product follows below).
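As a back-of-the-envelope sketch of this product (all numbers purely illustrative placeholders, not figures from this paper):

    def decoding_ops_per_codeword(ops_per_node, nodes_per_iteration, avg_iterations):
        # average decoding complexity = node complexity
        #   x number of nodes per iteration x average number of iterations
        return ops_per_node * nodes_per_iteration * avg_iterations

    # e.g. hypothetical values: 300 gate-equivalents per node update,
    # 3456 node updates per iteration, 15 iterations on average
    ops = decoding_ops_per_codeword(300, 3456, 15)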
Node Complexity – Sub-Optimal Decoding
The standard algorithm for decoding LDPC codes is the so-called belief propagation algorithm (BPA), also known as the sum-product algorithm [1][19]. For implementation simplicity, it is convenient to execute the algorithm in the log-domain, turning the typically required multiplications into simple additions (e.g., at the bit nodes). Following this path, however, has a significant disadvantage: calculating the check node messages then requires the non-linear “box-plus” operation. This drawback can be overcome by applying the maxLog approximation known from Turbo decoding, resulting in the well-known “MinSum” algorithm [16]. Unfortunately, introducing this approximation results in a typical performance loss of 0.5-1 dB. Several proposals aim at reducing this offset by introducing correction terms in the calculation of the check node messages [20][21]. However, the most efficient method by far is a simple scaling of the check node messages [16], which recovers close-to-optimal error correction performance. We will refer to this algorithm as the corrected MinSum algorithm. Calculating variable and check node messages then only involves simple sum and minimum operations, respectively, plus a scaling of the check node messages. It is conjectured that no further substantial reduction in node complexity is possible without accepting quite significant performance losses.
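For illustration, a minimal sketch of the scaled check node update (the scaling factor 0.8 is a typical choice from the literature, not a value prescribed here; zero-valued LLRs and checks of degree below 2 are ignored for brevity):

    import numpy as np

    def check_node_update(incoming, alpha=0.8):
        # Corrected MinSum: each outgoing LLR is the product of the signs of
        # all other incoming LLRs times the minimum of their magnitudes,
        # scaled by alpha. 'incoming' holds the LLRs from all bit nodes
        # connected to one check node; one message per edge is returned.
        incoming = np.asarray(incoming, dtype=float)
        signs = np.sign(incoming)
        sign_prod = np.prod(signs)              # product over all edges
        mags = np.abs(incoming)
        order = np.argsort(mags)
        min1, min2 = mags[order[0]], mags[order[1]]
        # the edge holding the overall minimum gets the second minimum instead
        out = np.where(np.arange(len(incoming)) == order[0], min2, min1)
        # sign_prod * signs[e] equals the sign product excluding edge e
        return alpha * sign_prod * signs * out

The bit node update remains a plain sum of the channel LLR and the incoming check node messages, minus the message on the edge being served.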
In this context, Bit-Flipping Algorithms (BFA) can be considered a viable solution for low-end terminals, where some loss in performance is acceptable if large savings in decoding complexity can be achieved. Such algorithms, based on majority-logic decoding, have recently received increased interest from the research community [3][4][5]. Proposals for weighted bit-flipping [5][22] show reasonable performance at extremely low implementation complexity. A drawback of such approaches is the absence of reliability information (soft output) at the decoder output. However, since low-end terminals will most probably not use iterative equalisation and/or channel estimation techniques anyway, this appears to be no significant limitation.
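As a minimal sketch of the idea (a plain, unweighted bit-flipping variant for illustration; the weighted metrics of [5][22] refine the flipping criterion with channel reliabilities):

    import numpy as np

    def bit_flip_decode(H, y_hard, max_iter=50):
        # Gallager-style bit flipping on hard decisions: repeatedly flip the
        # bits involved in the largest number of unsatisfied checks.
        # H: dense 0/1 parity-check matrix (m x n), y_hard: 0/1 vector (n,).
        x = y_hard.copy()
        for _ in range(max_iter):
            syndrome = H.dot(x) % 2
            if not syndrome.any():
                break                           # valid codeword found
            counts = H.T.dot(syndrome)          # unsatisfied checks per bit
            x[counts == counts.max()] ^= 1      # flip the worst bits
        return x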
Convergence Speed – Scheduling

It has been shown that the classical method of message passing (called flooding), which consists in updating all the nodes on one side of the Tanner graph before going into the next half-iteration, leads to the highest memory requirements, together with a higher number of iterations (delay). Alternative schedulings of the message passing [23][24][25][13], also known as “shuffled” or “layered” belief propagation, in fact yield faster convergence, thus reducing the average and maximum number of decoder iterations while retaining performance. The methods proposed by Mansour and Fossorier ([30][13]) update all bit nodes connected to each check node (horizontal scheduling), or all check nodes connected to each bit node (vertical scheduling), respectively. A speedup factor of 2 in the average number of iterations is usually achieved [13]; a minimal layered sketch is given below.
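The sketch below illustrates horizontal (layered) scheduling, assuming the corrected MinSum check node rule from above, the LLR convention "positive means bit 0", dense numpy matrices for readability, and nonzero LLRs:

    import numpy as np

    def layered_minsum(H, llr_ch, alpha=0.8, max_iter=20):
        # Horizontal schedule: check nodes are served one after another and
        # the bit node totals are refreshed immediately, so later checks in
        # the same iteration already see the updated messages.
        m, n = H.shape
        total = llr_ch.astype(float).copy()     # a-posteriori LLR per bit
        msg = np.zeros((m, n))                  # last check-to-bit messages
        rows = [np.flatnonzero(H[i]) for i in range(m)]
        for _ in range(max_iter):
            for i, cols in enumerate(rows):
                ext = total[cols] - msg[i, cols]        # bit-to-check inputs
                s = np.prod(np.sign(ext))
                mags = np.abs(ext)
                k = np.argmin(mags)
                m1, m2 = mags[k], np.partition(mags, 1)[1]
                new = alpha * s * np.sign(ext) * np.where(
                    np.arange(len(cols)) == k, m2, m1)
                total[cols] += new - msg[i, cols]       # immediate update
                msg[i, cols] = new
            if not (H.dot((total < 0).astype(int)) % 2).any():
                break                                   # syndrome is zero
        return (total < 0).astype(int)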
Moreover, if we denote by Γ the number of non-zero elements of H and by q the number of quantization bits, we obtain the following memory requirements (in bits), worked through in the sketch after this list:

• Flooding: Γ·q + 3·n·q
• Horizontal: Γ·q + n·q
• Vertical: Γ·q + (2−Rc)·n·q
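A worked sketch of these three estimates (the parameter values in the example are illustrative only):

    def msg_memory_bits(gamma, n, q, schedule, rc=0.5):
        # Message memory for the three schedules, in bits (see bullets above).
        if schedule == "flooding":
            return gamma * q + 3 * n * q
        if schedule == "horizontal":
            return gamma * q + n * q
        if schedule == "vertical":
            return gamma * q + (2 - rc) * n * q
        raise ValueError(schedule)

    # e.g. n = 2304, an average bit node degree of 3.4, q = 6 quantization bits
    gamma = int(2304 * 3.4)
    print(msg_memory_bits(gamma, 2304, 6, "horizontal") / 1024, "Kbits")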
[…] allows trading computational complexity for decoding performance: decoding complexity can be lowered by around 20% at a target FER of 1%, for both message passing and MinSum decoding, at negligible losses in error correction performance [26].

Decoder Architecture

Many recent architecture proposals [6][8] rely more or less on the same trend: a generic semi-parallel architecture in which the degree of parallelism can be tuned depending on throughput and HW requirements. When using the Block-LDPC codes from TGnSync, these architectures converge towards the scheme of Fig. 9.

Figure 9: Generic semi-parallel decoder architecture (variable node units VNU1…VNUn and check node units CNU1…CNUm exchanging messages through a shuffle network and its inverse, under a control unit)
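For Block-LDPC codes the shuffle network degenerates into cyclic shifts, which is what makes such architectures cheap. A minimal sketch of this routing (names hypothetical):

    import numpy as np

    def shuffle(block_msgs, shift):
        # Route the z messages of one block column to a CNU bank: for
        # Block-LDPC codes the permutation is just a cyclic rotation by the
        # corresponding base matrix entry, instead of a full crossbar.
        return np.roll(block_msgs, -shift)

    def unshuffle(block_msgs, shift):
        # The inverse shuffle is simply the opposite rotation.
        return np.roll(block_msgs, shift)

The degree of parallelism can then be tuned by instantiating between 1 and z node units per block row or column.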
Decoder HW requirements

By applying the memory estimation technique used above to all the coding schemes available in the TGnSync proposal, we obtain the requirements shown in Fig. 10.
Figure 10: Memory requirement for the TGnSync decoder w.r.t. block length (Zf) and code rate (Rc)

It is worth noticing that the rate-3/4 code requires the largest amount of memory, around 53.7 Kbits.
Considering now the computational complexity itself, the authors of [7] evaluate the bit node and check node complexity as equivalent to 250 and 320 NAND gates, respectively (this can nevertheless vary depending on the position of the shuffle network, which impacts the complexity of the node units). In our case we apply the shuffle positioning proposed by Zhong in his thesis [6], leading to the complexity evaluation of Fig. 11.

Figure 11: Number of logical gates for the TGnSync decoder w.r.t. block length (Zf) and code rate (Rc)

Interleaver Complexity

Storing the graph connectivity (the “interleaver”) of a random-like LDPC code requires log2(N·dv)·N·dv bits for storing addresses. For instance, in our case with Zf=96 (2304 bits), code rate Rc=1/2 and an average bit node degree of dv=3.4, we would need around 6.13·10^7 bits, i.e., around 10 Mbytes of memory! Fortunately, using such Block-LDPC codes allows the interleaver relation to be determined directly from the base model matrix and the circular shift values, as sketched below. For the current case, we need to store 12·24 cyclic shift values ranging from −1 to 46, leading to a memory requirement of 1728 bits, which represents 0.28% of the memory consumed in the random case.
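A minimal sketch of this on-the-fly address generation (names hypothetical; base holds the 12×24 shift values, with −1 marking all-zeros blocks):

    def block_edges(base, p):
        # Yield the (check, bit) index pairs of H directly from the base
        # model matrix of cyclic shifts, so that no explicit address table
        # has to be stored: within block (i, j) with shift s, row k of the
        # shifted identity has its single 1 at column (k + s) mod p.
        for i, row in enumerate(base):
            for j, s in enumerate(row):
                if s < 0:
                    continue
                for k in range(p):
                    yield i * p + k, j * p + (k + s) % p

Only the 12·24 = 288 shift values need to be stored; since 48 possible values (−1 to 46) fit into 6 bits, this is exactly the 288·6 = 1728 bits quoted above.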
Decoding Performance

The following two items can be regarded as central when considering the application of LDPC codes to wireless communications systems, i.e., their implementation in a real-time system:

o How much is lost in error correction performance by replacing true belief propagation by a reduced complexity variant?
o How much is performance deteriorated if we prefer structured to random-like LDPC codes?

To answer these two questions, we evaluate the performance of the above mentioned Block-LDPC codes, as well as several random-like codes of comparable block length from [31], for different decoding algorithms and block lengths. All of the random-like LDPC codes were constructed using the PEG algorithm [18], and many of them are in fact considered to be the best available codes for the considered block length [31].

Figure 13: FER vs. Eb/N0 comparison between different decoding algorithms (BPA, MinSum and corrected MinSum, at 20 to 100 iterations), for a random-like LDPC code of length N=2048

Using the corrected MinSum algorithm is clearly the most promising option, especially for larger block lengths (cf. the results in Figure 13). The SNR loss w.r.t. true BPA decoding is usually below 0.2 dB, which is quite acceptable when considering a practical implementation of LDPC codes.

[Figure: FER vs. Eb/N0 comparison of random-like codes (N=504, 1024, 2048) and structured codes (N=576, 1152, 1728)]
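FER curves of this kind are typically obtained with a Monte-Carlo harness along the following lines (a sketch assuming BPSK over AWGN, the all-zeros codeword, and any decode(H, llr) function such as the layered MinSum sketch above):

    import numpy as np

    def fer(H, decode, ebn0_db, rate, n, frames=1000):
        # A frame is in error whenever the decoder output differs from the
        # transmitted (all-zeros) codeword.
        ebn0 = 10 ** (ebn0_db / 10)
        sigma = np.sqrt(1 / (2 * rate * ebn0))
        errors = 0
        for _ in range(frames):
            y = 1 + sigma * np.random.randn(n)   # all-zeros maps to all +1
            llr = 2 * y / sigma**2               # AWGN channel LLRs
            errors += (decode(H, llr) != 0).any()
        return errors / frames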
[…] a loss between 5 and 6 dB, which is quite significant. Nevertheless, this degradation should be balanced against the fact that BFA require neither message-passing storage nor complex processing units. They thus might be suitable for very low-end terminals.
Conclusions July 2002.
[…] formalize the joint framework of the LDPC encoding/decoding architecture. In this paper, we reviewed the keystone elements of such a formal process. We are convinced that the combination of implementation-oriented code design (Block-LDPC) and sub-optimal decoding algorithms (corrected MinSum decoding) will make LDPC codes a viable option for next generation wireless systems.

Acknowledgements

References

[4] P. Zarrinkhat and A. H. Banihashemi, “Threshold Values and Convergence Properties of Majority-Based Algorithms for Decoding Regular Low-Density Parity-Check Codes,” IEEE Trans. on Comm., vol. 52, no. 12, Dec. 2004.
[5] J. Zhang and M. P. C. Fossorier, “A Modified Weighted Bit-Flipping Decoding of Low-Density Parity-Check Codes,” IEEE Comm. Letters, vol. 8, no. 3, March 2004.
[6] T. Zhang, “Efficient VLSI Architectures for Error-Correcting Coding,” Ph.D. thesis, University of Minnesota, July 2002.
[12] B. Vasic and O. Milenkovic, “Combinatorial Constructions of Low-Density Parity-Check Codes for Iterative Decoding,” IEEE Trans. on Info. Theory, vol. 50, no. 6, June 2004.
[13] J. Zhang and M. Fossorier, “Shuffled Iterative Decoding,” IEEE Trans. Comm., vol. 53, no. 2, pp. 209-213, Feb. 2005.
[14] IEEE 802.16e, “LDPC Coding for OFDMA PHY,” IEEE Doc. C802-16e-05/066r3, January 2005.
[15] IEEE 802.11n, “Structured LDPC Codes as an Advanced Coding Scheme for 802.11n,” IEEE Doc. 802.11-04/884r0, August 2004.
[16] J. Chen and M. Fossorier, “Near Optimum Universal Belief Propagation Based Decoding of Low-Density Parity-Check Codes,” IEEE Trans. on Comm., vol. 50, no. 3, pp. 406-414, Mar. 2002.
[17] T. J. Richardson, M. A. Shokrollahi and R. Urbanke, “Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 617-637, Feb. 2001.
[18] X.-Y. Hu, E. Eleftheriou and D. M. Arnold, “Regular and Irregular Progressive Edge-Growth Tanner Graphs,” submitted to IEEE Trans. on Inform. Theory, 2003.
[19] D. J. C. MacKay and R. M. Neal, “Near Shannon Limit Performance of Low-Density Parity-Check Codes,” Electron. Lett., vol. 32, pp. 1645-1646, August 1996.
[20] T. Clevorn and P. Vary, “Low-Complexity Belief Propagation Decoding by Approximations with Lookup-Tables,” in Proc. 5th International ITG Conference on Source and Channel Coding, pp. 211-215, Erlangen, Germany, January 2004.
[21] G. Richter, G. Schmidt, M. Bossert and E. Costa, “Optimization of a Reduced-Complexity Decoding Algorithm for LDPC Codes by Density Evolution,” in Proc. IEEE International Conference on Communications (ICC 2005), Seoul, Korea, March 2005.
[22] Y. Kou, S. Lin and M. Fossorier, “Low-Density Parity-Check Codes Based on Finite Geometries: A Rediscovery and More,” IEEE Trans. Inform. Theory, vol. 47, pp. 2711-2736, Nov. 2001.
[23] F. R. Kschischang and B. J. Frey, “Iterative Decoding of Compound Codes by Probabilistic Propagation in Graphical Models,” IEEE Journal on Select. Areas Commun., pp. 219-230, 1998.
[24] Y. Mao and A. H. Banihashemi, “Decoding Low-Density Parity-Check Codes with Probabilistic Scheduling,” IEEE Comm. Letters, vol. 5, pp. 415-416, Oct. 2001.
[25] E. Yeo, B. Nikolić and V. Anantharam, “High Throughput Low-Density Parity-Check Decoder Architectures,” in Proc. IEEE Globecom 2001.