
Vol 63, No. 11; Nov 2013
292 Jokull Journal
Performance Analysis of Low Complexity MAP Decoder for
Low Power Applications
Mr. P. Maniraj Kumar¹, Dr. S. Sutha²


¹ Faculty of ECE, PSNA College of Engineering & Technology, Dindigul, India.
Mail-id: mani1376@yahoo.com

² Faculty of EEE, University College of Engineering, Panruti, India
Mail-id: suthapadmanabhan@gmail.com
Abstract
This paper proposes a Maximum A-Posteriori (MAP) decoding architecture with
greater error-correcting capability and lower computational hardware complexity.
The decoding procedures of the conventional Max-Log-MAP algorithm are performed in
parallel and formulated into a set of simple operations, which considerably increases
the speed of decoding and reduces the complexity of the algorithm. In this research, we
propose a scalable MAP processor design that supports both single-binary (SB) and
double-binary (DB) decoding for high-throughput decoding. A pipelined metric
architecture is proposed along with the sliding window (SW) approach to increase the
speed of computation. The memory required for storing the branch and state metrics is
reduced by 45% to 55%, and synthesis results show that the overall memory area is
reduced by 29% to 32% compared with state-of-the-art MAP decoders. The decoder
throughput is maintained without degrading BER performance. Compared with previous
low-complexity techniques, this work reduces gate counts by more than 34.4% and
requires only a one-line-buffer memory. The final implementation costs 122 K gate
counts on an XC3S500E FPGA and consumes 74 mW of power.

Keywords: Maximum a Posteriori probability (MAP) decoder, Single Binary (SB),
Double Binary (DB), Throughput and Interleaver.

1. Introduction
In existing communication systems, the reliability of the communication link
depends on block channel codes. To obtain high performance and good quality of
service, source data is encoded into a codeword that adds redundancy, in the form of
parity bits, to the transmitted information bits. These parity bits are then used by the
decoder at the receiver to perform forward error correction (FEC). Several communication
standards now specify turbo codes, LDPC codes, or both for FEC. These codes support
many services and applications, including access networks such as wireless local area
networks (WLANs) and wireless metropolitan area networks (WMANs), advanced cellular
networks from UMTS-2000 and 3GPP onward, and satellite broadcasting for fixed and
hand-held terminals.
In "A Mathematical Theory of Communication", Shannon showed that codes exist that
can achieve a data rate close to channel capacity with negligible probability of error [1].
Turbo codes, proposed by Berrou et al. in 1993, show error-correcting performance close
to the Shannon limit [2]. Because of their hardware complexity and decoding delay, turbo
codes were first used in applications where real-time processing is not required, such as
satellite communications. Since then, a great deal of research effort has gone into
improving turbo code performance, and as a result turbo codes have been adopted in the
IMT-2000 system for high-data-rate transmission. Convolutional turbo codes (CTC) have
been implemented in advanced wireless communication standards such as Worldwide
Interoperability for Microwave Access (WiMAX). Turbo codes have also been employed
in other wireless standards, such as wideband code division multiple access (WCDMA)
and W-MAN (IEEE 802.16), owing to their outstanding bit error rate (BER) performance
at very low signal-to-noise ratio (SNR).
A turbo encoder generates parity bits along with the input data using two
constituent encoders: the first encodes the input data directly, and the second encodes a
scrambled version of the input produced by an interleaver. A turbo decoder consists of
two constituent decoders that perform iterative decoding, each using a-priori information
from the other to improve error correction capability. The most widely used
soft-input soft-output (SISO) decoding algorithm is the Maximum A-Posteriori
probability (MAP) algorithm. A decoder implementing the MAP algorithm is more
complex than other decoders, but it offers better BER performance.
Owing to recent advances in fabrication and circuit design technology, BER
performance has become a more important parameter than hardware complexity. As a
result, the MAP algorithm is preferred in most decoders. The log-MAP decoding
algorithm has been proposed to reduce the computational complexity of the MAP
algorithm, and the block-wise MAP decoding algorithm has been proposed to reduce
memory usage, since it requires less memory than the original MAP decoding algorithm.
In this work, we propose a hybrid decoder based on a low-complexity pipelined
metric MAP decoder architecture for low power consumption. The rest of this article is
organized as follows. Section 2 reviews the state of the art in MAP decoders and gives an
overview of turbo codes and the MAP algorithm. Section 3 describes the proposed
hardware architecture of the pipelined metric MAP decoder. Section 4 presents the
quantitative results and discusses them in detail. Finally, Section 5 concludes the paper.
2. Literature Review
Chen-Hung Lin et al. [3] proposed a scalable MAP processor design that
supports both single-binary (SB) and double-binary (DB) CTC decoding. They proposed
three combinations of parallel-window (PW) and hybrid-window (HW) MAP decoding,
and designed the computational modules and storage of the dual-mode (SB/DB) MAP
decoder to achieve high area utilization. A 1.28 mm² dual-mode 2PW-1HW MAP
processor was implemented in a 0.13 µm CMOS process. The prototype chip achieved a
maximum throughput rate of 500 Mbps at 125 MHz, with an area efficiency of
3.13 bits/mm² and an energy efficiency of 0.19 nJ/bit.
In [4], a novel area-efficient folded modified convolutional interleaving (FMCI)
architecture was presented for the MAP decoder. The end-to-end delay of FMCI is only
2M, and the number of latches required can be reduced to (M-2). Hence this architecture
has a lower end-to-end delay and uses fewer latches than block interleaving and
convolutional interleaving. Lifetime analysis and forward-backward allocation were also
implemented. The results show that the FMCI architecture can reduce the number of
memory elements by about 88% under the condition M = NJ.
In [5], Perttu Salmela et al. implemented a turbo decoder as a highly parallel
application-specific instruction-set processor (ASIP). Since high parallelism requires high
memory throughput, expensive dual-port memory was avoided by a novel interface to the
extrinsic information memory. The accelerating units connect directly to the memory
interfaces of the processor to enable fast memory access. Owing to its high parallelism
and throughput, the ASIP could serve as a programmable replacement for a pure
hardware decoder, and the design is fully programmable and customizable. It achieved a
throughput of 22.7 Mbps with six turbo iterations at a 277 MHz clock frequency in
130 nm technology.
S. Raghukrishna and K. N. Hari Bhatt [6] implemented a turbo decoder using the
Max-Log-MAP algorithm to avoid complex numerical representation problems. The
Max-Log-MAP algorithm is the least complex of four algorithms: MAP, Log-MAP, the
soft output Viterbi algorithm (SOVA) and improved SOVA. It has twice the complexity of
the Viterbi algorithm per half-iteration but offers the worst BER performance of the four.
The Max-Log-MAP algorithm is tolerant to high noise variance when operating on an
additive white Gaussian noise (AWGN) channel.
A fixed-point model of the MAP decoding algorithm for turbo and low-density
parity-check (LDPC) codes was proposed in [7]. The fixed-point model aims to support
both classes of codes while reducing power consumption and area requirements. To do
this, the authors took the turbo and LDPC codes of two recent communication standards,
WiMAX and 3GPP-LTE, as a reference benchmark for mobile applications and analyzed
their performance over an AWGN channel for different values of the fixed-point
parameters.
S. M. Karim et al. [8] presented a new pipelined turbo decoder architecture that
runs at nearly four times the speed of a recently reported architecture with a reasonable
increase in hardware. Their architecture is based on a block-interleaved pipelining
technique, which enables pipelining of the add-compare-select-offset (ACSO) kernels. In
addition, the next iteration initialization (NII) method was adopted to initialize the sliding
window border values. The decoder chip consumes 219.8 mW at a maximum operating
frequency of 192.3 MHz when implemented in 0.18 µm CMOS technology. Synthesis
results show that the turbo decoder achieves a decoding throughput of 38.46 Mbps with
an energy efficiency of 1.14 nJ/bit/iteration at the maximum operating frequency.
A different approach to parallel memory access in turbo decoders was used in
[9]-[13], where buffers are applied instead of deriving conflict-free address generation
and bank selection functions. In [9]-[11], high-speed decoding with several write
accesses is assumed: there is one memory bank for each writer and a dedicated buffer for
each bank.
2.1. Overview of Turbo Code and MAP Algorithm
This section briefly describes the structure of the turbo code and the MAP
algorithm. The basis of turbo coding is to introduce redundancy into the data to be
transmitted through a channel; the redundant data helps to recover the original data from
the received data. In data transmission, turbo coding helps to achieve near-Shannon-limit
performance. Turbo codes operate at the block level, that is, the data is separated into
blocks, and each block is decoded iteratively by Maximum A-Posteriori (MAP) or soft
output Viterbi algorithm (SOVA) component decoders. The turbo encoder transmits the
encoded bits, which form the inputs to the turbo decoder. The turbo decoder decodes the
information iteratively using the MAP algorithm.
Figure 1 shows the overall structure of a turbo encoder with code rate 1/2. It
consists of two encoders: one encodes the input bit sequence, while the other encodes the
bit sequence obtained by interleaving the input bit sequence. The input bit sequence D is
output directly together with the encoded bit sequence Y_k.









Figure 1. Parallel Concatenated Turbo Codes (PCTC)

At time k, encoder 1 produces the encoded output sequence Y1 from the input
sequence D, and encoder 2 produces the output sequence Y2 by encoding D', which is
obtained by interleaving the input bit sequence D. To obtain a code rate of 1/2, Y is
generated by puncturing between Y1 and Y2. The constituent encoders employ a
recursive systematic convolutional (RSC) code, whose outputs consist of the original
input data and the encoded bit sequence Y. It is well known that the BER performance of
systematic convolutional codes is better than that of non-systematic convolutional codes
in low signal-to-noise ratio (SNR) channel environments. The turbo code also shows
remarkable BER performance because the recursive systematic convolutional code has
an infinite impulse response.
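The encoder structure just described can be sketched in Python as follows. This is a minimal illustration under assumptions that the text does not fix: each constituent encoder is taken to be the common 4-state RSC code with feedback polynomial 1 + D + D^2 (octal 7) and feedforward polynomial 1 + D^2 (octal 5), the interleaver permutation is supplied by the caller, the trellis is not terminated, and puncturing simply alternates between Y1 and Y2. The function names `rsc_encode` and `turbo_encode` are hypothetical.

```python
from typing import List, Sequence, Tuple


def rsc_encode(bits: Sequence[int]) -> List[int]:
    """Parity output of an assumed 4-state RSC encoder (feedback 7, feedforward 5, octal).

    The trellis is left unterminated to keep the sketch short.
    """
    s1 = s2 = 0  # feedback register contents a[k-1], a[k-2]
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2          # recursive (feedback) bit: 1 + D + D^2
        parity.append(a ^ s2)    # feedforward taps: 1 + D^2
        s1, s2 = a, s1           # shift the register
    return parity


def turbo_encode(d: Sequence[int], perm: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (systematic D, punctured parity Y) for a rate-1/2 turbo code."""
    y1 = rsc_encode(d)                      # encoder 1: natural order
    d_interleaved = [d[i] for i in perm]    # D' = interleaved D
    y2 = rsc_encode(d_interleaved)          # encoder 2: interleaved order
    # Alternate puncturing (assumed): keep Y1 at even k and Y2 at odd k.
    y = [y1[k] if k % 2 == 0 else y2[k] for k in range(len(d))]
    return list(d), y
```

Feeding a single 1 followed by zeros into `rsc_encode` keeps producing parity ones (for example, `[1, 0, 0, 0, 0, 0]` gives `[1, 1, 1, 0, 1, 1]`), which illustrates the infinite impulse response mentioned above.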
The turbo decoder shown in Figure 2 consists of two constituent decoders, known
as soft-input soft-output (SISO) decoders, which communicate iteratively through an
interleaver/de-interleaver. These two decoders reverse the operations of the two encoders
in the turbo encoder. The interleaver and de-interleaver reorder the information bit
sequence between the two decoders, and the hard-decision block generates the decoded
bit sequence.













Figure 2. Turbo Decoder [blocks: demultiplexer (input d, y1, y2), decoder for code 1, interleaver, decoder for code 2, de-interleaver, hard decision (output)]
Turbo decoding is iterative. The decoding is also soft: the values that flow
around the whole decoder are real values rather than binary representations (with the
exception of the hard decisions taken after the final iteration). They are usually log
likelihood ratios (LLRs): the log of the probability that a particular bit was a logic 1
divided by the probability that the same bit was a logic 0. Decoding is accomplished by
first demultiplexing the incoming data stream into d, y1 and y2. d and y1 go into the
decoder for the first code (Figure 2), which gives an estimate of the extrinsic information
from the first decoder; this is interleaved and passed to the second decoder. The second
decoder thus has three inputs: the extrinsic information from the first decoder, the
interleaved data d, and the received values for y2. It produces its own extrinsic
information, which is de-interleaved and passed back to the first decoder. This process is
repeated for as many iterations as required, and the final solution is obtained from the
output of the second decoder after de-interleaving and hard decision.
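As a concrete illustration of the LLR convention, the sketch below assumes BPSK mapping (bit 1 sent as +1, bit 0 as -1), an AWGN channel with noise variance `sigma2`, and equal priors. Under these assumptions ln(P(1|y)/P(0|y)) simplifies to 2y/sigma2, and the hard decision taken after the final iteration is simply the sign of the LLR: the sign carries the bit and the magnitude its reliability. The helper names are illustrative only.

```python
import math
from typing import List, Sequence


def channel_llr(y: float, sigma2: float) -> float:
    """LLR ln(P(bit=1 | y) / P(bit=0 | y)) for BPSK (1 -> +1, 0 -> -1) over AWGN, equal priors."""
    return 2.0 * y / sigma2


def channel_llr_direct(y: float, sigma2: float) -> float:
    """Same LLR computed directly from the two Gaussian likelihoods, as a cross-check."""
    p1 = math.exp(-((y - 1.0) ** 2) / (2.0 * sigma2))
    p0 = math.exp(-((y + 1.0) ** 2) / (2.0 * sigma2))
    return math.log(p1 / p0)


def hard_decision(llrs: Sequence[float]) -> List[int]:
    """Positive LLR -> bit 1, otherwise bit 0."""
    return [1 if value > 0 else 0 for value in llrs]
```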

3. Proposed MAP Encoder
The functional description of PCTC encoding is shown in Figure 1. The
encoding process passes the original information bit, that is, the systematic bit,
unchanged. Two parity bits are created by two component convolutional encoders: the
first takes the input message bits in sequential order, while the input sequence of the
second is interleaved, as indicated by the interleaver in Figure 1. The generic architecture
of an (n,k) convolutional encoder is given in Figure 3; in this research, a (2,1)
convolutional structure is used for the turbo encoder. The generator representation shows
the hardware connections of the shift register taps to the modulo-2 adders. A generator
vector represents the positions of the taps for an output, where 1 represents a connection
and 0 represents no connection. For example, the two generator vectors for the encoder
are g1 = [111] and g2 = [101], where the subscripts 1 and 2 denote the corresponding
output terminals. The code rate r of a convolutional code is defined as

$$ r = \frac{k}{n} \tag{1} $$
where k is the number of parallel input information bits and n is the number of parallel
output encoded bits in one time interval. The constraint length K of a convolutional code
is defined as
$$ K = m + 1 \tag{2} $$
where m is the maximum number of stages (memory size) in the shift register.
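The (2,1) encoder with g1 = [111] and g2 = [101] can be sketched in a few lines of Python. This is a feedforward (non-recursive) illustration that assumes the shift register starts in the all-zero state and that the first tap of each generator multiplies the current input bit; with m = 2 memory stages it gives K = m + 1 = 3 and r = k/n = 1/2. The names `conv_encode` and `code_rate` are hypothetical.

```python
from typing import List, Sequence, Tuple

G1 = (1, 1, 1)  # g1 = [111]: taps on u[t], u[t-1], u[t-2]
G2 = (1, 0, 1)  # g2 = [101]: taps on u[t] and u[t-2]


def conv_encode(bits: Sequence[int],
                gens: Tuple[Tuple[int, ...], ...] = (G1, G2)) -> List[int]:
    """Rate-1/n feedforward convolutional encoder; outputs are interleaved per time step."""
    K = len(gens[0])          # constraint length K = m + 1
    reg = [0] * K             # reg[0] = current input, reg[1:] = the m memory stages
    out = []
    for u in bits:
        reg = [u] + reg[:-1]  # shift the new bit into the register
        for g in gens:
            # modulo-2 adder over the tapped register positions
            out.append(sum(tap * r for tap, r in zip(g, reg)) % 2)
    return out


def code_rate(k: int, n: int) -> float:
    """Eq. (1): r = k / n."""
    return k / n
```

For the input 1011 this produces the output pairs 11 10 00 01, the familiar result for this generator pair.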











Figure 3. Convolutional Encoder Structure

3.1. Interleaver

A technique called bit (or binary digit) interleaving keeps track of the number and
sequence of the bits from each transmission so that they can be quickly and efficiently
reassembled into their original form upon receipt. Figure 4 shows the interleaving
scheme. Interleaving is mainly used in digital data transmission to protect the
transmission against burst errors. Linear interleavers limit the error rate performance of
turbo codes to the linear interleaver asymptote; these interleavers are generally suited to
turbo codes with very short block lengths.
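Figure 4 is not reproduced here, so purely as an illustration the sketch below uses a simple row-column block interleaver (write row-wise, read column-wise), one common kind of linear interleaver, and builds its inverse for the de-interleaver. The dimensions and the names `block_interleaver`, `invert` and `permute` are assumptions, not the scheme of Figure 4.

```python
from typing import List, Sequence


def block_interleaver(rows: int, cols: int) -> List[int]:
    """Permutation of a rows x cols block interleaver: write row-wise, read column-wise.

    Interleaved output position j takes input position perm[j].
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def invert(perm: Sequence[int]) -> List[int]:
    """De-interleaver permutation: inv[perm[j]] = j."""
    inv = [0] * len(perm)
    for j, p in enumerate(perm):
        inv[p] = j
    return inv


def permute(perm: Sequence[int], data: Sequence[int]) -> List[int]:
    """Reorder data so that output[j] = data[perm[j]]."""
    return [data[p] for p in perm]
```

With `rows=2, cols=3` the permutation is `[0, 3, 1, 4, 2, 5]`, so a burst hitting the first two transmitted symbols corrupts original positions 0 and 3, which the decoder sees as separated errors after de-interleaving.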


Figure 4. Interleaver Scheme

3.2. Proposed Max Log MAP decoding algorithm
The Max-Log-MAP algorithm operates in the logarithmic domain to avoid the
highly complicated exponential operations involved in the MAP algorithm. The MAP
algorithm is an optimal but computationally complex SISO algorithm, and the Log-MAP
and Max-Log-MAP algorithms are simplified versions of it. Using α (forward state
metric), β (backward state metric) and γ (branch metric), the log-likelihood ratio (LLR),
which provides the soft decision, is calculated [3]. The soft output makes it possible to
decide whether the received information bit is zero or one.
The Max-Log-MAP algorithm is used to design a reduced-complexity decoder. It
involves calculating the forward and backward state metrics to obtain the log-likelihood
ratio (LLR) values, which carry both the decoded bit information (the sign) and its
reliability (the magnitude).
The input sequence, consisting of information bits X_m and parity bits Y_m, may be
corrupted by additive white Gaussian noise at time m. The MAP decoded output, the
log-likelihood ratio of information bit d_m, is derived using Equations (3) to (6).
The log-likelihood ratio values are calculated by the following equation:
$$ L(d_m) = \ln \frac{\sum_{(S_{m-1},S_m):\,d_m=1} \alpha(S_{m-1})\,\gamma(S_{m-1},S_m)\,\beta(S_m)}{\sum_{(S_{m-1},S_m):\,d_m=0} \alpha(S_{m-1})\,\gamma(S_{m-1},S_m)\,\beta(S_m)} \tag{3} $$
The forward state metric is calculated by
$$ \ln\alpha(S_m) = \ln \sum_{S_{m-1}} \exp\big(\ln\alpha(S_{m-1}) + \ln\gamma(S_{m-1},S_m)\big) \tag{4} $$

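The backward state metric is calculated by
$$ \ln\beta(S_m) = \ln \sum_{S_{m+1}} \exp\big(\ln\beta(S_{m+1}) + \ln\gamma(S_m,S_{m+1})\big) \tag{5} $$
where the branch metric γ is calculated from the a priori information (L_e), the channel
reliability value (L_c), the received input symbols (x and y_1), the systematic bit (u_m^s)
and the parity bit (u_m^p). The branch metric is calculated by
$$ \ln\gamma(S_{m-1},S_m) = L_e\,u_m^s + L_c\,x\,u_m^s + L_c\,y_1\,u_m^p \tag{6} $$
The above equations contain logarithms of sums of exponentials. They are converted into
max-log form by the well-known approximation called the Jacobian logarithm [10], [11],
which is given below:
$$ \ln\left(e^{x} + e^{y}\right) = \max(x, y) + \ln\left(1 + e^{-|x-y|}\right) \tag{7} $$
The Max-Log-MAP algorithm keeps only the max term and drops the correction term, so
each log-sum in (3) to (5) reduces to a maximum over the candidate paths.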
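Equations (3) to (7) can be exercised end to end with a small Max-Log-MAP SISO sketch. It assumes a 4-state RSC constituent code (feedback 7, feedforward 5 in octal), an unterminated trellis that starts in state 0, and bits u_s, u_p taken in {0, 1} in (6), so that L_e and L_c x are ordinary LLRs. Setting `exact=True` applies the full Jacobian logarithm of (7) (Log-MAP); the default keeps only the max term (Max-Log-MAP). This is an illustrative software reference model, not the authors' hardware, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def rsc_trellis() -> List[Tuple[int, int, int, int]]:
    """(prev_state, next_state, u_s, u_p) of the assumed 4-state RSC (feedback 7, feedforward 5).

    State index = 2 * a[k-1] + a[k-2], where a is the feedback register content.
    """
    trans = []
    for s in range(4):
        a1, a2 = s >> 1, s & 1
        for u in (0, 1):
            a = u ^ a1 ^ a2                              # feedback 1 + D + D^2
            trans.append((s, (a << 1) | a1, u, a ^ a2))  # parity taps 1 + D^2
    return trans


def max_star(x: float, y: float, exact: bool) -> float:
    """Eq. (7): Jacobian logarithm if exact, otherwise the Max-Log approximation."""
    if not exact or x == NEG_INF or y == NEG_INF:
        return max(x, y)
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))


def siso_decode(lc_x: Sequence[float], lc_y: Sequence[float], le: Sequence[float],
                exact: bool = False) -> List[float]:
    """A posteriori LLRs L(d_m) from eqs. (3) to (6), evaluated in the log domain.

    lc_x, lc_y: channel values Lc*x and Lc*y1 per step; le: a priori LLRs Le.
    """
    trans = rsc_trellis()
    n = len(lc_x)

    def gamma(m: int, us: int, up: int) -> float:
        # Eq. (6) with u_s, u_p in {0, 1}
        return us * (le[m] + lc_x[m]) + up * lc_y[m]

    # alpha[m][s]: forward metric of state s before step m (the paper's S_{m-1})
    alpha = [[NEG_INF] * 4 for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for m in range(n):                      # Eq. (4)
        for s, t, us, up in trans:
            alpha[m + 1][t] = max_star(alpha[m + 1][t], alpha[m][s] + gamma(m, us, up), exact)

    # beta[m][s]: backward metric; unterminated trellis, so all end states start at 0
    beta = [[NEG_INF] * 4 for _ in range(n)] + [[0.0] * 4]
    for m in range(n - 1, -1, -1):          # Eq. (5)
        for s, t, us, up in trans:
            beta[m][s] = max_star(beta[m][s], beta[m + 1][t] + gamma(m, us, up), exact)

    llr = []
    for m in range(n):                      # Eq. (3): log-sum over d=1 minus over d=0
        num = den = NEG_INF
        for s, t, us, up in trans:
            metric = alpha[m][s] + gamma(m, us, up) + beta[m + 1][t]
            if us:
                num = max_star(num, metric, exact)
            else:
                den = max_star(den, metric, exact)
        llr.append(num - den)
    return llr
```

Running the noiseless case (strong channel LLRs of plus or minus 4 and zero a priori information) recovers the transmitted bits from the sign of each output LLR in both modes.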

In most MAP decoders, the log-MAP decoding algorithm is used to reduce the
complexity of MAP decoding by computing the metrics in the log domain. It requires
memory proportional to the frame size for storing the branch metrics and forward state
metrics. The block-wise MAP decoding algorithm was proposed to reduce this memory
requirement, so a block-wise log-MAP decoder can be implemented in a smaller area.
The block-wise MAP decoding scheme is illustrated in Figure 5: the received data frame
is divided into M sub-blocks, each of size L, and the blocks are fed to the MAP decoder
one by one.
In the block-wise MAP decoding algorithm, forward and backward state metrics
are computed for each block. Because the blocks are processed forward from Block 0 to
Block (M-1), the initial forward state metrics of each block can be obtained from the last
forward state metrics of the previous block. Unlike the initial forward state metrics, the
initial backward state metrics cannot be obtained from the previous block, because
backward state metrics are computed in the direction opposite to the block processing
order.
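The claim that each block's initial forward metrics can be taken from the previous block's last forward metrics can be checked with a short sketch: a Max-Log forward recursion run block by block (sub-blocks of length L, here the parameter `block_len`, with α carried across block boundaries) yields exactly the same α values as a single pass over the whole frame. The trellis and branch metrics used in the check are generic placeholders, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")
Transition = Tuple[int, int, int]  # (prev_state, next_state, branch index into a gamma row)


def forward(alpha0: Sequence[float], gammas: Sequence[Sequence[float]],
            trans: Sequence[Transition], n_states: int) -> List[List[float]]:
    """Max-Log forward recursion over a span; returns [alpha0, alpha1, ...]."""
    alphas = [list(alpha0)]
    for g in gammas:  # g[b] = branch metric of branch b at this stage
        nxt = [NEG_INF] * n_states
        for s, t, b in trans:
            nxt[t] = max(nxt[t], alphas[-1][s] + g[b])
        alphas.append(nxt)
    return alphas


def blockwise_forward(alpha0: Sequence[float], gammas: Sequence[Sequence[float]],
                      trans: Sequence[Transition], n_states: int,
                      block_len: int) -> List[List[float]]:
    """Same recursion, one sub-block at a time.

    Each block is seeded with the last forward metrics of the previous block.
    """
    out = [list(alpha0)]
    for start in range(0, len(gammas), block_len):
        block = forward(out[-1], gammas[start:start + block_len], trans, n_states)
        out.extend(block[1:])  # block[0] is the seed, already stored
    return out
```

No such shortcut exists for β, since the backward recursion of a block would need metrics from a block that has not yet been processed; that is the gap the sliding-window dummy recursion below addresses.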








Figure 5. Block-wise MAP decoding scheme

3.3. Sliding Window Approach
The conventional MAP decoding process has very high latency because the
forward and backward calculations are performed over all trellis states for the whole
frame. Computing the LLR values requires the state metric values generated by the
forward and backward processes, so a large memory is needed to store them, and its size
depends on the input data block size. For an N-size frame, the decoder would need to
store N survivors and N survivor path metrics during the forward and backward
recursions, which becomes costly for large data frames. To address this problem, the SW
method operates on sub-blocks of the input data and eliminates the large storage
requirement by initializing the α and β metrics at an intermediate stage.
The following steps are carried out in the SW method.
1) Starting from an arbitrary stage, after a depth of L stages the decisions are likely to be
the same as if the recursion had started from the initial stage; L is typically 5 to 10
times the constraint length.
2) A parameter θ is defined as the ratio of actual metric calculations per dummy metric
calculation. Let θ = p/q, where p and q are integers.
3) By definition, the number of stages of actual calculations for L dummy stages of
calculations is θL.
4) The data frame is divided into blocks of size L/q, so each frame consists of
K = qN/L data blocks.
5) Let M be the number of blocks that are processed concurrently for high-throughput
applications.
6) Concurrent processing is achieved by pipelining the computation units by M levels and
processing the M blocks in an interleaved fashion.
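The block bookkeeping in steps 2) to 4) can be checked with a few lines of arithmetic. The helper below is hypothetical and assumes that q divides L and that L/q divides N; it returns the block size L/q, the block count K = qN/L and the number of actual stages computed per L dummy stages, (p/q)L, kept as an exact fraction.

```python
from fractions import Fraction
from typing import Tuple


def sw_partition(N: int, L: int, p: int, q: int) -> Tuple[int, int, Fraction]:
    """Sliding-window bookkeeping for steps 2) to 4) (hypothetical helper).

    N: frame length, L: dummy (warm-up) depth, p/q: ratio of actual to dummy metrics.
    Returns (block size L/q, number of blocks K = qN/L, actual stages per L dummy stages).
    """
    if L % q or N % (L // q):
        raise ValueError("sketch assumes q divides L and L/q divides N")
    block_size = L // q
    K = q * N // L
    actual_stages = Fraction(p, q) * L
    return block_size, K, actual_stages
```

For example, a hypothetical frame of N = 1000 with L = 40 and p/q = 3/4 gives blocks of 10 stages, K = 100 blocks, and 30 actual stages for every 40 dummy stages.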







Figure 6. Sliding Window Approach of metric calculations (data blocks D1, D2, D3, ..., DK processed over time slots T1, T2, T3, ..., Tk)

Figure 6 also shows how the metric calculations are started from an intermediate
stage and from the initial stage for M = 1. The throughput of the MAP-based SISO
decoder is defined as the number of bits processed, N, divided by the latency LA_siso,
i.e., throughput = N/LA_siso. LA_siso clearly depends on the clock period t_clock, which
in turn depends on the delay in the recursive loops of the α, β and dummy-β calculations.
Since the computation units are pipelined to M levels, t_clock is inversely proportional to
M. Apart from t_clock and the number of outputs N, LA_siso is also a function of the
latency of the pipelined computation units and of the overhead due to data loading before
the computation begins plus the time to complete the computation of the last block. For
an actual-to-dummy ratio of p/q, the overhead for each of the M blocks is 2q - 1 time
slots before the start of the computation and p additional time slots to finish the
computation. The latency of the decoder is then
While LA_siso increases with this ratio and decreases with increasing M, the effect of the
ratio is not as significant as the effect of M. As M increases, t_clock decreases by a factor
of M, while the overhead and the pipeline latency increase by a factor of M. Thus, for
large values of N, increasing M is still the best option for increasing throughput.
However, to generate an output in every clock cycle, the computation units must be
supplied with a large volume of data.

3.4. Pipelined metric processor Architecture

The proposed pipelined metric processor architecture consists of three adders, a
subtractor, a comparator, a selection MUX, a trellis MUX and logic.













Figure 7. Architecture of metric pipelined processor

Since the add-compare-select operation is more complex than the trellis MUX, it
is divided into five pipeline stages in this architecture, as shown in Figure 7.
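To make the add-compare-select path concrete, here is a minimal behavioral sketch of one ACS step for a state with two incoming branches (a radix-2 trellis is assumed here). Two adders form the candidate metrics, the sign of the subtractor output serves as the comparison, and the selection MUX picks the survivor; in hardware these operations would be split across pipeline registers, which a software model does not show. The optional correction term is the Log-MAP refinement from (7), which the Max-Log-MAP variant omits. The function name `acs` is hypothetical.

```python
import math
from typing import Tuple


def acs(metric_a: float, branch_a: float, metric_b: float, branch_b: float,
        log_map: bool = False) -> Tuple[float, int]:
    """One add-compare-select step for a state with two predecessors.

    Returns (new state metric, decision bit), where decision 1 means path b survives.
    """
    cand_a = metric_a + branch_a                 # adder 1
    cand_b = metric_b + branch_b                 # adder 2
    diff = cand_a - cand_b                       # subtractor
    decision = 1 if diff < 0 else 0              # comparator: sign of the difference
    survivor = cand_b if decision else cand_a    # selection MUX
    if log_map:                                  # optional Jacobian correction (Log-MAP only)
        survivor += math.log1p(math.exp(-abs(diff)))
    return survivor, decision
```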

3.5. Dual-Mode LAPO for Radix-4 SB/DB MAP Decoding
Dual-mode MAP decoding for multistandard CTC schemes was introduced in [19].
The radix-4 SB and radix-4 DB MAP algorithms are reformulated to achieve high
hardware utilization and fully shared storage for dual-mode MAP decoding [4]. In
general, the MAP decoder is composed of branch metrics, forward recursion state
metrics, backward recursion state metrics, a priori LLR, a posteriori LLR and extrinsic
information. For dual-mode MAP decoding, radix-4 SB and radix-4 DB MAP decoding
are employed because of their computational similarity. Figure 8 shows the block
diagram of the dual-mode MAP decoder. The output of the proposed LLR unit is given as
the input to the MAP LLR calculator.















Figure 8. Block diagram of the dual-mode MAP Decoder















(a) SB mode (b) DB mode

Figure 9. Logic blocks of dual mode LAPO

The Mode signal configures the dual-mode LLR calculator: when Mode is low
(active-low), the calculator operates in SB mode. In the dual-mode architecture, some of
the connections are dummy, that is, unused in one of the modes.
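To illustrate what a dual-mode LLR calculator of this kind produces, here is a hedged sketch under assumptions not stated in the paper. Each radix-4 step is summarized by (symbol, metric) pairs, where metric = α + γ + β for one transition and symbol is the 2-bit input 00, 01, 10 or 11. One MAX unit per symbol class keeps the best metric. In DB mode the outputs are three symbol LLRs, LL(z) = MAX(z) - MAX(00) for z = 01, 10, 11; in SB mode the two bits of the radix-4 step are marginalized into two bit LLRs. The flag `mode_db=False` plays the role of Mode held low; the function name `lapo` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

NEG_INF = float("-inf")
SYMBOLS = ("00", "01", "10", "11")


def lapo(transitions: Iterable[Tuple[str, float]], mode_db: bool) -> Dict[str, float]:
    """A posteriori LLRs for one radix-4 step in the Max-Log domain (illustrative).

    transitions: (symbol, alpha + gamma + beta) for every trellis transition.
    mode_db=True -> DB mode (symbol LLRs), False -> SB mode (two bit LLRs).
    """
    best = {z: NEG_INF for z in SYMBOLS}
    for z, metric in transitions:          # one MAX unit per symbol class
        best[z] = max(best[z], metric)
    if mode_db:
        # DB mode: symbol LLRs relative to symbol 00
        return {z: best[z] - best["00"] for z in SYMBOLS[1:]}
    # SB mode: first character = earlier bit, second character = later bit
    first = max(best["10"], best["11"]) - max(best["00"], best["01"])
    second = max(best["01"], best["11"]) - max(best["00"], best["10"])
    return {"bit0": first, "bit1": second}
```

Sharing the four MAX units between modes and changing only the final subtractions mirrors the idea of a single calculator serving both SB and DB decoding.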
4. Results and discussion
A number of performance evaluation and resource utilization parameters are used
to assess the pipelined metric MAP decoder. The present research focuses on the design
and development of a pipelined metric MAP decoder for low-power applications. The
parameters considered include the number of slices (S), number of look-up tables
(LUTs), slice latches (SL), occupied slice latches (OSL), IOB latches (input-output
blocks), input buffers (IBUFs), gate counts (GA), latency (L), power consumption (PC),
throughput rate, area utilization, normalized energy efficiency (NEE), normalized area
efficiency (NAE) and throughput of one iteration (R).
The proposed decoder is designed in Verilog HDL, a hardware description
language for architectural module design. The design is simulated with ModelSim 5.5e,
synthesized with Xilinx Project Navigator 10.2i, and tested on a Spartan-3E family
device, the XC3S500E. The proposed scheme uses 1972 LUTs and 476 FFs at a
maximum frequency of 125 MHz. The results show that the pipelined metric MAP
decoding system with SB/DB modes leads to lower power consumption and lower
utilization of slices, look-up tables and flip-flops.

Table 1. Comparison of power consumption in the Spartan-3 family

FPGA Family    Device Specifications    Power Consumption
Spartan-3E     XC3S1200E                158.9 mW
Spartan-3E     XC3S500E                 81.8 mW
Spartan-3E     XC3S250E                 52.27 mW

Various devices in the Spartan-3 family were tested for frequency and power, and
the results are tabulated in Table 1.














Figure 9. Comparison of power consumptions of Spartan-3 family

Table 2. Performance analysis of hardware utilization

Parameter      SB Mode    DB Mode
Slices         110        89
LUTs           70         60
Gate counts    1789       1701
IOBs           21         21


(8)


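where b denotes the number of decoded hard bits per clock cycle, B the number of
information bits in a CTC block, N the decoded CTC block size (= B/b), f the operating
frequency of the MAP processor, L the window size, DL the decoding latency of MAP
decoding, P the number of parallel windows, S the number of sliding windows, and I the
number of decoding iterations.
$$ AU = \frac{\text{Area of active modules}}{\text{Area of total modules}} \times 100\% \tag{9} $$
where AU denotes area utilization.
The NEE indicates how much energy a decoder chip consumes to process one hard bit.
The normalized energy factor is 0.23; similarly, the normalized area factor is 1.92.
$$ NEE = \frac{\text{Power}}{\text{Throughput}} \times \text{Normalized energy factor} \tag{10} $$
Normalized area efficiency (NAE) is used as the other performance index. The NAE
indicates how many hard bits per mm² a decoder chip decodes for a single CTC block.
$$ NAE = \frac{\text{Throughput}}{\text{Area} \times \text{Frequency}} \times \text{Normalized area factor} \tag{11} $$
Note that the NEE and NAE are not intended to establish which design is superior to the
others, but to provide an evaluation method for reference.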
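Equations (9) to (11) are simple ratios; the sketch below evaluates them with the unit conversions made explicit (watts and bits per second in, nJ/bit out for NEE; bits per second, mm² and hertz in, bits/mm² out for NAE). The normalization factors default to the values quoted above (0.23 and 1.92). The function names and the inputs in the example are hypothetical placeholders, not measurements from this work.

```python
def area_utilization(active_area: float, total_area: float) -> float:
    """Eq. (9): AU in percent."""
    return active_area / total_area * 100.0


def nee_nj_per_bit(power_w: float, throughput_bps: float,
                   energy_factor: float = 0.23) -> float:
    """Eq. (10): normalized energy per decoded bit, returned in nJ/bit."""
    return power_w / throughput_bps * energy_factor * 1e9


def nae_bits_per_mm2(throughput_bps: float, area_mm2: float, freq_hz: float,
                     area_factor: float = 1.92) -> float:
    """Eq. (11): normalized hard bits decoded per mm^2 (per clock cycle)."""
    return throughput_bps / (area_mm2 * freq_hz) * area_factor
```

For instance, a hypothetical decoder drawing 100 mW at 100 Mbps gives `nee_nj_per_bit(0.1, 100e6)` of about 0.23 nJ/bit.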

Table 3. Performance evaluation parameters

Performance Evaluation Parameter    Dual-Mode MAP Decoder
Throughput                          24.8 Mbps
AU                                  110.6 %
NEE                                 0.17 nJ/bit
NAE                                 3.84 bits/mm²













Figure 10. Comparison of throughput and area utilization

5. Conclusion
Synthesis results show that the proposed architecture with SB/DB decoding
reduces the memory required for storing the branch and state metrics by 45% to 55%,
and the overall memory area by 7.29% to 8.90%, compared with state-of-the-art MAP
decoders. This work reduces gate counts by more than 34.4% and requires only a
one-line-buffer memory. The final implementation costs 122 K gate counts on an
XC3S500E FPGA and consumes 74 mW of power. Our MAP processor achieves high
throughput rates with low energy per decoded bit and high area efficiency. The proposed
system also uses fewer blocks than the conventional system.

6. References
[1] C. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379-423, July 1948.
[2] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes (1)," Proc. ICC'93, Geneva, Switzerland, pp. 1064-1070, May 1993.
[3] Chen-Hung Lin, Chun-Yu Chen, and An-Yeu Wu (Andy), "Area-Efficient Scalable MAP Processor Design for High-Throughput Multistandard Convolutional Turbo Decoding," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 2, pp. 305-318, Feb. 2011.
[4] S. Shiyamala and V. Rajamani, "A Novel Area Efficient Folded Modified Convolutional Interleaving Architecture for MAP Decoder," International Journal of Computer Applications, vol. 9, no. 9, pp. 18-22, Nov. 2010.
[5] Perttu Salmela, Harri Sorokin, and Jarmo Takala, "A Programmable Max-Log-MAP Turbo Decoder Implementation," VLSI Design, vol. 2008, 2008.
[6] S. Raghukrishna and K. N. Hari Bhatt, "Implementation of Turbo Decoder Using Max-Log-MAP Algorithm for Wireless Application," International Journal of Advanced Technology & Engineering Research (IJATER), pp. 246-251.
[7] Massimo Rovini, Giuseppe Gentile, and Luca Fanucci, "Fixed-Point MAP Decoding of Channel Codes," EURASIP Journal on Advances in Signal Processing, vol. 2011, 2011.
[8] S. M. Karim, Girish Mahale, and Indrajit Chakrabarti, "A Pipelined Architecture for High Throughput Efficient Turbo Decoder," Special Issue of International Journal of Computer Applications on Electronics, Information and Communication Engineering (ICEICE), no. 1, pp. 12-16, Dec. 2011.
[9] M. J. Thul, N. Wehn, and L. P. Rao, "Enabling high speed turbo-decoding through concurrent interleaving," in Proc. IEEE International Symposium on Circuits and Systems (ISCAS '02), vol. 1, pp. 897-900, Phoenix, Ariz., USA, May 2002.
[10] F. Berens, M. J. Thul, F. Gilbert, and N. Wehn, "Electronic device avoiding write access conflicts in interleaving, in particular optimized concurrent interleaving architecture for high throughput turbo-decoding," European Patent Application EP1401108 A1, March 2004.
[11] M. J. Thul, F. Gilbert, T. Vogt, G. Kreiselmaier, and N. Wehn, "A scalable system architecture for high-throughput turbo decoders," The Journal of VLSI Signal Processing, vol. 39, no. 1-2, pp. 63-77, 2005.
[12] Z. Wang and K. Parhi, "Efficient interleaver memory architectures for serial turbo decoding," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '03), vol. 2, pp. 629-632, Hong Kong, April 2003.
[13] Z. Wang, Y. Tang, and Y. Wang, "Low hardware complexity parallel turbo decoder architecture," in Proc. IEEE International Symposium on Circuits and Systems (ISCAS '03), vol. 2, pp. 53-56, Bangkok, Thailand, May 2003.
[14] Sik Kim, Sun-Young Hwang, and Moon Jun Kang, "A Memory-Efficient Block-wise MAP Decoder Architecture," ETRI Journal, vol. 26, no. 6, pp. 615-621, Dec. 2004.
[15] "Telemetry Channel Coding," Consultative Committee for Space Data Systems (CCSDS), Blue Book 101.0-B-4, May 1999.
[16] A. Viterbi, "An Intuitive Justification and a Simplified Implementation of the MAP Decoder for Convolutional Codes," IEEE J. on Selected Areas in Communications, vol. 16, no. 2, pp. 260-264, Feb. 1998.
[17] G. Park, S. Yoon, I. Jin, and C. Kang, "A Block-wise MAP Decoder Using a Probability Ratio for Branch Metric," Proc. VTC'99, Amsterdam, Netherlands, pp. 1610-1614, Sep. 1999.
[18] Z. Wang, H. Suzuki, and K. Parhi, "VLSI Implementation Issues of Turbo Decoder Design for Wireless Applications," Proc. IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, pp. 503-512, Oct. 1999.
[19] S.-J. Lee, N. R. Shanbhag, and A. C. Singer, "Area-efficient high-throughput MAP decoder architectures," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 8, pp. 921-933, Aug. 2005.