frac
so that the approach can still
guarantee a faithfully rounded DFP antilogarithmic result. In the
following, we focus on the algorithm and the architecture
of the $(q+1)$-digit DXP decimal antilogarithmic converter,
which produces the $q$-digit faithful decimal significand of
the DFP antilogarithmic result.
2.2 Rounding
IEEE 754-2008 specifies five rounding modes [2]. A
common requirement for the DFP antilogarithmic operation in
IEEE 754-2008 is the ability to compute exactly rounded
results (within 0.5 ulp of precision). To achieve exactly
rounded results in any of the rounding modes, it must be
determined whether the exact (infinite-precision) result lies
below or above the midpoint between the two nearest DFP
numbers. However, if the exact result is very close to the
midpoint, exact rounding is difficult to perform unless the
maximum length of the chain of nines or zeros after the
rounding digit can be determined for every possible DFP
result (the Table Maker's Dilemma [23]). On the other hand,
providing additional guard digits before rounding not only
brings the error of the results much closer to a half-ulp, but also
reduces the probability of incorrect rounding to near zero. In
this paper, we mainly focus on the delay optimisation of the
proposed DFP antilogarithmic converter, so we design a
digit-recurrence algorithm that achieves faithfully rounded
results (within 1 ulp of precision) for the DFP antilogarithmic
operation using the roundTiesToEven mode.
3 Algorithm
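A digit-recurrence algorithm to compute $10^{v_{frac}}$ is summarised
as follows

$$\lim_{j\to\infty}\Bigl\{ v_{frac} - \sum_{i=1}^{j}\log_{10}(f_i) \Bigr\} \to 0 \qquad (4)$$

If (4) is satisfied

$$\lim_{j\to\infty}\Bigl\{ \sum_{i=1}^{j}\log_{10}(f_i) \Bigr\} \to v_{frac} \qquad (5)$$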
278 IET Comput. Digit. Tech., 2012, Vol. 6, Iss. 5, pp. 277–289
© The Institution of Engineering and Technology 2012 doi: 10.1049/iet-cdt.2011.0089
www.ietdl.org
Thus

$$10^{v_{frac}} = \prod_{j=1}^{\infty} f_j \qquad (6)$$

$f_j$ is defined as $f_j = 1 + e_j 10^{-j}$, by which $v_{frac}$ is transformed
to 0 through successive subtraction of $\log_{10}(f_j)$. This form
of $f_j$ allows the use of a decimal shift-and-add
implementation.
According to (5) and (6), the corresponding recurrences for
transforming $v_{frac}$ and computing the antilogarithm are
presented in (7) and (8), where $j \ge 1$, $L[1] = v_{frac}$ and $E[1] = 1$

$$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j}) \qquad (7)$$

$$E[j+1] = E[j]\,(1 + e_j 10^{-j}) \qquad (8)$$
The digits $e_j$ are selected so that $L[j+1]$ converges to 0. One
digit of accuracy is therefore obtained in each iteration. After
the last iteration of the recurrence, the results are

$$L[j+1] \approx 0 \qquad (9)$$

$$E[j+1] \approx 10^{v_{frac}} \qquad (10)$$
To obtain a selection function for $e_j$, a scaled remainder is
defined in (11), where $g$ is a scaling constant.

$$W[j] = 10^{j}\, L[j]\, g \qquad (11)$$

Thus

$$L[j] = W[j]\, 10^{-j}\, g^{-1} \qquad (12)$$

Substituting (12) into (7) gives

$$W[j+1] = 10\,W[j] - 10^{j+1}\, g\, \log_{10}(1 + e_j 10^{-j}) \qquad (13)$$
3.1 Selection by rounding
The selection of the digit $e_j$ is achieved by rounding the scaled
residual to its integer part. In order to reduce the delay of the
selection function, the rounding is performed on an estimate
$\hat{W}[j]$ of $W[j]$, obtained by truncating $W[j]$ to $t$ fractional digits

$$e_j = \mathrm{round}(\hat{W}[j]) \qquad (14)$$

In (14), round indicates that if the digit of $\hat{W}[j]$ at the
position $10^{-1}$ is larger than or equal to 5, the digit $e_j$ is
obtained by adding 1 to the integer part of $\hat{W}[j]$; otherwise
it is the integer part of $\hat{W}[j]$ itself. In this
work, the selection by rounding is performed with the
maximum redundant set $e_j \in \{-9, -8, \ldots, 0, \ldots, 8, 9\}$.
Since $|e_j| \le 9$

$$-9.5 < \hat{W}[j] < 9.5 \qquad (15)$$
Since (15) must be satisfied, the range of $W[j]$ is

$$-9.5 + \delta_t < W[j] < 9.5 + \delta_t \qquad (16)$$

In (16), $\delta_t$ is the truncation error. It should be noted that
$0 \le \delta_t < (10/9)10^{-t}$, owing to the sign-magnitude carry-save
representation of $W[j]$. Therefore the bounds of $W[j] - e_j$ are

$$-0.5 < W[j] - e_j < 0.5 + \frac{10}{9} \times 10^{-t} \qquad (17)$$
Since (13) can be represented as

$$W[j+1] = 10\,(W[j] - e_j) - 10^{j+1}\, g\, \log_{10}(1 + e_j 10^{-j}) + 10\,e_j \qquad (18)$$

to keep $-9.5 < \hat{W}[j+1] < 9.5$, we must keep

$$-9.5 + \frac{10}{9} \times 10^{-t} < W[j+1] < 9.5 \qquad (19)$$
According to (17), (18) and (19), the numerical analysis
proceeds as follows

$$10^{j+1}\, g\, \log_{10}(1 + e_j 10^{-j}) - 10\,e_j > -4.5 + \frac{10}{9} \times 10^{-t+1} \qquad (20)$$

$$10^{j+1}\, g\, \log_{10}(1 + e_j 10^{-j}) - 10\,e_j < 4.5 - \frac{10}{9} \times 10^{-t} \qquad (21)$$
The numerical analysis shows that, when $g = 2.3$, conditions (20)
and (21) are satisfied only if $j \ge 3$ and $t \ge 1$. Consequently,
selection by rounding is only valid for iterations $j \ge 3$, and
$e_1$ and $e_2$ can only be obtained from look-up tables. However,
using two look-up tables for $j = 1, 2$ significantly increases the
overall hardware cost. Therefore a restriction on $e_1$ is defined
so that $e_2$ can be obtained with selection by rounding and one
look-up table is saved. Since $W[1] = 10 \times 2.3 \times v_{frac}$,
$W[2]$ can be obtained as

$$W[2] = 230\,v_{frac} - 10^{2} \times 2.3 \times \log_{10}(1 + e_1 10^{-1}) \qquad (22)$$
When $j = 2$ and $t = 1$, conditions (20) and (21) are satisfied
if $e_2$ is in the range $-8 \le e_2 \le 8$. Substituting
$-8 \le e_2 \le 8$ and $t = 1$ in (17) yields

$$-8.5 < W[2] < 8.5 + \frac{1}{9} \qquad (23)$$
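The claims about conditions (20) and (21) can be cross-checked numerically. The following is a minimal sketch, not part of the proposed design: the function name `selection_ok` is hypothetical, and double-precision arithmetic with `math.log1p` is assumed to be accurate enough for this check.

```python
import math

def selection_ok(j, e, t=1, g=2.3):
    """Return True if digit e at iteration j satisfies conditions (20) and (21)."""
    # Common left-hand side of (20) and (21); log1p keeps accuracy for large j.
    x = 10 ** (j + 1) * g * math.log1p(e * 10 ** (-j)) / math.log(10) - 10 * e
    lower = -4.5 + (10 / 9) * 10 ** (-t + 1)   # right-hand side of (20)
    upper = 4.5 - (10 / 9) * 10 ** (-t)        # right-hand side of (21)
    return lower < x < upper

if __name__ == "__main__":
    for j in (2, 3, 4):
        ok = [e for e in range(-9, 10) if selection_ok(j, e)]
        print(f"j={j}: digits accepted with t=1 -> {min(ok)}..{max(ok)}")
```

Under these assumptions the check accepts every digit of the redundant set for $j \ge 3$, whereas for $j = 2$ the digits $\pm 9$ are rejected.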
According to (22) and (23), we obtain

$$230\,v_{frac} - 10^{2} \times 2.3 \times \log_{10}(1 + e_1 10^{-1}) < 8.5 + \frac{1}{9} \qquad (24)$$

$$230\,v_{frac} - 10^{2} \times 2.3 \times \log_{10}(1 + e_1 10^{-1}) > -8.5 \qquad (25)$$
The numerical analysis of (24) and (25) shows
that the decimal input operand $v_{frac}$ is restricted to the range
$-1.03 \le v_{frac} \le 0.31$ so that $e_2$ can be obtained with
selection by rounding. Since $v_{frac}$ lies in the
range $-1 < v_{frac} < 1$, a positive $v_{frac}$ must be converted
to a negative one: the fraction part of a positive $v_{frac}$ is
first adjusted to $v_{frac} - 1$, and its
corresponding integer part $v_{int}$ is then adjusted to $v_{int} + 1$.
Table 1 shows the selection of $e_1$. Since a 1-digit $e_1$ cannot
provide continuous ranges in Table 1 that cover all
negative $v_{frac}$, $e_1$ is extended to two digits so that all negative
$v_{frac}$ are covered.
3.2 Error analysis and evaluation
Errors in the proposed antilogarithmic digit-recurrence
algorithm arise in four ways. The first is the inherent error
of the algorithm, $\varepsilon_i$, resulting from
the difference between the antilogarithm obtained
after a finite number of iterations and the exact result obtained
after infinitely many iterations. The second is the inexact input error,
$\varepsilon_v$, produced by the difference between the antilogarithmic
results of the inexact (truncated) input $\hat{v}_{frac}$ and of the real input $v_{frac}$. The
third is the quantisation error, $\varepsilon_q$, resulting from
the finite precision of the intermediate values in the hardware
implementation. The fourth is the final rounding error,
$\varepsilon_r$, whose maximum value is 0.5 ulp ($|\varepsilon_r| \le 0.5 \times 10^{-q}$).
In order to achieve a $q$-digit decimal significand of the
faithful DFP antilogarithmic result, the following condition
must be satisfied

$$\varepsilon_t = \varepsilon_i + \varepsilon_v + \varepsilon_q \le 10^{-q} \qquad (26)$$
3.2.1 Inherent error of algorithm: Since each DXP
antilogarithmic result is obtained after $(q+1)$ iterations,
$\varepsilon_i$ can be defined as

$$\varepsilon_i = \prod_{j=1}^{\infty}(1 + e_j 10^{-j}) - \prod_{j=1}^{q+1}(1 + e_j 10^{-j}) \qquad (27)$$

Thus, (27) can be written as

$$\varepsilon_i = \prod_{j=1}^{\infty}(1 + e_j 10^{-j}) \times \Bigl(1 - \frac{1}{\prod_{j=q+2}^{\infty}(1 + e_j 10^{-j})}\Bigr) \qquad (28)$$

In (28), since the proposed DXP antilogarithmic algorithm
handles input values in the range $(-1, 0)$, the
exact antilogarithmic results, obtained after infinitely many
iterations, are in the range $(0.1, 1)$. In order to use the static
error analysis method, we substitute the case $e_j = 9$ or $-9$ and
the maximum value of the exact antilogarithmic result into
(28), which gives the maximum $\varepsilon_i$

$$\varepsilon_i \le 1 - \frac{1}{\prod_{j=q+2}^{\infty}(1 + 9 \times 10^{-j})} \qquad (29)$$

In (29), it is obvious that

$$\prod_{j=q+2}^{\infty}(1 + 9 \times 10^{-j}) = e^{\sum_{j=q+2}^{\infty}\ln(1 + 9 \times 10^{-j})} \qquad (30)$$

Since (30) is satisfied and

$$\sum_{j=q+2}^{\infty}\ln(1 + 9 \times 10^{-j}) < 9 \times (10^{-q-2} + 10^{-q-3} + \cdots) \qquad (31)$$

we obtain

$$\prod_{j=q+2}^{\infty}(1 + 9 \times 10^{-j}) < e^{9 \times (10^{-q-2} + 10^{-q-3} + \cdots)} \qquad (32)$$

Thus, the maximum absolute $\varepsilon_i$ is

$$|\varepsilon_i| < 1 - \frac{1}{e^{9 \times (10^{-q-2} + 10^{-q-3} + \cdots)}} \approx 1 \times 10^{-q-1} \qquad (33)$$
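The bound in (33) can be checked numerically. The sketch below is only an illustration: it evaluates the right-hand side of (29) with the infinite product truncated after a fixed number of factors (the parameter `terms` and the function name are assumptions), using Python's `decimal` module.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # enough digits to resolve 10^-(q+1) for q = 34

def inherent_error_bound(q, terms=60):
    """Right-hand side of (29), with the infinite product truncated."""
    prod = Decimal(1)
    for j in range(q + 2, q + 2 + terms):
        prod *= 1 + 9 * Decimal(10) ** (-j)   # worst case e_j = 9
    return 1 - 1 / prod

if __name__ == "__main__":
    for q in (7, 16, 34):
        print(q, inherent_error_bound(q))
```

For $q = 7$, 16 and 34 the computed value stays just below $10^{-q-1}$, consistent with (33).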
3.2.2 Inexact input error: If a DFP operand, $v$, is very
close to zero, the whole digit-width of
$v_{frac} = \pm 0.00\ldots00\,d_0, d_1, \ldots, d_{q-1}$
can be too long to be implemented. $v_{frac}$
has to be truncated to an at least $(q+1)$-digit $\hat{v}_{frac}$ in the DXP
antilogarithmic operation. Therefore the inexact input error
can be defined as

$$\varepsilon_v = 10^{v_{frac}} - 10^{\hat{v}_{frac}} \qquad (34)$$

It is evident that the maximum $\varepsilon_v$ is obtained when (i) $v_{frac}$
consists of $(q+1)$ leading zeros followed by the $q$-digit decimal
significand; (ii) each decimal significand digit, $d_0, d_1, \ldots, d_{q-1}$,
equals 9

$$\varepsilon_v \le 10^{\pm 0.\overbrace{00\ldots00}^{q+1}\overbrace{99\ldots99}^{q}} - 10^{\pm 0.\overbrace{00\ldots00}^{q+1}} \qquad (35)$$
Table 1 Selection of $e_1$

Range of v_frac    e_1 (BCD)           Range of v_frac    e_1 (BCD)
[−0.00, −0.02]     −0.0 (00000000)     (−0.49, −0.55]     −7.0 (00110000)
(−0.02, −0.07]     −1.0 (10010000)     (−0.55, −0.61]     −7.4 (00100110)
(−0.07, −0.12]     −2.0 (10000000)     (−0.61, −0.67]     −7.7 (00100011)
(−0.12, −0.19]     −3.0 (01110000)     (−0.67, −0.72]     −8.0 (00100000)
(−0.19, −0.24]     −4.0 (01100000)     (−0.72, −0.77]     −8.2 (00011000)
(−0.24, −0.28]     −4.5 (01010101)     (−0.77, −0.82]     −8.4 (00010110)
(−0.28, −0.32]     −5.0 (01010000)     (−0.82, −0.89]     −8.6 (00010100)
(−0.32, −0.37]     −5.5 (01000101)     (−0.89, −0.94]     −8.8 (00010010)
(−0.37, −0.42]     −6.0 (01000000)     (−0.94, −0.98]     −8.9 (00010001)
(−0.42, −0.49]     −6.5 (00110101)     (−0.98, −1.00)     −9.0 (00010000)
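Equation (35) can be written as

$$\varepsilon_v \le \bigl(10^{\pm 0.\overbrace{99\ldots99}^{q}}\bigr)^{10^{-q-1}} - 1 \qquad (36)$$

Thus

$$\log_{10}(1 + \varepsilon_v) \le \pm 0.\overbrace{99\ldots99}^{q} \times 10^{-q-1} \qquad (37)$$

According to the Taylor series expansion of the logarithm
function $\log_{10}(1 + x)$, we obtain

$$\Bigl(\varepsilon_v - \frac{\varepsilon_v^{2}}{2} + \cdots\Bigr) / \ln(10) < \varepsilon_v / \ln(10) \le \pm 0.\overbrace{99\ldots99}^{q} \times 10^{-q-1} \qquad (38)$$

Therefore the maximum absolute $\varepsilon_v$ is

$$|\varepsilon_v| \le 2.303 \times 10^{-q-1} \qquad (39)$$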
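As a hedged sanity check of (39), the worst case of (36) can be evaluated directly. The sketch assumes the positive branch of the exponent and uses Python's `decimal` module; the function name is hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80  # enough digits to resolve 10^-(q+1) for q = 34

def input_error_bound(q):
    """Worst case of (36): (10^0.99...9)^(10^-(q+1)) - 1 with q nines."""
    nines = Decimal("0." + "9" * q)
    exponent = nines * Decimal(10) ** (-(q + 1))
    return Decimal(10) ** exponent - 1

if __name__ == "__main__":
    for q in (7, 16, 34):
        print(q, input_error_bound(q))
```

The value is close to $\ln(10) \times 10^{-q-1}$, which is where the constant 2.303 in (39) comes from.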
3.2.3 Quantisation error: Since only finite-precision
intermediate values are processed in the hardware
implementation, a quantisation error is produced. In this
paper, we define FD-digit as the minimal data-width of
fractional digits for each intermediate value. The DXP
antilogarithmic result is obtained by $(q+1)$
successive multiplications

$$10^{v_{frac}} = \prod_{j=1}^{q+1}(1 + e_j 10^{-j}) \qquad (40)$$

The fractional digits of the intermediate
multiplication results are kept in carry-save
representation, in which a carry may occur in the FD-digit
that is shifted out of the data-path in the first iteration.
Therefore a truncation error of $10^{-FD}$ is produced from the
first iteration onwards. After $(q+1)$ iterations, the maximum
quantisation error, $\varepsilon_q$, can be represented as

$$\varepsilon_q = 10^{-FD}\Bigl(\prod_{j=1}^{q+1}(1 + e_j 10^{-j}) + \cdots + \prod_{j=q+1}^{q+1}(1 + e_j 10^{-j}) + 1\Bigr) \qquad (41)$$

Using the same mathematical method as in (30), (31) and
(32), each successive multiplication in (41) satisfies

$$10^{-FD}\prod_{j=1}^{q+1}(1 + e_j 10^{-j}) < 10^{-FD}\, e^{\sum_{j=1}^{q+1} e_j 10^{-j}} \qquad (42)$$

Thus, the maximal quantisation error, $\varepsilon_q$, satisfies

$$\varepsilon_q < 10^{-FD}\Bigl(e^{\sum_{j=1}^{q+1} e_j 10^{-j}} + \cdots + e^{\sum_{j=q+1}^{q+1} e_j 10^{-j}} + 1\Bigr) \qquad (43)$$

Considering the case $e_j = 9$ or $-9$ in (43), we obtain the
maximum absolute $\varepsilon_q$

$$|\varepsilon_q| < (q + 2) \times 10^{-FD} \qquad (44)$$
3.2.4 Error evaluation: Having obtained $\varepsilon_i$, $\varepsilon_v$ and $\varepsilon_q$ in (33),
(39) and (44), respectively, we obtain the maximum absolute
error $\varepsilon_t$ as

$$|\varepsilon_t| = |\varepsilon_i| + |\varepsilon_v| + |\varepsilon_q| \le 0.331 \times 10^{-q} + (q + 2) \times 10^{-FD} \qquad (45)$$

We substitute the digit-width of the decimal significand of the
three DFP formats, $q = 7$, 16 and 34, into (45).
The results indicate that the maximum absolute errors $|\varepsilon_t|$
obtained in the three DFP formats are smaller than 0.5 ulp,
which satisfies condition (26). Thus, the final
rounded results meet the accuracy requirement of
1 ulp once the final rounding error is taken into account.
Table 2 shows the error analysis for the three DFP
interchange formats. The error analysis in Table 2 shows
that the proposed algorithm guarantees $q$-digit accuracy for
the DXP antilogarithm operation only when the minimal
data-width of the fractional digits of each intermediate value
(FD-digit) is at least $(q+2)$ digits for Decimal32 or $(q+3)$
digits for Decimal64 and Decimal128; in that case a $q$-digit decimal
significand of the faithful DFP antilogarithmic result can
be achieved.
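The entries in the last row of Table 2 follow directly from (45). The short sketch below recomputes them with exact rational arithmetic; the function name and the `FORMATS` dictionary are illustrative assumptions.

```python
from fractions import Fraction

def total_error_coeff(q, fd):
    """Coefficient of 10^-q in bound (45): 0.331 + (q + 2) * 10^(q - FD)."""
    return Fraction(331, 1000) + (q + 2) * Fraction(10) ** (q - fd)

# (q, FD) pairs taken from Table 2
FORMATS = {"Decimal32": (7, 9), "Decimal64": (16, 19), "Decimal128": (34, 37)}

if __name__ == "__main__":
    for name, (q, fd) in FORMATS.items():
        print(name, float(total_error_coeff(q, fd)))
```

With FD = $q + 2$ the Decimal64 bound would be $0.331 + 18 \times 10^{-2} = 0.511$, above 0.5 ulp, which is why $(q+3)$ fractional digits are needed for that format.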
3.3 Guard digit of scaled residual
Since the scaled residual $W[j]$ is processed with only finite
precision in the hardware implementation, we need to analyse
how many guard digits $g$ are enough to prevent the
rounding error of $W[j]$, $\varepsilon_w$, from affecting the correct
selection of the digits $e_j$. Since $W[j]$ converges in the range
$(-9.5, 9.5)$, we define the digit-width of $W[j]$ as
$(q+g+3)$ digits, consisting of a three-digit integer part and a
$(q+g)$-digit fraction part.
The values of the logarithm $-2.3 \times \log_{10}(1 + e_j 10^{-j})$ in (13)
can be obtained by storing them in a look-up
table. With an increasing number of iterations, however,
the size of the table becomes prohibitively large.
Therefore a method is needed that reduces the
table size and achieves a significant reduction in the overall
hardware requirement. A Taylor series expansion of the
logarithm function $\log_{10}(1 + x)$ is given in (46)

$$\log_{10}(1 + x) = \Bigl(x - \frac{x^{2}}{2} + \cdots\Bigr) / \ln(10) \qquad (46)$$
After $h$ iterations, the values of $-2.3 \times \log_{10}(1 + e_j 10^{-j})$ do
not need to be stored in the look-up table; the values
$-2.3 \times e_j 10^{-j} / \ln(10)$ are used instead as an approximation.

Table 2 Error analysis of DFP antilogarithm for DFP interchange
formats

Format names                        Decimal32   Decimal64   Decimal128
significand (q-digit)               7           16          34
no. of iterations (q + 1)           8           17          35
accuracy (q-digit)                  7           16          34
FD-digit                            9 (a)       19 (b)      37 (b)
max. error |ε_t| (× 10^−q)          0.421       0.349       0.367
(a) (q + 2)-digit
(b) (q + 3)-digit
In iterations $j = 1$ to $j = q + 1$, because the $(q+g+3)$-digit
rounded values of $-2.3 \times \log_{10}(1 + e_j 10^{-j})$ and
$-2.3 \times e_j 10^{-j} / \ln(10)$ are obtained from the look-up tables,
a rounding error of $\pm 0.5 \times 10^{-q-g}$ is produced in each
iteration. The maximum quantisation error, $\varepsilon_{wq}$, is

$$|\varepsilon_{wq}| \le \sum_{j=1}^{q+1} 0.5 \times 10^{-q-g} \qquad (47)$$

The value of $-2.3 \times \log_{10}(1 + e_j 10^{-j})$ is approximated
by the value of $-2.3 \times e_j 10^{-j} / \ln(10)$ in iterations $j = h + 1$
to $j = q + 1$. According to the series expansion of
the logarithmic function in (46), an approximation error, $\varepsilon_{wa}$,
is therefore produced in each of these iterations

$$\varepsilon_{wa} = 2.3 \times \sum_{j=h+1}^{q+1}\Bigl(-\frac{(e_j 10^{-j})^{2}}{2} + \frac{(e_j 10^{-j})^{3}}{3} - \cdots\Bigr) / \ln(10) \qquad (48)$$

We keep only the term $(e_j 10^{-j})^{2} / (2\ln(10))$ to analyse $\varepsilon_{wa}$

$$\varepsilon_{wa} \approx 2.3 \times \sum_{j=h+1}^{q+1}\Bigl(-\frac{(e_j 10^{-j})^{2}}{2}\Bigr) / \ln(10) \qquad (49)$$

Considering the worst case ($e_j = 9$ or $-9$), we obtain the
maximum $\varepsilon_{wa}$

$$|\varepsilon_{wa}| \le 4.01 \times 10^{-2h-1} \qquad (50)$$
Therefore, according to (13), after the $(q+1)$th iteration, the
truncation error of $W[j]$, $\varepsilon_w$, is obtained as

$$|\varepsilon_w| \le 10^{q+1}\,(|\varepsilon_{wq}| + |\varepsilon_{wa}|) = (0.5q + 0.5) \times 10^{1-g} + 4.01 \times 10^{q-2h} \qquad (51)$$

Since the digit $e_j$ is selected by rounding the scaled residual
$v_{frac}$. Meanwhile, the value of
$v_{int}$ is obtained in the $v_{int}$ generator and sent to stage 4 through a
register. If the DFP operand is positive, the fraction part of the
DFP operand, $v_{frac}$, is adjusted to a negative fraction
by $v_{frac} - 1$ in a 10's complement converter, and then becomes
the input of the DXP decimal antilogarithmic converter.
Meanwhile, its corresponding integer part, $v_{int}$, is adjusted to
$v_{int} + 1$ and sent to stage 4. The digit $e_1$ is obtained from
look-up table I based on the two most significant digits (MSDs) of
$v_{frac}$. $v_{frac}$ is multiplied by the 2-digit constant 2.3 in a
multiple logic (Mult1) to obtain the $(q+3)$-digit value
$m = 2.3 \times v_{frac}$ in carry-save representation $(m_s, m_c)$.
The output $m$ of Mult1 is shifted two digits to the left to obtain
$10W[1]$ (where $W[1] = 10 \times 2.3 \times v_{frac}$), and the corresponding
value of $-230 \log_{10}(1 + e_1 10^{-1})$ is obtained from look-up
table II. Then, the values of $10W[1]$ and
$-230 \log_{10}(1 + e_1 10^{-1})$ are sent to stage 2 through registers.
Stage 2, from the second to the $(q+1)$th clock cycles
(in iterations $j = 2$ to $j = q + 1$): In the second clock
cycle, the residual $W[j]$ is obtained by adding $10W[1]$
and $-230 \log_{10}(1 + e_1 10^{-1})$ in a decimal 3:2
CSA compressor. Then, the digit $e_j$ is obtained by
rounding the 3-digit $\hat{W}[j]$ in a rounding $e_j$ logic. This can be
expressed by

$$(W_s[j], W_c[j]) = 10W[1] - 230 \log_{10}(1 + e_1 10^{-1})$$
$$e_j = \mathrm{round}(\hat{W}[j])$$

The $W[j]$ in carry-save representation is shifted one digit
to the left to obtain $10W[j]$, which is sent back to
Mux2 for the next iteration. From iteration $j = 2$ to
iteration $j = h$, the value of $-2.3 \times 10^{j+1} \times \log_{10}(1 + e_j 10^{-j})$
is obtained from look-up table II and sent
back to Mux1 for the next iteration. This can be expressed
by

$$(W_s[j+1], W_c[j+1]) = 10W[j] - 2.3 \times 10^{j+1} \times \log_{10}(1 + e_j 10^{-j})$$
$$e_{j+1} = \mathrm{round}(\hat{W}[j+1])$$

From iteration $j = h + 1$ to iteration $j = q + 1$,
the value of $-2.3 \times 10 \times e_j / \ln(10)$ is obtained from
look-up table III and sent back to Mux1. This can be
expressed by

$$(W_s[j+1], W_c[j+1]) = 10W[j] - 2.3 \times 10 \times e_j / \ln(10)$$
$$e_{j+1} = \mathrm{round}(\hat{W}[j+1])$$

After the $(q+1)$th clock cycle, all the digits $e_j$ have been obtained
by the selection by rounding.
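The two kinds of stored terms used in stage 2 can be compared numerically. The sketch below is a software illustration only: `table2_entry` and `table3_entry` are hypothetical names for the exact logarithmic term (look-up table II) and its linear approximation (look-up table III), evaluated with Python's `decimal` module rather than read from a ROM.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40
G = Decimal("2.3")  # scaling constant

def table2_entry(j, e):
    """Exact term -2.3 * 10^(j+1) * log10(1 + e * 10^-j) (look-up table II)."""
    f = 1 + Decimal(e) * Decimal(10) ** (-j)
    return -G * Decimal(10) ** (j + 1) * f.log10()

def table3_entry(e):
    """Linear approximation -2.3 * 10 * e / ln(10) (look-up table III)."""
    return -G * 10 * Decimal(e) / Decimal(10).ln()

if __name__ == "__main__":
    # The gap between the two terms shrinks by about 10x per iteration.
    for j in (2, 5, 9, 10):
        diff = table2_entry(j, 9) - table3_entry(9)
        print(f"j={j}: exact - approximation = {diff:.3E}")
```

Negative terms appear in the data-path in 10's complement form, so `1000 + table3_entry(3)` corresponds to a stored word of the form `970.03…`.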
Stage 3, from the second to the $(q+2)$th clock cycles (in
iterations $j = 1$ to $j = q + 1$): In the second clock cycle, the
2-digit $e_1$ is concatenated with 9 and zeros, and it is shifted
one digit to the right to obtain $e_1 10^{-1}$ in a barrel shifter and
then selected by Mux4. Meanwhile, $E[1] = 1$ is selected by
Mux5. The decimal significand result $E[2]$ of the first
iteration is obtained in the $(q+3)$-digit decimal 4:2 CSA
compressor. This can be expressed by

$$(E_s[2], E_c[2]) = 1 + e_1 10^{-1}$$

From the third to the $(q+2)$th clock cycles, the intermediate
value $e_j E[j]$ output by a multiple logic (Mult2) is shifted
$j$ digits to the right to obtain $e_j E[j] 10^{-j}$ in a barrel shifter.
The value of $E[j]$ is selected by Mux5 for the computation
of $E[j+1]$ in the next iteration. This can be expressed by

$$((e_j E[j] 10^{-j})_s, (e_j E[j] 10^{-j})_c) = e_j E[j] 10^{-j}$$
$$(E_s[j+1], E_c[j+1]) = e_j E[j] 10^{-j} + E[j]$$

After the $(q+2)$th clock cycle, the $(q+3)$-digit decimal
significand of the DFP antilogarithm result is obtained.
Stage 4, in the $(q+3)$th clock cycle: The sum and carry of the
$(q+1)$-digit MSDs of the fractional part of $E_s[j]$ and $E_c[j]$,
$E_s$ and $E_c$
v_frac = −0.85763088829868920, e_1 = −8.6, m = 2.3 × v_frac (m_s = 7926448956923414740, m_c = 0101000000001100100), h = 9, g = 2

W[j] quantity                        Ŵ[j]   W[j]                       E[j] quantity            E[j]
W_s[1]                                      979.264489569234147400     In first clock cycle (first iteration)
W_c[1]                                      001.010000000011001000
10W_s[1]                                    792.644895692341474000
10W_c[1]                                    010.100000000110010000     E_s[1]                   1.000000000000000000
−2.3 log10(1 + e_1 10^−1) × 10^2            +196.390551794005254100    E_c[1]                   0.000000000000000000
W_s[2]                               980    898.034346386456638100     (E[1]e_1 10^−1)_s        9.140000000000000000
W_c[2]                               011    101.101101100000100000     (E[1]e_1 10^−1)_c        +0.000000000000000000
Ŵ[2] = −0.8, e_2 = −1                                                  E_s[2]                   0.140000000000000000
10W_s[2]                                    980.343463864566381000     E_c[2]                   0.000000000000000000
10W_c[2]                                    011.011011000001000000     In second clock cycle (second iteration)
−2.3 log10(1 + e_2 10^−2) × 10^3            +010.039052425635195000
W_s[3]                               013    001.383426289192476000     (E[2]e_2 10^−2)_s        9.887488888888888888
W_c[3]                               000    000.010101001010100000     (E[2]e_2 10^−2)_c        +0.111111111111111111
Ŵ[3] = +1.3, e_3 = +1                                                  E_s[3]                   9.038599999999999999
10W_s[3]                                    013.834262891924760000     E_c[3]                   1.100000000000000000
10W_c[3]                                    000.101010010101000000     In third clock cycle (third iteration)
−2.3 log10(1 + e_3 10^−3) × 10^4            +990.026217975671270000
⋮
W_s[10]                              021    002.199071104000000000     (E[9]e_9 10^−9)_s        9.999999984725084930
W_c[10]                              010    001.001011000000000000     (E[9]e_9 10^−9)_c        +0.000000011111110111
Ŵ[10] = +3.1, e_10 = +3                                                E_s[10]                  0.028782483451741196
10W_s[10]                                   021.990711040000000000     E_c[10]                  0.110011011010010000
10W_c[10]                                   010.010110000000000000     In 10th clock cycle (10th iteration)
−2.3 × 10 × e_10 / ln(10)                   +970.033680748675623892
W_s[11]                              019    001.933401788675623892
W_c[11]                              001    000.101100000000000000
Ŵ[11] = +2.0, e_11 = +2
10W_s[11]                                   019.334017886756238920
10W_c[11]                                   001.011000000000000000     In 11th clock cycle (11th iteration)
−2.3 × 10 × e_11 / ln(10)                   +980.022453832450415928
⋮
W_s[17]                              036    993.644904860602385560     E_s[17]                  9.027693494806399868
W_c[17]                              011    111.110100110111100000     E_c[17]                  1.111110000100001100
Ŵ[17] = +4.7, e_17 = +5                                                In 17th clock cycle (17th iteration)

In 18th clock cycle:
(E[17]e_17 10^−17)_s    9.999999999999999906
(E[17]e_17 10^−17)_c    +0.000000000000000100
E_s[18]                 0.138682384895290964
E_c[18]                 0.000111110011110010

In 19th clock cycle:
E_s = .13868238489529096 (r_d), E_c = .00011111001111001
compound addition with increment (+1): E = .1387934949064010
v_int = 0, R_exp = v_int − 16 = −16
R_sign = 0, R_exp = −16 ("111110000"), R_significand = 1387934949064010
The $e_1$ generator is implemented straightforwardly
according to Table 1. The corresponding $(q+g+3)$-digit
value $-230 \log_{10}(1 + e_1 10^{-1})$ is obtained from look-up table
I. Since $g$ is equal to 2, 2 and 3 for the three formats, the size
of look-up table I is $(2^{5} \times 48)$-bit, $(2^{5} \times 84)$-bit and
$(2^{5} \times 160)$-bit for Decimal32, Decimal64 and Decimal128,
respectively.
The multiple logic (Mult1) is applied to compute
$m = 2.3 \times v_{frac}$, where $v_{frac}$ is a $(q+1)$-digit negative value
in the range $-1 < m \le 0$. Mult1 is implemented
based on the partial product generation logic presented in
[25]. The multiples are formed by adding two of an initial
multiple set: $-3m$ (obtained by adding $-5m$ and $2m$ in a 3:2
CSA counter) and $-20m$ (obtained by shifting $-2m$ one digit to
the right). Both $2m$ and $5m$ can be generated with only
a few logic delays. The Boolean equations for generating
the double and the quintuple of a BCD number are presented in
[26]. To decrease the delay of the addition, two levels of
decimal CSA adders are implemented to develop the multiples
$m_s$ and $m_c$. The Boolean equation for computing the 1-digit
decimal addition of BCD numbers is presented in [26].
The signals cin1 and cin2 are generated to supplement the
LSD owing to the 9's complement conversion ($-2m$ and
$-5m$). The signals cin1 and cin2 are added in the LSD and
the second LSD of the first level of CSA adders, respectively.
4.2.2 Digit recurrence stage: Fig. 3 shows the details of
the hardware implementation of the digit recurrence stage
(stage 2).
In stage 2: The 3:2 decimal CSA compressor, applied to
obtain the residual $(W_s[j], W_c[j])$, is implemented by one
level of $(q+g+3)$-digit 3:2 CSA counters. Then, the 1-digit
sign $(S_s, S_c)$, the 1-digit integer part $(I_s, I_c)$ and the 1-digit fraction
part $(f_s^{MSD}, f_c^{MSB})$ of the residual are sent to the rounding $e_j$
logic for selecting the digits $e_j$ by rounding the residual
$(\hat{W}_s[j], \hat{W}_c[j])$. The sign of the digit $e_j$ is obtained by the sign
detector block, which is implemented based on the equation
$$\mathrm{sign} = (S_s^{0} \oplus S_c) \oplus (I_s^{3} \wedge I_s^{0} \wedge I_c)$$
The 1-digit fraction $(f_s^{MSD}, f_c^{MSB})$ and the value 5 are added
together in the 1-digit decimal full adder to generate the signal
carry, which determines the rounding operation. The signals
carry and sign are sent to the selection generator to obtain
the control signal sel of a 4-to-1 multiplexer. The value of $|e_j|$
is obtained in four parallel full adders by adding the values
0, 1, 6 and 5 to the signals $f_s^{MSD}$ and $f_c^{MSB}$, respectively

$$|e_j| = \begin{cases}
I_s + I_c + 0 & \text{if } \mathrm{sign} = 0 \wedge \mathrm{carry} = 0 \\
I_s + I_c + 1 & \text{if } \mathrm{sign} = 0 \wedge \mathrm{carry} = 1 \\
I_s + I_c + 6 & \text{if } \mathrm{sign} = 1 \wedge \mathrm{carry} = 0 \\
I_s + I_c + 5 & \text{if } \mathrm{sign} = 1 \wedge \mathrm{carry} = 1
\end{cases}$$

Thus, the digit $e_j$ is obtained by concatenating the 1-bit sign with
the 1-digit $|e_j|$.
Fig. 2 Details of hardware implementation of Stage 1 in Fig. 1
Fig. 3 Details of hardware implementation of Stage 2 in Fig. 1
Look-up table II stores all the $(q+g+3)$-digit values
$-2.3 \times 10^{j+1} \times \log_{10}(1 + e_j 10^{-j})$, where $j$ is in the range
$1 \le j \le h$. Since $e_j$ is in the range $-9 \le e_j \le 9$, there
are 18 different values (excluding the value for $e_j = 0$)
that need to be stored in the look-up table for each iteration.
Since $h$ is equal to 4, 9 and 18 and $g$ is equal to 2, 2 and
3 for the three formats, the size of look-up table II is
$(2^{6} \times 48)$-bit, $(2^{8} \times 84)$-bit and $(2^{9} \times 160)$-bit for
Decimal32, Decimal64 and Decimal128, respectively. In
order to reduce the size and delay of look-up table II, the
$(q+g+3)$-digit values $-2.3 \times 10^{j+1} \times \log_{10}(1 + e_j 10^{-j})$ can be
efficiently reallocated across multiple tables. For Decimal64,
shown in Fig. 3, the single look-up table II is split
into two parts: the first part (TabII 1) stores all the
values of $-2.3 \times 10^{j+1} \times \log_{10}(1 + e_j 10^{-j})$ for
$2 \le j \le 9$ and $e_j = \pm 1$; the second part (TabII 2) stores the
values for $2 \le j \le 9$ with $2 \le e_j \le 9$ or $-9 \le e_j \le -2$.
The sizes of TabII 1 and TabII 2 are $(2^{4} \times 84)$ and
$(2^{7} \times 84)$ bits, respectively. Thus, the optimised size of look-up
table II is reduced from 2.64 to 1.48 kB. Look-up table
III stores 19 values of the $(q+g+3)$-digit term
$-2.3 \times e_j / \ln(10) \times 10$, so it is implemented as a
$(2^{5} \times 84)$-bit look-up table. Thus, the total optimised size of the
look-up tables is about 2.14 kB for Decimal64. The
implementations of the address generators that address look-up
table II and look-up table III based on the values of $j$ and
$e_j$ are straightforward.
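The table sizes quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes 4 bits per BCD digit, 1 kB = 1024 bytes, and that the quoted total counts look-up tables I, II (optimised) and III together; the function names are hypothetical.

```python
def entry_bits(q, g):
    """Width in bits of one (q + g + 3)-digit BCD look-up table entry."""
    return 4 * (q + g + 3)

def table_kb(rows_log2, width_bits):
    """Size in kB (1 kB = 1024 bytes, assumed) of a 2^rows_log2-entry table."""
    return (2 ** rows_log2) * width_bits / 8 / 1024

if __name__ == "__main__":
    width = entry_bits(16, 2)                            # Decimal64 entry width
    tab2_opt = table_kb(4, width) + table_kb(7, width)   # TabII 1 + TabII 2
    total = table_kb(5, width) + tab2_opt + table_kb(5, width)  # tables I, II, III
    print(width, round(tab2_opt, 2), round(total, 2))
```

Under these assumptions the optimised look-up table II comes to about 1.48 kB and the three tables together to about 2.13 kB, close to the 2.14 kB quoted for Decimal64.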
4.2.3 Antilogarithm computation stage: Fig. 4 shows
the details of the hardware implementation of the
antilogarithm computation stage (stage 3).
In stage 3: The multiple logic (Mult2) is applied to
compute $E_s[j]e_j + E_c[j]e_j$, where $e_j$ is in the range
$-9 \le e_j \le 9$. The multiple $E_s[j]e_j$ is formed by adding
two of an initial multiple set $m$, $-m$, $2m$, $-2m$, $5m$,
$-5m$, $10m$, $-10m$
[…]s com($E_c[j]_i \wedge e_j^{3:0}$) if sign($e_j$) = 1
Thus, $E[j]e_j$ $((e_j E[j])_s, (e_j E[j])_c)$ is obtained by adding
$E_s[j]e_j$ and $E_c[j]e_j$ in a decimal CSA adder. Finally,
$(E[j]e_j 10^{-j})_s$ and $(E[j]e_j 10^{-j})_c$ are obtained in a decimal
barrel shifter. The 4:2 decimal CSA compressor, applied to
add $E[j]$ $(E_s[j], E_c[j])$ and $E[j]e_j 10^{-j}$
$((E[j]e_j 10^{-j})_s, (E[j]e_j 10^{-j})_c)$ together, is implemented by two levels of
$(q+3)$-digit 3:2 CSA counters.
4.2.4 Final processing stage: Fig. 5 shows the details of
the hardware implementation of the final processing stage
(stage 4).
4.2.4 Final processing stage: Fig. 5 shows the details of
the hardware implementation for the nal processing stage
(stage 4).
Fig. 4 Details of hardware implementation of Stage 3 in Fig. 1
In stage 4: The decimal compound adder is implemented
based on the conditional speculative method [27]. A prefix
tree is implemented based on the binary Kogge–Stone
network [28]. The additions $E_s^{i} + E_c^{i}$ and $E_s^{i} + E_c^{i} + 1$
can be implemented using three binary half adders and a
binary full adder connected as a ripple carry chain. The
logic for adding the value of 6 is used to compensate the E
j
in a very-high radix combined division/square-root unit with scaling
and selection by rounding', IEEE Trans. Comput., 1998, 47, (2),
pp. 152–161
15 Antelo, E., Lang, T., Bruguera, J.: 'High-radix CORDIC rotation based
on selection by rounding', J. VLSI Signal Process. Syst., 2000, 25, (2),
pp. 141–153
16 Pineiro, A., Ercegovac, M.D., Bruguera, J.D.: 'High-radix logarithm
with selection by rounding: algorithm and implementation', J. VLSI
Signal Process. Syst., 2005, 40, (1), pp. 109–123
17 Chen, D., Han, L., Choi, Y., Ko, S.: 'Improved decimal floating-point
logarithmic converter based on selection by rounding', IEEE Trans.
Comput., 2012, 61, (5), pp. 607–621
18 Chen, D., Zhang, Y., Teng, D., Wahid, K., Lee, M.H., Ko, S.-B.: 'A new
decimal antilogarithmic converter'. IEEE Symp. on Circuit and System
(ISCAS'09), 2009, pp. 445–448
19 Vazquez, A., Villalba, J., Antelo, E., Zapata, E.L.: 'Redundant floating-
point decimal CORDIC algorithm', IEEE Trans. Comput., PrePrint,
2012
20 Kaivani, A., Jaberipur, G.: 'Decimal CORDIC rotation based on selection
by rounding: algorithm and architecture', Comput. J., 2011, 54, (11),
pp. 1798–1809
Table 6 Comparison results of the Decimal64 antilogarithmic converter with other designs

Works               Cycle time (FO4)   Cycles (No.)   Latency (FO4)   Ratio   Area (NAND)   Ratio   ROM (kB)
proposed            28.0               19             532.0           1.00    29 325        1.00    2.14
proposed (a)        28.0               18             504.0           0.95    26 197        0.89    2.14
previous [18] (a)   110.0              18             1980.0          3.72    14 073        0.48    2.14
CORDIC [20] (b)     34.62              35             1211.7          2.40    18 826        0.64    4.50
CORDIC [19]         13.0               200            2600.0          4.89    N/A           N/A     N/A
software [10]       23.0               1060           24 380          45.8    N/A           N/A     N/A
Software library running on an Intel Core(TM) 2 Quad @ 2.66 GHz
(a) 16-digit DXP antilogarithmic converter
(b) 16-digit DXP CORDIC unit
21 Cowlishaw, M.F.: 'Densely packed decimal encoding', J. IEE Comput.
Digit. Tech., 2002, 149, (3), pp. 102–104
22 Cornea, M., Harrison, J., Anderson, C., Tang, P.T.P., Schneider, E.,
Gvozdev, E.: 'A software implementation of the IEEE 754R decimal
floating-point arithmetic using the binary encoding format', IEEE
Trans. Comput., 2009, 58, (2), pp. 148–162
23 Lefevre, V., Muller, J.M., Tisserand, A.: 'Toward correctly
rounded transcendentals', IEEE Trans. Comput., 1998, 47, (11),
pp. 1235–1243
24 Oklobdzija, V.G.: 'An algorithmic and novel design of a leading zero
detector circuit: comparison with logic synthesis', IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., 1994, 2, (1), pp. 124–128
25 Lang, T., Nannarelli, A.: 'A radix-10 combinational multiplier'. IEEE
Asilomar Conf. on Signals, Systems and Computers (ACSSC'06),
2006, pp. 313–317
26 Erle, M.A., Schulte, M.J.: 'Decimal multiplication via carry-save
addition'. 14th IEEE Int. Conf. on Application-Specific Systems,
Architectures, and Processors (ASAP'03), 2003, pp. 348–358
27 Vazquez, A., Antelo, E.: 'Conditional speculative decimal addition'.
Seventh Conf. on Real Numbers and Computers (RNC 7), 2006,
pp. 47–57
28 Kogge, P.M., Stone, H.S.: 'A parallel algorithm for the efficient solution
of a general class of recurrence equations', IEEE Trans. Comput., 1973,
C-22, (8), pp. 786–793
29 Deschamps, J.-P., Bioul, G.J.A., Sutter, G.D.: 'Synthesis of
arithmetic circuits: FPGA, ASIC and embedded systems' (Wiley,
2006, 1st edn.)
30 STMicroelectronics, 90 nm CMOS090 Design Platform, 2007
31 Sutherland, I., Sproull, R., Harris, D.: 'Logical effort: designing fast
CMOS circuits' (Morgan Kaufmann, 1999, 1st edn.)
32 Intel Corporation, 'Using decimal floating-point with Intel C++
compiler', http://software.intel.com/en-us/articles/using-decimal-floating-point-with-intel-c-compiler, 2010
33 Pineiro, A., Ercegovac, M.D., Bruguera, J.D.: 'Algorithm and
architecture for logarithm, exponential, and powering computation',
IEEE Trans. Comput., 2004, 53, (9), pp. 1085–1096