
A Real/Complex Logarithmic Number System ALU
Mark G. Arnold, Member, IEEE, and Sylvain Collange
Abstract: The real Logarithmic Number System (LNS) offers fast multiplication but uses more expensive addition. Cotransformation and higher-order table methods allow real LNS ALUs with reasonable precision on Field-Programmable Gate Arrays (FPGAs). The Complex LNS (CLNS) is a generalization of LNS which represents complex values in log-polar form. CLNS is a more compact representation than traditional rectangular methods, reducing bus and memory cost in the FFT; however, prior CLNS implementations were either slow CORDIC-based or expensive 2D-table-based approaches. Instead, we reuse real LNS hardware for CLNS, with specialized hardware (including a novel logsin unit that overcomes singularity problems) that is smaller than the real-valued LNS ALU to which it is attached. All units were derived from the Floating-Point-Cores (FloPoCo) library. FPGA synthesis shows our CLNS ALU is smaller than prior fast CLNS units. We also compare the accuracy of prior and proposed CLNS implementations. The most accurate of the proposed methods increases the error in radix-two FFTs by less than half a bit, and a more economical FloPoCo-based implementation increases the error by only one bit.

Index Terms: Complex arithmetic, logarithmic number system, hardware function evaluation, FPGA, fast Fourier transform, VHDL.

1 INTRODUCTION
The usual approach to complex arithmetic works with pairs of real numbers that represent points in a rectangular coordinate system. To multiply two complex numbers, denoted in this paper by upper-case variables, X̄ and Ȳ, using rectangular coordinates involves four real multiplications:

Re[X̄ Ȳ] = Re[X̄] Re[Ȳ] - Im[X̄] Im[Ȳ],   (1)
Im[X̄ Ȳ] = Re[X̄] Im[Ȳ] + Im[X̄] Re[Ȳ],   (2)

where Re[X̄] is the real part and Im[X̄] is the imaginary part of a complex value, X̄. The bar indicates that this is an exact, linearly represented, unbounded-precision number. There are many implementation alternatives for the real arithmetic in (1) and (2): fixed-point, Floating-Point (FP), or more unusual systems like the scaled Residue Number System (RNS) [7] or the real Logarithmic Number System (LNS) [17]. Swartzlander et al. [17] analyzed fixed-point, FP, and LNS usage in a Fast Fourier Transform (FFT) implemented with rectangular coordinates, and found several advantages in using logarithmic arithmetic to represent Re[X̄] and Im[X̄].
Several hundred papers [24] have considered conventional,
real-valued LNS; most have found some advantages for low-
precision implementation of multiply-rich algorithms, like
(1) and (2).
This paper considers an even more specialized number
system in which complex values are represented in log-
polar coordinates. The advantage is that the cost of complex
multiplication and division is reduced even more than in
rectangular LNS at the cost of a very complicated addition
algorithm. This paper proposes a novel approach to reduce
the cost of this log-polar addition algorithm by using a
conventional real-valued LNS ALU, together with addi-
tional hardware that is less complex than the real-valued
LNS ALU, to which it is attached.
Arnold et al. [4] introduced a generalization of logarith-
mic arithmetic, known as the Complex Logarithmic
Number System (CLNS), which represents each complex
point in log-polar coordinates. CLNS was inspired by a
nineteenth century paper by Mehmke [14] on manual usage
of such log-polar representation, and shares only the polar
aspect with the highly-unusual complex-level-index repre-
sentation [18]. The initial CLNS implementations, like the
manual approach, were based on straight table lookup,
which grows quite expensive as precision increases, since
the lookup involves a 2D table.
Cotransformation [4] cuts the cost of CLNS significantly by converting difficult cases for addition and subtraction into easier cases. In CLNS, it is easy to break a complex value into two parts such that X̄ = X̄_1 X̄_2. Incrementation of the complex value can be described in terms of these two parts:

1 + X̄ = 1 + X̄_1 X̄_2 + X̄_1 - X̄_1
      = (1 + X̄_1) + (X̄_1 (X̄_2 - 1))
      = (1 + X̄_1) (1 + X̄_1 (X̄_2 - 1) / (1 + X̄_1)).

The CLNS representations of (1 + X̄_1) and (X̄_2 - 1) can be obtained from smaller tables, and the incrementation of X̄_1 (X̄_2 - 1) / (1 + X̄_1) is easier. Although this saves memory, the overall table sizes are still large.
Lewis [13] overcame such large area requirements in the
design of a 32-bit CLNS ALU by using a CORDIC algorithm,
which is significantly less expensive; however, the imple-
mentation involves many steps, making it rather slow.
Despite the implementation cost, CLNS may be preferred
in certain applications. For example, Arnold et al. [2], [3]
. M.G. Arnold is with the Computer Science and Engineering Department,
Lehigh University, Bethlehem, PA 18015. E-mail: marnold@cse.lehigh.edu.
. S. Collange is with ELIAUS, Universite de Perpignan, 52 av. Paul Alduy,
66860 Perpignan Cedex, France.
Manuscript received 12 Aug. 2009; accepted 22 Feb. 2010; published online
18 June 2010.
Recommended for acceptance by J. Bruguera, M. Cornea, and D. Das Sarma.
For information on obtaining reprints of this article, please send e-mail to:
tc@computer.org, and reference IEEECS Log Number TCSI-2009-08-0381.
Digital Object Identifier no. 10.1109/TC.2010.154.
showed that CLNS is significantly more compact than a
comparable rectangular fixed-point representation for a
radix-two FFT, making the memory and busses in the system
much less expensive. These savings counterbalance the extra
cost of the CLNS ALU, if the precision requirements are low
enough. Vouzis et al. [20] analyzed a CLNS approach for
Orthogonal Frequency Division Multiplexing (OFDM) de-
modulation of Ultrawideband (UWB) receivers. To reduce
the implementation cost of such CLNS applications, Vouzis
and Arnold [19] proposed using a Range-Addressable
Lookup Table (RALUT), similar to [15].
2 REAL-VALUED LNS
The format of the real-valued LNS [16] has a base-b (usually, b = 2) logarithm (consisting of k_L signed integer bits and f_L fractional bits) to represent the absolute value of a real number, and often an additional sign bit to allow for that real to be negative. We can describe the ideal logarithmic transformation and the quantization with distinct notations. For an arbitrary nonzero real, x̄, the ideal (infinite-precision) LNS value is x_L = log_b |x̄|. The exact linear x̄ has been transformed into an exact but nonlinear x_L, which is then quantized as x̂_L = ⌊x_L 2^{f_L} + 0.5⌋ 2^{-f_L}. In analogy to the FP number system, the k_L integer bits behave like an FP exponent (negative exponents mean smaller than unity; positive exponents mean larger than unity); the f_L fractional bits are similar to the FP mantissa. Multiplication or division simply requires adding or subtracting the logarithms and exclusive-ORing the sign bits.
To compute real LNS sums, a conventional LNS ALU computes a special function, known as s_b(z) = log_b(1 + b^z), which, in effect, increments the LNS representation by 1.0. (The s reminds us this function is only used for sums, when the sign bits are the same.) The LNS addition algorithm starts with x_L = log_b |x̄| and y_L = log_b |ȳ| already in LNS format. The result, t_L = log_b |x̄ + ȳ|, is computed as t_L = y_L + s_b(x_L - y_L), which is justified by the fact that t̄ = x̄ + ȳ = ȳ (x̄/ȳ + 1), even though such a step never occurs in the hardware. Since x̄ + ȳ = ȳ + x̄, we can always choose z = -|x_L - y_L| [16] by simultaneously choosing the maximum of x_L and y_L. In other words, t_L = max(x_L, y_L) + s_b(-|x_L - y_L|), thereby restricting z ≤ 0.

Analogously to s_b, there is a similar function to compute the logarithm of differences, with t_L = max(x_L, y_L) + d_b(-|x_L - y_L|), where d_b(z) = log_b |1 - b^z|. The decision whether to compute s_b or d_b is based on the signs of the real values.
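For readers who want to experiment with these formulas, the following is a minimal behavioral sketch in Java (matching the object-oriented simulation style of Section 8). It assumes b = 2 and evaluates s_2 and d_2 in double precision rather than with the table or interpolation hardware discussed later; the class and method names are ours and purely illustrative.

// Behavioral model of real LNS arithmetic: quantization, multiplication as an
// addition of logarithms, and addition/subtraction via s_2/d_2.  Not a hardware
// description; double-precision math stands in for the approximation units.
public class RealLnsSketch {
    static final int fL = 7;                                  // fraction bits f_L

    // Quantize an ideal logarithm to f_L fractional bits (round to nearest).
    static double quantize(double xL) { return Math.floor(xL * (1 << fL) + 0.5) / (1 << fL); }

    static double s2(double z) { return Math.log(1.0 + Math.pow(2.0, z)) / Math.log(2.0); }
    static double d2(double z) { return Math.log(Math.abs(1.0 - Math.pow(2.0, z))) / Math.log(2.0); }

    // Multiplication: add the logarithms; the sign bits are simply XORed.
    static double mulLog(double xL, double yL) { return xL + yL; }
    static boolean mulSign(boolean sx, boolean sy) { return sx ^ sy; }

    // Sum of magnitudes (equal signs): t_L = max(x_L, y_L) + s_2(-|x_L - y_L|).
    static double addLog(double xL, double yL) { return Math.max(xL, yL) + s2(-Math.abs(xL - yL)); }

    // Difference of magnitudes (opposite signs): uses d_2 instead of s_2.
    static double subLog(double xL, double yL) { return Math.max(xL, yL) + d2(-Math.abs(xL - yL)); }

    public static void main(String[] args) {
        double xL = quantize(Math.log(3.0) / Math.log(2.0));  // represents 3.0
        double yL = quantize(Math.log(5.0) / Math.log(2.0));  // represents 5.0
        System.out.println("3*5 ~ " + Math.pow(2.0, mulLog(xL, yL)));
        System.out.println("3+5 ~ " + Math.pow(2.0, addLog(xL, yL)));
        System.out.println("5-3 ~ " + Math.pow(2.0, subLog(xL, yL)));
    }
}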
Early LNS implementations [16] used ROM to lookup ^ :
/
and
^
d
/
achieving moderate ()
1
= 12 bits) accuracy. More
recent methods [8], [9], [12], [21], [22] can obtain accuracy
near single-precision floating point at reasonable cost. The
goal here is to leverage the recent advances made in real-
valued LNS units for the more specialized context of CLNS.
3 COMPLEX LOGARITHMIC NUMBER SYSTEM
CLNS uses a log-polar approach for X̄ ≠ 0 in which we define, for hardware simplicity,

log_b(X̄) = X = X_L + X_θ i,   (3)

where X_L = log_b(sqrt(Re[X̄]^2 + Im[X̄]^2)) is the logarithm (base b) of the length of a vector in the complex plane and -π ≤ X_θ < π is the angle of that vector.^1 This can be converted back exactly to rectangular form:

Re[X̄] = b^{X_L} cos(X_θ),
Im[X̄] = b^{X_L} sin(X_θ).   (4)
In a practical implementation, both the logarithm and the angle will be quantized:

X̂_L = ⌊X_L 2^{f_L} + 0.5⌋ 2^{-f_L},
X̂_θ = ⌊(4/π) X_θ 2^{f_θ} + 0.5⌋ 2^{-f_θ}.   (5)

We scale the angle by 4/π so that the quantization near the unit circle will be roughly the same in the angular and radial axes when f_L = f_θ, while allowing the complete circle of 2π radians to be represented by a power of two. The rectangular value, X̃, represented by the quantized CLNS has the following real and imaginary parts:

Re[X̃] = b^{X̂_L} cos((π/4) X̂_θ),
Im[X̃] = b^{X̂_L} sin((π/4) X̂_θ).   (6)
To summarize our notation, an arbitrary-precision complex value X̄ ≠ 0 is transformed losslessly to its ideal CLNS representation, X. Quantizing X to X̂ produces an absolute error perceived by the user in the rectangular system as |X̄ - X̃|.
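As a concrete illustration of (3)-(6), the following Java sketch (ours; all names are illustrative) converts a rectangular complex value to the quantized CLNS pair and back, assuming b = 2 and the 4/π angle scaling:

// Rectangular <-> quantized CLNS conversion per (3), (5), and (6).  The angle is
// stored scaled by 4/pi, so the full circle spans [-4, 4).  Sketch only.
public class ClnsCodec {
    static final int fL = 7, fTheta = 7;                     // fraction bits f_L, f_theta

    static double round(double v, int f) { return Math.floor(v * (1 << f) + 0.5) / (1 << f); }

    // (3) and (5): rectangular -> (X_L hat, X_theta hat).
    static double[] encode(double re, double im) {
        double xL = Math.log(Math.hypot(re, im)) / Math.log(2.0);
        double xTheta = Math.atan2(im, re) * 4.0 / Math.PI;  // scaled angle in (-4, 4]
        return new double[] { round(xL, fL), round(xTheta, fTheta) };
    }

    // (6): quantized CLNS -> rectangular.
    static double[] decode(double[] x) {
        double mag = Math.pow(2.0, x[0]);
        return new double[] { mag * Math.cos(Math.PI / 4.0 * x[1]),
                              mag * Math.sin(Math.PI / 4.0 * x[1]) };
    }

    public static void main(String[] args) {
        double[] x = encode(3.0, 4.0);                       // 3 + 4i
        double[] back = decode(x);
        System.out.println("CLNS: (" + x[0] + ", " + x[1] + ")  back: "
                           + back[0] + " + " + back[1] + "i");
    }
}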
Complex multiplication and division are trivial in CLNS. Given the exact CLNS representations, X and Y, the result of multiplication, Z = X + Y, can be computed by two parallel adders:

Z_L = X_L + Y_L,
Z_θ = (X_θ + Y_θ) mod 2π.   (7)

The modular operation for the imaginary part is not strictly necessary, but it reduces the range of angles that the hardware needs to accept. In the quantized system, this reduction comes at no cost in the underlying binary adder because of the 4/π scaling.
The conjugate of a value is represented by the conjugate of the representation, which can be formed by negating Z_θ. The negative can be formed by adding π to Z_θ. Division is like multiplication, except the representations are subtracted.
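In code, (7) and the sign manipulations above reduce to fixed-point additions and subtractions on the two fields; a short sketch (ours, operating on the scaled-angle representation of (5)) follows:

// CLNS multiply, divide, negate, and conjugate on the scaled-angle pair
// (log-magnitude, angle in units of pi/4 within [-4, 4)).  Illustrative only.
final class ClnsMulDiv {
    static double wrap8(double t) {                          // angle modulo the full circle of 8
        double r = t % 8.0;
        if (r >= 4.0) r -= 8.0;
        if (r < -4.0) r += 8.0;
        return r;
    }
    // Z = X * Y: add log-magnitudes and angles (two parallel adders), per (7).
    static double[] mul(double[] x, double[] y) { return new double[] { x[0] + y[0], wrap8(x[1] + y[1]) }; }
    // Z = X / Y: subtract both fields.
    static double[] div(double[] x, double[] y) { return new double[] { x[0] - y[0], wrap8(x[1] - y[1]) }; }
    // Conjugate: negate the angle.  Negate: add pi, which is 4 in scaled units.
    static double[] conj(double[] x) { return new double[] { x[0], wrap8(-x[1]) }; }
    static double[] neg(double[] x)  { return new double[] { x[0], wrap8(x[1] + 4.0) }; }
}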
The difficult issue for all variations of LNS is addition and subtraction. Like conventional LNS, CLNS utilizes X̄ + Ȳ = Ȳ (Z̄ + 1), where Z̄ = X̄/Ȳ. This is valid for all nonzero complex values, except when X̄ = -Ȳ, a case that can be handled specially, as in real LNS.
The complex addition logarithm, s_b(Z), is a function [14] that accepts a complex argument, Z, the log-polar representation of Z̄, and returns the corresponding representation of Z̄ + 1. Thus, to find the representation, T, of the sum, T̄ = X̄ + Ȳ, simply involves

T = Y + s_b(X - Y),   (8)
which is implemented by two subtractors, a complex approximation unit, and two adders:

T_L = Y_L + Re[s_b(X - Y)],
T_θ = (Y_θ + Im[s_b(X - Y)]) mod 2π.   (9)

1. Unless b = e, log_b in this paper is defined slightly differently from the typical definition ln(X̄)/ln(b) used in complex analysis.
The major cost in this circuit is the complex s_b unit. Subtraction simply requires adding π to Y_θ before performing (8).
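Structurally, (8) and (9) amount to two subtractions, one evaluation of the complex addition logarithm, and two additions. The sketch below (ours) mirrors that datapath, leaving the complex addition logarithm as a black box and keeping angles in radians for simplicity:

// CLNS add/subtract datapath of (8)-(9); sb is the complex addition logarithm,
// supplied from outside (Sections 4-7 discuss how to realize it in hardware).
final class ClnsAddDatapath {
    final double lg, th;                                     // Z_L and Z_theta (radians)
    ClnsAddDatapath(double lg, double th) { this.lg = lg; this.th = th; }

    static double wrap(double th) {                          // reduce the angle to [-pi, pi)
        double t = Math.IEEEremainder(th, 2.0 * Math.PI);
        return (t >= Math.PI) ? t - 2.0 * Math.PI : t;
    }

    // T = Y + sb(X - Y): two subtractors, one approximation unit, two adders.
    static ClnsAddDatapath add(ClnsAddDatapath x, ClnsAddDatapath y,
                               java.util.function.UnaryOperator<ClnsAddDatapath> sb) {
        ClnsAddDatapath z = new ClnsAddDatapath(x.lg - y.lg, wrap(x.th - y.th));  // Z = X - Y
        ClnsAddDatapath s = sb.apply(z);                                          // s_b(Z)
        return new ClnsAddDatapath(y.lg + s.lg, wrap(y.th + s.th));
    }

    // Subtraction: add pi to the angle of Y (negation), then add.
    static ClnsAddDatapath sub(ClnsAddDatapath x, ClnsAddDatapath y,
                               java.util.function.UnaryOperator<ClnsAddDatapath> sb) {
        return add(x, new ClnsAddDatapath(y.lg, wrap(y.th + Math.PI)), sb);
    }
}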
4 NOVEL CLNS ADDITION ALGORITHM
The complex addition logarithm, s_b(Z), has the same algebraic definition as its real-valued (s_b(z)) counterpart, and simple algebra on the complex Z = Z_L + i Z_θ reveals the underlying polar conversion required to increment a CLNS value by one:

s_b(Z) = log_b(1 + b^Z)
       = log_b(1 + cos(Z_θ) b^{Z_L} + i sin(Z_θ) b^{Z_L}).   (10)

This complex function can have its real and imaginary parts computed separately. In all of the prior CLNS literature [2], [3], [13], the real part is derived from (10) as

Re(s_b(Z)) = log_b(1 + 2 cos(Z_θ) b^{Z_L} + b^{2 Z_L}) / 2,   (11)
and the imaginary part is derived from (10) as

Im(s_b(Z)) = arctan(1 + cos(Z_θ) b^{Z_L}, sin(Z_θ) b^{Z_L}),   (12)

where arctan(x, y) is the four-quadrant function, like atan2(y,x) in many programming languages.
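A double-precision reference model of (11) and (12) is straightforward and useful for checking any hardware approximation; it can also serve as a software stand-in for the complex approximation unit of (9). This is our own illustration, not the lookup hardware of the prior literature:

// Reference model of (11) and (12) for b = 2: returns s_b(Z) as a pair
// (real part, imaginary part), evaluated entirely in double precision.
final class ComplexAddLog {
    static final double LOG2 = Math.log(2.0);

    static double[] sb2(double zL, double zTheta) {
        double m = Math.pow(2.0, zL);                        // b^{Z_L}
        double re = 1.0 + Math.cos(zTheta) * m;              // Re(1 + b^Z)
        double im = Math.sin(zTheta) * m;                    // Im(1 + b^Z)
        double realPart = Math.log(1.0 + 2.0 * Math.cos(zTheta) * m + m * m) / (2.0 * LOG2); // (11)
        double imagPart = Math.atan2(im, re);                // (12), four-quadrant arctangent
        return new double[] { realPart, imagPart };
    }

    public static void main(String[] args) {
        // Increment the value 2 e^{i pi/3}: Z_L = 1, Z_theta = pi/3.
        double[] s = sb2(1.0, Math.PI / 3.0);
        // Cross-check against direct rectangular arithmetic: log2|1+Z| and arg(1+Z).
        double zr = 2.0 * Math.cos(Math.PI / 3.0), zi = 2.0 * Math.sin(Math.PI / 3.0);
        System.out.println(s[0] + " vs " + Math.log(Math.hypot(1.0 + zr, zi)) / LOG2);
        System.out.println(s[1] + " vs " + Math.atan2(zi, 1.0 + zr));
    }
}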
In the prior literature, (11) and (12) are computed directly from the complex Z, either using fast, but costly, simple lookup [20], or less expensive methods like cotransformation [4], CORDIC [13], or content-addressable memory [19]. The proposed approach uses a different derivation in which the real and imaginary parts are manipulated separately. Unlike the prior literature, in the proposed technique the real values in the intermediate computations are represented as conventional LNS real values; the angles remain as scaled fixed-point values at every step. The advantages of the proposed technique are lower cost and the ability to reuse the same hardware for both real- and complex-LNS operations. In order to use this novel approach, a different derivation from (10) is required, while still considering this a single complex function of a complex variable:
s_b(Z) = log_b(1 + sgncos(Z_θ) b^{logcos(Z_θ)} b^{Z_L} + i sgnsin(Z_θ) b^{logsin(Z_θ)} b^{Z_L}),   (13)

where sgncos(Z_θ) and sgnsin(Z_θ) are the signs of the respective real-valued trigonometric functions, and where logcos(Z_θ) = log_b |cos(Z_θ)| and logsin(Z_θ) = log_b |sin(Z_θ)| are real-valued functions compatible with a real-valued LNS ALU. For simplicity, (13) ignores the trivial cases, when Z_θ is a multiple of π/2, which cause sin(Z_θ) or cos(Z_θ) to be zero. These cases are resolved in the Appendix and reported in Table 1. In order to calculate the complex-valued function (13) using a conventional real LNS, there are two cases that choose whether the 1 ± operation is implemented with the real s_b function or the real d_b function.
Case 1. (-π/2 < Z_θ < π/2) is when 1 ± means a sum (s_b) is computed:

s_b(Z) = log_b((1 + b^{logcos(Z_θ) + Z_L}) + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L})
       = log_b(b^{s_b(logcos(Z_θ) + Z_L)} + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L}).   (14)
Case 2. (Z_θ < -π/2 or Z_θ > π/2) is when 1 ± means a difference (d_b) is computed:

s_b(Z) = log_b((1 - b^{logcos(Z_θ) + Z_L}) + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L})
       = log_b(±b^{d_b(logcos(Z_θ) + Z_L)} + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L}).   (15)
Up to this point, instead of giving separate real and imaginary parts, we have described a complex function equivalent to (10) in a novel way that eventually will be amenable to being computed with a conventional real LNS ALU. Of course, to use such an ALU, it will be necessary to compute the real and imaginary parts separately. This derivation, which is quite involved, is given in the Appendix.
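Before turning to the hardware, the Case 1/Case 2 decomposition can be checked numerically against (11). The sketch below (ours) computes the real part using only the real s_2/d_2 primitives plus logcos and logsin, all evaluated in double precision and ignoring the special cases at multiples of π/2; it corresponds to (24)-(26) in the Appendix:

// Real part of s_b(Z) computed the proposed way (b = 2), cross-checked in main()
// against the prior-art formula (11).  Behavioral sketch only.
final class ProposedRealPart {
    static final double LOG2 = Math.log(2.0);
    static double s2(double z) { return Math.log(1.0 + Math.pow(2.0, z)) / LOG2; }
    static double d2(double z) { return Math.log(Math.abs(1.0 - Math.pow(2.0, z))) / LOG2; }
    static double logcos(double th) { return Math.log(Math.abs(Math.cos(th))) / LOG2; }
    static double logsin(double th) { return Math.log(Math.abs(Math.sin(th))) / LOG2; }

    static double realPart(double zL, double zTheta) {
        double c = logcos(zTheta) + zL;                      // log_2 of |cos(Z_theta)| b^{Z_L}
        double s = logsin(zTheta) + zL;                      // log_2 of |sin(Z_theta)| b^{Z_L}
        // Case 1 uses s_2 for the "1 +" term; Case 2 uses d_2 for the "1 -" term.
        double onePlus = (Math.abs(zTheta) < Math.PI / 2.0) ? s2(c) : d2(c);
        // Combine the two legs with one more s_2, as in (24)-(25).
        return s + s2(2.0 * (onePlus - s)) / 2.0;
    }

    public static void main(String[] args) {
        double zL = 1.0, zTheta = 2.5, m = Math.pow(2.0, zL);
        double direct = Math.log(Math.hypot(1.0 + Math.cos(zTheta) * m,
                                            Math.sin(zTheta) * m)) / LOG2;       // (11)
        System.out.println(realPart(zL, zTheta) + " vs " + direct);
    }
}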
5 IMPLEMENTATION
TABLE 1
Table of Required CLNS ALU Sign Logic

Fig. 1 shows a straightforward implementation of (13), more precisely of its real (26) and imaginary (34) parts derived in the Appendix. The operation of the sign logic unit is summarized in Table 1, which takes into account the special cases described in the Appendix. The sign logic unit controls the Four-Quadrant (4Q) Correction Unit, the s_b/d_b unit, and the output Multiplexor (Mux). This logic operates in two modes: complex mode, which is the focus of this paper, and real mode (which uses the s_b/d_b unit to perform traditional LNS arithmetic). In real mode, Z_θ = 0 is used to represent a positive sign, and Z_θ = π is used to represent a negative sign. Given the angular quantization, the LNS sign bit corresponds to the most significant bit of Ẑ_θ, with the other bits masked to zero.
In order to minimize the cost of the CLNS hardware implementation, we exploit several techniques to transform the arguments of the function approximation units into limited ranges where approximation is affordable. Assuming b = 2, the reduction for the addition logarithm is typical of real LNS implementations:

s_2(x) = { 0,                  x ≤ -2^{n_ez},
           log_2(1 + 2^x),     -2^{n_ez} < x ≤ 0,
           x + s_2(-x),        0 < x < 2^{n_ez},
           x,                  x ≥ 2^{n_ez},          (16)
where n_ez ≤ k_L describes the wordsize needed to reach the essential zero [16], which is the least negative value that causes log_2(1 + 2^x) to be quantized to zero. We use similar range reduction for the subtraction logarithm when z is far from zero, together with cotransformation [21] for z near zero. The scaled arctangent is nearly equal to s_2 for negative arguments (only because we choose b = 2 and 4/π scaling), although the treatment of positive arguments reflects the asymptotic nature of the arctangent:

a_2(x) = { 0,                      x ≤ -2^{n_ez},
           (4/π) arctan(2^x),      -2^{n_ez} < x ≤ 0,
           2 - a_2(-x),            0 < x < 2^{n_ez},
           2,                      x ≥ 2^{n_ez}.       (17)
The range reduction for the trigonometric functions involves one case for each octant:

c_2(x) = { c_2(-x),                     x < 0,
           logcos((π/4) x),             0 ≤ x < 1,
           d_2(2 c_2(2 - x)) / 2,       1 ≤ x < 2,
           -2^{n_ez + 1},               x = 2,
           d_2(2 c_2(x - 2)) / 2,       2 < x < 3,
           c_2(4 - x),                  3 ≤ x < 4,
           c_2(x - 8),                  x ≥ 4.          (18)

We can reuse (18) to yield logsin((π/4) x) = c_2(x - 2) as well as logcos((π/4) x) = c_2(x). Instead of -∞, the output at the singularity, -2^{n_ez + 1}, is large enough in magnitude to trigger an essential zero in the later units, but is small enough to avoid overflow.
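The range reductions (16)-(18) translate directly into code. The sketch below (ours, with an arbitrarily chosen n_ez and double-precision kernels standing in for the tables) can be used to sanity-check the octant folding; the x = 1 boundary is served by the direct logcos branch here so that the recursion stays well founded:

// Range reductions (16)-(18) for b = 2, with logsin obtained by reusing c_2.
final class RangeReduction {
    static final double LOG2 = Math.log(2.0);
    static final int N_EZ = 5;                               // example essential-zero exponent
    static final double EZ = Math.pow(2.0, N_EZ);

    static double s2(double x) {                             // (16)
        if (x <= -EZ) return 0.0;
        if (x <= 0.0) return Math.log(1.0 + Math.pow(2.0, x)) / LOG2;
        if (x <  EZ)  return x + s2(-x);
        return x;
    }
    static double d2(double x) {                             // subtraction-logarithm kernel
        return Math.log(Math.abs(1.0 - Math.pow(2.0, x))) / LOG2;
    }
    static double a2(double x) {                             // (17), scaled arctangent
        if (x <= -EZ) return 0.0;
        if (x <= 0.0) return 4.0 / Math.PI * Math.atan(Math.pow(2.0, x));
        if (x <  EZ)  return 2.0 - a2(-x);
        return 2.0;
    }
    static double c2(double x) {                             // (18), logcos((pi/4) x)
        if (x < 0.0)  return c2(-x);
        if (x <= 1.0) return Math.log(Math.cos(Math.PI / 4.0 * x)) / LOG2;
        if (x < 2.0)  return d2(2.0 * c2(2.0 - x)) / 2.0;
        if (x == 2.0) return -Math.pow(2.0, N_EZ + 1);       // stand-in for -infinity
        if (x < 3.0)  return d2(2.0 * c2(x - 2.0)) / 2.0;
        if (x < 4.0)  return c2(4.0 - x);
        return c2(x - 8.0);
    }
    static double logsin(double x) { return c2(x - 2.0); }   // reuse of (18)

    public static void main(String[] args) {
        System.out.println(c2(1.5)     + " vs " + Math.log(Math.cos(Math.PI / 4.0 * 1.5)) / LOG2);
        System.out.println(logsin(0.5) + " vs " + Math.log(Math.sin(Math.PI / 4.0 * 0.5)) / LOG2);
    }
}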
6 DIRECT FUNCTION LOOKUP
There are several possible hardware realizations for (26) and (34), which appear in the Appendix. When f_L and f_θ are small, it is possible to use a direct table lookup for ŝ_b, d̂_b, â_b, and ĉ_b. This implementation allows round-to-nearest results for each functional unit, reducing roundoff errors in (26) and (34). Fig. 2a shows the errors in computing the real part (26) as a function of Z_L and Z_θ with such round-to-nearest tables when f_L = f_θ = 7. These errors have a Root-Mean-Squared (RMS) value around 0.003. Fig. 2b shows errors for the imaginary part (34), with an RMS value around 0.002. The errors in these figures spike slightly near Z_L = 0, Z_θ = ±π because of the inherent singularity in s_b(Z). In contrast to later figures, here the only noticeable error spike occurs near the singularity.
The memory requirement for each of the ŝ_b, d̂_b, and â_b direct lookup tables (with a Ẑ_L input) is 2^{n_ez + f_L} words, assuming reduction rules like (16) and (17), which allow the interval spanned by these tables to be as small as (-2^{n_ez}, 0]. The trigonometric tables, with Ẑ_θ inputs, require 2^{2 + f_θ} words because Z_θ is reduced to 0 ≤ Ẑ_θ < 4 (equivalent to 0 ≤ Z_θ < π) and two integer bits are sufficient to cover this dynamic range. Instead of the more involved rule (18) used in later sections, c_b(x) = c_b(-x) suffices for direct lookup.
Since there are two instances of trigonometric tables and four instances of other tables in Fig. 1, the total number of words required for naïve direct lookup is 2^{f + n_ez + 2} + 2^{f + 3}, assuming f = f_L = f_θ. Table 2 shows the number of words required by prior 2D direct lookup methods [3], [19] compared to our proposed 1D direct lookup implementation of (26) and (34). Savings occur for f ≥ 5, and grow exponentially more beneficial as f increases.

Fig. 1. CLNS ALU with complex and real modes.

For f ≥ 8, the cost of direct implementation grows exponentially, although at a slower rate than the prior methods. One approach to reduce table size is to use interpolation on a smaller table. The cost of a simple linear interpolation unit is related to the size of the region of the interpolation and to the absolute maximum value of the second derivative in that region. Table 3 gives the range and the derivatives for the required functions. The d_b function is not considered, as it may be computed via cotransformation [21]. It is clear that logsin has a singularity similar to that of d_b, which is the reason we avoid evaluating it directly and, instead, have chosen to use the relationship cos(x - π/2) = sin(x) = sqrt(1 - cos^2(x)) in (18). Because of the quantization of ĉ_2, this approximation is imperfect very near the singularity. In Section 7.7, we introduce a novel approach to deal with the singularity of logsin, but for the moment we will use (18).

The absolute maximum value (~1.78) of the second derivative of logcos in its restricted range is large; however, the fact that the range is restricted to 0 ≤ x ≤ 1 mitigates this. It is well known [12] that the absolute maximum value (~0.17) of the second derivative of s_b is modest; however, this table needs to cover a much larger range than logcos. The absolute maximum value (~0.15) of the second derivative of the arctangent-exponential function is about the same as that of s_2, and its range is equally large. These values predict that interpolation from a uniformly spaced table will be the best choice for c_b, but would require significant memory for s_b and a_b. Instead, it is more effective to optimize the s_b and a_b approximations over subdomains, as explained in the next section. Such subdomains are effective for s_b and a_b (because their absolute second derivatives span many binades, from ~0.0 to over 0.1), but are not effective for c_b (because its absolute second derivative only varies one binade, from π^2/(16 ln(b)) to π^2/(8 ln(b))).
7 OPTIMIZED FUNCTION UNIT SYNTHESIS
The novel CLNS algorithm requires several special function units, which should be optimized to minimize their cost, especially as the precision, f, increases. These units were generated using enhanced versions of existing tools to simplify generating and optimizing synthesizable VHDL.
7.1 FPLibrary
FPLibrary is an implementation of floating point and LNS
operators for FPGAs developed by Detrey and De Dinechin
[8]. It provides addition, multiplication, division, and square-
root operators for both systems, with a user-defined
precision, scaling up to IEEE single precision. LNS addition
and subtraction are implemented using multipartite tables,
generated in VHDL for every combination of parameters. It
was later extended by Vouzis et al. to perform the LNS
subtraction using cotransformation [21].
7.2 HOTBM
A method for function evaluation, using tables and
polynomial interpolations, called Higher Order Table-
Based Methods (HOTBM) was developed by Detrey and
De Dinechin [9]. The input domain of the function to be
evaluated is split into regular intervals, and the function
is approximated by a minimax polynomial inside each
interval. Polynomials are evaluated as a sum of terms, each
term being computed either by a powering unit and a
multiplier or a table. The size of each component (table,
multiplier, and powering unit) is tuned to balance the
rounding error and avoid unnecessary computations. As an
implementation of this method, the authors also provide a
tool written in C++ which generates hardware function
evaluators in synthesizable VHDL. An exhaustive search in
the parameter space is performed during generation to
estimate the most area- and delay-efficient design.
7.3 FloPoCo
FloPoCo, for Floating-Point Cores, is intended to become a
superset of both FPLibrary and HOTBM. Like HOTBM, it
consists of a C++ program generating VHDL code [10]. In
addition to the usual floating-point and fixed-point
arithmetic operators, it can generate more exotic operators,
such as long accumulators [11] or constant multipliers [6]. It
aims at taking advantage of FPGA flexibility in order to
increase efficiency and accuracy compared to a naïve
translation of a software algorithm to an FPGA circuit.
Although it includes an implementation of HOTBM for
fixed-point function evaluation and a whole range of
floating-point operators, the LNS part of FPLibrary was
not ported to FloPoCo by its developers.
TABLE 2
Memory for Prior and Novel Direct Methods

Fig. 2. Error in s_2(Z) using direct lookup with f_L = f_θ = 7. (a) Real part. (b) Imaginary part.
7.4 A CLNS Coprocessor
We extended the FloPoCo library with real LNS arithmetic. To perform the LNS addition, the input domain of s_b is split into three subdomains: [-2^{n_ez}, -8[, [-8, -4[, and [-4, 0[, where n_ez is such that s_b evaluates to zero in [-∞, -2^{n_ez}[. This is done to account for the exponential nature of s_b(z) for smaller values of z. The same partitioning was used in FPLibrary. We then use an HOTBM operator to evaluate s_b inside each interval. This partitioning scheme allows significant area improvements compared to a regular partitioning as performed by HOTBM.

Likewise, the domain of d_b is split into intervals. The lower range [-2^{n_ez}, -t[ is evaluated using several HOTBM operators, just like s_b. As we get closer to the singularity of d_b near 0, the function becomes harder to evaluate using polynomial approximations. The upper range [-t, 0[ is therefore evaluated using cotransformation. This approach is similar to the one used by Vouzis et al. in FPLibrary [21].

We used this extended FloPoCo library to generate the VHDL code for the functional units in a CLNS ALU at various precisions. One component was generated for each function involved in the CLNS addition. The real LNS functions s_b and d_b are generated by the extended FloPoCo. The function logcos((π/4) x) is implemented as an HOTBM component, and (4/π) arctan(2^x) is evaluated using several HOTBM components with the same input-domain decomposition as s_b.
Given the complexity of HOTBM and the influence of synthesizer optimizations, it is difficult to estimate precisely the best values of the d_b threshold t, the cotransformation parameter, and the HOTBM order using simple formulas. We synthesized the individual units for a Virtex-4 LX25 with Xilinx ISE 10.1 using various parameters. For each precision tested, the combination of parameters yielding the smallest area on this configuration was determined. Table 4 sums up the parameters obtained. We plan to automate this search in the future by using heuristics relying on the latency/area estimation framework integrated in FloPoCo.
7.5 Time/Area Trade-Off
The synthesis results are listed in detail in Tables 5 and 6, where c_s, c_{s/d}, c_c, and c_a are the areas of the respective units, and t_s, t_{s/d}, t_c, and t_a are the corresponding delays. One implementation option would be to follow the combinational architecture in Fig. 1, in which case the area is roughly c_s + 2 c_{s/d} + c_c + c_a, because one d_b together with c_b implements both logcos and logsin. (We neglect the insignificant area for four fixed-point adders, a mux, and the control logic.) The delay is 2 t_{s/d} + t_c + max(t_a, t_s).

An alternative, which saves area (requiring only about c_{s/d} + c_c + c_a), uses three cycles to complete the computation (so that the same hybrid unit may be reused for all three s_b/d_b usages). The total area is less than double the area c_{s/d} of a stand-alone real LNS unit. The delay is roughly 3 t_{s/d}, since the hybrid unit has the longest delay. Despite using only about half the area of the combinational alternative, the three-cycle approach produces the result in about the same time as the combinational option.

It is hard to make direct comparisons, but the CORDIC implementation of CLNS reported in [13] uses 10 stages. Some of those stages have relatively large multipliers, similar to what is generated for our circuit by FloPoCo. If such CORDIC stages have a delay longer than 0.3 of our proposed clock cycle, our approach would be as fast as or faster than [13].
7.6 Accuracy
Fig. 3a shows the errors in computing the real part (26) as a function of Z_L and Z_θ using the f = 7 function units generated by FloPoCo. These errors have an RMS value around 0.007. This is larger than the errors when computing the direct lookup with the round-to-nearest method shown in Fig. 2a, but overall the errors appear uniform except, as before, near the singularity. As such, this may be acceptable for some applications. In contrast, Fig. 3b shows errors for the imaginary part (34), which are significantly larger than the errors for direct round-to-nearest lookup shown in Fig. 2b. These errors have an RMS value around 0.008.

To determine the cause of the much larger error, simulations were run in which each of the function units generated by FloPoCo was replaced with round-to-nearest units. When both logsin and logcos are computed with round-to-nearest, the imaginary result has smaller errors, which are much closer to those of Fig. 2b. Using sin(x) = sqrt(1 - cos^2(x)) contributes to the extra errors, which are then magnified by the slightly larger roundoff errors from all the other FloPoCo units.
7.7 Novel Sine Approximation
The source of the additional error visible in Fig. 3b is the quantization of logcos as ĉ_b in (18). Extra guard bits could help, but this, in turn, would require extra guard bits in all the approximation units, including the very costly one for d̂_b. We consider the cost of such a solution to be prohibitive.

TABLE 4
Parameters Used for Synthesis

TABLE 3
Table of Functions and Derivatives
Although inexpensive methods [1] for dealing with logsin(2^x) have been proposed, they are not applicable here. Instead, we need an alternative way to compute logsin(x) for x ≈ 0. The novel approach, which we propose, is based on the simple observation that, for small x,

sin(x) ≈ x.   (19)

The values of x which are considered small enough to be approximated this way will depend on the precision, f. The CLNS hardware uses a quantized angle, which means (19) can be restated as

logsin((π/4) x) ≈ log_b(x) + log_b(π/4).   (20)
We wish to implement this without adding any additional tables, delay, or complexity to the hardware (aside from a tiny bit of extra logic), which means we rule out evaluating log_b(x) directly. Instead, we use a novel logarithm approximation [4] that only needs d_b hardware plus a few trivial adders, and which is moderately accurate in cases like this, where x is known to be near zero:

log_b(x) ≈ d_b(-x) - log_b(ln(b)) + x/2.   (21)
The error is on the order of x^2/24. Substituting (21) into (20), we have

logsin((π/4) x) ≈ d_b(-x) + x/2 + (log_b(π/4) - log_b(ln(b))).   (22)
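As a sanity check on (21) (our own expansion, not part of the original derivation), substitute u = x ln(b) and expand:

d_b(-x) = log_b(1 - b^{-x}) = log_b(1 - e^{-u})
        = log_b(u (1 - u/2 + u^2/6 - ...))
        = log_b(x) + log_b(ln(b)) - x/2 + ln(b) x^2/24 - ...,

so log_b(x) = d_b(-x) - log_b(ln(b)) + x/2 - ln(b) x^2/24 + ..., which is (21) with an error on the order of x^2/24.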
The binary value of

log_2(π/4) - log_2(ln(2)) = 0.0010111000100_2

suggests that 1/8, 3/16, or 23/128 are low-cost approximations for this constant. Fig. 4 shows the error of approximating logsin with the Pythagorean approach, sqrt(1 - cos^2((π/4) x)) (implemented as d_2(2 c_2(x))/2), versus the novel approach (22) for f = 7. For values of x near zero, the novel approach is consistently better than the Pythagorean approach. Because of quantization, both plots are noisy, which makes them cross each other several times in the middle. For f = 7, x = 0.375 is a reasonable choice for the point at which the approximation switches from the novel approach to the Pythagorean one. Replacing these cases in (18), we have

c_2(x) = { c_2(-x),                                  x < 0,
           logcos((π/4) x),                          0 ≤ x < 1,
           d_2(2 c_2(2 - x)) / 2,                    1 ≤ x < 1.625,
           d_2(x - 2) + (2 - x)/2
             + (log_2(π/4) - log_2(ln(2))),          1.625 ≤ x < 2,
           -2^{n_ez + 1},                            x = 2,
           d_2(2 - x) + (x - 2)/2
             + (log_2(π/4) - log_2(ln(2))),          2 < x < 2.375,
           d_2(2 c_2(x - 2)) / 2,                    2.375 ≤ x < 3,
           c_2(4 - x),                               3 ≤ x < 4,
           c_2(x - 8),                               x ≥ 4.            (23)

TABLE 5
Area of Function Approximation Units (Slices) on Xilinx Virtex-4

Fig. 3. Error in s_2(Z) using FloPoCo with the Pythagorean reduction (18) and f_L = f_θ = 7. (a) Real part. (b) Imaginary part.

TABLE 6
Latency of Function Approximation Units (ns) on Xilinx Virtex-4
The same function approximation units and pipeline schedule described earlier can implement (23) as easily as (18), with only two additional adders (and possibly a register to make x available at the proper time).
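The effect of the switch point can be examined in software. The sketch below (ours, assuming b = 2, f = 7 quantization of the logcos output, and double precision everywhere else) prints the error of the two logsin reductions near x = 0; when the quantized logcos rounds to zero, the Pythagorean form degenerates, which illustrates the problem described above:

// Compare the Pythagorean logsin, d_2(2 c_2(x))/2, with the novel small-angle
// form of (22) against an exact double-precision logsin, for 0 < x <= 0.5.
public class LogsinCheck {
    static final double LOG2 = Math.log(2.0);
    static final int F = 7;
    static final double K = (Math.log(Math.PI / 4.0) - Math.log(Math.log(2.0))) / LOG2;

    static double q(double v)  { return Math.floor(v * (1 << F) + 0.5) / (1 << F); }  // round to F bits
    static double d2(double z) { return Math.log(Math.abs(1.0 - Math.pow(2.0, z))) / LOG2; }
    static double c2(double x) { return q(Math.log(Math.cos(Math.PI / 4.0 * x)) / LOG2); } // quantized logcos

    public static void main(String[] args) {
        for (double x = 1.0 / (1 << F); x <= 0.5; x += 1.0 / (1 << F)) {
            double exact = Math.log(Math.sin(Math.PI / 4.0 * x)) / LOG2;
            double pyth  = d2(2.0 * c2(x)) / 2.0;            // Pythagorean, via quantized logcos
            double novel = d2(-x) + x / 2.0 + K;             // novel form of (22)
            System.out.printf("x=%.5f  pyth err=%.4f  novel err=%.4f%n",
                              x, pyth - exact, novel - exact);
        }
    }
}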
Fig. 5a shows the error for the real part of the approximate s_2(Z) using (23) for range reduction and otherwise using f = 7 FloPoCo-generated approximation units. Fig. 5b shows the errors for the corresponding imaginary part. In both figures, especially for the imaginary part, the errors are much closer to the amount seen in round-to-nearest. The RMS errors (0.006 for the real part and 0.005 for the imaginary part) are in between those for round-to-nearest and those for Pythagorean FloPoCo.
8 OBJECT-ORIENTED CLNS SIMULATION
In order to test the proposed techniques in a complex-number application, like the FFT, we would like to substitute different arithmetic implementations without significant change to the application code. The polymorphism of an object-oriented language like Java makes such experimentation easy. Although operator overloading in languages like C++ would make this slightly nicer, Java's polymorphic method-call syntax makes it fairly easy to describe the complex FFT butterfly succinctly:
t = w[j].mul(x[mi]);
x[mi] = x[m].sub(t);
x[m] = x[m].add(t);
where the variables in the application are of an abstract
class, which defines four abstract methods (PLUSI(),
inc(), recip(), and mul(CmplxAbs y)) that return
references to such a CmplxAbs object and two abstract
methods (real() and imag()) that return doubles. In
addition, CmplxAbs defines several concrete methods that
can be used in the application:
public CmplxAbs MINUS1(){
return(PLUSI().mul(PLUSI()));}
public CmplxAbs div(CmplxAbs y){
return(this.mul(y.recip()));}
public CmplxAbs add(CmplxAbs y){
return(this.mul((y.div(this)).inc()));}
public CmplxAbs neg(){
return(this.mul(MINUS1()));}
public CmplxAbs sub(CmplxAbs y){
return(this.add(y.neg()));}
Surprisingly, this is sufficient to define both polar and
rectangular complex arithmetic implementations. Only three
arithmetic methods (mul, recip, and inc) need to be
provided for each derived class. Also, the derived class
needs to provide three utility methods (the accessors for the
constant i and the rectangular parts of this). Although best
suited for CLNS implementations, this abstract class also
works with a rectangular class using two floating-point
instance variables manipulated by (1) and (2) together with
the rectangular definition of reciprocal and incrementation.
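For concreteness, a rectangular derived class might look as follows (our illustration; the actual classes used in the experiments are not reproduced here):

// A minimal rectangular implementation of the abstract class: only mul, recip,
// inc, and the accessors are defined; add, sub, div, and neg are inherited.
class CmplxRect extends CmplxAbs {
    private final double re, im;
    CmplxRect(double re, double im) { this.re = re; this.im = im; }

    public double real() { return re; }
    public double imag() { return im; }
    public CmplxAbs PLUSI() { return new CmplxRect(0.0, 1.0); }      // the constant i

    public CmplxAbs mul(CmplxAbs y) {                                 // (1) and (2)
        return new CmplxRect(re * y.real() - im * y.imag(),
                             re * y.imag() + im * y.real());
    }
    public CmplxAbs recip() {                                         // 1 / (re + i im)
        double d = re * re + im * im;
        return new CmplxRect(re / d, -im / d);
    }
    public CmplxAbs inc() { return new CmplxRect(re + 1.0, im); }     // this + 1
}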
This rather unusual factoring of complex arithmetic has the advantage that application code can be tested with the rectangular implementation to isolate whether there are any CLNS-related bugs in the application or in the class definition itself. Two CLNS classes derived from the abstract class store Ẑ_L and Ẑ_θ as integers. The first CLNS class implements ideal, round-to-nearest 2D lookup of s_b(Z) as in the prior literature, as given by (11) and (12). The second class, which is derived from the first CLNS class, implements the proposed approach, (26) and (34), using tables that can simulate any of the design alternatives discussed earlier. The first CLNS class defines all six methods; the second CLNS class inherits everything except its inc method. As CLNS formulas are involved and prone to implementation error, this object-oriented approach reduces duplication of untested formulas. In other words, we test add, sub, etc., using simple rectangular definitions; we test CLNS methods with the more straightforward (11) and (12). Only when we are certain these are correct do we test with the novel inc method in the grandchild class.

Fig. 4. Errors for Pythagorean and novel logsin((π/4) x) for 0 < x ≤ 1.

Fig. 5. Error in s_2(Z) using FloPoCo with the novel reduction (23) and f_L = f_θ = 7. (a) Real part. (b) Imaginary part.
Having the two CLNS classes allows us to reuse the same application code to see what effect the proposed ALU will have compared to ideal CLNS arithmetic. In this case, the application is a 64-point radix-two FFT whose input is a real-valued 25 percent duty-cycle square wave plus complex white noise. This is rerun 100 times with different pseudorandom noise. On each run, several FFTs are computed using the same data: nearly exact rectangular double-precision arithmetic, ideal (2D table lookup) f = 7 CLNS, and variations (direct, FloPoCo-Pythagorean, FloPoCo-novel-sine) of the proposed f = 7 CLNS. The CLNS results from each run are compared against the double-precision rectangular FFT on the same data. The ideal CLNS FFT has RMS error 0.00025 and maximum error of 0.0033. The proposed direct lookup approach has about 50 percent higher RMS error (about 0.00035) and similar maximum error (0.0029). The FloPoCo-Pythagorean approach did much worse (RMS error of 0.00073 and maximum error of 0.01). The FloPoCo-novel-sine approach is better (RMS error of 0.00056 and maximum error of 0.0064). In other words, the best FloPoCo implementation loses about one bit of accuracy compared to the best possible (but unaffordable) f = 7 CLNS implementation, and the FloPoCo-Pythagorean loses more. To put these errors in perspective, the quantization step for Ẑ_L is 2^{-7} ≈ 0.0078, and on that scale perhaps even the errors observed in the FloPoCo-Pythagorean approach may be acceptable for some applications.
9 CONCLUSIONS
A new algorithm for complex log-polar addition was proposed, based around an existing real-valued LNS ALU. The proposed design allows this ALU to continue to be used for real arithmetic, in addition to the special complex functionality described in this paper. The novel CLNS algorithm requires extra function units for log-trigonometric functions, which may have application beyond complex polar representation. Two implementation options for the novel special units were considered: medium-accuracy direct lookup and higher-accuracy interpolation (as generated by a tool called FloPoCo). The errors resulting from these two alternatives were studied. As expected, round-to-nearest direct lookup tables give the lowest roundoff errors. We show that the errors in the FloPoCo implementation can be reduced by using a more accurate logsin unit than one based on Pythagorean calculations from logcos. Instead, we proposed a novel algorithm for logsin of arguments near zero which uses the same hardware as the Pythagorean-only approach used in our earlier research [5].

We compared these implementations of our novel CLNS algorithm to all known prior approaches. Our novel method has speed comparable to CORDIC, and uses orders of magnitude less area than prior 2D table-lookup approaches. As such, our approach provides a good compromise between speed and area. Considering that CLNS offers smaller bus widths than conventional rectangular representation of complex numbers, our proposed algorithm makes the use of this rather unusual number system in practical algorithms, like the FFTs in OFDM, more feasible. Using an object-oriented simulation, we have observed the additional errors introduced by our proposed methods in an FFT simulation. With our most accurate direct lookup approach, this is on the order of half of a bit. With the FloPoCo-novel-sine method, the additional error is about one bit. Given that CLNS offers many bit savings compared to rectangular arithmetic, this level of additional error seems a reasonable trade-off in exchange for the huge memory savings our proposed CLNS ALU offers.

In the course of implementing the CLNS ALU, we uncovered and corrected several bugs in the HOTBM implementation of FloPoCo. We also made it easier to use by allowing the user to select the input range and scale of the target function instead of having to map its ranges manually to [0, 1[ and [-1, 1[. Hence, our work contributes to the maturity of the FloPoCo tool beyond the field of LNS.
APPENDIX
In Case 1 (-π/2 < Z_θ < π/2), the real part of (14) can be described as (24). There is a similar derivation from (15) for Case 2 (Z_θ < -π/2 or Z_θ > π/2), shown as (25):

Re(s_b(Z)) = Re(log_b(b^{s_b(logcos(Z_θ) + Z_L)} + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L}))
           = log_b(sqrt(b^{2 s_b(logcos(Z_θ) + Z_L)} + b^{2 (logsin(Z_θ) + Z_L)}))
           = log_b(b^{2 s_b(logcos(Z_θ) + Z_L)} + b^{2 (logsin(Z_θ) + Z_L)}) / 2
           = log_b(b^{2 (logsin(Z_θ) + Z_L) + s_b(2 (s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L)))}) / 2
           = logsin(Z_θ) + Z_L + s_b(2 (s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))) / 2.   (24)

Re(s_b(Z)) = Re(log_b(±b^{d_b(logcos(Z_θ) + Z_L)} + i sgnsin(Z_θ) b^{logsin(Z_θ) + Z_L}))
           = log_b(sqrt(b^{2 d_b(logcos(Z_θ) + Z_L)} + b^{2 (logsin(Z_θ) + Z_L)}))
           = log_b(b^{2 d_b(logcos(Z_θ) + Z_L)} + b^{2 (logsin(Z_θ) + Z_L)}) / 2
           = log_b(b^{2 (logsin(Z_θ) + Z_L) + s_b(2 (d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L)))}) / 2
           = logsin(Z_θ) + Z_L + s_b(2 (d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))) / 2.   (25)

The subexpression (logsin(Z_θ) + Z_L) in (24) and (25) becomes -∞ when Z_θ = 0 or Z_θ = ±π. The former case, in fact, is trivially Re(s_b(Z)) = s_b(Z_L), which is equivalent to Re(s_b(Z)) = s_b(logcos(Z_θ) + Z_L) when Z_θ = 0. A similar case occurs when Z_θ = ±π, in which case Re(s_b(Z)) = d_b(Z_L). Equation (26) summarizes the cases for the real part in our novel method:

Re(s_b(Z)) =
  { logsin(Z_θ) + Z_L + s_b(2 (s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))) / 2,
        -π/2 < Z_θ < 0 or 0 < Z_θ < π/2,
    s_b(logcos(Z_θ) + Z_L),                        Z_θ = 0,
    logsin(Z_θ) + Z_L + s_b(2 (d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))) / 2,
        -π < Z_θ < -π/2 or Z_θ > π/2,
    d_b(logcos(Z_θ) + Z_L),                        Z_θ = ±π.   (26)
The imaginary part uses the four-quadrant arctangent. The computation of the four-quadrant arctangent [23] depends on the signs of its two arguments:

arctan(x, y) = { -arctan(x, -y),       y < 0,
                 π - arctan(y/(-x)),   y ≥ 0, x < 0,
                 arctan(y/x),          y ≥ 0, x > 0,
                 π/2,                  y > 0, x = 0.   (27)

Eliminating the recursion in (27), and noting that the value passed to the one-argument arctangent is now always nonnegative, we have

arctan(x, y) = { -π + arctan(|y|/|x|),   y < 0, x < 0,
                 -arctan(|y|/|x|),       y < 0, x > 0,
                 π - arctan(|y|/|x|),    y ≥ 0, x < 0,
                 arctan(|y|/|x|),        y ≥ 0, x > 0,
                 -π/2,                   y < 0, x = 0,
                 π/2,                    y > 0, x = 0.   (28)
The result produced for |sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}| in LNS depends on whether s_b or d_b is required to produce 1 + cos(Z_θ) b^{Z_L}, which again depends on the sign of the cosine (either positive, with -π/2 < Z_θ < π/2, or negative, with Z_θ < -π/2 or Z_θ > π/2).

It is obvious for the range of Z_θ involved that y = sin(Z_θ) b^{Z_L} < 0 when Z_θ < 0. The condition when x = 1 + cos(Z_θ) b^{Z_L} < 0 is slightly more complicated, involving cos(Z_θ) b^{Z_L} < -1, a condition that can only happen when sgncos(Z_θ) < 0 (i.e., Z_θ < -π/2 or Z_θ > π/2) and |cos(Z_θ) b^{Z_L}| > 1. In LNS, the latter condition is equivalent to logcos(Z_θ) + Z_L > 0. The opposite condition, x > 0, happens when the cosine is positive (-π/2 < Z_θ < π/2), or when cos(Z_θ) b^{Z_L} > -1. In LNS, the latter condition is equivalent to logcos(Z_θ) + Z_L < 0. Combining these conditions with (14), (15), and (28), we have (29). Simplifying the conditions, we get (30).

The quotient inside the arctangent can be computed in LNS as (31). Note that the intermediate expression in (31) is the negative of that used for the real part in the CLNS ALU derived above. In two cases, we need to expand the arctangent derivation into s_b and d_b subcases, so that we can substitute the intermediate expression into the arctangent, yielding (32). Again, eliminating impossible conditions yields (33). As in (26), there is a similar problem with the singularity of logsin(Z_θ) in the cases Z_θ = 0 and Z_θ = ±π. Again, these are trivial, since Im(s_b(Z)) = 0 or Im(s_b(Z)) = π, respectively. Taking these into account, we can substitute the LNS computation of the quotient into the arctangent to form (34).
Im(s_b(Z)) =
  { -π + arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < 0 and (Z_θ < -π/2 or Z_θ > π/2) and logcos(Z_θ) + Z_L > 0,
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < 0 and ((-π/2 < Z_θ < π/2) or logcos(Z_θ) + Z_L < 0),
    π - arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≥ 0 and (Z_θ < -π/2 or Z_θ > π/2) and logcos(Z_θ) + Z_L > 0,
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≥ 0 and ((-π/2 < Z_θ < π/2) or logcos(Z_θ) + Z_L < 0),
    -π/2,  Z_θ < 0 and (Z_θ < -π/2 or Z_θ > π/2) and logcos(Z_θ) + Z_L = 0,
    π/2,   Z_θ ≥ 0 and (Z_θ < -π/2 or Z_θ > π/2) and logcos(Z_θ) + Z_L = 0.   (29)
Im(s_b(Z)) =
  { -π + arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < -π/2 and logcos(Z_θ) + Z_L > 0,
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < 0 and (Z_θ ≥ -π/2 or logcos(Z_θ) + Z_L < 0),
    π - arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ > π/2 and logcos(Z_θ) + Z_L > 0,
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≥ 0 and (Z_θ < π/2 or logcos(Z_θ) + Z_L < 0),
    -π/2,  Z_θ < -π/2 and logcos(Z_θ) + Z_L = 0,
    π/2,   Z_θ > π/2 and logcos(Z_θ) + Z_L = 0.   (30)
log_b(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|) =
  { -(s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L)),   -π/2 < Z_θ < π/2,
    -(d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L)),   Z_θ < -π/2 or Z_θ > π/2.   (31)
Im(s_b(Z)) =
  { -π + arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < -π/2 and logcos(Z_θ) + Z_L > 0,
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≤ -π/2 and (Z_θ ≥ -π/2 or logcos(Z_θ) + Z_L < 0),
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        -π/2 < Z_θ < 0 and (Z_θ ≥ -π/2 or logcos(Z_θ) + Z_L < 0),
    π - arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ > π/2 and logcos(Z_θ) + Z_L > 0,
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≥ π/2 and (Z_θ < π/2 or logcos(Z_θ) + Z_L < 0),
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        0 < Z_θ ≤ π/2,
    -π/2,  Z_θ < -π/2 and logcos(Z_θ) + Z_L = 0,
    π/2,   Z_θ > π/2 and logcos(Z_θ) + Z_L = 0.   (32)
Im(s_b(Z)) =
  { -π + arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ < -π/2 and logcos(Z_θ) + Z_L > 0,
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≤ -π/2 and logcos(Z_θ) + Z_L < 0,
    -arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        -π/2 < Z_θ < 0,
    π - arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ > π/2 and logcos(Z_θ) + Z_L > 0,
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        Z_θ ≥ π/2 and logcos(Z_θ) + Z_L < 0,
    arctan(|sin(Z_θ) b^{Z_L}| / |1 + cos(Z_θ) b^{Z_L}|),
        0 < Z_θ ≤ π/2,
    -π/2,  Z_θ < -π/2 and logcos(Z_θ) + Z_L = 0,
    π/2,   Z_θ > π/2 and logcos(Z_θ) + Z_L = 0.   (33)
Im(s_b(Z)) =
  { 0,   Z_θ = ±π and logcos(Z_θ) + Z_L < 0,
    π,   Z_θ = ±π and logcos(Z_θ) + Z_L > 0,
    -π + arctan(b^{-(d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        -π < Z_θ ≤ -π/2 and logcos(Z_θ) + Z_L > 0,
    -arctan(b^{-(d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        Z_θ ≤ -π/2 and logcos(Z_θ) + Z_L < 0,
    -arctan(b^{-(s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        -π/2 < Z_θ < 0,
    0,   Z_θ = 0,
    π - arctan(b^{-(d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        Z_θ > π/2 and logcos(Z_θ) + Z_L > 0,
    arctan(b^{-(d_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        Z_θ > π/2 and logcos(Z_θ) + Z_L < 0,
    arctan(b^{-(s_b(logcos(Z_θ) + Z_L) - (logsin(Z_θ) + Z_L))}),
        0 < Z_θ ≤ π/2,
    -π/2,  Z_θ < -π/2 and logcos(Z_θ) + Z_L = 0,
    π/2,   Z_θ > π/2 and logcos(Z_θ) + Z_L = 0.   (34)
REFERENCES
[1] M.G. Arnold, Approximating Trigonometric Functions with the
Laws of Sines and Cosines Using the Logarithmic Number
System, Proc. Eighth EuroMicro Conf. Digital System Design,
pp. 48-55, 2005.
[2] M.G. Arnold, T.A. Bailey, J.R. Cowles, and C. Walter, Analysis of
Complex LNS FFTs, Signal Processing Systems SIPS 2001: Design
and Implementation, F. Catthoor and M. Moonen, eds., pp. 58-69,
IEEE Press, 2001.
[3] M.G. Arnold, T.A. Bailey, J.R. Cowles, and C. Walter, Fast
Fourier Transforms Using the Complex Logarithm Number
System, J. VLSI Signal Processing, vol. 33, no. 3, pp. 325-335, 2003.
[4] M.G. Arnold, T.A. Bailey, J.R. Cowles, and M.D. Winkel,
Arithmetic Co-Transformations in the Real and Complex
Logarithmic Number Systems, IEEE Trans. Computers, vol. 47,
no. 7, pp. 777-786, July 1998.
[5] M.G. Arnold and S. Collange, A Dual-Purpose Real/Complex
Logarithmic Number System ALU, Proc. 19th IEEE Symp.
Computer Arithmetic, pp. 15-24, June 2009.
[6] N. Brisebarre, F. de Dinechin, and J.-M. Muller, Integer and
Floating-Point Constant Multipliers for FPGAs, Proc. Intl Conf.
Application-Specific Systems, Architectures and Processors, pp. 239-
244, 2008.
[7] N. Burgess, Scaled and Unscaled Residue Number System to
Binary Conversion Techniques Using the Residue Number
System, Proc. 13th Symp. Computer Arithmetic (ARITH 97),
pp. 250-257, Aug. 1997.
[8] J. Detrey and F. de Dinechin, A Tool for Unbiased Comparison
between Logarithmic and Floating-Point Arithmetic, J. VLSI
Signal Processing, vol. 49, no. 1, pp. 161-175, 2007.
[9] J. Detrey and F. de Dinechin, Table-Based Polynomials for Fast
Hardware Function Evaluation, Proc. IEEE Intl Conf. Application-
Specific Systems, Architecture and Processors, pp. 328-333, July 2005.
[10] F. de Dinechin, C. Klein, and B. Pasca, Generating High-
Performance Custom Floating-Point Pipelines, Proc. Intl Conf.
Field-Programmable Logic, Aug. 2009.
[11] F. de Dinechin, B. Pasca, O. Creţ, and R. Tudoran, An FPGA-
Specific Approach to Floating-Point Accumulation and Sum-of-
Products, Field-Programmable Technology, pp. 33-40, IEEE Press,
2008.
[12] D.M. Lewis, An Architecture for Addition and Subtraction of
Long Word Length Numbers in the Logarithmic Number
System, IEEE Trans. Computers, vol. 39, no. 11, pp. 1325-1336,
Nov. 1990.
[13] D.M. Lewis, Complex Logarithmic Number System Arithmetic
Using High Radix Redundant CORDIC Algorithms, Proc. 14th
IEEE Symp. Computer Arithmetic, pp. 194-203, Apr. 1999.
[14] R. Mehmke, Additionslogarithmen für complexe Grössen, Zeitschrift für Math. Physik, vol. 40, pp. 15-30, 1895.
[15] R. Muscedere, V.S. Dimitrov, G.A. Jullien, and W.C. Miller,
Efficient Conversion from Binary to Multi-Digit Multidimen-
sional Logarithmic Number Systems Using Arrays of Range
Addressable Look-Up Tables, Proc. 13th IEEE Intl Conf. Applica-
tion-Specific Systems, Architectures and Processors (ASAP 02),
pp. 130-138, July 2002.
[16] E.E. Swartzlander and A.G. Alexopoulos, The Sign/Logarithm
Number System, IEEE Trans. Computers, vol. 24, no. 12, pp. 1238-
1242, Dec. 1975.
[17] E.E. Swartzlander et al., Sign/Logarithm Arithmetic for FFT
Implementation, IEEE Trans. Computers, vol. 32, no. 6, pp. 526-
534, June 1983.
[18] P.R. Turner, Complex SLI Arithmetic: Representation, Algo-
rithms and Analysis, Proc. 11th IEEE Symp. Computer Arithmetic,
pp. 18-25, July 1993.
[19] P. Vouzis and M.G. Arnold, A Parallel Search Algorithm for
CLNS Addition Optimization, Proc. IEEE Intl Symp. Circuits and
Systems (ISCAS 06), pp. 20-24, May 2006.
[20] P. Vouzis, M.G. Arnold, and V. Paliouras, Using CLNS for FFTs
in OFDM Demodulation of UWB Receivers, Proc. IEEE Intl
Symp. Circuits and Systems (ISCAS 05), pp. 3954-3957, May 2005.
[21] P. Vouzis, S. Collange, and M.G. Arnold, Cotransformation
Provides Area and Accuracy Improvement in an HDL Library for
LNS Subtraction, Proc. 10th EuroMicro Conf. Digital System Design
Architectures, Methods and Tools, pp. 85-93, Aug. 2007.
[22] P. Vouzis, S. Collange, and M.G. Arnold, LNS Subtraction Using
Novel Cotransformation and/or Interpolation, Proc. 18th Intl
Conf. Application-Specific Systems, Architectures and Processors,
pp. 107-114, July 2007.
[23] http://www.wikipedia.org/wiki/arctangent, Oct. 2008.
[24] http://www.xlnsresearch.com, 2010.
Mark G. Arnold received the BS and MS
degrees from the University of Wyoming, and
the PhD degree from the University of
Manchester Institute of Science and Technol-
ogy (UMIST), United Kingdom. From 1982 to
2000, he was on the faculty of the University
of Wyoming, Laramie. From 2000 to 2002, he
was a lecturer at UMIST, United Kingdom. In
2002, he joined the faculty of Lehigh Uni-
versity, Bethlehem, Pennsylvania. In 1976, he
codeveloped SCELBAL, the first open-source floating-point high-level
language for personal computers. In 1997, he received the Best
Paper Award from Open Verilog International for describing the
Verilog Implicit To One-hot (VITO) tool that he codeveloped. In 2007,
he received the Best Paper Award from the Application-Specific
Systems, Architectures and Processors (ASAP) Conference. He is
the author of Verilog Digital Computer Design. His current research
interests include computer arithmetic, hardware description lan-
guages, microrobotics and embedded control, and multimedia and
application-specific systems. He is a member of the IEEE.
Sylvain Collange received the master's degree in computer science from the École Normale Supérieure de Lyon, France, in 2007. He is
currently in the final year of a PhD program at
the University of Perpignan, France. In 2006, he
worked as a research intern at Lehigh Uni-
versity in Pennsylvania. He joined NVIDIA in
Santa Clara, California, for an internship in
2010. His current research focuses on parallel
computer architectures. His other interests
include computer arithmetic and general-purpose computing on
graphics processing units.
> For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
