IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
I. I NTRODUCTION
Fig. 1.
1063-8210 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Fig. 2.
79
(2)
where yo [n] is error free output signal. In this way, the power
consumption can be greatly lowered while the SNR can still
be maintained without severe degradation [2].
III. P ROPOSED ANT M ULTIPLIER D ESIGN
U SING F IXED -W IDTH RPR
In this paper, we further proposed the fixed-width RPR to
replace the full-width RPR block in the ANT design [2],
as shown in Fig. 2, which can not only provide higher
computation precision, lower power consumption, and lower
area overhead in RPR, but also perform with higher SNR,
more area efficient, lower operating supply voltage, and lower
power consumption in realizing the ANT architecture. We
demonstrate our fixed-width RPR-based ANT design in an
ANT multiplier.
The fixed-width designs are usually applied in DSP applications to avoid infinite growth of bit width. Cutting off n-bit
least significant bit (LSB) output is a popular solution to construct a fixed-width DSP with n-bit input and n-bit output. The
hardware complexity and power consumption of a fixed-width
DSP is usually about half of the full-length one. However,
truncation of LSB part results in rounding error, which needs
to be compensated precisely. Many literatures [13][22] have
been presented to reduce the truncation error with constant
correction value [13][15] or with variable correction value
[16][22]. The circuit complexity to compensate with constant
corrected value can be simpler than that of variable correction
value; however, the variable correction approaches are usually
more precise.
In [16][22], their compensation method is to compensate
the truncation error between the full-length multiplier and
n1
x i 2i ,
Y =
n1
yj 2j.
(3)
j =0
i=0
2n1
k=0
pk 2k =
n1
n1
x i y j 2i+ j .
(4)
j =0 i=0
The (n/2)-bit unsigned full-width BaughWooley partial product array can be divided into four subsets, which are most
significant part (MSP), input correction vector [ICV()], minor
80
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 3. 12 12 bit ANT multiplier is implemented with the six-bit fixedwidth replica redundancy block.
ICV [MICV()], and LSP, as shown in Fig. 3. In the fixedwidth RPR, only MSP part is kept and the other parts
are removed. Therefore, the other three parts of ICV(),
MICV(), and LSP are called as truncated part. The truncated
ICV() and MICV() are the most important parts because
of their highest weighting. Therefore, they can be applied to
construct the truncation error compensation algorithm.
To evaluate the accuracy of a fixed-width RPR, we can
exploit the difference between the (n/2)-bit fixed-width RPR
output and the 2n-bit full-length MDSP output, which is
expressed as
= P Pt
(5)
n1
yj2j
j = n2 +1
n1
x i 2i
i= 3n
2 j
+ f x n1 y n2 , x n2 y n2 +1 , x n3 y n2 +2 , . . . , x n2 y n2 +2
+ f x n2 y n2 , x n3 y n2 +1 , x n4 y n2 +2 , . . . , x n2 yn2
n1
yj2
j = n2 +1
n1
j = n2 +1
n1
x i 2i + f (ICV) + f (MICV)
i= 3n
2 j
yj2j
n1
x i 2i + f (EC)
(6)
i= 3n
2 j
Fig. 4. Statistical curves of average truncation error in the LSP block and
the curves of compensation function with 1, , and + 1 in the 12-bit
fixed-width RPR-based ANT multiplier.
Fig. 5.
Analysis of absolute average compensation error under various
values in the 12-bit fixed-width RPR-based ANT multiplier.
81
TABLE I
R ELATION B ETWEEN THE PARTIAL P RODUCT T ERM S L OCATION IN
ICV AND THE S TATISTICAL C OMPENSATION E RROR E AVG
Table I, the average error value for each ICV vector with the
same partial product term summation value is nearly the same
even their partial product terms location is different. That is
to say that no matter ICV = (1,0,0,0,0,0), ICV = (0,1,0,0,0,0),
ICV = (0, 0, 1, 0, 0, 0), ICV = (0, 0, 0, 1, 0, 0), ICV =
(0, 0, 0, 0, 1, 0), or ICV = (0, 0, 0, 0, 0, 1), their weight in
each partial product term for truncation error compensation
is nearly the same. Therefore, we apply the same weight of
unity to each input correction vector element. This conclusion
is beneficial for us to inject the compensation vector into the
fixed-width RPR directly. In this way, no extra compensation
logic gates are needed for this part compensation and only
wire connections are needed.
For the = 0 case, we go further to analyze the error
profile in the ICV and MICV. In ICV, we can find that all the
truncation errors are positive when = 0, as shown in Fig. 6.
It implies us that if we adopt the multiple compensation
vectors for the average compensation error terms are larger
than 0.5 2(3n/2), we can lower the compensation error effectively and no additional compensation error will be generated.
The multiple compensation vectors are constructed by ICV()
0,
,
f (ICV) = 1,
2,
if l = 0
if l = 1, or l = 2 & (up MICV = 0 & down MICV = 0)
if l = 2 & (up MICV = 0 or down MICV = 0), l = 3,
or l = 4 & (up MICV = 0 & down MICV = 0)
if l = 4 & (up MICV = 0 or down MICV = 0)
(7)
82
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 9. Proposed high-accuracy fixed-width RPR multiplier with compensation constructed by the multiple truncation EC vectors combined ICV together
with MICV.
100%.
(11)
,% =
(Truncated)
The smaller value of avg , ms , max , and represents the
more accurate EC performance. The precision analysis results
of various fixed-width RPR multipliers or full-width RPR multiplier are shown in Table II. The fixed-width RPR multipliers
are the six-bit multipliers while their LSPs are truncated and
various error compensation vectors are applied. The full-width
RPR multiplier is the six-bit by six-bit multiplier. As shown
in Table II, the fixed-width RPR designs usually perform with
higher truncation errors than that of the full-width RPR design
83
TABLE II
C OMPARISONS OF THE A BSOLUTE M EAN E RROR , THE M EAN -S QUARE
E RROR , AND THE VARIANCE OF E RROR U NDER VARIOUS
RPR-BASED 12-B IT ANT M ULTIPLIER D ESIGNS
Fig. 11. Comparisons of delay profiles for the six-bit fixed-width RPR, the
six-bit full-width RPR, and the 12-bit full-length multiplier.
Fig. 10. Comparisons of the output signal SNR for various RPR designs
with different truncated bits.
from 6.83 to 9.13 dB. The analysis results show that the
output signal quality of the conventional full-width RPR-based
ANT design is influenced by the truncation error. However,
the output SNR of the our proposed fixed-width RPR-based
ANT design can still be maintained high and is not severely
influenced by the truncation error because a precise error
compensation vector is applied.
To lowering supply voltage beyond critical supply voltage
without sacrificing the throughput and without leading to
severe SNR degradation, the critical computation delay in
the RPR block must be as fast as possible. The shorter
computation delay in RPR can bring the benefits of lower
overscaling supply voltage adopted. The comparison of delay
profiles under various 10 000 input random patterns for
the presented six-bit fixed-width RPR design, the previous
six-bit full-width RPR design, and the standard 12-bit fulllength multiplier are shown in Fig. 11. Because the computation word length in the RPR block is less than that in the
main block computation unit, the critical computation delay
and the average computation delay in the RPR block can all be
shorter. In the presented fixed-width RPR design, the critical
computation delay and the average computation delay can be
further shortened since the fanout and wires loading of adder
cells are lowered to be close to half. As shown in Fig. 11, the
critical computation delay in the proposed RPR block can be
shortened by 56.3% as compared with the main computation
block. As compared with the previous full-width RPR, the
critical computation delay in our proposed RPR block can
also be shortened by 12.6%.
To determine the optimized bit-length of fixed-RPR, comparisons of the Kvos and RPR area with different bit length
of RPR block are shown in Fig. 12. Our goal is to select
the bit length of fixed-RPR, which can achieve the lowest
Kvos. The smaller RPR can perform with higher speed, lower
power consumption, and lower area. However, RPR with lower
bit length would also lead to SNR degradation. Therefore, a
tradeoff between hardware area and RPR bit length is needed.
In this paper, the optimized bit length of fixed-RPR is selected
as six based on the lowest Kvos analysis results shown in
Fig. 12.
84
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 12. Comparisons of the Kvos and RPR area with different bit length
of RPR block.
Fig. 13. Comparisons of the error probability of various RPR blocks under
different VOS scale.
Fig. 14. Comparisons of the output SNR of various RPR designs under
different Kvos.
Fig. 15. Comparisons of the total logic synthesis cell area of various RPR
designs with different bit length.
85
Fig. 16. Chip layout of the proposed 12-bit fixed-width RPR-based ANT
multiplier design.
TABLE III
P ERFORMANCE S PECIFICATION S UMMARY OF THE P ROPOSED 12-B IT
F IXED -W IDTH RPR-BASED ANT M ULTIPLIER D ESIGN
Fig. 17.
Power-saving comparisons under various design environments.
(a) Comparisons of the power consumption of various multiplier designs under
different Kvos. (b) Comparisons of the SNR under different power saving
percentage.
86
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
Fig. 18. Comparisons of the output SNR of various process corners under
different Kvos.
[5] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, Low-power digital signal processing using approximate adders, IEEE Trans. Comput.
Added Des. Integr. Circuits Syst., vol. 32, no. 1, pp. 124137, Jan. 2013.
[6] Y. Liu, T. Zhang, and K. K. Parhi, Computation error analysis in
digital signal processing systems with overscaled supply voltage, IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, pp. 517526,
Apr. 2010.
[7] J. N. Chen, J. H. Hu, and S. Y. Li, Low power digital signal processing
scheme via stochastic logic protection, in Proc. IEEE Int. Symp. Circuits
Syst., May 2012, pp. 30773080.
[8] J. N. Chen and J. H. Hu, Energy-efficient digital signal processing
via voltage-overscaling-based residue number system, IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 7, pp. 13221332,
Jul. 2013.
[9] P. N. Whatmough, S. Das, D. M. Bull, and I. Darwazeh, Circuit-level
timing error tolerance for low-power DSP filters and transforms, IEEE
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 6, pp. 1218,
Feb. 2012.
[10] G. Karakonstantis, D. Mohapatra, and K. Roy, Logic and memory
design based on unequal error protection for voltage-scalable, robust
and adaptive DSP systems, J. Signal Process. Syst., vol. 68, no. 3,
pp. 415431, 2012.
[11] Y. Pu, J. P. de Gyvez, H. Corporaal, and Y. Ha, An ultra low
energy/frame multi-standard JPEG co-processor in 65-nm CMOS with
sub/near threshold power supply, IEEE J. Solid State Circuits, vol. 45,
no. 3, pp. 668680, Mar. 2010.
[12] H. Fuketa, K. Hirairi, T. Yasufuku, M. Takamiya, M. Nomura,
H. Shinohara, et al., 12.7-times energy efficiency increase of
16-bit integer unit by power supply voltage (VDD) scaling from 1.2V
to 310mV enabled by contention-less flip-flops (CLFF) and separated
VDD between flip-flops and combinational logics, in Proc. ISLPED,
Fukuoka, Japan, Aug. 2011, pp. 163168.
[13] Y. C. Lim, Single-precision multiplier with reduced circuit complexity
for signal processing applications, IEEE Trans. Comput., vol. 41, no. 10,
pp. 13331336, Oct. 1992.
[14] M. J. Schulte and E. E. Swartzlander, Truncated multiplication with
correction constant, in Proc. Workshop VLSI Signal Process., vol. 6.
1993, pp. 388396.
[15] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, Area-efficient multipliers for digital signal processing applications, IEEE Trans. Circuits
Syst. II, Exp. Briefs, vol. 43, no. 2, pp. 9095, Feb. 1996.
[16] J. M. Jou, S. R. Kuang, and R. D. Chen, Design of low-error fixed-width
multipliers for DSP applications, IEEE Trans. Circuits Syst., vol. 46,
no. 6, pp. 836842, Jun. 1999.
[17] S. J. Jou and H. H. Wang, Fixed-width multiplier for DSP application,
in Proc. IEEE Int. Symp. Comput. Des., Sep. 2000, pp. 318322.
[18] F. Curticapean and J. Niittylahti, A hardware efficient direct digital
frequency synthesizer, in Proc. 8th IEEE Int. Conf. Electron., Circuits,
Syst., vol. 1. Sep. 2001, pp. 5154.
[19] A. G. M. Strollo, N. Petra, and D. D. Caro, Dual-tree error compensation for high performance fixed-width multipliers, IEEE Trans. Circuits
Syst. II, Exp. Briefs, vol. 52, no. 8, pp. 501507, Aug. 2005.
[20] S. R. Kuang and J. P. Wang, Low-error configurable truncated multipliers for multiply-accumulate applications, Electron. Lett., vol. 42,
no. 16, pp. 904905, Aug. 2006.
[21] N. Petra, D. D. Caro, V. Garofalo, N. Napoli, and A. G. M. Strollo,
Truncated binary multipliers with variable correction and minimum
mean square error, IEEE Trans. Circuits Syst., vol. 57, no. 6,
pp. 13121325, Jun. 2010.
[22] I. C. Wey and C. C. Wang, Low-error and area-efficient fixedwidth multiplier by using minor input correction vector, in Proc.
IEEE Int. Conf. Electron. Inf. Eng., vol. 1. Kyoto, Japan, Aug. 2010,
pp. 118122.
I-Chyn Wey was born in Taipei, Taiwan, in 1979.
He received the B.S. and M.S. degrees in electronics
engineering from Chang Gung University, Taoyuan,
Taiwan, and the Ph.D. degree in electronics engineering from National Taiwan University, Taipei, in
2001, 2003, and 2008, respectively.
He is currently an Assistant Professor with the
Department of Electrical Engineering, Chang Gung
University. His current research interests include
VLSI CMOS circuits design, noise-tolerant CMOS
circuits, near-threshold-voltage CMOS circuits, and
ultralow-power CMOS circuits design.
87