
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 55, NO. 7, AUGUST 2008
Reducing Lookup-Table Size in Direct Digital Frequency Synthesizers Using Optimized Multipartite Table Method
Davide De Caro, Member, IEEE, Nicola Petra, Member, IEEE, and Antonio G. M. Strollo, Senior Member, IEEE
Abstract: The use of multipartite table methods (MTMs) to implement high-performance direct digital frequency synthesizers (DDFSs) is investigated in this paper. A closed-form expression for the spurious-free dynamic range (SFDR) is obtained when a single table of offsets (TO) is used in the multipartite approximation. In this case, the optimal design that minimizes the storage requirement for a given SFDR can be obtained analytically. A numerical algorithm is also presented to obtain the optimal design when two or more TOs are employed in the approximation. The VLSI implementation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of multipartite table methods for the realization of high-performance direct digital synthesizers.

Index Terms: CMOS digital integrated circuits, direct digital synthesis (DDS), direct digital frequency synthesizers (DDFS), frequency synthesis, multipartite table method (MTM), phase-to-sinusoid amplitude conversion, read-only memory (ROM) compression.
Fig. 1. Simplified schematic of a DDFS. The DAC and the low-pass filter are included when an analog output is needed.
Manuscript received November 23, 2006; revised March 13, 2007 and June 19, 2007. First published February 7, 2008; last published August 13, 2008 (projected). This paper was recommended by Associate Editor J. R. (a.k.a. Rong-Jian) Chen. The authors are with the Department of Electronics and Telecommunication Engineering, University of Naples, 80125 Naples, Italy (e-mail: nicpetra@unina.it). Digital Object Identifier 10.1109/TCSI.2008.918008

I. INTRODUCTION
Modern digital communication systems require frequency synthesizers with fine frequency resolution, fast channel switching speed, and large bandwidth. These requirements are surpassing the capabilities of conventional analog phase-locked loops. Direct digital frequency synthesizers (DDFSs) are ideally suited for these demanding applications, being characterized by ultrahigh-precision frequency control, short tuning latency, fast frequency switching with phase continuity, and excellent stability [1]-[4]. The basic DDFS architecture was originally proposed by Tierney et al. [1] and is shown in Fig. 1. The phase accumulator generates instantaneous phase values, while the sine generator produces a digital sinewave signal. An analog output is obtained by using a digital-to-analog converter (DAC) followed by a low-pass reconstruction filter. The frequency $f_{out}$ of the generated sinewave is proportional to the frequency control word $FCW$ and is given by

$$f_{out} = \frac{FCW}{2^{N}}\, f_{clk} \qquad (1)$$

where $f_{clk}$ is the clock frequency and $N$ is the wordlength of the phase accumulator. By increasing $N$, the DDFS can achieve an excellent frequency resolution, equal to $f_{clk}/2^{N}$. The most critical block in a DDFS is the sine generator. In the simplest implementation, the output of the accumulator addresses a read-only memory (ROM). The ROM implements a lookup table (LUT) storing $D$-bit samples of a sine waveform. In this brute-force approach, a very large ROM is needed, the total ROM size being $2^{N} \cdot D$ bits. To reduce the ROM size without impairing frequency resolution, the phase value passed to the sine generator is normally truncated to $T$ bits. Phase truncation reduces the ROM size to $2^{T} \cdot D$ bits; however, it introduces spurious noise in the DDFS output [1]-[4], which should be carefully taken into account in the design phase. Another well-known technique to reduce the ROM size is to store the sine values only for angles in $[0, \pi/2)$. Sine values for the full range of input phase are then generated by exploiting the quarter-wave symmetry of the sine function and simple trigonometric identities. In this way, the ROM size can be reduced by a factor of four. Even after performing phase truncation and exploiting quarter-wave symmetry, the size of the LUT is usually prohibitive. For this reason, several alternative approaches for the implementation of the sine generator have been proposed. A comprehensive review of phase-to-sinusoid amplitude conversion techniques has recently been published in [5]. In general, optimizing the sine generator architecture involves trading off numerical precision and the sine computation method against the spectral purity of the sine wave and the maximum clock rate. The spurious-free dynamic range (SFDR), defined as the ratio between the amplitude of the wanted sinusoid and the amplitude of the largest undesired frequency component, is the parameter commonly used to characterize DDFS spectral purity.
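The relation (1) and the phase truncation described above can be illustrated with a short Python sketch. The function names and the 100-MHz clock in the usage lines are illustrative assumptions, not values from the paper; only the 24-bit accumulator width matches the circuits of Section V.

```python
def output_frequency(fcw: int, n_bits: int, f_clk: float) -> float:
    """Eq. (1): output frequency of a DDFS with an N-bit phase accumulator."""
    return fcw * f_clk / (1 << n_bits)


def truncated_phases(fcw: int, n_bits: int, t_bits: int, n_samples: int) -> list:
    """Phase words passed to the sine generator after truncation to T bits."""
    mask = (1 << n_bits) - 1
    acc = 0
    phases = []
    for _ in range(n_samples):
        phases.append(acc >> (n_bits - t_bits))  # keep only the T most significant bits
        acc = (acc + fcw) & mask                 # modulo-2^N accumulation (wrap-around)
    return phases


if __name__ == "__main__":
    # 24-bit accumulator as in Section V; the 100-MHz clock is an arbitrary example.
    print(output_frequency(1, 24, 100e6))        # resolution f_clk / 2^N, about 5.96 Hz
    print(truncated_phases(1 << 22, 24, 4, 5))   # [0, 4, 8, 12, 0]: quarter-period steps
```

Doubling $N$ does not change the sine generator at all, since only the $T$ truncated MSBs reach it; this is why accumulator width (frequency resolution) and ROM size can be chosen independently.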
1549-8328/$25.00 © 2008 IEEE

The DDFS architectures based on CORDIC-like angle-rotation algorithms [7]-[9] are well suited when a large SFDR is required, but are rather ineffective for high-clock-frequency applications. These approaches, in fact, require very small lookup memories but use cumbersome arithmetic circuitry that reduces the clock frequency and increases power dissipation. In polynomial and piecewise-polynomial interpolation architectures [10]-[15], a small ROM (often implemented as random logic) is employed to store the polynomial coefficients, and adders and multipliers are required to implement the polynomial approximation. When high speed is the primary concern, piecewise-linear approximation with optimized coefficients appears to be one of the most effective approaches [10], [15]. ROM compression techniques use approximations in which the LUT storing sine values is subdivided into two smaller parts (a coarse ROM and a fine ROM), whose outputs are added together to yield the final sine value. The first ROM compression technique was proposed by Hutchinson [1], [5]. An improved approach, based on trigonometric approximations, was developed by Sunderland and later refined by Nicholas [5], [6]. The DDFS presented in [16] uses another technique, in which the total ROM size is further reduced by decomposing both the coarse and the fine ROMs as the sum of an error ROM and a quantization ROM. The multipartite table method (MTM), recently proposed in [18], is a very effective technique for table-based function evaluation. The MTM can be seen as a generalization of the Nicholas small-ROMs technique: the LUT is decomposed into a table of initial values (TIV) plus one or more tables of offsets (TOs), whose outputs are added together to obtain the required function value. From an implementation point of view, the optimal number of TOs comes from a tradeoff between the total ROM size (which decreases as more TOs are used) and the complexity of the multi-operand adder. The implementation results reported in [19] highlight that the MTM is ideally suited for DDFS implementations, requiring both a small ROM and a minimal arithmetic overhead. This paper investigates the optimization of the MTM for DDFS implementation.
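Why spreading the offsets over several smaller tables saves memory can be seen from the distributivity of the offset term, which Section II-D develops in detail. The sketch below is a simplified illustration under stated assumptions: the slopes are hypothetical integers in LSB units, and the low-order bits are split into two subwords of `n1` and `n2` bits. It models only the exact split; the MTM of [18] additionally addresses each smaller TO with fewer slope bits, which trades ROM size for approximation error and is not modeled here. The function names are hypothetical.

```python
def split_offset_tables(slopes, n1: int, n2: int):
    """Offset term b_j * r, with r = (x1 << n2) | x2, as one TO or as two smaller TOs.

    Offsets are integers in LSB units, so the split is exact (distributivity).
    """
    single = {(j, r): b * r
              for j, b in enumerate(slopes) for r in range(2 ** (n1 + n2))}
    to1 = {(j, x1): b * (x1 << n2)             # contribution of the upper subword x1
           for j, b in enumerate(slopes) for x1 in range(2 ** n1)}
    to2 = {(j, x2): b * x2                     # contribution of the lower subword x2
           for j, b in enumerate(slopes) for x2 in range(2 ** n2)}
    return single, to1, to2


def split_is_exact(single, to1, to2, n_slopes: int, n1: int, n2: int) -> bool:
    """Check TO(j, r) == TO1(j, x1) + TO2(j, x2) for every address."""
    return all(single[(j, (x1 << n2) | x2)] == to1[(j, x1)] + to2[(j, x2)]
               for j in range(n_slopes)
               for x1 in range(2 ** n1)
               for x2 in range(2 ** n2))


if __name__ == "__main__":
    single, to1, to2 = split_offset_tables([3, 5, 7, 11], 3, 3)  # hypothetical slopes
    print(len(single), len(to1) + len(to2), split_is_exact(single, to1, to2, 4, 3, 3))
```

With four slopes and `n1 = n2 = 3`, the single TO has 4 x 64 = 256 entries, whereas the two split tables have 4 x 8 + 4 x 8 = 64 entries in total, at the cost of one more adder input.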
An analytical technique that allows obtaining the optimal multipartite table approximation, which minimizes the overall ROM size while guaranteeing a target SFDR value, is presented in this paper. As shown in [19], optimizing for SFDR results in substantial memory savings with respect to the original approach of [18], where the optimal multipartite table was searched for by imposing a bound on the maximum absolute approximation error. First, this paper investigates the MTM with a single TO, i.e., the bipartite table method (BTM). In this case, the optimal SFDR value and the ROM content that maximizes the SFDR are both derived in closed form. This allows us to determine the optimal ROM decomposition with straightforward calculations, avoiding the numerical search technique used in [19]. In the second step, the analytical approach developed for the BTM is extended to the general MTM, with two or more TOs. In this case, the optimal MTM approximation is found by a novel search algorithm which, extending our BTM technique, finds the optimal MTM decomposition without a brute-force exhaustive search, which would require an unacceptably large computing time to evaluate the SFDR for every possible input decomposition. The paper also studies the implementation tradeoffs involved in MTM-based DDFSs. It is shown that the optimal number of TOs depends on the required SFDR. The optimal table decomposition and content are given for the 60- and 80-dBc cases.

This paper is organized as follows. Section II briefly reviews ROM compression algorithms. Section III investigates the MTM with a single TO (the BTM). The extension to the MTM with two or more TOs is described in Section IV. The tradeoff between silicon area, clock speed, and power is investigated in Section V by comparing the simulated performance of several MTM-based DDFSs implemented in a 0.25-µm CMOS technology. This paper focuses only on the digital portion of the system shown in Fig. 1. When analog outputs are needed, the DAC characteristics should also be taken into account, since the SFDR and the power dissipation of the DAC could limit system performance.

Fig. 2. Sine generator architecture, exploiting half-wave sine symmetry.

II. ROM COMPRESSION ALGORITHMS

A. Quadrant Compression
The architecture of the sine generator block using quadrant compression is shown in Fig. 2. The $T$-bit input signal $\phi$ represents the input phase in $[0, 2\pi)$. The two most significant bits of $\phi$ determine the quadrant in which the input phase lies. The remaining $W = T - 2$ least significant bits of $\phi$, composing the signal $x$, represent an angle in $[0, \pi/2)$, scaled to a binary fraction in $[0, 1)$. The input of the sine calculation block is the $W$-bit signal $x$, and the sine calculation block computes a $P$-bit representation of

$$y(x) = \sin\left[\frac{\pi}{2}\left(x + \frac{\Delta}{2}\right)\right] \qquad (2)$$

where $\Delta = 2^{-W}$ is the weight of the least significant bit of $x$. As shown in Fig. 2, in the first quadrant the output of the sine calculation block is sent directly to the DDFS output. In the other quadrants, the output sine wave is reconstructed by conditionally complementing the input and the output of the sine calculation block. The LSB/2 phase offset in (2) allows using a simple 1's complementer for the signal $x$, in place of a more complex 2's complementer [20].

B. Sunderland and Nicholas Algorithms
The Sunderland technique reduces the ROM size by employing simple trigonometric identities. The signal $x$ is decomposed as $x = A + B + C$, where $A$, $B$, and $C$ correspond to the most significant bits (MSBs), the middle bits, and the least significant bits (LSBs) of $x$. Due to the relative magnitudes of $A$, $B$, and $C$, the function in (2) is approximated as
$$\sin\left[\frac{\pi}{2}(A+B+C)\right] \approx \sin\left[\frac{\pi}{2}(A+B)\right] + \cos\left[\frac{\pi}{2}A\right]\sin\left[\frac{\pi}{2}C\right] \qquad (3)$$

The last equation is implemented by using a coarse ROM for the function $\sin[\frac{\pi}{2}(A+B)]$, a fine ROM for the function $\cos(\frac{\pi}{2}A)\sin(\frac{\pi}{2}C)$, and an adder. The Nicholas algorithm [6] uses the same coarse/fine ROM partitioning as the Sunderland architecture. In the Nicholas algorithm, however, the ROM coefficients are obtained by numerical optimization instead of the closed-form expression (3): the fine ROM samples are obtained by minimizing either the maximum absolute error or the mean-square error between the approximation and $y(x)$.

C. BTM
The BTM, introduced in [17], is a table-based approach that implements a particular piecewise-linear approximation of $y(x)$. The range $[0, 1)$ of $x$ is divided into $2^{q}$ segments of equal length. In each segment, the function $y(x)$ is approximated as

$$y(x) \approx a_i + b_j\,(x - x_i), \qquad x_i \le x < x_{i+1} \qquad (4)$$

The starting point of the $i$-th segment is $x_i = i \cdot 2^{-q}$, with $i = 0, 1, \ldots, 2^{q} - 1$. The idea behind the BTM is to group the $2^{q}$ segments into $2^{p}$ larger intervals (with $p \le q$) and to use the same interpolating slope in each larger interval. Therefore, the same interpolating slope $b_j$ is employed for $2^{q-p}$ adjacent segments, where

$$j = \left\lfloor i / 2^{q-p} \right\rfloor \qquad (5)$$

Fig. 3. BTM for $p = 1$, $q = 3$. The entries of the TIV are represented with the heavy dots. The expanded view highlights the linear interpolation implemented through the TO.
Fig. 4. Implementation of the BTM, exploiting the symmetry in the TO entries.
Fig. 5. MTM input signal decomposition.
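A direct transcription of (4) and (5) may help fix the indexing. In this sketch the target is $y(x) = \sin(\pi x/2)$, i.e., (2) without the LSB/2 offset, and the coefficients are a simple illustrative choice (a line through the sine value at each segment midpoint, with a shared slope equal to the derivative at the centre of each group of $2^{q-p}$ segments). They are not the SFDR-optimal coefficients of Appendix I, and the function names are hypothetical.

```python
import math


def btm_coefficients(p: int, q: int):
    """Illustrative (non-optimized) BTM coefficients for y(x) = sin(pi*x/2), x in [0, 1).

    b[j]: slope shared by 2**(q-p) adjacent segments, taken here as dy/dx at the
          centre of the j-th larger interval (an assumption, not the SFDR optimum).
    a[i]: initial value of segment i, chosen so that the line passes through y at
          the segment midpoint.
    """
    if not 0 <= p <= q:
        raise ValueError("need 0 <= p <= q")
    half = 2.0 ** (-q - 1)                     # half a segment
    b = [math.pi / 2 * math.cos(math.pi / 2 * (j + 0.5) * 2.0 ** -p)
         for j in range(2 ** p)]
    a = []
    for i in range(2 ** q):
        j = i >> (q - p)                       # eq. (5): j = floor(i / 2**(q-p))
        mid = i * 2.0 ** -q + half
        a.append(math.sin(math.pi / 2 * mid) - b[j] * half)
    return a, b


def btm_eval(x: float, p: int, q: int, a, b) -> float:
    """Eq. (4): y(x) ~ a[i] + b[j] * (x - x_i)."""
    i = int(x * 2 ** q)                        # segment index: q MSBs of x
    j = i >> (q - p)                           # slope index: p MSBs of x
    return a[i] + b[j] * (x - i * 2.0 ** -q)   # x - x_i: the remaining LSBs of x


if __name__ == "__main__":
    a, b = btm_coefficients(1, 3)              # the configuration of Fig. 3
    print(len(a), "segments,", len(b), "slopes")
```

Because the slope index $j$ is just the $p$ MSBs of $x$ and the segment index $i$ is the $q$ MSBs, no arithmetic is needed to address either table; the only multiplication, $b_j (x - x_i)$, is what the TO tabulates.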
Fig. 3 shows an example of the BTM algorithm, for $p = 1$ and $q = 3$. In this case, there are eight segments, while only two slope values are used to perform the linear interpolation: the first slope value is used for segments 1 to 4, and the second slope value is employed for segments 5 to 8. The implementation of the BTM requires a TIV that stores $a_i$ and a TO for $b_j (x - x_i)$. The TIV is addressed by the $q$ MSBs of $x$, and the total number of TIV entries is $2^{q}$. The total number of slopes considered in the approximation is $2^{p}$; therefore, the $p$ MSBs of $x$ address the TO. Moreover, the term $x - x_i$ (with $0 \le x - x_i < 2^{-q}$) corresponds to the subword composed of the $W - q$ LSBs of $x$, where $W$ is the wordlength of $x$. This subword therefore also addresses the TO, and the total number of TO entries is $2^{p + W - q}$. Fig. 4 shows the symmetric implementation [17] of the BTM. In this case, the TIV stores the value of the interpolating function in the middle of each segment, and the TO stores the offsets between the approximating function and the TIV value in each segment. As highlighted in the expanded view of Fig. 3, this
choice makes the values stored in the TO symmetric within each segment. This property is exploited to halve the size of the TO: the TO stores only the absolute offset values, corresponding to the right half of each segment, and an adder-subtractor is employed to perform the interpolation, as shown in Fig. 4.

D. MTM
The MTM, described in [18], generalizes the BTM. In the multipartite table approach, the $W$-bit input signal $x$ is decomposed into $m + 1$ nonoverlapping subwords $x_0, x_1, \ldots, x_m$ of lengths $n_0, n_1, \ldots, n_m$, respectively. The value of the input operand is $x = x_0 + x_1 + \cdots + x_m$, and its length is $W = n_0 + n_1 + \cdots + n_m$; see Fig. 5. The MTM approximation starts from the observation that the TO computes the multiplication $b_j (x - x_i)$. Now, the term $x - x_i$ can be seen as the sum of the subwords $x_1 + \cdots + x_m$, so that the multiplication implemented by the TO can be written as $b_j (x_1 + \cdots + x_m) = b_j x_1 + \cdots + b_j x_m$. Therefore, the TO can be distributed into the sum of $m$ smaller TOs, with

$$\mathrm{TO}_k = b_j\, x_k, \qquad k = 1, \ldots, m \qquad (6)$$

The $k$-th TO is indexed by the bits of $x_0$ that select the slope and by the bits of $x_k$, as shown in Fig. 5. In [18] it is shown that a significant reduction in memory size can be achieved by indexing the $k$-th TO with $x_k$ and with only a subset of the most significant bits of $x$. Fig. 6 shows the implementation of the MTM approximation when two TOs are employed. Symmetry is also exploited in the MTM to reduce the size of the TOs. The values to be stored in the ROMs are obtained in [18] by minimizing the maximum absolute approximation error; closed-form expressions for the ROM coefficients and for the error bounds are also provided in [18]. In general, increasing the number of TOs reduces the total memory size. However, each TO requires an additional adder input, which results in a tradeoff between the total ROM size and the multi-operand adder complexity. Moreover, using more tables increases the discretization error, requiring the introduction of guard bits [18] that may partly offset the advantages in terms of memory size.

Fig. 6. Implementation of the MTM using two TOs.

III. BTM WITH SFDR OPTIMIZATION
Here, we focus on the bipartite table approximation. First, an expression for the harmonic content is obtained. Then, a simple closed-form expression giving the SFDR upper bound is derived. Finally, we address the problem of determining the optimal decomposition that minimizes the ROM size for a given SFDR.

A. Harmonics Calculation
For the time being, let us neglect the effects of quantization, which will be considered in subsequent sections. Therefore, we treat phase and amplitude values as unquantized, so that only two parameters ($p$ and $q$) characterize the bipartite decomposition. Moreover, to simplify the discussion and without loss of generality, let us also neglect the LSB/2 phase offset in (2). As discussed previously, the BTM corresponds to a piecewise-linear approximation of $y(x)$. Therefore, we can use the analytical approach developed in [10] and [14] to obtain the amplitude of the generated harmonics. Let us denote by $z(\theta)$ the DDFS output, obtained by applying quadrant compression to $y(x)$. The function $z(\theta)$ has period $2\pi$ and odd symmetry. Therefore, it can be represented by a Fourier sine series as follows:

$$z(\theta) = \sum_{h=1}^{\infty} A_h \sin(h\theta) \qquad (7)$$

Even harmonics are zero, since $z(\theta)$ has quadrant symmetry. The odd harmonic amplitudes, following the analysis of [10], can be calculated with (8) for odd $h$ ($A_h = 0$ otherwise), using (9) and (10), where the coefficients appearing in (10) are given by (11). Since the same interpolating slope is employed for $2^{q-p}$ adjacent segments, the slopes are constrained as in (12). By imposing the conditions (12) in (11), we observe that all coefficients with an index not divisible by $2^{q-p}$ are zero. As a consequence, the corresponding expression can be written as (13). Equations (8)-(13) allow computing the harmonic amplitudes from the values $a_i$ and $b_j$ and from the two parameters $p$ and $q$ that characterize the bipartite decomposition.

B. SFDR Optimization
The problem of SFDR optimization consists of determining the $a_i$ and $b_j$ that maximize the SFDR for given values of $p$ and $q$. This problem is treated in Appendix I, where an analytical technique to obtain the optimal coefficients $a_i$ and $b_j$
is described and the value of the optimal SFDR is also calculated. In Appendix I, it is shown that the optimal SFDR can be written in two different forms [cf. (31) and (41)], depending on the parameter values. Equations (31) and (41), suitably simplified, can be grouped in a single handy expression, (14), where $\delta$ is the Kronecker delta function: $\delta_{k,l} = 1$ for $k = l$ and 0 otherwise. Fig. 7 shows the behavior of the optimal SFDR given by (14). The case $p = q$ corresponds to the upper bound derived in [10]. The SFDR increases with $q$. As displayed in Fig. 7, for a constant value of $q$, the SFDR decreases as $p$ decreases. In fact, the lower $p$ is, the lower the number of different slopes used in the BTM approximation.

Fig. 7. Optimal SFDR in BTM as a function of $p$ and $q$.
Fig. 8. Calculated harmonics for a DDFS using the BTM with SFDR optimization. In this case, $q = 4$ and $p = 2$, corresponding to a BTM approximation with $2^{q} = 16$ segments and $2^{p} = 4$ different slopes. The inset shows the approximation error.

Fig. 8 shows the calculated harmonics for $q = 4$ and $p = 2$, corresponding to a BTM approximation with 16 segments and four different slopes. The analysis of Appendix I shows that, for this set of parameters, the dominant harmonics are the three given by (21) and (22). Fig. 8 confirms that the largest harmonics are indeed the 15th, 49th, and 65th, as expected, with an SFDR slightly lower than 60 dBc, in agreement with (14). The amplitude of the higher order harmonics is well below this value. The inset of Fig. 8 shows the approximation error, which is discontinuous at the segment boundaries. Note that the developed SFDR optimization differs from a minimax approximation, since the maximum positive and negative approximation errors are not equal in modulus.

C. Optimal Design
The first step in determining the optimal BTM design, which minimizes the ROM size for a target SFDR, is the selection of the phase wordlength $W$ and of the amplitude wordlength $P$ (see Fig. 2). The value of $W$ is obtained by taking into account the spurs introduced by phase quantization [1]-[4], as expressed by (15),
hence (16), where $\lceil \cdot \rceil$ rounds its argument up to the nearest integer. The amplitude wordlength $P$ is related to amplitude quantization errors. No expression is available for the effect of amplitude quantization on the SFDR; on the other hand, its effect on the signal-to-noise ratio (SNR) is well known. From [21], depending on the relative values of $P$ and of the phase wordlength $W$, either phase quantization or amplitude quantization dominates the SNR. Therefore, one of two values of $P$, both tied to $W$, is selected: in the following, one value is assumed up to a given SFDR target, and the other for larger SFDR values.

Now let us determine the optimal values of $p$ and $q$. To that purpose, a few candidate solutions that reach the target SFDR are first obtained, and the best BTM decomposition is selected as the candidate with the minimal ROM size. The first candidate is the BTM decomposition with the minimum number of segments that reaches the target SFDR, i.e., the decomposition with $p = q$; this value of $q$ can easily be obtained from (14) as (17). The second candidate has $q$ increased by one and the minimum value of $p$ that reaches the target SFDR; this value of $p$ is obtained through a simple search, again using (14) to compute the SFDR. The other candidates are obtained in a similar way, by further increasing $q$ and selecting, for each value of $q$, the minimum $p$ that reaches the target SFDR.
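The candidate-generation procedure just described can be written as a short loop. This is a sketch of our reading of the text: `sfdr(p, q)` stands for an evaluation of (14), which is not reproduced here, so the usage lines plug in a purely hypothetical toy model (`toy_sfdr`). With `n_extra = 3` the loop yields four candidates, matching the count in the 60-dBc example that follows, but the number of extra candidates is an assumption.

```python
from typing import Callable, List, Tuple


def btm_candidates(sfdr: Callable[[int, int], float], target_db: float,
                   n_extra: int = 3, q_max: int = 24) -> List[Tuple[int, int]]:
    """Candidate (q, p) pairs for the BTM design, following the procedure above.

    sfdr(p, q) must return the optimal SFDR in dBc (e.g. by evaluating (14));
    it is assumed to grow with q when p = q and to grow with p for fixed q.
    """
    # First candidate: p = q with the smallest q that reaches the target.
    q0 = next((q for q in range(1, q_max + 1) if sfdr(q, q) >= target_db), None)
    if q0 is None:
        raise ValueError("target SFDR not reachable with q <= q_max")
    candidates = [(q0, q0)]
    # Next candidates: q = q0 + 1, q0 + 2, ..., each with its minimum feasible p.
    for q in range(q0 + 1, q0 + 1 + n_extra):
        p = next(p for p in range(q + 1) if sfdr(p, q) >= target_db)
        candidates.append((q, p))
    return candidates


def toy_sfdr(p: int, q: int) -> float:
    """Hypothetical stand-in for (14), used only to exercise the search."""
    return 8.0 * q + 6.0 * p


if __name__ == "__main__":
    print(btm_candidates(toy_sfdr, 60.0))
```

Only the cheap analytical model is called inside the loop; the ROM size of each candidate is evaluated afterwards, and the smallest one is kept.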
TABLE I OPTIMAL DECOMPOSITIONS AND ROM CONTENT FOR 60-dBc SFDR, USING OPTIMIZED BTM
Fig. 9. Design of a 60-dBc DDFS, based on optimized BTM. At the optimal point, $q = 5$ and $p = 2$, with a total ROM size of 352 b.
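The size trade-off of Fig. 9 can be reproduced with a rough model in the spirit of (18), derived in the next paragraph: a TIV of $2^{q}$ entries of $P$ bits, plus a TO addressed by the $p$ slope bits and the $W - q$ LSBs of $x$, halved by symmetry, whose word width is set by the largest offset $(\pi/2)\,2^{-q-1}$. The exact form of (18) is not preserved in this text, so the model below is an assumption, and `btm_rom_bits` is a hypothetical name. The values $W = 8$ and $P = 9$ are also our assumption, chosen because, with $q = 5$ and $p = 2$, they give the 352 b stated in the caption.

```python
import math


def btm_rom_bits(phase_bits: int, amp_bits: int, q: int, p: int) -> int:
    """Rough BTM ROM-size estimate in bits (assumed model, see the lead-in)."""
    tiv_bits = (2 ** q) * amp_bits                   # 2^q TIV entries of P bits
    to_entries = 2 ** (p + phase_bits - q - 1)       # p slope bits + (W - q) LSBs, halved by symmetry
    max_offset = (math.pi / 2) * 2.0 ** (-q - 1)     # max slope times half a segment
    to_width = max(1, round(max_offset * 2 ** amp_bits).bit_length())  # bits for the largest offset in LSBs
    return tiv_bits + to_entries * to_width


if __name__ == "__main__":
    for q in range(3, 7):                            # sweep of Fig. 9 for p = 2 (W = 8, P = 9 assumed)
        print(q, btm_rom_bits(8, 9, q, 2))
```

Under these assumptions, sweeping $q$ for $p = 2$ gives 456, 304, 352, and 600 b for $q = 3$, 4, 5, 6. The 304-b point at $q = 4$ is smaller, but, as discussed for Fig. 8, $q = 4$ and $p = 2$ give an SFDR slightly below 60 dBc, so $q = 5$ is the smallest feasible choice for $p = 2$.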
As an example, for a 60-dBc SFDR, four candidate solutions are obtained. The ROM size is then computed for each candidate solution, and the best BTM decomposition, with the minimal ROM size, is selected. The ROM size can easily be computed. The TIV has $2^{q}$ entries of $P$ bits (the amplitude wordlength). To determine the TO size, observe that the maximum slope of the BTM can be estimated as the maximum derivative of $y(x)$, which is $\pi/2$ (at $x = 0$). Since, in the symmetric implementation, each offset is measured from the segment midpoint and is therefore at most $2^{-q-1}$ away from it, the largest value to be stored in the TO can be approximated as $(\pi/2)\,2^{-q-1}$, which fixes the number of bits needed to represent the TO values. The total ROM size can hence be written as (18), where the symmetry of the fine ROM entries has been taken into account. The design example for a 60-dBc SFDR is shown in Fig. 9. When $q$ increases, the TIV size increases while the TO size decreases. The optimal point is achieved for $q = 5$, $p = 2$, with a total ROM of 352 b and an SFDR (before rounding) of 65.2 dBc.

D. Tables Entries
The TIV stores the value of the interpolating function in the middle of each segment. The content of the TIV (before rounding) is given by (19), which includes a constant term (equal to one-half of a maximum value) and also accounts for the LSB/2 phase offset of (2). The TO stores the offsets between the approximating function and the TIV value in each segment:
Fig. 10. Total ROM size as a function of SFDR, for optimized MTM algorithms.
(20)
The actual values stored in the TIV and the TO are obtained by rounding the right-hand sides of (19) and (20). To improve performance in the presence of amplitude quantization, the values stored in the LUTs can be scaled before rounding
[6] (amplitude optimization). To that purpose, a search is performed by multiplying the right-hand sides of (19) and (20) by a scaling factor slightly smaller than one. For each trial value, (19) and (20) are rounded to fill the LUTs, and the SFDR is computed by a fast Fourier transform (FFT). The best rounded TIV and TO are selected as the ones yielding the largest SFDR. The use of a very small step size (less than one LSB) during the search guarantees a negligible reduction of the output sinewave amplitude. The ROM content for the optimized 60-dBc DDFS is reported in Table I. The total ROM size is 352 b and the obtained SFDR is 64.69 dBc. As a comparison, the DDFS recently proposed in [16] (using quad line range compression, the Sunderland technique, and additional quantization and error ROM compression) uses 368 b of memory, reaches a 55-dBc SFDR, and requires a six-input multi-operand adder. Fig. 10 shows the total memory required by the optimized BTM algorithm as a function of SFDR. As can be seen, the storage requirements are much smaller than for an uncompressed memory, with a compression ratio that increases with SFDR. On the other hand, the ROM size still increases exponentially with the SFDR. Therefore, the BTM algorithm becomes ineffective when an SFDR larger than 80 dBc is required.

IV. MTM WITH SFDR OPTIMIZATION
The MTM can be seen as a piecewise-linear approximation with an additional error component due to the splitting of
TABLE II MTM APPROXIMATION WITH TWO TOS. OPTIMAL DECOMPOSITIONS AND ROM CONTENT FOR 60- AND 80-dBc SFDR
Fig. 11. Algorithm to determine the optimal MTM decomposition, with two TOs.
the TO into smaller TOs. The SFDR reduction due to this additional error component cannot easily be determined analytically. Therefore, we developed a search algorithm to find the optimal MTM decomposition (the one that minimizes the ROM size for a given SFDR). The proposed algorithm imposes an upper bound on the maximum error in the time domain to reduce the search space (see Appendix II). Moreover, in the developed algorithm, the slopes employed in the smaller TOs are obtained, according to [18], by averaging the BTM slopes (see (42) in Appendix II). Note that using more tables increases the discretization error; in our algorithm, this problem is addressed by allowing up to two guard bits in the tables.

A. Using Two TOs
Let us start by considering a multipartite approximation with two TOs, that is, a tripartite approximation. The input word decomposition is shown in Fig. 6. The proposed algorithm is shown in Fig. 11 and is composed of three main steps; a time-consuming numerical SFDR calculation is performed only in the third step. In the first step, the candidate couples $(q, p)$ of the BTM decomposition and the total ROM size corresponding to the optimal BTM decomposition are obtained, as described in the previous section. In the second step, for each candidate couple, the possible tripartite decompositions are enumerated by varying the remaining decomposition parameters. To limit the search space, only the decompositions that satisfy the condition derived in Appendix II are considered in the enumeration; this condition guarantees a bound on the additional error component of the tripartite approximation. Moreover, only the decompositions characterized
by a ROM size smaller than that of the BTM are saved. For each admissible set of decomposition parameters, three candidate decompositions, with a number of guard bits equal to 0, 1, and 2, are saved to be considered in the final phase of the algorithm. A final search is performed in the third step. After sorting the candidate decompositions by ascending ROM size, the SFDR is computed for each decomposition; the first candidate that meets the target SFDR is the optimal decomposition, with the minimal ROM size. Table II shows the obtained tripartite decompositions and the ROM contents for 60- and 80-dBc SFDR. No guard bits are needed in this case to reach the target SFDR. The CPU time needed to find the optimal decomposition on a 3-GHz Pentium IV PC ranges from about 1 s, for a target SFDR of 60 dBc, up to about 150 s, for a target SFDR of 120 dBc.

B. General Case (More Than Two TOs)
For the general case, when more than two TOs are employed, the search algorithm is very similar to that described in Fig. 11; the only difference is the enumeration of the possible decompositions in the second step. Also in this case, the time-consuming SFDR computation is carried out only in the third step. The CPU time needed to find the optimal decomposition for the most complex case investigated in this paper (four TOs and a 120-dBc target SFDR) is about 15 min on a 3-GHz Pentium IV PC. Fig. 10 shows the total ROM size required by DDFSs using the optimized MTM algorithm as a function of SFDR. As can be seen, a significant decrease in memory size is obtained by using the MTM algorithm with two TOs, while the improvement is less evident when three or four TOs are employed.
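Step 3 above, like the amplitude-optimization search of Section III-D, relies on a numerical SFDR evaluation. The NumPy sketch below shows one way to perform it under stated assumptions: the first-quadrant samples use the LSB/2 phase offset of (2), so one full period can be rebuilt by reversing and negating them; the tables are the simple non-optimized ones (sine at each segment midpoint, slope equal to the derivative at each group centre), not the Appendix I optimum; and TIV and TO entries are scaled and rounded separately, mimicking the scaling search. All function names are hypothetical, and the scaling factors in any call are arbitrary choices.

```python
import numpy as np


def full_period(y_q1):
    """One output period from first-quadrant samples at x_k = (k + 1/2) / 2**W.

    With the half-LSB phase offset of (2), the second quadrant is simply the
    first one read in reverse order (1's-complemented index); the second half
    period is the negation of the first.
    """
    y = np.asarray(y_q1, dtype=float)
    half = np.concatenate([y, y[::-1]])
    return np.concatenate([half, -half])


def sfdr_db(signal):
    """SFDR (dBc) of a record holding exactly one period: bin 1 vs. largest other bin."""
    mag = np.abs(np.fft.rfft(signal))
    worst = max(np.delete(mag, 1).max(), np.finfo(float).tiny)
    return 20.0 * np.log10(mag[1] / worst)


def btm_samples(w, p, q, amp_bits=None, scale=1.0):
    """First-quadrant output of a symmetric BTM (TIV + TO), optionally rounded.

    TIV and TO entries are scaled by `scale` and rounded separately to
    `amp_bits` fractional bits (unquantized if amp_bits is None).
    """
    k = np.arange(2 ** w)
    x = (k + 0.5) / 2 ** w                     # phase grid with the LSB/2 offset of (2)
    i = k >> (w - q)                           # segment index: q MSBs of x
    j = k >> (w - p)                           # slope index: p MSBs of x
    mid = (i + 0.5) / 2 ** q                   # segment midpoints
    slope = np.pi / 2 * np.cos(np.pi / 2 * (j + 0.5) / 2 ** p)
    tiv = scale * np.sin(np.pi / 2 * mid)
    to = scale * slope * (x - mid)             # symmetric about each midpoint
    if amp_bits is not None:
        lsb = 2.0 ** -amp_bits
        tiv = np.round(tiv / lsb) * lsb
        to = np.round(to / lsb) * lsb
    return tiv + to


def best_scale(w, p, q, amp_bits, factors):
    """Return (factor, SFDR) maximizing SFDR over the trial scaling factors."""
    results = [(s, sfdr_db(full_period(btm_samples(w, p, q, amp_bits, s))))
               for s in factors]
    return max(results, key=lambda r: r[1])
```

A call such as `best_scale(8, 2, 5, 9, [1 - k / 4096 for k in range(8)])` sweeps a few factors slightly below one with a sub-LSB step. Because the record holds exactly one period, every spur falls on an FFT bin and no windowing is needed.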
TABLE III COMPARISON BETWEEN ROM SIZE IN RECENTLY PROPOSED DDFSS
TABLE IV IMPLEMENTATION RESULTS FOR OPTIMIZED DDFSS
TABLE V HIGH-SPEED DDFS IMPLEMENTATION RESULTS
Table III compares the ROM size used in recently proposed DDFS architectures. As can be seen, the optimized MTM approach compares favorably even with multiplier-based techniques such as those by Curticapean [22], [23], Bellauar [12], and De Caro [14]. However, it is worth noting that additional fine ROMs require a larger multi-operand adder, which might become a bottleneck in terms of speed or silicon area.

V. VLSI SIMULATION RESULTS
We have implemented several DDFSs for SFDR values of 60, 80, and 100 dBc. A 24-b accumulator was used in every DDFS. All circuits were synthesized with a standard design flow, starting from a synthesizable VHDL description, followed by gate-level optimization and standard-cell place and route. The technology is a 0.25-µm CMOS process with one poly and five metal layers. In the designed DDFSs, we have not used slow and power-hungry full-custom ROMs; instead, we have implemented the ROMs with standard cells, with the help of automatic synthesis tools. As discussed in [13], this not only facilitates design reuse but also allows reaching high clock frequencies with reduced dissipation. In addition, the circuits can be designed to meet different system requirements by specifying speed and area constraints during synthesis. Table IV shows the simulation results without pipelining in the sine generator and with a single pipeline level in the accumulator. Timing constraints have been imposed during synthesis and optimization; the considered clock periods are 3, 4, and 5 ns for the 60-, 80-, and 100-dBc DDFSs, respectively. A fast carry look-ahead adder (Brent-Kung parallel-prefix architecture [28]) was employed both in the accumulator and in the sine generator. As shown in Table IV, for the 60-dBc DDFS there is no advantage in increasing the number of TOs, and the best architecture corresponds to the bipartite approximation. In this architecture, the TIV and the TO are synthesized using only 73 and 25 gates, respectively, with a total ROM area that is less than 15% of the total DDFS area. The total ROM area decreases if two (or more) TOs are used. This improvement, however, is more than offset by the multi-operand adder needed to sum the TIV and TO outputs; the overall effect is a larger (and slightly slower) circuit. For a 100-dBc SFDR, the best design uses four TOs. In this case, an area reduction of about 45% is obtained with respect to the BTM implementation. In this implementation, in fact, the ROMs take a relevant percentage of the total circuit area, and any reduction in ROM size results in an improvement in total DDFS area. Note, however, that using more than three TOs yields only a marginal ROM size (and circuit area) reduction, with a slightly larger power dissipation. For 80-dBc DDFSs, the best tradeoff between adder and ROM size is achieved by using two TOs. Interestingly, the tradeoff found here between ROM and adder complexities is similar to the tradeoff that exists in piecewise-polynomial DDFSs between ROM and arithmetic circuitry (see [14]).

Reaching a high clock frequency is an important issue for DDFSs. The circuits designed with the MTM are ideally suited for high-clock-frequency operation, requiring both small LUTs and very simple arithmetic circuitry. Table V shows the simulated performance of 60- and 80-dBc high-speed DDFSs. In these circuits, high-speed operation is obtained by introducing two pipeline levels in the accumulator and three additional pipeline levels in the sine generator. In particular, the first pipeline stage is introduced on the ROM address lines, the second pipeline stage is inserted on the ROM outputs, and the third pipeline stage is introduced in the multi-operand adder. Several implementations have been realized by varying the timing constraints during synthesis and optimization and by using different parallel-prefix adder topologies (e.g., either Brent-Kung or Kogge-Stone [28]). As shown in Table V, a clock frequency larger than 800 MHz for a 60-dBc DDFS and larger than 700 MHz for an 80-dBc DDFS can be reached, at a price in terms of silicon area. For 60-dBc circuits, using the high-speed but area-hungry Kogge-Stone topology in the sine generator is ineffective, due to the reduced adder wordlength. On the other hand, in the accumulator, the Kogge-Stone architecture is mandatory to reach a clock frequency larger than 700 MHz. For the high-speed 80-dBc DDFS, the Kogge-Stone architecture in the sine generator achieves the best performance, owing to the larger wordlengths.

TABLE VI COMPARISON BETWEEN RECENTLY PROPOSED DDFSS

A fair comparison between the performance of the DDFSs developed in this paper and that of previously proposed circuits is not easy. This is due to the wide range of possible architectural and implementation choices, such as the desired SFDR, the accumulator wordlength, the structure of the sine generator (which can be either single-phase or quadrature), the implementation technology, the standard-cell library, and so on. The data shown in Table VI have been obtained by selecting some recently published DDFS circuits using CMOS technology, with SFDR and clock frequency similar to those considered in this paper. For a 60-dBc SFDR, the design in [26] uses only 60% of the area of our DDFS, but it is four times slower, has a 20% higher power dissipation, and uses a more advanced 0.18-µm technology.
Considering 60-dBc high-speed implementations, the proposed technique reaches a high clock frequency without requiring the parallel operation of multiple sine-cosine generators, as in [16]. Therefore, the silicon area of our DDFS is only a small fraction of that of [16], and the proposed circuit dissipates about one third of the power of [16]. For an 80-dBc SFDR, the 250-MHz version of the proposed circuit is both faster and smaller than piecewise-linear DDFSs. However, part of the difference could be attributed to the simpler single-phase architecture used in this paper. A similar consideration applies to the high-speed version of the 80-dBc DDFS. Interestingly, the data in Table VI show that the developed DDFS largely outperforms recently proposed circuits even when a high SFDR (100 dBc) is required. As an example, the area occupation and the power dissipation of the proposed circuit are about one third and one sixth, respectively, of those of the design proposed in [27]. This clearly demonstrates the effectiveness of the ROM compression technique.
VI. CONCLUSION
We have investigated the optimization of the MTM for DDFS implementation. We have presented an analytical technique that yields the optimal bipartite approximation, i.e., the one that minimizes the overall ROM size while guaranteeing a target SFDR value. An effective search algorithm has also been proposed to select the optimal multipartite approximation. Several DDFSs have been implemented and simulated in a 0.25-μm CMOS technology. We have found that the optimal number of TOs depends on the required SFDR. For an SFDR of the order of 60 dBc, the best results are obtained with a single TO, while two TOs are the recommended choice for an 80-dBc SFDR. For a 100-dBc SFDR, either three or four TOs should be considered. The VLSI simulation results and the comparison with previously proposed DDFS architectures demonstrate the effectiveness of MTMs for the realization of high-performance direct digital synthesizers.
APPENDIX I
To minimize the harmonic content, we set to zero the amplitude of as many harmonics as possible, while keeping the amplitude of the fundamental harmonic fixed. Let us start by observing, from (13), that the function … has odd symmetry and is periodic in … with period …:
(21)
The function …, from (9), has instead even symmetry and is periodic in … with period …:
(22)
Combining the periodicity and symmetry of these functions, one has … and …, for …:
(23)
(24)
Let us now focus on the amplitude of the fundamental and of the harmonics … for …. We will show a posteriori that, at the optimal point, the higher-order harmonics are smaller in amplitude than the ones in the range …. To minimize the harmonic content, we impose the following conditions:
(25)
Owing to (23), the last equation implies that all … values with index from … up to … are also zero. Therefore, we have … for …, and the only values different from zero are …, …, etc.
(26)
Conditions similar to (26) are imposed for the function …:
(27)
From (24) and (27), the only … values different from zero are …, etc. Due to (26) and (27), all of the harmonics of … and of … are cancelled, with the exception of the fundamental and of the harmonics …, …, etc. From (8), (9), (13), and (21)–(27), we can write
(28)
(29)
(30)
Note that, for …, the only harmonics to be considered are …, and (29) and (30) do not apply. This corresponds to the (unconstrained) piecewise-linear sine approximation considered in [10], and the corresponding optimal SFDR is calculated in [10] as
(31)
For …, we have to take into account only … and …, so that (30) does not apply. In the general case …, considered in the following, we also have to consider the harmonics given by (30). From (29), the parameter … appears only in the expressions of harmonics … and …. Since these two harmonics depend linearly on …, their maximum absolute value is minimized when …. Therefore, at the optimal point, one of the following two conditions is met: either … or …. It can easily be seen that the condition to be imposed to reduce the harmonic amplitudes is the latter: …. Substituting into (29), one obtains
(32)
Similarly, from (30), in order to minimize the amplitude of harmonics … and …, we impose …, thus obtaining
(33)
From (32) and (33), it can easily be seen that … for …, with
(34)
(35)
Therefore, the dominant harmonic among … is the one at lower frequency, and it is given by (35). Equation (35) also holds in the case …. Let us now consider (28). By imposing a unitary value for the fundamental, one obtains
(36)
(37)
The maximum SFDR is achieved by choosing the … value that minimizes the maximum amplitude of the harmonics given by (35) and (37). These three harmonics, …, …, and …, depend linearly on …, and the best solution is found when …. One has
(38)
(39)
The above equation can also be expressed in terms of … and … as follows:
(40)
The last equation can be simplified as follows:
(41)
To obtain the optimal coefficients … and … for a given decomposition …, …, …, we first compute … and … according to (36) and (38). The other … and … values are calculated by using (26), (27), (32), and (33). Then, the linear equation system (9) and (13) is solved to obtain the optimal …, …, and … values. Coefficients … are finally computed from (11).
APPENDIX II
Let us consider a multipartite approximation with two TOs, i.e., a tripartite approximation. The input word decomposition is shown in Fig. 6. While … uses … different slopes, computed according to the technique presented in Section III, … uses … slope values, with …. Therefore, the … slopes of … (with …) are approximated in … with a single value …. This results in an error component that is minimized by assuming
(42)
The corresponding slope error is given by …. The error on the computed function is hence
(43)
The value of … can be estimated with the help of a Taylor approximation:
(44)
where … is the maximum value of the modulus of the second derivative of …, and … is the distance between the midpoints of the first segment using … and the last segment using …:
(45)
Substituting, we obtain
(46)
Following the analysis of [17], the algorithmic error of the BTM can be bounded as
(47)
From our simulations, we have noted that the optimal MTM decompositions always verify the following inequality:
(48)
By substituting (46) and (47) into (48), we have
(49)
The last condition is satisfied when
(50)
In general, it can be shown that the error component due to … is smaller than … when (50) is imposed in addition to the following inequalities:
(51)
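Appendix I reasons about harmonic amplitudes relative to a unit fundamental. One way to cross-check such SFDR figures numerically is to sweep the whole phase range once (FCW = 1), so that the fundamental lands in DFT bin 1. It is then compared with the largest remaining bin: SFDR = 20 log10(|X_1| / max over 2 ≤ k ≤ N/2 of |X_k|). The Python sketch below is a generic measurement helper under those assumptions, not the authors' simulation flow. The names fft and sfdr_dbc are hypothetical, and excluding DC from the spur search is a choice made here.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def sfdr_dbc(samples):
    """SFDR in dBc of a real sequence holding exactly one output period.

    The fundamental is assumed to sit in DFT bin 1 (full phase sweep with
    FCW = 1); spurs are searched over bins 2..N/2, and DC is ignored.
    """
    n = len(samples)
    mags = [abs(v) for v in fft(samples)[: n // 2 + 1]]
    spur = max(mags[2:])
    if spur == 0.0:
        return math.inf
    return 20.0 * math.log10(mags[1] / spur)


if __name__ == "__main__":
    N = 1024
    theta = [2 * math.pi * k / N for k in range(N)]
    # Sine with a third harmonic 60 dB below the fundamental.
    print(sfdr_dbc([math.sin(t) + 1e-3 * math.sin(3 * t) for t in theta]))
    # Sine amplitude rounded to 8-bit signed words (quantization spurs only).
    print(sfdr_dbc([round(127 * math.sin(t)) / 127 for t in theta]))
```

Because the test tones fall exactly on DFT bins, no window is needed. A record holding a non-integer number of periods would require windowing, which this sketch does not do.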
REFERENCES
[1] J. Tierney, C. M. Rader, and B. Gold, "A digital frequency synthesizer," IEEE Trans. Audio Electroacoust., vol. AU-19, no. 1, pp. 48–57, Mar. 1971.
[2] B. G. Goldberg, Digital Frequency Synthesis Demystified. Eagle Rock, VA: LLH Technology, 1999.
[3] V. F. Kroupa, Direct Digital Frequency Synthesizer. New York: IEEE Press, 1998.
[4] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Norwell, MA: Kluwer, 2001.
[5] J. M. P. Langlois and D. Al-Khalili, "Phase to sinusoid amplitude conversion techniques for direct digital frequency synthesis," Inst. Proc. Elect. Eng. Circuits Devices Syst., vol. 151, no. 6, pp. 519–528, Dec. 2004.
[6] H. T. Nicholas, III and H. Samueli, "A 150-MHz direct digital frequency synthesizer in 1.25-micron CMOS with −90 dBc spurious performance," IEEE J. Solid-State Circuits, vol. 26, no. 12, pp. 1959–1969, Dec. 1991.
[7] A. Madisetti, A. Y. Kwentus, and A. N. Willson, "A 100-MHz, 16-b, direct digital frequency synthesizer with a 100-dBc spurious-free dynamic range," IEEE J. Solid-State Circuits, vol. 34, no. 8, pp. 1034–1043, Aug. 1999.
[8] F. Curticapean, K. I. Palomaki, and J. Niittylahti, "Quadrature direct digital frequency synthesizer using angle rotation algorithm," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2003, vol. II, pp. 81–84.
[9] Y. Song and B. Kim, "Quadrature direct digital frequency synthesizer using interpolation based angle rotation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 7, pp. 701–710, Jul. 2004.
[10] J. M. P. Langlois and D. Al-Khalili, "Novel approach to the design of direct digital frequency synthesizers based on linear interpolation," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 50, no. 9, pp. 567–578, Sep. 2003.
[11] D. De Caro, E. Napoli, and A. G. M. Strollo, "Direct digital frequency synthesizers with polynomial hyperfolding technique," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 7, pp. 337–344, Jul. 2004.
[12] A. Bellaouar, M. S. O'Brecht, A. M. Fahim, and M. I. Elmasry, "Low-power direct digital frequency synthesis for wireless communications," IEEE J. Solid-State Circuits, vol. 35, no. 3, pp. 385–390, Mar. 2000.
[13] J.-S. Wang, S.-J. Lin, and C. Yeh, "A low-power high-SFDR CMOS direct digital frequency synthesizer," in Proc. IEEE Circuits Syst. Symp., May 2005, vol. 2, pp. 1670–1673.
[14] D. De Caro and A. G. M. Strollo, "High performance direct digital frequency synthesizers using piecewise polynomial approximation," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 2, pp. 324–337, Feb. 2005.
[15] D. De Caro and A. G. M. Strollo, "High performance direct digital frequency synthesizers in 0.25-μm CMOS using dual-slope approximation," IEEE J. Solid-State Circuits, vol. 40, no. 11, pp. 2220–2227, Nov. 2005.
[16] B. D. Yang, J. H. Choi, S. H. Han, L. S. Kim, and H. K. Yu, "An 800 MHz low-power direct digital frequency synthesizer with on-chip D/A converter," IEEE J. Solid-State Circuits, vol. 39, no. 5, pp. 761–774, May 2004.
[17] J. E. Stine and M. J. Schulte, "Approximating elementary functions with symmetric bipartite tables," IEEE Trans. Comput., vol. 48, no. 8, pp. 842–847, Aug. 1999.
[18] F. de Dinechin and A. Tisserand, "Multipartite table methods," IEEE Trans. Comput., vol. 54, no. 3, pp. 319–330, Mar. 2005.
[19] A. G. M. Strollo, D. De Caro, and N. Petra, "A 630 MHz, 76 mW direct digital frequency synthesizer using enhanced ROM compression technique," IEEE J. Solid-State Circuits, vol. 42, no. 2, pp. 350–360, Feb. 2007.
[20] J. Vankka, "Methods of mapping from phase to sine amplitude in direct digital synthesis," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 44, no. 2, pp. 526–534, Mar. 1997.
[21] J. Vankka, L. Lindemberg, and K. Halonen, "Direct digital synthesizer with tunable phase and amplitude error feedback structures," Proc. IEE Circuits Devices Syst., vol. 151, no. 6, pp. 529–535, Dec. 2004.
[22] F. Curticapean and J. Niittylahti, "A hardware efficient direct digital frequency synthesizer," in Proc. IEEE Int. Conf. Electron., Circuits Syst., Sep. 2–5, 2001, pp. 51–54.
[23] F. Curticapean and J. Niittylahti, "Low power direct digital frequency synthesizer," in Proc. 43rd IEEE Midwest Symp. Circuits Syst., Lansing, MI, Aug. 9–11, 2000, pp. 822–825.
[24] S. Liao and L.-G. Chen, "A low-power low-voltage direct digital frequency synthesizer," in Proc. Int. Symp. VLSI Technol., Syst., Applications, Jun. 1997, pp. 265–269.
[25] J. F. Ardekani, "M×N Booth encoded multiplier generator using optimized Wallace trees," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120–125, Jun. 1993.
[26] J. M. P. Langlois and D. Al-Khalili, "Low power direct digital frequency synthesizer in 0.18-μm CMOS," in Proc. Custom Integr. Circuits Conf., Sep. 2003, pp. 21–24.
[27] Y. Song and B. Kim, "A 14-b direct digital frequency synthesizer with sigma-delta noise shaping," IEEE J. Solid-State Circuits, vol. 39, no. 5, pp. 847–851, May 2004.
[28] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. New York: Oxford Univ. Press, 2000.
Davide De Caro (M'05) received the M.S. degree in electronic engineering (with honors) and the Ph.D. degree in electronic engineering and computer science from the University of Napoli Federico II, Naples, Italy, in 1999 and 2003, respectively. Since March 2003, he has been a Researcher with the Department of Electronics and Telecommunication Engineering, University of Naples, Naples. His work there covers high-performance flip-flops (including both low-power and high-speed structures), VLSI implementation of arithmetic circuits, direct digital frequency synthesizers, and digital mixers. He is the author or coauthor of more than 30 technical papers in international journals and refereed international conferences. He has acted as a reviewer for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and the IEEE JOURNAL OF SOLID-STATE CIRCUITS.
Nicola Petra (M'08) received the Laurea degree (summa cum laude) and the Ph.D. degree from the University of Napoli Federico II, Naples, Italy, in 2002 and 2007, respectively. His research interests include the design of digital VLSI circuits for telecommunications and high-performance arithmetic circuits. He is now a Researcher with the Department of Electronics and Telecommunications Engineering, University of Napoli Federico II. He has acted as a reviewer for the IEEE TRANSACTIONS ON VLSI SYSTEMS.
Antonio G. M. Strollo (SM'06) received the Laurea degree (cum laude) in electronic engineering and the Ph.D. degree in electronic engineering and computer science from the University of Napoli Federico II, Naples, Italy, in 1988 and 1992, respectively. From 1990 to 1998, he was a Research Assistant with the Department of Electronic Engineering, University of Napoli, Naples. In November 1998, he was appointed an Associate Professor with the University of Napoli Federico II, and he has been a full Professor since November 2002. His initial research activities covered the area of power electronics. In this field, he worked on switching power converter simulation, modeling and simulation of power devices (SIT, IGBT, superjunction), SPICE modeling of PiN diodes and IGBTs, characterization techniques, electro-thermal modeling of power devices, and optimization techniques for power bipolar devices with local lifetime control. His current research interests include the design and analysis of VLSI circuits. In particular, he is working on advanced architectures for direct digital frequency synthesis, techniques for clock dithering in digital ASICs, high-performance arithmetic circuits, and high-speed flip-flops. He has published more than 100 papers in international journals and refereed conferences.


1549-8328/$25.00 © 2008 IEEE

Fig. 2. Sine generator architecture, exploiting half-wave sine symmetry.

Fig. 5. MTM input signal decomposition.

Fig. 6. Implementation of the MTM using two TOs.

Therefore, the TO can be distributed into the sum of … smaller TOs, with …:
(6)

TABLE III COMPARISON BETWEEN ROM SIZE IN RECENTLY PROPOSED DDFSS

TABLE IV IMPLEMENTATION RESULTS FOR OPTIMIZED DDFSS

TABLE V HIGH-SPEED DDFS IMPLEMENTATION RESULTS
