
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007

A 380 MHz Direct Digital Synthesizer/Mixer With Hybrid CORDIC Architecture in 0.25 μm CMOS
Davide De Caro, Member, IEEE, Nicola Petra, Student Member, IEEE, and Antonio Giuseppe Maria Strollo, Senior Member, IEEE
Abstract: The paper describes the implementation of a 380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 μm CMOS technology. The circuit employs an innovative architecture which divides the π/4 rotation operation required in quadrature synthesizer/mixers into three rotations. The first two rotations are implemented by using a CORDIC datapath completely realized in carry-save arithmetic. The directions of the CORDIC rotations are computed in parallel by using a small lookup table for the first rotation, and a multiply-by-constant and addition circuit for the second rotation. The final (third) rotation is multiplier-based, in order to reduce the circuit latency and increase the circuit performance. The CORDIC datapath is implemented with a novel approach both at the algorithmic level and at the transistor level. At the algorithmic level, the combined use of sign-extension prevention, overflow prevention and a novel rounding scheme is presented. At the transistor level, a design style that jointly uses full-CMOS and DPL to improve the circuit latency is described. The synthesizer/mixer IC, realized in a 0.25 μm CMOS technology, occupies an area of 0.22 mm² and dissipates 152 mW at 380 MHz with a supply voltage of 2.5 V.

Index Terms: Angle rotator, carry save arithmetic, CORDIC algorithm, digital downconverter, direct digital frequency synthesizer, digital mixer, digital synthesizer, digital tuner, digital upconverter, mixer, modulator, overflow prevention, quadrature modulator, rounding.

Fig. 1. Synthesizer/mixer nonoptimal architectures: a) DDFS-based architecture; b) CORDIC-based architecture.

Manuscript received April 3, 2006; revised June 30, 2006. Chip fabrication was supported by MOSIS Research Educational Program. The authors are with the Department of Electronics and Telecommunication Engineering, University of Napoli Federico II, Napoli 80125, Italy (e-mail: dadecaro@unina.it). Digital Object Identifier 10.1109/JSSC.2006.886527

0018-9200/$20.00 © 2007 IEEE

I. INTRODUCTION

In recent years, there has been a growing trend in communication technologies to shift from analog toward digital techniques. The use of digital techniques, in fact, overcomes many analog hardware limitations (such as high sensitivity to process and temperature variations and difficult portability as VLSI technology scales down). Moreover, the programmability offered by digital techniques provides flexibility that is especially important in the context of rapidly evolving communication standards. Owing to advances in CMOS circuit performance, digital techniques are nowadays capable of handling intermediate frequency (IF) or even low radio frequency (RF) tasks. One of the most basic building blocks in this context is the direct digital frequency synthesizer/mixer (DDFSM), which is in ubiquitous use in many communication subsystems such as tuners, derotators, and up and down frequency converters (see [8], [35]). In addition, the quadrature mixer is the front-end of various modulation/demodulation schemes, such as binary phase shift keying (BPSK), quadrature phase shift keying (QPSK), and quadrature amplitude modulation (QAM) (see [35]).

The inputs of a DDFSM are two signals and a frequency control word. The outputs of the system are computed according to the following equations:

(1)

where

(2)

Equations (1) and (2) correspond to a complex multiplication between an input vector in the complex plane and a unitary-modulus vector.

A first implementation of the DDFSM includes two distinct functional units [1]; see Fig. 1(a). The first one is a direct digital frequency synthesizer (DDFS) [2], [3] that generates the quadrature sine and cosine sequences. The second one is a complex multiplier, which uses four real multipliers, one adder and one subtractor to generate the outputs according to (1). This implementation is generally nonoptimal [4], [8]. The DDFS is in fact a cumbersome circuit itself. Moreover, the complex multiplier does not exploit the property that the modulus of one of its inputs is unitary.

A second possible implementation [5], [6] employs a simple overflowing accumulator that generates the angle and a rotator using the CORDIC algorithm [7] to implement (1); see Fig. 1(b). Unfortunately, the CORDIC algorithm in its standard implementation is inherently slow, using many cascaded computation stages.

The recent approaches [8]–[11] overcome the limitations of the simple architectures of Fig. 1 by implementing the synthesizer/mixer as the cascade of two stages: a coarse angle rotation followed by a fine rotation stage. In [8]–[10] both the coarse rotation and the fine rotation employ a multiplier-based architecture, while the approach of [11] uses a CORDIC architecture for the coarse rotation and a multiplier-based fine rotation. The IC implementations [8], [10] are very effective, with high speed operation and reduced hardware complexity. Until now, no IC implementation exists of the mixed approach of [11].

This paper [12] introduces a novel combined approach, named Hybrid CORDIC, to realize a synthesizer/mixer. This approach splits the rotation required in the synthesizer/mixer circuit into three rotations. A first rotation is performed by employing a CORDIC datapath in which the rotation directions are computed in parallel, by employing a lookup table. The second rotation is also CORDIC-based, with rotation directions computed in parallel analytically. The final (third) rotation is multiplier-based. The parallel evaluation of the rotation directions allows an efficient use of carry-save arithmetic in the CORDIC datapath of the first two rotation blocks, without requiring additional carry-propagate adders (as in [19], [20]) or the introduction of additional CORDIC sub-rotations (as in [21]). The final multiplier-based rotation reduces the overall number of pipelining levels and the circuit complexity as well. At the transistor level, a novel approach, which combines full-CMOS and double-pass-transistor logic (DPL) [30] design styles, is presented to implement the CORDIC datapath.

The paper is organized as follows. Section II introduces the top level implementation of the synthesizer/mixer.
Section III discusses the algorithmic aspects of the novel Hybrid CORDIC architecture. Section IV highlights the main advantages of the Hybrid CORDIC architecture in comparison to the state-of-the-art architectures. The carry-save implementation of the CORDIC stages is discussed in Section V, while the mixed CMOS-DPL design style is presented in Section VI. In Section VII the prototype IC, realized in 0.25 μm CMOS technology, is presented, and the experimental results are compared to the state-of-the-art implementations.
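As an illustration of the first, nonoptimal architecture recalled in the Introduction (a phase accumulator driving a complex multiplier with four real multiplications, one addition and one subtraction), the following minimal floating-point sketch may help. It is not the circuit described in this paper: the function name `ddfsm`, the parameter names, the rotation sign convention and the choice of sampling the phase before the accumulator update are all assumptions made for the example.

```python
import math

def ddfsm(x_in, y_in, fcw, acc_bits=32):
    """Floating-point reference model of a quadrature synthesizer/mixer.

    x_in, y_in : input sample sequences (hypothetical names)
    fcw        : integer frequency control word
    acc_bits   : phase accumulator word length (32 bit in the described IC)
    """
    modulus = 1 << acc_bits
    phase_acc = 0  # overflowing phase accumulator
    out = []
    for x, y in zip(x_in, y_in):
        # Accumulator content mapped to an angle in [0, 2*pi)
        phi = 2.0 * math.pi * phase_acc / modulus
        c, s = math.cos(phi), math.sin(phi)
        # Complex multiplication: four multiplies, one subtract, one add
        out.append((x * c - y * s, x * s + y * c))
        # Wrap-around addition models the accumulator overflow
        phase_acc = (phase_acc + fcw) % modulus
    return out

# DDS-like use: constant inputs produce two quadrature sinusoids
samples = ddfsm([1.0] * 4, [0.0] * 4, fcw=1 << 30)
```

With a control word equal to one quarter of the accumulator range, the phase advances by a quarter turn per sample, so the constant input vector simply steps around the unit circle.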

Fig. 2. Top-level architecture of the designed DDFSM IC.

II. SYNTHESIZER/MIXER BASIC ARCHITECTURE

The top-level architecture of the designed DDFSM IC is shown in Fig. 2. The circuit is sized in order to exhibit a 90 dBc spurious free dynamic range (SFDR). The input word-length is 12 bit while the output word-length is 13 bit. The 32 bit phase accumulator generates the rotation angle, represented with a binary fractional value in [0, 1]. The rotation angle is truncated to 16 bit, introducing output spurs that are below the 90 dBc SFDR constraint. The heart of the circuit is the Hybrid CORDIC rotator block. This block is able to perform a rotation by an angle represented with a binary fractional value in [0, 1]; the angle is given by

(3)

The weight of the least significant bit of this fractional value is used as a reference quantity in the following. The other minor subsystems in Fig. 2 (1's complementer, swappers and 2's complementers controlled by control logic) exploit the symmetries of the sine and cosine functions [8], [10] to perform the complete rotation over the full angular interval. It is worth highlighting that, by introducing a phase shift in the rotator block, it is possible to completely eliminate the error due to the use of a simple 1's complementer to evaluate the angle (see [2], [3], [16]).

III. HYBRID CORDIC ROTATOR ALGORITHM

The architecture of the Hybrid CORDIC rotator is shown in Fig. 3. The circuit rotates its input vector by the given angle. The rotation is performed in three steps. The first two steps are performed with a CORDIC datapath, while the final step is realized by using two multipliers.

A. First Rotation

In the first step, the angle is divided into two sub-words

(4)

where

(5)

and is the complement of . The goal of the first stage is to perform a rotation by an angle close to . To that purpose, the first rotation block uses the CORDIC algorithm, described by the following equations:

(6)


Fig. 3. Hybrid CORDIC rotator architecture.

where is equal to . The algorithm starts with , and . To simplify hardware implementation, only four CORDIC sub-rotations are performed in (6), resulting in a rotation by an angle . From the CORDIC algorithm properties, it can be easily shown that the absolute value of the residual angle is upper bounded by . Therefore, by choosing four rotations in the first stage, we have about the same maximum absolute value for both and [see (5)].

The direction of the first rotation in (6) is fixed since . The directions of the remaining rotations depend only on . These directions, therefore, can be precomputed, by using (6), and stored in the lookup table shown in Fig. 3. The lookup table is very small, having three address bits. The residual angle, similarly to the direction values, depends only on these three bits and, therefore, can also be stored in the lookup table.

Finally, let us note that the four CORDIC sub-rotations (6) amplify the modulus of the input vector by a factor

(7)

The amplification is inconsequential in many applications [4]–[6], [11] and is not compensated in the proposed approach.

B. Second Rotation

In order to complete the operation, the second and third stages of the Hybrid CORDIC architecture rotate the vector (the output of the first stage) by an angle

(8)

The angle is computed by using the multiplier and the adder shown in Fig. 3. The multiplier is needed to calculate from its scaled representation; see (5). Since, as we have observed before, the absolute values of and are both lower than , the absolute value of is lower than . By representing with 11 bits, we have

(9)

A phase quantization error in the range is introduced in (9). This results in a maximum error at the outputs of the DDFSM equal to . This value is much lower than the weight of the least significant bit at the outputs of the DDFSM.
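To make the role of precomputed directions and of the modulus amplification concrete, a minimal floating-point sketch of CORDIC sub-rotations with externally supplied directions follows. It uses the textbook CORDIC micro-rotation, not the carry-save datapath of the IC; the function names, the starting shift `first_index` and the example direction pattern are hypothetical choices made for the illustration.

```python
import math

def cordic_rotate(x, y, directions, first_index=1):
    """Apply CORDIC sub-rotations whose directions (+1 or -1) are given.

    Each step only needs shifts and add/subtract operations in hardware;
    here the shift is modeled by a multiplication by 2**-i.
    """
    for k, sigma in enumerate(directions):
        shift = 2.0 ** -(first_index + k)
        x, y = x - sigma * shift * y, y + sigma * shift * x
    return x, y

def cordic_angle(directions, first_index=1):
    """Total rotation angle produced by the given direction sequence."""
    return sum(sigma * math.atan(2.0 ** -(first_index + k))
               for k, sigma in enumerate(directions))

def cordic_gain(num_steps, first_index=1):
    """Modulus amplification, which does not depend on the directions."""
    return math.prod(math.sqrt(1.0 + 4.0 ** -(first_index + k))
                     for k in range(num_steps))

# Example: four sub-rotations with an arbitrary direction pattern
dirs = [1, -1, 1, 1]
xr, yr = cordic_rotate(1.0, 0.0, dirs)
```

Because the gain depends only on the number of sub-rotations and on their shift amounts, it is a known constant once the stage is fixed, which is why it can be left uncompensated in applications that tolerate a fixed scale factor.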

The angle is then split as the sum of two sub-angles , where

(10)

(11)

The second rotation block is aimed at performing the rotation by the angle , whereas the rotation by the angle is assigned to the final rotation block. In the second rotation we employ a CORDIC algorithm without the angle computation. The rotation directions are obtained directly from the bits of as follows:

(12)

for . The corresponding CORDIC equations are

(13)

where is the output of the first rotation stage; see Fig. 3. The actual rotation angle obtained with (13) is not exactly the required angle but is instead an angle , given by

(14)

From (10), (12) the angle can be written as

(15)

As a consequence, the second rotation block introduces a phase error, :

(16)

With simple manipulations, it is possible to show that is upper bounded by

(17)

The phase error of the second rotation introduces an error on each component of the DDFSM output. From (17), is much lower than the weight of the output LSB.
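The origin of this phase error can be illustrated numerically. The sketch below assumes that each direction taken from the angle bits stands for a nominal contribution of plus or minus 2^-i, while the corresponding CORDIC sub-rotation actually rotates by atan(2^-i); the index set used in the example is hypothetical and is not taken from the paper, and the function name is likewise an assumption.

```python
import math
from itertools import product

def worst_phase_error(indices):
    """Largest gap between the nominal angle (sum of +/- 2**-i) and the
    angle actually produced by CORDIC sub-rotations (sum of +/- atan(2**-i)),
    taken over every possible direction pattern."""
    worst = 0.0
    for sigmas in product((-1, 1), repeat=len(indices)):
        wanted = sum(s * 2.0 ** -i for s, i in zip(sigmas, indices))
        actual = sum(s * math.atan(2.0 ** -i) for s, i in zip(sigmas, indices))
        worst = max(worst, abs(wanted - actual))
    return worst

# Hypothetical set of sub-rotation orders for a small-angle stage
example_indices = (5, 6, 7)
err = worst_phase_error(example_indices)
```

Since x - atan(x) is smaller than x³/3 for 0 < x < 1, the error shrinks very quickly as the sub-rotation order grows, which is what makes reading the directions directly from the angle bits attractive for small angles.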


TABLE I PERFORMANCES OF THE PROPOSED ARCHITECTURE

Like the first rotation block, the CORDIC rotations (13) also amplify the modulus of the input vector, by a factor

(18)

Therefore, the total amplification factor is

(19)

C. Final (Third) Rotation

The final rotation block in Fig. 3 implements the rotation by . The operation to be performed by this block can be written as

(20)

This final rotation could also be computed by using the CORDIC algorithm. However, as observed in [17] and [18], when the rotation angle is small a complex multiplier is able to reduce the latency and improve the performance. In our case, the absolute value of is lower than . Therefore, we can approximate the sine and cosine functions in (20) as

(21)

In this way, the final rotation is realized without the need of lookup tables to store sine and cosine values. The approximation (21) introduces an error on the DDFSM outputs. It can be easily shown that this error component is upper bounded by . As shown in Fig. 3, we have introduced two rounders in the final rotation stage, to reduce the wordlength of the multiplier inputs. The two rounders introduce an additional error at the output. We have .

An analytical derivation of the joint effect of all algorithmic and quantization errors is not easy. We performed bit-level simulations, by operating the DDFSM in two modes. In DDS mode the inputs are held constant, so that the circuit generates two quadrature sinewave outputs. In SSB mode a sinusoidal input is applied to the DDFSM, which operates as a digital upconverter with image rejection. Table I summarizes the performance of the developed architecture.

IV. COMPARISON WITH STATE-OF-THE-ART APPROACHES

The main advantage of the proposed Hybrid CORDIC architecture is the parallel computation of the rotation directions. This computation is performed with a small lookup table,

a multiplier by constant and an adder. Therefore, a simple and effective carry-save [31] implementation for the datapaths can be used, avoiding the speed penalties due to carry propagation [5]. Previously proposed carry-save CORDIC architectures require an angle datapath, and also additional carry-propagate adders to determine the rotation directions [19], [20]. Other techniques do not include carry-propagate adders, but require the introduction of extra rotations [21].

The first two CORDIC rotation blocks in our architecture resemble the algorithms proposed in [22]. However, in the partitioned Hybrid CORDIC algorithm of [22] the partitioning and the handling of the rotation angle would require a huge lookup table for its implementation. On the other hand, the mixed Hybrid CORDIC algorithm, also proposed in [22], does not partition the rotation angle. Therefore, its implementation requires in the first stage either a full angle datapath or a lookup table addressed by all the bits of the rotation angle.

The solution of [11] uses two rotation stages. The first one is a CORDIC rotator, while the second one is multiplier-based (as originally proposed in [17]). The CORDIC rotator of [11] uses a number of stages comparable to the overall number of stages used in the first and second blocks of our architecture. The use in [11] of a single CORDIC rotator, however, requires a lookup table much larger than the one used in our architecture.

The recently proposed DDFSM implementations [8]–[10] use an architecture composed of two multiplier-based rotation stages. These architectures require a total of 8 small-width multipliers. The experimental results shown in the following demonstrate that the Hybrid CORDIC architecture is more effective, especially in terms of power and area occupation.

V. HYBRID CORDIC IMPLEMENTATION

The most critical subsystems in the Hybrid CORDIC architecture of Fig. 3 are the CORDIC stages. In fact, the lookup table is very small and can be effectively synthesized as random logic. The multiplier requires only the sum of a few partial products that can easily be merged with the adder needed to compute the angle of (8) in a single summation tree. The final rotation of the Hybrid CORDIC architecture of Fig. 3 uses multiply-accumulate circuits also realized with a single summation tree. The sign-extension prevention technique [23] has been used to realize the subtraction required by the final rotation.

A. Carry-Save Implementation of the CORDIC Stages

An innovative architecture has been devised to implement the first and second CORDIC rotation stages. The basic equation to implement the CORDIC stages is

(22)

where is the direction of the CORDIC sub-rotation, while represents the order of the sub-rotation. Equation (22) implements the computation in (6), (13). The computation can be easily obtained by swapping and in (22) and changing the sign of . Since, in our architecture, the CORDIC rotation directions are efficiently evaluated in parallel, the implementation was performed by using carry-save arithmetic. Rewriting (22)


Fig. 4. Detailed implementation of the first and second rotation blocks with carry-save arithmetic. The datapath is built by one wiring block and six CORDIC sub-rotations driven by the rotation directions.

Fig. 5. Optimized bit-level implementation in carry-save arithmetic of the elementary stage (23); i is the order of the elementary stage.

in carry-save [31] form, we obtain the main equation to be implemented:

(23)

Fig. 4 shows the detailed carry-save datapath of the seven CORDIC stages needed in the architecture of Fig. 3. The inputs of the circuit of Fig. 4 are in two's complement representation. The first two blocks in Fig. 4 (labeled wiring) implement the first CORDIC sub-rotation with a fixed direction (see Section III). These blocks are also in charge of the conversion from two's complement to carry-save representation and therefore can be realized by simple wiring and complementations, without additional logic. The six remaining CORDIC sub-rotations are implemented by using the elementary stages in Fig. 4. Each elementary stage implements (23). The wordlength of the signals inside the datapath of Fig. 4 is increased by 2 LSBs (in order to reduce the overall error introduced by the CORDIC processing) and by 1 MSB (to avoid overflow). The two final vector merging adders (VMAs) in Fig. 4 convert the result to two's complement representation. Rounding is also performed in the VMAs to provide the final output signals with a wordlength of 13 bits.

Fig. 5 shows the terms to be added to implement (23) at the bit level. In this figure, is the binary value associated to ( if and for ). Fig. 5 highlights the use of both the sign-extension prevention of [23] and the overflow prevention of [19]. Both techniques reduce the circuit complexity with respect to simpler carry-save approaches [6]. In order to implement the two subtractions of (23) the bits of and are XORed with . Moreover, a two's complement constant (the bit equal to s in the column of weight LSB) is also added. The rounding constant has been computed in order to minimize the rounding error. For all elementary stages, but the one marked with a star in Fig. 4, the rounding error is minimized if and if . Therefore, when the sum of the two's complement constant and is equal to LSB. For the elementary stage indicated with a star in Fig. 4, the optimal rounding constant is zero.

B. Elementary Stage Implementation

Fig. 6 shows that the terms of Fig. 5 can be summed with a single row of 4-2 compressors [24]. Besides these blocks, the


Fig. 6. Implementation of the i-th order elementary stage by using 4-2 compressors and half-adders. For the elementary stage marked with a star in Fig. 4, k = s and k = s. For the other elementary stages k = 0 and k = 1.

Fig. 7. Implementation of ha0 (a) and ha1 (b) half-adder circuits.

circuit requires half adders (ha0 and ha1, for the compression of the MSBs) and XOR gates (for conditional complementing). The ha0 circuits are the traditional half adders. The ha1 circuits, instead, compute a modified function that allows the summation of the sign-extension prevention constant (see Fig. 5) without requiring additional hardware. The ha0 circuit is well known [34]. An effective implementation in CMOS logic (derived from the 28T full-adder [34]) is shown in Fig. 7(a). The ha1 circuit is described by the following equations:

(24)

and is implemented as shown in Fig. 7(b).

It is interesting to observe, in Fig. 6, that the use of the sign-extension prevention allows a couple of half adder circuits to replace a single 4-2 compressor in computing the most significant bits. The most efficient realizations of the 4-2 compressor [25]–[28] require about 60 MOS transistors, while the couple of half adder circuits, realized as shown in Fig. 7, require only a total of 28 transistors. The sign-extension prevention technique is, therefore, able to provide a very low circuit complexity. The number of 4-2 compressors decreases as the order of the stage increases and, in our approach, this results in a substantial gain in area.
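For readers less familiar with 4-2 compressors, the arithmetic function they must satisfy can be checked exhaustively with a few lines of code. The sketch below uses the textbook construction from two cascaded full adders; it is not one of the optimized circuits of [25]–[28], nor the special cell developed in this work, and all function names are hypothetical.

```python
from itertools import product

def half_adder(a, b):
    """Traditional half adder: a + b == s + 2*c."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Full adder: a + b + c == s + 2*carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor built from two full adders.

    cout depends only on x1, x2, x3 (never on cin), so a row of
    compressors has no carry propagation along the word.
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout

# Exhaustive check of the defining identity over all 32 input patterns
identity_holds = all(
    t + 2 * (c + co) == sum(bits)
    for bits in product((0, 1), repeat=5)
    for t, c, co in [compressor_4_2(*bits)]
)
```

The redundancy mentioned in the text comes from the fact that several different Boolean assignments of the two weight-2 outputs satisfy this same identity, which leaves room for delay-oriented reformulations.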

The timing performance of the elementary stage shown in Fig. 6 is limited by two critical paths. The first timing critical circuit, shown in Fig. 8(a), is composed of a 4-2 compressor with two inputs conditionally complemented. The best available implementations of the 4-2 compressor [27], [28] provide a delay of three XOR gates, and include a total of four XOR gates plus two multiplexers. Therefore, a straightforward implementation of the circuit of Fig. 8(a) would require a maximum delay of four XOR gates. An optimized implementation of this first timing critical circuit can be obtained by embedding the two conditional-complement XOR gates in the 4-2 compressor. This is not straightforward, since (due to the redundancy of carry-save arithmetic) different Boolean equation sets exist which provide the same arithmetic function of a 4-2 compressor. We have found that an optimal solution can be obtained starting from the Boolean equation set of the 4-2 compressor introduced by Ghosh et al. [29], and embedding the XOR gates in the circuit, as shown in the following equations:

(25)

Fig. 8(b) shows the gate level implementation of (25). Our circuit exhibits only three XOR gates on the critical path, a clear speed advantage with respect to the implementation of Fig. 8(a). Moreover, the circuit of Fig. 8(b), requiring a total of five XOR gates plus two multiplexers, uses one XOR gate fewer than the implementation of Fig. 8(a) based on the state-of-the-art 4-2 compressor of Hsiao et al. [28].

Let us now consider the second timing critical circuit of Fig. 9(a), corresponding to the overflow prevention logic, on the left-hand side of Fig. 6. A straightforward implementation of the circuit would present four gates on the critical path (by assuming the delay of a half adder comparable to the delay of an XOR gate). An optimized implementation, with a delay of three XOR gates, can be obtained by exploiting the redundancy of carry-save arithmetic. In fact, the two half adders surrounded


Fig. 8. Optimal implementation of the first timing critical block in Fig. 6. (a) Logical function. (b) Detailed implementation with simple gates.

Fig. 9. Optimal implementation of the second timing critical block in Fig. 6. (a) Logical function. (b) Detailed implementation with simple gates.

by the dashed line in Fig. 9(a) are described by the following Boolean equations:

(26)

where . By exploiting the redundancy of carry-save arithmetic, we can rewrite the Boolean equations of this block, preserving its arithmetic function, as follows:

(27)

Proceeding in a similar way for the second column of half adders in Fig. 9(a), with some manipulations, we obtain for the whole circuit of Fig. 9(a) the following equivalent equations:

(28)

where ; . The resulting circuit is shown in Fig. 9(b), where the critical path from all inputs to all outputs is composed of three gates (two XOR and one multiplexer, or two XOR and one NAND gate). For one of the inputs the worst delay to all outputs is two gates (one XOR and one multiplexer). Since this input arrives with a delay of one gate [see Fig. 6 and Fig. 7(b)], this path again results in a total delay of three gates.

VI. MIXED CMOS-DPL IMPLEMENTATION

In order to simplify IC design, the DDFSM has been implemented by using a standard cell approach, with automatic

place and routing. To optimize performance, special purpose cells were designed to implement the timing critical circuits of Fig. 8(b) and Fig. 9(b). These circuits, being composed mainly of XOR gates and multiplexers, are well suited for a pass transistor logic implementation. Having high speed operation as our main target, we employed the double-pass-transistor logic (DPL) style [30], as shown in Fig. 10. DPL is a double-rail logic. In the developed cells, each input is converted from single to dual rail by using a couple of inverters. In this way passgate inputs, which are not suited for the timing models used by the timing analysis tools, are also avoided. The inverters 1–5 in the circuit of Fig. 10(b) and the inverters 1–6 in Fig. 10(c) increase the circuit speed by limiting the maximum number of series transistors. Moreover, the inverters 1–2 in Fig. 10(b) make the propagation delay of the Carry output independent of the capacitive load on the Sum output. A similar consideration applies to the inverters 1–2 and 3–4 in Fig. 10(c). In this way the developed DPL circuits are fully compatible with the other full-CMOS standard cells of the library.

It is worth noting that not all gates have to be dual rail. The gates which drive the outputs, both in Fig. 10(b) and in Fig. 10(c), can be single rail. Also the XOR gate which drives the single rail multiplexer in Fig. 10(b) can be implemented in a single rail style.

The number of transistors, the propagation delay and the power dissipation obtained by employing the proposed DPL-CMOS design style are reported in Table II. For comparison, the same table also reports the performance achievable by using a standard cell library with only full-CMOS logic, without special cells. All stages have been designed for a 0.25 μm, 2.5 V technology. The analysis of Table II reveals that the proposed design style allows about a 35% reduction of the propagation delay while requiring about the same number of transistors and power dissipation as the full-CMOS realization.

Fig. 10. Transistor level implementation of the special cells of Fig. 8(b) and Fig. 9(b). (a) Basic gates implementation. (b) DPL implementation of the circuit of Fig. 8(b). (c) DPL implementation of the circuit of Fig. 9(b).

TABLE II SIMULATED PERFORMANCES OF DIFFERENT CORDIC STAGES CONFIGURATIONS BY EMPLOYING PROPOSED DPL-CMOS AND FULL-CMOS STYLES

TABLE III EXPERIMENTAL PERFORMANCES OF THE SYNTHESIZER/MIXER

VII. EXPERIMENTAL RESULTS

The DDFSM with the optimized carry-save CORDIC architecture and the mixed CMOS-DPL design style has been fabricated on a test chip (see Fig. 11) in a 2.5 V, 0.25 μm CMOS technology. The DDFSM has been synthesized from a VHDL description, and has been automatically placed and routed by using a commercial tool. The DDFSM accepts a 32 bit frequency control word, resulting in a frequency resolution of about 0.088 Hz. Table III summarizes the main characteristics of the circuit. Fig. 12 reports the experimental digital output spectrum of the DDFSM when operated in DDS mode, showing an SFDR larger than 93 dBc. Besides the DDFSM, the chip includes a built-in self-test structure (SA Mixer) and two programmable ring oscillators (RO Fast and RO Slow) to make the measurement of the DDFSM maximum clock frequency and power dissipation easier. The chip also includes a DDFS which can provide inputs to the synthesizer/mixer to test the single and double sideband modulation functionality of the circuit.
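The frequency resolution quoted in Section VII for a 32 bit frequency control word follows from the standard phase-accumulator relation, in which the output frequency is the clock frequency scaled by the control word over the accumulator range. A short worked sketch is given below; the function names are hypothetical, and the 380 MHz clock is the operating frequency reported for the IC.

```python
def frequency_resolution(f_clk_hz, acc_bits=32):
    """Smallest output frequency step of a phase-accumulator synthesizer."""
    return f_clk_hz / (1 << acc_bits)

def fcw_for(f_out_hz, f_clk_hz, acc_bits=32):
    """Frequency control word giving the output frequency closest to f_out_hz."""
    modulus = 1 << acc_bits
    return round(f_out_hz / f_clk_hz * modulus) % modulus

# 380 MHz clock, 32 bit accumulator: resolution of roughly 0.088 Hz
resolution_hz = frequency_resolution(380e6)

# Control word for an output at one quarter of the clock frequency
quarter_rate_fcw = fcw_for(95e6, 380e6)
```

The same kind of arithmetic relates the reported figures of merit: 152 mW at 380 MHz corresponds to 0.40 mW/MHz.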

Fig. 11. Test chip realized in CMOS 0.25 μm technology. The chip includes our optimized synthesizer/mixer (Synth/Mixer), a DDFS, two ring oscillators (RO Fast and RO Slow) and the built-in self-test logic (SA Mixer) for easy circuit testing.

Fig. 12. Output spectrum of the DDFSM in DDS mode (X = 01, Y = 0); f = 380 MHz.

TABLE IV COMPARISON WITH RECENTLY PROPOSED DESIGNS

Table III also reports the experimental performance of the developed DDFSM. The circuit exhibits a very low power dissipation (0.40 mW/MHz) with a maximum clock frequency slightly lower than 400 MHz. The experimental performance of the proposed circuit is compared in Table IV with the performance of the best DDFSMs available in literature based on a two-stage multiplier architecture and implemented with the same 0.25 μm technology. It can be observed that the developed architecture allows a more than three-fold reduction of power dissipation, together with a substantial reduction in silicon area with respect to [8]. The circuit in [10], while able to reach an SFDR of 100 dBc, requires about a 2.32 times larger area with respect to our implementation. Table IV shows, moreover, that our circuit is able to work correctly up to 385 MHz, whereas the best result obtained in literature was 330 MHz.

VIII. CONCLUSION

The paper has described in detail the implementation of a high performance synthesizer/mixer IC which exploits improvements at the algorithmic, architectural and transistor levels. In the novel synthesizer/mixer architecture, the π/4 rotation operation has been split into three rotations. The first two rotations use a CORDIC datapath completely realized in carry-save arithmetic. This is possible since the directions of the CORDIC rotations are computed in parallel by using a small lookup table in the first rotation and a fast multiply-by-constant and addition circuit in the second rotation. The final (third) rotation is multiplier-based, in order to reduce the circuit latency and increase the circuit performance. The CORDIC datapath has been realized in carry-save arithmetic. In this datapath the combined use of sign-extension prevention, overflow prevention and a novel rounding scheme has been presented. At the transistor level a design style that jointly uses full-CMOS and DPL to improve the circuit latency has also been described. The realized synthesizer/mixer IC shows very good performance in terms of power dissipation, area and maximum clock frequency.

REFERENCES

[1] L. K. Tan and H. Samueli, A 200 MHz direct digital synthesizer/mixer in 0.8 m CMOS, IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 193200, Mar. 1995. [2] B. G. Goldberg, Direct Digital Frequency Synthesis Demystied. Eagle Rock, VA: LLH Technology Publishing, 1999. [3] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Norwell, MA: Kluwer Academic, 2001. [4] S. Nahm, K. Han, and W. Sung, A CORDIC-based digital quadrature mixer: comparison with a ROM-based architecture, in Proc. IEEE ISCAS, 1998, pp. 385388. [5] G. Gielis, R. Van de Plassche, and J. Van Valburg, A 540 MHz 10-b polar to Cartesian converter, IEEE J. Solid-State Circuits, vol. 26, no. 11, pp. 16451650, Nov. 1991. [6] Y. Ahn, S. Nahm, and W. Sung, VLSI design of a CORDIC-based derotator, in Proc. IEEE ISCAS, 1998, pp. 449452. [7] J. E. Volder, The CORDIC trigonometric computing technique, IRE Trans. Electron. Comput., vol. EC-8, no. 3, pp. 330334, Sep. 1959. [8] A. Torosyan, D. Fu, and A. N. Wilson, A 300 MHz direct digital synthesizer/mixer in 0.25 m CMOS, IEEE J. Solid-State Circuits, vol. 38, no. 6, pp. 875887, Jun. 2003. [9] D. Fu and A. N. Wilson, A high-speed processor for digital sine/cosine generation and angle rotation, in Proc. 42nd Asilomar Conf. Signal, Systems and Computers, 1998, vol. 1, pp. 177181.


[10] Y. Song and B. Kim, "A quadrature digital synthesizer/mixer architecture using fine/coarse coordinate rotation to achieve 14-b input, 15-b output, and 100-dBc SFDR," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 1853–1861, Nov. 2004.
[11] F. Curticapean and J. Niittylahti, "An improved digital quadrature frequency down-converter architecture," in 35th Asilomar Conf. Signals, Systems and Computers, Nov. 2001, pp. 1318–1321.
[12] D. De Caro, N. Petra, and A. G. M. Strollo, "A 380 MHz, 150 mW direct digital synthesizer/mixer in 0.25 μm CMOS," in IEEE ISSCC Dig. Tech. Papers, 2006, pp. 258–259.
[13] H. T. Nicholas and H. Samueli, "An analysis of the output spectrum of direct digital frequency synthesizers in the presence of phase accumulator truncation," in Proc. 41st Annu. Frequency Control Symp., May 1987, pp. 495–502.
[14] A. Torosyan and A. N. Willson, Jr., "Analysis of the output spectrum for direct digital frequency synthesizers in the presence of phase truncation and finite arithmetic precision," in Proc. 2nd Symp. Image and Signal Processing and Analysis, 2001, pp. 458–463.
[15] F. Curticapean and J. Niittylahti, "Exact analysis of spurious signals in direct digital frequency synthesizers due to phase truncation," Electron. Lett., vol. 39, no. 6, pp. 499–501, Mar. 2003.
[16] J. Vankka, "Methods of mapping from phase to sine amplitude in direct digital synthesis," IEEE Trans. Ultrason. Ferroelectr. Freq. Contr., vol. 44, no. 2, pp. 526–534, Mar. 1997.
[17] H. M. Ahmed, "Signal processing algorithms and architectures," Ph.D. dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, Dec. 1981.
[18] H. M. Ahmed, "Efficient elementary function generation with multipliers," in Proc. 9th Symp. Computer Arithmetic, Sep. 1989, pp. 52–59.
[19] T. G. Noll, "Carry-save arithmetic for high speed digital signal processing," in Proc. IEEE ISCAS, 1990, pp. 982–986.
[20] N. Takagi, T. Asada, and S. Yajima, "Redundant CORDIC methods with a constant scale factor for sine and cosine computation," IEEE Trans. Comput., vol. 40, no. 9, pp. 989–995, Sep. 1991.
[21] T. B. Juang, S. F. Hsiao, and M. Y. Tsai, "Para-CORDIC: parallel CORDIC rotation algorithm," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 51, no. 8, pp. 1515–1524, Aug. 2004.
[22] S. Wang, V. Piuri, and E. E. Swartzlander, "Hybrid CORDIC algorithms," IEEE Trans. Comput., vol. 46, no. 11, pp. 1202–1207, Nov. 1997.
[23] J. F. Ardekani, "M × N Booth encoded multiplier generator using optimized Wallace trees," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120–125, Jun. 1993.
[24] M. Nagamatsu, S. Tanaka, J. Mori, K. Hirano, T. Noguchi, and K. Hatanaka, "A 15-ns 32 × 32-b CMOS multiplier with an improved parallel structure," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 494–497, Apr. 1990.
[25] J. Mori, M. Nagamatsu, M. Hirano, S. Tanaka, M. Noda, Y. Toyoshima, K. Hashimoto, H. Hayashida, and K. Maeguchi, "A 10-ns 54 × 54-b parallel structured full array multiplier with 0.5-μm CMOS technology," IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 600–606, Apr. 1991.
[26] G. Goto, T. Sato, M. Nakajima, and T. Sukemura, "A 54 × 54-b regularly structured tree multiplier," IEEE J. Solid-State Circuits, vol. 27, no. 9, pp. 1229–1236, Sep. 1992.
[27] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome, "A 4.4-ns CMOS 54 × 54-b multiplier using pass-transistor multiplexer," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 251–257, Mar. 1995.
[28] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, "Design of high speed low-power 3-2 counter and 4-2 compressor for fast multipliers," Electron. Lett., vol. 34, no. 4, pp. 341–342, Feb. 1998.
[29] D. Ghosh, S. K. Nandy, and K. Parthasarathy, "TWTXBB: a low latency, high throughput multiplier architecture using a new 4-2 compressor," in Proc. 7th Int. Conf. VLSI Design, Jan. 1994, pp. 77–82.
[30] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome, "A 1.5-ns 32-b CMOS ALU in double pass-transistor logic," IEEE J. Solid-State Circuits, vol. 28, no. 11, pp. 1145–1151, Nov. 1993.

[31] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. Oxford, U.K.: Oxford Univ. Press, 1999.
[32] Y. H. Hu, "The quantization effects of the CORDIC algorithm," IEEE Trans. Signal Process., vol. 40, no. 4, pp. 834–844, Apr. 1992.
[33] S. Y. Park and N. I. Cho, "Fixed-point error analysis of CORDIC processor based on the variance propagation formula," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 51, no. 3, pp. 573–584, Mar. 2004.
[34] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley, Jan. 1993, ISBN 0201533766.
[35] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, Aug. 2000.

Davide De Caro (M'05) was born in Naples, Italy, on February 9, 1973. He received the M.S. degree in electronic engineering with honors in 1999, and the Ph.D. degree in electronic engineering and computer science in 2003, both from the University of Naples Federico II, Italy. He has worked in the area of digital integrated VLSI circuit design for the last eight years. Since March 2003, he has been a Researcher at the Department of Electronics and Telecommunication Engineering of the University of Naples, Italy, where he is working on high-performance flip-flops (including both low-power and high-speed structures), VLSI implementation of arithmetic circuits (squarers, fixed-width multipliers, Reed-Solomon decoders, Galois-field multipliers), direct digital frequency synthesizers, and digital mixers. Dr. De Caro is author or coauthor of more than 30 technical papers in international journals and refereed international conferences. He has acted as a reviewer for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I and IEEE TRANSACTIONS ON VLSI SYSTEMS.

Nicola Petra (S'05) was born in 1974 in Naples, Italy. He received the M.S. degree in electronic engineering with honors in 2002 from the University of Naples Federico II. He is presently working towards the Ph.D. degree at the Department of Electronics Engineering of the University of Naples Federico II. His research interests include the design of digital VLSI circuits for telecommunications and high-performance arithmetic circuits.

Antonio Giuseppe Maria Strollo (M'05–SM'06) was born in 1963. He received the Laurea degree (with honors) in electronic engineering in 1988, and the Ph.D. degree in electronic engineering and computer science in 1992, both from the University of Naples Federico II, Italy. From 1990 to 1998, he was a full-time Researcher at the Department of Electronic Engineering of the University of Naples. In November 1998, he was appointed Associate Professor at the University of Naples Federico II. Since November 2002, he has been Full Professor at the same university. Currently, he is the head of the Department of Electronic and Telecommunication Engineering of the University of Naples Federico II. His initial research activities covered the area of bipolar device modelling and power electronics. His current research interests include the design and analysis of VLSI digital circuits. In particular, he is working on advanced architectures for direct digital frequency synthesis and for digital mixers, high-performance arithmetic circuits, and high-performance and low-power flip-flops. He has authored or coauthored more than 100 papers in international journals and refereed conferences.
