
A Digital Clock Multiplier for Globally Asynchronous Locally Synchronous Designs

Thomas Olsson, Peter Nilsson, and Mats Torkelson. Dept of Applied Electronics, Lund University. P.O. Box 118, SE-22100, Lund, Sweden. Email: Thomas.Olsson@tde.lth.se, Peter.Nilsson@tde.lth.se, Mats.Torkelson@tde.lth.se.

ABSTRACT
For large high-speed globally synchronous ASICs, designing the clock distribution net becomes a troublesome task. Besides the problems caused by clock skew, the clock net is also a major source of power consumption. Partitioning the design into locally clocked blocks reduces clock skew problems and, if handled correctly, also reduces power consumption. However, to achieve these positive effects, the blocks need on-chip clock generators with properties such as small area and low power consumption. Therefore, a low-power, high-frequency, small-area digitally controlled on-chip clock generator has been designed and fabricated.

INTRODUCTION
On the path towards large single-chip solutions, many functional blocks come together on one die. With growing die sizes and shrinking clock periods, clock skew becomes a problem. Another problem is the large power consumption of the clock buffers and the clock net. A design concept known as GALS (Globally Asynchronous Locally Synchronous) has recently attracted renewed attention, as it promises a solution to these problems. In GALS, the design is partitioned into a number of synchronous blocks with local autonomous clocks. Data exchange between the blocks is realized by asynchronous means. GALS is introduced in [1]. Experimental designs have been implemented in [2] and [3].

The GALS concept offers a number of advantages. Clock skew constraints and clock power consumption are reduced, since the local clock trees are smaller. The clock speed and power supply voltage of the synchronous blocks can be adjusted individually. Because the local clock edges are not synchronized with each other, the total peak current, and with it the induced noise, decreases.

However, a number of issues must still be solved to fully exploit the potential of GALS. No methodology is yet established for implementing a GALS design. A methodology for converting synchronous designs into GALS designs is proposed in [4]. Asynchronous communication makes it necessary to address metastability and deadlock issues [2], [5] and [6]. The efficiency of GALS relies heavily on the availability of small-area, low-power on-chip clock generators for the local synchronous blocks. A design for such a clock generator is described in this paper.

Traditionally, a phase locked loop (PLL) is used for on-chip clock generation. A PLL is often chosen for its low phase noise and low clock jitter. However, in many digital applications, phase noise and clock jitter are of less importance. Moreover, integrating an analog PLL in a noisy digital environment is difficult. In addition to noise issues, the PLL is also sensitive to process variations. Fully digital methods are more desirable for reasons such as robustness, small size and low power consumption.

CLOCK MULTIPLIER
A very robust and easily implemented fully digital clock multiplier is proposed in [7]. This clock multiplier produces a fixed number of cycles for each period of an external reference clock signal, followed by an idle margin (see Fig. 1). The internal frequency of this clock multiplier is fixed, and hence the idle margin must be long enough to cope with factors such as simulation model inaccuracy, process variations and temperature changes.

For increased flexibility and easy reuse, it is desirable to have a clock multiplier that can vary the number of cycles in the burst. Also, to increase the robustness and decrease the power consumption, the internal operating frequency of the clock multiplier must adapt to frequency changes of the reference signal. Such a frequency-adaptive, digitally controlled on-chip clock multiplier has been designed in a 0.35 µm standard CMOS technology from AMS.

The presented clock multiplier generates a burst of pulses for each period of a low-frequency reference signal. The resulting output after complete frequency adaptation is a burst of cycles followed by a small margin of about 1 ns at 3 V supply voltage (see Fig. 2). This margin is imposed by the internal logic speed and the resolution of the clock multiplier. If an output frequency of 1 GHz and a frequency multiplication of 256 are considered, a 1 ns margin leads to about 0.4 % of idle time during one reference cycle.
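The idle-time figure quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `idle_fraction` and its parameters are introduced here, and it assumes the reference period equals the burst length plus the margin.

```python
def idle_fraction(f_out_hz, multiplication, margin_s):
    """Fraction of one reference period spent idle.

    Assumes the reference period is the burst duration plus the margin.
    """
    burst_s = multiplication / f_out_hz      # duration of the output burst
    return margin_s / (burst_s + margin_s)


if __name__ == "__main__":
    # 1 GHz output, multiplication factor 256, 1 ns margin
    frac = idle_fraction(1e9, 256, 1e-9)
    print(f"idle fraction: {frac:.2%}")      # roughly 0.4 %
```

With these values the burst lasts 256 ns, so the 1 ns margin is 1/257 of the reference period.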
Figure 1. Frequency multiplication followed by idle time. [Waveforms: output clock signal and reference signal.]

Figure 2. Frequency multiplication after adaptation. [Waveforms: output clock signal with the remaining margin, and reference signal.]

IMPLEMENTATION
The clock multiplier basically consists of a ring oscillator with variable operating frequency, a cycle counter and a delay control counter (see Fig. 3).

Figure 3. Clock multiplier. [Block diagram labels: input delay state, input reference, multiplication factor, RESET, up/down, delay control counter (7 bit), Vctr[0-6], digitally controlled oscillator, cycle counter (8 bit), output clock.]

The number of output cycles per reference cycle is counted in the eight-bit cycle counter. A comparison between the cycle count and the reference value (the multiplication factor) controls whether the delay control counter counts up or down. The delay control counter delivers a control word to the digitally controlled oscillator. If the oscillator temporarily runs at too high a frequency, the excess cycles are blocked at the output by the NAND gate. As a result, the output never consists of more than the desired number of cycles.

Digitally Controlled Oscillator (DCO)

Local clock generators based on ring oscillators have many advantages, such as robustness, small size and low power consumption. The ring oscillator in its simplest form consists of an odd number of inverters connected in a circular chain. Such a circuit has no stable operating point and will therefore oscillate. The ring oscillator frequency is determined by the propagation time through the chain of inverters. There are many methods to manipulate the ring oscillator frequency. The most straightforward technique is to change the propagation delay by changing the number of inverters. However, using only this method, the oscillation is set to a fixed frequency. Other techniques are to use current-starved inverters [8] or a delay line of controllable capacitors [9].

Figure 4. Ring oscillator with shunt capacitors. [Schematic labels: Vctr[0-6], RESET.]

Figure 5. Fast reset of ring oscillator. [Schematic labels: VDD, GND.]
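As a rough guide to the ring-oscillator description above, the standard first-order relation between stage delay and oscillation frequency can be written down. The symbols are introduced here for illustration and do not appear in the original text: $N$ is the (odd) number of inverter stages and $t_p$ is the average propagation delay per stage.

```latex
T_{osc} = 2 N t_p, \qquad f_{osc} = \frac{1}{T_{osc}} = \frac{1}{2 N t_p}
```

The factor of two appears because a transition must travel around the ring twice to complete one full period. Any technique that increases $t_p$, such as adding load capacitance to the internal nodes, therefore lowers $f_{osc}$.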

The variable operating frequency is obtained by inserting digitally controlled shunt capacitors in the ring oscillator (see Fig. 4). NMOS transistors of different sizes are used as capacitors. The choice of digitally controlled shunt capacitors gives a robust implementation of a controllable delay [9]. The reset signal in Fig. 4 and Fig. 5 sets the ring oscillator to its initial state. To enable a fast reset of the ring oscillator, each node in the ring oscillator is forced to either a 0 or a 1, as shown in Fig. 5.

Fig. 6 shows both simulated and measured DCO behavior as period time versus input digital word at 3 V power supply voltage. The measured slope is 2.7 ps/bit. The resolution of the DCO is set by the slope times the multiplication factor of the cycle counter. The slope is designed to give a resolution of about 1 ns for the largest multiplication factor of 256 (with the measured slope, 2.7 ps × 256 ≈ 0.69 ns). The simulation indicates a frequency range between 556 MHz and 864 MHz at 3 V power supply voltage. The measured frequency range at 3 V is 790 MHz to 1.08 GHz. Since a conservative simulation model is used, all timing is slightly faster in the measurements.

A larger frequency range is easily implemented by using a larger number of shunt capacitors or by lowering the resolution. If the frequency range of the DCO is increased to 50 % of the maximum frequency, a very large actual frequency range can be obtained by dividing the frequency by a factor of two, using a set of flip-flops. The linearity of the DCO is quite good except for the sixth bit (at digital word = 32 and 32+64). This could have been avoided if greater caution had been taken during the design process, since this nonlinear behavior is also visible in the simulation results.

Cycle counter

The main component of the cycle counter is an eight-bit counter. This counter allows for any multiplication factor up to 256. The cycle counter is the bottleneck with respect to maximum output frequency. In fact, care must be taken not to produce higher frequencies in the oscillator than the cycle counter can handle. A high-frequency counter using dynamic flip-flops was therefore designed for this purpose. Since the maximum frequency of a circuit is usually set by the clocked blocks rather than by the clock multiplier, this counter does not in practice limit the overall circuit performance.

Delay control counter

The cycle counter updates the seven-bit delay control counter, thus changing the operating frequency of the ring oscillator. These updates are done once for each period of the reference signal. The delay control counter is loaded with a start value. The state of the delay control counter is also available as an output. Fast frequency transitions are then possible by storing the state for a desired frequency and later loading the delay control counter with the stored state. If different known frequencies are used at different times, a delay control state is loaded for each frequency.
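The interaction between the DCO, the cycle counter and the delay control counter can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the authors' circuit: the DCO period is assumed to be linear in the control word (using the measured 2.7 ps/bit slope and the measured 1.08 GHz upper frequency at 3 V), and the update policy (one step per reference period, hold when exactly the requested number of cycles fits) is a simplification chosen here. All function and constant names are hypothetical.

```python
# Behavioural sketch of the adaptation loop; not the authors' circuit.
T_MIN_NS = 1.0 / 1.08   # assumed fastest period (1.08 GHz measured at 3 V)
SLOPE_NS = 0.0027       # measured slope: 2.7 ps per control-word step
WORD_MAX = 127          # seven-bit delay control counter


def dco_period_ns(word):
    """Assumed linear DCO model: the period grows with the control word."""
    return T_MIN_NS + SLOPE_NS * word


def cycles_that_fit(t_ref_ns, word):
    """Whole DCO cycles that fit into one reference period."""
    return int(t_ref_ns // dco_period_ns(word))


def adapt(t_ref_ns, factor, word=0, max_ref_periods=1000):
    """Step the delay control word once per reference period.

    Assumed policy: add delay when more than `factor` cycles would fit,
    remove delay when fewer fit, and stop when exactly `factor` fit.
    """
    for _ in range(max_ref_periods):
        cycles = cycles_that_fit(t_ref_ns, word)
        if cycles > factor and word < WORD_MAX:
            word += 1       # oscillator too fast: count up
        elif cycles < factor and word > 0:
            word -= 1       # oscillator too slow: count down
        else:
            break           # adapted, or counter saturated
    # Excess cycles are blocked at the output, so never more than `factor`.
    output_cycles = min(cycles_that_fit(t_ref_ns, word), factor)
    margin_ns = t_ref_ns - output_cycles * dco_period_ns(word)
    return word, output_cycles, margin_ns


if __name__ == "__main__":
    word, cycles, margin = adapt(t_ref_ns=18.0, factor=16)
    print(word, cycles, round(margin, 3))
```

Passing a non-zero `word` to `adapt` plays the role of loading a stored delay control state: the loop then starts close to the desired frequency and needs few or no further steps.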
Figure 6. DCO behaviour. [Plot: period T (s, ×10^-9) versus digital word.]

SIMULATION AND MEASUREMENT RESULTS
The maximum output frequency versus power supply voltage is shown in Fig. 7. The highest measured clock frequency is 1.15 GHz at a 3.3 V supply voltage. As expected, the frequency decreases when the supply voltage is lowered. However, the maximum frequency is still as high as 710 MHz at a 2 V supply voltage. If the voltage is reduced further, the frequency drops rapidly.
Figure 7. Maximum output frequency versus power supply voltage. [Plot: frequency f (Hz) versus power supply voltage (V).]

A process with a low threshold voltage will improve the current drive at low voltages [10], since the delay in a CMOS gate is proportional to $V_{dd}/(V_{dd}-V_t)^2$ [11]. The clock multiplier has been measured down to a 0.8 V supply voltage, where the maximum frequency is 10 MHz. The total power consumption per MHz versus power supply voltage is shown in Fig. 8. Dynamic power consumption will in general give the major contribution [8]. The relationship between the dynamic power consumption and the supply voltage is $P_d = C_L f V_{dd}^2$. Consequently, when the supply voltage is lowered, the power consumption decreases quadratically.
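The two proportionalities above can be explored numerically. The sketch below is illustrative only. It takes the single operating point reported in the conclusions (92 MHz at 1 V, 0.16 mW), assumes that all of that power is dynamic and that the effective switched capacitance is constant, and extrapolates from there. The extrapolated values are estimates, not measurements, and the threshold voltage in the usage example is a hypothetical placeholder.

```python
def switched_capacitance_f(power_w, freq_hz, vdd_v):
    """Effective capacitance C_L solved from P_d = C_L * f * Vdd^2."""
    return power_w / (freq_hz * vdd_v ** 2)


def power_per_mhz_w(c_l_f, vdd_v):
    """Dynamic power per MHz of output frequency at supply vdd_v."""
    return c_l_f * 1e6 * vdd_v ** 2


def relative_gate_delay(vdd_v, vt_v):
    """Gate delay up to a constant factor: Vdd / (Vdd - Vt)^2."""
    return vdd_v / (vdd_v - vt_v) ** 2


if __name__ == "__main__":
    # Operating point taken from the conclusions: 0.16 mW at 92 MHz, 1 V.
    c_l = switched_capacitance_f(0.16e-3, 92e6, 1.0)
    for vdd in (1.0, 2.0, 3.0):
        print(vdd, power_per_mhz_w(c_l, vdd))   # extrapolated estimates

    vt = 0.6  # hypothetical threshold voltage, not given in the text
    print(relative_gate_delay(1.0, vt) / relative_gate_delay(3.0, vt))
```

Under these assumptions the power per MHz scales by a factor of nine between 1 V and 3 V, which is the quadratic dependence stated in the text.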
Figure 8. Power consumption/MHz versus power supply voltage. [Plot: power consumption (W/MHz) versus power supply voltage (V).]

The design consists of about 1000 transistors and occupies a total chip area of 0.0625 mm² (0.25 mm × 0.25 mm). Fig. 9 shows a die photo of the total design, including pads and power routing. The small block at the side of the clock multiplier is a counter, which is used for dividing the output frequency by 64 to simplify measurement. Fig. 10 is a simulated timing diagram showing the reference clock and the output bursts of 16 cycles at 3 V power supply voltage. In Fig. 10, the circuit is fully adapted to the reference frequency.

Figure 10. Simulated burst of sixteen pulses.

FUTURE WORK
To achieve a low-power design, the most advantageous strategy is to reduce the power supply voltage, since the power consumption is proportional to the square of the supply voltage. In [12], dynamic voltage scaling of both the oscillators and the functional blocks, leading to an energy-efficient design, is proposed. In a future implementation, this technique will be applied to a functional block containing the adaptive clock multiplier.

CONCLUSIONS
A 1.15 GHz, 0.06 mm² prototype of an on-chip clock multiplier has been designed in a standard 0.35 µm process. The clock multiplier has properties such as small area, high speed and low power consumption. These properties make it very suitable for Globally Asynchronous Locally Synchronous designs. The clock multiplier is designed for high-frequency operation. The high frequency translates into low-power operation when the power supply voltage is reduced radically at the cost of a reduced maximum frequency. The clock multiplier delivers a maximum output frequency of 92 MHz at 1 V supply voltage while consuming 0.16 mW.

Figure 9. Die photo.

REFERENCES
[1] D. M. Chapiro, Globally-Asynchronous Locally-Synchronous Systems, Ph.D. Dissertation, Computer Science Department, Stanford University, Stanford, CA, October 1984.
[2] K. Y. Yun, R. P. Donohue, Pausible Clocking: A First Step towards Heterogeneous Systems, Proceedings of ICCD'96, page 118 ff.
[3] J. Muttersbach, T. Villiger, H. Kaeslin, N. Felber, W. Fichtner, Globally Asynchronous Locally Synchronous Architectures to Simplify the Design of On-Chip Systems, Proceedings of the 12th Annual IEEE International ASIC/SOC Conference.
[4] T. Meincke, A. Hemani, S. Kumar, P. Ellervee, J. Öberg, T. Olsson, P. Nilsson, D. Lindqvist and H. Tenhunen, Globally Asynchronous Locally Synchronous Architecture for Large High-Performance ASICs, Proc. of ISCAS'99.
[5] C. A. Traver, A Testable Model for Stoppable Clock ASICs, ASIC Conference and Exhibit, 1991.
[6] S. J. Jou, I. Y. Chuang, Low-Power Globally Asynchronous Locally Synchronous Design Using Self-Timed Circuit Technology, Proceedings of 1997 IEEE International Symposium on Circuits and Systems, vol. 3, page 1808 ff.
[7] P. Nilsson and M. Torkelson, A Monolithic Digital Clock-Generator for On-Chip Clocking of Custom DSPs, IEEE J. Solid-State Circuits, vol. 31, no. 5, pp. 700-706, May 1996.
[8] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice Hall, 1996.
[9] P. Andreani, F. Bigongiari, R. Roncella, R. Saletti and P. Terrini, A Digitally Controlled Shunt Capacitor CMOS Delay Line, Analog Circuits and Signal Processing, Kluwer Academic Publishers, vol. 18, pp. 89-96.
[10] A. P. Chandrakasan, S. Sheng and R. W. Brodersen, Low-Power CMOS Digital Design, IEEE Journal of Solid-State Circuits, vol. 27, pp. 473-484, April 1992.
[11] D. Pucknell and K. Eshraghian, Basic VLSI Design, Prentice Hall, 1988.
[12] T. D. Burd and R. W. Brodersen, Processor Design for Portable Systems, Journal of VLSI Signal Processing, Kluwer Academic Publishers, vol. 13, nos. 2/3, August/September 1996, pp. 203-222.
