# High Performance Level-Converting Flip-Flop with a Simple Pulse Generator and a Fast Latch

Hyoun Soo Park<sup>1</sup>, Hong Bo Che<sup>2</sup>, Wook Kim<sup>3</sup>, and Young Hwan Kim<sup>4</sup>

<sup>1, 2, 3, 4</sup> Division of Electrical and Computer Engineering, POSTECH

San 31 Hyoja-Dong, Pohang, Gyeongbuk 790-784, Republic of Korea

E-mail: 1 hawk@postech.ac.kr, 2 dianzi@postech.ac.kr, 3 undine@postech.ac.kr, 4 youngk@postech.ac.kr

Abstract: This paper proposes a high-performance level-converting flip-flop (LCFF) for multi-$V_{DD}$ systems, called the explicit pulse-triggered dual-pass-transistor flip-flop (EPDFF). The proposed EPDFF provides both low-power and high-speed operation through the use of a simple pulse generator and a simple latch with a short signal propagation path. In experiments, EPDFF outperformed six existing LCFFs in both power consumption and delay within its operating range. After optimization for the minimum power-delay product (PDP), EPDFF had 19.4~52.6% less PDP than the existing LCFFs and had the smallest transistor area among the seven LCFFs compared.

## 1. Introduction

With the increasing demand for mobile applications, power consumption has become a top priority in circuit design for such applications. Lowering the supply voltage is an effective power reduction method because it leads to a roughly cubic reduction of leakage power as well as a quadratic reduction of dynamic power [1]. However, lowering the supply voltage of the whole system degrades the system operating speed. The multi-$V_{DD}$ approach, which uses more than one power-supply voltage, obtains the benefit of a lower supply voltage while avoiding the speed degradation [2]. Exploiting the fact that logic gates with the lower power-supply voltage ($V_{DDL}$) consume less power, at the expense of being slower, than logic gates with the high power-supply voltage ($V_{DDH}$), the system replaces $V_{DDH}$ gates on non-critical paths with $V_{DDL}$ gates as long as the operating speed is maintained.
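
As a rough illustration of the quadratic dependence, the standard first-order expression for dynamic power can be evaluated for the supply values used later in the experiments ($V_{DDH}$=1.0V, $V_{DDL}$=0.7V). The symbols $\alpha$ (switching activity), $C_L$ (switched capacitance), and $f$ (clock frequency) are introduced here only for illustration and are assumed to stay unchanged when a gate moves from $V_{DDH}$ to $V_{DDL}$:

$$
P_{dyn} = \alpha\, C_L\, V_{DD}^{2}\, f, \qquad \frac{P_{dyn}(V_{DDL})}{P_{dyn}(V_{DDH})} = \left(\frac{V_{DDL}}{V_{DDH}}\right)^{2} = \left(\frac{0.7}{1.0}\right)^{2} = 0.49
$$

Under these assumptions, a gate moved to $V_{DDL}$ dissipates roughly half the dynamic power of the same gate at $V_{DDH}$.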

In a multi-$V_{DD}$ system, it is necessary to use level converters (LCs), which convert $V_{DDL}$ signals to $V_{DDH}$ signals. This conversion prevents a large static current from flowing through the weakly turned-on PMOS transistors of $V_{DDH}$ gates that are driven by a $V_{DDL}$ logic '1' signal. However, the use of LCs incurs overheads in power, delay, and area [2–5].

A level-converting flip-flop (LCFF), a special flip-flop (F/F) that embeds an LC, takes $V_{DDL}$ input and clock signals and provides a $V_{DDH}$ output signal. In a multi-$V_{DD}$ system, an LCFF can be used instead of an LC followed by a $V_{DDH}$ F/F, or a $V_{DDL}$ F/F followed by an LC. Using an LCFF instead of a separate LC and F/F reduces these overheads. In addition, because it takes a $V_{DDL}$ clock signal, an LCFF helps to reduce the power consumption of the clock tree in a synchronous system [2–5].

LCFFs are classified into master-slave LCFFs, sense-amplifier-based LCFFs, and pulse-triggered LCFFs according to the operating characteristic of the F/F. Among these, a pulse-triggered LCFF, which consists of a pulse generator and a latch with the level-converting function, has zero or negative setup time and therefore allows time borrowing, unlike the other two types. In addition, this type of LCFF has a small data-to-output (DQ) delay because it has low logic complexity and a small number of circuit stages. Pulse-triggered LCFFs are further classified into the implicit pulse-triggered LCFF (IP-LCFF) and the explicit pulse-triggered LCFF (EP-LCFF), depending on whether the pulse generator is located inside or outside the latch. Typically, an EP-LCFF has a higher power overhead for its pulse generator than an IP-LCFF. However, an EP-LCFF can share its pulse generator with other EP-LCFFs, which lowers the practical power overhead.

This paper proposes an EP-LCFF that uses two new, simple circuits for the pulse generator and the latch, referred to as the explicit pulse-triggered dual-pass-transistor flip-flop (EPDFF). By realizing a short signal propagation path in a simple structure, the proposed EPDFF provides low power and high speed at the same time.

## 2. The Proposed Flip-Flop

Figure 1 shows the schematic and the SPICE waveforms of the proposed positive edge-triggered EPDFF. In the figure, shaded gates represent $V_{DDL}$ gates and the others are $V_{DDH}$ gates. Underlined nodes carry $V_{DDL}$ signals and the others carry $V_{DDH}$ signals. The pulse generator of EPDFF generates a narrow positive pulse using the consecutive falling and rising transitions of signal <u>*ckpb*</u>, which are produced as <u>*ckb*</u> falls to 0 after the rising transition of the clock signal. During this pulse, the dual NMOS pass-transistors, N1 and N2, are turned on and EPDFF becomes transparent. During this short transparency period, the $V_{DDH}$ cross-coupled inverters accept the $V_{DDL}$ input signals, <u>*d*</u> and <u>*db*</u>, and simultaneously convert the $V_{DDL}$ signals to $V_{DDH}$ signals.
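
The transparency window described above can be illustrated with a minimal behavioral sketch. It is not a model of the EPDFF circuit: level conversion, propagation delays, and the internal nodes are ignored, and the function names, the 50 ps pulse width, and the event times are hypothetical values chosen only for illustration. The sketch shows why a pulse-triggered latch can tolerate data that arrives after the clock edge (negative setup time), provided the data arrives before the pulse ends.

```python
from bisect import bisect_right


def value_at(changes, t, initial=0):
    """Return the data value at time t.

    `changes` is a time-sorted list of (time_ps, value) pairs and
    `initial` is the value before the first change.
    """
    times = [time for time, _ in changes]
    index = bisect_right(times, t)  # number of changes at or before t
    return initial if index == 0 else changes[index - 1][1]


def pulsed_latch_capture(changes, clock_edge_ps, pulse_width_ps, initial=0):
    """Value held by an idealized pulsed latch after its window closes.

    The latch is transparent from the clock edge until the pulse ends,
    so the stored value is the data value at the moment the window closes.
    """
    window_close = clock_edge_ps + pulse_width_ps
    return value_at(changes, window_close, initial)


if __name__ == "__main__":
    PULSE_WIDTH_PS = 50  # hypothetical pulse width, not taken from the paper

    # Data rises 20 ps before the clock edge at t = 0: captured.
    print(pulsed_latch_capture([(-20, 1)], 0, PULSE_WIDTH_PS))
    # Data rises 30 ps after the edge but inside the window: still captured.
    print(pulsed_latch_capture([(30, 1)], 0, PULSE_WIDTH_PS))
    # Data rises 80 ps after the edge, once the window has closed: missed.
    print(pulsed_latch_capture([(80, 1)], 0, PULSE_WIDTH_PS))
```

In this idealized view the data must also stay stable until the window closes, which is why a pulse-triggered F/F trades its negative setup time for a longer hold time.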

EPDFF consists of two new, simple components: a pulse generator and a latch. The pulse generator of EPDFF uses two fewer transistors than the pulse generator of the pulsed half-latch level converter (PHL) [3], which is widely used in explicit pulse-triggered F/Fs. In addition, for sampling, storing, and level converting the input signal, the latch of EPDFF uses only 10 transistors, while the latch of PHL uses 13 transistors. Because it uses a small number of transistors, EPDFF consumes little power and occupies a small transistor area.

Another advantage of EPDFF is fast signal propagation. The latch of EPDFF has a dual-path structure that simultaneously propagates the true and complement input signals to s and r through N1 and N2, respectively. In the case of a $0 \rightarrow 1$ input transition, node r is strongly pulled down to 0V through N2, and EPDFF propagates the input transition to q quickly. In the case of a $1 \rightarrow 0$ input transition, node r is pulled up from 0V to $V_{DDL}-V_{TH}$ through N2. Although this pull-up alone is not strong enough to provide fast propagation, node r is quickly pulled up to $V_{DDH}$ by the upper path, N1 and IN1. As a result, EPDFF propagates the $1 \rightarrow 0$ transition quickly as well. Although the structure of the proposed latch looks similar to that of the multi-supply complementary pass-transistor flip-flop (MCPFF) [5], EPDFF uses only two NMOS pass-transistors, while MCPFF uses four. By reducing the number of NMOS pass-transistors, EPDFF obtains shorter signal propagation paths than MCPFF.



Figure 1. Explicit pulse-triggered dual-pass-transistor flip-flop (EPDFF) (a) Schematic (b) SPICE waveforms.

## 3. Experimental Results

As benchmarks to evaluate the performance of EPDFF, we used six existing LCFFs: slave-latch level-shifting flip-flop (SLLS) [4], master-slave, half-latch level converter (MSHL) [3], pulsed, precharged level converter (PPR) [3], PHL [3], indirect precharging flip-flop (IPFF) [5], and MCPFF [5]. We compared the performance of EPDFF and the six benchmark LCFFs through HSPICE simulation using the Berkeley Predictive Technology Model for 0.10µm technology [7]. The conditions for the experiments were as follows: $V_{DDH}$=1.0V, $V_{DDL}$=0.7V, $C_{l}$=50fF, clock frequency=500MHz, and temperature=25°C. Here, $C_{l}$ is the output load capacitance of each LCFF. In addition, we used the test bench of [6], and set both the switching activity and the probability of the input signal to 0.5 for power estimation.

For a fair comparison of EPDFF with the six benchmark LCFFs, we optimized all seven LCFFs as follows. After setting a delay constraint for each LCFF, we optimized the gate widths of the MOSFETs for the minimum power consumption using the HSPICE optimizer, while ensuring correct operation. The gate length was fixed at the minimum dimension. We then repeated the same process with a different delay constraint. During the optimization, we matched the rising delay and the falling delay as closely as possible.

After optimizing the seven LCFFs, we obtained the curves that describe their power-delay characteristics, shown in Fig. 2. In the figure, *Delay* represents the larger value of the rising and falling minimum DQ delays [6]. The figure indicates that, for delays above 136.8 [pSec], EPDFF consumes the least power among all LCFFs at every delay value. Likewise, for power consumption below 7.56 [µW], EPDFF has the smallest delay at the same power consumption.



Figure 2. Design exploration of seven LCFFs in power-delay space.

Table 1 presents the performance of each LCFF, obtained at the minimum PDP point in Fig. 2, in terms of timing parameters, power consumption, and PDP. In the table, the normalized values are given with respect to those of SLLS, the first published LCFF among the benchmark LCFFs. In the table, EPDFF has a negative setup time, as IPFF and PHL do. Therefore, like IPFF and PHL, EPDFF has the advantage of allowing time borrowing.

| Type    | F/F   | Setup (pSec) | Hold (pSec) | Setup + Hold (pSec) | Setup + Hold Norm. | Delay (pSec) | Delay Norm. | Power (µW) | Power Norm. | PDP (fJ) | PDP Norm. |
|---------|-------|--------------|-------------|---------------------|--------------------|--------------|-------------|------------|-------------|----------|-----------|
| MS-LCFF | SLLS  | 150          | 24.3        | 102.3               | 1.00               | 268.9        | 1.00        | 6.52       | 1.00        | 1.75     | 1.00      |
| MS-LCFF | MSHL  | 78.9         | 22.4        | 101.3               | 0.99               | 264.0        | 0.98        | 5.76       | 0.88        | 1.52     | 0.87      |
| IP-LCFF | PPR   | 35.5         | 67.6        | 103.2               | 1.01               | 182.9        | 0.68        | 7.40       | 1.13        | 1.35     | 0.77      |
| IP-LCFF | IPFF  | -50.7        | 157.8       | 107.1               | 1.05               | 130.6        | 0.49        | 7.85       | 1.20        | 1.03     | 0.59      |
| IP-LCFF | MCPFF | 67.3         | 75.9        | 143.2               | 1.40               | 226.5        | 0.84        | 4.82       | 0.74        | 1.09     | 0.62      |
| EP-LCFF | PHL   | -8.4         | 93.6        | 85.2                | 0.83               | 190.4        | 0.71        | 6.15       | 0.94        | 1.17     | 0.67      |
| EP-LCFF | EPDFF | -31.1        | 149.7       | 118.6               | 1.16               | 159.4        | 0.59        | 5.21       | 0.80        | 0.83     | 0.47      |

Table 1. Performances of seven LCFFs at the minimum PDP point.

MS-LCFF: master-slave LCFF; Norm.: normalized value

The sum of the setup and hold times represents the data-sampling window of an F/F, and F/Fs with a narrower window are preferred. EPDFF has a slightly wider sampling window than the existing LCFFs except MCPFF. However, EPDFF has 19.4~52.6% less PDP than the existing LCFFs.
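
The PDP values of Table 1 and the quoted 19.4~52.6% range can be reproduced from the delay and power columns. The following sketch is only an arithmetic check: the dictionary and function names are chosen for illustration, and rounding the PDP to two decimals before comparing is an assumption made to match the precision of the table.

```python
# Delay (ps) and power (uW) at the minimum-PDP point, copied from Table 1.
TABLE1 = {
    "SLLS": (268.9, 6.52),
    "MSHL": (264.0, 5.76),
    "PPR": (182.9, 7.40),
    "IPFF": (130.6, 7.85),
    "MCPFF": (226.5, 4.82),
    "PHL": (190.4, 6.15),
    "EPDFF": (159.4, 5.21),
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps x 1 uW = 1e-3 fJ), rounded to 2 decimals."""
    return round(delay_ps * power_uw * 1e-3, 2)


def reduction_percent(proposed, reference):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (1.0 - proposed / reference) * 100.0


def epdff_pdp_reductions(table=TABLE1):
    """PDP reduction of EPDFF relative to each of the other LCFFs."""
    pdp = {name: pdp_fj(delay, power) for name, (delay, power) in table.items()}
    return {
        name: reduction_percent(pdp["EPDFF"], value)
        for name, value in pdp.items()
        if name != "EPDFF"
    }


if __name__ == "__main__":
    for name, value in sorted(epdff_pdp_reductions().items(), key=lambda kv: kv[1]):
        print(f"EPDFF vs {name}: {value:.1f}% less PDP")
```

The smallest reduction is obtained against IPFF and the largest against SLLS, which correspond to the two ends of the quoted range.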

Table 2 shows the total gate width of each LCFF, obtained at the minimum PDP point. EPDFF, along with MCPFF, uses the smallest number of transistors, and it occupies the smallest transistor area. Moreover, EPDFF is an EP-LCFF and can share its pulse generator with other EPDFFs; when it does, its transistor area per F/F is reduced further. This is demonstrated in the last experiment of this section.

Table 2. Total gate widths of seven LCFFs at the minimum PDP point.

| Type    | F/F   | Number of transistors | Total gate width (μm) | Normalized total gate width |
|---------|-------|-----------------------|-----------------------|-----------------------------|
| MS-LCFF | SLLS  | 26                    | 16.3                  | 1.00                        |
| MS-LCFF | MSHL  | 23                    | 14.6                  | 0.90                        |
| IP-LCFF | PPR   | 33                    | 19.5                  | 1.20                        |
| IP-LCFF | IPFF  | 21                    | 13.7                  | 0.84                        |
| IP-LCFF | MCPFF | 20                    | 12.5                  | 0.77                        |
| EP-LCFF | PHL   | 25(12)                | 14.2                  | 0.87                        |
| EP-LCFF | EPDFF | 20(10)                | 11.2                  | 0.69                        |

Table 3 presents the robustness of each LCFF, optimized for the minimum PDP, against supply-voltage noise. We investigated the robustness of the seven LCFFs by obtaining the worst delay variations under independent $\pm 5\%$ noise on $V_{DDL}$ and $V_{DDH}$. The worst delay increases range from 9.6% to 13.5%. Considering that a 5% drop in the supply voltage can cause a 15% increase in the gate delay [8], we conclude that all seven LCFFs are robust against supply-voltage noise.

Table 3. Delay variations for the independent  $\pm 5\%$  noises of  $V_{DDL}$  and  $V_{DDH}$  of seven LCFFs optimized for the minimum PDP.

| Type    | F/F   | Min. DQ delay, normal value (pSec) | Min. DQ delay, worst value (pSec) | Delay increase (%) |
|---------|-------|------------------------------------|-----------------------------------|--------------------|
| MS-LCFF | SLLS  | 268.9                              | 301.2                             | 12.0               |
| MS-LCFF | MSHL  | 264.0                              | 295.7                             | 12.0               |
| IP-LCFF | PPR   | 182.9                              | 202.4                             | 10.7               |
| IP-LCFF | IPFF  | 130.6                              | 145.9                             | 11.7               |
| IP-LCFF | MCPFF | 226.5                              | 253.2                             | 11.8               |
| EP-LCFF | PHL   | 190.4                              | 208.6                             | 9.6                |
| EP-LCFF | EPDFF | 159.4                              | 180.9                             | 13.5               |
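
The delay-increase column of Table 3 follows directly from the normal and worst-case delays. The short sketch below recomputes it and compares each value with the 15% gate-delay increase cited from [8]; the names and the use of that 15% figure as a pass/fail threshold are illustrative assumptions, not part of the original experiment.

```python
# Normal and worst-case minimum DQ delays (ps), copied from Table 3.
TABLE3 = {
    "SLLS": (268.9, 301.2),
    "MSHL": (264.0, 295.7),
    "PPR": (182.9, 202.4),
    "IPFF": (130.6, 145.9),
    "MCPFF": (226.5, 253.2),
    "PHL": (190.4, 208.6),
    "EPDFF": (159.4, 180.9),
}

# Gate-delay increase quoted from [8] for a 5% supply drop; used here
# only as an illustrative reference level.
REFERENCE_INCREASE_PERCENT = 15.0


def delay_increase_percent(normal_ps, worst_ps):
    """Worst-case delay increase relative to the normal delay, in percent."""
    return (worst_ps - normal_ps) / normal_ps * 100.0


def delay_increases(table=TABLE3):
    """Delay increase of every LCFF in the table."""
    return {
        name: delay_increase_percent(normal, worst)
        for name, (normal, worst) in table.items()
    }


if __name__ == "__main__":
    for name, increase in delay_increases().items():
        within = increase < REFERENCE_INCREASE_PERCENT
        print(f"{name}: {increase:.1f}% (below reference: {within})")
```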

Among the seven LCFFs, EPDFF and PHL are EP-LCFFs, which can share their pulse generators with other F/Fs. Fig. 3 shows the performance and the total gate width per F/F for EPDFF and PHL (after optimization of their circuits for the minimum PDP) when each shares a pulse generator with neighboring F/Fs of the same type. In this experiment, the parasitics of the interconnects between the latches and the pulse generator were not considered. In Fig. 3, *Delay* represents the larger value of the rising and falling minimum DQ delays. According to Fig. 3, as the number of EPDFFs or PHLs sharing a pulse generator increases, their power consumption and total gate width decrease. However, their delays increase because the larger load capacitance on the pulse generator degrades the pulse shape. PHL reaches its minimum PDP of 0.86 [fJ] when nine PHLs share one pulse generator. EPDFF reaches its minimum PDP of 0.64 [fJ] when six EPDFFs share the proposed pulse generator, which is 25.6% less than that of PHL. Comparing these two minimum PDP points, EPDFF has a power consumption of 3.49 [µW] and a delay of 184.1 [pSec], which are 16.1% and 10.6% improvements over PHL, respectively.
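
As a quick consistency check on the figures quoted above, the minimum PDP of the shared-generator EPDFF and its reduction relative to PHL can be recomputed from the reported power, delay, and PDP values (using 1 µW × 1 pSec = $10^{-3}$ fJ):

$$
\mathrm{PDP}_{EPDFF} = 3.49\,\mu\mathrm{W} \times 184.1\,\mathrm{pSec} \approx 0.64\,\mathrm{fJ}, \qquad 1 - \frac{\mathrm{PDP}_{EPDFF}}{\mathrm{PDP}_{PHL}} = 1 - \frac{0.64}{0.86} \approx 25.6\%
$$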



Figure 3. Performances and total gate widths of each EPDFF and each PHL when sharing the pulse generator with other F/Fs of the same type.

## 4. Conclusion

The proposed EPDFF provides the advantages of low power, high speed, and small transistor area by using a simple pulse generator and a fast latch. Experimental results indicate that, compared with the six existing LCFFs, EPDFF is superior in both power consumption and delay in the operating range with a delay above 136.8 [pSec] and a power consumption below 7.56 [µW]. In addition, in an experiment that evaluated a unique feature of EP-LCFFs, namely the ability to share the pulse generator with neighboring F/Fs, EPDFF outperformed PHL in delay, power consumption, PDP, and area overhead.

## Acknowledgements

This work was supported by IDEC and the BK 21 program.

## References

- [1] R.K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, "High-performance and low-power challenges for sub-70nm microprocessor circuits," Proc. IEEE Custom Integrated Circuits Conf., pp.125–128, May 2002.
- [2] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanazawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," IEEE J. Solid-State Circuits, vol.33, no.3, pp.463–472, March 1998.
- [3] F. Ishihara, F. Sheikh, and B. Nikolic, "Level conversion for dual-supply systems," IEEE Trans. on VLSI Systems, vol.12, no.2, pp.185–195, Feb. 2004.
- [4] M. Hamada, M. Takahashi, H. Arakida, A. Chiba, T. Terazawa, T. Ishikawa, M. Kanazawa, M. Igarashi, K. Usami, and T. Kuroda, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," Proc. IEEE Custom Integrated Circuits Conf., pp.495–498, May 1998.
- [5] H.S. Park, B.H. Lee, and Y.H. Kim, "Level converting flip-flops for high-speed and low-power applications," IEICE Trans. Fundamentals, vol.E89-A, no.6, pp.1740–1743, June 2006.
- [6] V. Stojanovic and V.G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol.34, no.4, pp.536–548, April 1999.
- [7] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," Proc. IEEE Custom Integrated Circuits Conf., pp.201–204, May 2000.
- [8] D.S. Cho, K.H. Lee, G.J. Jang, T.S. Kim, and J.T. Kong, "Efficient modeling techniques for IR drop analysis in ASIC designs," Proc. IEEE ASIC/SOC Conf., pp.64–68, Sept. 1999.