Design and performance evaluation of a low-power data-line SRAM sense amplifier


<table>
<thead>
<tr>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>As Published</td>
<td><a href="http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5403784">http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5403784</a></td>
</tr>
<tr>
<td>Publisher</td>
<td>Institute of Electrical and Electronics Engineers</td>
</tr>
<tr>
<td>Version</td>
<td>Final published version</td>
</tr>
<tr>
<td>Accessed</td>
<td>Sun Mar 31 06:33:00 EDT 2019</td>
</tr>
<tr>
<td>Citable Link</td>
<td><a href="http://hdl.handle.net/1721.1/59362">http://hdl.handle.net/1721.1/59362</a></td>
</tr>
<tr>
<td>Terms of Use</td>
<td>Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.</td>
</tr>
</tbody>
</table>

Haitao Fu
Department of Materials Science and Engineering
Massachusetts Institute of Technology
Cambridge, MA, USA
Email: sidneyfu@mit.edu

Anh-Tuan Do
Department of Electrical and Electronic Engineering
Nanyang Technological University
Singapore
Email: atdo@ntu.edu.sg

Kiat-Seng Yeo
Department of Electrical and Electronic Engineering
Nanyang Technological University
Singapore
Email: eksyeo@ntu.edu.sg

Zhi-Hui Kong
Department of Electrical and Electronic Engineering
Nanyang Technological University
Singapore
Email: zhkong@ntu.edu.sg

Abstract—The SRAM, which serves as the cache of a system-on-chip, is vital to the electronics industry. Heavy bit- and data-line capacitances are the major roadblocks to its performance. A high-performance SRAM sense amplifier is proposed using a 1.8 V/0.18 µm standard CMOS process from Chartered Semiconductor Manufacturing Ltd (CHRT). It incorporates a discharging mechanism that helps eliminate the waiting time during the read operation, offering faster sensing and lower power consumption. Post-layout simulation results show that, compared with the best published design, it improves the sensing speed and power consumption by 51.4% and 62.47%, respectively, and the total power-delay product (PDP) by 81.79%. Furthermore, it can operate at a supply voltage as low as 0.8 V and is highly tolerant of bit-line capacitance variation and mismatch.

I. INTRODUCTION

SRAM plays an increasingly important role in System-on-Chip (SoC) applications [1], and SRAM circuits have become crucial in present-day IC design. As technology advances, IC design pushes memory cells toward ever higher integration density, and the resulting increase in bit-line capacitance degrades both speed and power dissipation. Although various low-power, high-speed designs have been proposed, SRAMs with higher speed and density are still in demand [5]. In addition, low-voltage operation is inevitable for future VLSI [2], so circuits that work from a low supply voltage while maintaining good performance are likely to outperform other circuits in the long run. The major problem affecting the performance of current designs is the long interconnect bit-lines, which limit both speed and power dissipation. The primary speed bottleneck within an SRAM memory core lies in two critical circuits: the bit-line multiplexer and the sense-amplifier (SA) interface [3]. SRAM design is constrained by its compact area requirement, which forces the use of near-minimum-sized transistors in the memory cell. Such a small cell must drive large capacitive bit-lines, resulting in a very small signal swing. This limits the speed of any sensing scheme that requires a specific differential voltage to develop before the sensing operation can start [4]. Hence, the key strategy for overcoming the speed and power limitations is to reduce the bit-line swing.
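To make the swing argument concrete, a first-order sketch may help: if a cell sinks a roughly constant read current I_cell, the time needed to develop a differential voltage ΔV on a bit line of capacitance C_BL is t ≈ C_BL·ΔV/I_cell, so delay grows linearly with both the bit-line load and the swing a sensing scheme requires. The 1 pF load matches the bit-line capacitance used in the simulations of this paper; the 50 µA cell current and the swing values are assumptions chosen only for illustration.

```python
def bitline_develop_time(c_bl: float, delta_v: float, i_cell: float) -> float:
    """Time (s) for a constant cell current i_cell (A) to move a bit line
    of capacitance c_bl (F) by delta_v (V), using t = C * dV / I."""
    if i_cell <= 0:
        raise ValueError("cell current must be positive")
    return c_bl * delta_v / i_cell


C_BL = 1e-12    # 1 pF, matching the bit-line load used in the simulations
I_CELL = 50e-6  # hypothetical read current of a near-minimum-sized 6T cell

# Assumed swing requirements: small-swing sensing vs. near rail-to-rail.
for dv in (0.1, 0.2, 0.9):
    t = bitline_develop_time(C_BL, dv, I_CELL)
    print(f"dV = {dv:.1f} V -> t = {t * 1e9:.1f} ns")
```

Because t scales linearly with ΔV in this model, a scheme that needs only a tenth of the swing needs only a tenth of the development time, which is the motivation for small-swing sensing given above.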

Sense-amplifier circuits must therefore be able to detect a small signal generated on the data lines. The current-mode SA [3], introduced in 1991, uses four PMOS transistors as a current conveyor to reduce the voltage swing on the bit-line and hence remove the speed bottleneck caused by heavily loaded bit-lines. It is one of the most commonly used techniques for reducing delay and power consumption, cutting the delay from about 5 ns for conventional voltage-mode operation to less than 0.3 ns for current-mode signals [3].

In spite of the superior performance of current-mode sense amplifiers, their key drawback is that the latching time during the read operation is difficult to control, and latching too early often causes an operational error [6]. In this paper, a new SA is proposed that outperforms previous designs in both sensing speed and power consumption. It can also operate at supply voltages as low as 0.8 V while tolerating process variations. A data-line controller keeps the undesirable crossing phenomenon caused by early latching under better control, which reduces the sensing delay to 0.49 ns in post-layout simulation. Meanwhile, the circuit holds the data-line voltages close to \( V_{DD} \), reducing the voltage swing on the data-line capacitors during a read operation; this lowers power consumption and further improves speed.
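Why a small data-line swing saves power can be seen from a standard first-order energy estimate: restoring a node of capacitance C that swung by ΔV draws a charge C·ΔV from the supply at potential V_DD, so each such event costs E ≈ C·V_DD·ΔV, compared with C·V_DD² for a rail-to-rail swing. The sketch below uses the 1 pF data-line capacitance and 1.8 V supply from this paper; the 0.2 V swing is a hypothetical value, and the numbers are illustrative rather than a reproduction of the reported power figures.

```python
def swing_energy(c: float, vdd: float, delta_v: float) -> float:
    """Energy (J) drawn from the supply to restore a node of capacitance c
    after it swung by delta_v: the supply delivers charge c*delta_v at vdd."""
    if not 0.0 <= delta_v <= vdd:
        raise ValueError("swing must lie between 0 and VDD")
    return c * vdd * delta_v


C_DL = 1e-12  # 1 pF data-line capacitance, as in the simulation setup
VDD = 1.8     # nominal supply of the 1.8 V/0.18 um process

full = swing_energy(C_DL, VDD, VDD)    # rail-to-rail swing
small = swing_energy(C_DL, VDD, 0.2)   # hypothetical 0.2 V swing below VDD
print(f"full swing: {full * 1e12:.2f} pJ, 0.2 V swing: {small * 1e12:.2f} pJ")
print(f"ratio: {small / full:.3f}")
```

The energy ratio is simply ΔV/V_DD, so keeping the data lines clamped near V_DD reduces this component of read energy in proportion to the swing.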

II. PROPOSED DESIGN

The proposed SA is presented in Figure 1. Standard 6T memory cells are used to verify its operation. The SA consists of two parts: the data-line control circuit and the cross-coupled latch SA (CCLSA), which senses the differential voltage developed on the data-lines. The proposed circuit uses a modified cross-coupled SA to realize the fast sensing scheme. Furthermore, the current amplifiers and the current conveyor are removed from the bit-lines to reduce the bit-line loads. The two PMOS transistors P1 and P2 keep the data-line voltages pulled up to \( V_{DD} \) at all times.

In the read phase, P3 and P4 form a column selector, which is turned on by setting the signal CS1 low. Simultaneously, EQ goes high to turn off the equalization transistor PEQ, and the control signal CONS goes low, so that PMOS transistors P3 and P6 turn on and NMOS transistors N7 and N8 turn off. The two differential currents pass through the column selectors P3 and P4 and charge up the data-line capacitors to different voltage levels. A pair of PMOS converters then converts the differential voltages at points E and F back into two differential currents, which charge up nodes A and B to drive the SA. P3 and P4 must be properly sized to control the charging time of the circuit. The signal EN goes high to turn on the switch transistor N9 and start sensing; at the same time, \( \overline{\text{EN}} \) goes low to cut off transistor N10, letting points C and D separate differentially. Four transistors, P7, P8, N3, and N4, form a cross-coupled latch, which can generate a large voltage swing even when only a small voltage difference is applied to it. An inverter stage then restores the differential voltages developed at points C and D to full swing.
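The claim that the cross-coupled latch turns a small difference into a large swing can be illustrated with the textbook small-signal regeneration model, which is not taken from this paper: the differential voltage grows as ΔV(t) = ΔV₀·e^(t/τ) with τ = C_L/g_m, so reaching a final swing takes t = τ·ln(ΔV_final/ΔV₀). The 20 fF node load and 1 mS transconductance below are hypothetical values for illustration only.

```python
import math


def latch_regeneration_time(c_load: float, gm: float,
                            dv0: float, dv_final: float) -> float:
    """Time (s) for an ideal cross-coupled latch to grow dv0 to dv_final.

    Small-signal model: dV(t) = dv0 * exp(t / tau), with tau = c_load / gm.
    """
    if dv0 <= 0 or dv_final <= dv0:
        raise ValueError("need 0 < dv0 < dv_final")
    tau = c_load / gm
    return tau * math.log(dv_final / dv0)


C_L = 20e-15  # hypothetical 20 fF load on the latch nodes C and D
GM = 1e-3     # hypothetical 1 mS latch transconductance

# Initial differences of 10, 50 and 100 mV regenerated to a 1 V swing.
for dv0 in (0.01, 0.05, 0.1):
    t = latch_regeneration_time(C_L, GM, dv0, 1.0)
    print(f"{dv0 * 1e3:.0f} mV -> {t * 1e12:.0f} ps")
```

Since the regeneration time depends only logarithmically on the initial difference, a ten-fold smaller input costs just one extra τ·ln(10) in this model, which is why the latch can start from a small data-line differential.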

A key feature of the proposed circuit is that the data-line voltages are always clamped close to \( V_{DD} \); hence, in the sensing mode, the data lines charge very quickly. For good impedance matching, such an SA must have a low input resistance, which keeps the bit-line voltage almost constant and results in a fast read operation [7].
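The low-input-resistance argument can be checked with a first-order sketch: a signal current I flowing into an input resistance R_in moves the bit-line node by roughly ΔV ≈ I·R_in, and the node settles with time constant τ = R_in·C_BL. Only the 1 pF bit-line load comes from the simulation setup; the 50 µA signal current and the resistance values are hypothetical.

```python
def input_node_response(i_signal: float, r_in: float, c_bl: float):
    """Return (steady-state excursion in V, time constant in s) of a bit-line
    node of capacitance c_bl loaded by an input resistance r_in."""
    excursion = i_signal * r_in   # Ohm's law: dV = I * R_in
    tau = r_in * c_bl             # RC time constant of the node
    return excursion, tau


C_BL = 1e-12   # 1 pF bit line, as in the simulation setup
I_SIG = 50e-6  # hypothetical 50 uA signal current

for r_in in (100.0, 1e3, 10e3):  # hypothetical input resistances in ohms
    dv, tau = input_node_response(I_SIG, r_in, C_BL)
    print(f"R_in = {r_in:>7.0f} ohm: dV = {dv * 1e3:6.1f} mV, tau = {tau * 1e9:5.2f} ns")
```

Both the voltage excursion and the time constant fall in direct proportion to R_in, which is the sense in which a low input resistance keeps the bit-line voltage almost constant and speeds up the read.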

In standby mode, CS1 goes high to turn off the column selector, and EN goes low to turn off NMOS N9, blocking unnecessary leakage current and saving power. Meanwhile, the data-line control signal CONS goes high to turn off PMOS P4, so the bottom cross-coupled SA is separated from the data lines during the standby period. Decoupling the bit-line load from the output nodes in this way substantially improves the sensing delay [8]. At the same time, the voltages at points A and B are discharged to ground through NMOS transistors N7 and N8. Without N7 and N8, A and B would remain at about \( 2V_{th} \) during standby through self-discharge, so in the next read cycle the charging needed to turn on the cross-coupled latch below would be shorter. That shorter charging time is not sufficient for the correct currents to come down to the data lines and charge up the data-line capacitors. With A and B discharged to ground instead, PMOS transistors P3 and P4 are cut off at the start of the read operation, and no current flows through the SA until \( V_A \) and \( V_B \) are charged close to \( V_{DD} \) to drive the CCLSA again. In this case, PMOS transistors P3 and P6 effectively provide a small delay for the data-line voltages to develop correctly while they charge up \( V_A \) and \( V_B \). This short delay is essential to the design: it gives the data-line voltages enough time to swing, so no additional latching delay needs to be built into the EN signal. In this configuration, the sensing time is greatly reduced and more power is saved.
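The read and standby control sequences described above can be summarized as a small lookup sketch. It records the signal levels stated in the text and derives the on/off state of the switches with an ideal-switch rule (NMOS conducts on a high gate, PMOS on a low gate). Assumptions: only devices whose gate signal is named unambiguously are modeled; the standby level of EQ is not stated and is left as `None`; and N10 is assumed to be an NMOS driven by the complement of EN, inferred from the text rather than from the schematic.

```python
# Control-signal levels per phase as stated in the text
# (1 = high, 0 = low, None = not specified).
PHASES = {
    "read":    {"CS1": 0, "EQ": 1,    "CONS": 0, "EN": 1},
    "standby": {"CS1": 1, "EQ": None, "CONS": 1, "EN": 0},
}

# (device, type, gate signal); EN_B is the complement of EN (assumed for N10).
DEVICES = [
    ("column selector", "pmos", "CS1"),
    ("PEQ", "pmos", "EQ"),
    ("N7/N8", "nmos", "CONS"),
    ("N9", "nmos", "EN"),
    ("N10", "nmos", "EN_B"),
]


def is_on(kind, gate):
    """Ideal switch: NMOS conducts on a high gate, PMOS on a low gate."""
    if gate is None:
        return None  # gate level not specified in the text
    return gate == 1 if kind == "nmos" else gate == 0


def device_states(phase):
    signals = dict(PHASES[phase])
    signals["EN_B"] = 1 - signals["EN"]
    return {name: is_on(kind, signals[sig]) for name, kind, sig in DEVICES}


for phase in PHASES:
    print(phase, device_states(phase))
```

In the read phase this gives the column selector and N9 on with PEQ, N7/N8 and N10 off; in standby the column selector and N9 are off and N7/N8 conduct to discharge A and B, matching the sequence in the text.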

Transistor N9 is a critical component in the proposed design. It controls the voltage level at points C and D during standby by cutting off their current path to ground. More importantly, once N9 is cut off in the standby period, the voltages at both points A and B are fully discharged to ground. This guarantees a charging interval at the start of the read cycle that is long enough for the desired currents to come down from the long bit lines.

The layout of the new SA is shown in Figure 2, and its transistor sizes and performance are summarized in Tables I and II, respectively. Compared with the latest charge-transfer circuit [9], the new circuit reduces the sensing delay by 51.4% and the power consumption by 62.47%; as a result, its total power-delay product is 81.79% lower.

Figure 1 Proposed circuit

The proposed circuit was designed in a 1.8 V/0.18 \( \mu \)m CMOS process. The sensing delay is defined as the time difference between the 50% point of the row-select signal and the 50% point of the output voltage; that is, delay is measured from the moment row select is triggered to the moment valid output data appears. To ensure a fair comparison, the transistor sizes of all five SA designs were optimized for minimum power-delay product (PDP), and identical data sequences were stored in the memory cells for every circuit. The initial bit- and data-line capacitances are set to 1 pF, and the load-line capacitances to 0.1 pF. Parametric analyses were run in Cadence to test the circuit behavior with respect to capacitance changes. Power dissipation is computed as the product of the supply voltage and the current drawn from the supply source.
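The delay and power definitions above can be sketched as post-processing on sampled waveforms exported from a circuit simulator. This is a hypothetical helper, not the authors' Cadence setup: it assumes time-ordered sample lists, locates 50% crossings by linear interpolation between samples, and treats average power as the mean of the instantaneous product v·i over uniformly spaced samples.

```python
def crossing_time(t, v, level, rising=True):
    """First time the sampled waveform v(t) crosses `level` (linear interpolation)."""
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def sensing_delay(t, v_row, v_out, vdd, out_rising=True):
    """Delay from the 50% point of row select to the 50% point of the output."""
    t_row = crossing_time(t, v_row, 0.5 * vdd, rising=True)
    t_out = crossing_time(t, v_out, 0.5 * vdd, rising=out_rising)
    return t_out - t_row


def average_power(v_supply, i_supply):
    """Mean of p = v * i over uniformly spaced samples of the supply source."""
    return sum(v * i for v, i in zip(v_supply, i_supply)) / len(v_supply)


# Synthetic example waveforms (hypothetical), sampled every 0.1 ns.
t = [k * 0.1e-9 for k in range(11)]
v_row = [0.0, 0.9] + [1.8] * 9          # row select rises early
v_out = [0.0] * 6 + [0.9] + [1.8] * 4   # output rises later
print(f"delay = {sensing_delay(t, v_row, v_out, 1.8) * 1e9:.2f} ns")
```

Linear interpolation between samples is a simple choice; with a fine simulator time step it is usually adequate, but the measured delay can only be as precise as the sampling allows.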
Before the read cycle, the voltages on the two data lines are held near $V_{DD}$, and the voltages at control points A and B are discharged to ground. When the first row and first column are selected while A and B are still at ground potential, no current flows through the cross-coupled latch SA at the bottom (CCLSA). Only after 0.2 ns do currents start to flow through the CCLSA, producing further amplification at points C and D. As a result, it takes 0.14 ns for the voltages at points A and B to turn on the CCLSA and make $V_C$ and $V_D$ swing. Consequently, no waiting time is needed before turning on the CCLSA. This mechanism saves time in the read cycle, and less power is consumed during both the standby and read periods.

Table I: Sizing of the transistors used in the proposed design. All transistors have the same channel length $L = 0.18\,\mu m$.

<table>
<thead>
<tr>
<th>Transistor</th>
<th>$W (\mu m)$</th>
<th>Transistor</th>
<th>$W (\mu m)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>P1-P2</td>
<td>15</td>
<td>P7-P8, N3-N4</td>
<td>3.5</td>
</tr>
<tr>
<td>6T cell</td>
<td>0.3</td>
<td>P9-P10</td>
<td>3</td>
</tr>
<tr>
<td>P3-P4</td>
<td>15</td>
<td>N5-N6</td>
<td>2</td>
</tr>
<tr>
<td>PEQ</td>
<td>10</td>
<td>N1-N2</td>
<td>7</td>
</tr>
<tr>
<td>P5-P6</td>
<td>5</td>
<td>N9</td>
<td>10</td>
</tr>
<tr>
<td>N7-N8</td>
<td>2.5</td>
<td>N10</td>
<td>1.5</td>
</tr>
</tbody>
</table>

Table II: Summary of performance of the circuits in comparison at post-layout simulations.

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Sensing delay (ns)</th>
<th>Power Consumption (mW)</th>
<th>Power-Delay Product (pJ)</th>
<th>Layout area ($\mu m^2$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed</td>
<td>0.49</td>
<td>0.224</td>
<td>0.109</td>
<td>420</td>
</tr>
<tr>
<td>Charge Transfer [9]</td>
<td>1.01</td>
<td>0.597</td>
<td>0.603</td>
<td>568</td>
</tr>
<tr>
<td>Ultra Low Power [10]</td>
<td>1.34</td>
<td>0.526</td>
<td>0.705</td>
<td>579</td>
</tr>
<tr>
<td>High Speed [11]</td>
<td>0.91</td>
<td>0.983</td>
<td>0.894</td>
<td>659</td>
</tr>
</tbody>
</table>
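The percentages quoted for the proposed design can be recomputed from Table II as a quick consistency check. The sketch below takes the delay and power entries as listed, forms the power-delay product (ns × mW = pJ), and computes the reduction relative to the charge-transfer circuit [9]; it uses only the table values and introduces no new data.

```python
# (sensing delay in ns, power in mW, listed PDP in pJ) from Table II
TABLE_II = {
    "Proposed": (0.49, 0.224, 0.109),
    "Charge Transfer [9]": (1.01, 0.597, 0.603),
    "Ultra Low Power [10]": (1.34, 0.526, 0.705),
    "High Speed [11]": (0.91, 0.983, 0.894),
}


def pdp_pj(delay_ns: float, power_mw: float) -> float:
    """Power-delay product; ns * mW = 1e-12 J, i.e. pJ."""
    return delay_ns * power_mw


def reduction_pct(new: float, ref: float) -> float:
    """Percentage by which `new` is lower than the reference `ref`."""
    return 100.0 * (ref - new) / ref


for name, (d, p, listed) in TABLE_II.items():
    print(f"{name:22s} PDP computed {pdp_pj(d, p):.4f} pJ, listed {listed} pJ")

d_new, p_new, _ = TABLE_II["Proposed"]
d_ref, p_ref, _ = TABLE_II["Charge Transfer [9]"]
delay_red = reduction_pct(d_new, d_ref)
power_red = reduction_pct(p_new, p_ref)
pdp_red = reduction_pct(pdp_pj(d_new, p_new), pdp_pj(d_ref, p_ref))
print(f"vs. charge transfer: delay -{delay_red:.2f}%, "
      f"power -{power_red:.2f}%, PDP -{pdp_red:.2f}%")
```

Computed from the table entries, the reductions come out at about 51.49%, 62.48% and 81.80%; the 51.4%, 62.47% and 81.79% quoted in the text match these when truncated rather than rounded, and the listed PDP column agrees with delay × power to within 0.001 pJ.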

Figure 2 Layout of the proposed design

Figure 3 Delay and Power consumption of the circuits in comparison versus $C_{BL}$ variation.

Figure 4 Delay and Power consumption of the circuits in comparison versus $C_{DL}$ variation.
IV. PERFORMANCE COMPARISON AND EVALUATION

Figure 3 shows the sensitivity of delay time and power consumption to bit-line capacitance; the proposed design (ND) is the best performer in terms of both speed and power consumption. Figure 4 illustrates the sensitivity of delay time and power consumption to data-line capacitance. The High Speed (HS) and Ultra Low Power (ULP) circuits, which use a current-sensing cross-coupled CMOS latch as the sense amplifier, are insensitive to variations in data-line capacitance. The Charge Transfer (CT) circuit relies on the data-line capacitor for voltage transfer, so variations in data-line capacitance affect its sensing speed. Figure 4 also shows that the proposed design is insensitive to data-line capacitance. It is therefore robust with respect to capacitance variation.

We also swept the supply voltage and found that the proposed design still operates at a supply as low as 0.8 V, as shown in Figure 2. Under bit-line mismatch, the proposed design tolerates mismatches of up to 400%, as shown in Figure 3, while its sensing delay increases only from 0.37 ns to 0.526 ns in pre-layout simulation.

V. CONCLUSION

A novel sense amplifier with ultra-low power and high speed is presented. It isolates the CCLSA from the data lines to keep the data-line voltage swings small and to ensure that the voltage levels are high enough to drive the CCLSA in the next read cycle. Post-layout simulation shows a significant reduction in both sensing delay and power consumption. Its ability to operate at supply voltages down to 0.8 V and its insensitivity to variations and mismatches make it an attractive solution for future high-speed, low-power memory circuits.

REFERENCES


![Figure 2 Sensing delay of the proposed design versus V_DD variation](image2)

![Figure 3 Sensing delay of the proposed design versus C_BL variation](image3)