# A Body-Voltage-Sensing-Based Short Pulse Reading Circuit for Spin-Torque Transfer RAMs (STT-RAMs)

Fengbo Ren, Henry Park, Richard Dorrance, Yuta Toriyama, C.-K. Ken Yang, Dejan Marković

Department of Electrical Engineering, University of California, Los Angeles, CA, USA

E-mail: fren@ee.ucla.edu

## Abstract

With scaling of CMOS and Magnetic Tunnel Junction (MTJ) devices, conventional low-current reading techniques for STT-RAMs face challenges in achieving the reliability and performance improvements that are expected from scaled devices. The challenges arise from the increasing variability of the CMOS sensing current and the reduction in MTJ switching current. This paper proposes a short-pulse reading circuit based on a body-voltage sensing scheme to mitigate these scaling issues. Compared to existing sensing techniques, our technique shows a substantially higher read margin (RM) despite a much shorter sensing time. A narrow current pulse applied to an MTJ significantly reduces the probability of read disturbance. The RM analysis is validated by Monte-Carlo simulations in a 65-nm CMOS technology with both CMOS and MTJ variations considered. Simulation results show that our technique is able to provide over 300 mV RM at a GHz frequency across process-voltage-temperature (PVT) variations, while the two reference designs require 4.3 ns and 2.3 ns of sensing time, respectively, for a 200 mV RM. The effective read energy per bit required by the proposed sensing circuit is around 195 fJ in the nominal case.

## Keywords

Emerging memory, STT-RAM, sensing circuit, short-pulse reading, body-voltage sensing, read margin.

## 1. Introduction

Over the last decade, extensive research has been carried out in search of a scalable "universal memory." Phase-Change RAM (PC-RAM) has been shown to be a viable replacement for Flash [1]. Resistive RAM (RRAM) is in its initial stage of exploration [2], and its benefits are yet to be seen.
Recently, STT-RAM has been regarded as the front runner because it can achieve a smaller cell size than SRAM, better performance than DRAM, the non-volatility of Flash, and better endurance (on the order of $10^{16}$ read/write cycles) than Magnetoresistive RAM (MRAM) [3]-[5]. Another advantage of STT-RAM over MRAM is that the switching current scales with device size [5], owing to the nature of spin-torque transfer.

With future scaling, the variation in CMOS devices is likely to continue to increase, and the critical current density ($J_C$) of the MTJ devices will decrease. Combined, these two effects will greatly impede the reliability of the MTJ read operation unless reading is performed with current levels comparable to those used for the write operation.

Most existing STT-RAM reading schemes use low-current reading (LCR), in which a sensing current smaller than the writing current is applied to the selected MTJ to avoid read disturbance [3], [6]-[8]. This approach leads to a sensing current that is strictly bounded by the long-duration switching current ($I_C$) of the MTJ. Consequently, the scaling of $J_C$ will eventually challenge the viability of the LCR sensing scheme for high-speed reading.

To solve this problem, a short-pulse reading (SPR) scheme has been proposed in [9], where a sensing current similar in magnitude to the writing current is used to read the MTJ, but with a much shorter pulse width. However, no circuit implementation of the SPR scheme has been published thus far, so it remains unclear which circuit structure is best suited to implementing SPR.

In this work, we propose an SPR circuit structure with a body-voltage sensing circuit. To study its suitability for SPR, we analyze the read margin (RM) and performance of the proposed sensing circuit and compare them to those of two reference designs [7], [8] under the proposed SPR structure.
The analysis is validated by Monte-Carlo simulations in HSPICE using a 65-nm CMOS technology, considering both CMOS and MTJ variations. Results show that the proposed sensing circuit outperforms the reference designs by a large margin in sensing speed for the same energy. In the worst case of PVT variations, the proposed circuit can achieve an RM as high as 300 mV under a 1 V supply with only 0.78 ns of sensing time, while the reference designs require 4.3 ns and 2.3 ns, respectively, to achieve an RM of 200 mV.

The remainder of the paper is organized as follows. Section 2 introduces MTJ basics and discusses the SPR scheme. Implementation of the proposed SPR scheme is described in Section 3. Section 4 reviews the reference designs and introduces the analysis metrics and the simulation setup. The comparison results and discussions are presented in Section 5. Section 6 concludes the paper.

## 2. Towards High-Speed Reading of STT-RAM

### 2.1. MTJ Switching Characteristics

The MTJ is the storage element of STT-RAM. It consists of two ferromagnetic layers separated by a thin nonconductive tunneling barrier (e.g., MgO), as shown in Fig. 1. The thicker ferromagnet, whose layer-stack structure fixes its magnetic orientation, is called the fixed layer or the pinned layer. The thinner layer, whose magnetic orientation can be changed, is called the free layer.

**Figure 1:** The basic MTJ structure illustrating parallel and anti-parallel states and switching current.

The MTJ exhibits two resistive states determined by the relative magnetization directions of the fixed and the free layers: a parallel (P) orientation produces a low resistance ($R_P$) and an anti-parallel (AP) orientation results in a high resistance ($R_{AP}$). The resistance difference between the two states is measured by the tunnel magnetoresistance ratio (TMR), defined as $(R_{AP} - R_P)/R_P$. A higher TMR indicates better readability and is therefore preferred for the reading operation.
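As a rough illustration of the TMR definition above, the sketch below derives nominal $R_P$ and $R_{AP}$ values from a resistance-area product (RA), a junction size, and a TMR, using the nominal numbers the paper lists in Table I. The function names are hypothetical, and two things are assumptions rather than statements from the paper: that RA refers to the parallel state, and that the junction is a plain rectangle (an elliptical junction would have a smaller area and hence a higher resistance).

```python
def mtj_resistances(ra_ohm_um2, width_um, length_um, tmr):
    """Return (R_P, R_AP) in ohms.

    Assumptions (not from the paper): RA is the parallel-state
    resistance-area product, and the junction area is width * length.
    """
    area_um2 = width_um * length_um
    r_p = ra_ohm_um2 / area_um2      # low-resistance (parallel) state
    r_ap = r_p * (1.0 + tmr)         # rearranged TMR = (R_AP - R_P) / R_P
    return r_p, r_ap


def tmr_ratio(r_p, r_ap):
    """TMR as defined in the text: (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


# Nominal values from Table I: RA = 14.8 ohm*um^2, 50 nm x 130 nm, TMR = 110%.
r_p, r_ap = mtj_resistances(14.8, 0.050, 0.130, 1.10)
print(f"R_P = {r_p:.0f} ohm, R_AP = {r_ap:.0f} ohm, TMR = {tmr_ratio(r_p, r_ap):.2f}")
```

Under these assumptions the two states differ by a little more than a factor of two in resistance, which is the difference the sensing circuit has to resolve.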
Similar to MRAMs, STT-RAMs store information in MTJs in magnetic form: "0" and "1" are represented as different magnetization directions of the free layer. The MTJ is switched by applying a bi-directional writing current to the device, as shown in Fig. 1: current flowing from the fixed layer to the free layer writes the MTJ into the AP state, and current flowing from the free layer to the fixed layer writes it into the P state.

A typical switching characteristic of the MTJ is depicted in Fig. 2. The contours show that the current density required for achieving a certain switching probability is a function of the switching time. This graph indicates that there is a tradeoff between the amplitude and the pulse width of the sensing/writing current in STT-RAM. Note also that the AP to P and the P to AP switching are asymmetric: P to AP usually requires a higher current density for the same switching probability. In this paper, $J_C$ refers to the critical current density required for 100% switching probability, and the critical current ($I_C$) is calculated as $J_C$ times the MTJ junction area.

**Figure 2:** The switching characteristic of the MTJ with a free-layer stack of $Co_{60}Fe_{20}B_{20}$ and a size of 50 nm by 130 nm. (a) AP to P, (b) P to AP.

### 2.2. The Need for SPR

In STT-RAM design, the writing current distribution should stay above the 100% region (Fig. 2) to guarantee successful writing. Similarly, the sensing current distribution has to be kept below the 0% region to avoid accidental switching (read disturbance). The conventional LCR scheme avoids read disturbance by keeping the read-current amplitude substantially below $I_C$. Typical writing currents today are in the 300-500 µA range, and reading with 1/3 or 1/5 of the write current is still feasible. However, scaled MTJs would need to work with writing currents on the order of tens of µA [4], making LCR impractical for fast reading.
Alternatively, the SPR scheme uses a higher sensing current amplitude with a shorter duration [9] to improve the sensing speed without risking read disturbance. As a result, the SPR scheme is of great interest for designing fast and reliable reading circuits for future STT-RAMs. This paper contributes an architecture and a circuit design for the SPR scheme. The proposed SPR circuit is discussed in the following sections.

## 3. Design of High-Speed SPR Circuit

### 3.1. SPR Architecture

The key idea of SPR is to perform reading with a short pulse of sensing current applied to the MTJ device. Generally, the shorter the pulse, the shorter the sensing time and the lower the chance of a read disturbance.

Figure 3 shows the proposed SPR scheme, which includes a sensing circuit that compares the cell value with a reference, followed by a capturing latch. The sensing circuit uses a voltage sensing scheme. The data sensor converts the MTJ resistance into a voltage signal ($V_{DATA}$) that has two levels, $V_H$ and $V_L$, corresponding to sensing $R_{AP}$ and $R_P$, respectively. The reference sensor averages $V_H$ and $V_L$ to generate the reference voltage $V_{REF}$. The difference between $V_{DATA}$ and $V_{REF}$ is further amplified by the second-stage amplifier so that the polarity of the resulting differential output $V_{OUT}$ reflects the sensed resistance: $V_{OUT} > 0$ for reading $R_{AP}$ and $V_{OUT} < 0$ for reading $R_P$.

**Figure 3:** Proposed SPR scheme.

The capturing latch regenerates $V_{OUT}$ into a full-swing signal. Theoretically, a latch utilizing positive feedback has infinite gain and is capable of resolving an arbitrarily small voltage difference. Practically, the minimum resolvable voltage is limited by mismatch and noise. Using strong positive feedback allows for early and quick data regeneration.

Figure 4 illustrates the timing diagram of the proposed SPR scheme.
Instead of waiting for $V_{OUT}$ to settle completely, the capturing latch is enabled to regenerate the final output once $V_{OUT}$ has established a sufficient RM (RM is analyzed in Section 4). The sensing circuit is then disabled for the rest of the reading operation to cut off the sensing current, thereby minimizing its pulse width.

### 3.2. Body-Voltage Sensing Circuit (BVSC)

In the sensing circuit, the sensing signal $V_{DATA}$ is converted from the sensing current ($I_{MTJ}$) through a load network. The swing of $V_{DATA}$, which has to be large enough to suppress device mismatches and noise, is given by

$$V_H - V_L = (I_{MTJ,P} - I_{MTJ,AP}) \cdot R_{LOAD}, \tag{1}$$

where the conversion gain ($R_{LOAD}$) is the small-signal resistance of the load device. According to Eq. (1), increasing $R_{LOAD}$ increases the signal swing, but only to a certain extent. Because the MTJ resistance varies with manufacturing-induced variations in geometry and tunnel-barrier thickness, $V_{DATA}$ (and $V_{REF}$) also varies. We define a statistical measure of the worst-case margin between $V_H$ and $V_L$ as the sensing margin (SM), given by

$$SM = \mu(V_H - V_L) - 3\sigma(V_H - V_L). \tag{2}$$

Note that the primary design objective is to maximize SM, not simply its mean. Increasing $R_{LOAD}$ beyond some point would eventually deteriorate SM, since a higher $R_{LOAD}$ also amplifies the variance of the sensing signal. In addition, a load device with a large $R_{LOAD}$ is not desirable for SPR because it introduces a large RC time constant, limiting the sensing speed [10]. Therefore, choosing an optimum $R_{LOAD}$, neither too large nor too small, is critical to the quality of sensing.

Figure 5 (a)-(c) shows different types of loads that are commonly used for resistance-sensing circuits.
From the above discussion, none of them is ideally suited for SPR. The diode-connected load (Fig. 5 (a)) has a small $R_{LOAD}$ ($1/g_m$), resulting in a small SM. The current-source load (Fig. 5 (b)) has a large $R_{LOAD}$ ($r_O$), making it sensitive to the MTJ resistance variation. Its bandwidth at the sensing node ($V_{DATA}$) is also limited. The current-mirror load (Fig. 5 (c)) allows the sensing and amplification to be performed in the same stage, but it has an imbalanced load impedance and a limited bandwidth at the sensing node.

**Figure 4:** Timing diagram of the SPR circuit.

**Figure 5:** Different types of transistor loads and their $R_{LOAD}$. (a) diode-connected load, (b) current source load, (c) current mirror load, (d) body-connected load.

To account for both SM and speed, we propose a body-connected load as shown in Fig. 5 (d). This load connects the body (n-well) and the drain terminal of a PMOS transistor. Its effective $R_{LOAD}$ is $1/g_{mb}$ [10]. Compared to the diode-connected load ($R_{LOAD} = 1/g_m$), the body-connected load has a larger output impedance because the body voltage is weaker than the gate voltage at tuning the current ($g_{mb} < g_m$). On the other hand, $1/g_{mb}$ is still much smaller than the $R_{LOAD}$ of the current-source load ($r_O$). As a result, the body-connected load trades off some speed for an effective increase in SM, instead of sitting at one extreme as in the cases of Fig. 5 (a)-(c).

Figure 6 shows the schematic of the proposed BVSC. Besides a body-connected PMOS load, an NMOS clamp transistor cascaded with the column mux device is used in the sensor circuits to control the sensing current and to shield the BL voltage from the voltage variation at node $V_{DATA}$. The reference sensors connected to the reference cells constantly sense $R_P$ and $R_{AP}$ to generate $V_L$ and $V_H$, respectively.
A voltage divider network in between generates $V_{REF}$ by averaging $V_L$ and $V_H$. The second-stage amplifier uses two differential pairs, each generating one of the differential outputs ($V_{OUT+}$ or $V_{OUT-}$). This differential amplifier provides extra signal swing at the outputs, enabling more reliable and earlier data regeneration. $V_{OUT}$ is finally regenerated to a full-scale signal by the dynamic latch.

The proposed BVSC has a significant speed advantage due to two key factors: (1) the body-connected loads guarantee a large bandwidth of the sensor circuits, and (2) the amplification stage, which is completely decoupled from the sensing stage, has more freedom in tuning the current to trade off power with performance.

**Figure 6:** Schematic of the proposed body-voltage-sensing-based SPR circuit.

## 4. Comparison Method

### 4.1. Reference Designs

We compare the proposed BVSC with two recent designs that use a current-sensing scheme [7], [8] for reading. The first reference design is an improved current-mirror-based sensing circuit (CMSC) presented in [7]. This design adds an equalizer to the sense amplifier outputs to mitigate two issues of the original current-mirror sense-amplifier-based design [6]: the imbalanced output impedance and the skewed sensing time between reading $R_P$ and $R_{AP}$. The second reference design is the split-path sensing circuit (SPSC) [8]. This design implements a double current-mirror-based differential amplifier by splitting the sensing current into two paths and mirroring them differentially to improve the output signal swing and RM.

### 4.2. Read Margin (RM)

The timing of enabling the data regeneration phase is critical to the sensing integrity. To avoid reading errors caused by device mismatch and noise, the regeneration phase cannot be activated until the target signal amplitude of $V_{OUT}$ has been established.
This condition is given by

$$|V_{OUT}| \ge NM + V_{OS\text{-}DL}, \tag{3}$$

where $V_{OS\text{-}DL}$ is the input-referred offset voltage of the dynamic latch, and NM is the noise margin. Similar to SM, the $V_{OUT}$ fluctuation due to CMOS and MTJ variations is statistically characterized by defining $RM_P$ and $RM_{AP}$ as

$$RM_P = \mu(V_{OUT,P}) + 3\sigma(V_{OUT,P}), \tag{4}$$

and

$$RM_{AP} = \mu(V_{OUT,AP}) - 3\sigma(V_{OUT,AP}), \tag{5}$$

for reading $R_P$ and $R_{AP}$, respectively. Fig. 7 illustrates the definitions in Eqs. (4) and (5). Note that $RM_P$ and $RM_{AP}$ have different polarities; however, it is the absolute value that represents the actual read margin. The overall RM is therefore defined as the smaller of the two:

$$RM = \min(|RM_P|, |RM_{AP}|). \tag{6}$$

It is important to note the difference between SM defined in Eq. (2) and RM defined here: SM characterizes the worst-case signal swing of the single-ended output of the sensor circuits, while RM measures the worst-case signal amplitude of the differential outputs of the second-stage amplifier, for reading $R_P$ and $R_{AP}$, respectively.

**Figure 7:** Illustration of the definition of RM.

**Table I:** Summary of MTJ parameters.

| Parameter      | Value                        |
|----------------|------------------------------|
| Size           | 50 nm × 130 nm               |
| RA             | $14.8 \ \Omega \cdot \mu m^2$ |
| TMR            | 110%                         |
| $\sigma_{RA}$  | 4%                           |
| $\sigma_{TMR}$ | 5%                           |

With the RM definition in Eq. (6), the condition in Eq. (3) can be expressed as

$$RM > NM + V_{OS\text{-}DL}. \tag{7}$$

Eq. (7) indicates that a higher RM allows a better noise margin, but reaching that higher RM also requires more sensing time. Consequently, reliable reading with fast access demands a proper tradeoff between noise margin and sensing time. In general, the higher the RM a sensing circuit is able to achieve, the shorter the sensing time it needs to meet the same noise margin target. Therefore, one of the main objectives in designing the SPR circuit is to maximize the RM of the sensing circuit.
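The RM definitions in Eqs. (4)-(6) can be sketched as a short post-processing step on Monte-Carlo output samples. This is a minimal illustration, not the authors' script: the function names and the sample values are hypothetical, the sample standard deviation is used as the estimate of $\sigma$, and the helper that rearranges Eq. (7) assumes the latch offset is budgeted as a multiple of its standard deviation.

```python
from statistics import mean, stdev


def read_margin(vout_p, vout_ap, k=3.0):
    """Return (RM_P, RM_AP, RM) following Eqs. (4)-(6).

    vout_p  : differential V_OUT samples when reading R_P (expected < 0)
    vout_ap : differential V_OUT samples when reading R_AP (expected > 0)
    k       : number of standard deviations (3 in the text)
    """
    rm_p = mean(vout_p) + k * stdev(vout_p)          # Eq. (4)
    rm_ap = mean(vout_ap) - k * stdev(vout_ap)       # Eq. (5)
    return rm_p, rm_ap, min(abs(rm_p), abs(rm_ap))   # Eq. (6)


def available_noise_margin(rm, sigma_latch_offset, k=3.0):
    """Rearranged Eq. (7): NM left after budgeting k*sigma of latch offset.

    Treating V_OS-DL as k * sigma(V_OS-DL) is an assumption of this sketch.
    """
    return rm - k * sigma_latch_offset


# Hypothetical samples in volts, for illustration only.
vout_p = [-0.62, -0.60, -0.58]
vout_ap = [0.64, 0.65, 0.66]
rm_p, rm_ap, rm = read_margin(vout_p, vout_ap)
print(f"RM_P = {rm_p:.3f} V, RM_AP = {rm_ap:.3f} V, RM = {rm:.3f} V")
```

The absolute values in Eq. (6) are only meaningful when the two distributions stay on opposite sides of zero; a real flow would want to check the signs of `rm_p` and `rm_ap` before trusting the result.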
### 4.3. Simulation Setup

The three sensing circuits are compared under the same SPR structure (Fig. 3) in terms of performance, RM, and reliability through HSPICE simulations in a 65-nm CMOS technology. The MTJ model used in our simulations is summarized in Table I and Fig. 2. Both chip-to-chip and across-chip local variations of the CMOS devices, as well as MTJ variations, are included in our Monte Carlo simulations. The MTJ variation is modeled by the standard deviations of RA ($\sigma_{RA}$) and TMR ($\sigma_{TMR}$) extracted from measurements [11]. A total of $\pm 5\sigma$ of MTJ variation is considered. For all circuits, the key design parameters, such as the geometry and the bias voltage of critical transistors, are optimized using the built-in optimization tool in HSPICE, targeting maximum RM at both 1 ns and 10 ns sensing times under the same sensing current. The target sensing current ($I_{MTJ,P}$) through the selected MTJ device was set at 50 µA.

**Figure 8:** Distribution of sensing signal ($V_{DATA}$) swing ($V_H - V_L$) of SPSC (diode-connected load) and BVSC (body-connected load). SM is calculated using Eq. (2).

## 5. Comparison Results

Figure 8 compares the sensing signal ($V_{DATA}$) swing ($V_H - V_L$) of SPSC, which uses a diode-connected load, with that of the proposed BVSC, which uses a body-connected load. SM is extracted from this plot using Eq. (2). The body-connected load provides a better sensing quality due to a larger signal swing: it outperforms the diode-connected load with an SM that is over 3.5× higher. Such an improvement greatly relaxes the device matching constraints of the following amplifier stage.

**Figure 9:** The single-ended output ($V_{OUT+}$, $V_{OUT-}$) distribution of (a) CMSC, (b) SPSC, and (c) BVSC, and the differential output ($V_{OUT}$) distribution of (d) CMSC, (e) SPSC, and (f) BVSC in the nominal case. RM is calculated using Eqs. (4) and (5).
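The SM extraction from the swing distribution of Fig. 8 amounts to applying Eq. (2) to the simulated samples. The sketch below shows one way to do that; it is an illustration under stated assumptions, not the authors' flow. It assumes each Monte-Carlo run yields a paired ($V_H$, $V_L$) sample, uses the sample standard deviation, and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def sensing_margin(v_h, v_l, k=3.0):
    """Eq. (2): SM = mu(V_H - V_L) - k * sigma(V_H - V_L).

    v_h, v_l are paired samples (one pair per Monte-Carlo run, an
    assumption of this sketch), in the same voltage unit.
    """
    if len(v_h) != len(v_l):
        raise ValueError("v_h and v_l must have the same number of samples")
    swing = [h - l for h, l in zip(v_h, v_l)]
    return mean(swing) - k * stdev(swing)


# Hypothetical samples in volts, for illustration only.
v_h = [0.50, 0.52, 0.54]
v_l = [0.40, 0.40, 0.40]
print(f"SM = {sensing_margin(v_h, v_l) * 1e3:.0f} mV")
```

Because the spread is subtracted from the mean, a load that enlarges the mean swing but enlarges its variation even more can end up with a lower SM, which is the tradeoff discussed for $R_{LOAD}$ in Section 3.2.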
Figure 9 shows the output voltage distribution of the three sensing circuits after $V_{OUT}$ has completely settled. RM is extracted according to Eqs. (4) and (5). Note in Fig. 9 (a) that CMSC has an identical $V_{OUT-}$ distribution for reading $R_P$ and reading $R_{AP}$, as the common $V_{OUT-}$ is generated directly from the reference cell without differential amplification [7]. Consequently, its maximum range of $V_{OUT}$ is limited to $\pm V_{DD}/2$, and so is its RM. The simulation result (Fig. 9 (d)) shows that CMSC is able to achieve an RM of 285 mV under the nominal supply voltage (1 V).

Alternatively, $V_{OUT-}$ can be generated from a differential amplification stage, as in the SPSC and BVSC circuits. This method can produce a $V_{OUT-}$ that is complementary to $V_{OUT+}$, thereby effectively doubling RM. However, because SPSC performs sensing and amplification in the same stage, transistors in SPSC are placed in series with the memory cells. As a result, the voltage headroom consumed by these devices limits its output swing and, subsequently, its RM. Figure 9 (e) shows that SPSC is able to achieve over 520 mV RM under a 1 V $V_{DD}$.

The proposed BVSC implements the sensing and the amplification in separate stages. Its RM is proportional to the output swing of the second-stage amplifier and is intrinsically large, so the BVSC design should be optimized with more emphasis on the power budget than on a large RM. As shown in Fig. 9 (f), BVSC has the largest RM, over 620 mV under a 1 V $V_{DD}$. With respect to CMSC and SPSC, BVSC improves RM by 335 mV and 100 mV, respectively.

The sensing time required for a certain RM is of practical interest to SPR. The RM measured at different sensing times is compared in Fig. 10. The SPR technique demands a faster and wider separation of the RM curves.
From this perspective, BVSC clearly shows the best suitability for SPR, with great advantages in both RM and performance. CMSC has more balanced $RM_P$ and $RM_{AP}$, which results from the equalizer used at the outputs as suggested in [7]. However, the equalization phase also adds about 1 ns of delay overhead, reducing CMSC's effective RM at short sensing times. SPSC has an improved RM compared to CMSC, but the limited bandwidth at its output nodes also restricts its RM at short sensing times. The results in Fig. 10 show that, with a 1 ns sensing time, the proposed BVSC is able to achieve over 600 mV RM in the nominal case. This RM is 2.2 and 1.3 times higher than that of CMSC and SPSC, respectively, with sensing times of 5 ns.

**Figure 10:** RM versus sensing time in the nominal case. BVSC has the best RM and performance due to body-voltage sensing and the second-stage amplifier.

The effect of temperature variation on the RM and performance is shown in Fig. 11, with RM calculated using Eqs. (4), (5) and (6). Temperature variation has little impact on performance for all the sensing circuits. In the worst case of temperature, the performance degradation for CMSC, SPSC, and BVSC (for an RM level of 200 mV) is about 0.33 ns, 0.12 ns, and 0.01 ns, and the RM reduction is 6%, 4%, and 3%, respectively.

**Figure 11:** RM versus sensing time with temperature variation. RM is calculated using Eqs. (4), (5) and (6).

Figure 12 shows the effect of supply voltage variation. Decreasing the voltage has a negative impact on performance, as expected. The RM of CMSC and SPSC is reduced accordingly, and so is the common-mode (CM) level of the sensing signal ($V_{DATA}$) in BVSC. For BVSC, the output swing of the second-stage amplifier is not only inversely related to the CM level of $V_{DATA}$, but also proportional to the supply voltage. As a result, these two factors cancel each other, and BVSC is less sensitive to the supply voltage variation. In the worst case of supply voltage, the performance degradation for CMSC, SPSC, and BVSC (for an RM level of 200 mV) is about 1.5 ns, 0.23 ns, and 0.1 ns, and the RM reduction is 28%, 9%, and 6%, respectively.

**Figure 12:** RM versus sensing time with supply voltage variation. RM is calculated using Eqs. (4), (5) and (6).

Table II summarizes the sensing time required for achieving different RM levels in the worst case across PVT variations. As the condition in Eq. (7) suggests, RM must be large enough before launching the regeneration phase in order to overcome the device variations and noise. Simulation results show that a small input-referred offset voltage ($\sigma(V_{OS\text{-}DL}) < 10$ mV) can be achieved by properly sizing up the dynamic latch. Considering $\pm 3\sigma$ of the input-referred offset, a 200 mV RM can guarantee a noise margin of around 170 mV. The worst-case sensing time required by CMSC, SPSC, and BVSC for achieving this level of noise margin is 4.33 ns, 2.25 ns, and 0.65 ns, respectively. The proposed BVSC has a significant speed advantage over CMSC and SPSC. It is able to provide over 300 mV RM at a GHz speed, which enables the practical application of the SPR scheme.

**Table II:** Sensing time required for different RM levels in the worst case across PVT variations.

| RM (mV) | CMSC sensing time (ns) | SPSC sensing time (ns) | BVSC sensing time (ns) |
|---------|------------------------|------------------------|------------------------|
| 100     | 1.96                   | 1.48                   | 0.57                   |
| 200     | 4.33                   | 2.25                   | 0.65                   |
| 300     | N/A                    | 3.16                   | 0.78                   |
| 400     | N/A                    | 4.41                   | 1.0                    |
| 500     | N/A                    | N/A                    | 1.72                   |
| 600     | N/A                    | N/A                    | 3.64                   |

Table III summarizes the average power of the sensing circuits in the nominal case, and the effective read energy per bit based on the worst-case sensing time required for an RM level of 200 mV (Table II).
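As a quick sanity check on how the effective read energy relates to the other reported numbers, the sketch below multiplies the average power of each circuit (Table III) by its worst-case sensing time at a 200 mV RM (Table II). Treating the energy as a simple power-time product is an assumption of this sketch, and the function name is hypothetical.

```python
def read_energy_fj(power_uw, sensing_time_ns):
    """Energy per read in fJ, since 1 uW x 1 ns = 1e-15 J = 1 fJ."""
    return power_uw * sensing_time_ns


# Average power in uW (Table III) and worst-case sensing time in ns
# for a 200 mV RM (Table II).
power_uw = {"CMSC": 135.0, "SPSC": 80.0, "BVSC": 300.0}
time_ns_at_200mv = {"CMSC": 4.33, "SPSC": 2.25, "BVSC": 0.65}

energy_fj = {
    name: read_energy_fj(power_uw[name], time_ns_at_200mv[name])
    for name in power_uw
}
for name, energy in energy_fj.items():
    print(f"{name}: {energy:.1f} fJ per bit")
```

The products land close to, but not exactly on, the energies reported in Table III (585 fJ, 178.5 fJ, and 195.5 fJ); the small differences presumably come from rounding in the reported power and time values.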
BVSC consumes more power than CMSC and SPSC because of the second-stage sense amplifier. However, due to its higher speed, the effective read energy per bit required by BVSC is close to that of SPSC and much lower than that of CMSC. This indicates that the proposed BVSC is able to greatly boost the read performance without sacrificing energy efficiency.

**Table III:** Average power and read energy per bit.

|                      | CMSC | SPSC  | BVSC  |
|----------------------|------|-------|-------|
| Power (µW)           | 135  | 80    | 300   |
| Read energy/bit (fJ) | 585  | 178.5 | 195.5 |

Note that body-voltage sensing requires isolated n-wells for the PMOS transistors in the sensor circuits (Fig. 6). In addition, BVSC has more transistors due to the second-stage amplifier. Both result in some area overhead in the peripheral circuitry. However, its impact on the overall area diminishes in proportion to the utilization rate of the memory.

Figure 13 shows the sensing current distribution of the three circuits when operating with a 170 mV noise margin, plotted with the MTJ switching characteristic. To project the advantage of the BVSC-based SPR circuit in future scaled STT-RAMs, we assume that both $J_C$ and the size of the MTJ are scaled down by a factor of two. Clearly, the conventional reading techniques tend to be more destructive at such a level of scaling. With a similar amount of sensing current ($I_{MTJ,P} = 50$ µA), BVSC is able to operate with a 3-7 times shorter sensing time, which significantly reduces the probability of read disturbance. Therefore, BVSC offers better support for future advanced STT-RAMs in terms of performance and reliability.

**Figure 13:** The sensing current distribution, (a) $I_{MTJ,AP}$, (b) $I_{MTJ,P}$, with scaled switching characteristic of the MTJ (geometry from Table I and $J_C$ scaled by 0.5).

## 6. Conclusions

A body-voltage-sensing-based short-pulse reading circuit is a viable solution for high-speed and reliable reading of future scaled STT-RAMs. The proposed body-connected load trades off some sensing speed for an over 3× improvement in sensing margin compared to the conventional diode-connected load. A second-stage differential amplifier further enhances the read margin, which allows earlier data latching with the same level of noise margin. As a result, the proposed SPR circuit is able to perform high-speed readings with the shortest current pulse reported to date. Such a short pulse (~1 ns) has great promise to eliminate read disturbance and to support the aggressive scaling required of future low-power STT-RAM memories.

## Acknowledgments

This work was supported by the DARPA STT-RAM (HR0011-09-C-0114) program. The authors thank Prof. Jianping Wang and Dr. Hui Zhao from the University of Minnesota for providing MTJ measurement data.

## References

- [1] G. De Sandre, et al., "A 4 Mb LV MOS-Selected Embedded Phase Change Memory in 90 nm Standard CMOS Technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 52–63, Jan. 2011.
- [2] W. Otsuka, et al., "A 4Mb Conductive-Bridge Resistive Memory with 2.3GB/s Read-Throughput and 216MB/s Program-Throughput," in *Proc. Int. Solid-State Circuits Conf.*, 2011, pp. 210–211.
- [3] K. Tsuchida, et al., "A 64Mb MRAM with Clamped-Reference and Adequate-Reference Schemes," in *Proc. Int. Solid-State Circuits Conf.*, 2010, pp. 258–259.
- [4] D. Smith, et al., "Latest Advances and Roadmap for In-Plane and Perpendicular STT-RAM," in *3rd Int. Memory Workshop (IMW)*, May 2011, pp. 1–3.
- [5] F. Tabrizi, "Non-volatile STT-RAM: A True Universal Memory," Grandis Inc., Aug. 2009.
- [6] G.D. Arndt, et al., "A 16-Mb MRAM Featuring Bootstrapped Write Drivers," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 902–908, Apr. 2005.
- [7] J.P. Kim, et al., "A 45nm 1Mb Embedded STT-MRAM with Design Techniques to Minimize Read-Disturbance," in *Proc. VLSI Symp.*, 2011, pp. 296–297.
- [8] S.O. Jing, et al., "Split Path Sensing Circuit," U.S. Patent 2010/0321976 A1, filed June 17, 2009.
- [9] K. Ono, et al., "A Disturbance-Free Read Scheme and A Compact Stochastic-Spin-Dynamics-Based MTJ Circuit Model for Gb-Scale SPRAM," in *Proc. IEEE Int. Electron Devices Meet.*, 2009, pp. 1–4.
- [10] B. Razavi, *Design of Analog CMOS Integrated Circuits*, Chapter 6, Tata McGraw-Hill, 2002.
- [11] R. Dorrance, et al., "Scalability and Design-Space Analysis of a 1T-1MTJ Memory Cell," in *Proc. ACM/IEEE Int. Symp. on Nanoscale Arch.*, June 2011, pp. 32–36.