# Reference Calibration of Body-Voltage Sensing Circuit for High-Speed STT-RAMs

Fengbo Ren, Student Member, IEEE, Henry Park, Student Member, IEEE, Chih-Kong Ken Yang, Fellow, IEEE, and Dejan Marković, Member, IEEE

Abstract—With the continuing scaling of MTJ, the high-speed reading of STT-RAM becomes increasingly difficult. Recently, a body-voltage sensing circuit (BVSC) has been proposed for boosting the sensing speed. This paper analyzes the effectiveness of using the reference calibration technique to compensate for the device mismatches and improve the read margin of BVSC. HSPICE simulation results show that a 2-bit reference calibration can improve the worst-case read margin in a 1-Mb memory by over 3 times. This leads to up to 30% higher yield across all process corners. In order to maintain the yield improvement even in the worst-case corner, independent calibration circuitry has to be deployed for each memory array.

*Index Terms*—Body-voltage sensing, CMOS, magnetic tunnel junction (MTJ), nonvolatile memory, read margin, reference calibration, sensing margin, spin-transfer torque random access memory (STT-RAM).

#### I. INTRODUCTION

Spin-torque transfer RAMs (STT-RAMs) have been the subject of extensive research in the past several years. STT-RAM is often perceived as the "universal memory" due to its potential for high density, low energy, and high speed. Prototypes incorporating smaller cell size than SRAM, better performance than DRAM, the non-volatility of Flash, and endurance on the order of $10^{16}$ read/write cycles have been reported [1]–[8]. Moreover, the switching current reduction, driven by the dimension and critical current density ($J_C$) scaling of the magnetic tunnel junction (MTJ), has been pushing down the power consumption of STT-RAM toward embedded and mobile applications [9]–[15].

With the continuing scaling of MTJ, the high-speed reading of STT-RAM becomes increasingly difficult, not only because both CMOS and MTJ variability keep increasing, but also because the switching current of the MTJ will reach the order of 10 $\mu$A, which makes reliable high-speed sensing very challenging (Fig. 1) [16]. To improve the sensing, our previous work [17] implements the concept of short pulse reading (SPR) [16] to allow a higher read current for better sensing speed. This paper enhances the reliability of the proposed body-voltage sensing circuit (BVSC) [17] by adding the capability of calibrating the reference voltage level. The enhanced BVSC features shorter sensing time for higher sensing margin and less read disturbance as compared to prior sensing circuits, which makes the scheme suitable for future technology scaling.

Digital Object Identifier 10.1109/TCSI.2013.2252653

Fig. 1. Scaling trend of MTJ switching current according to the Grandis STT-RAM roadmap [7].

The key component in the BVSC approach is the body-connected load [17], [18]. The body-connected load utilizes body-voltage modulation (BVM) to adjust the sensing voltage according to the sensing current. While the other types of sensing circuits that adopt the diode-connected or the current source load suffer from a deficiency in either sensing margin or speed, the BVSC is optimized to support both features [17]. However, BVM is more sensitive to the threshold voltage variation as compared to gate-voltage modulation (GVM) [17]. As a result, a shift of the sensing voltage in the worst-case corner would deteriorate the effective sensing margin of BVSC. If the reference voltage level is fixed, the corresponding read margin would also be degraded. This paper explores the feasibility of using a reference calibration technique to recover the read margin loss due to process variations for BVSC. The main motivation is to improve the stability of BVSC for better yield of STT-RAMs. The next section briefly reviews the SPR concept and the BVSC with the definitions of sensing margin and read margin. Section III describes the main idea of the reference calibration in detail. Section IV discusses simulation results showing that a simple 2-bit reference calibration improves the worst-case read margin in a 1-Mb memory by over 3 times, leading to a significant yield increase (up to 30%) across all corners. Conclusions are presented in Section V.

Manuscript received September 27, 2012; revised January 24, 2013; accepted February 15, 2013. Date of publication April 02, 2013; date of current version October 24, 2013. This work was supported by the DARPA STT-RAM (HR0011-09-C-0114) program. This paper was recommended by Associate Editor M. M. Khellah.

The authors are with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: fren@ee.ucla.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.



Fig. 2. Basic MTJ structure. Switching current from the fixed (free) to the free (fixed) layer switches the MTJ into a parallel (anti-parallel) state.

## II. HIGH-SPEED READING OF STT-RAM THROUGH BVSC

#### A. STT-RAM and SPR

The MTJ is the storage element of STT-RAM. It consists of two ferromagnetic layers separated by a thin nonconductive tunneling barrier (e.g., MgO) as shown in Fig. 2. The thicker ferromagnet with fixed magnetic orientation is called the fixed layer or the pinned layer. The thinner layer with flexible magnetic orientation is called the free layer. The MTJ exhibits two resistive states determined by the relative magnetization directions of the fixed and free layers: a parallel (P) orientation produces a low resistance (R<sub>P</sub>) and an anti-parallel (AP) orientation results in a high resistance (R<sub>AP</sub>). The resistance difference between the two states is measured by the tunnel magneto-resistance ratio (TMR), defined as $(R_{AP} - R_P)/R_P$. A higher TMR indicates better readability and is therefore preferred for the reading operation.
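As a quick worked illustration of the TMR definition, taking the nominal TMR of 110% listed in Table I and leaving $R_P$ symbolic:

$$R_{AP} = (1 + \mathrm{TMR}) \cdot R_P = (1 + 1.10) \cdot R_P = 2.1 \cdot R_P,$$

so the AP state presents roughly twice the resistance of the P state to the sensing circuit.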

In STT-RAMs, data is stored in MTJs in a magnetic form: "0" and "1" are represented by the magnetization direction of the free layer. The switching of the MTJ can be controlled by a bi-directional writing current as shown in Fig. 2: the current in the direction from the fixed (free) to the free (fixed) layer writes the MTJ into the AP (P) state. As shown in Fig. 3, the switching probability of the MTJ can be characterized as a function of both the switching current and the switching time (duration of the switching current) [19]. In STT-RAM design, the sensing current distribution has to be kept within the 0% switching probability region in order to avoid destructive reads [17]. Note that 0% switching probability corresponds to low read currents for long read durations and high read currents for short read durations. Unlike the low current reading (LCR) scheme, the SPR scheme senses the cell orientation with a current that is close in amplitude to the writing current but with a much shorter pulse, which improves the sensing speed without risking read disturbance [17].

#### B. BVSC

A schematic of the sensing stage of BVSC is shown in Fig. 4. In the sensing circuit, the resistance difference between the $R_P$ and $R_{AP}$ states is captured by a sensing current difference $(\Delta I)$, which is converted into a voltage difference $(\Delta V)$ through a load transistor $(P_L)$. The conversion ratio from $\Delta I$ to $\Delta V$ is given by the small-signal resistance $(R_{LOAD})$ of the load transistor as

$$\Delta V = R_{LOAD} \cdot \Delta I.\tag{1}$$





Fig. 3. The switching characteristic of the MTJ [19] with illustrations of the sensing current distribution in the short pulse reading (SPR) and the low current reading schemes.



Fig. 4. The schematic of the sensing stage of BVSC.

Note that the two sensing voltages  $V_H$  and  $V_L$  are strongly affected by the process and parametric variations in CMOS and MTJ devices. Using the sensing voltage statistics, we define the worst-case margin between  $V_H$  and  $V_L$  as the sensing margin (SM), given by

$$SM = \mu (V_H - V_L) - 3\sigma (V_H - V_L).\tag{2}$$

According to our previous study [17], a small $R_{LOAD}$ (as in a diode-connected load) leads to a high sensing speed but low sensing margin. On the other hand, a large $R_{LOAD}$ (e.g., a current source load) results in slow speed and large variation of the sensing voltage, leading to limited sensing margin. In order to provide large sensing margin while maintaining sensing speed, BVSC uses a body-connected load, which has an effective $R_{LOAD}$ 5–6 times larger than that of the diode-connected load, but 2–3 orders of magnitude smaller than that of the current source load. As a result, BVSC offers a balanced tradeoff between sensing speed and sensing margin.
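The definition in (2) can be evaluated directly from simulated sensing voltages. The following is a minimal sketch, assuming that each Monte Carlo run yields one paired $(V_H, V_L)$ sample; the function name `sensing_margin` and the Gaussian sample values are hypothetical and are not taken from the simulations in this paper.

```python
import random
import statistics


def sensing_margin(v_h, v_l, k=3.0):
    """Evaluate Eq. (2): mean(V_H - V_L) - k * std(V_H - V_L), with k = 3."""
    # Pair the samples run by run (an assumption about how the data is organized).
    diff = [h - l for h, l in zip(v_h, v_l)]
    return statistics.mean(diff) - k * statistics.pstdev(diff)


# Hypothetical sensing-voltage samples in volts (illustrative values only).
rng = random.Random(0)
v_h = [rng.gauss(0.80, 0.02) for _ in range(10_000)]
v_l = [rng.gauss(0.50, 0.03) for _ in range(10_000)]

sm = sensing_margin(v_h, v_l)
print(f"SM = {sm * 1e3:.1f} mV")
```

With independent Gaussian samples like these, the spread of the difference combines both variances, which is why the margin is noticeably smaller than the 300 mV gap between the two means.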

The sensing margin defined in (2) statistically characterizes the quality of resistance sensing in the presence of device variability, but it does not show the exact voltage difference seen at the sense amplifier input when reading a memory cell. To capture that, we define the read margin (RM) as the readability of the sensing voltage given a reference voltage ($V_{REF}$) and the input-referred offset ($V_{OS}$) of a sense amplifier. According to the resistance state of the memory cell being read, we have $RM_{AP}$ and $RM_P$ defined as

$$RM_{AP} = V_H - V_{REF} - V_{OS},\tag{3}$$

and

$$RM_P = V_{REF} - V_L - V_{OS},\tag{4}$$

respectively. The overall RM is defined as the worst case of the two,

$$RM = \min\{RM_P, RM_{AP}\}.\tag{5}$$

Ideally, $V_{REF}$ should be equal to the common-mode level of the sensing voltage, i.e., $V_{REF} = (V_H + V_L)/2$, and it can be generated by a voltage divider network that connects the $V_{DATA}$ of two sensing stages that sense $R_P$ and $R_{AP}$, respectively. However, for the memory cells in an array sharing the same sense amplifier, $V_H$ and $V_L$ are subject to process variations and could be independent variables, whereas $V_{REF}$ and $V_{OS}$ have to be common factors. As a result, the optimal $V_{REF}$ for each array is subject to the actual distribution of the sensing voltages and should be determined on a case-by-case basis. In order to find the optimal $V_{REF}$ for maximizing the read margin, a reference calibration method is proposed and discussed in the following sections.
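To make the role of $V_{REF}$ in (3)–(5) concrete, the sketch below evaluates the read margin of a single cell for three reference levels. The voltage values are hypothetical and chosen only for illustration; `read_margin` is an illustrative helper name.

```python
def read_margin(v_h, v_l, v_ref, v_os):
    """Return RM = min(RM_P, RM_AP) following Eqs. (3)-(5)."""
    rm_ap = v_h - v_ref - v_os  # Eq. (3): margin when reading the AP state
    rm_p = v_ref - v_l - v_os   # Eq. (4): margin when reading the P state
    return min(rm_p, rm_ap)     # Eq. (5): worst case of the two


# Hypothetical sensing voltages and sense-amplifier offset, in volts.
v_h, v_l, v_os = 0.80, 0.50, 0.02

for v_ref in (0.55, 0.65, 0.75):
    rm = read_margin(v_h, v_l, v_ref, v_os)
    print(f"V_REF = {v_ref:.2f} V -> RM = {rm * 1e3:.0f} mV")
```

For this single cell, the midpoint $V_{REF} = (V_H + V_L)/2 = 0.65$ V balances the two margins, while moving the reference toward either sensing voltage shrinks the worst-case margin.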

#### III. DEVICE VARIATION AND REFERENCE CALIBRATION

## A. Impact of CMOS and MTJ Variations

The device variation of the MTJ can be lumped into independent Gaussian variations of $R_P$ and TMR [20], [21]. The effect of such variations, together with the variation of the access transistors, on the sensing behavior can be visualized as shown in Fig. 5. The right-hand side of Fig. 5 shows the I-V curves of the clamp transistor $(N_C)$ and the MTJ cell. Depending on the resistance state of the MTJ, the cross-point of the two curves determines the sensing current, $I_{MTJ}$. The left-hand side of Fig. 5 presents the I-V curve of the load transistor ($P_L$). Thus, by projecting the same $I_{MTJ}$ onto the I-V curve of $P_L$, the sensing voltage, $V_{DATA}$, can be obtained. The variation of each device contributes to the variation of $V_{DATA}$, yet at different levels. In memory design, different memory arrays are usually driven by independent sensing stages. For each array, the variations of $P_L$ and $N_C$ tend to affect its $V_{DATA}$ statistics globally: they shift the mean of the $V_{DATA}$ distribution of the whole array. On the other hand, the MTJ device variation tends to spread the $V_{DATA}$ of each single cell around the mean locally. These combined global and local deviations may cause substantial yield loss if a fixed global $V_{REF}$ is used.

Another potential problem that may reduce the effectiveness of the body-connected load is that the P-N junction formed between the source and the body regions of the PMOS transistor can be turned on if $V_{DATA}$ is well below $V_{DD}$ ($V_{DD} - 0.7$ V). This junction leakage would cause BVM to have weaker control and the effective $R_{LOAD}$ to increase (Fig. 5). As a result, the effective sensing margin would be reduced if the operating point of $P_L$ is shifted beyond the inflection point ($V_{DD} - 0.7$ V) shown in Fig. 5 due to process variations. This effect is the most prominent in the fast-NMOS slow-PMOS (FS) corner. Fig. 6 shows the histograms of $V_{DATA}$ in a 1-Mb memory at the typical and FS corners. During $R_{AP}$ sensing, $N_C$ is in deep saturation such that its $V_{DS}$ drop will vary substantially if the sensing current changes due to process variation. On the other hand, during $R_P$ sensing, the sensing current at the FS corner is sufficiently high that $N_C$ may enter the linear region. As a result, the $V_H$ distribution shows a relatively larger shift between the two corners (typical and FS) than the $V_L$ distribution, as shown in Fig. 6, and the effective sensing margin is degraded in the FS corner. Such shifting would cause read errors if a fixed $V_{REF}$ is used. Therefore, a reference calibration scheme that can adaptively adjust $V_{REF}$ according to the sensing voltage statistics is desired in order to best utilize the sensing margin, especially at the FS corner.

Fig. 5. I-V curves of the load and clamp transistors in the sensing stage of BVSC illustrating the impact of process variations on the sensing behavior.

#### B. Reference Calibration

As multiple MTJ cells in the same array share a sensing circuit, the variations of $P_L$ and $N_C$ can be compensated by calibrating the $V_{REF}$ level at the sense amplifier input. The generation of multiple reference levels can be implemented by using resistor taps as shown in Fig. 7. By choosing the optimal $V_{REF}$ level through the configuration bits (SEL), the read margin degrades less from device mismatch and hence the chance of reading errors is reduced. This concept is illustrated in Fig. 8. The distributions of $V_L$ and $V_H$ for reading the whole memory array are modeled as Gaussian distributions with different means and standard deviations. The band bounded by dashed lines around $V_{REF}$ represents the zone where reading errors would occur if $V_{DATA}$ falls within the band, due to device mismatch of the sense amplifier and random noise. The width of the error zone can be characterized by the $V_{OS}$ of the sense amplifier and the target noise margin (NM). Note that since $V_{REF}$ is usually generated from sensing a separate reference array other than the regular memory arrays, $V_{DATA}$ may vary independently from $V_{REF}$. Fig. 8(a) illustrates an example where the $V_{DATA}$ distribution of the whole array is shifted toward the right-hand side of $V_{REF}$. In this case, the worst-case read margin of the whole array might be tiny or even negative, and reading errors are very likely to occur. However, this can be fixed by calibrating the $V_{REF}$ level as shown in Fig. 8(b). As $V_{REF}$ increases, the read margins for reading the P and AP states become more balanced such that the worst-case read margin is significantly improved.

Fig. 6. The $V_{DATA}$ distribution of a 1-Mb memory with BVSC at: (a) the typical and (b) the FS corners.

Fig. 7. Proposed reference calibration scheme. Resistor taps generate the $V_{REF}$ levels for digital calibration.

Fig. 8. Illustration of reference calibration used to improve the worst-case read margin. (a) Before calibration, (b) after calibration.

For analytical purposes, the reading error is modeled mathematically as follows. Suppose that $\mu_P$, $\sigma_P$ and $\mu_{AP}$, $\sigma_{AP}$ are the mean and standard deviation of the $V_L$ and $V_H$ distributions, respectively, and $d_P$ and $d_{AP}$ represent the distance from the boundary of the error zone to $\mu_P$ and $\mu_{AP}$, respectively. Then, the probability for a read error to occur when reading a memory array with N MTJ cells is the complement of the probability that all the $V_{DATA}$ values fall outside the error zone, as given by

$$P_{err} = 1 - \left[\frac{1}{2} \cdot \operatorname{erf}\left(\frac{d_P}{\sqrt{2} \cdot \sigma_P}\right) + \frac{1}{2} \cdot \operatorname{erf}\left(\frac{d_{AP}}{\sqrt{2} \cdot \sigma_{AP}}\right)\right]^N.\tag{6}$$

Fig. 9 plots the read error probability color map as a function of $d_P/\sigma_P$ and $d_{AP}/\sigma_{AP}$. As $d_P/\sigma_P$ and $d_{AP}/\sigma_{AP}$ increase, which implies less device variability, the error probability goes down exponentially. For practical design, the sum of $d_P$ and $d_{AP}$ is usually limited by the sensing margin and is assumed to be fixed. In this case, the minimum error probability is achieved when

$$\frac{d_P}{\sigma_P} = \frac{d_{AP}}{\sigma_{AP}},\tag{7}$$

as indicated in Fig. 9. Therefore, the primary goal of reference calibration is to provide the best available read margin for the worst-case reading by choosing the optimal  $V_{\rm REF}$  level that satisfies (7). Ideally, this can be done with continuous tuning of the reference levels. In practice, there is a tradeoff between increasing the number of configuration bits (granularity of  $V_{\rm REF}$ ) and chip area.
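The behavior described by (6) and (7) can be checked numerically. The sketch below evaluates (6) for a 512-cell array while sweeping how a fixed total distance is split between the P and AP sides. All numbers are hypothetical; equal standard deviations are assumed so that condition (7) reduces to an equal split, and `read_error_probability` is an illustrative helper name.

```python
import math


def read_error_probability(d_p, sigma_p, d_ap, sigma_ap, n_cells):
    """Eq. (6): probability of at least one read error in an n_cells array."""
    p_ok = (0.5 * math.erf(d_p / (math.sqrt(2) * sigma_p))
            + 0.5 * math.erf(d_ap / (math.sqrt(2) * sigma_ap)))
    return 1.0 - p_ok ** n_cells


# Hypothetical values in millivolts: a fixed total distance split between both sides.
TOTAL_MV = 100
SIGMA_MV = 10.0  # same sigma on both sides (assumption for this illustration)
N_CELLS = 512

# Sweep the split in 1-mV steps and record the resulting error probability.
sweep = {
    d_p: read_error_probability(d_p, SIGMA_MV, TOTAL_MV - d_p, SIGMA_MV, N_CELLS)
    for d_p in range(10, 91)
}
best_d_p = min(sweep, key=sweep.get)
print(f"best split: d_P = {best_d_p} mV, d_AP = {TOTAL_MV - best_d_p} mV")
print(f"P_err at best split = {sweep[best_d_p]:.2e}")
```

In this symmetric setting the sweep settles on the balanced split, and any offset of the reference toward one distribution raises the error probability, which is the situation that calibration is meant to correct.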

The detailed calibration algorithm is shown in Fig. 10. The algorithm begins by setting $V_{REF}$ to the middle node of the resistor taps, SEL = Max./2. The lower (SEL<sub>L</sub>) and upper (SEL<sub>H</sub>) bounds of the preferred $V_{REF}$ are determined as the boundaries between a successful and a failed read-after-write operation on data patterns "0" and "1," respectively. Therefore, the read and write of both data patterns are required to exercise all the cells in an array. In the algorithm, $SEL_L$ is first searched with data pattern "0." Since the first SEL value may lead to either a successful or a failed read, the search for $SEL_L$ may proceed in either direction depending on the first result. In contrast, once $SEL_L$ is determined, $SEL_H$ only needs to be searched in the other direction. Theoretically, all the $V_{REF}$ levels within the bounds are error free. To maximize the read margin, the optimum $V_{REF}$ is selected by $SEL = (SEL_H + SEL_L)/2$.

Fig. 9. Read error probability for a 512-cell memory array as a function of the sensing margin to device variability ratio $(d_P/\sigma_P \text{ and } d_{AP}/\sigma_{AP})$. Numbers on the contour lines represent the error exponent (see vertical bar on the right).

Fig. 10. Flow diagram of the reference calibration method. The upper and lower boundaries of the preferred $V_{REF}$ are searched by sweeping the control bits (SEL). At the end of the calibration process, $V_{REF}$ is set as selected by SEL = (SEL\_H + SEL\_L)/2.
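The search described above can be sketched in software as follows. This is a simplified illustration rather than the exact flow of Fig. 10. It assumes that $V_{REF}$ rises with SEL, that pattern "0" reads fail when $V_{REF}$ is too low, and that pattern "1" reads fail when $V_{REF}$ is too high. The hook `read_after_write`, the toy array model, and all voltage values are hypothetical.

```python
def calibrate(read_after_write, n_bits):
    """Return (sel_l, sel_h, sel) for an n_bits reference selector.

    read_after_write(sel, pattern) is a hypothetical hook: it writes `pattern`
    ("0" or "1") to every cell, reads the array back using reference tap `sel`,
    and returns True only if every cell is read correctly.
    """
    max_sel = (1 << n_bits) - 1
    sel = max_sel // 2  # start from the middle tap

    # Search SEL_L with pattern "0"; the direction depends on the first result.
    if read_after_write(sel, "0"):
        while sel > 0 and read_after_write(sel - 1, "0"):
            sel -= 1
    else:
        while sel < max_sel and not read_after_write(sel, "0"):
            sel += 1
        if not read_after_write(sel, "0"):
            raise RuntimeError("no tap reads pattern '0' correctly")
    sel_l = sel

    # Search SEL_H with pattern "1", moving only upward from SEL_L.
    if not read_after_write(sel_l, "1"):
        raise RuntimeError("no tap reads both patterns correctly")
    sel_h = sel_l
    while sel_h < max_sel and read_after_write(sel_h + 1, "1"):
        sel_h += 1

    return sel_l, sel_h, (sel_l + sel_h) // 2


def make_array_model(v_l_max, v_h_min, zone, taps):
    """Toy array: a read passes if V_REF clears the worst-case cell by `zone`."""
    def read_after_write(sel, pattern):
        v_ref = taps[sel]
        if pattern == "0":
            return v_ref - v_l_max > zone
        return v_h_min - v_ref > zone
    return read_after_write


# Hypothetical 2-bit reference levels (volts) and a right-shifted array as in Fig. 8(a).
taps = [0.55, 0.60, 0.65, 0.70]
shifted = make_array_model(v_l_max=0.58, v_h_min=0.78, zone=0.03, taps=taps)
print(calibrate(shifted, n_bits=2))
```

For this toy array the middle tap fails on pattern "0", so the search moves upward and returns `(2, 3, 2)`, i.e., the reference is raised from 0.60 V to 0.65 V. A linear sweep is used here for clarity; with only a few configuration bits the number of read-after-write passes stays small.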

A block diagram that illustrates the STT-RAM architecture with the reference calibration technique applied is shown in Fig. 11. The calibrated control bits can be stored in an off-chip



Fig. 11. Block diagram of the STT-RAM architecture with the reference calibration technique applied. SS: Sensing Stage, CC: Calibration Circuitry, SR: Shift Register.

TABLE I Summary of MTJ Parameters

| Parameter      | Value           |
|----------------|-----------------|
| Size           | 30 nm × 130 nm  |
| RA             | 14.8 Ω·µm²      |
| TMR            | 110%            |
| $\sigma_{RA}$  | 4%              |
| $\sigma_{TMR}$ | 5%              |

EEPROM, which automatically loads the control data into the local registers for selecting the optimal $V_{REF}$ during reset or initialization. Therefore, the calibration process only needs to be performed once after fabrication. If the devices have time-varying characteristics, the calibration control bits may also be updated periodically by software.

## **IV. SIMULATION RESULTS**

#### A. Simulation Setup

In order to analyze the effectiveness of the reference calibration technique and determine the number of configuration bits for effective calibration, we simulate the reading of a 512-cell memory array using HSPICE Monte Carlo (MC) simulations with a 65-nm CMOS model. Both the across-chip variations and the chip-to-chip variations are enabled in the simulations. At each process corner, $10^6$ MC runs are conducted for statistical parameter extraction [22]. The MTJ model and its variation parameters used in the simulations are summarized in Table I. The MTJ variation is modeled by the standard deviations of the resistance-area (RA) product ($\sigma_{RA}$) and TMR ($\sigma_{TMR}$) extracted from measurements [20]. A total of $\pm 5\sigma$ of the MTJ variation is considered.
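The MTJ part of this variation model can be sketched outside of HSPICE as follows. This is only an illustration of sampling independent Gaussian RA and TMR variations with the percentages from Table I. It assumes that both percentages are relative to the nominal values, that $R_P$ scales with RA for a fixed cell area, and that the $\pm 5\sigma$ range is applied as a truncation; $R_P$ is normalized to 1, and the helper names are hypothetical.

```python
import random
import statistics

TMR_NOM = 1.10    # nominal TMR (110%) from Table I
SIGMA_RA = 0.04   # sigma of RA from Table I, assumed relative to nominal
SIGMA_TMR = 0.05  # sigma of TMR from Table I, assumed relative to nominal
CLIP = 5.0        # keep samples within +/-5 sigma (interpretation of the text)


def clipped_gauss(rng, mean, sigma, clip=CLIP):
    """Draw a Gaussian sample, rejecting values beyond clip * sigma."""
    while True:
        x = rng.gauss(mean, sigma)
        if abs(x - mean) <= clip * sigma:
            return x


def sample_mtj(rng, r_p_nom=1.0):
    """Return (R_P, R_AP) for one cell, with R_P normalized to r_p_nom."""
    r_p = r_p_nom * clipped_gauss(rng, 1.0, SIGMA_RA)
    tmr = clipped_gauss(rng, TMR_NOM, SIGMA_TMR * TMR_NOM)
    return r_p, r_p * (1.0 + tmr)  # R_AP = (1 + TMR) * R_P


rng = random.Random(1)
samples = [sample_mtj(rng) for _ in range(20_000)]
r_p_values = [s[0] for s in samples]
r_ap_values = [s[1] for s in samples]
print(f"mean R_P  = {statistics.mean(r_p_values):.3f} (normalized)")
print(f"mean R_AP = {statistics.mean(r_ap_values):.3f} (normalized)")
```

The sampled resistances would then feed a circuit-level model of the sensing stage; the sketch stops at the device level because the transistor behavior is captured by the CMOS model in the actual simulations.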

## B. Read Margin and Yield Improvements

Fig. 12 shows the read margin statistics of a 512-cell memory array extracted from $10^6$ MC runs in the nominal case. For Fig. 12(b)–(d), the optimal $V_{REF}$ in the simulation is determined using the algorithm shown in Fig. 10. The simulation results show that a simple 2-bit reference calibration can effectively improve the worst-case read margin by more than 3 times. With one extra calibration bit, another 30% improvement can be achieved. The improvement nearly saturates at 4 calibration bits. Clearly, the amount of improvement by reference calibration becomes much less significant when the calibration resolution exceeds 2 bits. This is because the reference calibration re-distributes the sensing margin around $V_{REF}$ rather than enlarging it. Fig. 12 indicates that a 2-bit reference calibration is sufficient for the nominal case.

Fig. 12. The read margin statistics of a 512-cell memory array extracted from $10^6$ MC runs in the nominal case with: (a) no calibration, (b) 2-bit calibration, (c) 3-bit calibration, (d) 4-bit calibration.

Fig. 13 presents the yield of a 512-cell memory array calculated from (6) using parameters extracted from the MC simulations in different process corners. One sigma of the $V_{OS}$ of the sense amplifier is assumed to be 11 mV. Note that the $V_{REF}$ level that maximizes the read margin in the nominal case is chosen as the nominal operating point. However, in the worst-case corner (FS corner) the array yield is around 70% without any reference calibration. This indicates that tuning only the nominal operating point of $V_{REF}$ is not sufficient for compensating the variations in the worst-case corner. If a 1-Mb memory is built using multiple such arrays, the overall yield would be lower than 1% in the FS corner. However, with a 2-bit reference calibration, the yield of a 1-Mb memory can be improved to over 99.7% across all corners. This result indicates that reference calibration can be very effective in compensating within-die variations.
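The step from array yield to memory yield can be illustrated with a short calculation. The sketch below assumes that the arrays fail independently and that a 1-Mb memory consists of $2^{20}/512 = 2048$ arrays of 512 cells; both are simplifying assumptions, and the helper names are hypothetical.

```python
import math


def memory_yield(array_yield, n_arrays):
    """Yield of a memory made of n_arrays independent, identical arrays."""
    # Evaluated in the log domain so that very small yields stay well-behaved.
    return math.exp(n_arrays * math.log(array_yield))


def required_array_yield(target_memory_yield, n_arrays):
    """Per-array yield needed to reach a target yield for the whole memory."""
    return target_memory_yield ** (1.0 / n_arrays)


N_ARRAYS = (1 << 20) // 512  # 2048 arrays, assuming 1 Mb = 2**20 cells

print(f"1-Mb yield at 70% array yield: {memory_yield(0.70, N_ARRAYS):.3e}")
print(f"array yield needed for 99.7%:  {required_array_yield(0.997, N_ARRAYS):.7f}")
```

Under these assumptions, a 70% array yield leaves essentially no working 1-Mb memories, while a 99.7% memory yield requires each array to fail only about once in several hundred thousand instances.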

In addition, reference calibration can also be used to compensate for the device mismatch of sense amplifiers by shifting  $V_{REF}$  against the  $V_{OS}$ . Fig. 14 illustrates the yield improvement by reference calibration of a 512-cell memory array as a function of the variability of the sense amplifier. In the nominal case, the yield drops rapidly as the standard deviation of  $V_{OS}$ increases beyond 15 mV without reference calibration. With reference calibration, the yield stays nearly constant. This trend becomes more prominent in the FS corner. These results show that the reference calibration technique is able to relax the device matching requirements of the sense amplifier design without sacrificing the yield. Similar results have also been reported by a study on a self-reference scheme [23]. However, the self-reference scheme improves the reading robustness at the cost of lowering the sensing speed [23], while our technique does not affect the sensing speed at all.




Fig. 13. Yield of a 512-cell memory array at different process corners with: (a) no calibration, (b) 2-bit calibration.



Fig. 14. Yield of a 512-cell memory array as a function of the sense amplifier variability at: (a) the typical and (b) the FS corners.



Fig. 15. Yield of a 1-Mb memory when multiple arrays share the same calibrated $V_{REF}$ at: (a) the typical and (b) the FS corners.

## C. Reference Sharing

For all the above-mentioned results, we assume that each array of the memory has its own control bits for independent reference calibration. In order to minimize the area overhead of the calibration circuitry, we also study the feasibility of sharing a single calibrated $V_{REF}$ across multiple arrays. Fig. 15 illustrates how the yield of a 1-Mb memory is affected by sharing the calibrated $V_{REF}$. As one would expect, sharing across more than 32 arrays results in a yield drop in the typical corner. In the FS corner, the yield drops rapidly as soon as more than one array shares the same calibrated $V_{REF}$. Although increasing the calibration resolution helps to slow the drop, the yield loss is still significant. As the array size covered by the same calibrated $V_{REF}$ increases, the chance of the tail of the sensing voltage distribution exceeding the calibrated $V_{REF}$ also increases (Fig. 6). As a result, the effectiveness of the reference calibration technique diminishes. This indicates that dedicated calibration circuitry has to be deployed for each memory array if the yield loss due to the worst-case corner is critical to the designers.

The area overhead of the calibration circuitry depends on the column mux ratio and the area utilization rate of the memory. In the case of a 2-bit calibration, the calibration circuitry introduced to each array only includes a short resistor ladder, a few transmission gates, and a few control bit registers. In addition, applying the reference calibration relaxes the device matching requirements of the sense amplifier (Fig. 14), which allows for potential area saving from using smaller devices and may mitigate the area overhead. According to our estimation, the overall area overhead of the reference calibration circuitry is limited to 10% of the peripheral circuitry. This impact decreases as the column mux ratio and the memory area utilization rate increase.

#### V. CONCLUSION

This paper presents a technique of using reference calibration as an enhancement to BVSC to enable fast and reliable reading of STT-RAM. The simulation results show that by applying a simple 2-bit reference calibration, the worst-case read margin due to process variations can be improved by over 3 times, leading to a significant yield increase (up to 30%) across all corners. Moreover, the reference calibration technique improves the yield in the presence of device mismatch in the sense amplifier design. In practical design, where the yield loss due to the worst-case corner is critical to the designers, dedicated reference calibration circuitry should be deployed for each memory array.

## ACKNOWLEDGMENT

The authors thank Prof. Jianping Wang and Dr. Hui Zhao from the University of Minnesota for providing MTJ measurement data. This work was supported by the DARPA STT-RAM program.

#### REFERENCES

- [1] G. De Sandre et al., "A 4 Mb LV MOS-selected embedded phase change memory in 90 nm standard CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 1, pp. 52–63, Jan. 2011.
- [2] W. Otsuka et al., "A 4Mb conductive-bridge resistive memory with 2.3 GB/s read-throughput and 216 MB/s program-throughput," in *Proc. Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, 2011, pp. 210–211.
- [3] K. Tsuchida et al., "A 64 Mb MRAM with clamped-reference and adequate-reference schemes," in Proc. Int. Solid-State Circuits Conf., San Francisco, CA, USA, 2010, pp. 258–259.
- [4] Y. Pan et al., "Quasi-nonvolatile SSD: Trading flash memory nonvolatility to improve storage system performance for enterprise applications," in Proc. 18th Int. Symp. High Perform. Comput. Archit. (HPCA'12), New Orleans, LA, USA, 2012, pp. 1–10.
- [5] Y. Pan *et al.*, "On the case of using quasi-EZ-NAND flash memory to build future solid-state drives," *IEEE Trans. Comput.*, 2012, to be published.
- [6] G. Dong *et al.*, "Estimating information-theoretical NAND flash memory storage capacity and its implication to memory system design space exploration," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 20, no. 9, pp. 1705–1714, 2012.
- [7] D. Smith *et al.*, "Latest advances and roadmap for in-plane and perpendicular STT-RAM," in *Proc. 3rd Int. Memory Workshop (IMW)*, Dallas, TX, USA, 2011, pp. 1–3.
- [8] Y. Pan, "Exploring the use of emerging nonvolatile memory technologies in future FPGAs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, 2012, to be published.
- [9] F. Ren and D. Marković, "True energy-performance analysis of the MTJ-based logic-in-memory architecture (1-Bit full adder)," *IEEE Trans. Electron Devices*, vol. 57, no. 5, pp. 1023–1028, May 2010.
- [10] C. Zhang *et al.*, "Mapping channel estimation and MIMO detection in LTE-advanced on a reconfigurable cell array," in *Proc. Int. Symp. Circuits Syst. (ISCAS)*, Seoul, Korea, 2012, pp. 1799–1802.
- [11] H. Wu et al., "A 60 GHz on-chip RF-interconnect with λ/4 coupler for 5 Gbps Bi-directional communication and multi-drop arbitration," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), San Jose, CA, USA, 2012, pp. 9–12.
- [12] W. Xu et al., "In-place FPGA retiming for mitigation of variational single-event transient faults," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 6, pp. 1372–1381, Jun. 2011.

- [13] C. Zhang et al., "Energy efficient MIMO channel pre-processor using a low complexity on-line update scheme," in *Proc. IEEE NORCHIP*, Copenhagen, Denmark, 2012, pp. 1–4.
- [14] N. Amini et al., "Experimental analysis of IEEE 802.15.4 for on/off body communications," in Proc. 22nd IEEE Symp. Personal Indoor Mobile Radio Commun. (PIMRC), Toronto, ON, Canada, 2011, pp. 2138–2142.
- [15] W. Xu et al., "eCushion: An eTextile device for sitting posture monitoring," in *IEEE Int. Conf. Body Sensor Networks (BSN)*, Dallas, TX, USA, 2011, pp. 194–199.
- [16] K. Ono *et al.*, "A disturbance-free read scheme and a compact stochastic-spin-dynamics-based MTJ circuit model for Gb-scale SPRAM," in *Proc. IEEE Int. Electron Devices Meet.*, Baltimore, MD, USA, 2009, pp. 1–4.
- [17] F. Ren et al., "A body-voltage-sensing-based short pulse reading circuit for spin-torque transfer RAMs (STT-RAMs)," in Proc. 13th Int. Symp. Quality Electron. Design (ISQED'12), Santa Clara, CA, USA, 2012, pp. 275–282.
- [18] M. F. Chang et al., "A 0.5 V 4 Mb logic-process compatible embedded resistive RAM (ReRAM) in 65 nm CMOS using low-voltage currentmode sensing scheme with 45 ns random read time," in *Proc. Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, 2012, pp. 434–435.
- [19] H. Zhao *et al.*, "Spin-torque driven switching probability density function asymmetry," *IEEE Trans. Magn.*, vol. 48, no. 11, pp. 3818–3820, Nov. 2012.
- [20] R. Dorrance *et al.*, "Scalability and design-space analysis of a 1T-1MTJ memory cell for STT-RAMs," *IEEE Trans. Electron Devices*, vol. 59, no. 4, pp. 878–887, Apr. 2012.
- [21] R. Dorrance et al., "Scalability and design-space analysis of a 1T-1MTJ memory cell," in Proc. ACM/IEEE Int. Symp. Nanoscale Arch. (NANOARCH'11), San Diego, CA, USA, Jun. 2011, pp. 32–36.
- [22] R. Kanj et al., "Mixture importance sampling and its application to the analysis of SRAM designs in the presence of rare failure events," in *IEEE/ACM Proc. 43rd Design Autom. Conf.*, Anaheim, CA, USA, 2006, pp. 69–72.
- [23] Z. Sun et al., "Variation tolerant sensing scheme of spin-transfer torque memory for yield improvement," in *IEEE/ACM Int. Conf. Comput.-Aid-Design*, Brooklyn, OH, USA, 2010, pp. 432–437.



Fengbo Ren (S'10) was born in Shenyang, China. He received the B.Eng. degree in electrical engineering from Zhejiang University, Hangzhou, China, in 2008, and the M.S. degree from University of California, Los Angeles, CA, USA, in 2010, where he is currently a Ph.D. candidate, specializing in circuit and embedded systems.

In 2006, he studied as an exchange student with the Department of Electronic & Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong. During the fall of 2009, he worked as an Engineer Intern with the Digital ASIC Group, Qualcomm Inc., San Diego, CA, USA, where he contributed to the low power design flow of a smart phone SOC. During the summer of 2012, he worked as a Ph.D. Intern with the Data Center Group, Cisco Systems Inc., San Jose, CA, USA, where he was involved in the FPGA emulation of a data center switch ASIC. His current research interests include circuit design and design optimization for STT-RAM and efficient DSP architectures for sparse signal processing in compressive sensing applications.



Henry Park (S'07) was born in Los Angeles, CA, USA. He received the B.S.E.E. degree (summa cum laude) from Seoul National University, Seoul, Korea, in 2003, and the M.S.E.E. degree from UCLA, Los Angeles, CA, USA, in 2009, where he is currently working toward the Ph.D. degree in electrical engineering.

From 2003 to 2006, he was with Hunter Technology, Seoul, Korea, where he developed microprocessor embedded systems with digital and analog interfaces. During the summer and fall of 2009, he was with Broadcom, Irvine, CA, USA, where he was involved in high precision data converters and noise chopper design. His research interests include data converters and statistical analysis of memory cell stability.



**Chih-Kong Ken Yang** (S'94–M'98–SM'07–F'10) was born in Taipei, Taiwan. He received the B.S. and M.S. degrees in 1992 and the Ph.D. degree in 1998 from Stanford University, Stanford, CA, USA, in electrical engineering.

He joined the University of California at Los Angeles, CA, USA, as an Assistant Professor in 1999 and has been a Professor since 2009. His current research area is high-performance mixed-mode circuit design for VLSI systems such as clock generation, high-performance signaling, low-power digital functional blocks, and analog-to-digital conversion.



**Dejan Marković** (S'96–M'06) received the Dipl.Ing. degree from the University of Belgrade, Serbia, in 1998 and the M.S. and Ph.D. degrees from the University of California, Berkeley, CA, USA, in 2000 and 2006, respectively, all in electrical engineering.

In 2006, he joined the faculty of the Electrical Engineering Department at the University of California, Los Angeles, CA, USA, as an Assistant Professor. Since 2009, he has been affiliated with the Biomedical Engineering Interdepartmental Program at UCLA as a co-chair of the Neuroengineering field. He is also a director of the Integrated Circuits track within the UCLA Master of Science in Engineering Online Program. His current research is focused on integrated circuits for emerging radio and healthcare systems, programmable ICs, design with post-CMOS devices, optimization methods and CAD flows.

Dr. Marković was awarded the CalVIEW Fellow Award in 2001 and 2002 for excellence in teaching and mentoring of industry engineers through the UC Berkeley distance learning program. In 2004, he was a co-recipient of the Best Paper Award at the IEEE International Symposium on Quality Electronic Design. In recognition of the impact of his Ph.D. work, he received the 2007 David J. Sakrison Memorial Prize at UC Berkeley. He received an NSF CAREER Award in 2009. In 2010, he was a co-recipient of ISSCC Jack Raper Award for Outstanding Technology Directions and a winner of the DAC/ISSCC Student Design Contest.