Novel computing-in memory methodology based on MRAM

YANG Xi, WANG Yuanbo, WANG Chengzhi, CHANG Liang

Integrated Circuits and Embedded Systems ›› 2024, Vol. 24 ›› Issue (6): 29-40. DOI: 10.20193/j.ices2097-4191.2024.06.005

Research Article

Abstract

Computing-in-Memory (CIM) is an emerging architecture intended to alleviate the "memory wall" and the "power wall". As the imbalance between CPU processing speed and memory access speed keeps widening, architectures that separate the central processing unit from memory, such as the von Neumann architecture, are gradually losing their advantage. CIM combines computation with storage to reduce data movement, which greatly improves computational efficiency. MRAM, one of the most promising next-generation nonvolatile memory devices, is regarded as a strong candidate for building efficient CIM architectures. MRAM-based CIM can be divided into analog MRAM CIM and digital MRAM CIM according to how the computation is carried out; digital MRAM CIM can be further divided into MRAM write-type CIM, MRAM read-type CIM, and MRAM near-memory computing (NMC) according to how the digital logic is generated. Analog MRAM CIM amortizes energy consumption through high parallelism and offers per-unit-area throughput and energy efficiency that digital CIM cannot match, but its susceptibility to PVT variation and related effects limits it in practical applications. Digital MRAM CIM can be implemented in diverse ways. Write-type CIM almost eliminates data movement outside the memory; although the switching energy and latency required by MRAM under current process technology are too large, which has so far kept this approach at the simulation stage, it remains one of the most effective means of alleviating the "memory wall". Read-type CIM depends heavily on the functional design of the sense amplifiers; it has seen progress in related fields but remains strongly constrained. NMC, given the current gap in computing speed and energy efficiency between MRAM nonvolatile devices and CMOS circuits, is an effective way to combine the advantages of both and brings substantial benefits in practical applications.
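
To make the analog/digital contrast described above concrete, the sketch below is a minimal numerical illustration, not taken from the paper: it assumes arbitrary, hypothetical cell conductances (G_PARALLEL, G_ANTIPARALLEL) and a crude full-range ADC, and models an idealized analog MRAM crossbar that computes a binary matrix-vector product by summing bit-line currents in one step, alongside an exact digital accumulation of the same stored bits.

import numpy as np

# Hypothetical device parameters for illustration only (not from the paper):
# an MTJ cell stores one bit as a low- or high-resistance state; we model its
# conductance in arbitrary units and ignore wire parasitics and device variation.
G_PARALLEL = 1.0       # conductance of the low-resistance (parallel) state, storing '1'
G_ANTIPARALLEL = 0.5   # conductance of the high-resistance (anti-parallel) state, storing '0'

def analog_crossbar_mvm(weight_bits, input_bits, adc_bits=4):
    """Idealized analog CIM: stored bits are mapped to cell conductances, input
    bits drive the word lines, and each bit line sums the currents of all
    activated cells in a single step (the high parallelism that amortizes
    energy). The summed current is then quantized by an ADC, which is where
    PVT variation and limited precision enter in a real macro."""
    g = np.where(weight_bits == 1, G_PARALLEL, G_ANTIPARALLEL)    # bit -> conductance
    bitline_current = input_bits @ g                              # one-shot current summation
    lsb = max(bitline_current.max(), 1e-9) / (2 ** adc_bits - 1)  # crude full-range ADC model
    return np.round(bitline_current / lsb).astype(int)

def digital_mvm(weight_bits, input_bits):
    """Digital CIM / near-memory style: read the stored bits out and accumulate
    them with exact Boolean/adder logic, trading one-shot parallelism for
    results that are immune to PVT variation."""
    return (input_bits @ weight_bits).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.integers(0, 2, size=(8, 4))   # 8x4 array of stored weight bits
    x = rng.integers(0, 2, size=8)        # 8 word-line activations
    print("analog  (ADC codes):", analog_crossbar_mvm(W, x))
    print("digital (exact sum):", digital_mvm(W, x))

The analog path returns quantized ADC codes whose accuracy depends on the conductance contrast and ADC resolution, which is the precision- and PVT-related limitation the abstract refers to; the digital path returns exact sums at the cost of reading bits out individually.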

Keywords

MRAM / computing-in-memory / AI processor / computing methodology / memory wall

Cite this article

YANG Xi, WANG Yuanbo, WANG Chengzhi, et al. Novel computing-in memory methodology based on MRAM[J]. Integrated Circuits and Embedded Systems, 2024, 24(6): 29-40. https://doi.org/10.20193/j.ices2097-4191.2024.06.005
CLC number: TP333 (Memory)

Funding

National Natural Science Foundation of China (62104025)
National Natural Science Foundation of China (62104259)
Joint project with Houmo Intelligence (后摩智能) (202205FKY00172)

Editor: 薛士然
