Research on time-domain in-memory computing method for spin-transfer torque magnetic random access memory

TAN Jiahui (谭嘉惠), SU Jiongzhe (苏炯哲), ZHOU Rong (周荣), ZHANG Chunzheng (张椿钲), CAI Hao (蔡浩)

Integrated Circuits and Embedded Systems, 2025, Vol. 25, Issue (8): 53-63. DOI: 10.20193/j.ices2097-4191.2025.0045
Special issue on emerging computing chip design research


Research on time-domain in-memory computing method for spin-transfer torque magnetic random access memory


Abstract

Computing-in-memory (CIM) based on spin-transfer torque magnetic random access memory (STT-MRAM) is a promising way to overcome the "memory wall" bottleneck. This paper proposes an energy-efficient time-domain CIM design for STT-MRAM. A custom series-connected memory cell, built from series transistors and complementary magnetic tunnel junctions (MTJs), forms a magnetoresistance chain across multiple rows of memory cells in computing mode; a time-domain conversion circuit then converts the resistance value into a pulse delay signal. A complementary series array architecture is further designed, which stores positive and negative weights in separate columns to generate differential time signals and thereby supports signed computation. For quantization, a successive approximation register (SAR) time-to-digital converter (TDC) is proposed, which combines a voltage-adjustable delay chain with a flip-flop. To support multi-bit multiply-accumulate (MAC) operations, a signed weight encoding scheme and a digital post-processing architecture are proposed: through encoded weight mapping and a digital shift-and-accumulate algorithm, the MAC of 8-bit inputs and 8-bit weights is decomposed into a time-domain computation on the low 5 bits and a digital-domain computation on the high bits, producing a 21-bit full-precision result. Layout design and post-layout simulation were completed in a 28 nm CMOS process. At 0.9 V, a 9-bit MAC operation with a resolution margin of 270 ps is achieved with an energy consumption of only 16 fJ per operation, and the designed 5-bit SAR-TDC converts time to digital codes with high linearity. A 9 Kb time-domain CIM macro with an area of 0.026 mm² was designed, comprising the memory cell array, SAR-TDC module, computing circuit, and read/write control circuit. The macro achieves energy efficiencies of 26.4 TOPS/W and 42.8 TOPS/W for convolutional-layer and fully-connected-layer computations, respectively, and reaches an area efficiency of 0.523 TOPS/mm² at 8-bit precision.
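
To illustrate the quantization step, the following is a minimal behavioral sketch of a 5-bit successive-approximation time-to-digital conversion. It is an idealized model and not the circuit from the paper: the function name `sar_tdc`, the `lsb_ps` resolution of 100 ps, and the noiseless comparison standing in for the flip-flop decision are all assumptions made for illustration.

```python
def sar_tdc(t_in_ps: float, n_bits: int = 5, lsb_ps: float = 100.0) -> int:
    """Quantize a time interval with an idealized successive-approximation search.

    lsb_ps is a hypothetical resolution chosen for this sketch,
    not a value reported in the paper.
    """
    code = 0
    for bit in reversed(range(n_bits)):  # resolve bits from MSB to LSB
        trial = code | (1 << bit)        # tentatively set the current bit
        # Idealized flip-flop decision: does the input edge arrive at or
        # after a reference edge delayed by trial * lsb_ps?
        if t_in_ps >= trial * lsb_ps:
            code = trial                 # keep the bit, otherwise leave it cleared
    return code


if __name__ == "__main__":
    for t in (0.0, 250.0, 1650.0, 3100.0, 10000.0):
        print(f"{t:8.1f} ps -> code {sar_tdc(t)}")
```

Each of the five iterations halves the remaining search interval, so the conversion takes as many decisions as there are output bits, and inputs beyond the full-scale range saturate at the maximum code.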

Key words

magnetic random access memory / computing-in-memory / time-domain computing / successive approximation register time-to-digital converter

Cite this article

TAN Jiahui, SU Jiongzhe, ZHOU Rong, et al. Research on time-domain in-memory computing method for spin-transfer torque magnetic random access memory[J]. Integrated Circuits and Embedded Systems, 2025, 25(8): 53-63. https://doi.org/10.20193/j.ices2097-4191.2025.0045
CLC number: TP872 (remote control and signaling; remote control and signaling systems)

