Computing-in-memory (CIM) based on spin-transfer torque magnetic random access memory (STT-MRAM) is a promising way to overcome the "memory wall" bottleneck. This paper proposes an energy-efficient time-domain CIM design for STT-MRAM. A custom series-connected memory cell, built from series transistors and complementary magnetic tunnel junctions (MTJs), forms a magnetoresistance chain spanning multiple rows of cells in computing mode; a time-domain conversion circuit then converts the chain resistance into a pulse delay signal. A complementary series array architecture stores positive and negative weights in separate columns and generates differential time signals, which supports signed computation. For quantization, a successive approximation register (SAR) time-to-digital converter (TDC) is proposed that combines a voltage-adjustable delay chain with a flip-flop. To support multi-bit multiply-accumulate (MAC) operations, a signed weight encoding scheme and a digital post-processing architecture are proposed: through encoded weight mapping and digital shift-and-accumulate, the MAC of 8-bit inputs and 8-bit weights is decomposed into a time-domain computation on the lower 5 bits and a digital-domain computation on the higher bits, producing a 21-bit full-precision result. Layout design and post-layout simulation were completed in a 28 nm CMOS process. At 0.9 V, the design performs 9-bit MAC operations with a resolution margin of 270 ps at an energy of only 16 fJ per operation, and the 5-bit SAR-TDC converts time to digital codes with high linearity. A 9 Kb time-domain CIM macro with an area of 0.026 mm² was designed, comprising the memory cell array, SAR-TDC modules, computing circuits, and read/write control circuits. The macro achieves energy efficiencies of 26.4 TOPS/W and 42.8 TOPS/W for convolutional-layer and fully connected-layer computation, respectively, and an area efficiency of 0.523 TOPS/mm² at 8-bit precision.
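As a behavioral illustration of two ideas named in the abstract, the following Python sketch models (i) a 5-bit successive-approximation search over a time interval and (ii) a signed MAC computed from separately stored positive and negative weights with digital shift-and-accumulate. It is a minimal sketch under stated assumptions, not the circuit or the weight encoding proposed here: the function names, the 100 ps LSB, the unsigned 8-bit inputs, and the 32 products per accumulation are all hypothetical, and the generic bit-serial decomposition shown stands in for the 5-bit time-domain / high-bit digital split, whose details the abstract does not give.

```python
def sar_tdc(t_in_ps, lsb_ps, bits=5):
    """Quantize a time interval by binary search (behavioral model only).

    lsb_ps is a hypothetical step of the reference delay chain.
    """
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        # Model of the flip-flop decision: keep the trial bit if the input
        # interval is at least as long as the reference delay for `trial`.
        if t_in_ps >= trial * lsb_ps:
            code = trial
    return code


def mac_shift_accumulate(inputs, weights, in_bits=8):
    """Signed MAC via positive/negative weight columns and shift-accumulate.

    Assumes unsigned `in_bits`-bit inputs and signed integer weights.
    """
    pos = [max(w, 0) for w in weights]   # column holding positive weights
    neg = [max(-w, 0) for w in weights]  # column holding negative magnitudes
    acc = 0
    for b in range(in_bits):
        plane = [(x >> b) & 1 for x in inputs]  # one input bit per row
        p = sum(bit * w for bit, w in zip(plane, pos))
        n = sum(bit * w for bit, w in zip(plane, neg))
        acc += (p - n) << b  # differential partial sum, weighted by 2**b
    return acc


def fits_signed(value, bits):
    """True if `value` is representable in `bits`-bit two's complement."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


if __name__ == "__main__":
    xs = [255] * 32   # hypothetical worst-case unsigned 8-bit inputs
    ws = [-128] * 32  # hypothetical worst-case signed 8-bit weights
    result = mac_shift_accumulate(xs, ws)
    print(result, fits_signed(result, 21))
    print(sar_tdc(1550, lsb_ps=100))
```

Under these assumptions, the worst-case sum of 32 products of an unsigned 8-bit input and a signed 8-bit weight fits in 21-bit two's complement but not in 20 bits, which is consistent with, though not confirmed as the origin of, the 21-bit output width stated above.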