Special Issue on Emerging Computing Chip Design
TAN Jiahui, SU Jiongzhe, ZHOU Rong, ZHANG Chunzheng, CAI Hao
Computing-in-Memory (CIM) based on Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising way to overcome the "memory wall" bottleneck. This paper proposes an energy-efficient time-domain CIM design for STT-MRAM. A custom series-connected memory cell, built from series-connected transistors and complementary Magnetic Tunnel Junctions (MTJs), links the cells of multiple rows into a magnetoresistance chain in compute mode, and a time-domain conversion circuit converts the chain resistance into a pulse delay. A complementary series array architecture stores positive and negative weights separately and generates differential time signals, enabling signed-number computation. For quantization, a Successive Approximation Register (SAR) Time-to-Digital Converter (TDC) is proposed that combines a voltage-adjustable delay chain with a flip-flop. To support multi-bit multiply-accumulate (MAC) operations, a signed weight encoding scheme and a digital post-processing architecture are proposed. Through encoded weight mapping and a digital shift-and-accumulate algorithm, an 8-bit-input, 8-bit-weight MAC operation is decomposed into a low 5-bit time-domain computation and a high-bit digital-domain computation, producing a 21-bit full-precision result. Layout design and post-layout simulation were completed in a 28 nm CMOS process. At a 0.9 V supply, a 9-bit MAC operation was achieved with a 270 ps resolution margin and an energy consumption of only 16 fJ per operation, and the 5-bit SAR-TDC performs highly linear time-to-digital conversion. A 9 Kb time-domain CIM macrocell occupying 0.026 mm² was designed, comprising the memory cell array, the SAR-TDC module, the computing circuits, and the read/write control circuits.
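The signal path described above (series resistance chain, differential delay between the positive and negative weight arrays, SAR quantization) can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the authors' circuit: the resistance values, the linear resistance-to-delay slope, the mapping of a 1-bit product to a high-resistance row, and all names (`chain_delay_ps`, `differential_delay_ps`, `sar_tdc`, `signed_tdc`) are hypothetical.

```python
# Behavioral sketch (assumed, not the paper's circuit): a series MTJ chain sets a
# pulse delay, positive/negative weight chains give a differential delay, and a
# 5-bit SAR-TDC quantizes it. All constants are hypothetical illustration values.

R_LOW_KOHM = 3      # assumed resistance of a row whose product is 0
R_HIGH_KOHM = 6     # assumed resistance of a row whose product is 1
PS_PER_KOHM = 10    # assumed linear delay slope of the time-domain converter


def chain_delay_ps(x_bits, w_bits):
    """Pulse delay of one series chain: a row adds R_HIGH when input AND weight
    are both 1, otherwise R_LOW; delay is taken as linear in total resistance."""
    total_kohm = sum(R_HIGH_KOHM if (x & w) else R_LOW_KOHM
                     for x, w in zip(x_bits, w_bits))
    return PS_PER_KOHM * total_kohm


def differential_delay_ps(x_bits, w_pos, w_neg):
    """Positive and negative weights sit in separate chains with the same row
    count, so the R_LOW baseline cancels and the difference tracks the signed sum."""
    return chain_delay_ps(x_bits, w_pos) - chain_delay_ps(x_bits, w_neg)


def sar_tdc(delta_ps, lsb_ps, bits=5):
    """Successive approximation, MSB first. Each step compares the interval with
    a reference delay of trial * lsb_ps (in hardware: a tunable delay line plus a
    flip-flop acting as arbiter) and keeps the bit if the interval is longer."""
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        if delta_ps >= trial * lsb_ps:
            code = trial
    return code  # saturates at 2**bits - 1


def signed_tdc(delta_ps, lsb_ps, bits=5):
    """Sign from which edge arrives first, magnitude from the SAR search."""
    magnitude = sar_tdc(abs(delta_ps), lsb_ps, bits)
    return -magnitude if delta_ps < 0 else magnitude


x_bits = [1, 1, 0, 1, 1, 0, 1, 0, 1]
w_pos = [1, 0, 1, 1, 0, 0, 1, 0, 1]
w_neg = [0, 1, 0, 0, 1, 1, 0, 1, 0]
delta = differential_delay_ps(x_bits, w_pos, w_neg)
lsb = (R_HIGH_KOHM - R_LOW_KOHM) * PS_PER_KOHM  # one LSB per unit of signed sum
print(delta, signed_tdc(delta, lsb))  # 60 2
```

Because both chains have the same number of rows, the common baseline delay cancels in the difference. In this model the `lsb_ps` argument stands in for the step of the voltage-adjustable delay chain: changing it rescales the time-to-code mapping without changing the search.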
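The digital shift-and-accumulate step can be sketched in the same spirit. This swaps in a generic bit-plane decomposition, which may differ from the paper's encoded weight mapping: each (input bit, weight bit) plane yields a row count that a time-domain chain and a 5-bit TDC could report exactly if there are at most 31 rows (an assumption), and the digital side multiplies each count by its two's-complement bit significance and sums. The names `plane_count`, `shift_accumulate_mac`, and `MAX_ROWS` are hypothetical.

```python
# Hypothetical sketch of digital shift-and-accumulate for a signed 8-bit x 8-bit MAC.
# Assumption: each (input bit i, weight bit j) plane produces a small row count that
# a time-domain chain plus a 5-bit TDC could report exactly (rows <= 31).

BITS = 8
MAX_ROWS = 2**5 - 1  # assumed limit so each plane count fits a 5-bit TDC code


def bit(value, i):
    """Bit i of the 8-bit two's-complement encoding of value."""
    return ((value & 0xFF) >> i) & 1


def significance(i):
    """Two's-complement weight of bit i: the MSB carries -2**7."""
    return -(1 << i) if i == BITS - 1 else (1 << i)


def plane_count(xs, ws, i, j):
    """Stand-in for the time-domain result: rows where input bit i and weight bit j are both 1."""
    return sum(bit(x, i) & bit(w, j) for x, w in zip(xs, ws))


def shift_accumulate_mac(xs, ws):
    """Digital post-processing: scale every plane count by 2**(i+j), with the sign
    flipped for MSB planes, and accumulate into one full-precision integer."""
    assert len(xs) == len(ws) <= MAX_ROWS
    acc = 0
    for i in range(BITS):
        for j in range(BITS):
            acc += significance(i) * significance(j) * plane_count(xs, ws, i, j)
    return acc


xs = [-128, 127, -3, 45, 0, 99, -77, 12]
ws = [-128, -128, 64, -1, 127, -50, 33, 7]
print(shift_accumulate_mac(xs, ws), sum(x * w for x, w in zip(xs, ws)))  # -7516 -7516
```

The result matches the direct dot product because each signed operand equals the sum of its bits times their two's-complement significance, so the product expands into the 64 plane terms. Under the 31-row assumption, the largest possible magnitude is 31 × 16384 = 507,904, which fits in a 21-bit signed word.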
With 8-bit precision, the macrocell achieves energy efficiencies of 26.4 TOPS/W on convolutional layers and 42.8 TOPS/W on fully connected layers, together with an area efficiency of 0.523 TOPS/mm².
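The efficiency figures can be converted into more tangible units with simple arithmetic. Since 1 TOPS/W equals 10^12 operations per joule, the energy per operation in femtojoules is 1000 divided by the TOPS/W value, and multiplying area efficiency by the macrocell area gives its throughput. The helper names below are illustrative only.

```python
# Plain unit conversions for the reported figures; no circuit model involved.

def fj_per_op(tops_per_watt):
    """1 TOPS/W = 1e12 ops/J, so energy per op = 1e15 fJ / (tops_per_watt * 1e12)."""
    return 1e3 / tops_per_watt


def throughput_gops(tops_per_mm2, area_mm2):
    """Area efficiency times area gives TOPS; scale by 1e3 to get GOPS."""
    return tops_per_mm2 * area_mm2 * 1e3


for layer, eff in (("convolutional", 26.4), ("fully connected", 42.8)):
    print(f"{layer}: {fj_per_op(eff):.1f} fJ/op")
print(f"macrocell throughput: {throughput_gops(0.523, 0.026):.1f} GOPS")
```

This gives about 37.9 fJ/op and 23.4 fJ/op for the two layer types and about 13.6 GOPS for the macrocell. These values are not directly comparable to the 16 fJ per-operation figure, because works differ in how they count an operation (for example, one MAC is often counted as two operations).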