A review on ROM-SRAM hybrid compute-in-memory architecture

DU Xirui, YIN Guodong, CHEN Yiming, CHEONG Ling-An, YU Tianyi, YANG Huazhong, LI Xueqing

Integrated Circuits and Embedded Systems ›› 2025, Vol. 25 ›› Issue (8) : 10-22.

DOI: 10.20193/j.ices2097-4191.2025.0041
Special Issue of Emerging Computing Chip Design


Abstract

Neural networks are representative artificial-intelligence algorithms, but their large parameter counts pose new challenges for hardware deployment at the edge. On the one hand, application flexibility requires that the computing hardware be able to transfer a deployed model between tasks through parameter fine-tuning at the edge. On the other hand, improving energy efficiency and performance requires large-capacity on-chip storage to reduce the cost of off-chip memory access. The recently proposed ROM-SRAM hybrid compute-in-memory architecture is a promising solution in mature CMOS technology. Because ROM-based compute-in-memory is dense, most of the network's weights can be stored on chip, which reduces reliance on off-chip memory access. Meanwhile, SRAM-based compute-in-memory adds flexibility to edge compute-in-memory built on high-density ROM. To expand the design and application space of the ROM-SRAM hybrid compute-in-memory architecture, the density of ROM-based compute-in-memory must be improved further to support larger networks, and ways must be explored to obtain greater flexibility from a small amount of SRAM-based compute-in-memory. This paper introduces several common techniques for improving the memory density of ROM-based compute-in-memory, as well as neural network fine-tuning methods that improve flexibility on the ROM-SRAM hybrid compute-in-memory architecture. It also discusses solutions for deploying ultra-large-scale neural networks and for the dynamic matrix multiplication bottleneck in large language models with long sequences, and gives an outlook on the broad design space and application prospects of the ROM-SRAM hybrid compute-in-memory architecture.

Keywords

artificial intelligence / neural network accelerator / computing-in-memory / read-only memory / integrated circuit

Cite this article

DU Xirui, YIN Guodong, CHEN Yiming, et al. A review on ROM-SRAM hybrid compute-in-memory architecture[J]. Integrated Circuits and Embedded Systems, 2025, 25(8): 10-22. https://doi.org/10.20193/j.ices2097-4191.2025.0041
