A review on ROM-SRAM hybrid compute-in-memory architecture
杜禧瑞, 尹国栋, 陈一鸣, 曾令安, 于天熠, 杨华中, 李学清
Integrated Circuits and Embedded Systems (集成电路与嵌入式系统), 2025, Vol. 25, Issue 8: 10-22.
Neural networks are representative algorithms of artificial intelligence, but their large parameter counts pose new challenges for hardware deployment at the edge. On the one hand, application flexibility requires that the computing hardware be able to transfer a deployed network between tasks by fine-tuning model parameters. On the other hand, energy efficiency and performance call for large-capacity on-chip storage to reduce the cost of off-chip memory access. The recently proposed ROM-SRAM hybrid compute-in-memory architecture is a promising solution in mature CMOS technology. Thanks to high-density ROM-based compute-in-memory, most of the network's weights can be stored on chip without relying on off-chip memory access, while SRAM-based compute-in-memory supplies the flexibility that a high-density ROM-based edge design otherwise lacks. To expand the design and application space of the architecture, the density of ROM-based compute-in-memory must be raised further to support larger networks, and ways of obtaining greater flexibility from a small amount of SRAM-based compute-in-memory must be explored. This paper introduces several common techniques for improving the density of ROM-based compute-in-memory, together with neural network fine-tuning methods built on the ROM-SRAM hybrid architecture to improve flexibility. It also discusses deployment schemes for ultra-large-scale neural networks and solutions to the dynamic matrix multiplication bottleneck encountered in long-sequence large language models, and gives an outlook on the design space and application prospects of the architecture.
artificial intelligence / neural network accelerator / compute-in-memory / read-only memory / integrated circuit
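
The division of labor described in the abstract, with most weights fixed in ROM and a small trainable part held in SRAM, can be illustrated with a minimal software sketch. This is not the authors' circuit or fine-tuning method: the low-rank adapter form, the class name `HybridLayer`, and the fields `rom`, `down`, and `up` are all assumptions chosen only to show how a frozen bulk matrix and a small writable correction might combine in one layer.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Plain matrix-vector product, standing in for one in-memory MAC pass."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class HybridLayer:
    """Hypothetical layer: y = W_rom x + U (D x).

    W_rom models weights fixed at fabrication (ROM, read-only).
    D and U model a small writable adapter (SRAM) that can be fine-tuned.
    The low-rank adapter form is an illustrative assumption.
    """

    def __init__(self, rom_weights: Matrix, sram_down: Matrix, sram_up: Matrix):
        # Tuples make the ROM part immutable, mirroring read-only storage.
        self.rom = tuple(tuple(row) for row in rom_weights)
        # Lists keep the SRAM part writable for task-specific updates.
        self.down = [list(row) for row in sram_down]
        self.up = [list(row) for row in sram_up]

    def forward(self, x: Sequence[float]) -> List[float]:
        base = matvec(self.rom, x)                       # bulk of the compute
        delta = matvec(self.up, matvec(self.down, x))    # small correction
        return [b + d for b, d in zip(base, delta)]

    def sram_fraction(self) -> float:
        """Share of all stored parameters that live in the writable part."""
        rom = sum(len(row) for row in self.rom)
        sram = sum(len(row) for row in self.down) + sum(len(row) for row in self.up)
        return sram / (rom + sram)


# 4x4 frozen identity weights plus a rank-1 writable adapter.
layer = HybridLayer(
    rom_weights=[[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]],
    sram_down=[[0.5, 0.0, 0.0, 0.0]],
    sram_up=[[1.0], [0.0], [0.0], [2.0]],
)
print(layer.forward([1, 2, 3, 4]))  # [1.5, 2.0, 3.0, 5.0]
print(layer.sram_fraction())
```

In this toy case the writable part holds 8 of 24 parameters; for realistic layer sizes a low-rank adapter would be a far smaller share, which is the regime the abstract has in mind when it speaks of gaining flexibility from a small amount of SRAM.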