A review on ROM-SRAM hybrid compute-in-memory architecture
DU Xirui, YIN Guodong, CHEN Yiming, CHEONG Ling-An, YU Tianyi, YANG Huazhong, LI Xueqing
Integrated Circuits and Embedded Systems ›› 2025, Vol. 25 ›› Issue (8) : 10-22.
Neural networks are representative algorithms of artificial intelligence, but their large parameter counts pose new challenges for hardware deployment at the edge. On the one hand, application flexibility requires that computing hardware be able to adapt a deployed model to new tasks through parameter fine-tuning at the edge. On the other hand, improving energy efficiency and performance requires large-capacity on-chip storage to reduce the cost of off-chip memory access. The recently proposed ROM-SRAM hybrid compute-in-memory architecture is a promising solution in mature CMOS technology. Thanks to high-density ROM-based compute-in-memory, most of the weights of a neural network can be stored on chip, reducing reliance on off-chip memory access. Meanwhile, SRAM-based compute-in-memory adds the flexibility needed at the edge on top of the high-density ROM-based compute-in-memory. Expanding the design and application space of the ROM-SRAM hybrid compute-in-memory architecture calls for two things: further improving the density of ROM-based compute-in-memory to support larger networks, and exploring ways to obtain greater flexibility from a small amount of SRAM-based compute-in-memory. This paper reviews several common techniques for improving the memory density of ROM-based compute-in-memory, as well as neural network fine-tuning methods that build on the ROM-SRAM hybrid compute-in-memory architecture to improve flexibility. It also discusses solutions for deploying ultra-large-scale neural networks and for the dynamic matrix multiplication bottleneck in large language models with long sequences, and gives an outlook on the broad design space and application prospects of the ROM-SRAM hybrid compute-in-memory architecture.
artificial intelligence / neural network accelerator / computing-in-memory / read-only memory / integrated circuit