A Review of Chiplet Functional Partitioning Methods and Interconnect Architectures

CHEN Long, HUANG Letian

Integrated Circuits and Embedded Systems, 2024, Vol. 24, Issue 2: 41-49. DOI: 10.20193/j.ices2097-4191.2024.02.005

Special column on chiplet research


Chiplet functional partitioning and interconnection review


Abstract

Chip design currently faces the challenge of the "area wall", which drives up tape-out costs in chip manufacturing. Chiplet technology makes it possible to fabricate smaller dies on mature process nodes and then combine them through advanced packaging, which breaks through the area wall, enables agile chip design, and reduces design cost. A core question in applying chiplet technology is what chiplet granularity can satisfy the flexibility that chip design requires. The way functions are partitioned among chiplets also shapes the inter-chiplet interconnect structure, and interconnecting the functional chiplets is key to realizing the function of the final chip. This paper therefore reviews recent research, in China and abroad, on chiplet functional partitioning, on exploration of the chiplet design space, and on the influence of functional partitioning on the inter-chiplet interconnection network. It also identifies chiplet design methodology as an important research direction for the future development of chiplet technology.

Keywords

Chiplet / chiplet functional granularity / inter-chiplet interconnect / AMD / SiP

Cite this article

CHEN Long, HUANG Letian. Chiplet functional partitioning and interconnection review[J]. Integrated Circuits and Embedded Systems, 2024, 24(2): 41-49. https://doi.org/10.20193/j.ices2097-4191.2024.02.005

Chinese Library Classification number: TN47 (large-scale and very-large-scale integrated circuits)


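Funding

Research on a highly scalable multi-chiplet heterogeneously integrated compute-in-memory accelerator architecture (92373111)

Editor: 薛士然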