Ma, Chengyu; Li, Suolan; Liu, Yinuo; Zhao, Wenzhe; Ren, Pengju; Xia, Tian
Accepted: 2025-11-20
To address the performance bottleneck of Sparse Matrix-Vector Multiplication (SpMV) on GPU platforms, this paper proposes an optimization algorithm based on row re-segmentation, together with an accompanying performance evaluation model. The method first establishes a quantitative mapping between matrix row lengths and computational resource allocation. Using a dynamic threshold, the original matrix is partitioned into long-row and short-row submatrices, which are then computed with thread-level and thread-block-level parallel strategies, respectively. This approach effectively alleviates the inherent conflict between the GPU's SIMT execution model and the irregular data distribution of sparse matrices. To quantify the additional overhead introduced during preprocessing, performance penalty models for atomic conflicts and padding are developed, converting the extra memory accesses and computation into computable cost functions. Building on these models, a parameter-space search algorithm is constructed that rapidly identifies optimal preprocessing parameters within predefined parameter sets by leveraging pre-acquired hardware performance metrics and the distribution of the matrix's non-zero elements. Experimental results demonstrate that the proposed algorithm outperforms the standard GPU sparse computation library cuSPARSE across multiple benchmark sparse matrix datasets, achieving speedups of up to 1.26× and 1.17× in specific scenarios. Furthermore, the parameter search incurs low overhead, and the method exhibits strong generalizability, adapting to diverse input matrices and GPU hardware architectures.
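The threshold-based row re-segmentation described above can be sketched as follows. This is a minimal illustration assuming a CSR representation; the function name, interface, and the fixed example threshold are hypothetical and stand in for the paper's dynamic-threshold selection, which is driven by the cost models and parameter search.

```python
def split_rows_by_threshold(indptr, threshold):
    """Partition CSR row indices into long-row and short-row groups.

    Rows with more than `threshold` non-zeros form the long-row
    submatrix; the rest form the short-row submatrix. The two groups
    are then dispatched to different parallel strategies on the GPU.
    `threshold` is a placeholder for the dynamically selected value.
    """
    long_rows, short_rows = [], []
    for row in range(len(indptr) - 1):
        nnz = indptr[row + 1] - indptr[row]  # non-zeros in this row
        (long_rows if nnz > threshold else short_rows).append(row)
    return long_rows, short_rows

# Example CSR indptr for a 4-row matrix with row lengths 1, 5, 2, 7
indptr = [0, 1, 6, 8, 15]
print(split_rows_by_threshold(indptr, 4))  # → ([1, 3], [0, 2])
```

In practice the two index lists would be used to build the two submatrices once during preprocessing, so the per-iteration SpMV kernels see regular workloads within each group.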