Just Accepted

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • Ye, Anlong, Ma, Lingkun, Qu, Zongyi
    Accepted: 2025-03-17
    The Least Mean Square (LMS) algorithm, a typical adaptive filtering algorithm, is widely used for noise suppression. It is usually implemented on general-purpose processors, which offer limited computational efficiency and performance. The RISC-V architecture is open-source, streamlined, and scalable, which makes it well suited to dedicated processors. This paper designs a RISC-V-based processor specialized for the LMS algorithm. The RISC-V F (single-precision floating-point) instruction-set extension handles floating-point arithmetic, and MAC (multiply-accumulate) instructions are added to a coprocessor to accelerate the LMS algorithm. Experiments show that the processor achieves effective noise cancellation: for an input signal-to-noise ratio (SNR) of 5 dB, the SNR after noise cancellation is 17.5 dB. When the LMS algorithm runs on the FPU (Floating-Point Unit) alone, it executes 220354 instructions in 586221 cycles; in the proposed FPU+MAC mode, it executes 31621 instructions in 89412 cycles, about 6.6 times fewer cycles, a significant efficiency gain.
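    The LMS update at the core of this design can be sketched in a few lines. The Python/NumPy sketch below is a generic adaptive noise canceller, not the authors' RISC-V implementation: `lms_filter`, the tap count, the step size `mu`, and the synthetic signals are illustrative assumptions. The dot product in each iteration is the multiply-accumulate loop that dedicated MAC instructions are meant to speed up.

```python
import numpy as np


def lms_filter(x, d, num_taps, mu):
    """Adaptive FIR noise canceller using the LMS rule w <- w + 2*mu*e*x.

    x: reference noise input, d: primary input (signal + correlated noise).
    Returns the filter output y, the error e (the cleaned signal), and weights w.
    """
    w = np.zeros(num_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        window = x[n - num_taps + 1:n + 1][::-1]  # x[n], x[n-1], ..., x[n-L+1]
        y[n] = np.dot(w, window)                  # multiply-accumulate over all taps
        e[n] = d[n] - y[n]                        # error = estimate of the clean signal
        w += 2 * mu * e[n] * window               # LMS weight update
    return y, e, w


def snr_db(signal, noise):
    return 10 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(20000)
    s = np.sin(2 * np.pi * 0.01 * n)                    # hypothetical clean signal
    v = rng.normal(0.0, 1.0, n.size)                    # reference noise
    noise = np.convolve(v, [0.8, -0.4, 0.2])[:n.size]   # noise after an assumed path
    _, e, w = lms_filter(v, s + noise, num_taps=8, mu=0.002)
    half = n.size // 2                                  # measure after convergence
    print("input SNR  %.1f dB" % snr_db(s, noise))
    print("output SNR %.1f dB" % snr_db(s[half:], e[half:] - s[half:]))
```

    The step size trades convergence speed against residual noise: a larger `mu` adapts faster but leaves more weight jitter in the output.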
  • Accepted: 2025-03-10
    Caches cannot anticipate non-local program execution or prepare in advance for critical tasks. To address this, this paper proposes a high-security, configurable level-1 instruction cache. Internal control registers allow the memory to be flexibly configured as SRAM or cache. Storage protection at two granularities, page level and cache-line level, secures data access for users at different levels. Direct memory access (DMA) to the SRAM provides fast data exchange with external storage. A Universal Verification Methodology (UVM) platform is built for module-level verification of the configurable instruction cache and for coverage collection. For system-level verification, different library functions are invoked and the cache hit rates under different L1P size configurations are compared. Post-simulation of latency and power consumption is performed with a 40 nm low-threshold-voltage library. The results show that the cache can switch safely and quickly among five L1P configurations (32 KB, 16 KB, 8 KB, 4 KB, and 0 KB) during program execution, with a maximum path delay of 1.47 ns (within the 1.67 ns clock period at 600 MHz) and a total power consumption of 309.97 mW, meeting the requirements for stable operation of a 600 MHz high-performance DSP.
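    The SRAM/cache split and the hit-rate comparison across L1P sizes can be illustrated with a small software model. This is a hypothetical sketch, not the paper's RTL: it assumes a 32 KB L1P array, a direct-mapped organization, 32-byte lines, and that capacity not used as cache is available as SRAM; the protection checks and DMA path are omitted.

```python
LINE_BYTES = 32                      # assumed cache line size
L1P_TOTAL_KB = 32                    # assumed total L1P capacity
VALID_CACHE_KB = (32, 16, 8, 4, 0)   # the five configurations named in the abstract


class ConfigurableICache:
    """Direct-mapped instruction cache whose size is set by a control register."""

    def __init__(self, cache_kb):
        self.reconfigure(cache_kb)

    def reconfigure(self, cache_kb):
        if cache_kb not in VALID_CACHE_KB:
            raise ValueError(f"unsupported L1P cache size: {cache_kb} KB")
        self.cache_kb = cache_kb
        self.sram_kb = L1P_TOTAL_KB - cache_kb       # remainder used as addressable SRAM
        self.num_lines = cache_kb * 1024 // LINE_BYTES
        self.tags = [None] * self.num_lines          # switching sizes invalidates every line
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        """Return True on a hit; on a miss, fill the line (if any cache is enabled)."""
        if self.num_lines == 0:                      # 0 KB: cache disabled, all fetches miss
            self.misses += 1
            return False
        line = addr // LINE_BYTES
        index = line % self.num_lines
        tag = line // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag
        self.misses += 1
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def hit_rates(trace):
    """Hit rate of the same fetch trace under each L1P configuration."""
    rates = {}
    for kb in VALID_CACHE_KB:
        cache = ConfigurableICache(kb)
        for addr in trace:
            cache.fetch(addr)
        rates[kb] = cache.hit_rate()
    return rates


# Hypothetical trace: a 12 KB loop body executed twice with 4-byte instructions.
trace = [a for _ in range(2) for a in range(0, 12 * 1024, 4)]
print(hit_rates(trace))
```

    In this model a loop larger than the configured cache thrashes, which is why the 8 KB configuration hits less often on this trace than the 16 KB one.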
  • Accepted: 2025-03-07
    As data volumes grow rapidly, data transmission technology faces higher demands for speed, stability, and reliability. Over long distances, a traditional single-channel LVDS transmission system typically reaches only a few hundred Mbps, far short of high-speed transmission requirements. This paper therefore proposes a combined optimization scheme. In hardware, two four-channel LVDS chips serve as the high-speed data interface, providing eight-channel LVDS transmission and raising the data rate. As the rate increases, however, crosstalk and electromagnetic interference make bit errors more likely, so dedicated LVDS data drivers and a cable equalizer are added to provide interference immunity and long-distance transmission capability. In the logic, Reed-Solomon (RS) encoding and decoding corrects errors within a certain range, and automatic retransmission with CRC verification further improves reliability. Repeated tests show that the design achieves error-free transmission at 4000 Mb/s over 80 m of shielded twisted-pair cable.
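    Reliability in this design comes from layering error detection and retransmission on top of RS correction. The sketch below shows only the detect-and-retransmit layer: a CRC-16/CCITT-FALSE check (an assumed choice, since the abstract does not name the CRC polynomial) and a stop-and-wait retransmission loop driven by a hypothetical `channel` callable; the RS codec is omitted. For scale, if the 4000 Mb/s is split evenly over the eight LVDS lanes, each lane carries 500 Mb/s.

```python
def crc16_ccitt(data, crc=0xFFFF):
    """CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, MSB first."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def make_frame(payload):
    """Append the CRC (big-endian) so the receiver can check the frame."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def frame_ok(frame):
    """Receiver-side check; a passing frame is acknowledged."""
    return len(frame) >= 2 and crc16_ccitt(frame[:-2]) == int.from_bytes(frame[-2:], "big")


def send_with_arq(payload, channel, max_tries=4):
    """Stop-and-wait ARQ: resend until the receiver's CRC check passes.

    channel is a hypothetical callable that returns the frame as received.
    Returns the number of attempts that were needed.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_tries + 1):
        if frame_ok(channel(frame)):
            return attempt
    raise RuntimeError("retransmission limit reached")
```

    A CRC only detects errors, so in a scheme like the one described it complements rather than replaces the RS code: RS repairs what it can, and the CRC triggers a resend for what slips through.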
  • Accepted: 2025-03-07
  • Sang, Xianzhen, Li, Min, Cheng, Hu, Wei, Jinghe, Zhao, Wei, Wang, Zhengxing
    Accepted: 2025-03-04
    When convolutional networks are deployed on in-memory computing circuits, computing the analog-to-digital conversion (ADC) coefficients with statistical methods degrades network performance. To address this, a quantization method for convolutional networks based on in-memory computing is designed. The method first quantizes the activations and weights of the convolutional layers. Then, based on the characteristics of the single-Tile data flow in the in-memory computing circuit, an ADC coefficient quantization network is designed, and a KL-divergence-based method is used to compute the ADC coefficients. Finally, the ADC coefficients are mapped to conductance values, fused with the activation and weight quantization coefficients of the convolutional layer, and converted into shift and fixed-point multiplication form, enabling inference of the convolutional network on the in-memory computing circuit. Software simulation experiments show that, compared with other methods for computing ADC coefficients, the proposed quantization method causes less performance degradation and is suitable for mixed multi-bit-width quantization of convolutional networks. Because the software simulation fully models the data flow of the in-memory computing circuit, the method can be applied in engineering practice on in-memory computing circuits.
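    Two steps of the method can be sketched generically: choosing a clipping threshold by minimizing the KL divergence between a value histogram and its quantized version (the entropy-calibration idea used by inference toolkits such as TensorRT), and rewriting a floating-point scale as an integer multiplier plus a right shift. The sketch below is not the authors' ADC-coefficient network; the bin count, number of levels, 15-bit multiplier width, and all function names are assumptions.

```python
import math

import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) after normalizing both histograms; empty q bins get a tiny floor."""
    p = p / p.sum()
    q = np.where(q > 0, q, eps)
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def kl_threshold(samples, num_bins=1024, num_levels=128):
    """Clipping threshold whose num_levels-bin quantization best preserves the histogram."""
    mags = np.abs(np.asarray(samples, dtype=float))
    hist, edges = np.histogram(mags, bins=num_bins, range=(0.0, mags.max()))
    hist = hist.astype(float)
    best_div, best_i = math.inf, num_bins
    for i in range(num_levels, num_bins + 1):
        p = hist[:i].copy()
        p[-1] += hist[i:].sum()               # fold clipped outliers into the last bin
        q = np.zeros(i)
        start = 0
        for chunk in np.array_split(hist[:i], num_levels):
            stop = start + len(chunk)
            nonzero = chunk > 0
            if nonzero.any():                 # spread each merged bin over its nonzero members
                q[start:stop][nonzero] = chunk.sum() / nonzero.sum()
            start = stop
        div = kl_divergence(p, q)
        if div < best_div:
            best_div, best_i = div, i
    return float(edges[best_i])


def to_multiplier_shift(scale, bits=15):
    """Approximate a positive scale as multiplier / 2**shift, multiplier < 2**bits."""
    mantissa, exponent = math.frexp(scale)    # scale = mantissa * 2**exponent, 0.5 <= mantissa < 1
    multiplier = round(mantissa * (1 << bits))
    shift = bits - exponent
    if multiplier == 1 << bits:               # rounding overflowed the mantissa
        multiplier //= 2
        shift -= 1
    return multiplier, shift


def requantize(acc, multiplier, shift):
    """Integer-only rescale, round(acc * multiplier / 2**shift); assumes shift >= 1."""
    return (acc * multiplier + (1 << (shift - 1))) >> shift
```

    The threshold search trades clipping error (outliers folded into the last bin) against rounding error (coarser bins), and the multiplier/shift pair lets the fused scale be applied with one integer multiply and one shift.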
  • Accepted: 2025-03-03
    With the development of intelligent transportation systems, license plate recognition has moved from traditional PC platforms to portable embedded terminals, placing higher demands on the accuracy, speed, and security of existing systems. RISC-V is an open-source, streamlined, efficient, low-power, and modular instruction set architecture that offers a high degree of flexibility. This paper designs a license plate recognition system based on the Hummingbird E203 RISC-V processor that uses an improved eight-direction Sobel operator for high-precision edge detection. The system is implemented on the Da Vinci PRO development board. Experimental results show a recognition accuracy of 96% and an average recognition time of about 45 ms, demonstrating high accuracy and real-time performance. Compared with traditional license plate recognition systems, the system is more cost-effective.
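    The abstract does not specify the improved operator, so the sketch below shows only the generic idea of an eight-direction Sobel detector: the outer ring of the 3×3 Sobel kernel is rotated in 45-degree steps and the strongest of the eight responses is kept at each pixel. The rotation scheme and function names are assumptions, not the authors' design.

```python
import numpy as np

# Outer ring of the 3x3 window, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
# Ring values of the standard horizontal-gradient Sobel kernel in that order.
SOBEL_RING = [-1, 0, 1, 2, 1, 0, -1, -2]


def direction_kernels():
    """Eight 3x3 kernels; each shift of the ring rotates the kernel by 45 degrees."""
    kernels = []
    for step in range(8):
        k = np.zeros((3, 3))
        rotated = SOBEL_RING[-step:] + SOBEL_RING[:-step] if step else SOBEL_RING
        for pos, val in zip(RING, rotated):
            k[pos] = val
        kernels.append(k)
    return kernels


def eight_direction_sobel(img):
    """Maximum directional response over the valid (h-2) x (w-2) region."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    responses = []
    for k in direction_kernels():
        r = np.zeros((h - 2, w - 2))
        for dy in range(3):
            for dx in range(3):
                r += k[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
        responses.append(r)
    return np.max(responses, axis=0)


# A vertical step edge: the response peaks on the two columns straddling it.
step_img = np.zeros((5, 6))
step_img[:, 3:] = 10
print(eight_direction_sobel(step_img))
```

    Because the eight kernels come in sign-reversed pairs, taking the maximum is equivalent to taking the largest absolute response of four orientations.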
  • Accepted: 2025-02-28
    To achieve highly parallel computation, hardware accelerators for fully homomorphic encryption must instantiate a large number of cryptographic primitive units. Modular multiplication is the most critical primitive in fully homomorphic encryption, so its circuit area strongly affects the overall area of the accelerator. Existing modular multiplier designs suffer from excessive resource usage, limited parameter support, and dependence on macro IP cores. To address these issues, this paper presents an efficient FPGA-based Montgomery modular multiplier. At the algorithm level, the multiplier reduces the computational load by exploiting NTT-friendly modulus properties, compression, and encoding. At the circuit level, it minimizes resource usage through time-division reuse and data integration. The multiplier also supports parameter configuration, implementing Montgomery modular multiplication for different bit widths. Experimental results show that at a width of 32 bits, the multiplier runs at 223 MHz with a latency of 26.9 ns (6 clock cycles) and uses 1313 LUTs and 213 FFs. Compared with the baseline, resource consumption is reduced by 32% and latency by 16% on average, while the design is more flexible and widely applicable.
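    Montgomery multiplication replaces division by the modulus q with masks and shifts by R = 2^k. A minimal Python sketch of the standard word-level reduction (REDC) follows, using 998244353 = 119·2^23 + 1 as an example NTT-friendly prime (q − 1 is divisible by a large power of two). The paper's compression, encoding, and time-division optimizations are not modeled, and all names are illustrative. `pow(q, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(q, k=32):
    """Precompute R = 2**k and q' with q * q' == -1 (mod R); q must be odd and < R."""
    if q % 2 == 0 or q >= 1 << k:
        raise ValueError("q must be odd and below 2**k")
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r
    return r, q_neg_inv


def redc(t, q, k, q_neg_inv):
    """Montgomery reduction: t * R**-1 mod q for 0 <= t < q * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * q_neg_inv) & mask   # makes t + m*q divisible by R
    u = (t + m * q) >> k                  # exact division by R; u < 2q
    return u - q if u >= q else u         # single conditional subtraction


def mont_mul(a, b, q, k, q_neg_inv):
    """a * b * R**-1 mod q for a, b in [0, q)."""
    return redc(a * b, q, k, q_neg_inv)


Q = 998244353          # 119 * 2**23 + 1, an NTT-friendly prime
K = 32
R, Q_NEG_INV = montgomery_setup(Q, K)

a, b = 123456789, 987654321
a_m, b_m = a * R % Q, b * R % Q            # convert operands into Montgomery form
product = redc(mont_mul(a_m, b_m, Q, K, Q_NEG_INV), Q, K, Q_NEG_INV)  # convert back
print(product == a * b % Q)
```

    In an NTT, operands such as twiddle factors are typically kept in Montgomery form throughout, so each butterfly needs only `mont_mul` and the conversions happen once at the boundaries.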
  • Accepted: 2025-02-25
    Based on the LMK04828 high-performance clock chip and an application scenario of multi-board cascaded clocks with multi-channel JESD204B synchronous sampling, this article analyzes how the division factor affects the phase determinism of the clock outputs from two directions: the dividers and the phase-locked loop (PLL). On this basis, a cross-board cascaded clock synchronization verification system is designed, and its mode configuration considerations, the conditions on the second-stage PLL divider coefficients, and the timing constraints between the SYNC and SYSREF signals are explained. A concrete synchronization control procedure is given. Finally, repeated power-up and resynchronization experiments, together with experiments that repeatedly trigger SYSREF pulse output after a single power-up, confirm that the phase relationship of the clock outputs from the cross-board cascaded clock chips remains unchanged, verifying the effectiveness of the synchronization scheme and its phase determinism.
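    Why a divider needs a synchronizing reset can be shown with a toy cycle-level model. This is a generic illustration, not a model of the LMK04828's internal logic: `edges` is a hypothetical divide-by-n counter whose output edge occurs when the count wraps to zero. After power-up the starting count is arbitrary, so the output can take any of n phases relative to the input clock; a SYNC event that resets every divider on the same input edge removes that ambiguity, which is the property the repeated power-up experiments check.

```python
def edges(n, initial_count, cycles, sync_cycle=None):
    """Input-clock cycles at which a divide-by-n counter emits an output edge.

    initial_count models the arbitrary power-up state; if sync_cycle is given,
    the counter is reset to 0 on that input edge (a SYNC-style reset).
    """
    out = []
    count = initial_count % n
    for t in range(cycles):
        if t == sync_cycle:
            count = 0
        if count == 0:
            out.append(t)
        count = (count + 1) % n
    return out


# Without SYNC, the first edge of a divide-by-4 output depends on the power-up state.
print(sorted({edges(4, c, 4)[0] for c in range(4)}))   # [0, 1, 2, 3]

# Two boards that powered up differently agree once both are reset at cycle 5.
board_a = [t for t in edges(4, 1, 12, sync_cycle=5) if t >= 5]
board_b = [t for t in edges(4, 2, 12, sync_cycle=5) if t >= 5]
print(board_a, board_b)                                  # [5, 9] [5, 9]
```

    In a real cascaded system the reset edge must also reach every chip with a known, repeatable delay relative to the clock, which is why the timing between SYNC and SYSREF matters.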
  • Accepted: 2025-02-25
    In complex electromagnetic environments, multiple target signals appear simultaneously, and a comprehensive test system must be able to analyze simultaneously arriving signals independently and in real time. Because of the very large volume of wideband data, existing systems cannot seamlessly extract and process multiple target signals in real time. To extract the raw IQ data of high-density target signals within a wide band, this paper uses high-speed FPGA digital signal processing to perform multi-channel digital down-conversion at any frequency within the analysis bandwidth, followed by decimation filtering with a variable analysis bandwidth. Combined with framing and time-shared read scheduling, DDR4 buffering, and PCIe 3.0 ×8 transfer, the system can simultaneously separate the raw IQ data of up to 32 target signals, each with an analysis bandwidth of up to 300 kHz, within an 80 MHz analysis bandwidth. The system adapts to multi-channel IQ data at different rates while displaying the wideband signal spectrum, achieving seamless transmission of multi-channel IQ data. The number, frequency, and bandwidth of the target signals can be changed at any time during measurement. This technology lays the foundation for real-time analysis of multi-target signal parameters within a wide band and improves the real-time analysis and processing of high-density signals in comprehensive test systems.
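    Each output channel of a multi-channel digital down-converter is, conceptually, an NCO mixer followed by a low-pass filter and decimation. The NumPy sketch below shows that chain for two channels in floating point; the sample rate, Hamming-windowed sinc filter, tap count, decimation factor, and test tones are illustrative assumptions. A hardware design typically uses fixed-point arithmetic and multi-stage decimation filters, which are not modeled here.

```python
import numpy as np


def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass filter with unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs_hz * n) * np.hamming(num_taps)
    return h / h.sum()


def ddc(x, fs_hz, center_hz, decim, cutoff_hz, num_taps=101):
    """Shift center_hz to 0 Hz, low-pass filter, then keep every decim-th sample."""
    n = np.arange(len(x))
    baseband = x * np.exp(-2j * np.pi * center_hz * n / fs_hz)   # NCO mixing
    filtered = np.convolve(baseband, lowpass_fir(num_taps, cutoff_hz, fs_hz), mode="same")
    return filtered[::decim]


def multi_channel_ddc(x, fs_hz, centers_hz, decim, cutoff_hz):
    """One independent DDC per target signal, all fed from the same wideband stream."""
    return [ddc(x, fs_hz, c, decim, cutoff_hz) for c in centers_hz]


def peak_frequency(iq, fs_hz):
    """Frequency of the strongest FFT bin of a complex baseband stream."""
    spectrum = np.abs(np.fft.fft(iq))
    return np.fft.fftfreq(len(iq), d=1 / fs_hz)[np.argmax(spectrum)]


# Hypothetical wideband input: two tones, each 10 kHz above a channel center.
FS = 1e6
t = np.arange(10000) / FS
x = np.exp(2j * np.pi * 210e3 * t) + np.exp(2j * np.pi * 300e3 * t)
channels = multi_channel_ddc(x, FS, [200e3, 290e3], decim=10, cutoff_hz=40e3)
print([peak_frequency(iq, FS / 10) for iq in channels])
```

    Each channel keeps its own tone at +10 kHz while the other channel's tone is removed by the low-pass filter before decimation, which is what lets many narrowband IQ streams be carved out of one wideband input.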
  • Wang, Shuai
    Accepted: 2025-02-14
    Airborne communication software development suffers from scarce hardware resources and low development and testing efficiency. To address this, a real-time simulation and verification platform based on QEMU and Huawei's private cloud is designed and implemented. It emulates a configurable embedded target machine environment running the Rehua operating system, simulates TDMA protocol time slots, and improves real-time simulation performance through the SCHED_RR priority scheduling policy. Network namespaces and VPN technology are used to build a multi-node virtual network. The platform is used for real-time simulation and automated testing of embedded protocol software for airborne communication equipment, effectively improving the efficiency and quality of software development and testing.
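    TDMA slot timing and the SCHED_RR policy can be combined in a simple node loop. The Python sketch below is a hypothetical illustration, not the platform's code: slot arithmetic assumes frames aligned to time zero, `run_node` and its parameters are invented for the example, and `transmit_burst` is only a placeholder comment. `os.sched_setscheduler` is Linux-specific and usually requires root or CAP_SYS_NICE, so the helper returns False instead of failing.

```python
import os
import time


def slot_index(t_ns, slot_ns, num_slots):
    """TDMA slot that is active at time t_ns (frames start at t = 0)."""
    return (t_ns // slot_ns) % num_slots


def next_slot_start(t_ns, slot_ns, num_slots, my_slot):
    """Earliest start of slot my_slot strictly after t_ns."""
    frame_ns = slot_ns * num_slots
    start = t_ns - t_ns % frame_ns + my_slot * slot_ns
    return start if start > t_ns else start + frame_ns


def try_enable_round_robin(priority=50):
    """Ask Linux to schedule this process with SCHED_RR; False if unavailable or denied."""
    if not hasattr(os, "sched_setscheduler"):
        return False
    priority = min(priority, os.sched_get_priority_max(os.SCHED_RR))
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
        return True
    except OSError:
        return False


def run_node(my_slot, slot_ns=1_000_000, num_slots=8, frames=3):
    """Hypothetical node loop: wake once per frame at the start of its own slot."""
    try_enable_round_robin()
    for _ in range(frames):
        now = time.monotonic_ns()
        wake = next_slot_start(now, slot_ns, num_slots, my_slot)
        time.sleep((wake - now) / 1e9)
        # transmit_burst() would go here (placeholder for the protocol stack)
```

    A real-time policy matters here because a normally scheduled process can be delayed past its slot boundary by other load on the host; SCHED_RR threads preempt ordinary threads.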
  • Accepted: 2025-01-20
    Millimeter-wave radar is an important sensing technology widely used in autonomous driving, intelligent transportation, security monitoring, and industrial inspection, offering high precision and strong anti-interference capability. Research on power integrity, signal integrity, and thermal stability has made significant progress both domestically and internationally. However, existing studies mostly address each aspect in isolation and do not consider the interactions among power noise, signal interference, and thermal effects. This study introduces a unified approach that combines power, signal, and thermal analysis through multi-physics simulation to optimize the overall performance of millimeter-wave radar hardware. In addition, a capacitor optimization strategy is proposed that adds capacitors targeting critical frequency bands, effectively improving power integrity and ensuring system stability. Simulation and experimental results show that the method significantly improves the impedance profile of the power distribution network (PDN), reduces power noise and signal interference, and improves the system's thermal management. These approaches improve the overall performance of high-speed circuit systems in several respects and provide new optimization strategies and methods for designing high-performance millimeter-wave radar hardware.
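    The capacitor strategy rests on a simple model: each decoupling capacitor is a series R-L-C whose impedance is lowest (equal to its ESR) at its self-resonant frequency, and capacitors in parallel add admittances. The sketch compares the resulting PDN impedance with the common target-impedance rule Z_target = V_rail × ripple / I_transient. All component values are hypothetical, and plane capacitance, spreading inductance, and mounting parasitics are ignored.

```python
import math


def cap_impedance(f_hz, c_f, esr_ohm, esl_h):
    """Series RLC model of one decoupling capacitor."""
    w = 2 * math.pi * f_hz
    return complex(esr_ohm, w * esl_h - 1 / (w * c_f))


def bank_impedance(f_hz, caps):
    """Parallel combination: admittances of the (C, ESR, ESL) tuples add."""
    return 1 / sum(1 / cap_impedance(f_hz, *cap) for cap in caps)


def target_impedance(v_rail, ripple_fraction, i_transient):
    """Target-impedance rule: allowed ripple voltage divided by transient current."""
    return v_rail * ripple_fraction / i_transient


def violations(freqs_hz, caps, z_target):
    """Frequencies at which the bank's |Z| exceeds the target."""
    return [f for f in freqs_hz if abs(bank_impedance(f, caps)) > z_target]


# Hypothetical 100 nF capacitor with 10 mOhm ESR and 0.5 nH ESL.
CAP = (100e-9, 0.010, 0.5e-9)
F_RES = 1 / (2 * math.pi * math.sqrt(CAP[0] * CAP[2]))   # about 22.5 MHz
Z_T = target_impedance(1.0, 0.03, 2.0)                     # 1.0 V rail, 3 % ripple, 2 A step
print(F_RES, abs(bank_impedance(F_RES, [CAP] * 4)), Z_T)
```

    Sweeping `violations` over a log-spaced frequency list shows which bands still exceed the target, and therefore where a capacitor resonating in that band would help.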
  • Accepted: 2025-01-15
    Special-equipment testing requires test data to be collected and stored, but existing transmission during testing suffers from low data rates and poor reliability. This paper designs a data transmission system that combines an FPGA with Gigabit Ethernet, using the UDP protocol to raise the transmission rate and adding a data retransmission mechanism and packet counting to improve reliability. The design is verified on a Xilinx FPGA board. The results show that FPGA + Gigabit Ethernet transmission is feasible, effectively increases the data rate, offers good maintainability and stability, and can be applied in practical engineering.
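    Packet counting lets the receiver detect lost UDP datagrams and request them again. The sketch below assumes a hypothetical packet layout (a 4-byte big-endian counter followed by the payload) and shows only the host-side bookkeeping; the FPGA-side MAC/UDP framing and the format of the retransmission request are not specified in the abstract.

```python
import struct

# Hypothetical packet layout: 4-byte big-endian counter followed by the payload.
HEADER = struct.Struct(">I")


def build_packet(seq, payload):
    """Prefix the payload with a 32-bit packet counter (wraps at 2**32)."""
    return HEADER.pack(seq & 0xFFFFFFFF) + payload


def parse_packet(packet):
    """Split a received datagram into (counter, payload)."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]


def missing_sequences(received_seqs, expected_count):
    """Counters in [0, expected_count) never seen; these go into a retransmission request."""
    seen = set(received_seqs)
    return [s for s in range(expected_count) if s not in seen]


# On the host, each datagram could be read from a UDP socket, for example:
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   data, _ = sock.recvfrom(2048)
#   seq, payload = parse_packet(data)
```

    UDP itself neither orders nor resends datagrams, so the counter is what turns silent losses into a list of packets to request again.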
  • Wang, Yao, Wen, Tiedun, Chen, Yaping, Zhang, Tianhong
    Accepted: 2025-01-10
    The electronic controller of an aero-engine is a complex circuit system built around numerous large-scale integrated circuits. Traditional contact-based fault injection and detection that relies on physical probes can no longer meet the testability requirements of such complex circuits. This paper proposes a boundary-scan-based fault injection and detection method for the core circuit of the aero-engine electronic controller. Based on an analysis of the core circuit, a boundary-scan daisy chain and a boundary-scan controller are designed that can inject and detect faults both on the interconnections between chips and through the boundary-scan cells inside the chips. Using the engine's overspeed protection logic, the fault injection and detection functions of both approaches are verified.
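    A common way to test chip-to-chip interconnect over a boundary-scan chain is the counting-sequence method: each net is driven through EXTEST with a unique code spread over several scan vectors, and any net whose captured code differs from the driven one is faulty. The sketch below models this at the logical level; the code assignment, the wired-AND short model, and the function names are assumptions for illustration, and shifting through the daisy-chained boundary-scan registers and the TAP state machine is abstracted away.

```python
import math


def counting_codes(num_nets):
    """Give each net a unique code that is neither all 0s nor all 1s."""
    width = math.ceil(math.log2(num_nets + 2))
    return [i + 1 for i in range(num_nets)], width


def make_vectors(codes, width):
    """Parallel test vectors: vector k drives bit k of every net's code."""
    return [[(code >> k) & 1 for code in codes] for k in range(width)]


def inject_faults(codes, width, stuck_at=None, shorts=()):
    """Model what the receiving boundary-scan cells capture under injected faults."""
    received = list(codes)
    for net, value in (stuck_at or {}).items():
        received[net] = (1 << width) - 1 if value else 0
    for a, b in shorts:                   # wired-AND short between two nets
        received[a] = received[b] = codes[a] & codes[b]
    return received


def diagnose(codes, received):
    """Indices of nets whose captured code differs from the driven code."""
    return [i for i, (sent, got) in enumerate(zip(codes, received)) if sent != got]


codes, width = counting_codes(6)
print(diagnose(codes, inject_faults(codes, width, stuck_at={2: 0})))   # [2]
```

    Excluding the all-0 and all-1 codes ensures that a stuck-at-0 or stuck-at-1 net never reproduces a valid code. A short between nets whose codes are bit subsets of each other shows a mismatch on only one of the two nets, a known limitation of this simple scheme.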
  • Wen, Zhixian (温志贤)
    Accepted: 2024-12-09
    Before a chip enters mass production, its circuit performance must be tested comprehensively so that chips meeting the requirements are screened out and unqualified chips are kept off the market. A PMIC (Power Management IC) implements functions such as power conversion and current control through built-in DC-DC converters, current control, and protection mechanisms, so it is necessary to verify that its technical specifications meet application requirements. Taking the UC3842 chip as an example, this paper proposes an analog chip performance test scheme based on the Huafon STS8200. The test methods and procedures for several key parameters of the chip (reference voltage, load regulation, line regulation, oscillator frequency, rise and fall times, etc.) are studied. All measured parameters fall within their valid ranges: after 10 chips were tested and the 10th chip was tested repeatedly in a 100-iteration loop (LOOP100), the test yield was 100%, which shows that the test scheme is valid and effective.
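    A production test program ultimately reduces each parameter to a pass/fail judgment against limits and reports a yield. The sketch below is hypothetical: the parameter names, measurement keys, and limit values are placeholders, not the paper's STS8200 test program and not datasheet limits. It shows how load and line regulation are derived from pairs of reference-voltage readings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    lo: float
    hi: float

    def passes(self, value):
        return self.lo <= value <= self.hi


# Placeholder limits for illustration only; a real program takes them from the
# UC3842 datasheet and the test specification.
LIMITS = {
    "vref_v": Limit(4.95, 5.05),
    "line_reg_mv": Limit(0.0, 20.0),
    "load_reg_mv": Limit(0.0, 25.0),
    "fosc_khz": Limit(47.0, 57.0),
}


def derive(meas):
    """Turn raw tester readings (hypothetical keys) into the judged parameters."""
    return {
        "vref_v": meas["vref_1ma"],
        # Line regulation: change in Vref as the supply steps between two levels.
        "line_reg_mv": abs(meas["vref_vin_hi"] - meas["vref_vin_lo"]) * 1e3,
        # Load regulation: change in Vref between light and heavy reference load.
        "load_reg_mv": abs(meas["vref_1ma"] - meas["vref_20ma"]) * 1e3,
        "fosc_khz": meas["fosc_khz"],
    }


def judge(meas):
    """A chip passes only if every derived parameter is within its limit."""
    return all(LIMITS[name].passes(value) for name, value in derive(meas).items())


def yield_percent(chips):
    return 100.0 * sum(judge(m) for m in chips) / len(chips)
```

    Re-testing one device in a loop, as in the LOOP100 run, checks repeatability of the test program itself, while the yield over several devices checks the screening result.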