Just Accepted

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • Accepted: 2025-06-17
    This paper proposes a spatial-array processor core in which computing units exchange data over an interconnect bus and compute directly from local memory, eliminating the need for a centralized register file. Results from each local computing unit are propagated to other units over a broadcast bus. Because the number of computing units is not constrained by centralized components, the organization scales linearly. It also incorporates flexible broadcast and reduction mechanisms that match the data communication patterns of algorithms, which facilitates efficient algorithm mapping and physical implementation. A processing element array implemented with this design shows high performance scalability, with performance per unit area of up to 1.4 TOPS/mm² at INT8. The design is suitable for large-scale deployment as a processing core in high-compute processors.
  • Accepted: 2025-06-17
    As semiconductor technology advances into deep sub-micron nodes, traditional synchronous circuits face growing challenges such as clock skew and high clock-tree power consumption. Asynchronous architectures, which replace the global clock signal with local handshake protocols, are gradually emerging as a new paradigm for high-performance computing chip design because of their modularity, freedom from clock skew, and low power consumption. This paper focuses on the technological requirements for high-performance integrated circuits in future fields such as artificial intelligence and the Internet of Things. From the perspectives of the clock tree and the handshake protocol, it analyzes the limitations of synchronous circuits in large-scale integrated circuits, reviews the latest progress in asynchronous circuit technology, discusses the advantages and disadvantages of handshake protocols in terms of operating speed, energy efficiency, and interference immunity, and outlines key directions for future research on asynchronous circuits.
  • Accepted: 2025-06-12
    To address the real-time performance degradation caused by frequent UART interrupts during high-volume data communication in embedded systems, this paper proposes a DMA driver optimization scheme for the FreeModbus protocol stack on the GD32E230 microcontroller. By restructuring the UART transmit and receive interrupt service routines and adding DMA transfers, the solution significantly reduces UART interrupt frequency and CPU occupancy. Experimental results show that at a baud rate of 115200, the number of interrupts triggered when transmitting a 255-byte frame drops from 256 to 2, with a 99% reduction in CPU occupancy time. The optimization substantially reduces system load and provides a cost-effective communication enhancement for resource-constrained embedded devices.
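The figures quoted in the abstract above can be put in context with a short calculation. The sketch below assumes 8N1 framing (10 bits on the wire per byte), which the abstract does not state, and the function names are hypothetical. It computes the wire time of a 255-byte frame at 115200 baud and the fractional drop in interrupt count from 256 to 2. The interrupt-count reduction is a different quantity from the 99% CPU occupancy reduction reported in the abstract, although the two are of similar size.

```python
def uart_frame_time_s(n_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Time on the wire for n_bytes.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit);
    this framing is an assumption, not taken from the abstract.
    """
    return n_bytes * bits_per_byte / baud


def interrupt_reduction(before: int, after: int) -> float:
    """Fractional reduction in the number of interrupts per frame."""
    return (before - after) / before


# Values from the abstract: 255-byte frame, 115200 baud, 256 -> 2 interrupts.
frame_time = uart_frame_time_s(255, 115200)
reduction = interrupt_reduction(256, 2)

print(f"frame time: {frame_time * 1e3:.2f} ms")
print(f"interrupt count reduction: {reduction:.1%}")
```

Under the 8N1 assumption, the interrupt-driven driver services roughly one interrupt per byte time across the frame, while the DMA version is interrupted only twice in the same period.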
  • Accepted: 2025-06-12
    The demand for massive data processing in applications such as artificial intelligence has greatly promoted the development of chiplet integration technology, which in turn places new requirements on FC-BGA substrates, including large size, low warpage, good electrical performance, and high reliability. The new glass core substrate has attracted extensive attention owing to its intrinsically low dielectric constant, high thermal stability, and chemical inertness. This article reviews the history, characteristics, and present challenges of glass core substrates, and offers a summary and outlook on their future application in chiplet integration.
  • Accepted: 2025-06-12
    To improve the clock synchronization performance of CT detectors, this paper proposes a clock synchronization method based on a Time-to-Digital Converter (TDC) feedback mechanism and presents a complete prototype architecture of the synchronization system. For the first time, carry-chain-based TDC delay measurement is introduced into the clock synchronization scenario of CT detectors. A high-resolution delay measurement module is designed, incorporating multi-level comparators and high-precision delay elements to implement the timing scheduling logic. A finite state machine controls the synchronization process, forming a closed-loop feedback synchronization mechanism. Simulation experiments demonstrate that in a complex scenario with 512 channels and a maximum transmission distance of 1.28 m, the system's synchronization precision is stably maintained within 10 ps, an improvement of two orders of magnitude over traditional solutions. Even with interference-induced jitter, dynamic compensation keeps the synchronization precision within 10 ps.
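The closed-loop mechanism described in the abstract above (measure the delay with a TDC, then let a state machine apply compensation) can be illustrated with a toy model. This is a sketch under stated assumptions, not the authors' design: the three states, the 2 ps TDC resolution, and the function names are all hypothetical, and only the 10 ps target comes from the abstract.

```python
from enum import Enum, auto


class State(Enum):
    MEASURE = auto()
    ADJUST = auto()
    LOCKED = auto()


def tdc_measure(skew_ps: float, resolution_ps: float) -> float:
    """Quantize the true skew to the nearest TDC step (idealized TDC)."""
    return round(skew_ps / resolution_ps) * resolution_ps


def synchronize(skew_ps: float, resolution_ps: float = 2.0,
                tolerance_ps: float = 10.0, max_steps: int = 100):
    """Run a MEASURE -> ADJUST loop until the skew is within tolerance.

    Returns (residual_skew_ps, final_state). The lock threshold is reduced
    by half a TDC step so that the true skew, not just the quantized
    reading, is guaranteed to be within tolerance_ps.
    """
    lock_threshold = tolerance_ps - resolution_ps / 2
    state = State.MEASURE
    measured = 0.0
    for _ in range(max_steps):
        if state is State.MEASURE:
            measured = tdc_measure(skew_ps, resolution_ps)
            state = State.LOCKED if abs(measured) <= lock_threshold else State.ADJUST
        elif state is State.ADJUST:
            skew_ps -= measured  # apply a compensating delay
            state = State.MEASURE
        else:  # LOCKED
            break
    return skew_ps, state


residual, final_state = synchronize(1234.5)
print(residual, final_state)
```

In this idealized model a single correction leaves at most half a TDC step of residual skew. A real system would also have to handle jitter and non-ideal delay elements, which the abstract addresses through dynamic compensation.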
  • Accepted: 2025-06-04
  • Accepted: 2025-05-29
  • Accepted: 2025-05-28
    Neural network models demand high processor performance and energy efficiency. The memory-computing integrated architecture is an energy-efficient solution. This paper introduces a digital interface simulation scheme to address simulation verification challenges in analog memory-computing integrated designs, aiming to improve simulation efficiency in large-scale computing scenarios. The scheme analyzes SRAM-based memory-computing integration and combines the SPICE model with a digital control circuit. This enables the use of digital methods for simulation and verification, potentially boosting development efficiency. An evaluation system comparing the digital interface simulation with traditional analog circuit simulation reveals that the proposed scheme achieves over 2x simulation speed and 1000x configuration efficiency. Supported by the Ministry of Science and Technology (2021YFB3601300), this research has been validated with tape-out at the 180nm process node, demonstrating the efficiency advantages of the digital interface simulation scheme for memory-computing integrated design in large-scale computing.
  • Accepted: 2025-05-28
  • Accepted: 2025-05-21
    To address the data acquisition challenges of SSI protocol absolute encoders, this study proposes a method for acquiring encoder data directly through a DSP's SPI or McBSP interface. The system uses the TMS320F28335 as the master control chip and includes an interface circuit between the DSP and the SSI protocol absolute encoder. By properly configuring the registers of the DSP's SPI and McBSP interfaces and processing the interface data, 13-bit SSI absolute encoder data is obtained directly by the DSP. The method can be extended to DSP-based acquisition of SSI absolute encoder data of any bit length, demonstrating strong practical applicability.
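The "processing the interface data" step in the abstract above can be illustrated by showing how a 13-bit position might be pulled out of a raw SPI word. The sketch below is an assumption-laden illustration rather than the paper's procedure: it assumes a 16-bit MSB-first word, a hypothetical `leading_bits` parameter for any non-data bits clocked in before the MSB, and optional Gray-code decoding (many SSI encoders output Gray code, but the abstract does not say which coding is used).

```python
def extract_ssi_position(raw_word: int, data_bits: int = 13,
                         word_bits: int = 16, leading_bits: int = 0) -> int:
    """Extract data_bits of encoder data from an MSB-first SPI word.

    leading_bits is the (assumed) number of non-data bits received
    before the encoder's MSB; the remaining low bits are discarded.
    """
    shift = word_bits - leading_bits - data_bits
    if shift < 0:
        raise ValueError("encoder data does not fit in one SPI word")
    return (raw_word >> shift) & ((1 << data_bits) - 1)


def gray_to_binary(gray: int) -> int:
    """Convert a Gray-coded value to plain binary (only if the encoder uses Gray code)."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


# A 13-bit value placed in the top bits of a 16-bit word.
position = extract_ssi_position(0b1010101010101_000)
print(position)
```

Changing `data_bits` and `word_bits` adapts the same extraction to other encoder resolutions, in line with the abstract's remark that the method extends to any bit length.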
  • 许, 嘉珩
    Accepted: 2025-05-21
    A high-precision TDC requires a low-jitter, low-latency multiphase clock to operate correctly. To this end, several key techniques are adopted to optimize the DLL system structure, charge pump, voltage-controlled delay line, and lock detection circuit. Sub-gate-level delay line technology enables the DLL system to generate multiphase clock signals with a delay of only 1.25 ps. A three-level structure frees the DLL system from reliance on a high-precision, high-frequency reference clock, so it can operate from a 100 MHz reference clock. Techniques such as current steering and dual clamping effectively suppress non-ideal effects such as charge sharing and channel length modulation, improving static phase error and peak-to-peak jitter. The circuit was designed, simulated, and verified through wafer fabrication in a TSMC 65 nm process. Simulation results show that the DLL system achieves its intended functions. Post-layout simulation gives a static phase error of approximately 13.19 ps, peak-to-peak jitter of approximately 1.01 ps, and system power consumption of 82.5 mW. Test results show a system frequency range of approximately 50 MHz to 320 MHz and a locking time of approximately 117.5 μs.
  • Accepted: 2025-05-09
    This paper introduces an MCU-based clock system, covering its components and its advantages and disadvantages. A clock architecture suitable for automotive electronic MCUs is proposed, and the overall design framework is given. The working principle of the system and the relationship between the system clock and power consumption are analyzed in detail, and the clock design related to functional safety is described. Using the 40 nm process library for rail technology, the circuit was simulated with software tools and then implemented and applied in the CKS32K1XX chip project.
  • Accepted: 2025-05-08
    Because the incidence of neonatal jaundice is high and conventional monitoring methods are limited, there is a clinical need for non-invasive devices that track jaundice dynamically. This study presents a wearable system for dynamic detection of jaundice level and physiological parameters. The system consists of a forehead-mounted wireless physiological signal collector, a Bluetooth communication host, and a LabVIEW-based data analysis platform. It uses the highly integrated MAX86916 optical sensor to synchronously collect four-channel photoplethysmography (PPG) signals, from which bilirubin, heart rate (HR), and blood oxygen saturation (SpO₂) are obtained through signal processing. An nRF52832 microcontroller with the Bluetooth Low Energy (BLE) protocol handles wireless communication: data are transmitted wirelessly to the Bluetooth communication host and then uploaded via USB to the LabVIEW program on a PC for display and storage. To verify functionality and performance, basic parameter tests were carried out, followed by comparative tests of each physiological parameter against commercial instruments. The device (13.5 g, 41 mm × 29.7 mm × 15.1 mm) can monitor continuously for 5 hours. The detection range of the bilirubin prediction model is 1 to 13 mg/dL. Bilirubin, heart rate, and blood oxygen measurements agreed closely with standard instruments (p > 0.05), with maximum absolute errors for bilirubin, HR, and SpO₂ of 0.97 mg/dL, 1 bpm, and 2.1%, respectively. The system also sensitively tracks the decrease in heart rate after exercise and the changes in blood oxygen during breath-holding. It monitors jaundice, heart rate, and blood oxygen accurately and combines light weight, wearability, multi-parameter detection, and non-invasive dynamic measurement. After further optimization, it is expected to be applicable to neonatal clinical monitoring.
  • Wang, Yiming, Wang, Yijiao, Wu, Jiayao, Zou, Tao
    Accepted: 2025-05-07
    This work investigates a compact model of a Fully Depleted Silicon-on-Insulator (FDSOI) based single-transistor pixel sensor (1T-PS) under total ionizing dose (TID) effects. The model accounts for the generation of fixed charges within the buried oxide (BOX) layer and of interface states at the surface below the BOX. It yields the relationship between irradiation dose and threshold voltage degradation of the 1T-PS. By integrating the model into BSIM-IMG and executing a typical image input layer of an artificial neural network, the impact of TID effects on the accuracy of 1T-PS arrays performing vector-matrix multiplication (VMM) is studied. The results indicate that after exposure to a total ionizing dose of 600 krad(Si), the recognition accuracy decreases to approximately 85%.
  • Accepted: 2025-05-06
    To enhance the real-time performance, scalability, and reliability of CT detector nodes, and to meet the bus communication requirements of current slip ring equipment, a design method for a CT detector CANopen node is proposed. The method covers hardware node design, using an XMC4000 series microcontroller as the hardware platform, together with adaptation of the hardware drivers and a real-time operating system. A customized object dictionary is designed to carry command control and status feedback. Optimizing the state machine and the message task flow improves the real-time performance of tasks. Test results show that the functionality complies with the CANopen protocol specifications, and the electrical characteristics meet the requirements for voltage, edge transition, and bit width testing. Under continuous transmission of 5,156 frames, the success rate reaches 100%, demonstrating high reliability. The bus delay does not exceed 480 ns, ensuring excellent real-time performance. The object dictionary can be reused and ported to CT systems with different hardware platforms, effectively supplementing the current medical device profile (CiA-412).
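The idea of a customized object dictionary carrying command control and status feedback, as described in the abstract above, can be sketched as a table keyed by (index, subindex). Everything concrete below is hypothetical: the class names, the entry names, and the indices 0x2000 and 0x2001 are illustrative choices, not the paper's actual dictionary. The only protocol fact relied on is that CANopen reserves indices 0x2000 to 0x5FFF for manufacturer-specific objects.

```python
from dataclasses import dataclass

# CANopen manufacturer-specific profile area: 0x2000..0x5FFF inclusive.
MANUFACTURER_AREA = range(0x2000, 0x6000)


@dataclass
class ODEntry:
    name: str
    access: str  # "ro" (read-only), "wo" (write-only), or "rw"
    value: int = 0


class ObjectDictionary:
    def __init__(self) -> None:
        self._entries: dict[tuple[int, int], ODEntry] = {}

    def add(self, index: int, subindex: int, entry: ODEntry) -> None:
        if index not in MANUFACTURER_AREA:
            raise ValueError("index outside manufacturer-specific area")
        self._entries[(index, subindex)] = entry

    def read(self, index: int, subindex: int) -> int:
        entry = self._entries[(index, subindex)]
        if entry.access == "wo":
            raise PermissionError(f"{entry.name} is write-only")
        return entry.value

    def write(self, index: int, subindex: int, value: int) -> None:
        entry = self._entries[(index, subindex)]
        if entry.access == "ro":
            raise PermissionError(f"{entry.name} is read-only")
        entry.value = value


# Hypothetical entries: one command object, one status object.
od = ObjectDictionary()
od.add(0x2000, 0x01, ODEntry("command_control", "rw"))
od.add(0x2001, 0x01, ODEntry("status_feedback", "ro"))
```

Keeping the dictionary as data separate from the hardware drivers is one plausible reading of why the abstract describes it as reusable across hardware platforms.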
  • Accepted: 2025-04-28
    To address noise interference in monitoring data for bridge bearing inclination and stress, this paper designs a real-time monitoring data preprocessing system based on the Kalman filter. The system applies the classical Kalman filtering algorithm to the inclination and stress data collected by the sensors, effectively suppressing noise. Exploiting the parallel computing capability of an FPGA, it implements data acquisition, filtering, and transmission, and works with a host computer to display the bridge bearing monitoring data dynamically in real time. Experimental results show that for the filtered stress signal the signal-to-noise ratio (SNR) is 32.10 dB, the mean square error (MSE) is 0.0518, the correlation coefficient is 0.5811, and the variance ratio is 0.7018; for the filtered inclination signal the SNR, MSE, correlation coefficient, and variance ratio are 33.98 dB, 0.0418, 0.07032, and 0.6291, respectively, indicating that the filtering algorithm effectively suppresses noise interference. The system is stable and reliable, processes data quickly, and meets the needs of real-time monitoring of bridge bearing inclination and stress. In addition, a Kalman filter chip is designed in an ASIC flow and the filter circuit is implemented in a 180 nm CMOS process, with a clock frequency of 160 MHz and a synthesized area of 599131 μm².
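The classical Kalman filter named in the abstract above can be shown in its simplest scalar form. The sketch below is a software illustration only, not the paper's FPGA or ASIC implementation: it assumes a one-dimensional, slowly varying quantity (a constant-value state model), and the noise variances, initial values, and test signal are all hypothetical.

```python
def kalman_filter_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a constant-value state model.

    process_var: assumed variance of the state's random drift per step (Q)
    meas_var:    assumed variance of the sensor noise (R)
    x0, p0:      initial state estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is modeled as constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Synthetic test signal: true value 5.0 with alternating +/-1 disturbance.
samples = [5.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(200)]
smoothed = kalman_filter_1d(samples, process_var=1e-5, meas_var=1.0)
print(smoothed[-1])
```

A small `process_var` relative to `meas_var` makes the filter trust its running estimate more than each new sample, which suppresses noise at the cost of slower response to real changes in the measured quantity.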
  • Accepted: 2025-04-18
    A strongly real-time, highly concurrent, high-performance edge embedded intelligent computing system is built around the HKN201 from Xiangteng Micro. Combining high-speed circuit design concepts with practical engineering applications, the paper provides basic design methods and effective control measures for power integrity and signal integrity, and presents simulation analysis with results. Finally, application tests are conducted on the system and performance indicators are reported. The experimental results show that the scheme is universal, highly scalable, and highly reliable, providing a reference for research on intelligent products.
  • Accepted: 2025-04-14
    High-speed SerDes rates have progressed from 56 Gbps to 112 Gbps and beyond. Preserving signal integrity at these ultra-high speeds while balancing power consumption, reliability, flexibility, and cost-effectiveness is an active research topic. Based on the current mainstream architecture of analog-to-digital conversion and digital signal processing, this paper examines in depth the latest research progress in key 112G SerDes technologies from four aspects: transmitter, receiver, clock structure, and low-power techniques. It is intended as a reference for research on high-speed SerDes technology.