TFLite-Micro (TFLm) is a popular inference engine for microcontrollers (MCUs). We analyze TFLm's memory management mechanism and allocation strategy, as well as their limitations. Currently, TFLm can only use a single contiguous block of memory (the Tensor Arena) to hold the intermediate results required by model inference. This paper optimizes TFLm's memory management to support multiple discontinuous blocks of memory with very different read/write performance, and to overlay tensors where possible. With this improvement, more data traffic is directed to fast on-chip memory, and peak memory usage is reduced. Experiments on the i.MX RT1170 show that the proposed strategy greatly improves the utilization of fast on-chip RAM on microcontrollers, significantly alleviates the memory-bandwidth bottleneck, and shortens inference time by more than half in the best case.
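The multi-block idea described above can be sketched as a placement-time allocator that tries the fastest region first (e.g. DTCM) and falls back to slower external RAM when it is full. The following C sketch is illustrative only and is not TFLm's actual implementation; the `MemRegion` type and `arena_alloc` function are hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a bump allocator over multiple memory regions,
 * ordered fastest first (e.g. DTCM before external SDRAM). Each tensor
 * buffer is placed in the fastest region that still has room, which is
 * the core of the multi-arena strategy described in the abstract. */
typedef struct {
    uint8_t *base;   /* start address of the region */
    size_t   size;   /* total bytes in the region */
    size_t   used;   /* bytes already handed out */
} MemRegion;

/* Allocate `bytes` from the first (fastest) region that can hold them;
 * returns NULL when every region is exhausted. */
static void *arena_alloc(MemRegion *regions, int n_regions, size_t bytes) {
    const size_t align = 16;  /* typical tensor buffer alignment */
    for (int i = 0; i < n_regions; ++i) {
        size_t off = (regions[i].used + align - 1) & ~(align - 1);
        if (off + bytes <= regions[i].size) {
            regions[i].used = off + bytes;
            return regions[i].base + off;
        }
    }
    return NULL; /* no region has room */
}
```

A caller would declare one array per physical memory (for instance a `static` buffer placed in DTCM by the linker and another in SDRAM) and list them fastest-first; tensors that fit stay on-chip, and only the overflow spills to the slower memory.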
Key words: TFLite-micro, TFLm, TinyML, Tensor Arena, i.MX RT1170, DTCM