Unleash All Cores: Asymmetry-aware Scalable DNN Inference on Mobile CPUs

Abstract

Modern mobile CPUs typically adopt asymmetric multi-core architectures, where the large performance gap between big and little cores becomes a key bottleneck for on-device DNN inference. Existing inference engines fail to fully exploit all cores: naive parallelization suffers from load imbalance, while big-core-only execution wastes the compute capacity of the little cores. We present SANI, an asymmetry-aware scalable DNN inference system for mobile CPUs. SANI combines core-aware task partitioning that matches per-core compute capability, dynamic load scheduling that rebalances work at runtime, and asymmetry-aware kernel transformation that reshapes operator implementations for heterogeneous cores. Evaluation on commercial mobile SoCs shows that SANI effectively balances core utilization, substantially reducing inference latency and energy consumption compared with state-of-the-art mobile inference engines.
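To illustrate the core-aware task partitioning idea described above, here is a minimal sketch of capability-proportional static partitioning. The function name, the capability weights, and the row-based work unit are all hypothetical illustrations, not SANI's actual implementation; SANI additionally rebalances work dynamically at runtime.

```python
def partition_rows(total_rows, core_capabilities):
    """Split `total_rows` units of work into per-core ranges,
    proportionally to each core's measured capability.

    Hypothetical sketch: capabilities might come from an offline
    microbenchmark (e.g., GEMM throughput per core).
    """
    cap_sum = sum(core_capabilities)
    # Each core's share, rounded down so shares never exceed the total.
    shares = [int(total_rows * c / cap_sum) for c in core_capabilities]
    # Give leftover rows from rounding to the fastest core.
    remainder = total_rows - sum(shares)
    shares[core_capabilities.index(max(core_capabilities))] += remainder
    # Convert shares into contiguous (start, end) row ranges.
    ranges, start = [], 0
    for s in shares:
        ranges.append((start, start + s))
        start += s
    return ranges

# Example: 2 big cores (weight 2.0) and 2 little cores (weight 1.0)
# splitting 100 rows of a convolution's output.
print(partition_rows(100, [2.0, 2.0, 1.0, 1.0]))
```

Big cores receive proportionally larger contiguous ranges, so all cores finish at roughly the same time instead of the little cores stalling the big ones.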

Publication
In USENIX Symposium on Operating Systems Design and Implementation (OSDI)
QianLong Sang
Fourth-year CS PhD student

My research interests include operating systems, computer architecture, and AIOS.