Abstract: Gesture recognition based on millimeter-wave radar offers contactless interaction, high detection accuracy, protection of user privacy, and strong adaptability to the environment, which makes it widely applicable to industrial human-computer interaction and smart-home scenarios. However, existing millimeter-wave radar methods for dynamic gesture recognition suffer from high model complexity, large computational cost, limited recognition accuracy, and slow inference. To address these problems, this paper proposes a millimeter-wave radar gesture recognition method based on an improved lightweight MobileViT network. First, millimeter-wave radar echo signals of dynamic gestures are collected. After device noise and background interference are removed, the data are reshaped into a three-dimensional matrix of size sampling points × pulses × frames. The Fourier transform is then applied to generate range-time and Doppler-time spectrograms, which are fed into the improved MobileViT network for feature extraction and fusion. Finally, the model outputs the recognized gesture category. Experimental results show that the proposed MobileViT model reduces the parameter count to 0.167 M and the computational cost to 0.253 GFLOPs. Validated on a dataset of 12 gesture types, the method achieves a recognition accuracy of 99.31%, demonstrating its effectiveness.
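The preprocessing step described above (a sampling points × pulses × frames cube turned into range-time and Doppler-time spectrograms by Fourier transforms) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes complex IQ samples, a Hann window on the range FFT, static-clutter removal by subtracting the mean over pulses in each frame, averaging over pulses for the range-time map, and summing over range bins for the Doppler-time map. The function name `radar_spectrograms` is hypothetical.

```python
import numpy as np


def radar_spectrograms(cube: np.ndarray, remove_static: bool = True):
    """Return (range_time, doppler_time) magnitude maps in dB.

    Assumed input: complex IQ array of shape (samples, pulses, frames).
    Outputs have shapes (range_bins, frames) and (doppler_bins, frames).
    """
    if cube.ndim != 3:
        raise ValueError("expected a (samples, pulses, frames) array")
    n_samples = cube.shape[0]

    # Range FFT along fast time (sampling points), Hann-windowed (assumption).
    window = np.hanning(n_samples)[:, None, None]
    range_fft = np.fft.fft(cube * window, axis=0)

    if remove_static:
        # Assumed background removal: subtracting the mean over pulses in each
        # frame cancels returns that do not change from pulse to pulse
        # (stationary objects, i.e. zero Doppler).
        range_fft = range_fft - range_fft.mean(axis=1, keepdims=True)

    # Range-time map: average magnitude over pulses -> (range_bins, frames).
    range_time = np.abs(range_fft).mean(axis=1)

    # Doppler FFT along slow time (pulses); fftshift puts zero Doppler in the centre.
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=1), axes=1)

    # Doppler-time map: sum magnitude over range bins -> (doppler_bins, frames).
    doppler_time = np.abs(doppler_fft).sum(axis=0)

    eps = 1e-12  # avoid log10(0) on empty bins
    return 20 * np.log10(range_time + eps), 20 * np.log10(doppler_time + eps)
```

For a synthetic moving target at range bin 8 and Doppler bin +3, plus a strong static reflector at range bin 20, the range-time map peaks at bin 8 in every frame because the static component is removed. With 32 pulses, the Doppler-time map peaks at index 16 + 3 = 19 after `fftshift`. Stacking or resizing these two maps into the input format expected by the network is a separate design choice that the abstract does not specify.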