In this study, EEG signals are first acquired and processed. A multidomain fusion feature extraction algorithm is then proposed that combines two algorithms, Improved Localized Feature Scale Decomposition (ILCD) and Adaptive Common Spatial Patterns (ACSP), to extract features in both the time-frequency and spatial domains. To improve the classification of motor imagery (MI) signals, a convolutional neural network based on spatial self-attention and multi-timescale feature extraction is designed to classify MI signals under motion. Finally, a rehabilitation training system based on the proposed algorithm is implemented using mixed programming in MATLAB and C and validated with human subjects. In experiments with several subjects, the proposed algorithm reaches a classification accuracy of up to 82%, with an average classification accuracy of 62.19%. The rehabilitation training system accurately extracts the user’s EEG signals in real time and accurately controls the movement of the rehabilitation robot according to the user’s imagined movement.
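To make the spatial-domain step concrete, the following is a minimal sketch of the standard Common Spatial Patterns (CSP) method, not the adaptive ACSP variant proposed in this work. It assumes NumPy, two classes of trials shaped (trials, channels, samples), and synthetic Gaussian data in place of real EEG; the function names `csp_filters` and `log_var_features` are illustrative only.

```python
import numpy as np


def csp_filters(trials_a, trials_b, n_pairs=1):
    """Standard CSP filters for two classes.

    trials_a, trials_b: arrays of shape (n_trials, n_channels, n_samples).
    Returns an array of shape (2 * n_pairs, n_channels); rows are filters.
    """
    def mean_cov(trials):
        covs = []
        for x in trials:
            x = x - x.mean(axis=1, keepdims=True)  # remove per-channel mean
            c = x @ x.T
            covs.append(c / np.trace(c))           # trace-normalised covariance
        return np.mean(covs, axis=0)

    cov_a, cov_b = mean_cov(trials_a), mean_cov(trials_b)

    # Whiten the composite covariance (assumes it is full rank).
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T

    # Eigenvectors of the whitened class-A covariance, sorted by
    # descending eigenvalue: first rows favour class A, last rows class B.
    d, u = np.linalg.eigh(whitener @ cov_a @ whitener.T)
    order = np.argsort(d)[::-1]
    w = u[:, order].T @ whitener

    picks = list(range(n_pairs)) + list(range(-n_pairs, 0))
    return w[picks]


def log_var_features(trials, filters):
    """Log of normalised variance of each spatially filtered trial."""
    feats = []
    for x in trials:
        var = (filters @ x).var(axis=1)
        feats.append(np.log(var / var.sum()))
    return np.array(feats)


# Synthetic stand-in for EEG (an assumption): each class has one
# channel with larger variance than the others.
rng = np.random.default_rng(0)


def make_trials(strong_channel, n_trials=20, n_channels=4, n_samples=200):
    x = rng.standard_normal((n_trials, n_channels, n_samples))
    x[:, strong_channel, :] *= 3.0
    return x


trials_a = make_trials(strong_channel=0)
trials_b = make_trials(strong_channel=3)

filters = csp_filters(trials_a, trials_b, n_pairs=1)
feats_a = log_var_features(trials_a, filters)
feats_b = log_var_features(trials_b, filters)
```

In this sketch the resulting two-dimensional log-variance features would be passed to a classifier; in the proposed method, spatial features of this kind are instead fused with the ILCD time-frequency features.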