Recognizing and evaluating the movements of traditional martial arts routines has long been difficult. To address this problem, an intelligent solution based on deep learning is proposed. The solution combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network: the CNN extracts spatial features from each video frame, and the LSTM models the temporal characteristics of the action sequences in wushu routines. Features at different scales are fused in the feature layer, and a publicly available pose estimation algorithm predicts the positions of human joints as auxiliary information; the joint positions and the image features are then combined by attention-weighted fusion to obtain the final image features. The model is trained end to end with a multi-objective method that combines classification and regression, improving both classification performance and score prediction. Tests show that the recognition accuracy on five types of traditional martial arts movements reaches 94.2%, and the correlation between the scores predicted by the system and the scores given by a martial arts practitioner is r = 0.89, which indicates that the system is feasible and has practical application value.
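The three computational ideas named above (attention-weighted fusion, a combined classification and regression objective, and the correlation used for evaluation) can be illustrated with a minimal sketch. The abstract does not specify the attention form, the loss weighting, or the type of correlation, so the following are assumptions made only for illustration: fusion weights come from a softmax over dot-product scores against a hypothetical learned `query` vector, the objective is cross-entropy plus a weighted squared error with a hypothetical `reg_weight`, and r is taken to be the Pearson coefficient. All function names are hypothetical and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(image_feat: Sequence[float],
                   joint_feat: Sequence[float],
                   query: Sequence[float]) -> List[float]:
    """Fuse two equal-length feature vectors with attention weights.

    Assumption: each source is scored by its dot product with `query`
    (a stand-in for a learned parameter); a softmax over the two scores
    gives the weights of the weighted sum.
    """
    scores = [sum(q * x for q, x in zip(query, feat))
              for feat in (image_feat, joint_feat)]
    w_img, w_joint = softmax(scores)
    return [w_img * a + w_joint * b for a, b in zip(image_feat, joint_feat)]


def multi_task_loss(class_logits: Sequence[float], true_class: int,
                    pred_score: float, true_score: float,
                    reg_weight: float = 1.0) -> float:
    """Cross-entropy on the movement class plus weighted squared error
    on the quality score (assumed form of the combined objective)."""
    probs = softmax(class_logits)
    cross_entropy = -math.log(probs[true_class])
    squared_error = (pred_score - true_score) ** 2
    return cross_entropy + reg_weight * squared_error


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and human-assigned scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Usage with made-up toy values (not data from the study).
# A zero query scores both sources equally, so each gets weight 0.5.
fused = attention_fuse([1.0, 0.0], [0.0, 1.0], query=[0.0, 0.0])
# Five movement classes with uniform logits and a perfect score prediction.
loss = multi_task_loss([0.0] * 5, true_class=2, pred_score=8.5, true_score=8.5)
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In a full model the image features would come from the CNN and LSTM stages and the joint features from the pose estimator; here they are plain lists so that the fusion, loss, and correlation steps can be read in isolation.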