The development of digital humanities and intelligent audio analysis technology offers a new computational approach to studying emotion in multi-ethnic folk songs. Taking the multi-ethnic Spring Festival ballads of Gansu Province as its object, this paper constructs a sentiment classification model through corpus collection, audio preprocessing, MFCC feature extraction and LSTM neural network training. A total of 526 valid samples were compiled, with a cumulative duration of 1149.2 minutes. The samples were labeled with four emotion categories (celebration, blessing, thinking and expressing, and narrative peace), and 39-dimensional MFCC temporal features were extracted as model input. Experimental results show that the training loss decreases from 1.31 to 0.11, the training accuracy reaches 91.9%, and the validation accuracy reaches 85.4%. On the test set of 104 samples, the model correctly identified 90 samples, for an overall accuracy of 86.5%; the Precision, Recall and F1-score were 86.5%, 86.6% and 86.6%, respectively, all better than those of SVM, CNN, RNN and GRU. The results indicate that the combination of MFCC and LSTM can effectively represent the emotional acoustic features of Spring Festival songs, providing technical support for the digital protection, emotion label construction and intelligent retrieval of multi-ethnic Spring Festival songs in Gansu Province.
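As a reading aid, the reported test accuracy can be reproduced from the two counts given above (90 correct out of 104). The sketch below also shows one common way to obtain Precision, Recall and F1-score for a four-class problem, namely macro-averaging over a confusion matrix. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function names `accuracy` and `macro_prf` as well as the small confusion matrix are purely illustrative and are not the paper's data.

```python
from typing import List, Tuple


def accuracy(correct: int, total: int) -> float:
    """Fraction of correctly classified samples."""
    return correct / total


def macro_prf(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Macro-averaged precision, recall and F1.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption; the
    abstract does not say how its scores were averaged.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Counts taken from the abstract: 90 of 104 test samples were correct.
test_accuracy = accuracy(90, 104)
print(f"test accuracy: {test_accuracy:.1%}")  # 86.5%

# Hypothetical 4-class confusion matrix (20 samples), NOT the paper's results.
toy_confusion = [
    [4, 1, 0, 0],
    [0, 5, 0, 0],
    [1, 0, 4, 0],
    [0, 0, 0, 5],
]
toy_p, toy_r, toy_f1 = macro_prf(toy_confusion)
print(f"toy macro P/R/F1: {toy_p:.3f} / {toy_r:.3f} / {toy_f1:.3f}")
```

With per-class results from the actual experiment in place of the toy matrix, the same function would show whether the reported 86.5%, 86.6% and 86.6% correspond to macro-averaged or differently weighted scores.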