Cultural emotion modeling for music faces three problems: highly subjective labels, marked differences between cultural contexts, and the high cost of deploying complex models. To improve the usability and reliability of lightweight models for music emotion recognition, this paper proposes a selective knowledge distillation method. The method takes audio spectrograms, lyric semantics, and cultural labels as multi-source inputs. A teacher model generates emotional soft labels and hidden-layer representations, and the distillation weight is adjusted dynamically by combining prediction confidence, modal consistency, and category reliability, so that the student model preferentially absorbs high-confidence emotional knowledge. Experimental results show that the proposed method achieves an accuracy of 0.837, a Macro-F1 of 0.819, and an AUROC of 0.914 in the comprehensive test, while reducing the expected calibration error (ECE) to 0.052. Its performance is close to that of the full teacher model, with 19.2M parameters and an inference latency of 9.1 ms. In cross-dataset validation, the model's AUROC remains between 0.837 and 0.876, and the CCC reaches 0.657 under the low-label condition. The results indicate that selective knowledge distillation can effectively balance model compression, emotion discrimination, and confidence calibration, providing reliable technical support for intelligent music recommendation, digital aesthetic education, and human-computer emotional interaction.
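
The per-sample weighting idea can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: confidence is taken as the teacher's maximum class probability, modal consistency as one minus the total variation distance between hypothetical audio-branch and lyrics-branch predictions, category reliability as a hypothetical per-class lookup table, and the three factors are combined by a simple product. The function names (`selective_weight`, `selective_distillation_loss`) are likewise hypothetical, and the hidden-layer representation term is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def selective_weight(teacher_probs, audio_probs, lyrics_probs, class_reliability):
    """Assumed weighting: confidence * modal consistency * category reliability."""
    # Prediction confidence: the teacher's highest class probability.
    confidence = max(teacher_probs)
    # Modal consistency: 1 - total variation distance between the two branches.
    tv_distance = 0.5 * sum(abs(a - b) for a, b in zip(audio_probs, lyrics_probs))
    consistency = 1.0 - tv_distance
    # Category reliability: looked up for the teacher's predicted class.
    predicted_class = teacher_probs.index(confidence)
    reliability = class_reliability[predicted_class]
    return confidence * consistency * reliability


def selective_distillation_loss(student_logits, teacher_logits, audio_probs,
                                lyrics_probs, class_reliability, temperature=2.0):
    """Weighted soft-label distillation loss for a single sample."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    weight = selective_weight(softmax(teacher_logits), audio_probs,
                              lyrics_probs, class_reliability)
    # The T^2 factor is the usual gradient-scale correction for soft targets.
    return weight * temperature ** 2 * kl_divergence(teacher_soft, student_soft)


# Hypothetical reliability scores for four emotion classes.
reliability = [0.9, 0.6, 0.8, 0.7]

# A confident teacher whose audio and lyrics branches agree.
w_clear = selective_weight(softmax([4.0, 0.5, 0.1, 0.2]),
                           [0.8, 0.1, 0.05, 0.05], [0.7, 0.2, 0.05, 0.05],
                           reliability)
# An uncertain teacher whose branches disagree.
w_ambiguous = selective_weight(softmax([1.0, 0.9, 0.8, 0.7]),
                               [0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
                               reliability)
print(w_clear > w_ambiguous)
```

Under these assumptions, samples on which the teacher is confident and the modalities agree contribute more to the student's loss, while ambiguous samples are down-weighted rather than discarded. A product is only one possible combination rule; a weighted sum or a learned gate would fit the same description.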