Selective Knowledge Distillation in Music Cultural Emotion Modeling: A Mechanistic Study of Usability and reliability

Liu, Yang; Wang, Hao; Liu, Yukai

doi:10.65102/is2026910

Research article

Ingegneria Sismica

Volume 43 Issue 2
Pages: 1-22

Author(s): Liu, Yang¹; Wang, Hao²; Liu, Yukai³

¹School of Cruise and Art Design, Jiangsu Maritime Institute, Nanjing, 211170, China

²School of Music and Dance, Nanjing Normal University of Special Education, Nanjing, 211170, China

³School of Computer Science and Engineering, University of New South Wales, Sydney 2052, New South Wales, Australia

Published: 30/04/2026

Cite

Liu, Yang, Hao Wang, and Yukai Liu. “Selective Knowledge Distillation in Music Cultural Emotion Modeling: A Mechanistic Study of Usability and reliability.” Ingegneria Sismica, vol. 43, no. 2, 2026, pp. 1-22, doi:10.65102/is2026910.

https://doi.org/10.65102/is2026910

Abstract

Music cultural emotion modeling faces three problems: emotion labels are highly subjective, cultural contexts differ markedly, and complex models are costly to deploy. To improve the usability and reliability of lightweight models in music emotion recognition, this paper proposes a selective knowledge distillation method. The method takes audio spectrograms, lyric semantics, and cultural labels as multi-source inputs. A teacher model generates emotional soft labels and hidden-layer representations, and the distillation weight is adjusted dynamically by combining prediction confidence, modal consistency, and category reliability, so that the student model preferentially absorbs high-confidence emotional knowledge. Experimental results show that the proposed method achieves an accuracy of 0.837, a Macro-F1 of 0.819, and an AUROC of 0.914 in the comprehensive test, and reduces the expected calibration error (ECE) to 0.052. The performance is close to that of the full teacher model with 19.2M parameters and 9.1 ms inference latency. In cross-dataset validation, the AUROC of the model remains between 0.837 and 0.876, and the concordance correlation coefficient (CCC) reaches 0.657 under the low-label condition. The results show that selective knowledge distillation can effectively balance model compression, emotion discrimination, and confidence calibration, providing reliable technical support for intelligent music recommendation, digital aesthetic education, and human-computer emotional interaction.
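
As an illustration of the selective weighting idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the three signals (prediction confidence, modal consistency, category reliability) are combined by a simple product, that modal consistency is measured as one minus the total variation distance between hypothetical audio-branch and lyric-branch predictions, and that category reliability is a per-class score supplied in advance. All function names, the temperature value, and the combination rule are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_weight(teacher_probs, audio_probs, lyric_probs, class_reliability):
    """Per-sample distillation weight in [0, 1] (assumed product rule)."""
    # Prediction confidence: the teacher's highest class probability.
    confidence = max(teacher_probs)
    # Modal consistency: 1 - total variation distance between the two
    # (hypothetical) modality-specific predictions.
    tv_distance = 0.5 * sum(abs(a - b) for a, b in zip(audio_probs, lyric_probs))
    consistency = 1.0 - tv_distance
    # Category reliability: an assumed per-class score for the predicted class.
    predicted_class = teacher_probs.index(confidence)
    reliability = class_reliability[predicted_class]
    return confidence * consistency * reliability


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def selective_kd_loss(batch, class_reliability, temperature=2.0):
    """Weighted mean of per-sample distillation losses over a batch."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for sample in batch:
        w = selective_weight(
            softmax(sample["teacher_logits"]),
            sample["audio_probs"],
            sample["lyric_probs"],
            class_reliability,
        )
        weighted_sum += w * kd_loss(
            sample["teacher_logits"], sample["student_logits"], temperature
        )
        weight_sum += w
    return weighted_sum / weight_sum if weight_sum > 0 else 0.0


# Usage with made-up numbers for three emotion classes.
batch = [
    {"teacher_logits": [4.0, 0.5, 0.2], "student_logits": [2.0, 1.0, 0.5],
     "audio_probs": [0.85, 0.10, 0.05], "lyric_probs": [0.90, 0.05, 0.05]},
    {"teacher_logits": [0.6, 0.5, 0.4], "student_logits": [0.2, 0.9, 0.3],
     "audio_probs": [0.70, 0.20, 0.10], "lyric_probs": [0.10, 0.30, 0.60]},
]
loss = selective_kd_loss(batch, class_reliability=[0.9, 0.8, 0.7])
```

Under these assumptions, the first sample (confident teacher, agreeing modalities) contributes far more to the loss than the second, which is the behavior the abstract attributes to selective distillation. Other combination rules, such as a learned gate or a weighted sum, would fit the same description.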

Keywords
Music cultural emotion modeling; Selective knowledge distillation; Model reliability; Lightweight emotion recognition