To improve the accuracy of intelligent recognition of students' emotions and the real-time responsiveness of teaching adjustment in the Belarusian folk music classroom, this paper constructs a computational framework that fuses multimodal perception with classroom adaptation decision-making. Based on 72 real classroom recordings, 118 students, and 14 representative works, a multi-source dataset covering acoustic, visual, behavioral, and teaching-context information was established, and a gated fusion recognition model was designed to jointly represent emotional perception, emotional understanding, collaborative participation, and regulatory stability. On this basis, a generation mechanism and real-time feedback process for classroom teaching adaptation strategies were constructed. Experimental results show that the proposed model reaches an accuracy of 91.62% and a Macro-F1 of 90.84%, which are 3.25% and 3.20% higher, respectively, than those of a single Transformer model. The strategy matching rate of the system adaptation group was 91.8%, the average response time was 1.84 s, and classroom participation improved by 18.7%. The research shows that embedding artificial intelligence methods into Belarusian folk music teaching can provide an interpretable and deployable technical path for classroom emotion recognition and precise intervention.
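The abstract does not spell out the fusion mechanism, but the gated fusion it names is commonly implemented by assigning each modality a learned gate in (0, 1) and summing gate-weighted features. The sketch below illustrates that general idea in plain Python; the modality names, weights, and the scalar-gate simplification are illustrative assumptions, not the paper's actual architecture.

```python
import math

def sigmoid(x):
    # Logistic squashing to map a gate score into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(features, gate_weights, gate_biases):
    """Fuse per-modality feature vectors with scalar gates.

    features:     dict modality -> feature vector (equal lengths)
    gate_weights: dict modality -> weight vector for the gate score
    gate_biases:  dict modality -> scalar bias for the gate score

    Hypothetical simplification: each modality's gate is a sigmoid of a
    linear score on its own features; the fused representation is the
    gate-weighted sum of all modality vectors.
    """
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for m, h in features.items():
        score = sum(w * x for w, x in zip(gate_weights[m], h)) + gate_biases[m]
        g = sigmoid(score)  # how much this modality contributes
        fused = [f + g * x for f, x in zip(fused, h)]
    return fused

# Example: two modalities; zero weights/biases give every gate 0.5,
# so the fused vector is half the element-wise sum of the inputs.
feats = {"acoustic": [1.0, 0.0], "visual": [0.0, 1.0]}
zeros = {"acoustic": [0.0, 0.0], "visual": [0.0, 0.0]}
biases = {"acoustic": 0.0, "visual": 0.0}
print(gated_fusion(feats, zeros, biases))  # -> [0.5, 0.5]
```

In a trained model the gate parameters would be learned end to end, letting the network down-weight a noisy modality (e.g. occluded video) frame by frame.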