The integration of artificial intelligence into music education calls for teaching quality evaluation that is objective and computable. In this paper, we propose MusicEval-Net, a multimodal evaluation framework that fuses classroom audio, learner behavior logs, and textual feedback to analyze teaching quality in music courses. The system extracts pitch stability, rhythm deviation, sound intensity, interaction frequency, task completion, and semantic emotion features, and a multimodal deep learning model maps these features to teaching quality indicators. Experiments were conducted in a university music course involving 120 students, 24 teaching sessions, and 1860 labeled samples. In comparison with manual scoring and the unimodal baseline, the proposed model reaches an accuracy of 91.8%, an F1-score of 0.903, an MAE of 0.217, and a Cohen’s κ of 0.84. The model yields stable results across performance evaluation, participation recognition, and feedback consistency judgment, and it is integrated into the manual review stage, providing a computational path toward scalable, data-driven, and traceable teaching quality evaluation for music courses.
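As an illustration of how three of the reported agreement metrics (accuracy, MAE, and Cohen’s κ) are defined, the following minimal sketch computes them for a pair of score lists. The 1-to-4 score scale, the toy values, and the function names are assumptions made for this example only; they are not the paper's data or implementation, and the F1-score is omitted because the averaging scheme is not stated here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples where the two label sequences agree exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between numeric scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    labels = set(count_a) | set(count_b)
    p_expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy scores (assumed 1-4 scale), not taken from the paper.
human = [3, 4, 2, 4, 3, 1, 2, 4]
model = [3, 4, 2, 3, 3, 1, 2, 4]

print("accuracy:", accuracy(human, model))
print("MAE:", mean_absolute_error(human, model))
print("kappa:", round(cohen_kappa(human, model), 3))
```

Accuracy and κ treat the scores as categorical labels, whereas MAE treats them as numeric values, so reporting them together covers both exact agreement and the size of the disagreements.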