To address the narrow range of emotional expression in older adults, the large differences between modalities, and the lack of dynamic adaptation in music interventions, this paper proposes a deep-learning-based emotion recognition and personalized music therapy system for the elderly. The method fuses speech, facial, physiological and interaction information in a framework that combines convolutional encoding, bidirectional temporal modeling and cross-modal attention, enabling stable discrimination of emotional states in older adults. On this basis, a recommendation mechanism combining user preference, historical feedback and music content features is introduced to form a closed loop of “identification-recommendation-update”. Experimental results show that the Accuracy, Macro-F1 and AUC of the proposed model reach 93.84%, 92.47% and 95.12%, respectively. After 4 weeks of intervention, the emotional improvement rate of the experimental group increases to 27.6%, and the average inference latency of the system is 23.6 ms. These results indicate that the method has good application potential in intelligent elderly care and digital music therapy scenarios.
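
As a rough illustration of the closed loop described above, the following minimal sketch shows one possible shape of the fusion, recommendation and update steps. It is not the paper's implementation. The function names, the fixed relevance scores standing in for learned cross-modal attention, the scoring weights `w`, the neutral feedback prior of 0.5 and the update rate are all assumptions made for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(embeddings, relevance):
    """Attention-style fusion: weight each modality embedding by
    softmax(relevance). In a trained model the relevance scores would be
    learned; here they are supplied by hand (assumption)."""
    names = list(embeddings)
    weights = softmax([relevance[n] for n in names])
    dim = len(embeddings[names[0]])
    fused = [
        sum(w * embeddings[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


def recommend(tracks, preference, feedback, target_mood, w=(0.4, 0.3, 0.3)):
    """Pick the track with the highest weighted mix of user preference,
    historical feedback and content match (hypothetical weights)."""
    def score(track):
        content_match = 1.0 - abs(track["valence"] - target_mood)
        return (
            w[0] * preference.get(track["genre"], 0.0)
            + w[1] * feedback.get(track["id"], 0.5)  # 0.5 = neutral prior
            + w[2] * content_match
        )
    return max(tracks, key=score)


def update_feedback(feedback, track_id, reward, rate=0.2):
    """Exponential moving average of observed feedback for one track."""
    old = feedback.get(track_id, 0.5)
    feedback[track_id] = (1 - rate) * old + rate * reward
    return feedback[track_id]
```

In this sketch, `fuse_modalities` would run once per recognition step, `recommend` would select a track for the recognized state, and `update_feedback` would fold the listener's response back into the next recommendation. The moving average is only one simple choice for the update step.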