Human-centered emotional interaction poses three modeling challenges: emotional expression fluctuates continuously, individuals differ significantly, and multimodal heterogeneous data are difficult to model in a unified way. To address them, this paper proposes a personalized deep learning model of emotion dynamics. Drawing on multi-source data (text, speech, facial expression images, and interactive behaviors), it constructs a unified data representation and preprocessing pipeline, and combines multimodal feature encoding, time-series modeling of emotion dynamics, and a personalized adaptation mechanism to jointly characterize the user's emotional state and its trajectory over time. In the experiments, MELD and CMU-MOSEI serve as the main data sources, and model performance is evaluated with Accuracy, Recall, F1-score, and MSE. The proposed model reaches an Accuracy of 88.64%, a Recall of 87.91%, and an F1-score of 88.23%, and its MSE is reduced to 0.098. Compared with the Multimodal MLP comparison model, Accuracy increases by 10.23% and F1-score by 10.41%, indicating that the method is effective and has application potential in emotion recognition, dynamic characterization, and personalized adaptation.
Summary: To meet the needs of human-centered emotional interaction, this paper constructs a personalized deep learning model of emotion dynamics. The model integrates multimodal information (text, speech, expression, and interactive behavior) and performs emotion recognition and emotion-trajectory characterization through time-series modeling and personalized adaptation. Experimental results show that the proposed model outperforms the comparison models in accuracy, F1-score, and mean squared error.
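For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how they might be computed. It is not the paper's evaluation code. Two things are assumptions made for illustration: the macro averaging used for Recall and F1-score (the abstract does not state the averaging scheme), and the hypothetical function names `classification_metrics` and `mean_squared_error`. The emotion labels in the usage note are also illustrative.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for discrete emotion labels.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed

    recalls, f1s = [], []
    for c in labels:
        # A class with no true (or no predicted) samples scores 0.0.
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


def mean_squared_error(y_true, y_pred):
    """Mean squared error for continuous scores (e.g. sentiment intensity)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

As a usage example, `classification_metrics(["joy", "joy", "anger", "neutral"], ["joy", "anger", "anger", "neutral"])` scores three of four predictions as correct, and `mean_squared_error` applies to continuous targets such as sentiment intensity scores.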