Clothing comfort prediction suffers from three problems: single-source data have limited representational capacity, modality-fusion weights are fixed, and continuous comfort scores are predicted with low accuracy. To address these problems, a multimodal data-fusion deep learning prediction model was constructed. The model takes clothing images, human-body heat, humidity and pressure sensing sequences, fabric structural parameters, and environmental variables as inputs. It extracts heterogeneous features with a convolutional branch, a bidirectional gated recurrent unit (BiGRU) branch and a structured-encoding branch. A cross-modal attention layer then allocates the modality weights dynamically, and the model outputs both a comfort level and a continuous comfort score. The model was validated on the GarmentComfort-MM dataset, which contains 7,200 sample groups in total. On the test set, the model achieves an Accuracy, Precision, Recall and F1-score of 93.8%, 93.4%, 92.9% and 93.1%, respectively. For continuous comfort-score prediction, the MAE and RMSE are 0.041 and 0.058, respectively. Removing the sensing branch reduces Accuracy by 4.6 percentage points. This indicates that the dynamic heat, humidity and pressure information contributes substantially to comfort prediction. The research can provide technical support for the intelligent evaluation of clothing comfort.
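The abstract contrasts fixed fusion weights with weights allocated dynamically by a cross-modal attention layer. The following minimal sketch illustrates that idea and assumes each branch has already produced an embedding of the same dimension. It uses plain scaled dot-product attention pooling over the modality embeddings with a single query vector. The query, the toy embeddings, the function names (`attention_fuse`, `dual_heads`), the linear output heads and the sigmoid-bounded score in (0, 1) are all hypothetical. They illustrate the general technique and do not represent the authors' layer or trained parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(
    features: Dict[str, Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Weight each modality embedding by scaled dot-product attention against
    a query vector, so the fusion weights depend on the input (hypothetical)."""
    d = len(query)
    names = list(features)
    for name in names:
        if len(features[name]) != d:
            raise ValueError(f"{name} embedding must have dimension {d}")
    scores = [dot(query, features[name]) / math.sqrt(d) for name in names]
    weights = softmax(scores)
    fused = [sum(w * features[name][i] for w, name in zip(weights, names))
             for i in range(d)]
    return fused, dict(zip(names, weights))


def dual_heads(
    fused: Sequence[float],
    w_cls: Sequence[Sequence[float]],
    b_cls: Sequence[float],
    w_reg: Sequence[float],
    b_reg: float,
) -> Tuple[int, List[float], float]:
    """Hypothetical linear heads: comfort-level probabilities plus a
    continuous score squashed into (0, 1) by a sigmoid (assumed range)."""
    logits = [dot(row, fused) + b for row, b in zip(w_cls, b_cls)]
    probs = softmax(logits)
    level = max(range(len(probs)), key=probs.__getitem__)
    score = 1.0 / (1.0 + math.exp(-(dot(w_reg, fused) + b_reg)))
    return level, probs, score


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the four input modalities.
    feats = {
        "image": [1.0, 0.0],
        "sensing": [0.0, 1.0],
        "fabric": [0.5, 0.5],
        "environment": [0.2, 0.1],
    }
    fused, weights = attention_fuse(feats, query=[0.0, 2.0])
    level, probs, score = dual_heads(
        fused,
        w_cls=[[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]],
        b_cls=[0.0, 0.0, 0.0],
        w_reg=[0.3, 0.7],
        b_reg=0.0,
    )
    print(weights, level, round(score, 3))
```

The weights come from a softmax over input-dependent scores. A different sample therefore shifts emphasis between modalities, which is the property the abstract contrasts with fixed fusion weights. In a trained model, the query and head weights would be learned parameters rather than the hand-picked values used here.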