Recognition of ceramic process evolution currently relies on empirical interpretation, and assisted analysis for cultural relic restoration lacks multi-source collaboration. To address these problems, a multimodal data system is constructed that fuses visible-light images, microscopic images, spectral measurements, three-dimensional morphology and textual descriptions, and an integrated model is proposed that comprises feature encoding, cross-modal fusion, process evolution analysis and assisted-restoration recommendation. A convolutional neural network, a Transformer, a spectral encoding network and a pre-trained language model are used to obtain a unified representation of the heterogeneous information, and multi-task learning is used to perform period identification, kiln-site classification, decorative style discrimination and damage analysis. The dataset was divided into training, validation and test sets in a 70%/15%/15% ratio. The complete model achieved 93.8%, 90.6% and 89.8% on the process evolution recognition, damage detection and assisted-restoration recommendation tasks, respectively, compared with 78.6%, 71.4% and 69.8% for traditional feature-based methods. The results indicate that multimodal learning can effectively improve process knowledge extraction, damage diagnosis and assisted restoration decision-making for ceramic cultural relics.
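As an illustration of the 70%/15%/15% partition mentioned above, the following minimal sketch shuffles a list of sample identifiers and cuts it into three subsets. The function name `split_dataset`, the fixed seed and the use of a plain random shuffle are assumptions made for this example only; the abstract does not state whether the split was random, stratified by class, or grouped by artifact.

```python
import random


def split_dataset(sample_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample IDs and split them into train/validation/test lists.

    Hypothetical helper: a plain random split is assumed here, since the
    source does not describe the splitting procedure.
    """
    ids = list(sample_ids)
    # A seeded local generator keeps the split reproducible.
    random.Random(seed).shuffle(ids)

    n_train = int(round(len(ids) * train_frac))
    n_val = int(round(len(ids) * val_frac))

    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    # The test set takes whatever remains, so no sample is lost to rounding.
    test = ids[n_train + n_val:]
    return train, val, test


# Example with 1000 placeholder sample IDs (not the study's data).
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

In a multimodal setting, all modalities belonging to one artifact would normally share a single identifier so that the image, spectrum, 3D scan and text of the same object always fall in the same subset.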