Rural revitalization and county-level industrial upgrading place higher demands on identifying the chain sinking paths of specialized and special new enterprises and on predicting the resulting synergy effects. This paper constructs a multi-modal, data-driven co-evolution model of specialized and special new enterprise chain sinking and rural revitalization. The model integrates enterprise structured data, policy text, spatial location, supply chain collaboration, rural development indicators and auxiliary remote sensing information. It combines Transformer semantic encoding, spatial embedding, heterogeneous graph relational learning, multi-modal attention fusion and GRUN-Temporal Transformer time series prediction to identify enterprise sinking paths, measure synergy effects and predict evolution trends. Experimental results show that, on the chain sinking path recognition task, the proposed model reaches an accuracy of 93.8%, an F1-score of 92.3% and an AUC of 95.1%. With full modal fusion, RMSE decreases from 0.186 to 0.121, R² rises to 0.914, and 93.7% of true values fall within the 95% confidence interval. In the path optimization experiment, the model attains a comprehensive synergy effect score of 0.89, reduces the prediction error to 0.072 and achieves a stability index of 0.93. These results indicate that the method improves both the accuracy of enterprise sinking path identification and the prediction of co-evolution with rural revitalization, providing technical support for digital decision-making on county industrial layout and rural revitalization.