Traditional school-enterprise collaborative education suffers from three problems: insufficient perception of how students' abilities evolve, lagging response to enterprise job demand, and reliance on experience when adjusting training paths. To address them, this paper constructs a dynamic decision-making model for school-enterprise collaborative education driven by reinforcement learning. The model encodes students' learning behavior, course status, practical tasks, enterprise job requirements, and collaborative feedback into 112-dimensional state features. It combines GRU-based time-series modeling, a job demand graph, a cross-domain attention mechanism, and an action mask strategy to form an executable action space covering course recommendation, practical task allocation, tutor matching, and path adjustment. In the strategy optimization stage, a PPO policy network and a multi-objective reward function are introduced so that sequential decisions jointly account for comprehensive ability improvement, job adaptation, learning participation, enterprise satisfaction, and resource consumption. Experiments were conducted on data from 1,260 students, 86 courses, 42 job types, and 26,780 valid interaction samples. The results show that the proposed model achieves a training-program matching degree of 91.7%, raises job suitability to 89.1%, and maintains a stable average reward of 0.86 with a stability score of 0.88. These results provide technical support for intelligent decision-making in school-enterprise collaborative education.
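
To make the action mask strategy and the multi-objective reward more concrete, the following is a minimal sketch and not the implementation used in this paper. The action names, the reward weights, and the helper functions `masked_softmax`, `sample_action`, and `scalar_reward` are all illustrative assumptions; the abstract specifies only the four decision types and the five reward objectives. In a full system the logits would presumably come from the PPO policy network rather than being supplied by hand.

```python
import math
import random

# Hypothetical action labels, one per decision type named in the abstract.
ACTIONS = ["recommend_course", "assign_practice_task", "match_tutor", "adjust_path"]

# Assumed weights (not values from the paper). Resource consumption is
# treated as a cost, so its weight is negative.
WEIGHTS = {
    "ability_gain": 0.30,
    "job_fit": 0.30,
    "participation": 0.15,
    "enterprise_satisfaction": 0.15,
    "resource_cost": -0.10,
}


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask[i] == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def scalar_reward(components):
    """Collapse the five objective scores into one weighted scalar reward."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

A possible usage: if a student has not yet met the prerequisites for a practical task, the corresponding mask entry is set to `False`, so `sample_action` can never select it, and the per-step reward is computed from the five objective scores with `scalar_reward`. A weighted sum is only one way to scalarize multiple objectives; it is used here because it is the simplest to state.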