Static route recommendation in smart tourism scenarios adapts poorly to changes in traffic, weather, passenger flow, and tourist preferences. To address this problem, this paper constructs a personalized dynamic route planning model based on reinforcement learning. The model takes tourists’ historical behavior, real-time context, spatial location, and itinerary constraints as its state input, and combines dynamic action-space screening, a tourist-experience-oriented reward function, and an asynchronous advantage actor-critic (A3C) training mechanism to continuously optimize route recommendations. Experimental results show that the A3C-RL model reaches an HR@5 of 56.8% and an HR@10 of 82.5%, a comprehensive route rationality score of 87.6, a dynamic event transfer success rate of 88.0%, and an average replanning delay of 1.12 s. These results indicate that the proposed method improves route recommendation accuracy, personalized matching, and real-time responsiveness to unexpected events, and provides a feasible computational approach for optimizing smart tourism services.
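
For readers unfamiliar with the HR@K metric reported above, the following is a minimal sketch of how a hit rate at K is commonly computed: the fraction of test cases in which the ground-truth item appears among the top K recommendations. This assumes the standard definition; the abstract does not state the paper's exact evaluation protocol, and the function name `hit_rate_at_k` and the toy attraction data are hypothetical illustrations, not taken from the paper.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(ranked_lists: Sequence[Sequence[Hashable]],
                  targets: Sequence[Hashable],
                  k: int) -> float:
    """Fraction of cases whose target appears in the top-k of its ranked list."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked_lists) != len(targets):
        raise ValueError("ranked_lists and targets must have the same length")
    if not targets:
        return 0.0
    # A case counts as a hit when the ground-truth item is within the first k entries.
    hits = sum(1 for ranked, target in zip(ranked_lists, targets)
               if target in ranked[:k])
    return hits / len(targets)


# Toy data (hypothetical attraction IDs, not from the paper):
# each inner list is one ranked recommendation for one tourist.
recommendations = [
    ["museum", "lake", "temple", "park"],
    ["park", "temple", "museum", "lake"],
    ["lake", "park", "museum", "temple"],
    ["temple", "museum", "lake", "park"],
]
# Assumed ground truth: the attraction each tourist actually visited next.
visited_next = ["lake", "lake", "lake", "lake"]

hr_at_2 = hit_rate_at_k(recommendations, visited_next, k=2)
hr_at_3 = hit_rate_at_k(recommendations, visited_next, k=3)
print(f"HR@2 = {hr_at_2:.2f}, HR@3 = {hr_at_3:.2f}")
```

Because the top-K window only grows with K, HR@K cannot decrease as K increases, which is consistent with the reported HR@10 being higher than HR@5.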