With the development of wearable devices and smart fitness platforms, exercise training is shifting from experience-based prescription to data-driven, personalized decision making. Traditional training plans are difficult to adjust in real time as an individual's state, fatigue, and movement quality change. To address this problem, this paper constructs a reinforcement-learning-driven system for personalized training plan optimization and real-time feedback. The system fuses multi-source data, including heart rate, IMU signals, movement trajectories, training logs, and subjective feedback. It generates a training state vector through time alignment, normalization, feature fusion, and exercise-capacity profiling, and uses a PPO policy network to make dynamic decisions on training item, load intensity, interval time, and feedback type. On 8 weeks of training data from 60 subjects, the PPO model achieves a state recognition accuracy of 94.3% and an F1-score of 93.5%, the RMSE and MAE of training load prediction are reduced to 4.2 and 3.2, respectively, and the latency of the complete feedback loop is 94 ms. The research provides technical support for real-time optimization and safety control of personalized sports training.
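To make the state-construction step concrete, the following is a minimal sketch of how time alignment, normalization, and feature fusion could produce a training state vector from two of the data sources. It is an illustration under stated assumptions, not the system's actual implementation: the function names, the 10 s window, the 10 Hz common grid, the resting and maximum heart-rate bounds, the 0 to 10 subjective rating scale, and the five chosen features are all hypothetical.

```python
import numpy as np


def align_to_grid(t_src, values, t_grid):
    """Linearly interpolate one signal onto a shared time grid (time alignment)."""
    return np.interp(t_grid, t_src, values)


def build_state_vector(hr_t, hr, imu_t, acc_mag, rpe,
                       hr_rest=60.0, hr_max=190.0,
                       window_s=10.0, rate_hz=10.0):
    """Fuse heart rate, IMU acceleration magnitude, and a subjective rating
    into one state vector. All defaults are assumed values for illustration."""
    # Common grid covering the most recent window shared by both streams.
    t_end = min(hr_t[-1], imu_t[-1])
    n = int(window_s * rate_hz)
    t_grid = np.linspace(t_end - window_s, t_end, n)

    # Time alignment: resample both streams onto the same timestamps.
    hr_g = align_to_grid(hr_t, hr, t_grid)
    acc_g = align_to_grid(imu_t, acc_mag, t_grid)

    # Normalization: heart rate as a fraction of the assumed reserve range,
    # acceleration in units of g, subjective rating scaled to [0, 1].
    hr_n = np.clip((hr_g - hr_rest) / (hr_max - hr_rest), 0.0, 1.0)
    acc_n = acc_g / 9.81
    rpe_n = rpe / 10.0

    # Feature fusion: concatenate simple window statistics.
    features = [
        hr_n.mean(),          # average cardiovascular load in the window
        hr_n[-1] - hr_n[0],   # heart-rate trend across the window
        acc_n.mean(),         # average movement intensity
        acc_n.std(),          # movement variability
        rpe_n,                # subjective feedback
    ]
    return np.array(features, dtype=np.float32)
```

Calling `build_state_vector` with a 1 Hz heart-rate series and a 50 Hz IMU series returns a five-element vector. A vector of this kind is what a policy network would take as its observation at each decision step.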