In this paper, we propose a control policy optimization method with state-space dimension reduction, based on policy gradient learning, for continuous control in autonomous robot navigation in complex environments. The method compresses and encodes the position, speed, heading angle, obstacle distance, and target-related information of the original navigation state into a low-dimensional state representation. This representation is trained jointly with the policy network to reduce the interference of redundant features with action search and to improve the stability and convergence efficiency of the continuous control output. Experiments in simulation and on a real robot platform, with PPO, DDPG, and a variant without the dimension reduction module as baselines, show that the proposed method reaches a navigation success rate of 96.7%, reduces the average path length to 18.6 m, keeps the decision delay at 0.041 s, and yields a training reward that stabilizes after 420 rounds. On the real platform, the robot reached the task target point in 19 of 20 test rounds. Ablation experiments further show that the state-space dimension reduction module contributes significantly to control smoothness, performance in complex scenes, and the stability of dynamic obstacle avoidance, providing more stable policy search boundaries and more efficient online deployment for robot navigation tasks.
Summary: This paper proposes a control policy optimization method with state-space dimension reduction for robot navigation, based on policy gradient learning. Experiments show that the method reaches a navigation success rate of 96.7%, reduces the average path length to 18.6 m, limits the decision delay to 0.041 s, and achieves stable convergence after 420 iterations.
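The idea of training a compressed state representation jointly with the policy can be illustrated with a minimal sketch. It is not the architecture evaluated in this paper: the linear encoder, the fixed-variance Gaussian policy, the plain REINFORCE-style update, the class name `EncodedGaussianPolicy`, and the dimensions (8 raw state features, 3 latent features, 2 control outputs) are all assumptions chosen for brevity.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random initial weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class EncodedGaussianPolicy:
    """Hypothetical stand-in: encoder z = W s, policy a ~ N(V z, sigma^2 I)."""

    def __init__(self, state_dim, latent_dim, action_dim, sigma=0.3, seed=0):
        self.rng = random.Random(seed)
        self.W = init_matrix(latent_dim, state_dim, self.rng)   # encoder weights
        self.V = init_matrix(action_dim, latent_dim, self.rng)  # policy weights
        self.sigma = sigma

    def encode(self, s):
        """Compress the raw navigation state into the low-dimensional code."""
        return matvec(self.W, s)

    def mean(self, s):
        """Mean control output for raw state s."""
        return matvec(self.V, self.encode(s))

    def sample(self, s):
        """Draw a continuous action around the mean."""
        return [m + self.rng.gauss(0.0, self.sigma) for m in self.mean(s)]

    def update(self, s, a, advantage, lr=0.01):
        """One policy-gradient step on encoder and policy weights together."""
        z = self.encode(s)
        mu = matvec(self.V, z)
        # Gradient of log N(a; mu, sigma^2 I) with respect to mu.
        g_mu = [(ai - mi) / self.sigma ** 2 for ai, mi in zip(a, mu)]
        # Back-propagate to the latent code (uses V before it is changed).
        g_z = [sum(self.V[i][j] * g_mu[i] for i in range(len(g_mu)))
               for j in range(len(z))]
        for i, gi in enumerate(g_mu):
            for j, zj in enumerate(z):
                self.V[i][j] += lr * advantage * gi * zj
        for j, gj in enumerate(g_z):
            for k, sk in enumerate(s):
                self.W[j][k] += lr * advantage * gj * sk
```

Because the gradient of the log-probability flows through `V` into `W`, the encoder is shaped by the control objective rather than by a separate reconstruction loss, which is one plausible reading of "jointly trained". A practical system would replace both linear maps with neural networks and the single-sample update with a batched estimator such as the one used in PPO.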