Ingegneria Sismica

Optimization method of robot navigation state space dimension reduction control policy based on policy gradient learning

Author(s): Yunfeng Gao¹, Jianan Li¹, Bingbing Pan¹
¹Instrument and Electronics School, North University of China, Taiyuan 030051, China
Gao, Yunfeng, Li, Jianan, and Pan, Bingbing. “Optimization method of robot navigation state space dimension reduction control policy based on policy gradient learning.” Ingegneria Sismica, Volume 43, Issue 1: 1-23, doi:10.65102/is2026095.

Abstract

This paper proposes a control policy optimization method with state space dimension reduction, based on policy gradient learning, for continuous control in autonomous robot navigation in complex environments. The method compresses and encodes the position, speed, course angle, obstacle distance, and target association information of the original navigation state into a low-dimensional state representation. This representation is trained jointly with the policy network to weaken the interference of redundant features on action search and to improve the stability and convergence efficiency of the continuous control output. Experiments in a simulation environment and on a real robot platform compare the proposed method with PPO, DDPG, and a variant without the dimension reduction module: the proposed method reaches a navigation success rate of 96.7%, reduces the average path length to 18.6 m, keeps the decision delay at 0.041 s, and its training reward stabilizes after 420 rounds. On the real platform, the robot reached the task target point in 19 of 20 test rounds. Ablation experiments further show that the state space dimension reduction module provides significant support for control smoothness, performance in complex scenes, and the stability of dynamic obstacle avoidance, giving robot navigation tasks more stable policy search boundaries and more efficient online deployment.

Summary: This article proposes a control strategy optimization method with state space dimensionality reduction for robot navigation, based on policy gradient learning. Experiments show that the method's navigation success rate reaches 96.7%, the average path length is reduced to 18.6 m, the decision delay is limited to 0.041 s, and stable convergence is reached after 420 iterations.
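
The abstract describes an encoder that compresses the raw navigation state into a low-dimensional representation and is trained jointly with the policy. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not give the network architecture, the dimensions, or the exact policy gradient variant, so this example assumes a single tanh encoder layer, a linear Gaussian policy head with fixed standard deviation, and plain REINFORCE as the simplest policy gradient estimator. All function names, sizes, and hyperparameters here are hypothetical, and the code uses only the Python standard library.

```python
import math
import random

def init_weights(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows (hypothetical init)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W_enc, W_pol, state):
    """Encode the raw state, then map the code to a mean action."""
    z = [math.tanh(v) for v in matvec(W_enc, state)]  # low-dimensional code
    mu = matvec(W_pol, z)                              # Gaussian policy mean
    return z, mu

def sample_action(W_enc, W_pol, state, sigma, rng):
    """Draw a continuous action from N(mu(state), sigma^2 I)."""
    _, mu = forward(W_enc, W_pol, state)
    return [rng.gauss(m, sigma) for m in mu]

def log_prob(W_enc, W_pol, state, action, sigma):
    """Log-density of `action` under N(mu(state), sigma^2 I)."""
    _, mu = forward(W_enc, W_pol, state)
    return sum(
        -0.5 * ((a - m) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        for a, m in zip(action, mu)
    )

def log_prob_grads(W_enc, W_pol, state, action, sigma):
    """Gradients of the log-probability w.r.t. encoder and policy weights."""
    z, mu = forward(W_enc, W_pol, state)
    g_mu = [(a - m) / sigma ** 2 for a, m in zip(action, mu)]
    g_pol = [[g * zj for zj in z] for g in g_mu]
    # Backpropagate through the policy head into the encoder, so that both
    # parts receive the same learning signal (joint training).
    g_z = [sum(W_pol[i][j] * g_mu[i] for i in range(len(g_mu)))
           for j in range(len(z))]
    g_pre = [gz * (1.0 - zj ** 2) for gz, zj in zip(g_z, z)]  # tanh derivative
    g_enc = [[gp * s for s in state] for gp in g_pre]
    return g_enc, g_pol

def reinforce_step(W_enc, W_pol, episode, sigma=0.2, lr=1e-2, gamma=0.99):
    """One REINFORCE update from a list of (state, action, reward) tuples."""
    # Discounted return from each time step to the end of the episode.
    ret, returns = 0.0, []
    for _, _, reward in reversed(episode):
        ret = reward + gamma * ret
        returns.append(ret)
    returns.reverse()
    # Accumulate return-weighted gradients before touching the weights.
    acc_enc = [[0.0] * len(W_enc[0]) for _ in W_enc]
    acc_pol = [[0.0] * len(W_pol[0]) for _ in W_pol]
    for (state, action, _), G in zip(episode, returns):
        g_enc, g_pol = log_prob_grads(W_enc, W_pol, state, action, sigma)
        for acc, grad in ((acc_enc, g_enc), (acc_pol, g_pol)):
            for i, row in enumerate(grad):
                for j, gij in enumerate(row):
                    acc[i][j] += G * gij
    # Gradient ascent on expected return, applied in place to both matrices.
    for W, acc in ((W_enc, acc_enc), (W_pol, acc_pol)):
        for i, row in enumerate(acc):
            for j, aij in enumerate(row):
                W[i][j] += lr * aij
```

In use, one would create `W_enc` with shape (code size × raw state size) and `W_pol` with shape (action size × code size), collect an episode by calling `sample_action` in the simulator, and pass the recorded `(state, action, reward)` tuples to `reinforce_step`. Because the policy head only sees the code `z`, the action search runs in the reduced space, while the encoder is shaped by the same return signal rather than by a separate reconstruction objective.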

Keywords
Policy gradient learning; Robot navigation; State space dimension reduction; Control strategy optimization
