Ingegneria Sismica

Task-Consistent Bayesian Domain Inference via Performance Distributions for Deep Reinforcement Learning Policy Deployment

Author(s): Xiang Fu1, Kewei Chen1
1Faculty of Mechanical Engineering & Mechanics, Ningbo University, Ningbo, 315211, China
Fu, Xiang, and Chen, Kewei. “Task-Consistent Bayesian Domain Inference via Performance Distributions for Deep Reinforcement Learning Policy Deployment.” Ingegneria Sismica Volume 43 Issue 3: 1-23, doi:10.65102/is20261261.

Abstract

Although deep reinforcement learning (DRL) has achieved remarkable success in robotics, policies learned in simulation often suffer severe performance drops in the real world because of the reality gap. To narrow this gap, we propose a Task-Consistent Bayesian Inference (TCBI) framework for sim-to-real transfer. Rather than relying on intractable dynamics likelihoods or matching high-dimensional trajectories, TCBI builds a task-level pseudo-likelihood from the divergence between simulated and real performance distributions. In our formulation, reward statistics, body posture distributions, and contact time ratios serve as compact, task-oriented performance statistics that characterize the task-specific domain discrepancy. This design supports effective and robust likelihood-free Bayesian inference while improving computational efficiency. We demonstrate the framework on a six-legged robot in a balance task and a forward locomotion task. Experimental results show that TCBI consistently lowers the reward distribution disparity and achieves better real-world performance than domain randomization, approximate Bayesian computation (ABC), and simulation optimization (SimOpt). Ablation studies show that incorporating reward, posture, and contact statistics improves posterior identifiability and policy stability compared with using the reward distribution alone. Moreover, posterior variance analysis shows that the parameters concentrate progressively during inference, and a wall-clock time comparison shows that the computational cost of the method is much lower than that of trajectory-based methods. Robustness experiments under sensor noise further verify the stability and generalization capability of the proposed method. Together, these results indicate that task-level probabilistic inference offers an efficient, robust, and scalable solution for sim-to-real deployment of reinforcement learning policies.
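The idea of a task-level pseudo-likelihood can be illustrated with a minimal sketch. This is not the authors' implementation: the toy simulator `simulate_rewards`, the uniform prior range, the choice of the 1-D Wasserstein distance as the divergence, and the `temperature` value are all assumptions made for illustration. Only reward samples are compared here; the abstract indicates that posture and contact statistics are used as well.

```python
import numpy as np


def wasserstein_1d(a, b):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def simulate_rewards(theta, n_episodes, rng):
    """Hypothetical stand-in for simulator rollouts: episode rewards whose
    mean scales with a single domain parameter theta (e.g. a friction value)."""
    return 10.0 * theta + 0.5 * rng.standard_normal(n_episodes)


def infer_posterior(real_rewards, n_candidates=1000, temperature=0.2, seed=0):
    """Weight prior samples by the pseudo-likelihood exp(-divergence / temperature)."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(0.2, 1.0, size=n_candidates)  # assumed uniform prior
    divergences = np.array([
        wasserstein_1d(simulate_rewards(t, real_rewards.size, rng), real_rewards)
        for t in thetas
    ])
    log_w = -divergences / temperature
    weights = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    weights /= weights.sum()  # normalize to a discrete posterior over candidates
    return thetas, weights


# Synthetic "real" rewards generated with theta = 0.6 (illustration only).
real_rewards = simulate_rewards(0.6, 200, np.random.default_rng(42))
thetas, weights = infer_posterior(real_rewards)
post_mean = float(np.sum(weights * thetas))
post_std = float(np.sqrt(np.sum(weights * (thetas - post_mean) ** 2)))
print(f"posterior mean {post_mean:.3f}, posterior std {post_std:.3f}")
```

In this toy setting the weighted candidates concentrate around the parameter that generated the "real" rewards, without ever evaluating a dynamics likelihood or comparing full trajectories. A smaller `temperature` sharpens the posterior at the cost of fewer effective samples.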

Keywords
Deep Reinforcement Learning; Sim-to-Real Transfer; Domain Adaptation; Bayesian Inference; Simulation Optimization
