Task-Consistent Bayesian Domain Inference via Performance Distributions for Deep Reinforcement Learning Policy Deployment

Fu, Xiang; Chen, Kewei

doi:10.65102/is20261261

Research article

Ingegneria Sismica

Volume 43 Issue 3
Pages: 1-23


Author(s): Xiang Fu¹, Kewei Chen¹

¹Faculty of Mechanical Engineering & Mechanics, Ningbo University, Ningbo, 315211, China

Published: 10/06/2026

Cite

Fu, Xiang, and Chen, Kewei. “Task-Consistent Bayesian Domain Inference via Performance Distributions for Deep Reinforcement Learning Policy Deployment.” Ingegneria Sismica, Volume 43, Issue 3: 1-23, doi:10.65102/is20261261.

https://doi.org/10.65102/is20261261

Abstract

Although deep reinforcement learning (DRL) has achieved remarkable success in robotics, policies learned in simulation often suffer severe performance drops in the real world because of the reality gap. To reduce this gap, we propose a Task-Consistent Bayesian Inference (TCBI) framework for sim-to-real transfer. Rather than relying on intractable dynamics likelihoods or matching high-dimensional trajectories, TCBI builds a task-level pseudo-likelihood from the divergence between simulated and real performance distributions. In our formulation, reward statistics, body posture distributions, and contact time ratios serve as compact, task-oriented performance statistics that characterize the task-specific domain discrepancy. This design supports robust likelihood-free Bayesian inference and improves computational efficiency. We evaluate the framework on a six-legged robot in a balance task and a forward locomotion task. The experimental results show that TCBI consistently lowers the reward distribution disparity and achieves better real-world performance than domain randomization, approximate Bayesian computation (ABC), and simulation optimization (SimOpt). Ablation studies further show that combining reward, posture, and contact statistics improves posterior identifiability and policy stability compared with using the reward distribution alone. Posterior variance analysis shows that the parameters concentrate progressively during inference, and a wall-clock time comparison shows that the computational cost is much lower than that of trajectory-based methods. Robustness experiments under sensor noise further verify the stability and generalization capability of the proposed method. Together, these results indicate that task-level probabilistic inference offers an efficient, robust, and scalable solution for sim-to-real deployment of reinforcement learning policies.

Keywords
Deep Reinforcement Learning; Sim-to-Real Transfer; Domain Adaptation; Bayesian Inference; Simulation Optimization
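
Illustrative sketch

The abstract describes a task-level pseudo-likelihood built from the divergence between simulated and real performance distributions. The following is a minimal sketch of that idea, not the authors' implementation. It rests on several assumptions that do not come from the article: the divergence is a 1-D Wasserstein distance, the pseudo-likelihood has the form exp(-D / temperature), the posterior is computed on a grid under a uniform prior, and `toy_rollout_stats` is a hypothetical stand-in for running the policy in a simulator with a single domain parameter (here called `friction`).

```python
import numpy as np


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    Assumption: both samples have the same length, so the distance
    reduces to the mean absolute difference of the sorted values.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(a - b)))


def pseudo_log_likelihood(sim_stats, real_stats, temperature=1.0):
    """Log of exp(-D / temperature), with D summed over all statistics.

    The exponential form and the temperature are assumptions of this sketch.
    """
    total = sum(wasserstein_1d(sim_stats[k], real_stats[k]) for k in real_stats)
    return -total / temperature


def toy_rollout_stats(friction, n_episodes, rng):
    """Hypothetical simulator: per-episode performance statistics.

    A real setup would roll out the trained policy in simulation with the
    given domain parameter and record rewards, contact ratios, and so on.
    """
    reward = rng.normal(10.0 * friction, 1.0, n_episodes)
    contact = np.clip(rng.normal(0.4 + 0.2 * friction, 0.05, n_episodes), 0.0, 1.0)
    return {"reward": reward, "contact_ratio": contact}


def infer_posterior(real_stats, grid, n_episodes=200, temperature=0.5, seed=0):
    """Grid posterior over the domain parameter under a uniform prior."""
    rng = np.random.default_rng(seed)
    log_w = np.array([
        pseudo_log_likelihood(
            toy_rollout_stats(theta, n_episodes, rng), real_stats, temperature
        )
        for theta in grid
    ])
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()


# "Real" data generated by the toy model with friction = 0.7 (illustrative only).
real_stats = toy_rollout_stats(0.7, 200, np.random.default_rng(1))
grid = np.linspace(0.1, 1.0, 10)
posterior = infer_posterior(real_stats, grid)
print("MAP estimate of friction:", grid[int(np.argmax(posterior))])
```

In this toy setting the posterior mass concentrates on the grid point nearest the parameter that generated the "real" statistics. No trajectory-level comparison is needed, only per-episode summaries, which is the property the abstract attributes to task-level inference.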

Open Access