Deep Reinforcement Learning for Uncertainty-Aware Dispatch Optimization in Power Systems

Yu, Shuo; Wang, Jingbo; Li, Qiang; Yang, Rui; Tang, Hongyu

doi:10.65102/is2026804

Research article

Ingegneria Sismica

Volume 43 Issue 2
Pages: 1-22

Author(s): Yu, Shuo¹; Wang, Jingbo¹; Li, Qiang¹; Yang, Rui²; Tang, Hongyu²

¹Inner Mongolia Power (Group) Co., Ltd., Saihan District, Hohhot 010020, Inner Mongolia, China

²Beijing Tsintergy Technology Co., Ltd., Haidian District, Beijing 100084, China

Published: 30/04/2026

Cite

Yu, Shuo, et al. “Deep Reinforcement Learning for Uncertainty-Aware Dispatch Optimization in Power Systems.” Ingegneria Sismica, Volume 43, Issue 2: 1-22, doi:10.65102/is2026804.

https://doi.org/10.65102/is2026804

Abstract

This paper proposes an uncertainty-aware deep reinforcement learning method for power system dispatch. It targets the problem that, under high penetration of wind and photovoltaic generation, dispatch decisions are vulnerable to forecast bias, correlated disturbances, and extreme scenarios. First, a dispatch environment is constructed that jointly characterizes the forecast errors of wind power, photovoltaic power, and load, and explicitly embeds the correlated multi-source deviations in the state space. Second, a Soft Actor-Critic (SAC) dispatcher is designed that integrates a risk-sensitive reward and a safety action mapping layer, so that operating cost, wind and solar curtailment, carbon emissions, reserve shortfall, and constraint violations are optimized in a coordinated way. The method is validated on publicly available time-series data using typical days, extreme disturbances, and out-of-sample scenarios not included in the training set. The results show that the total operating cost of the proposed method is 52.47 × 10⁴ CNY/day, which is 3.39% lower than that of the original SAC method and 7.75% lower than that of Model Predictive Control (MPC). The wind and solar curtailment rate is 3.79%, the constraint violation rate is only 0.21%, and the average per-step solution time is 0.045 seconds. The method also shows better stability and generalization under high uncertainty, cross-month testing, and extreme weather conditions. These results indicate that the method can provide practically feasible intelligent decision support for online dispatch in new-type power systems.

Keywords
deep reinforcement learning; uncertainty-aware dispatch; risk-sensitive optimization; safety action mapping; power system operation optimization
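
The abstract names two components, a safety action mapping layer and a risk-sensitive reward, without giving their exact form. The sketch below is one plausible reading, not the authors' implementation. It assumes the safety layer clamps each raw policy output to capacity and ramp limits, and it uses conditional value-at-risk (CVaR) over sampled scenarios as a stand-in risk measure. All names (`Unit`, `safe_action`, `step_cost`, `cvar`, `risk_sensitive_reward`) and all weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Unit:
    p_min: float  # minimum output (MW), illustrative
    p_max: float  # maximum output (MW), illustrative
    ramp: float   # maximum change per step (MW), illustrative


def safe_action(raw: Sequence[float], prev: Sequence[float],
                units: Sequence[Unit]) -> List[float]:
    """Map raw policy outputs in [-1, 1] to set-points inside capacity and
    ramp limits. Assumes each previous set-point is itself feasible."""
    safe = []
    for a, p_prev, u in zip(raw, prev, units):
        a = max(-1.0, min(1.0, a))                      # clip the raw action
        target = u.p_min + 0.5 * (a + 1.0) * (u.p_max - u.p_min)
        lo = max(u.p_min, p_prev - u.ramp)              # ramp-down bound
        hi = min(u.p_max, p_prev + u.ramp)              # ramp-up bound
        safe.append(max(lo, min(hi, target)))
    return safe


def step_cost(dispatch: Sequence[float], renewable: float, load: float,
              price: Sequence[float], w_curtail: float = 1.0,
              w_short: float = 10.0) -> float:
    """Fuel cost plus penalties for curtailed renewables and unmet load.
    The penalty weights are assumptions, not values from the paper."""
    fuel = sum(c * p for c, p in zip(price, dispatch))
    imbalance = sum(dispatch) + renewable - load
    curtailed = min(renewable, max(0.0, imbalance))     # surplus is curtailed
    shortfall = max(0.0, -imbalance)                    # deficit is unmet load
    return fuel + w_curtail * curtailed + w_short * shortfall


def cvar(costs: Sequence[float], alpha: float = 0.9) -> float:
    """Mean of the worst (1 - alpha) fraction of the sampled costs."""
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


def risk_sensitive_reward(dispatch: Sequence[float],
                          scenarios: Sequence[Tuple[float, float]],
                          price: Sequence[float],
                          lam: float = 0.5, alpha: float = 0.9) -> float:
    """Negative blend of expected cost and tail cost over sampled
    (renewable, load) scenarios; lam sets the weight on the tail."""
    costs = [step_cost(dispatch, r, l, price) for r, l in scenarios]
    expected = sum(costs) / len(costs)
    return -((1.0 - lam) * expected + lam * cvar(costs, alpha))


if __name__ == "__main__":
    units = [Unit(p_min=100.0, p_max=300.0, ramp=50.0)]
    dispatch = safe_action([1.0], [150.0], units)       # ramp limit binds
    scenarios = [(50.0, 240.0), (20.0, 240.0)]          # (renewable, load)
    print(dispatch, risk_sensitive_reward(dispatch, scenarios, [2.0], alpha=0.5))
```

Clamping gives the agent a feasible action at every step without solving an optimization problem, which is consistent with a short per-step solution time. A projection onto a richer constraint set, such as network limits or reserve requirements, would need a different mapping than this box-and-ramp clamp.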