Power Systems Computation Conference 2024


Real-Time Power Scheduling Through Reinforcement Learning From Demonstrations

Real-time decision-making in power system scheduling is increasingly important as the integration of renewable energy grows. This paper proposes GridZero-Imitation (GZ-I), a novel framework that leverages Reinforcement Learning from Demonstration (RLfD) to address complex unit commitment (UC) and optimal power flow (OPF) challenges. Unlike traditional RL approaches, which require intricate reward function designs and offer limited performance assurance, our method employs intuitive rewards and expert demonstrations to regularize RL training. The demonstrations can be collected through asynchronous reanalysis by an expert solver, enabling RL to synergize with expert knowledge. Specifically, we adopt a decoupled training approach with two separate policy networks, one RL and one expert. During the Monte Carlo Tree Search (MCTS) process, action candidates from the expert policy provide a guided search mechanism, which is especially helpful in the early stages of training. This framework alleviates the speed bottleneck that physics-based solvers face in online decision-making, and it significantly improves the control performance and convergence speed of RL scheduling agents, as validated by substantial improvements on a 126-node real provincial test case.
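The expert-guided search described above can be illustrated with a minimal sketch: during MCTS node expansion, action candidates are drawn from both the RL policy and the expert policy, with the expert fraction dominating early in training. All names here (the candidate count `k`, the `expert_frac` mixing ratio, and the two policy callables) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def candidate_actions(state, rl_policy, expert_policy, k=8, expert_frac=0.5):
    """Sample k action candidates for MCTS expansion, mixing expert-policy
    and RL-policy proposals; a larger expert_frac biases the search toward
    demonstrated behaviour, which helps in the early training stage."""
    n_expert = int(round(k * expert_frac))
    n_rl = k - n_expert
    cands = [expert_policy(state) for _ in range(n_expert)]
    cands += [rl_policy(state) for _ in range(n_rl)]
    return cands

# Toy usage: scalar "dispatch adjustment" actions drawn from each policy.
rng = np.random.default_rng(0)
rl = lambda s: rng.normal(0.0, 1.0)   # untrained RL proposal (noisy)
expert = lambda s: 0.5                # fixed expert set-point (hypothetical)
acts = candidate_actions(state=None, rl_policy=rl, expert_policy=expert,
                         k=8, expert_frac=0.75)
print(len(acts))
```

In a full implementation these candidates would seed the children of an MCTS node, with the expert fraction annealed down as the RL policy improves.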

Shaohuai Liu
Tsinghua University
China

Jinbo Liu
State Grid Corporation of China
China

Nan Yang
China Electric Power Research Institute
China

Yupeng Huang
China Electric Power Research Institute
China

Qirong Jiang
Tsinghua University
China

Yang Gao
Tsinghua University
China


