January 2024 | Gaochen Cui, Student Member, IEEE, Qing-Shan Jia, Senior Member, IEEE, and Xiaohong Guan, Life Fellow, IEEE
This paper addresses the challenge of coordinating multiple microgrids (MGs) in a distribution network using real-time pricing. The distribution system operator (DSO) sets reference price sequences to incentivize MGs to optimize their generation and charging plans. Due to privacy concerns, MGs may not provide detailed response behaviors, making conventional model-based methods challenging to implement. The paper proposes a bi-level framework where the DSO sets real-time reference prices, and the MGs make generation and charging plans based on these prices. A model-free reinforcement learning (RL) algorithm is applied to optimize the pricing policy when the MGs' response behaviors are unknown. To handle the large action space, a reference policy is incorporated into the RL algorithm to improve training efficiency. Numerical results show that the proposed model-free RL algorithm achieves costs close to those of conventional model-based methods while preserving MG privacy. The algorithm is also effective for MGs with quadratic cost functions and (dis)charging losses.
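To make the bi-level, model-free idea concrete, here is a minimal toy sketch. It is not the authors' algorithm. The DSO (upper level) proposes a price sequence, and a hidden MG (lower level) responds. The DSO only sees a scalar cost, so it treats the MG as a black box. A policy-gradient (REINFORCE) update then learns small offsets around a reference price sequence. Sampling offsets near a reference, rather than searching the full price space, is one simple way a reference policy can narrow exploration.

Everything here is a hypothetical stand-in chosen for illustration. That covers the four-period horizon, the load, grid-price, and generation figures, the threshold response `mg_response`, the cost signal `observe_cost`, the flat reference price, and the hyperparameters. We also assume the DSO can observe total system cost after each episode.

```python
import random

# All quantities below are hypothetical toy values, not data from the paper.
T = 4                                # number of pricing periods
BASE_LOAD = [3.0, 5.0, 8.0, 4.0]     # MG demand per period
GRID_PRICE = [1.5, 2.5, 4.0, 2.0]    # DSO's cost of buying from the upstream grid
GEN_COST = 2.0                       # MG marginal cost of local generation
GEN_MAX = 3.0                        # MG local generation capacity per period


def mg_response(prices):
    """Hidden lower level (hypothetical): the MG generates at full capacity
    whenever the DSO price exceeds its marginal cost, and imports the rest.
    The DSO never inspects this logic (model-free setting)."""
    gen, imports = [], []
    for t in range(T):
        g = GEN_MAX if prices[t] > GEN_COST else 0.0
        gen.append(g)
        imports.append(max(BASE_LOAD[t] - g, 0.0))
    return gen, imports


def observe_cost(prices):
    """Scalar cost the DSO observes after posting `prices` (assumed to be
    grid purchase cost plus MG generation cost)."""
    gen, imports = mg_response(prices)
    grid = sum(GRID_PRICE[t] * imports[t] for t in range(T))
    local = sum(GEN_COST * g for g in gen)
    return grid + local


def train(reference, episodes=2000, sigma=0.3, lr=0.01, seed=0):
    """REINFORCE on a Gaussian policy whose mean is reference + theta.
    Only observed costs are used: no gradient of the MG model is needed."""
    rng = random.Random(seed)
    theta = [0.0] * T        # learned offsets on top of the reference prices
    baseline = None          # moving-average reward baseline (variance reduction)
    for _ in range(episodes):
        eps = [rng.gauss(0.0, 1.0) for _ in range(T)]
        prices = [reference[t] + theta[t] + sigma * eps[t] for t in range(T)]
        reward = -observe_cost(prices)
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        # Score function of a Gaussian mean: (a - mu) / sigma^2 = eps / sigma
        for t in range(T):
            theta[t] += lr * advantage * eps[t] / sigma
        baseline = 0.9 * baseline + 0.1 * reward
    return theta


if __name__ == "__main__":
    reference = [sum(GRID_PRICE) / T] * T   # hypothetical flat reference price
    theta = train(reference)
    learned = [r + th for r, th in zip(reference, theta)]
    print("reference-price cost:", observe_cost(reference))
    print("learned-price cost:  ", observe_cost(learned))
```

In this toy, a flat price of 2.5 makes the MG generate in every period, even in period 0 where grid energy is cheaper than local generation. The learned offsets are meant to correct such cases. Whether they do in a given run depends on the random seed and hyperparameters.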