14 Mar 2024 | Nicholas Zolman, Urban Fasel, J. Nathan Kutz, and Steven L. Brunton
SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning
This paper introduces SINDy-RL, a framework that combines sparse dictionary learning (SINDy) with deep reinforcement learning (DRL) to create efficient, interpretable, and trustworthy models of environment dynamics, reward functions, and control policies. SINDy-RL achieves performance comparable to state-of-the-art DRL algorithms with significantly fewer environment interactions and yields an interpretable control policy that is orders of magnitude smaller than a deep neural network policy.
SINDy is a sparse dictionary learning method that learns a representation of a function as a sparse linear combination of pre-chosen candidate dictionary functions. It has been widely used to discover dynamics in fluid mechanics, including reduced-order models and turbulence closures. SINDy has also been extended to systems with actuation and control and used for designing model predictive control laws.
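For concreteness, the sketch below shows what fitting such a sparse dictionary model with a control input might look like using the open-source PySINDy package; the toy data, polynomial library, and threshold are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np
import pysindy as ps

# Toy trajectory data for a forced, damped oscillator: states x(t) and control u(t)
# sampled at a fixed timestep dt (illustrative data only).
dt = 0.01
t = np.arange(0, 10, dt)
u = np.sin(t).reshape(-1, 1)
x = np.zeros((len(t), 2))
x[0] = [1.0, 0.0]
for k in range(len(t) - 1):
    dx = np.array([x[k, 1], -x[k, 0] - 0.1 * x[k, 1] + u[k, 0]])
    x[k + 1] = x[k] + dt * dx

# SINDy with control: regress the time derivative of x onto a polynomial
# dictionary of (x, u), with sequentially thresholded least squares (STLSQ)
# zeroing out small coefficients to enforce sparsity.
model = ps.SINDy(
    optimizer=ps.STLSQ(threshold=0.05),
    feature_library=ps.PolynomialLibrary(degree=2),
)
model.fit(x, t=dt, u=u)
model.print()  # prints the recovered sparse symbolic equations
```

The printed model is a short list of symbolic terms, which is what makes the learned dynamics easy to inspect.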
SINDy-RL uses sparse dictionary learning to create surrogate environments for training DRL policies. It first collects offline data from the full-order environment, fits an ensemble of SINDy models to approximate the environment's dynamics, and then trains a policy in the surrogate environment using model-free DRL. In systems where the reward is difficult to measure directly, SINDy-RL learns an ensemble of sparse dictionary models to form a surrogate reward function. After training a DRL policy, SINDy-RL uses an ensemble of dictionary models to learn a lightweight, symbolic policy that can be readily transferred to an embedded system.
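As a rough illustration of this Dyna-style loop, the sketch below wraps an ensemble of fitted dictionary models in a gymnasium-compatible environment so that surrogate rollouts can be fed to an off-the-shelf model-free algorithm. The `SurrogateEnv` class, the `ensemble` interface (assumed to map a state-action pair to the next state), and `reward_fn` are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np
import gymnasium as gym


class SurrogateEnv(gym.Env):
    """Environment whose dynamics come from an ensemble of fitted dictionary
    models instead of the full-order simulator (illustrative sketch)."""

    def __init__(self, ensemble, reward_fn, x0_sampler, horizon=200):
        self.ensemble = ensemble      # models with predict(x, u) -> next state (assumed interface)
        self.reward_fn = reward_fn    # measured or surrogate (dictionary) reward
        self.x0_sampler = x0_sampler  # draws initial states, e.g. from the offline data
        self.horizon = horizon
        dim_x, dim_u = 2, 1           # assumed dimensions for this toy sketch
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(dim_x,))
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(dim_u,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.x, self.steps = self.x0_sampler(), 0
        return self.x.copy(), {}

    def step(self, u):
        # Each ensemble member predicts the next state; the mean advances the
        # rollout, and the spread across members measures how much the
        # surrogate models disagree at this state-action pair.
        preds = np.stack([m.predict(self.x, u) for m in self.ensemble])
        self.x = preds.mean(axis=0)
        self.steps += 1
        reward = float(self.reward_fn(self.x, u))
        terminated, truncated = False, self.steps >= self.horizon
        return self.x.copy(), reward, terminated, truncated, {}
```

A policy can then be trained on such a surrogate environment with any model-free algorithm (e.g., PPO), with occasional full-order rollouts collected to refresh the data the ensemble is fit on.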
The paper evaluates SINDy-RL on benchmark continuous-control environments: mechanical systems from the dm_control and OpenAI Gymnasium suites, and fluid systems from the HydroGym suite. The results show that SINDy-RL can improve the sample efficiency of policy training by orders of magnitude by leveraging surrogate experience in an ensemble SINDy (E-SINDy) model environment. It can also learn a surrogate reward when the reward is not directly measurable from observations, and it can reduce the complexity of a neural network policy by learning a sparse, symbolic surrogate policy with comparable performance, smoother control, and improved consistency. Additionally, SINDy-RL can quantify the uncertainty of its learned models, providing insight into their quality.
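Distilling a trained neural-network policy into a sparse symbolic one amounts to sparse regression of the network's actions onto dictionary features of the state. The sketch below uses a Lasso-penalized polynomial fit as a stand-in for the ensemble sparse regression used in the paper; the function name and regularization choices are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso

def distill_policy(nn_policy, state_samples, degree=2, alpha=1e-3):
    """Fit a sparse polynomial surrogate of a neural-network policy by
    regressing its actions onto a dictionary of state features (sketch)."""
    library = PolynomialFeatures(degree=degree)
    theta = library.fit_transform(state_samples)              # dictionary features Theta(x)
    actions = np.vstack([nn_policy(s) for s in state_samples])
    fit = Lasso(alpha=alpha, fit_intercept=False).fit(theta, actions)
    # The nonzero entries of fit.coef_ form a compact symbolic control law.
    return lambda s: fit.predict(library.transform(s.reshape(1, -1)))[0]
```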
The paper demonstrates that SINDy-RL is highly sample efficient, achieving performance comparable to state-of-the-art DRL algorithms with significantly fewer environment interactions. It is particularly effective in environments where the dynamics are complex and the reward function is difficult to measure directly. SINDy-RL is also efficient in terms of computational resources, as it uses lightweight, sparse dictionary models that are fast to train on limited data and provide an interpretable symbolic representation by construction. The paper also shows that DRL training can be accelerated by leveraging even a single SINDy model, an approach demonstrated on simple DRL benchmarks.