23 May 2024 | Hector Kohler, Quentin Delfosse, Riad Akrou, Kristian Kersting, Philippe Preux
This paper introduces INTERPRETER, a method for distilling deep reinforcement learning (RL) policies into interpretable and editable tree programs. The aim is to improve alignment between RL agents and human objectives and to increase trust in automated decision-making systems. INTERPRETER produces tree programs that humans can read and edit, enabling the correction of goal misalignments and the explanation of complex strategies.
The method uses imitation learning to extract tree policies from neural oracles such as deep Q-networks (DQN) and Proximal Policy Optimization (PPO) agents, and then converts these policies into Python programs that humans can readily inspect and edit. The tree programs are built from oblique decision trees, whose internal nodes split on linear combinations of features rather than on a single feature, allowing more accurate and interpretable decisions than axis-aligned trees.
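To make this pipeline concrete, the sketch below shows what a distillation step and a tree-to-Python conversion could look like. It is not the paper's implementation: the `oracle.predict` interface, the Gymnasium-style environment calls, and the `ObliqueNode` and `to_python` names are assumptions made for this example only.

```python
import numpy as np

class ObliqueNode:
    """A tree node that splits on a linear combination of features: w @ x <= b."""
    def __init__(self, w=None, b=None, left=None, right=None, action=None):
        self.w, self.b = w, b          # split weights and threshold (internal node)
        self.left, self.right = left, right
        self.action = action           # action label (leaf node)

def collect_demonstrations(env, oracle, n_steps=10_000):
    """Roll out the neural oracle and record (observation, action) pairs
    for imitation learning. Assumes a Gymnasium-style env API and a
    hypothetical oracle.predict(obs) method."""
    states, actions = [], []
    obs, _ = env.reset()
    for _ in range(n_steps):
        act = oracle.predict(obs)
        states.append(obs)
        actions.append(act)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.array(states), np.array(actions)

def tree_predict(node, x):
    """Evaluate the oblique tree policy on a single observation."""
    while node.action is None:
        node = node.left if node.w @ x <= node.b else node.right
    return node.action

def to_python(node, feature_names, indent="    ", depth=1):
    """Render the oblique tree as the body of a readable Python policy function."""
    pad = indent * depth
    if node.action is not None:
        return f"{pad}return {node.action!r}\n"
    terms = " + ".join(f"{w:.3f} * {name}"
                       for w, name in zip(node.w, feature_names) if w != 0)
    return (f"{pad}if {terms} <= {node.b:.3f}:\n"
            + to_python(node.left, feature_names, indent, depth + 1)
            + f"{pad}else:\n"
            + to_python(node.right, feature_names, indent, depth + 1))

# Example: a hand-built two-leaf tree for a CartPole-like task.
tree = ObliqueNode(
    w=np.array([0.0, 0.0, 1.0, 0.5]), b=0.0,
    left=ObliqueNode(action=0),   # push cart left
    right=ObliqueNode(action=1),  # push cart right
)
print("def policy(x, x_dot, theta, theta_dot):")
print(to_python(tree, ["x", "x_dot", "theta", "theta_dot"]), end="")
```

Running the example prints a small if/else policy over named observation features, which is the kind of editable artifact the paper aims for; the actual method fits the oblique tree to the collected demonstrations rather than building it by hand.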
The paper evaluates INTERPRETER on a variety of RL tasks, including classic control, Atari games, and MuJoCo robot simulations. The results show that INTERPRETER's tree programs can match or exceed the performance of neural oracles, while being more interpretable and easier to edit. The method is also shown to be effective in real-world applications, such as explaining human strategies for soil fertilization.
The paper also discusses the limitations of current interpretable RL methods and suggests future research directions, including the use of more expressive tree programs and the integration of symbolic states. Overall, INTERPRETER provides a promising approach to making RL policies more interpretable and trustworthy, with potential applications in a wide range of domains.