23 May 2024 | Hector Kohler, Quentin Delfosse, Riad Akrou, Kristian Kersting, Philippe Preux
This paper introduces INTERPRETER, a method for distilling deep reinforcement learning (RL) policies into interpretable and editable tree programs. The aim is to improve alignment between RL agents and human objectives and to increase trust in automated decision-making systems. INTERPRETER produces tree programs that humans can read and edit, enabling the correction of goal misalignments and the explanation of complex strategies.
The method uses imitation learning to extract tree policies from neural oracles such as deep Q-networks (DQN) and Proximal Policy Optimization (PPO) agents, and then converts these policies into Python programs that humans can readily inspect and edit. The tree programs are built from oblique decision trees, whose internal nodes split on linear combinations of features rather than on a single feature, allowing more accurate and interpretable decisions than axis-aligned trees.
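To make this pipeline concrete, the sketch below shows what a distillation step and a tree-to-Python conversion could look like. It is not the paper's implementation: the `oracle.predict` interface, the Gymnasium-style environment calls, and the `ObliqueNode` and `to_python` names are assumptions made for this example only.

```python
import numpy as np

class ObliqueNode:
    """A tree node that splits on a linear combination of features: w @ x <= b."""
    def __init__(self, w=None, b=None, left=None, right=None, action=None):
        self.w, self.b = w, b          # split weights and threshold (internal node)
        self.left, self.right = left, right
        self.action = action           # action label (leaf node)

def collect_demonstrations(env, oracle, n_steps=10_000):
    """Roll out the neural oracle and record (observation, action) pairs
    for imitation learning. Assumes a Gymnasium-style env API and a
    hypothetical oracle.predict(obs) method."""
    states, actions = [], []
    obs, _ = env.reset()
    for _ in range(n_steps):
        act = oracle.predict(obs)
        states.append(obs)
        actions.append(act)
        obs, _, terminated, truncated, _ = env.step(act)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.array(states), np.array(actions)

def tree_predict(node, x):
    """Evaluate the oblique tree policy on a single observation."""
    while node.action is None:
        node = node.left if node.w @ x <= node.b else node.right
    return node.action

def to_python(node, feature_names, indent="    ", depth=1):
    """Render the oblique tree as the body of a readable Python policy function."""
    pad = indent * depth
    if node.action is not None:
        return f"{pad}return {node.action!r}\n"
    terms = " + ".join(f"{w:.3f} * {name}"
                       for w, name in zip(node.w, feature_names) if w != 0)
    return (f"{pad}if {terms} <= {node.b:.3f}:\n"
            + to_python(node.left, feature_names, indent, depth + 1)
            + f"{pad}else:\n"
            + to_python(node.right, feature_names, indent, depth + 1))

# Example: a hand-built two-leaf tree for a CartPole-like task.
tree = ObliqueNode(
    w=np.array([0.0, 0.0, 1.0, 0.5]), b=0.0,
    left=ObliqueNode(action=0),   # push cart left
    right=ObliqueNode(action=1),  # push cart right
)
print("def policy(x, x_dot, theta, theta_dot):")
print(to_python(tree, ["x", "x_dot", "theta", "theta_dot"]), end="")
```

Running the example prints a small if/else policy over named observation features, which is the kind of editable artifact the paper aims for; the actual method fits the oblique tree to the collected demonstrations rather than building it by hand.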
The paper evaluates INTERPRETER on a variety of RL tasks, including classic control, Atari games, and MuJoCo robot simulations. The results show that INTERPRETER's tree programs can match or exceed the performance of neural oracles, while being more interpretable and easier to edit. The method is also shown to be effective in real-world applications, such as explaining human strategies for soil fertilization.
The paper also discusses the limitations of current interpretable RL methods and suggests future research directions, including the use of more expressive tree programs and the integration of symbolic states. Overall, INTERPRETER provides a promising approach to making RL policies more interpretable and trustworthy, with potential applications in a wide range of domains.