This paper presents INSIGHT, a neuro-symbolic reinforcement learning (NS-RL) framework that jointly learns structured states and symbolic policies and generates textual explanations for both. The key idea is to use object coordinates as the structured state representation so that symbolic policies can be learned directly from visual input. To overcome the efficiency limitations of prior methods, INSIGHT distills vision foundation models into a lightweight perception module: the module is pre-trained on frame-symbol datasets to predict object coordinates and is then refined with reward signals during policy learning. Symbolic policies are learned with an equation learner (EQL) network under sparsity regularization, and a neural guidance scheme further improves policy learning; code sketches of these components follow below. To reduce the cognitive load of reading symbolic policies, the framework also prompts GPT-4 to produce natural language explanations of policies and individual decisions.

Evaluated on nine Atari tasks and a MetaDrive task, INSIGHT outperforms existing NS-RL approaches, and the results attribute the improvement to refined coordinate prediction for policy-relevant objects. The generated explanations are accessible to non-expert users and reveal patterns in the agent's decision making. The paper also discusses limitations, including the policy class's inability to express certain logical operations and the need for further quantitative evaluation of the explanations. Overall, the framework demonstrates that neuro-symbolic reinforcement learning can achieve explainable decision-making.
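To make the distillation step concrete, here is a minimal sketch of such a perception module in PyTorch. The architecture, slot count, and MSE pre-training objective are assumptions for illustration (the summary does not specify INSIGHT's exact design); during policy learning the same network would additionally receive gradients from the RL objective, which is how coordinate predictions get refined by reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordNet(nn.Module):
    """Lightweight CNN that regresses normalized (x, y) coordinates
    for a fixed number of object slots. Hypothetical stand-in
    architecture, not the paper's exact design."""

    def __init__(self, n_objects: int):
        super().__init__()
        self.n_objects = n_objects
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2 * n_objects)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, 3, H, W) -> coords: (B, n_objects, 2) in [0, 1]
        coords = torch.sigmoid(self.head(self.backbone(frames)))
        return coords.view(-1, self.n_objects, 2)

def pretrain_step(net, frames, pseudo_coords, optimizer):
    """One pre-training step on a frame-symbol pair: the 'symbols' are
    coordinate pseudo-labels distilled from a vision foundation model."""
    loss = F.mse_loss(net(frames), pseudo_coords)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```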
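The symbolic policy itself can be pictured as an equation learner (EQL): a network whose hidden units are algebraic primitives, trained with an L1 penalty so that only a few terms survive and the result can be read off as a closed-form expression over object coordinates. Below is a minimal sketch assuming identity, sine, cosine, and product primitives; the paper's actual function bank and depth may differ.

```python
import torch
import torch.nn as nn

class EQLLayer(nn.Module):
    """One equation-learner layer: a linear map followed by a bank of
    primitive functions whose outputs are concatenated."""

    def __init__(self, in_dim: int, units: int):
        super().__init__()
        self.units = units
        # 3 unary banks (identity, sin, cos) + 2 inputs for the product unit
        self.linear = nn.Linear(in_dim, 5 * units)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z, u = self.linear(x), self.units
        ident = z[:, :u]
        sin = torch.sin(z[:, u:2 * u])
        cos = torch.cos(z[:, 2 * u:3 * u])
        prod = z[:, 3 * u:4 * u] * z[:, 4 * u:5 * u]
        return torch.cat([ident, sin, cos, prod], dim=-1)

class EQLPolicy(nn.Module):
    """Maps structured states (flattened object coordinates) to action
    logits through one EQL layer."""

    def __init__(self, state_dim: int, n_actions: int, units: int = 8):
        super().__init__()
        self.layer = EQLLayer(state_dim, units)
        self.head = nn.Linear(4 * units, n_actions)

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.head(self.layer(coords))

def l1_sparsity(model: nn.Module) -> torch.Tensor:
    """L1 penalty over all weights; drives most terms to zero so the
    surviving expression stays compact and human-readable."""
    return sum(p.abs().sum() for p in model.parameters())
```

In training, the penalty is simply added to the policy objective, e.g. `loss = policy_loss + 1e-3 * l1_sparsity(policy)`; after training, near-zero weights are pruned and the remaining terms are printed as the symbolic policy.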
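Neural guidance is commonly realized as policy distillation: a stronger neural policy supplies target action distributions that the symbolic policy is pushed toward, which stabilizes learning in the restricted EQL function class. A hedged sketch of such a loss follows; this is one plausible instantiation, not necessarily INSIGHT's exact objective.

```python
import torch
import torch.nn.functional as F

def guidance_loss(eql_logits: torch.Tensor,
                  neural_logits: torch.Tensor) -> torch.Tensor:
    """KL divergence from a neural 'teacher' policy's action
    distribution to the symbolic EQL policy's distribution."""
    return F.kl_div(
        F.log_softmax(eql_logits, dim=-1),          # student log-probs
        F.softmax(neural_logits.detach(), dim=-1),  # teacher probs (no grad)
        reduction="batchmean",
    )
```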
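Finally, the explanation pipeline amounts to serializing the learned expression, together with a glossary mapping coordinate variables to on-screen objects, into a GPT-4 prompt. The sketch below uses the official `openai` Python client; the prompt template and the `explain_policy` helper are illustrative assumptions, not the paper's actual pipeline.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_policy(expression: str, variable_glossary: str) -> str:
    """Ask GPT-4 to translate a symbolic policy expression into plain
    language for non-expert users. Hypothetical prompt wording."""
    prompt = (
        "You are explaining a game-playing agent's policy to a non-expert.\n"
        f"Variable meanings: {variable_glossary}\n"
        f"Policy expression: {expression}\n"
        "Describe in plain English when the agent takes each action."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Per-decision explanations would work the same way, with the current coordinate values and the chosen action appended to the prompt.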