The Option-Critic Architecture introduces a method for learning options in reinforcement learning that learns the intra-option policies, the termination conditions, and the policy over options simultaneously, without requiring additional rewards or subgoals. The approach is grounded in policy gradient theorems and supports efficient learning in both discrete and continuous settings. The resulting framework is flexible, learning temporally extended behaviors directly from experience; because it needs no prior specification of subgoals or pseudo-rewards, it applies to a wide range of tasks.
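The two gradient theorems at the heart of the approach can be sketched as follows. The notation below follows the paper as best as can be reconstructed here, so treat the exact conditioning and signs as a sketch to be checked against the original: $\pi_{\omega,\theta}$ is the intra-option policy of option $\omega$ with parameters $\theta$, $\beta_{\omega,\vartheta}$ its termination function with parameters $\vartheta$, $Q_U$ the value of executing an action in a state–option pair, $A_\Omega$ the advantage over options, and $\mu_\Omega$ a discounted occupancy measure over state–option pairs.

```latex
% Intra-option policy gradient theorem:
\frac{\partial Q_\Omega(s_0,\omega_0)}{\partial \theta}
  = \sum_{s,\omega} \mu_\Omega(s,\omega \mid s_0,\omega_0)
    \sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s,\omega,a)

% Termination gradient theorem:
\frac{\partial U(\omega_0, s_1)}{\partial \vartheta}
  = -\sum_{s',\omega} \mu_\Omega(s',\omega \mid s_1,\omega_0)\,
     \frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s',\omega)
```

Intuitively, the first theorem pushes each option's policy toward actions with high value inside that option, while the second lowers an option's termination probability exactly where continuing the option has positive advantage over switching.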
The method is demonstrated in several domains, including navigation, Pinball, and the Arcade Learning Environment, where it outperforms traditional methods. The architecture works with both linear and non-linear function approximators, learns options end-to-end, and achieves good performance on complex tasks. The paper also discusses related work and develops the theoretical foundations of the approach, including proofs of the intra-option policy gradient theorem and the termination gradient theorem. The results show that the option-critic architecture learns options that support planning and decision-making in complex environments.
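To make the architecture concrete, here is a minimal tabular sketch of an option-critic learning loop. Everything about the environment (a hypothetical 6-state chain with the goal on the right) and all hyperparameters are illustrative assumptions, not from the paper; the critic is one-step intra-option Q-learning, and the two actor updates are sample-based versions of the intra-option policy gradient and termination gradient described above.

```python
import numpy as np

# Toy setup (an illustrative assumption, not from the paper): a 6-state chain,
# actions 0 = left / 1 = right, reward 1 on reaching the rightmost state.
n_states, n_actions, n_options = 6, 2, 2
goal = n_states - 1

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
    return s2, (1.0 if s2 == goal else 0.0), s2 == goal

rng = np.random.default_rng(0)
theta = np.zeros((n_options, n_states, n_actions))  # intra-option policy params
varth = np.zeros((n_options, n_states))             # termination params
Q_U = np.zeros((n_options, n_states, n_actions))    # value of (option, state, action)
Q_Omega = np.zeros((n_states, n_options))           # value of the policy over options
gamma, alpha, alpha_c, eps = 0.99, 0.25, 0.5, 0.1

def pi(w, s):  # softmax intra-option policy
    z = np.exp(theta[w, s] - theta[w, s].max())
    return z / z.sum()

def beta(w, s):  # sigmoid termination probability
    return 1.0 / (1.0 + np.exp(-varth[w, s]))

def choose_option(s):  # epsilon-greedy policy over options
    return int(np.argmax(Q_Omega[s])) if rng.random() > eps else int(rng.integers(n_options))

for _ in range(500):
    s, w, done = 0, choose_option(0), False
    while not done:
        p = pi(w, s)
        a = int(rng.choice(n_actions, p=p))
        s2, r, done = step(s, a)
        b = beta(w, s2)
        # Critic: one-step intra-option Q-learning target.
        u = (1 - b) * Q_Omega[s2, w] + b * Q_Omega[s2].max()
        target = r if done else r + gamma * u
        Q_U[w, s, a] += alpha_c * (target - Q_U[w, s, a])
        Q_Omega[s, w] += alpha_c * (target - Q_Omega[s, w])
        # Actor 1: sample-based intra-option policy gradient (softmax log-likelihood).
        grad_log = -p
        grad_log[a] += 1.0
        theta[w, s] += alpha * grad_log * Q_U[w, s, a]
        # Actor 2: termination gradient -- raise termination probability where the
        # option's advantage over the best available option is negative.
        adv = Q_Omega[s2, w] - Q_Omega[s2].max()
        varth[w, s2] -= alpha * b * (1 - b) * adv
        s = s2
        if not done and rng.random() < b:  # option terminates; pick a new one
            w = choose_option(s)
```

A natural design choice here is the sigmoid/softmax parameterization: it keeps both gradient expressions simple (the sigmoid derivative is `b * (1 - b)`) and mirrors the intent of learning all three components, option policies, terminations, and the policy over options, from the same stream of experience.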