The paper introduces the *option-critic* architecture, a novel approach to learning options in reinforcement learning. Options are a framework for representing and learning temporally extended actions, which are crucial for scaling up planning and learning. The authors derive policy gradient theorems for options and propose an architecture that can learn both the internal (intra-option) policies and the termination conditions of options simultaneously, without requiring additional rewards or subgoals. The approach is flexible and efficient, as demonstrated through experiments in both discrete and continuous environments. The method learns meaningful temporally extended behaviors, requiring only that the number of desired options be specified. The paper also discusses related work and limitations, highlighting the potential for further improvements in function approximation and option initiation sets.
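
As a concrete anchor for the gradient theorems mentioned above, the two central results can be sketched as follows. The notation here is assumed for illustration rather than quoted from the paper: $\theta$ parameterizes the intra-option policies $\pi_{\omega,\theta}$, $\vartheta$ parameterizes the termination functions $\beta_{\omega,\vartheta}$, $Q_U(s,\omega,a)$ is the value of taking action $a$ in state $s$ while executing option $\omega$, $A_\Omega$ is the advantage of an option over the value of the state, and $\mu_\Omega$ is a discounted weighting of state-option pairs along trajectories from the given initial condition. Roughly, the intra-option policy gradient and the termination gradient take the forms

$$
\frac{\partial Q_\Omega(s_0,\omega_0)}{\partial \theta}
= \sum_{s,\omega} \mu_\Omega(s,\omega \mid s_0,\omega_0)
\sum_a \frac{\partial \pi_{\omega,\theta}(a \mid s)}{\partial \theta}\, Q_U(s,\omega,a),
$$

$$
\frac{\partial U(s_1,\omega_0)}{\partial \vartheta}
= -\sum_{s',\omega} \mu_\Omega(s',\omega \mid s_1,\omega_0)\,
\frac{\partial \beta_{\omega,\vartheta}(s')}{\partial \vartheta}\, A_\Omega(s',\omega),
$$

where $U$ denotes the expected return upon arriving in a state while an option is still executing. Intuitively, the second expression pushes the termination probability down wherever the current option still has a positive advantage, so options are lengthened exactly where continuing them is expected to pay off.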