Reinforcement learning (RL) is a powerful mechanism for teaching artificial intelligence agents to perform complex tasks, including those in astronomy. This paper provides an overview of modern deep RL and its applications in astronomy. RL has made significant progress in various fields, including game playing (notably chess and Go) and robotics. In astronomy, RL has been applied to telescope automation, observation scheduling, and radio astronomical data processing. The core concepts of RL are the agent, the environment, states, actions, and rewards. RL is formalized as a Markov decision process, in which the agent learns to maximize cumulative reward over time; the Q-function, the value function, and the policy are its key components. Deep RL algorithms, such as Q-learning, actor-critic methods, and model-based RL, are used to handle high-dimensional and continuous problems. Model-based RL uses learned internal models to generate data without interacting with the environment, which is particularly useful in astronomy, where data collection is expensive. The paper discusses several RL algorithms, including deep deterministic policy gradient (DDPG), twin delayed DDPG (TD3), and soft actor-critic (SAC), and their applications in astronomy. It also introduces hint-assisted RL, which incorporates existing knowledge into the training of RL agents. Practical aspects of applying RL in astronomy include the choice of deep learning frameworks and the challenges of data collection and model accuracy. Overall, RL offers a promising approach for automating and improving many aspects of astronomical research.
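The agent-environment loop and the Q-function described above can be illustrated with a minimal tabular Q-learning sketch. The environment here is an invented toy example (a one-dimensional "pointing" task with five discrete positions and a reward at the rightmost one), not a setup from the paper; all constants and names are illustrative assumptions.

```python
import random

# Toy environment (illustrative assumption, not from the paper):
# five discrete positions; reward 1.0 for reaching the rightmost one.
N_STATES = 5          # discrete positions 0..4
ACTIONS = [-1, +1]    # move left / move right
GAMMA = 0.9           # discount factor
ALPHA = 0.1           # learning rate
EPSILON = 0.1         # exploration rate for epsilon-greedy

def step(state, action):
    """Environment transition: clamp to the grid, reward on reaching state 4."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection over the Q-table
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] >= q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the best next-state value
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q

if __name__ == "__main__":
    q = train()
    # Greedy policy per non-terminal state: 0 = left, 1 = right.
    policy = [0 if row[0] >= row[1] else 1 for row in q[:-1]]
    print(policy)
```

Deep RL methods such as DDPG, TD3, and SAC replace this Q-table with a neural network so that continuous, high-dimensional states and actions can be handled, but the underlying update follows the same bootstrapped target.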