This paper introduces a deep learning-based approach to solving high-dimensional parabolic partial differential equations (PDEs), which are notoriously difficult because of the "curse of dimensionality." The method reformulates the PDE as a backward stochastic differential equation (BSDE) and approximates the gradient of the unknown solution with neural networks. The approach resembles deep reinforcement learning, with the BSDE playing the role of the model and the gradient of the solution playing the role of the policy function. Numerical results for the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation show that the algorithm is effective in both accuracy and computational cost. The method opens up new possibilities in economics, finance, operational research, and physics by allowing multiple interacting agents, assets, resources, or particles to be considered simultaneously. The paper also discusses implementation details and the effect of the number of hidden layers on the accuracy of the method.
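
To make the BSDE reformulation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy problem chosen only for illustration: the linear PDE u_t + (1/2) Δu = 0 with terminal condition u(T, x) = |x|^2, whose exact solution is u(t, x) = |x|^2 + d (T - t). Along Brownian paths X_t, the process Y_t = u(t, X_t) satisfies dY = Z · dW with Z = ∇u(t, X_t), so a candidate initial value y0 can be scored by how well the simulated Y_T matches g(X_T). In a deep BSDE solver a neural network would supply Z at each time step and y0 would be trained jointly with it; here the exact gradient 2x stands in for the network so the sketch stays short and checkable. The function name `terminal_mismatch`, its parameters, and the Euler time stepping are assumptions of this sketch.

```python
import random


def terminal_mismatch(y0, x0, T=1.0, n_steps=40, n_paths=1000, seed=0):
    """Monte Carlo estimate of E|g(X_T) - Y_T|^2 for the toy problem
    u_t + 0.5 * Laplacian(u) = 0, u(T, x) = g(x) = |x|^2.

    y0 is the candidate value for u(0, x0). The gradient Z is taken
    from the known exact solution (assumption: a stand-in for the
    neural network a deep BSDE solver would train).
    """
    rng = random.Random(seed)  # fixed seed: same paths for every y0
    dt = T / n_steps
    sqrt_dt = dt ** 0.5
    total = 0.0
    for _ in range(n_paths):
        x = list(x0)
        y = y0
        for _ in range(n_steps):
            # Brownian increment, one component per space dimension
            dw = [rng.gauss(0.0, sqrt_dt) for _ in x]
            # Exact gradient of u(t, x) = |x|^2 + d * (T - t)
            z = [2.0 * xi for xi in x]
            # Euler step of dY = Z . dW (no drift: this PDE has no nonlinearity)
            y += sum(zi * dwi for zi, dwi in zip(z, dw))
            # Euler step of dX = dW
            x = [xi + dwi for xi, dwi in zip(x, dw)]
        g = sum(xi * xi for xi in x)  # terminal condition g(X_T)
        total += (g - y) ** 2
    return total / n_paths


if __name__ == "__main__":
    d, T = 4, 1.0
    x0 = [0.0] * d
    exact = sum(xi * xi for xi in x0) + d * T  # u(0, x0) for the toy problem
    for y0 in (exact - 1.0, exact, exact + 1.0):
        print(f"y0 = {y0:.1f}  mismatch = {terminal_mismatch(y0, x0, T=T):.4f}")
```

The mismatch should be smallest near the exact value u(0, x0), with a residual that comes from the time discretization and shrinks as `n_steps` grows. Replacing the hard-coded `z` with a trainable function of x at each time step, and minimizing the same mismatch over both that function and `y0`, is the idea the abstract describes.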