This paper introduces a parameter-sharing off-policy multi-agent path planning and following approach for unmanned aerial vehicles (UAVs). Unlike traditional grid-based maps, the proposed approach takes laser scan data as input, more closely reflecting real-world sensing. Each UAV trains its policy with the soft actor-critic (SAC) algorithm, processing laser scans end to end to avoid obstacles and reach its goal. The planner uses paths generated by a sampling-based method as following points, which are continuously updated as the UAVs progress. Sharing experiences among agents both enables multi-UAV path planning and accelerates policy convergence. A reward function is designed to encourage UAV movement, addressing the tendency of UAVs to remain stationary early in training and to become overly cautious near the goal. The effectiveness of the approach is validated through simulations, in which three UAVs reach their goal points with an 80% success rate. The main contributions are: establishing a multi-UAV simulation environment, proposing a parameter-sharing off-policy multi-agent path planning and following approach, and designing a reward function that encourages UAV movement.
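
The paper does not include code, but the parameter-sharing, off-policy scheme can be illustrated with a minimal sketch: all UAVs act with a single shared policy and push transitions into one common replay buffer, so every SAC update draws on the whole team's experience. The class and variable names below (SharedReplayBuffer, shared_policy, envs) are hypothetical, not taken from the paper.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """One buffer pooling transitions from every UAV agent."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, action, reward, next_obs, done):
        # Transitions from all agents are stored together, so an
        # off-policy SAC update samples across the whole team.
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# Hypothetical training loop: one policy object is shared by all
# UAVs (parameter sharing), so a single gradient step improves
# every agent at once.
#
# shared_policy = SACPolicy(...)          # assumed SAC network class
# buffer = SharedReplayBuffer()
# for uav_id, env in enumerate(envs):     # one env handle per UAV
#     obs = env.reset()
#     action = shared_policy.act(obs)     # same weights for each UAV
#     next_obs, reward, done = env.step(action)
#     buffer.push(obs, action, reward, next_obs, done)
# batch = buffer.sample(256)              # joint experience for SAC
```

Pooling experience this way is what accelerates convergence relative to training each UAV's policy in isolation.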
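The movement-encouraging reward can likewise be sketched. The exact form and coefficients are not given in this summary, so the shaping below is an assumption: it rewards progress toward the current following point and penalizes near-zero speed, which addresses both failure modes mentioned (idling at the start and over-caution near the goal). All parameter names and values are illustrative.

```python
import numpy as np

def movement_reward(pos, prev_pos, target, vel,
                    k_progress=1.0,   # assumed progress gain
                    k_idle=0.05,      # assumed idling penalty
                    v_min=0.2,        # assumed minimum speed (m/s)
                    arrive_thresh=0.5,
                    arrive_bonus=10.0):
    """Hypothetical shaping term encouraging the UAV to keep moving
    toward its current following point."""
    # Positive when the UAV moved closer to the target this step.
    progress = (np.linalg.norm(prev_pos - target)
                - np.linalg.norm(pos - target))
    r = k_progress * progress
    # Penalize hovering in place, the stationary failure mode.
    if np.linalg.norm(vel) < v_min:
        r -= k_idle
    # Terminal bonus so slowing to a crawl near the goal never pays.
    if np.linalg.norm(pos - target) < arrive_thresh:
        r += arrive_bonus
    return r
```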