This paper presents a multi-scale image estimation method based on the wavelet transform that effectively removes motion features from multiple videos. The method uses an autoencoder with a sparsity constraint to compress the input signal, extracting effective features and learning optimal unique vectors. An improved convolutional neural network (CNN) is then employed to recognize weak moving objects. Experiments show that the algorithm achieves high accuracy (up to 99.36%) without requiring large-scale training samples, and that it outperforms conventional algorithms. The system design uses a U-Net for efficient feature fusion and semantic segmentation, and a CNN for character recognition. Simulation experiments validate the proposed algorithm and show improved accuracy over existing methods. The study highlights the effectiveness of the proposed approach for recognizing weak moving objects in image sequences.
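The abstract does not specify the autoencoder's architecture or the exact form of its sparsity constraint. To make the idea of "compressing the input under a sparsity constraint" concrete, the following minimal sketch assumes the widely used Kullback-Leibler (KL) sparsity penalty on a single-hidden-layer sigmoid autoencoder. The target activation `rho`, the weight `beta`, the layer sizes, and all function names are illustrative assumptions, not the authors' implementation, and the sketch omits training. It only evaluates the objective: the reconstruction error plus a term that pushes each hidden unit's mean activation toward a small target value.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Layer output y_j = sigmoid(sum_i w[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(wji * xi for wji, xi in zip(wj, x)) + bj)
            for wj, bj in zip(w, b)]


def kl_sparsity(rho: float, rho_hat: float) -> float:
    """KL(rho || rho_hat) between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparse_ae_loss(batch: List[Vector], w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector,
                   rho: float = 0.05, beta: float = 3.0) -> Tuple[float, Vector]:
    """Assumed objective: reconstruction error + beta * sum_j KL(rho || rho_hat_j).

    rho_hat_j is the mean activation of hidden unit j over the batch; the
    penalty is zero only when every rho_hat_j equals the small target rho.
    """
    n = len(batch)
    hidden = [affine_sigmoid(x, w1, b1) for x in batch]  # encoder: compressed code
    recon = [affine_sigmoid(h, w2, b2) for h in hidden]  # decoder: reconstruction
    mse = sum(sum((r - xi) ** 2 for r, xi in zip(rec, x))
              for rec, x in zip(recon, batch)) / (2 * n)
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(len(b1))]
    penalty = sum(kl_sparsity(rho, rj) for rj in rho_hat)
    return mse + beta * penalty, rho_hat


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights (hypothetical initialization scheme)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_in, n_hidden = 8, 3  # hypothetical sizes: 8 inputs compressed to 3 hidden units
    batch = [[rng.random() for _ in range(n_in)] for _ in range(4)]  # inputs in [0, 1]
    w1, b1 = random_matrix(n_hidden, n_in, rng), [0.0] * n_hidden
    w2, b2 = random_matrix(n_in, n_hidden, rng), [0.0] * n_in
    loss, rho_hat = sparse_ae_loss(batch, w1, b1, w2, b2)
    print(f"loss={loss:.4f}, mean hidden activations={[round(r, 3) for r in rho_hat]}")
```

With small random weights the hidden activations sit near 0.5, so the KL term dominates the loss. Minimizing this objective during training would drive most hidden units toward near-zero average activity, which is what "sparse" compression means under this assumed formulation.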