This paper proposes the Gathering Cascaded Dilated DETR (GCD-DETR) model for multi-scale fusion detection of uncrewed aerial vehicles (UAVs). The main innovations are: (1) the Dilated Re-param Block is applied to the Dilation-wise Residual module, combining large-kernel convolution with parallel small-kernel convolution to fuse multi-scale feature maps and strengthen feature extraction; (2) the Gather-and-Distribute mechanism improves multi-scale feature fusion, letting the model make effective use of backbone network features; (3) the Cascaded Group Attention mechanism reduces computational cost and increases attention diversity, which improves the processing of complex scenes. Experiments on two UAV datasets show that the improved RT-DETR model achieves accuracies of 0.956 and 0.978, which are 2% and 1.1% higher than the original model, with a 10 FPS improvement in speed. The model thus balances accuracy and speed effectively.

UAV detection is crucial for safety and efficiency, but traditional methods face challenges such as limited detection range and environmental interference. Image-based detection with advanced algorithms offers adaptability and accuracy. Recent work such as DETR, RT-DETR, and attention mechanisms has improved detection, but computational efficiency remains a challenge. GCD-DETR addresses these issues through multi-scale feature extraction, attention mechanisms, and efficient fusion, and it outperforms existing models in both accuracy and efficiency on UAV detection.
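Innovation (1) pairs a large-kernel convolution with a parallel small-kernel branch. The sketch below is not the authors' implementation. It only illustrates the general property that makes such re-parameterization possible: a dilated small kernel equals a sparse dense kernel, so the parallel branches can be folded into a single large kernel after training. It assumes a single channel, stride 1, zero padding, and no batch normalization. All function names are hypothetical.

```python
def dilate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between the taps of a square kernel."""
    k = len(kernel)
    size = dilation * (k - 1) + 1
    dense = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dense[i * dilation][j * dilation] = kernel[i][j]
    return dense


def pad_kernel(kernel, size):
    """Zero-pad a square odd-sized kernel to size x size, keeping it centered."""
    k = len(kernel)
    off = (size - k) // 2
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i + off][j + off] = kernel[i][j]
    return out


def merge_branches(large, small, dilation):
    """Fold a dilated small-kernel branch into the large kernel (assumed setup)."""
    size = len(large)
    dense = dilate_kernel(small, dilation)
    assert len(dense) <= size, "dilated kernel must fit inside the large kernel"
    dense = pad_kernel(dense, size)
    return [[large[i][j] + dense[i][j] for j in range(size)] for i in range(size)]


def conv2d_same(image, kernel, dilation=1):
    """Single-channel cross-correlation, stride 1, zero padding, same-size output."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = dilation * (k - 1) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + i * dilation - half
                    xx = x + j * dilation - half
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out
```

Under these assumptions, summing the outputs of `conv2d_same(image, large)` and `conv2d_same(image, small, dilation=2)` gives the same feature map as one call to `conv2d_same(image, merge_branches(large, small, 2))`. This is why a multi-branch block can be trained with several branches and then run as a single convolution at inference time.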