This paper introduces deep networks with stochastic depth, a training procedure that trains short networks during learning while retaining the performance of deep networks at test time. During training, each mini-batch randomly drops a subset of layers, bypassing them with the identity function, which both shortens training time and lowers test error. Applied to residual networks, the method makes it practical to scale architectures beyond 1200 layers while remaining effective. Stochastic depth complements existing techniques such as dropout and batch normalization, acting as a regularizer that improves generalization. Experiments show substantial reductions in training time and test error on CIFAR-10, CIFAR-100, SVHN, and ImageNet. The method also allows extremely deep networks (e.g., 1202 layers) to be trained without overfitting, reaching a 4.91% test error on CIFAR-10. Results indicate the method is stable across hyperparameter settings, including a linear decay rule for the per-layer survival probabilities. The approach is simple and practical, and it has the potential to become a valuable tool for training very deep models.
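To make the mechanism concrete, here is a minimal PyTorch sketch of one residual block under stochastic depth, assuming the paper's linear decay rule p_l = 1 - (l/L)(1 - p_L) with p_L = 0.5. The block internals (a simple conv-BN-ReLU stack) and all names such as `StochasticDepthBlock` are illustrative stand-ins, not the authors' implementation.

```python
import torch
import torch.nn as nn


def survival_prob(l: int, L: int, p_L: float = 0.5) -> float:
    """Linear decay rule: p_l = 1 - (l / L) * (1 - p_L)."""
    return 1.0 - (l / L) * (1.0 - p_L)


class StochasticDepthBlock(nn.Module):
    """A residual block that is randomly bypassed during training.

    Illustrative sketch: the residual branch is a simplified
    stand-in for a real ResNet block.
    """

    def __init__(self, channels: int, p_survive: float):
        super().__init__()
        self.p_survive = p_survive
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Bernoulli gate: with probability 1 - p_survive the whole
            # residual branch is skipped and the block is the identity.
            if torch.rand(1).item() < self.p_survive:
                return torch.relu(x + self.residual(x))
            return x
        # At test time every block is active, and the residual branch
        # is scaled by its survival probability.
        return torch.relu(x + self.p_survive * self.residual(x))


# Usage: assign each of L blocks its survival probability via linear decay.
L = 54
blocks = nn.Sequential(
    *[StochasticDepthBlock(16, survival_prob(l, L)) for l in range(1, L + 1)]
)
```

Under this linear decay with p_L = 0.5, the expected number of active blocks per mini-batch is roughly 3L/4, which is consistent with the training-time savings the summary mentions.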