[slides and audio] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

The paper "The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation" by Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, and Yoshua Bengio introduces a novel architecture called Fully Convolutional DenseNet (FC-DenseNet) for semantic image segmentation. The authors extend Densely Connected Convolutional Networks (DenseNets) to the task of semantic segmentation, achieving state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech without the need for additional post-processing or pretraining. The key contributions include:

1. **Architecture Extension**: FC-DenseNet is built from dense blocks, which are iterative concatenations of previous feature maps, and an upsampling path that avoids feature map explosion.
2. **Performance**: The proposed architecture outperforms existing methods on challenging datasets, showing significant improvements in Intersection over Union (IoU) and global accuracy.
3. **Parameter Efficiency**: FC-DenseNet has significantly fewer parameters than other state-of-the-art models, making it more efficient in terms of computational resources.
4. **Deep Supervision**: The architecture naturally induces deep supervision through short paths to all feature maps, enhancing model performance and optimization.

The paper also discusses related work in semantic segmentation, including improvements in upsampling paths, context understanding, and structured output generation.
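The dense-block idea in the first contribution can be illustrated by counting feature maps. The sketch below is not the authors' code: the function name, the `keep_input` flag, and the numbers used are illustrative assumptions. It only tracks channel counts, assuming each layer in a block adds a fixed number of new maps (the growth rate) and that, on the upsampling path, the block's input is not concatenated with its output.

```python
def dense_block_channels(in_channels, n_layers, growth_rate, keep_input=True):
    """Count feature maps through one dense block (hypothetical helper).

    Returns (layer_inputs, out_channels), where layer_inputs[i] is the
    number of maps seen by layer i and out_channels is the block's output.
    """
    layer_inputs = []
    width = in_channels
    for _ in range(n_layers):
        layer_inputs.append(width)  # a layer sees every map stacked so far
        width += growth_rate        # its new maps are concatenated on top
    new_maps = n_layers * growth_rate
    # keep_input=False models the assumption that an upsampling-path block
    # passes on only its newly created maps, which limits channel growth.
    out_channels = in_channels + new_maps if keep_input else new_maps
    return layer_inputs, out_channels


# Illustrative values only: 48 input maps, 4 layers, growth rate 16.
inputs_down, out_down = dense_block_channels(48, 4, 16)
_, out_up = dense_block_channels(48, 4, 16, keep_input=False)
print(inputs_down, out_down, out_up)
```

With the input kept, the output width grows linearly with the number of layers; dropping the input on the way up is what keeps the number of maps from compounding as spatial resolution increases.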
Experimental results on the CamVid and Gatech datasets demonstrate the effectiveness of FC-DenseNet, highlighting its ability to capture fine-grained details. On Gatech, a video dataset, the model operates on single frames and is reported to outperform methods that exploit temporal information.
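The two metrics named in the summary, Intersection over Union (IoU) and global accuracy, can be computed as in the following minimal sketch. It assumes flat, non-empty sequences of integer class labels; the function name and the toy labels are hypothetical, and benchmark-specific details such as ignoring a void class are left out.

```python
def segmentation_metrics(pred, target, num_classes):
    """Per-class IoU, mean IoU, and global accuracy for integer label sequences."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    correct = 0
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1
            correct += 1

    ious = []
    for c in range(num_classes):
        # union = predicted pixels + ground-truth pixels - overlap
        union = pred_count[c] + target_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)

    # Classes absent from both prediction and target are skipped in the mean.
    present = [iou for iou in ious if iou is not None]
    mean_iou = sum(present) / len(present)
    global_accuracy = correct / len(target)
    return ious, mean_iou, global_accuracy


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious, mean_iou, acc = segmentation_metrics(pred, target, num_classes=3)
print(ious, mean_iou, acc)
```

Global accuracy counts every pixel equally, so it is dominated by large classes, while mean IoU weights each class equally. That difference is why segmentation results are usually reported with both.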

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

31 Oct 2017 | Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio