Xception: Deep Learning with Depthwise Separable Convolutions

4 Apr 2017 | François Chollet
This paper presents Xception, a deep convolutional neural network architecture inspired by Inception, in which Inception modules are replaced with depthwise separable convolutions. The author interprets an Inception module as an intermediate step between a regular convolution and a depthwise separable convolution, the latter consisting of a depthwise convolution (a spatial convolution applied to each channel independently) followed by a pointwise (1x1) convolution that mixes information across channels.

The argument rests on the Inception hypothesis: cross-channel correlations and spatial correlations in feature maps are sufficiently decoupled that it is preferable not to map them jointly. The paper traces a continuum from regular convolutions, through Inception modules, to depthwise separable convolutions, with the latter as the extreme point at which cross-channel and spatial mappings are fully factored. Taking the hypothesis to that extreme yields Xception ("Extreme Inception").
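The decomposition is easiest to see in code. The sketch below, written against the Keras API (the framework of the paper's reference implementation), builds a depthwise separable convolution both as two explicit steps and via Keras's fused `SeparableConv2D` layer. The input shape and filter counts (64 channels in, 128 out) are arbitrary illustrative choices, not values from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical toy input for illustration: a 32x32 feature map with 64 channels.
inputs = tf.keras.Input(shape=(32, 32, 64))

# Step 1: depthwise convolution -- one 3x3 spatial filter per input channel,
# applied independently, with no cross-channel mixing.
x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)

# Step 2: pointwise (1x1) convolution -- mixes information across channels.
x = layers.Conv2D(filters=128, kernel_size=1)(x)

# Keras also provides both steps fused into a single layer:
y = layers.SeparableConv2D(filters=128, kernel_size=3, padding="same")(inputs)

# Rough weight count (ignoring biases), which shows where the efficiency comes from:
#   regular 3x3 conv:     3*3*64*128          = 73,728 weights
#   depthwise separable:  3*3*64 + 1*1*64*128 =  8,768 weights
```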
Xception itself is built entirely from depthwise separable convolution layers: a linear stack of 36 convolutional layers structured into modules with residual connections around them, which makes the architecture easy to define and modify.

The architecture is evaluated on two image classification tasks: ImageNet and Google's internal JFT dataset (roughly 350 million images and 17,000 classes). Xception slightly outperforms Inception V3 on ImageNet and achieves a 4.3% relative improvement on JFT, despite having essentially the same number of parameters, which indicates that the gains come from more efficient use of model parameters rather than from increased capacity. Ablations show that the residual connections matter for both convergence and final accuracy, and that omitting the intermediate non-linearity between the depthwise and pointwise operations works better than including one; Xception is slightly slower than Inception V3 per training step. The author concludes that depthwise separable convolutions offer properties similar to Inception modules while being as easy to use as regular convolution layers, and may become a cornerstone of convolutional neural network design, with further improvements likely available through hyperparameter tuning.
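To make the "linear stack with residual connections" concrete, here is a minimal sketch of one middle-flow-style block, simplified relative to the paper's full architecture diagram. It assumes the input already has `filters` channels, so the identity shortcut can be added directly without a projection.

```python
import tensorflow as tf
from tensorflow.keras import layers

def xception_style_block(x: tf.Tensor, filters: int) -> tf.Tensor:
    """One middle-flow-style block: three depthwise separable convolutions
    wrapped in an identity residual connection (simplified sketch)."""
    shortcut = x
    for _ in range(3):
        x = layers.ReLU()(x)
        x = layers.SeparableConv2D(filters, kernel_size=3, padding="same")(x)
        x = layers.BatchNormalization()(x)
    # Identity shortcut; assumes x and shortcut share the same shape.
    return layers.Add()([x, shortcut])

# The paper's middle flow repeats a block of this shape eight times
# at 728 channels on a 19x19 feature map.
inputs = tf.keras.Input(shape=(19, 19, 728))
outputs = xception_style_block(inputs, filters=728)
```

Because the whole network is essentially this pattern repeated, with entry and exit flows around it, the full model can be defined in a few dozen lines, which is the "easy to define and modify" property noted above.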