21 Jun 2016 | Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger
This paper introduces a 3D U-Net for volumetric segmentation that learns from sparse annotations. The network extends the 2D U-Net architecture by replacing all 2D operations with their 3D counterparts, and it is trained end-to-end from scratch, without pre-trained weights. It supports two use cases: a semi-automated mode, where a user annotates a few slices of the volume to be segmented and the network produces a dense 3D segmentation of that same volume, and a fully-automated mode, where the network is trained on a sparsely annotated dataset and then segments new, unseen volumes. Batch normalization is used for faster convergence, and on-the-fly elastic deformations provide data augmentation. The method is evaluated on a complex, highly variable 3D structure, the Xenopus kidney, and achieves good results in both use cases.
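To make the augmentation concrete, below is a minimal sketch of an on-the-fly smooth elastic deformation for a 3D volume: random displacement vectors are drawn on a coarse control grid and smoothly interpolated to a dense deformation field, in the spirit of the paper. The grid size and standard deviation are illustrative assumptions, and the authors' Caffe implementation (B-spline interpolation) may differ in detail.

```python
# Sketch: smooth elastic deformation of a 3D volume (assumed settings).
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def elastic_deform_3d(volume, grid=(3, 3, 3), sigma=4.0, rng=None):
    """Apply a random smooth deformation to a 3D array of shape (Z, Y, X)."""
    rng = np.random.default_rng() if rng is None else rng
    shape = volume.shape
    # Random displacement vectors on a coarse control grid, one set per axis.
    coarse = rng.normal(0.0, sigma, size=(3, *grid))
    # Upsample each displacement component to the full volume resolution.
    factors = [s / g for s, g in zip(shape, grid)]
    dense = [zoom(c, factors, order=3) for c in coarse]
    # Sampling coordinates = identity grid + dense displacement field.
    coords = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    coords = [c + d for c, d in zip(coords, dense)]
    # Resample the volume at the deformed coordinates (linear interpolation).
    return map_coordinates(volume, coords, order=1, mode="reflect")
```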
The architecture consists of an encoder (analysis) path and a decoder (synthesis) path, each with four resolution steps. Each encoder step applies two 3×3×3 convolutions, each followed by batch normalization and a ReLU, and then 2×2×2 max pooling; each decoder step applies a 2×2×2 up-convolution followed by two 3×3×3 convolutions with batch normalization and ReLU. Shortcut connections pass high-resolution features from the encoder to the equally-resolved decoder layers. A final 1×1×1 convolution maps the features to as many output channels as there are labels. The network has 19,069,955 parameters in total and is trained on down-sampled versions of the original data.
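The PyTorch sketch below follows this encoder-decoder structure (two 3×3×3 conv + BN + ReLU blocks per step, 2×2×2 pooling, up-convolutions, shortcut concatenations, and a 1×1×1 output convolution). The channel widths and padded convolutions are assumptions chosen for brevity; the original implementation is in Caffe and uses different feature-map counts.

```python
# Sketch: 3D encoder-decoder with skip connections (assumed channel widths).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, padding=1),
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
    )

class UNet3D(nn.Module):
    def __init__(self, in_ch=3, n_labels=3, widths=(32, 64, 128, 256)):
        super().__init__()
        self.enc = nn.ModuleList()   # encoder blocks, fine to coarse
        ch = in_ch
        for w in widths:
            self.enc.append(conv_block(ch, w))
            ch = w
        self.pool = nn.MaxPool3d(2)
        self.up = nn.ModuleList()    # 2x2x2 up-convolutions
        self.dec = nn.ModuleList()   # decoder blocks
        for w in reversed(widths[:-1]):
            self.up.append(nn.ConvTranspose3d(ch, w, 2, stride=2))
            self.dec.append(conv_block(2 * w, w))  # concatenation doubles channels
            ch = w
        self.head = nn.Conv3d(ch, n_labels, 1)     # final 1x1x1 convolution

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.enc):
            x = block(x)
            if i < len(self.enc) - 1:   # no pooling after the bottleneck
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))   # shortcut connection
        return self.head(x)                        # one score map per label
```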
The dataset consists of three Xenopus kidney embryos at Nieuwkoop-Faber stage 36-37, each recorded in four tiles with three channels. The network is trained on sparsely, manually annotated slices and uses a weighted softmax loss in which unlabeled voxels receive zero weight, so learning is driven only by the annotated regions. Both the semi-automated and the fully-automated setups are evaluated, and both outperform comparable 2D implementations. The results show that the network generalizes from few annotated slices and achieves accurate dense 3D segmentations. The implementation is provided as open source.
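As a sketch of how such a loss ignores unlabeled voxels, the function below computes a per-voxel weighted cross-entropy in which voxels labeled -1 (an assumed convention for "unlabeled") get zero weight, while a per-class weight vector can down-weight frequent background. This illustrates the idea; it is not the authors' exact Caffe implementation.

```python
# Sketch: weighted softmax loss for sparse annotations (assumed -1 = unlabeled).
import torch
import torch.nn.functional as F

def sparse_weighted_loss(logits, labels, class_weights):
    """logits: (B, C, Z, Y, X); labels: (B, Z, Y, X) with -1 where unlabeled."""
    safe = labels.clamp(min=0)                       # map -1 to a valid index
    per_voxel = F.cross_entropy(logits, safe, reduction="none")  # (B, Z, Y, X)
    weights = class_weights[safe]                    # per-class rebalancing
    weights = weights * (labels >= 0)                # zero weight if unlabeled
    return (per_voxel * weights).sum() / weights.sum().clamp(min=1e-8)
```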