2021-10-09 | Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger R. Roth, Daguang Xu
UNETR is a novel transformer-based architecture for 3D medical image segmentation. It reformulates segmentation as a sequence-to-sequence prediction problem: the input volume is embedded as a sequence of patches and passed to a transformer encoder that learns global context and long-range dependencies, while a CNN-based decoder recovers the fine-grained detail needed for voxel-level prediction. Encoder and decoder are connected via skip connections at multiple resolutions to produce the final segmentation map. The model was implemented in PyTorch and MONAI and trained on an NVIDIA DGX-1 server. Validated on the BTCV multi-organ benchmark and on the MSD brain tumor and spleen segmentation tasks, UNETR achieved new state-of-the-art results, outperforming prior methods and reaching Dice scores above 85% for most organs.
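Since the reference implementation ships with MONAI, a minimal sketch of instantiating UNETR for a BTCV-style multi-organ task might look like the following. The hyperparameters (96³ input patches, 14 output classes, hidden size 768, 12 attention heads) are assumptions based on MONAI's documented defaults and the configuration commonly reported for this paper, not a verified reproduction of the training setup.

```python
# Minimal sketch: instantiating UNETR via MONAI for a BTCV-style task.
# The values below are assumed defaults, not the paper's exact training recipe.
import torch
from monai.networks.nets import UNETR

model = UNETR(
    in_channels=1,          # single-channel CT volumes
    out_channels=14,        # e.g. 13 abdominal organs + background (BTCV)
    img_size=(96, 96, 96),  # size of the 3D patch fed to the transformer
    feature_size=16,        # base channel count of the CNN decoder
    hidden_size=768,        # transformer embedding dimension
    mlp_dim=3072,           # feed-forward dimension in each transformer block
    num_heads=12,           # self-attention heads
    norm_name="instance",
    res_block=True,
    dropout_rate=0.0,
)

# Forward pass on a dummy volume: (batch, channel, D, H, W) -> per-voxel logits.
x = torch.randn(1, 1, 96, 96, 96)
logits = model(x)           # shape: (1, 14, 96, 96, 96)
```

In practice the model is applied to full scans with sliding-window inference over 96³ patches, since whole volumes rarely fit in GPU memory at once.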