6 Jul 2021 | Jeya Maria Jose Valanarasu, Poojan Oza, Ilker Hacihaliloglu, Vishal M. Patel
The paper introduces Medical Transformer (MedT), a transformer-based method for medical image segmentation built around a gated axial-attention mechanism. Convolutional neural networks (CNNs) are effective for this task but, owing to their limited receptive fields, struggle to model long-range dependencies within an image; self-attention captures such dependencies but typically requires large labeled datasets, which are scarce in medical imaging. MedT addresses both issues with two contributions.

First, the gated axial-attention layer, the basic building block of the architecture, augments axial self-attention with learnable gates that control how strongly the relative positional encodings influence the encoding of non-local context. When positional information cannot be learned reliably from a small dataset, the gates can suppress it, making the mechanism effective even with limited training data.

Second, a Local-Global (LoGo) training strategy operates at two scales: a global branch processes the entire image to learn long-range, global features, while a local branch processes image patches to learn fine-grained local detail, and the two outputs are fused to produce the final segmentation.

Evaluated on three medical image segmentation datasets, MedT outperforms both CNN baselines and other transformer-based architectures.
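As an illustration of the gating idea (a minimal sketch, not the authors' released implementation), the PyTorch module below applies single-head axial attention along one spatial axis, with learnable scalar gates G_Q, G_K, G_V1, G_V2 on the relative positional terms as in the paper's formulation. The class name, the linear QKV projection, and the softmax scaling are simplifying assumptions; the paper's model is multi-headed and uses convolutional projections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAxialAttention1D(nn.Module):
    """Sketch of single-head gated axial attention along one spatial axis.

    Learnable scalar gates modulate the relative positional terms for
    queries, keys, and values, so the model can down-weight positional
    bias when it is poorly learned (e.g. on small medical datasets).
    """

    def __init__(self, dim: int, axis_len: int):
        super().__init__()
        self.dim = dim
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # assumption: linear projections
        # Relative positional embeddings r^q, r^k, r^v, one vector per pairwise offset.
        self.rel_q = nn.Parameter(torch.randn(2 * axis_len - 1, dim) * 0.02)
        self.rel_k = nn.Parameter(torch.randn(2 * axis_len - 1, dim) * 0.02)
        self.rel_v = nn.Parameter(torch.randn(2 * axis_len - 1, dim) * 0.02)
        # Learnable scalar gates; initialized to 1, i.e. plain axial attention.
        self.gq = nn.Parameter(torch.ones(1))
        self.gk = nn.Parameter(torch.ones(1))
        self.gv1 = nn.Parameter(torch.ones(1))
        self.gv2 = nn.Parameter(torch.ones(1))
        idx = torch.arange(axis_len)
        # (L, L) table of pairwise offsets used to index the embeddings.
        self.register_buffer("rel_idx", idx[:, None] - idx[None, :] + axis_len - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, L, dim) -- the other spatial axis is folded into batch.
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        rq, rk, rv = self.rel_q[self.rel_idx], self.rel_k[self.rel_idx], self.rel_v[self.rel_idx]
        content = torch.einsum("bid,bjd->bij", q, k)            # q^T k
        pos_q = self.gq * torch.einsum("bid,ijd->bij", q, rq)   # G_Q * q^T r^q
        pos_k = self.gk * torch.einsum("bjd,ijd->bij", k, rk)   # G_K * k^T r^k
        attn = F.softmax((content + pos_q + pos_k) / self.dim ** 0.5, dim=-1)
        out = self.gv1 * torch.einsum("bij,bjd->bid", attn, v)          # G_V1 * A v
        out = out + self.gv2 * torch.einsum("bij,ijd->bid", attn, rv)   # G_V2 * A r^v
        return out

# Example: attend along the height axis of a (32 x 32) feature map with
# 64 channels, the width axis folded into the batch dimension.
layer = GatedAxialAttention1D(dim=64, axis_len=32)
y = layer(torch.randn(8 * 32, 32, 64))  # -> (256, 32, 64)
```

Because the gates are trained jointly with the rest of the network, it can learn to shrink them wherever the positional embeddings are unhelpful, which is what lets the layer degrade gracefully on small datasets.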
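The LoGo strategy can be sketched similarly. In the snippet below, LoGoSegmenter, patch_grid, and the placeholder branch modules are hypothetical names for illustration; in the paper, the global branch is a shallow stack of gated-axial-transformer blocks over the whole image and the local branch is a deeper stack over patches, with the two output maps added and fused by a 1x1 convolution.

```python
import torch
import torch.nn as nn

class LoGoSegmenter(nn.Module):
    """Sketch of the Local-Global (LoGo) strategy with placeholder branches."""

    def __init__(self, global_branch: nn.Module, local_branch: nn.Module,
                 out_channels: int, patch_grid: int = 4):
        super().__init__()
        self.global_branch = global_branch  # shallow net on the full image
        self.local_branch = local_branch    # deeper net run per patch
        self.patch_grid = patch_grid
        # 1x1 conv fusing the summed global and local maps into logits.
        self.fuse = nn.Conv2d(out_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g = self.patch_grid
        ph, pw = h // g, w // g
        # Global branch: the whole image, for long-range context.
        out_global = self.global_branch(x)
        # Local branch: a g x g grid of patches, processed independently.
        patches = (x.unfold(2, ph, ph).unfold(3, pw, pw)   # (b, c, g, g, ph, pw)
                    .permute(0, 2, 3, 1, 4, 5)
                    .reshape(b * g * g, c, ph, pw))
        local_out = self.local_branch(patches)             # (b*g*g, C, ph, pw)
        n_out = local_out.shape[1]
        # Stitch the patch outputs back into a full-resolution map.
        out_local = (local_out.reshape(b, g, g, n_out, ph, pw)
                     .permute(0, 3, 1, 4, 2, 5)
                     .reshape(b, n_out, h, w))
        return self.fuse(out_global + out_local)

# Example with trivial stand-in branches (any segmentation backbones with
# matching channel counts would work here):
net = LoGoSegmenter(global_branch=nn.Conv2d(1, 2, 3, padding=1),
                    local_branch=nn.Conv2d(1, 2, 3, padding=1),
                    out_channels=2)
logits = net(torch.randn(1, 1, 128, 128))  # -> (1, 2, 128, 128)
```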