6 February 2024 | Lahiru Gamage, Uditha Isuranga, Dulani Meedeniya, Senuri De Silva, Pratheepan Yogarajah
This paper presents a computational model for identifying melanoma skin cancer using deep learning techniques, specifically convolutional neural networks (CNNs) and vision transformers (ViTs). The models are trained on the HAM10000 dataset of dermoscopic images of melanoma and nevus lesions. Both CNN-based and ViT-based approaches are employed, with a focus on mask-guided techniques to enhance accuracy and explainability.
**Key Contributions:**
1. **Mask-Guided Techniques:** The model incorporates U2-Net for image segmentation to generate lesion masks, which focus the classification process on the relevant regions (a minimal masking sketch follows this list).
2. **Explainability:** Grad-CAM and Grad-CAM++ are applied to generate heatmaps, providing visual explanations of the classification decisions.
3. **Model Performance:** The best CNN-based model (a modified Xception) achieved 98.37% accuracy, while the ViT-based model achieved 92.79%; both showed high sensitivity and specificity.
4. **Web Application:** The proposed model is developed as a web application, achieving a usability score of 86.87%, making it a practical tool for medical practitioners.
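To make the mask-guided idea concrete, here is a minimal sketch of suppressing background pixels with a segmentation mask before classification. The 0.5 binarization threshold and the function name are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def apply_lesion_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out background pixels using a U2-Net saliency mask.

    image: (H, W, 3) float array in [0, 1]
    mask:  (H, W) saliency map in [0, 1] from the segmentation model
    """
    binary = (mask > 0.5).astype(image.dtype)  # assumed threshold
    return image * binary[..., None]           # keep only lesion pixels
```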
**Methodology:**
- **U2-Net for Segmentation:** A deep nested U-structure model for salient object detection (SOD) is used to generate segmentation masks.
- **CNN-Based Classification:** Pre-trained models (Xception, ResNet50, VGG16, InceptionV3, MobileNet) are used with transfer learning and modified architectures to improve performance (see the first sketch after this list).
- **ViT-Based Classification:** The SM-ViT model, which integrates saliency information into the self-attention mechanism, is proposed to enhance foreground object discrimination (see the second sketch after this list).
- **Explainability:** Grad-CAM and Grad-CAM++ are applied to generate heatmaps, providing insights into the model's decision-making process (see the Grad-CAM sketch after this list).
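A minimal transfer-learning sketch, assuming a Keras Xception backbone with a simple binary (melanoma vs. nevus) head; the paper's exact architectural modifications are not reproduced here.

```python
import tensorflow as tf

# Assumed setup: ImageNet-pretrained Xception backbone, frozen for the
# first training stage, topped with a binary classification head.
base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3)
)
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                    # assumed rate
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(melanoma)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```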
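The core SM-ViT idea, folding saliency into self-attention, can be sketched as a bias on the attention scores. The function below is an illustrative approximation under that assumption, not the published formulation.

```python
import tensorflow as tf

def saliency_biased_attention(q, k, v, token_saliency, bias_scale=1.0):
    """Illustrative single-head attention with a saliency prior.

    q, k, v:        (batch, tokens, dim) query/key/value projections
    token_saliency: (batch, tokens) mean U2-Net saliency per image patch
    bias_scale:     assumed hyperparameter weighting the prior
    """
    dim = tf.cast(tf.shape(q)[-1], q.dtype)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(dim)
    # Bias every query toward keys on salient (lesion) patches.
    scores += bias_scale * token_saliency[:, None, :]
    return tf.matmul(tf.nn.softmax(scores, axis=-1), v)
```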
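For the explainability step, a standard Grad-CAM pass over a Keras model looks roughly like this; the layer-name argument and normalization details are assumptions for illustration.

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Standard Grad-CAM heatmap for one image of shape (H, W, 3)."""
    # Model mapping the input to (last conv activations, predictions).
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # per-channel weights
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam /= tf.reduce_max(cam) + tf.keras.backend.epsilon()
    return cam.numpy()  # upsample to the input size for the overlay
```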
**Results:**
- **CNN-Based Models:** The modified Xception model achieved the highest accuracy (98.37%) and specificity (99.01%), while the modified ResNet50 model had the highest sensitivity (97.95%).
- **ViT-Based Models:** The SM-ViT model outperformed the baseline ViT model with an accuracy of 92.79%, sensitivity of 91.09%, and specificity of 93.54%.
**Conclusion:**
The proposed model, combining mask-guided techniques and explainable AI, demonstrates high accuracy and interpretability in melanoma skin cancer identification, making it a valuable tool for medical practitioners.