Residual-based large language models (LLMs) show significant potential as efficient encoders for biomedical imaging tasks, a setting where language data is scarce. This study introduces an approach that uses a frozen transformer block from a pre-trained LLM as an encoder layer for visual data, improving performance across biomedical imaging applications, including 2D and 3D classification. The method inserts the frozen transformer block into the visual encoder, with trainable linear layers for dimension alignment and a residual connection for feature transfer. The framework achieves state-of-the-art results on the MedMNIST-2D and MedMNIST-3D benchmarks.
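The architecture described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the class name `FrozenBlockBooster`, the placement of one trainable projection before and one after the frozen block, and all dimensions are hypothetical, and a randomly initialized `nn.TransformerEncoderLayer` stands in for a block that would in practice be taken from a pre-trained LLM.

```python
import torch
import torch.nn as nn


class FrozenBlockBooster(nn.Module):
    """Residual wrapper around a frozen transformer block (illustrative sketch)."""

    def __init__(self, vis_dim: int, llm_dim: int, frozen_block: nn.Module):
        super().__init__()
        # Trainable projection: visual feature width -> LLM hidden width.
        self.proj_in = nn.Linear(vis_dim, llm_dim)
        # Transformer block whose weights are never updated.
        self.block = frozen_block
        # Trainable projection: LLM hidden width -> visual feature width.
        self.proj_out = nn.Linear(llm_dim, vis_dim)
        for param in self.block.parameters():
            param.requires_grad = False

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, vis_dim), as produced by a visual encoder.
        boosted = self.proj_out(self.block(self.proj_in(tokens)))
        # Residual connection: the frozen-block path is added to the input.
        return tokens + boosted


# Assumption: a randomly initialized layer stands in for a pre-trained LLM block.
stand_in_block = nn.TransformerEncoderLayer(
    d_model=64, nhead=4, dim_feedforward=128, dropout=0.0, batch_first=True
)
booster = FrozenBlockBooster(vis_dim=32, llm_dim=64, frozen_block=stand_in_block)

tokens = torch.randn(2, 16, 32)  # 2 images, 16 visual tokens, 32 channels each
out = booster(tokens)            # same shape as the input tokens
```

Because only `proj_in` and `proj_out` receive gradients, the number of trainable parameters added by the wrapper is small compared with the frozen block itself.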
The proposed method differs from traditional vision-language frameworks in three ways: it operates independently of language components, it offers flexibility without requiring pre-training, and it uses modular transformer blocks. It yields consistent improvements in 2D and 3D classification across a range of datasets and models. Extensive experiments on diverse datasets show gains in accuracy (ACC) and area under the ROC curve (AUC). The method is also robust in 3D classification, outperforming existing state-of-the-art results in several cases.
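For readers less familiar with the two reported metrics, the following sketch shows how ACC and a binary AUC can be computed from labels and predicted scores. The function names, the 0.5 decision threshold, and the toy data are assumptions made for illustration; they are not values or code from the study, and multi-class benchmarks would need an averaged variant of the AUC.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four samples with predicted positive-class scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

acc = accuracy(labels, preds)     # 0.75: three of four predictions are correct
auc = binary_auc(labels, scores)  # 0.75: three of four positive/negative pairs are ranked correctly
```

ACC depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the two are usually reported together.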
The study highlights the versatility of LLMs in biomedical imaging, suggesting their potential beyond language processing. The framework's simplicity and effectiveness make it a promising solution for enhancing medical image analysis. The results underscore the value of LLMs in improving biomedical visual tasks and open new avenues for further exploration in this field. The method's ability to adapt to different imaging modalities and its efficiency in processing visual data make it a valuable tool for advancing biomedical imaging technologies.