LYT-NET: LIGHTWEIGHT YUV TRANSFORMER-BASED NETWORK FOR LOW-LIGHT IMAGE ENHANCEMENT


April 4, 2024 | Alexandru Brateanu, Raul Balmez, Adrian Avram, Ciprian Orhei
LYT-Net is a lightweight YUV transformer-based network designed for low-light image enhancement. This paper introduces a novel approach that leverages the YUV color space to separate luminance (Y) from chrominance (U and V), simplifying the task of disentangling light and color information. By exploiting the strength of transformers in capturing long-range dependencies, LYT-Net obtains a comprehensive contextual understanding of the image while keeping model complexity low. The proposed method employs a novel hybrid loss function and achieves state-of-the-art results on low-light image enhancement datasets while being significantly more compact than its counterparts.

Low-light image enhancement (LLIE) is a challenging task in computer vision that aims to improve visibility and contrast while restoring the distortions inherent to dark environments. LLIE plays a crucial role in many downstream computer vision tasks, including feature extraction and content-based recognition. Existing solutions fall into two categories: methods that directly map low-light images to their normal-light equivalents, and methods inspired by Retinex theory that rely on multi-stage training pipelines. However, these methods often lack theoretical interpretability or are complex and require multiple training stages. LYT-Net proposes a novel transformer-based approach characterized by its lightweight design, achieving state-of-the-art results in LLIE while maintaining computational efficiency. The model operates in the YUV color space, which is particularly advantageous for LLIE because of its distinct separation of luminance and chrominance. By focusing on the Y channel, which is most sensitive to changes in luminance, LYT-Net can enhance visibility and detail without adversely affecting color information.

The main contributions of this work are: (1) LYT-Net, a lightweight model that employs the YUV color space to target its enhancements, applying a multi-headed self-attention scheme to the denoised luminance and chrominance layers; (2) a hybrid loss function that plays a critical role in efficient training and contributes significantly to the model's enhancement capabilities; and (3) strong performance of LYT-Net compared to state-of-the-art methods on the LOL datasets.

The proposed method consists of several detachable blocks, including the Multi-Headed Self-Attention (MHSA) block, the Multi-Stage Squeeze & Excite Fusion (MSEF) block, and the Channel-Wise Denoiser (CWD) block. The model takes an RGB input image, converts it to YUV, and enhances each channel using a series of convolutional layers, pooling operations, and the MHSA mechanism. The luminance channel Y is enhanced by the MHSA block, while the chrominance channels U and V are processed through a CWD block that reduces noise while preserving detail. The enhanced chrominance channels are then recombined and passed through the MSEF block, ultimately producing a high-quality enhanced image. The model's performance is evaluated on the LOL dataset, where it demonstrates strong results compared to state-of-the-art methods, with the hybrid loss function playing a central role in reaching this level of enhancement.
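The summary does not spell out the exact color conversion used, so the snippet below is a minimal PyTorch sketch that uses the standard BT.601 coefficients to obtain the Y, U, and V planes that the network then processes separately; the function name and tensor layout are assumptions for illustration.

```python
import torch

def rgb_to_yuv(rgb: torch.Tensor) -> torch.Tensor:
    """Convert an (N, 3, H, W) RGB tensor in [0, 1] to YUV using BT.601 coefficients."""
    r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                      # blue-difference chrominance
    v = 0.877 * (r - y)                      # red-difference chrominance
    return torch.cat([y, u, v], dim=1)
```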
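To make the per-channel pipeline concrete, here is a simplified structural sketch, assuming PyTorch and reusing the rgb_to_yuv helper above, of how the blocks described in the summary could be wired together: attention on the luminance plane, residual denoising on the chrominance planes, and a final fusion step. The block internals are stand-ins; the real LYT-Net layers, dimensions, and the MSEF fusion differ in their details.

```python
import torch
import torch.nn as nn

class MHSABlock(nn.Module):
    """Stand-in for the Multi-Headed Self-Attention block applied to the Y channel."""
    def __init__(self, dim: int = 32, heads: int = 4):
        super().__init__()
        self.proj_in = nn.Conv2d(1, dim, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj_out = nn.Conv2d(dim, 1, kernel_size=3, padding=1)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f = self.proj_in(y)                               # (N, C, H, W)
        n, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)             # (N, H*W, C) token sequence
        attended, _ = self.attn(tokens, tokens, tokens)   # global self-attention
        f = attended.transpose(1, 2).reshape(n, c, h, w)
        return y + self.proj_out(f)                       # residual enhancement of Y

class ChannelDenoiser(nn.Module):
    """Stand-in for the Channel-Wise Denoiser (CWD) applied to U and V."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 3, padding=1),
        )

    def forward(self, c: torch.Tensor) -> torch.Tensor:
        return c + self.body(c)                           # residual noise removal

class LYTNetSketch(nn.Module):
    """Toy wiring only: enhance Y with attention, denoise U/V, then fuse to an RGB output."""
    def __init__(self):
        super().__init__()
        self.luma = MHSABlock()
        self.chroma_u = ChannelDenoiser()
        self.chroma_v = ChannelDenoiser()
        self.fuse = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder for MSEF fusion

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        yuv = rgb_to_yuv(rgb)
        y, u, v = yuv[:, 0:1], yuv[:, 1:2], yuv[:, 2:3]
        out = torch.cat([self.luma(y), self.chroma_u(u), self.chroma_v(v)], dim=1)
        return torch.sigmoid(self.fuse(out))                   # enhanced image in [0, 1]
```

Note that attention over full-resolution token sequences is expensive; the summary mentions pooling operations, which is one way the actual model keeps the attention cost and parameter count low.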
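The summary only states that a hybrid loss drives training, without listing its terms. The sketch below shows one plausible way such a hybrid loss could combine pixel-level, PSNR-based, and color-consistency terms; the term choices, formulas, and weights (w_pixel, w_psnr, w_color) are illustrative assumptions, not the composition defined in the paper.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred: torch.Tensor, target: torch.Tensor,
                w_pixel: float = 1.0, w_psnr: float = 0.1, w_color: float = 0.5) -> torch.Tensor:
    """Illustrative hybrid loss; terms and weights are assumptions, not the paper's."""
    # Pixel-level fidelity (Smooth L1 is more robust to outliers than plain L2).
    pixel = F.smooth_l1_loss(pred, target)
    # PSNR-based term: maximising PSNR is the same as minimising its negative.
    mse = F.mse_loss(pred, target).clamp(min=1e-8)
    psnr = 10.0 * torch.log10(1.0 / mse)       # assumes images scaled to [0, 1]
    psnr_term = -psnr
    # Color consistency: match per-channel spatial means of prediction and target.
    color = (pred.mean(dim=(2, 3)) - target.mean(dim=(2, 3))).abs().mean()
    return w_pixel * pixel + w_psnr * psnr_term + w_color * color
```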