LaSagnA: Language-based Segmentation Assistant for Complex Queries

12 Apr 2024 | Cong Wei1,2*, Haoxian Tan2*, Yujie Zhong2†, Yujiu Yang1† and Lin Ma2
The paper "LaSagnA: Language-based Segmentation Assistant for Complex Queries" addresses the limitations of Large Language Models for Vision (vLLMs) in handling complex queries when they must produce detailed perceptual outputs such as bounding boxes and masks. The authors identify two main issues: existing models cannot handle multiple targets per query, and they fail to recognize when a queried object is absent from the image. To address these challenges, the authors propose a general sequence format for complex queries and incorporate a semantic segmentation task into the training pipeline. They introduce three novel strategies (sequence augmentation, random classes list, and target order consistency) to make training with the proposed format effective. Experiments on both closed-set and open-set semantic segmentation datasets show performance comparable to conventional methods, and the model outperforms several vLLMs on reasoning and referring segmentation tasks. The code for LaSagnA is released at <https://github.com/congvec/LaSagnA>.
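
To make the idea of a sequence format with multiple targets and absent classes more concrete, here is a minimal Python sketch of how one training pair might be assembled. It is not taken from the LaSagnA code: the function `build_sample`, the prompt wording, the `[SEG]` placeholder token, and the parameters `max_classes` and `num_negatives` are all hypothetical. The summary names the three strategies without defining them, so the way each is interpreted in the comments is an assumption.

```python
import random

SEG_TOKEN = "[SEG]"  # hypothetical placeholder for a mask-decoding token


def build_sample(present, vocabulary, max_classes=6, num_negatives=2, rng=None):
    """Build one illustrative (query, answer) pair for multi-class segmentation.

    present:    class names that actually appear in the image
    vocabulary: the full list of class names of the dataset
    """
    rng = rng or random.Random()
    present = list(dict.fromkeys(present))  # drop duplicates, keep order
    absent_pool = [c for c in vocabulary if c not in present]

    # Assumed reading of "sequence augmentation": put some classes that are
    # NOT in the image into the query, so the model sees absent targets.
    negatives = rng.sample(absent_pool, min(num_negatives, len(absent_pool)))

    # Assumed reading of "random classes list": query only a random subset
    # of classes so the prompt stays short.
    budget = max(0, max_classes - len(negatives))
    positives = rng.sample(present, min(len(present), budget))

    classes = positives + negatives
    rng.shuffle(classes)

    # Assumed reading of "target order consistency": the answer lists the
    # targets in the same order in which they appear in the query.
    targets = [c for c in classes if c in present]

    query = ("Segment the following classes if they appear in the image: "
             + ", ".join(classes) + ".")
    if targets:
        answer = ", ".join(f"{c} {SEG_TOKEN}" for c in targets) + "."
    else:
        answer = "None of the listed classes appear in the image."
    return {"query": query, "answer": answer,
            "classes": classes, "targets": targets}


if __name__ == "__main__":
    vocab = ["sky", "road", "car", "boat", "zebra", "sofa"]
    sample = build_sample(["sky", "road", "car"], vocab, rng=random.Random(0))
    print(sample["query"])
    print(sample["answer"])
```

In this sketch, a query that names only absent classes yields an answer with no `[SEG]` token, which is one simple way to represent the "object not in the image" case described above.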