1 Feb 2024 | Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, Yong Li
The paper "UV-SAM: Adapting Segment Anything Model for Urban Village Identification" by Xin Zhang, Yu Liu, Yuming Lin, Qingmin Liao, and Yong Li introduces UV-SAM, a framework that adapts the Segment Anything Model (SAM) to identify urban villages from satellite images. Urban villages, characterized by inadequate infrastructure and poor living conditions, are closely linked to the Sustainable Development Goals (SDGs) on poverty, adequate housing, and sustainable cities. Traditional methods for monitoring these areas are time-consuming and labor-intensive, which has motivated the use of computer vision techniques. However, existing studies often stop at simple image classification or fail to provide accurate boundary information.
UV-SAM addresses these limitations by adapting SAM to urban village segmentation. The framework first uses a small-sized semantic segmentation model, such as SegFormer, to generate mixed prompts (mask, bounding box, and image representations) for urban villages. These prompts are then fed into SAM for fine-grained boundary identification. Extensive experiments on datasets from Beijing and Xi'an, China, demonstrate that UV-SAM outperforms existing baselines and provides valuable insights into the evolving trends of urban villages, including their spatial distribution and area changes over time. The study highlights the potential of vision foundation models in sustainable urban planning and governance.
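
To make the prompt-generation step more concrete, here is a minimal sketch of how a coarse mask from a small segmentation model might be turned into a mask prompt plus a bounding-box prompt. The summary does not say how UV-SAM derives its box prompt, so taking the tight box around the mask is an assumption, and the names `mask_to_box` and `build_prompts` are hypothetical. The image-representation prompt and the SAM call itself are left out.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a binary mask as rows of 0/1 values.
Mask = List[List[int]]


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the tight box (x_min, y_min, x_max, y_max) around nonzero pixels.

    Returns None when the mask is empty, i.e. no urban village was predicted.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]


def build_prompts(coarse_mask: Mask) -> dict:
    """Bundle the mask prompt with its derived box prompt (assumed scheme)."""
    return {"mask": coarse_mask, "box": mask_to_box(coarse_mask)}


# Toy coarse prediction standing in for the output of a small segmentation model.
coarse = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
prompts = build_prompts(coarse)
print(prompts["box"])  # (1, 1, 3, 2)
```

In a full pipeline, the resulting dictionary would be handed to a promptable segmenter such as SAM, which refines the coarse region into a fine-grained boundary.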