Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization


5 Mar 2024 | Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng
The paper introduces Dual Mean-Teacher (DMT), a semi-supervised learning framework for Audio-Visual Source Localization (AVSL). DMT targets three weaknesses of existing AVSL methods: inaccurate localization, blurry boundaries, and false positives. It does so with a dual teacher-student structure that makes fuller use of both labeled and unlabeled data. The two teachers improve pseudo-label quality in two steps: noisy samples are filtered out, and the remaining pseudo-labels are refined by taking the intersection of the teachers' predictions (intersection of pseudo-labels, IPL). DMT is evaluated on two large-scale datasets, Flickr-SoundNet and VGG-Sound Source, where it improves on state-of-the-art methods with CIoU scores of 90.4% and 49.8%, respectively. The framework also generalizes well to complex and open environments, and it improves existing AVSL methods when integrated with them. The code for DMT is available at https://github.com/gyx-gloria/DMT.
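The summary names the ingredients (mean-teacher updates, noise filtering, and intersection of pseudo-labels) without giving their details. The sketch below is one plausible reading of those ideas in NumPy, not the authors' implementation. The function names, the decay value, the binarization threshold, and the IoU-based agreement filter are all assumptions made for illustration; the released repository is the reference for the actual procedure.

```python
import numpy as np


def ema_update(teacher, student, decay=0.999):
    """Mean-teacher style update (decay value is an assumed default).

    Each teacher weight becomes an exponential moving average of the
    corresponding student weight.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def binarize(heatmap, threshold=0.5):
    """Turn a localization heatmap into a boolean mask (threshold assumed)."""
    return heatmap >= threshold


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask_a, mask_b).sum() / union)


def intersect_pseudo_labels(heatmap_a, heatmap_b, threshold=0.5, min_agreement=0.5):
    """Hypothetical noise filter plus intersection of pseudo-labels.

    If the two teachers' masks disagree too much, the sample is treated
    as noisy and None is returned. Otherwise the pseudo-label is the
    region that both teachers mark as sounding.
    """
    mask_a = binarize(heatmap_a, threshold)
    mask_b = binarize(heatmap_b, threshold)
    if iou(mask_a, mask_b) < min_agreement:
        return None
    return np.logical_and(mask_a, mask_b)
```

In this sketch, each of the two teachers would be refreshed with `ema_update` after every student step, and `intersect_pseudo_labels` would be applied to their heatmaps for each unlabeled sample, so that only samples on which the teachers agree contribute a pseudo-label.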