6 Nov 2018 | Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, John R. Hershey
The paper "SDR – HALF-BAKED OR WELL DONE?" by Jonathan Le Roux, Scott Wisdom, Hakan Erdogan, and John R. Hershey examines problems with the signal-to-distortion ratio (SDR) metric used in the BSS_eval toolkit for evaluating speech enhancement and source separation algorithms. The authors argue that the original SDR implementation has been used improperly, leading to misleading results, especially in single-channel separation. They propose a modified version, scale-invariant SDR (SI-SDR), which is simpler and more robust. The paper highlights critical failures of the original SDR, such as allowing significant modifications to the reference signal and not accounting for scaling errors. The authors give examples showing how SI-SDR overcomes these issues, including cases where SDR yields high scores despite significant signal degradation. They also compare SI-SDR and SDR on a speech separation task, where the two metrics differ by around 0.5 dB. The paper concludes by advocating SI-SDR as a more reliable metric for evaluating single-channel separation algorithms.
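As a rough illustration of the scale-invariance idea summarized above, the sketch below computes SI-SDR using its commonly cited definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. The function names `si_sdr` and `snr`, the `eps` stabilizer, and the synthetic signals are assumptions made for this example. They are not taken from the authors' code or from BSS_eval.

```python
import numpy as np


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (illustrative sketch, not the authors' code)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Scaling factor that best aligns the reference with the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference      # part of the estimate explained by the reference
    residual = estimate - target    # everything else is counted as distortion
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def snr(reference, estimate, eps=1e-12):
    """Plain SNR in dB, with no rescaling, for comparison."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    return 10.0 * np.log10(
        (np.dot(reference, reference) + eps) / (np.dot(noise, noise) + eps)
    )


# Synthetic data (hypothetical): a random "reference" plus a small perturbation.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)

print("SI-SDR, original estimate:", si_sdr(ref, est))
print("SI-SDR, estimate halved:  ", si_sdr(ref, 0.5 * est))
print("SNR, original estimate:   ", snr(ref, est))
print("SNR, estimate halved:     ", snr(ref, 0.5 * est))
```

Halving the estimate leaves the SI-SDR value essentially unchanged, while the plain SNR drops sharply, which is the kind of scaling sensitivity the summary refers to.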