IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency
This paper proposes a simple yet effective input-level backdoor detection method, IBD-PSC, to identify and filter malicious testing images. The method is based on the phenomenon of parameter-oriented scaling consistency (PSC), where the prediction confidences of poisoned samples are more consistent than those of benign ones when the parameters of batch normalization (BN) layers are scaled. The method involves amplifying the parameters of different BN layers in the original model to obtain multiple parameter-amplified models. For each suspicious image, the PSC value is calculated based on the predictions of these models. A larger PSC value indicates a higher likelihood that the image is poisoned.
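As a rough illustration of the scoring step described above, the following Python sketch assumes that the PSC value is the average confidence that the parameter-amplified models assign to the label predicted by the original model. That definition, the function names, and the 0.9 threshold are assumptions made for illustration; the summary only states that the value is calculated from the predictions of the amplified models.

```python
from typing import Sequence


def psc_score(
    original_probs: Sequence[float],
    amplified_probs: Sequence[Sequence[float]],
) -> float:
    """Average confidence of the amplified models on the original prediction.

    original_probs: class probabilities from the unmodified model.
    amplified_probs: one probability vector per parameter-amplified model.
    Assumption: this averaging rule is one plausible reading of "PSC value".
    """
    if not amplified_probs:
        raise ValueError("at least one amplified model is required")
    # Label predicted by the original model (index of the largest probability).
    predicted = max(range(len(original_probs)), key=lambda i: original_probs[i])
    # Mean confidence the amplified models still assign to that label.
    return sum(p[predicted] for p in amplified_probs) / len(amplified_probs)


def is_suspicious(score: float, threshold: float = 0.9) -> bool:
    """Flag an input as likely poisoned; the threshold is a hypothetical value."""
    return score > threshold


if __name__ == "__main__":
    original = [0.1, 0.8, 0.1]
    # Toy numbers: predictions that stay stable under amplification.
    stable = [[0.05, 0.9, 0.05], [0.0, 1.0, 0.0]]
    # Toy numbers: predictions that drift away from the original label.
    drifting = [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]
    print(is_suspicious(psc_score(original, stable)))
    print(is_suspicious(psc_score(original, drifting)))
```

In this sketch, an input whose prediction survives amplification receives a high score and is flagged, which matches the statement that a larger PSC value indicates a higher likelihood of poisoning.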
The IBD-PSC method is designed to detect and block malicious inputs, so it can serve as a firewall for deployed models. Input-level detection is less resource-intensive than other backdoor defense strategies, which is why the paper focuses on it. The method is effective against a wide range of backdoor attacks, including poison-only, training-controlled, and model-controlled attacks. It is also resistant to adaptive attacks, which are specifically designed to evade defenses.
The method is evaluated on benchmark datasets, including CIFAR-10, GTSRB, and a subset of ImageNet. The results show that IBD-PSC outperforms existing backdoor defense methods, including STRIP, TeCo, and SCALE-UP, in both detection accuracy and efficiency. The method is also effective in detecting poisoned samples in a compromised training set.
The PSC phenomenon, in which poisoned samples keep more consistent prediction confidences than benign ones when the parameters of selected BN layers are scaled, is supported by both theoretical analysis and empirical studies. Rather than fixing the number of amplified BN layers in advance, the method selects it dynamically based on the performance of the model on benign samples. Poisoned samples remain distinguishable from benign ones regardless of whether the benign samples originate from the target class.
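The dynamic selection of the number of BN layers can be sketched as a simple search. The version below assumes that layers are amplified cumulatively and that the search stops at the smallest number of amplified layers for which the error rate on benign samples reaches a chosen level. The stopping rule, the 0.6 error threshold, and all names are hypothetical; the summary only says the selection depends on the performance of the model on benign samples.

```python
from typing import Callable


def select_num_amplified_layers(
    benign_error_rate: Callable[[int], float],
    total_bn_layers: int,
    error_threshold: float = 0.6,
) -> int:
    """Pick how many BN layers to amplify.

    benign_error_rate(k) is assumed to return the error rate on held-out
    benign samples when k BN layers are amplified. The smallest k whose
    error rate reaches error_threshold is returned; if no k qualifies,
    all layers are used.
    """
    if total_bn_layers < 1:
        raise ValueError("the model must contain at least one BN layer")
    for k in range(1, total_bn_layers + 1):
        if benign_error_rate(k) >= error_threshold:
            return k
    return total_bn_layers


if __name__ == "__main__":
    # Toy error rates standing in for real measurements on benign samples.
    toy_errors = {1: 0.05, 2: 0.20, 3: 0.65, 4: 0.90}
    chosen = select_num_amplified_layers(lambda k: toy_errors[k], total_bn_layers=4)
    print(chosen)
```

Passing the measurement as a callable keeps the search independent of any particular framework; in practice the callable would build the amplified model and evaluate it on a small benign set.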
For the compromised-training-set setting, the method is evaluated on the CIFAR-10 dataset against three representative attacks, where it again shows high detection accuracy and efficiency. The results also indicate that the method is resistant to adaptive attacks.