Model Compression Techniques in Biometrics Applications: A Survey


2024-01-18 | Eduarda Caldeira, Pedro C. Neto, Marco Huber, Naser Damer, Ana F. Sequeira
This paper presents a comprehensive survey of model compression techniques in biometrics applications, focusing on quantization, knowledge distillation (KD), and pruning. The authors analyze the advantages and disadvantages of each technique, highlighting their potential for improving performance in resource-constrained environments, and emphasize the importance of addressing model bias and fairness in compression research, since compression can inadvertently introduce bias, particularly in biometrics applications that rely on human data.

Quantization reduces the precision of model parameters, enabling faster inference and lower memory usage. It can be applied to weights, activations, or both, with strategies such as weight-only quantization (WOQ), activation-only quantization (AOQ), and weight-activation quantization (WAQ). The choice of quantization strategy and granularity (group-wise, layer-wise, or channel-wise) affects both performance and accuracy. Post-training quantization (PTQ) and quantization-aware training (QAT) are the two main approaches, with QAT generally providing better performance but requiring retraining; a minimal sketch of the PTQ idea follows.
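As a concrete illustration of PTQ, the sketch below quantizes a layer's weights to int8 with a single symmetric per-tensor scale (weight-only quantization). The scale rule, granularity, and function names are illustrative assumptions, not the survey's prescription.

```python
import torch

def quantize_weights_int8(w: torch.Tensor):
    """Map float weights to int8 using one symmetric per-tensor scale."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0  # largest |w| maps to +/-127
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate float tensor for inference-time use."""
    return q.to(torch.float32) * scale

# Usage: quantize one layer's weight matrix and check the worst-case error.
w = torch.randn(256, 512)
q, scale = quantize_weights_int8(w)
w_hat = dequantize_int8(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())
```

Finer granularities (per-channel or per-group scales) reduce this error at the cost of storing more scale factors, which is the performance/accuracy trade-off noted above.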
Knowledge distillation transfers knowledge from a complex teacher model to a simpler student model, enabling efficient inference. Different KD strategies, such as response-based KD (RB-KD) and feature-based KD (FB-KD), are discussed, with FB-KD being more effective at preserving model performance. A temperature parameter in the softmax function can help mitigate probability collapse in KD (see the loss sketch below).

Pruning removes redundant connections in neural networks, reducing model size and computational cost. It can be applied at different granularities, with layer-wise, channel-wise, and group-wise strategies. The pruning criterion (e.g., L1-norm) and the sparsity level determine the trade-off between model size and performance. Pruning can be followed by retraining to recover performance, though it may introduce irregularities in the model's architecture (see the pruning sketch after the KD example).

The paper also discusses the impact of compression on model fairness, noting that compression can exacerbate biases in biometrics applications. The authors suggest that future research should focus on developing compression techniques that maintain fairness and reduce bias, particularly in scenarios involving sensitive human data. Overall, the survey highlights the importance of model compression in biometrics applications, where resource constraints and fairness considerations are critical.
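For the temperature-scaled softmax mentioned in the KD discussion, here is a minimal response-based KD loss in PyTorch. The T² correction factor and the α mixing weight follow the common Hinton-style formulation and are assumptions rather than the survey's exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Temperatures T > 1 soften both distributions, spreading probability
    # mass across classes and countering collapse onto a single class.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)  # hard-label supervision
    return alpha * kd + (1.0 - alpha) * ce

# Usage with dummy logits for a 10-class toy problem.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print("loss:", distillation_loss(student, teacher, labels).item())
```

An FB-KD variant would instead match intermediate feature maps between teacher and student, which is why it tends to preserve more of the teacher's behavior.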
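And for the L1-norm pruning criterion, a minimal channel-wise sketch: rank a convolutional layer's filters by the L1 norm of their weights and zero out the weakest fraction. Masking filters in place, rather than physically removing channels, is a simplification made here to keep the example short; in practice pruning is typically followed by retraining.

```python
import torch
import torch.nn as nn

def l1_channel_prune(conv: nn.Conv2d, sparsity: float = 0.5):
    """Zero out the fraction of output filters with the smallest L1 norms."""
    # L1 norm per output filter: sum of |w| over (in_channels, kH, kW).
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    n_prune = int(sparsity * norms.numel())
    prune_idx = norms.argsort()[:n_prune]  # weakest filters first
    with torch.no_grad():
        conv.weight[prune_idx] = 0.0
        if conv.bias is not None:
            conv.bias[prune_idx] = 0.0
    return prune_idx

# Usage: zero half of a conv layer's 64 filters; retraining would follow.
layer = nn.Conv2d(32, 64, kernel_size=3)
pruned = l1_channel_prune(layer, sparsity=0.5)
print(f"zeroed {pruned.numel()} of {layer.out_channels} filters")
```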