Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

2024 | Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li
This paper investigates how compression techniques affect the trustworthiness of Large Language Models (LLMs) across dimensions such as stereotype, toxicity, privacy, fairness, ethics, and robustness. The study evaluates three leading LLMs with five state-of-the-art compression techniques across eight trustworthiness dimensions.

The results show that quantization preserves both efficiency and trustworthiness better than pruning. A 4-bit quantized model retains the trustworthiness of its uncompressed counterpart with minimal loss across all dimensions, whereas pruning significantly degrades trustworthiness even at 50% sparsity. Quantization within a moderate bit range can even improve certain dimensions such as ethics and fairness, but extreme quantization to very low bit levels (3 bits) causes significant drops in trustworthiness across multiple dimensions.

Based on these findings, the paper offers practical recommendations for achieving high utility, efficiency, and trustworthiness in LLMs. It emphasizes the need to evaluate trustworthiness comprehensively in compressed models and highlights the risks of extreme compression: while compression can enhance efficiency, it also introduces risks that must be carefully managed. The study concludes that quantization is a more reliable route to efficient and trustworthy LLMs than pruning.
The paper advocates for a comprehensive evaluation of trustworthiness in compressed models to ensure their safe and effective deployment.
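To make the quantization-versus-pruning comparison concrete, here is a minimal, hypothetical sketch in NumPy. It is not the paper's actual pipeline (which uses state-of-the-art methods such as GPTQ/AWQ-style quantization and structured pruning on real LLM weights); it only illustrates, on a random toy weight matrix, the two basic operations being compared: 4-bit uniform quantization and 50% magnitude pruning.

```python
# Hypothetical illustration only -- not the paper's compression code.
# Compares the weight-reconstruction error of 4-bit uniform quantization
# against 50% magnitude pruning on a toy Gaussian "weight matrix".
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(256, 256))  # toy stand-in for an LLM weight matrix

def quantize_uniform(w, bits=4):
    """Symmetric per-tensor uniform quantization to `bits` bits, then dequantize."""
    qmax = 2 ** (bits - 1) - 1               # e.g. 7 for signed 4-bit
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def prune_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

# Relative Frobenius-norm reconstruction error of each compressed matrix.
err_q = np.linalg.norm(W - quantize_uniform(W, bits=4)) / np.linalg.norm(W)
err_p = np.linalg.norm(W - prune_magnitude(W, sparsity=0.5)) / np.linalg.norm(W)
print(f"4-bit quantization relative error: {err_q:.3f}")
print(f"50% magnitude pruning relative error: {err_p:.3f}")
```

On Gaussian-distributed weights, the 4-bit quantized matrix reconstructs the original more closely than the 50%-pruned one, which is loosely consistent with the paper's finding that 4-bit quantization preserves model behavior better than 50% pruning; actual trustworthiness effects, of course, depend on the full model and evaluation suite, not weight error alone.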