DuQuant is a quantization strategy for large language models (LLMs) that addresses the challenge of outlier activations. The method employs rotation and permutation transformations to eliminate both normal and massive outliers. DuQuant first constructs rotation matrices based on specific outlier dimensions, redistributing those outliers across adjacent channels within each rotation block. It then applies a zigzag permutation to balance the distribution of outliers among blocks, minimizing block-wise variance. A second rotation further smooths the activation landscape, which improves the quality of the subsequent quantization.
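The core idea can be pictured with a short sketch. The helper names (`block_rotation`, `zigzag_permutation`, `smooth_activations`), the block size of 128, and the QR-based random orthogonal matrices are illustrative assumptions rather than the paper's exact construction; the sketch only shows how a block-wise rotation, a zigzag (serpentine) reordering of channels, and a second rotation flatten an activation matrix before quantization.

```python
# Hedged sketch of block-wise rotation + zigzag permutation, assuming a
# 2-D activation matrix of shape (tokens, channels) with channels % 128 == 0.
import numpy as np

def block_rotation(x_block: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Rotate one block of channels so its largest outlier is spread out.

    The most extreme channel is swapped to position 0, then a random
    orthogonal matrix (QR of a Gaussian) mixes it with the other channels.
    """
    d = x_block.shape[1]
    outlier = int(np.argmax(np.abs(x_block).max(axis=0)))   # channel with the largest peak
    perm = np.arange(d)
    perm[[0, outlier]] = perm[[outlier, 0]]                  # move it to the front
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))         # random orthogonal mixer
    rot = np.eye(d)[perm] @ q                                # product of orthogonal matrices
    return x_block @ rot

def zigzag_permutation(channel_peaks: np.ndarray, n_blocks: int) -> np.ndarray:
    """Assign channels to blocks in serpentine order of descending peak value,
    so every block receives a similar mix of large and small activations."""
    order = np.argsort(-channel_peaks)                       # largest peaks first
    buckets = [[] for _ in range(n_blocks)]
    block_ids = list(range(n_blocks)) + list(range(n_blocks - 1, -1, -1))
    for i, ch in enumerate(order):
        buckets[block_ids[i % len(block_ids)]].append(ch)
    return np.concatenate([np.array(b) for b in buckets])

def smooth_activations(x: np.ndarray, block: int = 128, seed: int = 0) -> np.ndarray:
    """First rotation -> zigzag permutation -> second rotation, per block."""
    rng = np.random.default_rng(seed)
    tokens, channels = x.shape
    assert channels % block == 0
    blocks = x.reshape(tokens, -1, block)
    x = np.concatenate(
        [block_rotation(blocks[:, b], rng) for b in range(blocks.shape[1])], axis=1)
    perm = zigzag_permutation(np.abs(x).max(axis=0), channels // block)
    blocks = x[:, perm].reshape(tokens, -1, block)
    return np.concatenate(
        [block_rotation(blocks[:, b], rng) for b in range(blocks.shape[1])], axis=1)
```

In the actual method the inverse transforms are folded into the adjacent weight matrices so the layer's output is preserved; the sketch only illustrates how the activation distribution is flattened before 4-bit quantization.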
DuQuant streamlines the quantization process and demonstrates superior outlier management, achieving strong results across multiple tasks and LLM architectures even under 4-bit weight-activation quantization. It outperforms existing 4-bit weight-activation baselines on a range of benchmarks, with a 5% improvement on Commonsense QA tasks across all LLaMA model sizes and a 10% gain on zero-shot MMLU for Vicuna-v1.5-13B. In practical deployment with LLaMA2-7B, DuQuant accelerates the prefilling phase by up to 2.08× and reduces memory usage by 3.20×, with minimal impact on performance, and the quantization procedure itself runs faster than competing baselines. Together, these results show that DuQuant's rotation and permutation transformations effectively mitigate the impact of outliers, enhancing both the efficiency and capability of quantized LLMs.