AffineQuant is a post-training quantization (PTQ) method that uses equivalent affine transformations to reduce quantization error in large language models (LLMs). Whereas existing PTQ methods restrict optimization to scaling (i.e., diagonal) transformations between pre- and post-quantization weights, AffineQuant directly optimizes a full equivalent affine transformation, which enlarges the optimization space and substantially reduces quantization error. Because the transformation is mathematically equivalent, the transformed model produces the same full-precision outputs as the original, preserving efficiency and generalization (a minimal numerical sketch of this equivalence is given at the end of this section). To keep the transformation invertible during optimization, a gradual mask method is introduced: it keeps the transformation matrix diagonally dominant, and by the Levy-Desplanques theorem a strictly diagonally dominant matrix is guaranteed to be invertible (also sketched below).

AffineQuant delivers significant performance improvements across various LLMs and datasets, particularly under low-bit quantization, which facilitates deployment on edge devices. For example, with W4A4 quantization of LLaMA2-7B, AffineQuant reaches a C4 perplexity of 15.76, 2.26 lower than OmniQuant. On LLaMA-30B with 4/4-bit quantization, it reaches 58.61% average accuracy on six zero-shot tasks, 1.98 percentage points above OmniQuant.

Because the affine transformation matrices can be merged into the adjacent weights after optimization, AffineQuant matches the inference efficiency of other PTQ methods while achieving state-of-the-art performance in LLM quantization, particularly for small-scale models and low-bit configurations, along with strong zero-shot results. The main contributions are the equivalent affine transformation for PTQ, the gradual mask optimization algorithm that guarantees invertibility of the transformation, and state-of-the-art results in LLM quantization.
The method is supported by extensive experiments and comparisons with existing PTQ methods, demonstrating its effectiveness in reducing quantization errors and improving model performance.
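To make the core equivalence concrete, the following is a minimal NumPy sketch of the idea rather than the paper's exact formulation or notation: a linear layer's weight is multiplied by an invertible matrix A and the activations by the corresponding inverse, so the full-precision output is unchanged while quantization acts on the transformed weight. The shapes, the symmetric per-tensor fake-quantization helper, and the randomly chosen A are illustrative assumptions; in AffineQuant, A is learned to minimize the layer's quantization error, and a diagonal A recovers the scaling transformations used by prior methods.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_quant(w, n_bits=4):
    """Symmetric per-tensor fake quantization (quantize then dequantize). Illustrative only."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

# Toy linear layer: Y = X @ W.T
X = rng.normal(size=(8, 16))      # activations (tokens x in_features)
W = rng.normal(size=(32, 16))     # weights (out_features x in_features)
Y = X @ W.T

# Scaling-based methods restrict A to a diagonal matrix (per-channel scales).
# A full invertible A enlarges the search space; here A is a random
# well-conditioned placeholder, whereas in the method it would be optimized.
A = np.eye(16) + 0.05 * rng.normal(size=(16, 16))

W_t = W @ A                       # transformed weight: this is what gets quantized
X_t = X @ np.linalg.inv(A).T      # inverse transform applied to the activations

# Equivalence before quantization: (X A^{-T}) (W A)^T == X W^T
assert np.allclose(X_t @ W_t.T, Y)

# After quantization, the output error is the quantity that optimizing A drives down.
err_plain  = np.linalg.norm(X @ fake_quant(W).T - Y)
err_affine = np.linalg.norm(X_t @ fake_quant(W_t).T - Y)
print(f"quantization error, no transform:            {err_plain:.3f}")
print(f"quantization error, affine (unoptimized A):  {err_affine:.3f}")
```

As the summary above notes, the transformation can be merged into adjacent weights for deployment, so the quantized model runs without additional inference overhead; this sketch omits activation quantization (W4A4 quantizes both) for brevity.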
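The invertibility argument can also be sketched. The Levy-Desplanques theorem states that a strictly diagonally dominant matrix is nonsingular, so keeping the transformation matrix diagonally dominant while gradually exposing more off-diagonal entries to optimization keeps it invertible. The schedule, update rule, and acceptance check below are hypothetical stand-ins, not the paper's algorithm; they only illustrate a mask that widens outward from the diagonal and the dominance criterion that certifies invertibility.

```python
import numpy as np

def is_strictly_diagonally_dominant(a):
    """Levy-Desplanques: a strictly diagonally dominant matrix is invertible."""
    diag = np.abs(np.diag(a))
    off = np.abs(a).sum(axis=1) - diag
    return bool(np.all(diag > off))

def gradual_mask(n, step, total_steps):
    """Hypothetical schedule: free the off-diagonal entries inside a band that
    widens from 0 to n-1 as optimization proceeds (the diagonal is always free)."""
    bandwidth = int(round((n - 1) * step / max(total_steps, 1)))
    i, j = np.indices((n, n))
    return (np.abs(i - j) <= bandwidth).astype(float)

n, total_steps = 8, 10
A = np.eye(n)                                                # start from the identity
grad = 0.3 * np.random.default_rng(1).normal(size=(n, n))    # stand-in for a gradient

for step in range(total_steps):
    mask = gradual_mask(n, step, total_steps)
    candidate = A + 0.05 * grad * mask       # only currently unmasked entries change
    # Illustrative safeguard: accept an update only if strict diagonal
    # dominance (a sufficient condition for invertibility) is preserved.
    if is_strictly_diagonally_dominant(candidate):
        A = candidate

print("strictly diagonally dominant:", is_strictly_diagonally_dominant(A))
print("condition number:", np.linalg.cond(A))
```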