This paper presents SliM-LLM, a salience-driven mixed-precision quantization method for large language models (LLMs), designed to deliver strong performance with low-bit weights in a deployment-friendly manner. The method leverages the salience distribution of LLM weights to determine bit-widths and quantizer parameters for accurate quantization, while aligning the bit-width partition with quantization groups for compact memory usage and fast integer computation on hardware. SliM-LLM introduces two novel techniques: (1) Salience-Determined Bit Allocation (SBA), which allocates group-wise bit-widths based on the salience distribution of weights, minimizing the relative entropy between the outputs of the original and quantized models while preserving inference efficiency; and (2) Salience-Weighted Quantizer Calibration (SQC), which optimizes quantizer parameters with awareness of element-wise salience within each group, balancing the preservation of salient information against overall quantization error. Comprehensive experiments show that SliM-LLM significantly improves the accuracy of various LLMs at ultra-low bit-widths (2-3 bits), achieving substantial memory savings and lower perplexity than existing methods. SliM-LLM$^{+}$, which integrates gradient-based quantizers, further reduces perplexity. The method is efficient, requires no fine-tuning, and achieves high performance on GPUs. The code is available at https://github.com/Aaronhuang-778/SliM-LLM.
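To make the group-wise allocation concrete, below is a minimal Python sketch of salience-driven mixed-precision bit assignment under a fixed average bit-width. It is an illustration only, not the released implementation: the salience proxy (activation-weighted magnitude), the 1:1 pairing of low- and high-salience groups, and the function name `allocate_group_bitwidths` are all assumptions made here; in the paper, SBA instead solves an optimization that minimizes the output relative entropy.

```python
import numpy as np

def allocate_group_bitwidths(weight, act_norm, group_size=128, avg_bits=2):
    """Illustrative sketch (not the authors' implementation): groups with
    higher salience receive avg_bits + 1 bits, low-salience groups receive
    avg_bits - 1 bits, paired 1:1 so the average stays at avg_bits.

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) per-input-channel activation norms, used as a
              simple salience proxy (an assumption in this sketch)
    """
    assert weight.shape[1] % group_size == 0
    n_groups = weight.shape[1] // group_size
    salience = np.empty(n_groups)
    for g in range(n_groups):
        cols = slice(g * group_size, (g + 1) * group_size)
        # Salience proxy: activation-weighted magnitude of the group.
        salience[g] = np.sum(np.abs(weight[:, cols]) * act_norm[cols])
    bits = np.full(n_groups, avg_bits)
    order = np.argsort(salience)        # group indices, least salient first
    k = n_groups // 2                   # pair low/high groups 1:1
    bits[order[:k]] = avg_bits - 1      # least salient: fewer bits
    bits[order[n_groups - k:]] = avg_bits + 1  # most salient: more bits
    return bits

# Example: a 4096x4096 layer quantized to an average of 2 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((4096, 4096)).astype(np.float32)
a = np.abs(rng.standard_normal(4096)).astype(np.float32)
print(allocate_group_bitwidths(w, a))   # per-group bit-widths, mean == 2
```

Assigning bit-widths to whole quantization groups rather than to scattered individual elements is what keeps the resulting format compact in memory and amenable to fast integer kernels, as described in the abstract.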