The paper introduces Layer Collapse (LaCo), a novel layer-wise pruning method for transformer-based large language models (LLMs). LaCo reduces model size while preserving performance by collapsing rear layers into a preceding layer, merging the parameter differences between consecutive layers, which is shown to cause minimal performance loss. Experiments on a range of benchmarks demonstrate that LaCo retains over 80% of the original task performance at pruning ratios of 25-30%, outperforming existing structured pruning methods. Post-training experiments confirm that LaCo effectively inherits parameters from the original model, requiring only minimal additional training to restore performance. The paper also discusses the layer-wise similarity that motivates the approach and evaluates pruned models across different pruning ratios. Overall, LaCo is a concise and efficient pruning method that preserves the model structure and maintains strong performance.
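The layer-merge step described above can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example of collapsing several rear transformer blocks into a prior block by adding their parameter differences onto it; the function name `collapse_layers`, the index arguments, and the assumption that all blocks share identical parameter names are illustrative only, and the full LaCo procedure (e.g., how the layers to collapse are selected) is not reproduced here.

```python
import torch
from typing import List


@torch.no_grad()
def collapse_layers(layers: List[torch.nn.Module], start: int, num_merged: int) -> List[torch.nn.Module]:
    """Collapse `num_merged` rear layers into layers[start] by adding their
    parameter differences (relative to layers[start]) onto it, then drop them.

    Hypothetical sketch: assumes all layers are structurally identical
    transformer blocks with matching parameter names.
    """
    base = layers[start]

    # Snapshot the base layer's original parameters so every difference is
    # taken against the same reference, not against already-merged values.
    reference = {n: p.detach().clone() for n, p in base.named_parameters()}
    merged = {n: p.detach().clone() for n, p in base.named_parameters()}

    # Accumulate the difference of each rear layer relative to the base layer.
    for k in range(1, num_merged + 1):
        rear = layers[start + k]
        for name, p in rear.named_parameters():
            merged[name] += p.detach() - reference[name]

    # Write the merged parameters back into the base layer.
    for name, p in base.named_parameters():
        p.copy_(merged[name])

    # Return the pruned stack with the collapsed rear layers removed.
    return layers[:start + 1] + layers[start + 1 + num_merged :]
```

As a usage sketch, calling `collapse_layers(blocks, start=20, num_merged=4)` on a list of transformer blocks would fold blocks 21-24 into block 20 and shorten the stack by four layers, which is the structural effect the summary attributes to LaCo's pruning.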