LEMUR: LOG PARSING WITH ENTROPY SAMPLING AND CHAIN-OF-THOUGHT MERGING

2025 | Wei Zhang, Xiangyuan Guan, Lu Yunhong, Jie Zhang, Shuangyong Song, Xianfu Cheng, Zhenhe Wu, Zhoujun Li
LEMUR is a log parsing framework that combines entropy sampling with chain-of-thought merging to improve the efficiency and accuracy of log analysis. Traditional log parsers rely on manually defined rules and often fail to capture the semantic meaning of logs, which limits their performance. LEMUR addresses both issues: it uses information entropy to cluster logs, and it uses large language models (LLMs) to merge log templates based on semantic understanding rather than syntax alone.
The framework consists of three main components: Information Entropy Clustering, Template Generation, and Chain-of-Thought Merging. Information Entropy Clustering uses entropy-based sampling to group logs efficiently by their informational content. Template Generation identifies the fixed and variable parts of logs by analyzing the information entropy of tokens. Chain-of-Thought Merging uses LLMs to merge log templates by analyzing their structure, semantics, and solutions.
LEMUR was evaluated on large-scale public datasets, including LogHub, and outperformed existing log parsing methods. It achieved high grouping accuracy and template accuracy while also improving efficiency. The framework is suitable for both unsupervised and supervised learning scenarios, and it requires fewer computational resources than other LLM-based methods. Its approach is particularly effective at handling the variability and complexity of log messages in complex software systems, and the combination of entropy sampling and chain-of-thought merging lets it adapt to different types of logs and maintain high performance on large volumes of data.
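To make the Template Generation idea more concrete, here is a minimal Python sketch of entropy-based template extraction. It is an illustration under simplifying assumptions, not the authors' implementation: it assumes the logs in a group are whitespace-tokenized and have the same token count, and the function names (`token_entropy`, `extract_template`), the `threshold` parameter, the `<*>` placeholder, and the sample log lines are all hypothetical choices made for this example. A token position whose values never change has zero entropy and is kept as fixed text; a position whose values vary has positive entropy and is marked as a variable.

```python
import math
from collections import Counter


def token_entropy(tokens):
    """Shannon entropy (in bits) of the empirical distribution of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_template(logs, threshold=0.0, placeholder="<*>"):
    """Build one template from a group of logs with equal token counts.

    Assumption for this sketch: positions whose entropy exceeds
    `threshold` are variables; all other positions are fixed text.
    """
    rows = [line.split() for line in logs]
    if not rows:
        raise ValueError("need at least one log line")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all logs in a group must have the same token count")

    template = []
    for column in zip(*rows):  # iterate over token positions
        if token_entropy(column) > threshold:
            template.append(placeholder)  # values vary: variable part
        else:
            template.append(column[0])    # constant value: fixed part
    return " ".join(template)


# Made-up log lines, used only to exercise the sketch.
sample_logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.7 closed",
]
print(extract_template(sample_logs))  # prints: Connection from <*> closed
```

Raising `threshold` above zero would tolerate positions that vary only slightly, which is one plausible way to trade template specificity against the number of templates produced; whether and how LEMUR tunes such a cutoff is not stated in this summary.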
Overall, LEMUR represents a significant advancement in log parsing technology, offering a more efficient and accurate alternative to traditional methods.