ESALE: Enhancing Code-Summary Alignment Learning for Source Code Summarization


1 Jul 2024 | Chunrong Fang, Weisong Sun*, Yuchen Chen, Xiao Chen, Zhao Wei, Quanjun Zhang, Yudu You, Bin Luo, Yang Liu, Zhenyu Chen
This paper proposes ESALE, a novel approach to source code summarization that enhances the encoder's ability to learn code-summary alignment through three summary-focused pre-training tasks: unidirectional language modeling (ULM), masked language modeling (MLM), and action word prediction (AWP). Unlike existing pre-trained models that predict masked code tokens, ESALE predicts masked words in the summary conditioned on the code snippet, so the encoder learns the alignment between code and natural language. AWP further strengthens this alignment by training the encoder to associate a code snippet with the action word of its summary.

The method follows a multi-task learning paradigm: a shared encoder, initialized from a pre-trained model such as UniXcoder, is trained on the three summary-focused tasks and fine-tuned on code summarization, while a decoder is trained simultaneously to generate natural-language summaries.

Evaluated on four datasets (JCSD, PCSD, CPJD, and CodeSearchNet), ESALE significantly outperforms state-of-the-art baselines in BLEU, METEOR, and ROUGE-L, and a human evaluation finds its summaries more informative and closer to the ground-truth summaries. Further experiments show that ESALE's encoder captures code patterns essential for accurate summaries: removing those patterns changes the summaries generated by ESALE but not those generated by UniXcoder. Together, the results indicate that the three summary-focused tasks effectively enhance code-summary alignment learning and thereby improve code summarization performance.
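To make the multi-task setup concrete, the sketch below combines the three summary-focused objectives over a shared encoder. It is a minimal PyTorch illustration, not ESALE's implementation: the toy encoder stands in for a pre-trained model such as UniXcoder, the equal loss weights and the mean pooling for AWP are assumptions, and the strict causal mask for ULM is a simplification (in ESALE, summary tokens would still attend to the full code snippet).

```python
# Minimal sketch of ESALE-style multi-task training. All names, dimensions,
# and weighting choices here are illustrative assumptions.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for the shared pre-trained encoder (e.g., UniXcoder)."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, ids, attn_mask=None):
        # attn_mask switches between bidirectional attention (MLM, AWP)
        # and left-to-right attention (ULM) over the same weights.
        return self.encoder(self.embed(ids), mask=attn_mask)

class EsaleSketch(nn.Module):
    def __init__(self, vocab_size, num_action_words, d_model=256):
        super().__init__()
        self.encoder = ToyEncoder(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)         # shared by ULM and MLM (assumption)
        self.awp_head = nn.Linear(d_model, num_action_words)  # action word classifier

def multi_task_loss(model, batch):
    """Sum of the three summary-focused objectives, equally weighted (assumption)."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks positions without a label
    ids = batch["input_ids"]                     # code tokens followed by (masked) summary tokens
    seq_len = ids.size(1)

    # MLM: bidirectional attention; predict masked summary words from the code.
    hidden = model.encoder(ids)
    mlm = ce(model.lm_head(hidden).transpose(1, 2), batch["mlm_labels"])

    # ULM: causal attention; predict each summary word from the code and its
    # left context. Simplification: the causal mask here covers the whole
    # sequence, whereas ESALE's summary tokens would see the full code snippet.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    hidden_ulm = model.encoder(ids, attn_mask=causal)
    ulm = ce(model.lm_head(hidden_ulm).transpose(1, 2), batch["ulm_labels"])

    # AWP: classify the summary's action word from a pooled representation
    # of the input (mean pooling is an assumption, not the paper's choice).
    awp = ce(model.awp_head(hidden.mean(dim=1)), batch["action_word_labels"])

    return ulm + mlm + awp
```

In the paper's setup, the decoder's sequence-to-sequence summarization loss is optimized jointly with these encoder-side objectives; the sketch above omits the decoder for brevity.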