26 Jun 2024 | Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, Jacob Andreas
Deductive Closure Training (DCT) is a method that uses language models' (LMs') own reasoning capabilities at training time to improve their factuality, accuracy, and coherence. Starting from a set of seed documents, DCT prompts the LM to generate additional text implied by or contradicting those documents, reasons about which of the generated statements are most likely correct, and then fine-tunes the LM on the inferred-correct text.

DCT can be applied in both supervised and unsupervised settings: in the supervised setting, seed documents are provided by a trusted source, while in the unsupervised setting, the LM generates its own seed documents. Across three datasets (CREAK, MQUAKE, and the "Reversal Curse" benchmark), DCT improves fact verification and text generation accuracy by 3–26%; on CREAK, fully unsupervised DCT improves verification accuracy by 12%.

DCT builds on recent work on inference-time procedures for improving models' factual correctness and shows that these techniques can be used at training time as well: the method generates related documents, evaluates their mutual consistency, and fine-tunes the LM on the subset of documents most likely to be correct. These results demonstrate that the reasoning capabilities LMs exhibit at inference time can be leveraged during training to improve their reliability on fact verification, question answering, and model editing tasks.
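The consistency-evaluation step described above can be sketched as a subset-selection problem: for each claim the LM generates (paired with its contradiction), choose one side so that the chosen set is logically consistent and maximally probable under the model. The sketch below is a toy illustration, not the paper's implementation; the `claim_pairs`, `logprob`, and `consistent` names are hypothetical stand-ins for an LM's generated statements, its scoring function, and a consistency check.

```python
from itertools import product


def most_probable_consistent_set(claim_pairs, logprob, consistent):
    """Toy sketch of DCT-style consistency evaluation (not the paper's code).

    claim_pairs: list of (statement, its_contradiction) generated from a seed.
    logprob:     callable returning the LM's log-probability for a statement.
    consistent:  callable checking that a chosen set has no contradictions.

    Chooses one element from each pair so that the chosen set is consistent
    and has maximal total log-probability. Brute force over the 2**n
    assignments, which is fine for the handful of claims per seed document.
    """
    best, best_score = None, float("-inf")
    for choice in product(*claim_pairs):
        if not consistent(choice):
            continue  # skip sets containing a contradicting pair
        score = sum(logprob(s) for s in choice)
        if score > best_score:
            best, best_score = choice, score
    return list(best) if best else []


# Toy usage: "A" is far more probable than "not A", while "not B" narrowly
# beats "B" -- but a (hypothetical) consistency rule forbids {A, not B},
# so the selected training set becomes ["A", "B"].
pairs = [("A", "not A"), ("B", "not B")]
scores = {"A": -0.1, "not A": -3.0, "B": -2.5, "not B": -0.2}
no_conflict = lambda chosen: not ("A" in chosen and "not B" in chosen)
print(most_probable_consistent_set(pairs, scores.__getitem__, no_conflict))
# prints ['A', 'B']
```

The selected set would then serve as fine-tuning data, which is how greedy per-claim scoring differs from DCT's joint, consistency-constrained selection.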