May 31, 2024 | Najoung Kim*, Sebastian Schuster*, Shubham Toshniwal*
This paper investigates whether pretraining language models on code improves their ability to track changes in discourse entities. The authors compare base models with models trained on additional code data, as well as models trained on math data and alignment-tuned models. They find that models trained on code data outperform base models in entity tracking tasks. However, additional math training and alignment tuning do not consistently improve performance across different model families.
The study uses a task called "boxes," in which the model must track the contents of seven boxes after a series of operations. The results show that models trained on code data track entity states better, especially on examples where at least one operation affects the queried box. However, the performance gains from additional math training are marginal, and alignment tuning has mixed effects depending on whether it is applied to base models or code models.
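To make the task format concrete, here is a minimal sketch of how a "boxes" example might be generated and scored. The object names, the restriction to "move" operations, and the function name `make_example` are illustrative assumptions, not the authors' exact data-generation code.

```python
import random

# Illustrative sketch of a "boxes" entity-tracking example (assumed format):
# describe the initial contents of seven boxes, apply a sequence of
# operations, then ask for the final contents of one box.

OBJECTS = ["apple", "book", "coin", "dress", "egg", "flag", "glove",
           "hat", "ink", "key", "lamp", "map", "nail", "pen"]

def make_example(num_boxes=7, num_ops=4, seed=0):
    rng = random.Random(seed)
    # Initialize each box with one distinct object and record the description.
    initial = rng.sample(OBJECTS, num_boxes)
    contents = {i: {obj} for i, obj in enumerate(initial)}
    lines = [f"Box {i} contains the {obj}." for i, obj in enumerate(initial)]

    # Apply random "move" operations while tracking the ground-truth state.
    for _ in range(num_ops):
        src, dst = rng.sample(range(num_boxes), 2)
        if not contents[src]:
            continue  # nothing to move out of an empty box
        obj = rng.choice(sorted(contents[src]))
        contents[src].remove(obj)
        contents[dst].add(obj)
        lines.append(f"Move the {obj} from Box {src} to Box {dst}.")

    # Query one box; the model must complete the prompt with its final contents.
    target = rng.randrange(num_boxes)
    answer = sorted(contents[target]) or ["nothing"]
    prompt = " ".join(lines) + f" Box {target} contains"
    return prompt, answer

if __name__ == "__main__":
    prompt, answer = make_example()
    print(prompt)
    print("Expected:", ", ".join(f"the {obj}" for obj in answer))
```

In this framing, the difficulty knob discussed above is the number of operations that touch the queried box: when zero operations affect it, the answer can be read off the initial description, whereas one or more intervening moves require genuine state tracking.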
The authors conclude that code pretraining significantly improves entity tracking abilities in language models, while math training and alignment tuning have limited effects. They also note that the effectiveness of code training varies with model size, with larger models showing larger gains. The study highlights the importance of structured data in enhancing the reasoning capabilities of language models.