Transformers Can Do Arithmetic with the Right Embeddings


23 Dec 2024 | Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
The paper addresses the poor performance of transformers on arithmetic tasks, which it traces largely to their inability to represent the exact position of each digit within a long sequence. To fix this, the authors introduce *Abacus Embeddings*: learned positional embeddings that encode each digit's position relative to the start of its number. These embeddings substantially improve the models' ability to solve arithmetic problems, achieving state-of-the-art performance on 100-digit addition with up to 99% accuracy.

The authors also explore architectural modifications that further improve performance. Input injection reduces generalization errors by 50%, while recurrent layers, in particular looped transformers, achieve near-perfect accuracy on addition. Combining Abacus Embeddings with recurrent layers yields an 87% reduction in error compared to standard architectures alone.

Additionally, the paper demonstrates that these gains in numeracy unlock improvements on other multi-step reasoning tasks, including multiplication and sorting. The authors train models on addition problems and evaluate them on multiplication and sorting as well, showing that their models can solve problems with six times as many digits as the largest samples in the training set. The paper concludes by discussing limitations and future directions, emphasizing the need for further advances in the algorithmic reasoning capabilities of large language models.
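To make the positional scheme concrete, below is a minimal PyTorch sketch of Abacus-style position indices together with a looped block that uses input injection. This is not the authors' released code: the names (`abacus_position_ids`, `AbacusEmbedding`, `LoopedBlockWithInputInjection`), the toy vocabulary, and the use of a stock `nn.TransformerEncoderLayer` as the recurrent block are illustrative assumptions; the random offset mirrors the randomized-offset training trick described in the paper, though its exact range here is assumed.

```python
# Minimal sketch (not the authors' code) of Abacus-style positional indices
# and a looped transformer block with input injection.
import random

import torch
import torch.nn as nn


def abacus_position_ids(tokens, max_offset=0):
    """Give each digit its 1-based position within its own number
    (non-digit tokens get index 0). A shared random offset, as in the
    paper's training setup, exposes the model to larger indices; the
    offset range here is an assumption."""
    offset = random.randint(0, max_offset)
    pos_ids, run = [], 0
    for tok in tokens:
        if tok.isdigit():
            run += 1
            pos_ids.append(run + offset)
        else:
            run = 0
            pos_ids.append(0)
    return torch.tensor(pos_ids)


class AbacusEmbedding(nn.Module):
    """Learned embedding table indexed by Abacus position ids."""

    def __init__(self, max_positions=128, dim=512):
        super().__init__()
        self.emb = nn.Embedding(max_positions, dim)

    def forward(self, pos_ids):
        return self.emb(pos_ids)


class LoopedBlockWithInputInjection(nn.Module):
    """Recurrent ("looped") block: the same layer is applied several times,
    and the embedded input is re-added at every iteration (input injection).
    A stock TransformerEncoderLayer stands in for the authors' block."""

    def __init__(self, dim=512, n_heads=8, recurrences=4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.recurrences = recurrences

    def forward(self, x_embed):
        h = torch.zeros_like(x_embed)
        for _ in range(self.recurrences):
            h = self.block(h + x_embed)  # input injection at each loop
        return h


# Toy usage: "123 + 456" with operands reversed so the least-significant
# digit comes first, following the paper's setup.
tokens = list("321+654=")
vocab = {c: i for i, c in enumerate("0123456789+=")}
ids = torch.tensor([vocab[c] for c in tokens])

pos_emb = AbacusEmbedding()
tok_emb = nn.Embedding(len(vocab), 512)
x = (tok_emb(ids) + pos_emb(abacus_position_ids(tokens))).unsqueeze(0)

print(abacus_position_ids(tokens))               # tensor([1, 2, 3, 0, 1, 2, 3, 0])
print(LoopedBlockWithInputInjection()(x).shape)  # torch.Size([1, 8, 512])
```

The key property of the scheme is that indices restart at each number boundary, so a digit's embedding depends only on its place within its own operand rather than on the length of the surrounding sequence, which is what allows the model to line up digits of the same significance across very long operands.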