Transformers Can Do Arithmetic with the Right Embeddings

23 Dec 2024 | Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
Transformers can perform arithmetic tasks effectively when given appropriate embeddings. The study shows that attaching a positional embedding to each digit, encoding its position within its number, lets transformers track exactly where every digit sits and thereby solve arithmetic problems far more reliably. This approach, called Abacus Embeddings, allows models to generalize to much longer numbers, reaching up to 99% accuracy on 100-digit addition after training only on numbers with at most 20 digits. The embeddings also improve performance on other multi-step reasoning tasks, including sorting and multiplication. The work highlights the importance of positional information in arithmetic and shows that combining Abacus Embeddings with recurrent layers and input injection further boosts performance. The findings suggest that transformers can be trained to handle complex arithmetic and algorithmic reasoning without relying on external tools: with the right embeddings and architectural modifications, they achieve near-perfect accuracy across a range of tasks.
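The core idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch implementation of Abacus-style embeddings: each digit token receives a learned embedding indexed by its position within the number it belongs to (the paper processes digits least-significant first, so counting from the start of each run of digits corresponds to counting from the least significant digit), and during training a random offset is added to these indices so the model also encounters large position values on short numbers. The class name, the digit_token_ids argument, and the max_positions / max_train_offset values are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class AbacusEmbedding(nn.Module):
    """Sketch of Abacus-style positional embeddings (names and defaults are assumptions)."""

    def __init__(self, digit_token_ids, embed_dim, max_positions=128, max_train_offset=100):
        super().__init__()
        # Token ids that correspond to the digits 0-9 in the tokenizer (assumed known).
        self.register_buffer("digit_ids", torch.tensor(sorted(digit_token_ids)))
        # Index 0 is reserved for non-digit tokens; indices 1, 2, 3, ... mark the
        # first, second, third, ... digit within a number.
        self.pos_emb = nn.Embedding(max_positions, embed_dim)
        self.max_train_offset = max_train_offset  # illustrative offset range

    def digit_positions(self, token_ids):
        """Return each token's 1-based index within its run of digits (0 for non-digits)."""
        is_digit = torch.isin(token_ids, self.digit_ids)
        positions = torch.zeros_like(token_ids)
        run = torch.zeros(token_ids.shape[0], dtype=torch.long, device=token_ids.device)
        for t in range(token_ids.shape[1]):
            # Counter increments inside a run of digits and resets on any other token.
            run = torch.where(is_digit[:, t], run + 1, torch.zeros_like(run))
            positions[:, t] = run
        return positions

    def forward(self, token_ids):
        positions = self.digit_positions(token_ids)
        if self.training and self.max_train_offset > 0:
            # One random offset per sequence so large positions are seen even on short numbers.
            offset = torch.randint(0, self.max_train_offset, (token_ids.shape[0], 1),
                                   device=token_ids.device)
            positions = torch.where(positions > 0, positions + offset, positions)
        positions = positions.clamp(max=self.pos_emb.num_embeddings - 1)
        return self.pos_emb(positions)  # to be added to the token embeddings by the caller
```

In use, the output of `forward` would be summed with the ordinary token embeddings before the first transformer block; the paper additionally combines this with recurrent (looped) layers and input injection, which are not shown in this sketch.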