4 Mar 2024 | Séamus Lankford, Haithem Afli, Andy Way
The paper evaluates the performance of Transformer models in translating a low-resource language pair, English to Irish. The study focuses on hyperparameter optimization and the choice of subword model to improve translation accuracy. Key findings include:
1. **Hyperparameter Optimization**: Random search was used to optimize hyperparameters such as the number of attention heads, the number of layers, and regularization settings. The resulting configuration significantly improved model performance (a sketch of such a search loop appears after this list).
2. **Subword Models**: The choice of subword model, and in particular its vocabulary size, proved crucial. A BPE model with a 16k subword vocabulary outperformed the alternatives, improving BLEU by 7.8 points over a baseline RNN model (see the subword-training sketch below).
3. **Performance Improvements**: Transformer models with optimized parameters demonstrated substantial gains across BLEU, TER, and ChrF. The best-performing Transformer, trained on the 55k DGT corpus with the 16k BPE subword model, achieved a BLEU score of 60.5 and a TER of 0.33 (the metric computation is sketched below).
4. **Benchmarking**: The optimized Transformer model was benchmarked against Google Translate, showing significant improvements in translation quality.
5. **Environmental Impact**: The study also tracked the environmental impact of model development, noting that the full process generated just under 10 kg of CO₂ (an emissions-tracking sketch appears below).
6. **Conclusion**: The research demonstrates that Transformer models, with appropriate hyperparameter optimization and subword model choices, can effectively handle low-resource language translation tasks, achieving high-quality translations for English to Irish.
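As a rough illustration of the random search in point 1, the sketch below samples Transformer hyperparameters from a hand-picked space and keeps the configuration with the best validation BLEU. The search-space values and the `train_and_score` callable are assumptions for illustration, not the paper's exact setup.

```python
import random

# Hypothetical search space in the spirit of the paper's tuned hyperparameters
# (attention heads, layers, regularization); the exact ranges are assumptions.
SEARCH_SPACE = {
    "heads": [2, 4, 8],
    "layers": [2, 4, 6],
    "dropout": [0.1, 0.3, 0.5],
    "label_smoothing": [0.1, 0.3],
}

def sample_config(space):
    """Draw one random configuration from the search space."""
    return {name: random.choice(values) for name, values in space.items()}

def random_search(train_and_score, space=SEARCH_SPACE, n_trials=30, seed=13):
    """Try n_trials random configurations; keep the best by validation BLEU.

    `train_and_score` is a user-supplied callable (a stand-in here) that
    trains a model with the given config and returns its validation BLEU.
    """
    random.seed(seed)
    best_config, best_bleu = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space)
        bleu = train_and_score(config)
        if bleu > best_bleu:
            best_config, best_bleu = config, bleu
    return best_config, best_bleu
```

Random sampling tends to cover a wide space in far fewer trials than an exhaustive grid, which matters when every trial is a full NMT training run.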
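Point 2's 16k BPE configuration can be reproduced with an off-the-shelf subword toolkit. The sketch below uses SentencePiece; the file names are placeholders, and whether the authors used this particular toolkit is an assumption.

```python
import sentencepiece as spm

# Train a 16k-vocabulary BPE model. File names are placeholders; whether the
# authors used SentencePiece specifically is an assumption.
spm.SentencePieceTrainer.train(
    input="dgt_train.en",        # hypothetical source-side training text
    model_prefix="bpe_en_16k",
    vocab_size=16000,
    model_type="bpe",
    character_coverage=1.0,
)

# Segment a sentence into subword units with the trained model.
sp = spm.SentencePieceProcessor(model_file="bpe_en_16k.model")
print(sp.encode("The committee approved the proposal.", out_type=str))
```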
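The BLEU, TER, and ChrF figures in point 3 are standard corpus-level metrics, all available in the sacreBLEU library. The hypothesis/reference strings below are toy examples invented for illustration, and sacreBLEU reports TER on a 0–100 scale, so the paper's 0.33 corresponds to 33.0 here.

```python
import sacrebleu

# Toy hypothesis/reference pair, invented for illustration; a real evaluation
# would pass the full test set.
hyps = ["tá an aimsir go maith inniu"]
refs = [["tá an aimsir go breá inniu"]]  # one reference stream, parallel to hyps

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs)
ter = sacrebleu.corpus_ter(hyps, refs)

# All three scores are on a 0-100 scale here.
print(f"BLEU={bleu.score:.1f}  ChrF={chrf.score:.1f}  TER={ter.score:.1f}")
```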
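For the emissions tracking in point 5, a library such as CodeCarbon can instrument a training run; whether the authors used this library or another estimation method is an assumption. A minimal sketch:

```python
from codecarbon import EmissionsTracker

def train_model():
    """Stand-in for the actual NMT training loop."""
    pass

# Wrap the training run with an emissions tracker; the tracker estimates
# energy use from the host's hardware and converts it to kg CO2-eq.
tracker = EmissionsTracker(project_name="en-ga-transformer")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # returns estimated emissions in kg CO2-eq

print(f"Estimated emissions: {emissions_kg:.3f} kg CO2-eq")
```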