FAIRSEQ: A Fast, Extensible Toolkit for Sequence Modeling


1 Apr 2019 | Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
FAIRSEQ is an open-source sequence modeling toolkit developed by Facebook AI Research. Built on PyTorch, it is designed for training custom models for tasks such as translation, summarization, and language modeling. It supports distributed training across multiple GPUs and machines, offers fast mixed-precision training and inference on modern GPUs, provides a common interface across models and tasks, includes state-of-the-art implementations for machine translation, summarization, and language modeling, and performs optimized inference with multiple search algorithms. It is distributed under a BSD license and available on GitHub.

The toolkit is extensible through five types of user-supplied plug-ins: models, criterions, tasks, optimizers, and learning rate schedulers. This design lets users experiment with new ideas while reusing existing components; a minimal registration example is sketched below. Reproducibility and forward compatibility are ensured by saving the full state of the model, optimizer, and dataloader in checkpoints, so training can be resumed after an interruption.

FAIRSEQ is implemented in PyTorch and provides efficient batching, mixed-precision training, and multi-GPU and multi-machine training. It minimizes padding within mini-batches by grouping sequences of similar length, and it uses NCCL2 via torch.distributed for inter-GPU communication. The effect of stragglers is mitigated by overlapping gradient synchronization with the backward pass and by accumulating gradients over multiple mini-batches. Inference can run in FP16, which increases decoding speed by 54% over FP32 without loss in accuracy.
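To make the plug-in mechanism concrete, here is a minimal sketch of registering a custom learning rate scheduler. It assumes fairseq's registry decorator and base class (register_lr_scheduler, FairseqLRScheduler); the module path, the plug-in name constant_decay, and the --decay-every option are illustrative and may differ across fairseq versions.

```python
# Sketch only: a custom LR scheduler plug-in. The plug-in name
# 'constant_decay' and the --decay-every option are hypothetical.
from fairseq.optim.lr_scheduler import FairseqLRScheduler, register_lr_scheduler


@register_lr_scheduler('constant_decay')
class ConstantDecaySchedule(FairseqLRScheduler):
    """Halve the learning rate every --decay-every updates."""

    def __init__(self, args, optimizer):
        super().__init__(args, optimizer)
        self.base_lr = args.lr[0]           # fairseq parses --lr as a list
        self.decay_every = args.decay_every

    @staticmethod
    def add_args(parser):
        # Plug-ins can register their own command-line options.
        parser.add_argument('--decay-every', type=int, default=10000)

    def step_update(self, num_updates):
        # Called after every optimizer update; returns the new rate.
        lr = self.base_lr * (0.5 ** (num_updates // self.decay_every))
        self.optimizer.set_lr(lr)
        return lr
```

Models, criterions, tasks, and optimizers follow the same pattern: subclass the corresponding base class and register it under a name that can then be selected from the command line.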
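The length-based batching can be illustrated in a few lines of plain Python (a simplified illustration, not fairseq's actual batching code): sorting examples by length before filling each batch keeps sequences of similar length together, so little of each padded batch is wasted.

```python
def make_batches(lengths, max_tokens):
    """Group example indices so each batch's padded size
    (num_sentences * longest_sentence) stays under max_tokens."""
    indices = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, batch, longest = [], [], 0
    for i in indices:
        longest = max(longest, lengths[i])
        if batch and (len(batch) + 1) * longest > max_tokens:
            batches.append(batch)
            batch, longest = [], lengths[i]
        batch.append(i)
    if batch:
        batches.append(batch)
    return batches


# Similar-length sentences end up grouped together, so few positions
# in each padded batch are actually padding.
print(make_batches([5, 9, 4, 8, 5, 10, 4], max_tokens=16))
```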
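The straggler mitigation combines two standard PyTorch techniques: DistributedDataParallel already overlaps the gradient all-reduce with the backward pass, and its no_sync() context manager lets several mini-batches be accumulated locally so workers synchronize only once per update. The following is a rough sketch of the idea, not fairseq's trainer code.

```python
import contextlib

import torch


def accumulate_and_step(ddp_model, optimizer, criterion, micro_batches):
    """Accumulate gradients over several micro-batches, synchronizing
    across workers only on the last backward pass."""
    optimizer.zero_grad()
    last = len(micro_batches) - 1
    for i, (src, tgt) in enumerate(micro_batches):
        # no_sync() skips the all-reduce for all but the final micro-batch;
        # on the final one, DDP overlaps the all-reduce with backward().
        ctx = ddp_model.no_sync() if i < last else contextlib.nullcontext()
        with ctx:
            loss = criterion(ddp_model(src), tgt) / len(micro_batches)
            loss.backward()
    optimizer.step()
```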
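The FP16 speed-up comes from casting weights and activations to half precision so matrix multiplications can use the GPU's tensor cores; fairseq exposes this through a --fp16 option, but the underlying PyTorch mechanism is simply:

```python
import torch

# Requires a CUDA GPU. Casting the model and inputs to FP16 lets matrix
# multiplications run on tensor cores, which underlies the reported
# 54% decoding speed-up over FP32.
model = torch.nn.Linear(1024, 1024).cuda().half()
x = torch.randn(8, 1024, device='cuda', dtype=torch.half)
with torch.no_grad():
    y = model(x)
print(y.dtype)  # torch.float16
```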
FAIRSEQ has been applied to machine translation, language modeling, abstractive document summarization, story generation, error correction, multilingual sentence embeddings, and dialogue.

In machine translation, FAIRSEQ provides reference implementations of popular sequence-to-sequence models, including LSTM, convolutional, and Transformer architectures. Evaluated on the WMT'14 and WMT'16 benchmarks, these implementations achieve improved BLEU scores over previously reported results.

In language modeling, FAIRSEQ supports models such as gated convolutional networks and Transformers with a variety of input and output representations, and has been used to evaluate models on the WikiText-103 dataset and the One Billion Word benchmark.

In abstractive document summarization, FAIRSEQ has been used to generate summaries of the CNN-DailyMail dataset, achieving competitive ROUGE scores; it also supports pre-trained language model representations for improved performance.
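As a usage example, the pre-trained translation models can be loaded through torch.hub; the checkpoint name and the tokenizer/bpe arguments below follow fairseq's published examples but may vary between releases.

```python
import torch

# Downloads the pre-trained WMT'14 English-French Transformer on first use.
en2fr = torch.hub.load('pytorch/fairseq', 'transformer.wmt14.en-fr',
                       tokenizer='moses', bpe='subword_nmt')
en2fr.eval()
print(en2fr.translate('FAIRSEQ is a sequence modeling toolkit.'))
```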