OPT: Open Pre-trained Transformer Language Models

21 Jun 2022 | Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer
Meta AI has introduced Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformer models ranging from 125M to 175B parameters, which the team aims to share fully and responsibly with researchers. OPT-175B is comparable to GPT-3 in performance but was developed with roughly 1/7th of GPT-3's carbon footprint. The team has also released its logbook detailing the infrastructure challenges encountered, along with code for experimenting with all of the models.

The models were trained on a diverse corpus combining the datasets used to train RoBERTa, the Pile, and PushShift.io Reddit. Training ran on 992 80GB A100 GPUs, achieved up to 147 TFLOP/s of utilization per GPU, and required several mid-run adjustments to handle hardware failures and loss divergences.

The models were evaluated on 16 standard NLP tasks and performed comparably to GPT-3 in most cases, though they underperformed on some tasks such as ARC Challenge and MultiRC. On dialogue tasks the models performed well, outperforming other models in some cases.

The models were also evaluated for bias, toxicity, and dialogue safety. OPT-175B showed more stereotypical biases than GPT-3 in some categories and a higher toxicity rate than comparable models, while its dialogue-safety results were similar to those of other models. Known limitations include weak instruction following, repetition, and factual inaccuracies, which the team identifies as targets for future work.

The models are released with the aim of enabling responsible research and addressing ethical considerations, and the release of OPT-175B is part of a broader effort to promote responsible AI development. The team has provided detailed information on the training process, including the infrastructure challenges and code for experimentation. The models are available for research, with access limited to academic researchers and those affiliated with organizations in government, civil society, and academia.
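As a rough sanity check of the reported training figures (this calculation is ours, not from the paper): 147 TFLOP/s is roughly 47% of an A100 80GB's peak BF16 throughput, and the common ~6N FLOPs-per-token estimate for dense decoder-only training gives an idealized wall-clock time. The peak-throughput figure, the ~180B-token corpus size, and the 6N rule are all assumptions layered on top of the summary above:

```python
# Back-of-the-envelope check of the reported OPT training efficiency.
# Constants below are assumptions (NVIDIA spec sheet, common scaling
# heuristics), not values taken from any released training code.

A100_PEAK_TFLOPS = 312.0   # assumed A100 80GB peak BF16 throughput
ACHIEVED_TFLOPS = 147.0    # per-GPU utilization reported for OPT
NUM_GPUS = 992             # cluster size reported for OPT

# Fraction of peak FLOP/s actually sustained per GPU.
utilization = ACHIEVED_TFLOPS / A100_PEAK_TFLOPS
print(f"Per-GPU utilization: {utilization:.0%}")

# Idealized wall-clock estimate using the common ~6*N FLOPs-per-token
# heuristic for training a dense decoder-only model of N parameters.
params = 175e9   # OPT-175B parameter count
tokens = 180e9   # assumed approximate size of the OPT training corpus
total_flops = 6 * params * tokens
cluster_flops_per_s = NUM_GPUS * ACHIEVED_TFLOPS * 1e12
days = total_flops / cluster_flops_per_s / 86400
print(f"Idealized training time: ~{days:.0f} days "
      f"(ignoring restarts and hardware failures)")
```

The idealized figure (on the order of two weeks) is far below the actual multi-week-to-months run the logbook describes, which is consistent with the hardware failures and loss divergences mentioned above.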
The team hopes that the release will enable more researchers to study the ethical implications of large language models.