10 Feb 2022 | Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chang, Igor Krivokon, Will Rusch, Marc Pickett, Pranesh Srinivasan, Laichee Man, Kathleen Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Soraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Diaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravi Rajakumar, Alena Butryna, Matthew Lamm, Viktoriya Kuzmina, Joe Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Croak, Ed Chi, Quoc Le
LaMDA is a family of Transformer-based neural language models specialized for dialog, ranging from 2B to 137B parameters and pre-trained on 1.56T words of public dialog data and web text. The paper shows that while model scaling alone improves dialog quality, it yields much smaller gains on two key challenges: safety and factual grounding. Fine-tuning on annotated data and enabling the model to consult external knowledge sources significantly improves both. Safety means ensuring the model's responses are consistent with human values, for example by avoiding harmful suggestions and unfair bias. Factual grounding means letting the model consult external knowledge sources so that its responses are grounded in known sources rather than merely plausible. The paper also explores LaMDA in the education and content-recommendation domains, evaluating helpfulness and role consistency, and finds that fine-tuned LaMDA models outperform pre-trained models in both.
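To make the factual-grounding mechanism concrete, below is a minimal sketch of the research loop the paper describes, in which the fine-tuned model may either address the user or emit a query to an external toolset (an information-retrieval system, a calculator, and a translator) and then condition on the returned snippets. The function names (`grounded_response`, `generate`, `toolset_lookup`), the `"TS:"`/`"USER:"` prefix routing, and the step budget are illustrative assumptions, not the paper's actual API.

```python
from typing import Callable

MAX_RESEARCH_STEPS = 4  # assumption: the paper bounds the loop to a few steps


def grounded_response(
    dialog_context: str,
    generate: Callable[[str], str],        # fine-tuned LM: context -> next turn
    toolset_lookup: Callable[[str], str],  # IR system / calculator / translator
) -> str:
    """Sketch of LaMDA-style grounding: the model either queries the toolset
    ("TS: <query>") or answers the user ("USER: <response>"); toolset results
    are appended to the context so the next turn can cite them."""
    context = dialog_context
    for _ in range(MAX_RESEARCH_STEPS):
        output = generate(context)
        if output.startswith("TS:"):  # model asked the toolset for evidence
            snippets = toolset_lookup(output[len("TS:"):].strip())
            context += f"\n{output}\nTS result: {snippets}"
        else:  # model addressed the user directly: we're done
            return output.removeprefix("USER:").strip()
    # Step budget exhausted; force a user-facing answer from what was gathered.
    return generate(context + "\nUSER:")
```

The key design point this sketch illustrates is that grounding is decided by the model itself at each turn: retrieval happens only when the generated output routes to the toolset, and the loop terminates as soon as the model produces a user-facing response.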