12 Apr 2018 | Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Céspedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil
The paper introduces the Universal Sentence Encoder (USE), a model that encodes sentences into embedding vectors for transfer learning across a variety of natural language processing (NLP) tasks. The encoder is efficient and performs well on diverse tasks, and it comes in two variants that trade off accuracy against computational resources. The authors investigate the relationship between model complexity, resource consumption, and task performance, showing that sentence-level transfer learning often outperforms word-level transfer. The models achieve good performance with minimal supervised training data and show promising results in detecting model bias using Word Embedding Association Tests (WEAT). The pre-trained sentence encoding models are available on TF Hub for public use. The paper also discusses the engineering trade-offs between memory and compute requirements at different sentence lengths and compares the performance of the transformer-based and deep averaging network (DAN) encoders.
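To make the TF Hub usage concrete, here is a minimal Python sketch of loading a published USE module and mapping sentences to fixed-length embedding vectors. The specific module handle, version, and 512-dimensional output shown here are assumptions based on the modules commonly listed on TF Hub, not details taken from the paper; the transformer-based variant is published under a separate "large" handle.

```python
import tensorflow_hub as hub

# Load a pre-trained Universal Sentence Encoder from TF Hub.
# Assumed handle: the lighter DAN-based variant; the transformer-based
# variant is typically published as "universal-sentence-encoder-large".
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "Sentence embeddings can be reused for downstream transfer tasks.",
]

# Each sentence is mapped to a fixed-length embedding vector
# (512 dimensions for the publicly listed modules).
embeddings = embed(sentences)
print(embeddings.shape)  # e.g. (2, 512)
```

The resulting vectors can then feed a small task-specific classifier or be compared directly (for example via inner product) for semantic similarity, which is the transfer-learning setup the paper evaluates.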