LLMs are Good Sign Language Translators

1 Apr 2024 | Jia Gong, Lin Geng Foo, Yixuan He, Hossein Rahmani, Jun Liu
The paper "LLMs are Good Sign Language Translators" by Jia Gong, Lin Geng Foo, Yixuan He, Hossein Rahmani, and Jun Liu introduces a novel framework called SignLLM to leverage large language models (LLMs) for Sign Language Translation (SLT). SLT aims to translate sign videos into spoken language, a challenging task due to the limited availability of paired sign-text data. The authors propose two key modules: the Vector-Quantized Visual Sign (VQ-Sign) module, which converts sign videos into a sequence of discrete character-level sign tokens, and the Codebook Reconstruction and Alignment (CRA) module, which converts these tokens into word-level sign representations using an optimal transport formulation. A sign-text alignment loss further enhances semantic compatibility between sign and text tokens. The framework is trained on two popular SLT datasets, achieving state-of-the-art gloss-free results. The main contributions include the development of SignLLM, the first use of off-the-shelf and frozen LLMs for SLT, and the introduction of VQ-Sign and CRA modules to make sign videos more compatible with LLMs. The paper also includes a comprehensive ablation study and qualitative analysis to validate the effectiveness of the proposed methods.The paper "LLMs are Good Sign Language Translators" by Jia Gong, Lin Geng Foo, Yixuan He, Hossein Rahmani, and Jun Liu introduces a novel framework called SignLLM to leverage large language models (LLMs) for Sign Language Translation (SLT). SLT aims to translate sign videos into spoken language, a challenging task due to the limited availability of paired sign-text data. The authors propose two key modules: the Vector-Quantized Visual Sign (VQ-Sign) module, which converts sign videos into a sequence of discrete character-level sign tokens, and the Codebook Reconstruction and Alignment (CRA) module, which converts these tokens into word-level sign representations using an optimal transport formulation. A sign-text alignment loss further enhances semantic compatibility between sign and text tokens. The framework is trained on two popular SLT datasets, achieving state-of-the-art gloss-free results. The main contributions include the development of SignLLM, the first use of off-the-shelf and frozen LLMs for SLT, and the introduction of VQ-Sign and CRA modules to make sign videos more compatible with LLMs. The paper also includes a comprehensive ablation study and qualitative analysis to validate the effectiveness of the proposed methods.