LLMs are Good Sign Language Translators

1 Apr 2024 | Jia Gong, Lin Geng Foo, Yixuan He, Hossein Rahmani, Jun Liu
This paper presents SignLLM, a framework that harnesses off-the-shelf, frozen large language models (LLMs) for Sign Language Translation (SLT). The core idea is to transform sign videos into a language-like representation that is compatible with LLMs.

The framework consists of two key modules. The Vector-Quantized Visual Sign (VQ-Sign) module converts a sign video into a sequence of discrete character-level sign tokens. The Codebook Reconstruction and Alignment (CRA) module then transforms these character-level tokens into word-level sign tokens, using an optimal transport formulation to align the sign codebook with the text token space and improve semantic compatibility. A sign-text alignment loss further bridges the gap between sign and text tokens.

Evaluated on two widely used SLT benchmarks, Phoenix-2014T and CSL-Daily, SignLLM achieves state-of-the-art gloss-free results, outperforming previous gloss-free methods in both translation accuracy and fluency. An extensive ablation study demonstrates the contribution of each component. The paper concludes that SignLLM is a promising first step toward effectively harnessing LLMs for SLT.
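To make the two modules concrete, here is a minimal, hypothetical sketch of the vector-quantization step behind VQ-Sign: continuous clip features are snapped to their nearest codebook entry, yielding a discrete token sequence. The names, shapes, and random data are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the VQ-Sign idea (not the paper's code):
# quantize continuous clip features against a learned codebook to
# obtain discrete "character-level" sign tokens.
rng = np.random.default_rng(0)

num_clips, feat_dim = 6, 16      # assumed: 6 short clips per sign video
codebook_size = 32               # assumed size of the character-level codebook

features = rng.normal(size=(num_clips, feat_dim))      # stand-in encoder outputs
codebook = rng.normal(size=(codebook_size, feat_dim))  # stand-in learned codebook

# Vector quantization: each clip feature maps to the index of its
# nearest codebook entry under squared L2 distance.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)    # discrete token sequence, one id per clip
print("character-level sign tokens:", tokens)
```

The CRA module's alignment is posed as optimal transport. Continuing from the snippet above, a generic entropy-regularized Sinkhorn solver conveys the idea of computing a soft matching between sign-token and text-token embeddings; the paper's exact cost function and formulation may differ.

```python
def sinkhorn(cost, n_iters=100, eps=0.1):
    """Entropy-regularized OT between two uniform marginals (generic solver)."""
    K = np.exp(-cost / eps)                          # Gibbs kernel
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])  # uniform source marginal
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])  # uniform target marginal
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                            # scale columns toward b
        u = a / (K @ v)                              # scale rows toward a
    return u[:, None] * K * v[None, :]               # transport plan

sign_emb = rng.normal(size=(8, feat_dim))    # stand-in word-level sign tokens
text_emb = rng.normal(size=(10, feat_dim))   # stand-in text tokens
cost = ((sign_emb[:, None] - text_emb[None]) ** 2).sum(-1)
plan = sinkhorn(cost)                        # soft sign-to-text alignment
print("row marginals (~uniform):", plan.sum(axis=1).round(3))
```

The resulting transport plan can be read as a soft correspondence between sign tokens and text tokens, which is the kind of semantic compatibility the CRA module and the sign-text alignment loss aim to enforce.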