20 Feb 2024 | Haisong Gong, Qiang Liu, Shu Wu, Liang Wang
The paper introduces TGM-DLM (Text-Guided Molecule Generation with Diffusion Language Model), a diffusion-based approach to text-guided molecule generation designed to address the limitations of autoregressive methods. TGM-DLM updates all token embeddings of the SMILES string iteratively and collectively, using a two-phase diffusion generation process. The first phase optimizes the embeddings starting from random noise, guided by the text description; the second phase corrects invalid SMILES strings so that they form valid molecular representations. The authors report that TGM-DLM outperforms MolT5-Base, an autoregressive model, without requiring additional data resources. The findings indicate that TGM-DLM can generate coherent and precise molecules with specified properties, which is relevant to drug discovery and related scientific domains. The code for TGM-DLM is available at: https://github.com/Deno-V/tgm-dlm.
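The summary does not say how an invalid SMILES string is recognized before the second phase corrects it. As a rough illustration only, the sketch below checks two syntactic conditions that any valid SMILES string must satisfy: branch parentheses balance, and ring-closure digits occur in pairs. The function name `is_syntactically_balanced` is hypothetical and is not taken from the paper or its repository. Passing this check is necessary but not sufficient for validity; a full check would use a cheminformatics parser.

```python
def is_syntactically_balanced(smiles: str) -> bool:
    """Rough syntactic check (an assumption for illustration, not the paper's method).

    Returns True when branch parentheses balance and every ring-closure
    digit outside a bracket atom appears an even number of times.
    """
    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring-closure labels opened but not yet closed
    in_bracket = False   # inside [...], digits are isotopes, H counts or charges

    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing parenthesis with no matching open
                return False
        elif ch.isdigit():
            # First occurrence opens a ring bond, second occurrence closes it.
            open_rings ^= {ch}

    return depth == 0 and not open_rings


if __name__ == "__main__":
    for s in ["c1ccccc1", "CC(=O)O", "c1ccccc", "CC(=O"]:
        print(s, is_syntactically_balanced(s))
```

Under these assumptions, strings such as `c1ccccc` (unclosed ring) or `CC(=O` (unclosed branch) are the kind of output a correction phase would need to repair.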