VOL. 14, NO. 8, MARCH 2024 | Qizhi Pei, Lijun Wu*, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, and Rui Yan*
The paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey" by Qizhi Pei et al. examines the integration of biomolecular modeling with natural language (BL) as a promising interdisciplinary field. This approach leverages textual data to enhance the understanding of biomolecules and to enable computational tasks such as biomolecule property prediction. The paper outlines how biomolecules are represented, including sequences, 2D graphs, and 3D structures, and examines the rationale and objectives behind effective multi-modal integration. It discusses machine learning frameworks such as GPT-based pre-training and multi-stream neural networks, as well as representation learning methods.
The paper also surveys practical applications, such as property prediction, molecular description generation, and biomolecular data retrieval from text. Additionally, it compiles available resources and datasets to facilitate future research. Finally, the paper identifies promising research directions and aims to provide a comprehensive resource for interdisciplinary researchers in biology, chemistry, and AI.
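
To make the three biomolecule representations named in the summary (sequences, 2D graphs, and 3D structures) more concrete, the following is a minimal Python sketch using only the standard library. It is not taken from the survey. The molecule (ethanol, heavy atoms only), the helper names `adjacency` and `bond_lengths`, and the coordinates are illustrative assumptions, and the coordinates are hand-picked placeholders, not a computed conformer.

```python
import math

# Hypothetical toy example: ethanol with hydrogens omitted.

# Sequence view: a SMILES string.
smiles = "CCO"

# 2D graph view: atoms are nodes, bonds are undirected edges (index pairs).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency(n_atoms, edges):
    """Build an adjacency list from an undirected edge list."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


# 3D structure view: one (x, y, z) coordinate per atom, in angstroms.
# Placeholder values chosen by hand for illustration only.
coords = [(0.00, 0.00, 0.00), (1.52, 0.00, 0.00), (2.00, 1.35, 0.00)]


def bond_lengths(xyz, edges):
    """Euclidean distance between the two atoms of each bond."""
    return {(a, b): math.dist(xyz[a], xyz[b]) for a, b in edges}


graph = adjacency(len(atoms), bonds)
lengths = bond_lengths(coords, bonds)
```

The same molecule thus appears as a string, as a graph, and as a set of coordinates. A multi-modal model of the kind the survey covers would pair one or more of these views with text; in practice a cheminformatics toolkit would normally derive the graph and coordinates from the sequence rather than having them typed in by hand.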