VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling


6 Jun 2024 | Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo
This paper presents VidMuse, a simple and effective framework for generating music aligned with video inputs. The authors first construct a large-scale dataset, V2M, containing 190K video-music pairs across various genres. This dataset is used to evaluate state-of-the-art methods and to train VidMuse. VidMuse uses a Long-Short-Term Visual Module and a Music Token Decoder to generate high-fidelity music that is both acoustically and semantically aligned with the video. The Long-Short-Term Visual Module captures both local and global visual cues, while the Music Token Decoder converts video embeddings into music tokens. The proposed method outperforms existing models in terms of audio quality, diversity, and audio-visual alignment. The code and datasets are available at https://github.com/ZeyueT/VidMuse/.
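The summary describes the two components only at a high level. The sketch below illustrates one plausible reading of "long-short-term" conditioning followed by token decoding, using NumPy and random data. It is not the authors' implementation: the function names, the idea of letting a short window of frames cross-attend to a strided sample of the whole video, and the toy greedy decoder over a discrete audio-codec vocabulary are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Scaled dot-product attention: every query row attends over all key rows.
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (Nq, Nk)
    return softmax(scores, axis=-1) @ values                  # (Nq, D)


def long_short_term_features(frame_emb, center, window, long_stride):
    """Hypothetical fusion of local (short-term) and global (long-term) cues.

    frame_emb:   (T, D) per-frame embeddings from an assumed visual encoder.
    center:      frame index the current music segment is aligned to.
    window:      half-width of the short-term clip around `center`.
    long_stride: subsampling stride for the whole-video context.
    """
    T = frame_emb.shape[0]
    lo, hi = max(0, center - window), min(T, center + window + 1)
    short = frame_emb[lo:hi]             # local cues near the current time step
    long_ctx = frame_emb[::long_stride]  # sparse global cues over the full video
    # Short-term frames query the long-term context to absorb global information.
    attended = cross_attention(short, long_ctx, long_ctx)
    return np.concatenate([short, attended], axis=-1)  # (hi - lo, 2 * D)


def greedy_decode(cond, token_emb, out_proj, n_steps, bos=0):
    """Toy autoregressive loop standing in for a transformer token decoder.

    cond:      (N, C) conditioning features from the visual module.
    token_emb: (V, C) embedding table over an assumed discrete audio-codec vocabulary.
    out_proj:  (C, V) projection from hidden state to token logits.
    """
    context = cond.mean(axis=0)  # pool the conditioning into one vector
    prev, tokens = bos, []
    for _ in range(n_steps):
        hidden = context + token_emb[prev]  # condition on video and last token
        logits = hidden @ out_proj          # (V,)
        prev = int(np.argmax(logits))       # greedy choice; real systems usually sample
        tokens.append(prev)
    return tokens


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, V = 64, 16, 32  # frames, embedding dim, vocab size (arbitrary)
    frames = rng.standard_normal((T, D))
    feats = long_short_term_features(frames, center=10, window=4, long_stride=8)
    print(feats.shape)  # (9, 32)
    tok_emb = rng.standard_normal((V, 2 * D))
    proj = rng.standard_normal((2 * D, V))
    print(greedy_decode(feats, tok_emb, proj, n_steps=8))
```

Cross-attention is used here because it lets each local frame weight distant parts of the video by relevance rather than by position; greedy argmax keeps the toy deterministic, whereas a real music decoder would typically sample from the logits.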