March 18-21, 2024 | Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi
LAVE is a video editing tool that integrates large language models (LLMs) to provide agent-assisted editing and language-augmented features. The system automatically generates language descriptions of user footage, enabling LLMs to process videos and assist in editing tasks. LAVE offers two interaction modes, agent assistance and direct UI manipulation, giving users flexibility and letting them refine the agent's actions. The interface comprises a language-augmented video gallery, an editing timeline, and a video editing agent that interprets user commands, plans, and executes relevant actions to achieve editing objectives.

LAVE's design aims to lower editing barriers through natural language while preserving user agency by offering both AI-assisted and manual editing. Leveraging LLMs' linguistic capabilities, the system supports editing functions including video retrieval, footage overview, idea brainstorming, storyboarding, and clip trimming. The backend employs a plan-and-execute agent that translates user commands into action plans and carries them out, with results displayed in the frontend UI.

A user study with eight participants demonstrated LAVE's effectiveness in aiding video editing and surfaced user perceptions of the proposed LLM-assisted editing paradigm and its impact on creativity and co-creation. Participants found LAVE's functionalities easy to use and useful for creative video production, and the results indicate that the system reduces barriers for beginners while enhancing user agency. The findings suggest that LAVE's integration of LLMs and agents can inform future developments in agent-assisted content editing.
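The plan-and-execute loop described above can be sketched roughly as follows. This is a minimal illustration, not LAVE's actual implementation: the action names are hypothetical, and a simple keyword-matching planner stands in for the LLM that produces the action plan in the real system.

```python
# Illustrative plan-and-execute agent sketch (not LAVE's actual code).
# A user command is mapped to a plan (an ordered list of editing actions),
# and each planned action is then executed, with results collected for
# display in a frontend UI.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class EditingAgent:
    # Registry of editing actions the agent may invoke (names are hypothetical).
    actions: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.actions[name] = fn

    def plan(self, command: str) -> list[str]:
        # Stand-in for the LLM planning step: match the command against
        # registered action names and order the matches into a plan.
        return [name for name in self.actions if name in command.lower()]

    def execute(self, command: str) -> list[str]:
        # Carry out each planned action in order; the returned results
        # would be rendered in the frontend.
        return [self.actions[name](command) for name in self.plan(command)]

agent = EditingAgent()
agent.register("retrieve", lambda cmd: "retrieved clips matching query")
agent.register("storyboard", lambda cmd: "ordered clips into a storyboard")

results = agent.execute("retrieve beach clips and storyboard them")
# results holds one outcome string per executed action
```

In the actual system, the planning step would prompt an LLM with the user command and the available action descriptions, but the overall shape (plan first, then execute each step and surface results to the UI) is the same.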