March 18–21, 2024 | Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi
**Abstract:**
Video creation has become increasingly popular, but the expertise and effort required for editing often pose barriers to beginners. This paper explores the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. The LAVE system provides LLM-powered agent assistance and language-augmented features to enhance the video editing experience. LAVE automatically generates language descriptions for videos, enabling LLMs to process the footage and assist in various editing tasks. Users interact with the agent through natural language commands; the agent interprets each command, then plans and executes the relevant actions. LAVE offers flexibility by allowing users to edit videos either with agent assistance or through manual manipulation. A user study involving eight participants, ranging from novices to proficient editors, demonstrated the effectiveness of LAVE in aiding video editing. The results also highlighted user perceptions of the proposed LLM-assisted editing paradigm and its impact on creativity and co-creation with AI.
**Keywords:**
Video Editing, LLMs, Agents, Human-AI Co-Creation
**Design Goals:**
1. **Harnessing Natural Language to Lower Editing Barriers:** Enhance manual video editing paradigms with natural language and LLMs.
2. **Preserving User Agency in the Editing Process:** Offer both AI-assisted and manual editing options to ensure users maintain control over their creative vision.
**System Components:**
- **Language-Augmented Video Gallery:** Displays videos with auto-generated titles and summaries.
- **Video Editing Timeline:** Allows users to sequence and trim clips.
- **Video Editing Agent:** Facilitates interactions with a conversational agent, providing assistance throughout the editing process.
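The abstract describes the agent as interpreting a natural language command and then planning and executing the relevant actions. The flow can be sketched roughly as below. This is a minimal illustration under stated assumptions, not LAVE's implementation: `Step`, `plan`, `execute`, and the `approve` callback are hypothetical names, the keyword-based `plan` is only a placeholder for an LLM call, and the approval hook is one assumed way to reflect the goal of preserving user agency.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One planned action: the name of an editing function and its argument."""
    action: str
    argument: str


def plan(command: str) -> List[Step]:
    """Placeholder planner (assumption): a real agent would prompt an LLM here.

    Maps keywords in the command to an ordered list of editing steps.
    """
    lowered = command.lower()
    steps: List[Step] = []
    if "find" in lowered or "search" in lowered:
        steps.append(Step("retrieve", command))
    if "story" in lowered or "sequence" in lowered:
        steps.append(Step("storyboard", command))
    return steps


def execute(
    steps: List[Step],
    tools: Dict[str, Callable[[str], str]],
    approve: Callable[[Step], bool],
) -> List[str]:
    """Run each planned step in order, skipping any step the user rejects."""
    results: List[str] = []
    for step in steps:
        if not approve(step):
            results.append(f"skipped {step.action}")
            continue
        results.append(tools[step.action](step.argument))
    return results
```

Separating `plan` from `execute` lets the user review, and in this sketch veto, each step before anything on the timeline changes.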
**LLM-Powered Editing Functions:**
- **Footage Overviewing:** Generates an overview of video content.
- **Idea Brainstorming:** Assists in generating video editing ideas.
- **Video Retrieval:** Searches for relevant videos based on language queries.
- **Storyboarding:** Sequences clips based on provided narratives.
- **Clip Trimming:** Adjusts video segments according to user commands.
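Because each video carries an auto-generated language description, retrieval can operate on text alone. The sketch below ranks clips by how closely their descriptions match a query. It is an illustration only: bag-of-words cosine similarity stands in for whatever text-matching method the system actually uses, and `tokenize`, `cosine`, `retrieve`, and `top_k` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str, descriptions: Dict[str, str], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Return up to top_k (video, score) pairs, best match first.

    Videos whose description shares no token with the query are dropped.
    """
    q = tokenize(query)
    scored = [(name, cosine(q, tokenize(text))) for name, text in descriptions.items()]
    scored = [(name, score) for name, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, calling `retrieve("dog on the beach", descriptions)` on a gallery whose descriptions mention a dog at the beach, city traffic, and a sandcastle at the beach would rank the dog clip first and leave out the traffic clip entirely.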
**User Study:**
- **Participants:** Eight participants with varying levels of video editing experience, from novices to proficient editors.
- **Protocol:** Participants used LAVE to produce videos, gave feedback on its features, and completed questionnaires.
- **Results:** Participants found LAVE easy to use and useful, particularly for beginners. They appreciated the efficiency of video retrieval and the novelty of the editing paradigm. However, some users preferred maintaining autonomy and experienced variability in the usefulness of certain functions due to the stochastic nature of LLMs.
**Conclusion:**
LAVE effectively reduces editing barriers and enhances the video editing experience through LLM-powered agent assistance and language-augmented features. The study's findings provide insights into user perceptions and suggest future design implications for multimedia content editing tools integrating LLMs and agents.