A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming


30 Jan 2024 | Pengyuan Zhou, Lin Wang, Zhi Liu, Yanbin Hao, Pan Hui, Sasu Tarkoma, Jussi Kangasharju
This paper explores the integration of Generative AI and Large Language Models (LLMs) in video generation, understanding, and streaming. It highlights the transformative potential of these technologies for creating realistic videos, enhancing video understanding, and improving streaming experiences. The study reviews current achievements, ongoing challenges, and future directions in applying Generative AI and LLMs to video-related tasks, emphasizing their potential to advance video technology across the multimedia, networking, and AI communities.

Generative AI and LLMs are reshaping video technology by enabling the creation of lifelike videos, extracting meaningful information from visual content, and adapting content delivery to individual viewer preferences. The paper discusses the use of GANs, VAEs, autoregressive models, and diffusion models for video generation, while LLMs are employed for video understanding tasks such as captioning, action recognition, and retrieval. Additionally, LLMs are used to optimize video streaming by predicting bandwidth, forecasting viewports, and allocating resources.

The paper also addresses the challenges associated with these technologies, including poor temporal reasoning, high computational costs, and limited multimodal understanding, and it highlights the need for further research to improve LLM performance on video scene understanding. The study concludes that Generative AI and LLMs hold significant potential to advance video technology, but their application requires careful consideration of technical and ethical issues.
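To make the autoregressive video-generation idea concrete, the toy sketch below rolls out a sequence of coarse "scene tokens" one step at a time, each conditioned on the previous one. The transition table and token names are invented for illustration; the models surveyed in the paper operate on pixels or learned latents at far larger scale.

```python
# Toy autoregressive rollout: each "frame" is a coarse scene token, and
# the next frame is chosen greedily from a (made-up) transition table.
TRANSITIONS = {
    "wide_shot": {"wide_shot": 0.2, "close_up": 0.7, "cutaway": 0.1},
    "close_up":  {"wide_shot": 0.3, "close_up": 0.5, "cutaway": 0.2},
    "cutaway":   {"wide_shot": 0.8, "close_up": 0.1, "cutaway": 0.1},
}

def generate(start, n_frames):
    """Greedy autoregressive generation: always pick the most likely next token."""
    frames = [start]
    for _ in range(n_frames - 1):
        probs = TRANSITIONS[frames[-1]]
        frames.append(max(probs, key=probs.get))
    return frames

print(generate("wide_shot", 4))
```

Real autoregressive video models replace the lookup table with a learned network and sample rather than taking the argmax, but the frame-by-frame conditioning structure is the same.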
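On the streaming side, the sketch below shows a minimal throughput-based adaptive bitrate (ABR) selector of the kind that learned predictors, including the LLM-based approaches discussed in the paper, aim to improve upon. The bitrate ladder, safety margin, and function names are illustrative assumptions, not taken from the paper.

```python
# Minimal ABR sketch: estimate safe bandwidth from past throughput
# samples, then pick the highest bitrate rung that fits under it.
def harmonic_mean(samples):
    """Harmonic mean of throughput samples (kbps); robust to brief spikes."""
    return len(samples) / sum(1.0 / s for s in samples)

def select_bitrate(throughput_kbps, ladder=(300, 750, 1200, 2400, 4800),
                   safety=0.9):
    """Choose the highest ladder rung not exceeding the predicted safe rate."""
    predicted = harmonic_mean(throughput_kbps) * safety
    chosen = ladder[0]  # fall back to the lowest rung if nothing fits
    for rung in ladder:
        if rung <= predicted:
            chosen = rung
    return chosen

# Recent throughput samples in kbps; harmonic mean ~2751, x0.9 ~2476
print(select_bitrate([2500, 3000, 2800]))  # -> 2400
```

A learned predictor would replace the harmonic-mean estimate with a forecast conditioned on richer context (network traces, viewport, playback buffer), which is exactly where the paper argues LLM-style models can help.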