15 Mar 2024 | Xinrun Xu, Yuxin Wang, Chaoyi Xu, Ziluo Ding, Jiechuan Jiang, Zhiming Ding, Börje F. Karlsson
This paper provides a comprehensive survey of the current state of Large Models (LMs) in game-playing scenarios, focusing on their capabilities, challenges, and future research directions. It systematically reviews the existing architectures of LM-based Agents (LMAs) for games, highlighting their commonalities, challenges, and insights. The paper also presents perspectives on promising future research avenues for advancing LMs in games, aiming to help researchers gain a clear understanding of the field and generate more interest in this impactful research direction.
The paper discusses the development of large-scale models, including language and multi-modal models, which have driven significant advances in natural language processing and computer vision, with notable achievements in applications such as text generation, image understanding, and robotics. This progress has prompted researchers to explore LMs as agents that perform complex tasks, where LMAs have shown interesting generalization capabilities compared to traditionally trained counterparts. These demonstrated capabilities have in turn generated considerable interest in applying LMs to game playing, particularly in popular games like Minecraft.
In the pursuit of Artificial General Intelligence (AGI), digital games are recognized as significant because they pose complex challenges that require advanced reasoning and cognitive abilities, making them an ideal benchmark for assessing agent and system capabilities. Data acquisition in games offers advantages over real-world experiments in cost-effectiveness, controllability, safety, and diversity, while preserving significant challenges. Although attempts to analyze or formalize game AI agents and their components are hardly recent, even outside academia, investigating the performance of LMAs in complex gaming environments is crucial for delineating their current limitations, assessing progress towards autonomy and generalizability, informing the design of new architectures, and moving closer to a potential AGI.
The paper outlines the core survey structure, covering the essence of how sensory information is converted into actions and how LMs can play a role in each step. It discusses the challenges faced by LMAs in various stages, including hallucinations, error correction, generalization to unseen tasks, and enhancing interpretability. The paper also highlights the importance of multi-modal perception, authenticity in gaming experiences, the use of external tools, and real-time gaming in the development of LMAs for game-playing scenarios. The paper concludes by identifying future research directions for LMAs and digital games, emphasizing the need for improvements in visual perception, narrative generation, the use of external tools, and real-time gaming capabilities.