Genie: Generative Interactive Environments

2024-02-26 | Jake Bruce*,1, Michael Dennis*,1, Ashley Edwards*,1, Jack Parker-Holder*,1, Yuge (Jimmy) Shi*,1, Edward Hughes1, Matthew Lai1, Aditi Mavalankar1, Richie Steigerwald1, Chris Apps1, Yusuf Aytar1, Sarah Bechtle1, Feryal Behbahani1, Stephanie Chan1, Nicolas Heess1, Lucy Gonzalez1, Simon Osindero1, Sherjil Ozair1, Scott Reed1, Jingwei Zhang1, Konrad Zolna1, Jeff Clune1,2, Nando de Freitas1, Satinder Singh1 and Tim Rocktäschel*,1
Genie is a novel generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. It can generate a wide variety of interactive, playable environments based on text, synthetic images, photographs, and sketches. At 11 billion parameters, Genie serves as a foundation world model, capable of generating and controlling virtual worlds through latent actions. The model consists of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model. Despite training without ground-truth action labels, Genie enables users to act in generated environments on a frame-by-frame basis. The learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening new avenues for training generalist agents. Genie demonstrates high video fidelity and controllability, and can generate diverse trajectories in unseen reinforcement learning environments. The model's generality and controllability make it a promising tool for future research in interactive environments and agent training.
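The three-component design described above can be pictured with a minimal Python sketch. All class names, token shapes, and placeholder logic here are illustrative assumptions rather than the paper's actual architecture; only the overall data flow follows the abstract: the tokenizer maps frames to discrete tokens, the latent action model infers a small discrete action between consecutive frames during training, and at inference the user picks a latent action each step while the dynamics model autoregressively predicts the next frame's tokens.

```python
import numpy as np

# Illustrative sketch of Genie's three components; not the paper's API.
NUM_LATENT_ACTIONS = 8  # the paper uses a small discrete latent action vocabulary


class VideoTokenizer:
    """Stand-in for the spatiotemporal video tokenizer: frames -> discrete tokens."""

    def encode(self, frame):
        # Placeholder: quantize a few pixel values into a small token grid.
        return (frame.flatten()[:16] * 255).astype(int) % 1024


class LatentActionModel:
    """Stand-in for the latent action model: at training time it infers which
    discrete latent action best explains a transition between frames."""

    def infer(self, prev_tokens, next_tokens):
        # Placeholder: attribute the transition to one of the discrete actions.
        return int(np.sum(next_tokens - prev_tokens)) % NUM_LATENT_ACTIONS


class DynamicsModel:
    """Stand-in for the autoregressive dynamics model: predicts next-frame tokens
    conditioned on the token history and a latent action."""

    def predict(self, token_history, latent_action):
        # Placeholder: perturb the most recent token grid by the chosen action.
        return (token_history[-1] + latent_action) % 1024


def interactive_rollout(tokenizer, dynamics, first_frame, user_actions):
    """Frame-by-frame generation from a single prompt frame, as in the abstract:
    the user supplies a discrete latent action at every step."""
    tokens = [tokenizer.encode(first_frame)]
    for action in user_actions:
        tokens.append(dynamics.predict(tokens, action))
    return tokens


frame0 = np.random.rand(64, 64, 3)  # prompt image (photo, sketch, etc.)
rollout = interactive_rollout(
    VideoTokenizer(), DynamicsModel(), frame0, user_actions=[1, 3, 0]
)
print(len(rollout), "frames of tokens generated")
```

Because no ground-truth action labels exist, the latent action vocabulary is learned jointly with the dynamics model; the user-facing controls at inference are exactly these learned discrete actions.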