Authors: Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Elisa Ricci, Sergey Tulyakov
Game engines are powerful tools in computer graphics, but their power comes at the immense cost of their development. In this work, we present a framework to train game-engine-like neural models solely from monocular annotated videos. The result, a Learnable Game Engine (LGE), maintains states of the scene and of the objects and agents in it, and enables rendering the environment from a controllable viewpoint. Like a game engine, it models the logic of the game and the underlying rules of physics, making it possible for a user to play the game by specifying both high- and low-level action sequences. Most captivatingly, our LGE unlocks the director's mode, in which the game is played by plotting behind the scenes: specifying high-level actions and goals for the agents in the form of language and desired states. This requires learning "game AI", encapsulated by our animation model, to navigate the scene under high-level constraints, play against an adversary, and devise strategies to win a point. The key to learning such game AI is the exploitation of a large and diverse text corpus, collected in this work, that describes detailed actions in a game and is used to train our animation model. To render the resulting state of the environment and its agents, our synthesis model uses a compositional NeRF representation. To foster future research, we present newly collected, annotated, and calibrated large-scale Tennis and Minecraft datasets. Our method significantly outperforms existing neural video game simulators in terms of rendering quality. Moreover, our LGEs unlock applications beyond the capabilities of the current state of the art. Our framework, data, and models are available at https://learnable-game-engines.github.io/lge-website.