GENIE – SCI & TECH

News: Explained: Google DeepMind’s Genie, an AI model that creates virtual worlds from image prompts

 

What's in the news?

       The biggest draw of video games is the escapism or the fantasy of a world far removed from our immediate reality.

       Google DeepMind has just introduced Genie, a new model that can generate interactive video games from just a text or image prompt.

 

Genie AI Model:

       It is a foundation world model that is trained on videos sourced from the Internet.

       The model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”

       It is the first generative interactive environment that has been trained in an unsupervised manner from unlabelled internet videos.

 

Specifications:

       Genie has 11 billion parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model.

       Together, these components let users act in the generated environments on a frame-by-frame basis, even though the model was trained without action labels or any other domain-specific requirements.
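
       The frame-by-frame loop described above can be pictured with a toy sketch. This is not Genie's code or API: every name here (tokenize, dynamics_step, rollout, NUM_LATENT_ACTIONS) is hypothetical, the "frames" are tiny grids of integers, and the "dynamics model" is a hand-written placeholder rather than a trained network. It only illustrates the control flow: an image prompt is turned into tokens, and each new frame is produced from the frames so far plus one discrete action chosen by the player.

```python
from typing import List

Frame = List[List[int]]   # toy "image": a small grid of integers
Tokens = List[int]        # toy discrete tokens for one frame

NUM_LATENT_ACTIONS = 8    # assumed, illustrative size of the action vocabulary
VOCAB_SIZE = 16           # assumed, illustrative token vocabulary size


def tokenize(frame: Frame) -> Tokens:
    """Placeholder for the video tokenizer: flatten a frame into discrete tokens."""
    return [pixel % VOCAB_SIZE for row in frame for pixel in row]


def detokenize(tokens: Tokens, width: int) -> Frame:
    """Placeholder decoder: reshape a flat token list back into rows of `width`."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


def dynamics_step(history: List[Tokens], action: int) -> Tokens:
    """Placeholder for the dynamics model.

    A real model would predict the next frame's tokens from all past tokens
    and the action. This stand-in just rotates the latest frame's tokens to
    the right by `action` positions so that different actions give visibly
    different next frames.
    """
    last = history[-1]
    shift = action % len(last)
    if shift == 0:
        return list(last)
    return last[-shift:] + last[:-shift]


def rollout(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Generate one new frame per action, starting from a single image prompt."""
    width = len(prompt[0])
    history = [tokenize(prompt)]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"action must be in [0, {NUM_LATENT_ACTIONS})")
        # Autoregressive: each step sees everything generated so far.
        history.append(dynamics_step(history, action))
    return [detokenize(tokens, width) for tokens in history]


if __name__ == "__main__":
    for frame in rollout([[1, 2], [3, 4]], actions=[1, 0]):
        print(frame)
```

       In this sketch the player's input is just an integer. The point is that the actions form a small discrete set with no predefined meaning such as "jump" or "left", which is roughly what "latent" refers to here.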

 

Characteristics:

       Genie can be prompted to generate a diverse set of interactive and controllable environments although it is trained on video-only data.

       It makes playable environments from a single image prompt.

       It can be prompted with images it has never seen, including real-world photographs and sketches, allowing people to interact with their imagined virtual worlds.

       Its training focuses mainly on videos of 2D platformer games and robotics.

       Genie is trained with a general method that should work for any type of domain, and it is scalable to even larger Internet datasets.

       The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.

       This is noteworthy because internet videos do not have labels about the action that is performed in the video, or even which part of the image should be controlled.

       In effect, it allows users to create an entirely new interactive environment from a single image.