P L RAJ IAS & IPS ACADEMY - AN INSTITUTION FOR IAS, IPS AND TNPSC EXAMINATION

GENIE – SCI & TECH

News: Explained: Google DeepMind’s Genie, an AI model that creates virtual worlds from image prompts

What's in the news?

● The biggest draw of video games is escapism: the fantasy of a world far removed from our immediate reality.

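● Google DeepMind has introduced Genie, a new model that can generate interactive, playable environments (akin to simple video games) from a single image prompt. That image can itself be created from a text prompt using a text-to-image model.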

Genie AI Model:

● It is a foundation world model that is trained on videos sourced from the Internet.

● The model can “generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.”

● It is the first generative interactive environment that has been trained in an unsupervised manner from unlabelled internet videos.

Specifications:

● Genie has 11 billion (11B) parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model.

● Together, these components let users act in the generated environments frame by frame, even though Genie is trained without any ground-truth action labels or other domain-specific requirements.
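The three components listed above can be pictured as a simple loop: encode the prompt image into discrete tokens, let the user pick one of a few discrete actions, and have the dynamics model predict the next frame's tokens from what came before. The Python sketch below is a toy illustration of that loop under stated assumptions, not DeepMind's code. The names `tokenize`, `dynamics`, and `play` are hypothetical, the "tokenizer" is plain quantization of a 1-D frame, and the "dynamics model" is a fixed rotation rule standing in for a trained neural network.

```python
from typing import List

# Hypothetical toy sizes chosen for illustration only (not Genie's real settings).
NUM_LATENT_ACTIONS = 8  # small discrete set of controls a user can choose from
CODEBOOK_SIZE = 4       # number of discrete token values per "pixel"

Frame = List[float]  # a toy 1-D frame of pixel intensities in [0, 1]
Tokens = List[int]   # discrete token indices for one frame


def tokenize(frame: Frame) -> Tokens:
    """Toy stand-in for the video tokenizer: quantize each pixel into a codebook index."""
    return [min(int(p * CODEBOOK_SIZE), CODEBOOK_SIZE - 1) for p in frame]


def detokenize(tokens: Tokens) -> Frame:
    """Map each token back to the centre of its quantization bin."""
    return [(t + 0.5) / CODEBOOK_SIZE for t in tokens]


def dynamics(history: List[Tokens], action: int) -> Tokens:
    """Toy stand-in for the dynamics model.

    A real model would predict the next frame's tokens from the whole token
    history plus the chosen action; this placeholder only looks at the latest
    frame and rotates its tokens by `action` positions.
    """
    last = history[-1]
    k = action % len(last)
    return last[-k:] + last[:-k] if k else list(last)


def play(prompt_frame: Frame, actions: List[int]) -> List[Frame]:
    """Start from one prompt image and generate one new frame per user action."""
    history = [tokenize(prompt_frame)]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"action must be in [0, {NUM_LATENT_ACTIONS})")
        # Autoregressive step: each generated frame is appended and fed back in.
        history.append(dynamics(history, action))
    return [detokenize(tokens) for tokens in history]


if __name__ == "__main__":
    for frame in play([0.0, 0.3, 0.6, 0.9], actions=[1, 0, 2]):
        print(frame)
```

The point of the design is that the user never supplies pixels after the first image, only a choice from a small set of actions, and every new frame is generated from the frames before it.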

Characteristics:

● Genie can be prompted to generate a diverse set of interactive and controllable environments, even though it is trained only on video data.

● It makes playable environments from a single image prompt.

● It can be prompted with images it has never seen before, including real-world photographs and sketches, allowing people to interact with their own imagined virtual worlds.

● It is trained mainly on videos of 2D platformer games and of robotics.

● Genie is trained using a general method, so it can be applied to any domain and can scale to even larger internet datasets.

● The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.

● This is noteworthy because internet videos carry no labels describing the action being performed, or even which part of the image should be controlled.

● It can create an entirely new interactive environment from a single image.
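The idea of learning controls from unlabelled videos can be illustrated with a toy example: look at two consecutive frames and ask which action from a small discrete set best explains the change between them. The Python sketch below does exactly that under stated assumptions. It is not DeepMind's method: the published latent action model is described as learning this mapping with a neural network trained on video, whereas this sketch brute-forces every candidate action under a hypothetical fixed transition rule. The names `apply_action`, `infer_latent_action`, and `label_video` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary of latent actions (not Genie's real settings).
NUM_LATENT_ACTIONS = 4

Frame = Sequence[float]  # a toy 1-D frame of pixel intensities


def apply_action(frame: Frame, action: int) -> List[float]:
    """Hypothetical fixed transition rule: action k shifts the frame right by k pixels, wrapping around."""
    k = action % len(frame)
    return list(frame[-k:]) + list(frame[:-k]) if k else list(frame)


def infer_latent_action(prev: Frame, nxt: Frame) -> int:
    """Return the action whose predicted next frame is closest (squared error) to the observed one."""
    def error(action: int) -> float:
        predicted = apply_action(prev, action)
        return sum((p - q) ** 2 for p, q in zip(predicted, nxt))

    # Ties go to the lowest-numbered action because min() keeps the first minimum.
    return min(range(NUM_LATENT_ACTIONS), key=error)


def label_video(frames: List[Frame]) -> List[int]:
    """Attach one inferred action to every pair of consecutive frames; no human labels are used."""
    return [infer_latent_action(a, b) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    # A bright "character" pixel moving through a 4-pixel world.
    video = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
    print(label_video(video))
```

Running it prints `[1, 0, 2]`: a one-pixel move, no move, then a two-pixel move. The actions are recovered from the pixels alone, which is the property the notes highlight, although here the transition rule is given rather than learned.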