This is Imagen, Google’s AI project to compete with DALL-E 2


In recent years, the digital ecosystem has evolved constantly, and artificial intelligence (AI) has been one of the fields most involved in that progress. This disruptive technology has found a place across the digital ecosystem, improving automation, data interpretation, and data retrieval, among many other functions.

More recently, this technology has taken a more visual turn with the presentation of DALL-E 2, a project we covered not long ago. It is a system designed to generate images from descriptions written by users, with highly realistic results. Its release caused quite a stir in the digital realm and prompted a response from Google, which has unveiled its own AI project along the same lines.

Imagen, Google’s new AI project that creates images from text

The tech giant has presented its AI project as a text-to-image diffusion model that offers an alternative to OpenAI’s DALL-E 2. Imagen, as the project is called, can create photorealistic images from descriptive text.

“A wall in a royal castle. There are two pictures on the wall. The one on the left is a detailed oil painting of the royal raccoon king. The one on the right is a detailed oil painting of the raccoon queen.” / Source: Imagen

Imagen builds on large language models to achieve a deep understanding of language. It combines them with diffusion models to generate images with a higher level of fidelity, that is, images that match the text descriptions more closely.

“A Pomeranian is sitting on the king’s throne wearing a crown. Two tiger soldiers are standing by the throne.” / Source: Imagen

According to Google, one of the main findings from the development of Imagen is that scaling up the language model improves image fidelity and image-text alignment much more than increasing the size of the image diffusion model. Large generic language models, which are pre-trained on text corpora (structured collections of language samples), are therefore well suited to this AI project. They also make it possible to obtain high-quality images that closely match the text, like the ones shown here.

“A cute corgi lives in a house made of sushi.” / Source: Imagen

Alongside these research results, Google also introduced DrawBench, a benchmark for comparing Imagen with other AI engines that create images from text, such as DALL-E 2 and VQ-GAN+CLIP. According to Google, the results of this comparison show that human raters prefer Imagen over these other models, both in terms of image quality and text-image alignment.

Source: Google

A closed project, for the moment

Both Google and OpenAI are aware of the wide range of uses, and misuses, that these new diffusion models allow. For this reason, they have kept their respective technologies closed. In the case of DALL-E 2, it is possible to request access to test the AI through a form, while Google has decided to keep Imagen completely closed off. It has therefore chosen not to release any code or public demo of the image generation.

That said, the only direct interaction with Imagen’s image generation is through a short demo on its presentation page, in addition to the posts shared by Jeff Dean, Vice President at Google, and other members of the research team.

Image: Imagen (“The Toronto skyline with Google brain logo written in fireworks”)
