DALL·E 2 may have to cede its throne as the most impressive image-generating AI to Google, which has unveiled its own text-to-image model, called Imagen.
Like OpenAI's DALL·E 2, Google's system generates images based on written prompts from users.
A look at the Imagen website shows off some of the images it has created (and Google has carefully curated), such as a blue jay perched on a pile of macarons, a pair of robots enjoying wine in front of the Eiffel Tower, or Imagen's own name sprouting from a book. According to the team, "human raters strongly prefer Imagen over all other models in both image-text alignment and image fidelity," but they would say that, wouldn't they?
Imagen comes from Google Research's Brain Team, which claims the AI achieves an unprecedented degree of photorealism through a combination of transformer language models and image diffusion models. When it was tested against similar models, such as DALL·E 2 and VQGAN+CLIP, the team said Imagen blew them out of the water. DrawBench, the list of 200 prompts used to compare the models, was created in-house.
Imagen at work, with prompts … Source: Google
Imagen's designers say their key breakthrough was in how their model handles text. Their work, the team said, shows how effective pre-trained language models can be when frozen and used as text encoders. They found that scaling up this language model had a far greater impact on performance than scaling the other components of Imagen.
“Our observation … encourages future research directions to explore even larger language models as text encoders,” the team wrote.
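The frozen-encoder idea the team describes can be illustrated with a toy sketch: the text encoder's pre-trained weights are held fixed while only the image-generation side is trained. The tiny linear models below are hypothetical stand-ins for illustration, not Imagen's actual architecture.

```python
# Toy illustration of a frozen text encoder: its weights never change
# during training; only the "decoder" (image-side) weight is updated.
# All names, shapes, and values here are made up for the sketch.

encoder_w = [0.5, -1.0, 2.0]   # "pre-trained" text encoder, frozen
decoder_w = 0.3                # trainable image-side weight

def encode(text_vec):
    """Frozen step: map a text vector to a scalar embedding."""
    return sum(w * x for w, x in zip(encoder_w, text_vec))

def train_step(text_vec, target, lr=0.1):
    """Squared-error gradient step that updates ONLY the decoder."""
    global decoder_w
    emb = encode(text_vec)                        # encoder output
    pred = emb * decoder_w
    decoder_w -= lr * 2 * (pred - target) * emb   # decoder learns,
    # while encoder_w is never touched during training.

snapshot = list(encoder_w)
train_step([1.0, 1.0, 1.0], target=0.0)
print(encoder_w == snapshot)   # True: the encoder stayed frozen
```

The point of the pattern, per the team's observation, is that a bigger frozen encoder buys more than a bigger image model does.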
Unfortunately for those hoping to take a crack at Imagen, the team behind it said it is not releasing its code or a public demo, for a variety of reasons.
For one, Imagen isn't good at generating human faces. In experiments with images that included human faces, Imagen's output was preferred by human raters over reference images only 39.2 percent of the time. When human faces were removed, that number rose to 43.9 percent.
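A preference rate like the 39.2 percent figure can be tallied by having raters compare a model's image against a reference and pick the one they prefer. A minimal sketch, with entirely made-up votes:

```python
# Hypothetical rater votes: each rater saw a model image next to a
# reference image and picked one. The data below is invented.
votes = ["model", "reference", "reference", "model", "reference",
         "reference", "reference", "model", "reference", "reference"]

preference = 100 * votes.count("model") / len(votes)
print(f"{preference:.1f}% of raters preferred the model's image")
# → 30.0% of raters preferred the model's image
```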
Unfortunately, Google didn't provide any Imagen-generated images of humans, so it's impossible to know how they compare to those produced by platforms such as This Person Does Not Exist, which uses a generative adversarial network to create faces.
Technical concerns aside, and more importantly, Imagen's creators found it to be a bit racist and sexist, even though they tried to avoid those biases.
Imagen showed "an overall bias towards generating images of people with lighter skin tones and … portraying different professions to align with Western gender stereotypes," the team wrote. Removing humans didn't help much either: "Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects."
Like similar AIs, Imagen was trained on image-text pairs scraped from the internet, drawn from publicly available datasets such as COCO and LAION-400M. Imagen's team said it filtered a subset of the data to remove noise and offensive content, though an audit of the LAION dataset "found a wide range of inappropriate content, including pornographic imagery, racist slurs, and harmful social stereotypes."
Bias in machine learning is a well-known problem: Twitter's image cropping and Google's computer vision are just a couple of systems that have been singled out for reproducing stereotypes encoded in the data we produce.
"There are many data challenges that need to be addressed before text-to-image models like Imagen can be safely integrated into user-facing applications … we do not recommend using text-to-image generation methods in any user-facing tools without close care and attention to the contents of the training dataset," Imagen's creators said. ®