Google has mastered text rendering in images with the new Imagen 4 model

Yevgeny Demkivskyi - 21 May, 12:59 PM

At Google I/O, the company announced Imagen 4, a new version of its image generation model. Google calls it "strikingly accurate" and "significantly better at reproducing text," opening up new possibilities for creating postcards, comics, posters, and other visual content.

According to Eli Collins, Vice President at Google DeepMind, Imagen 4 combines speed with accuracy and demonstrates "extraordinary clarity in fine details," such as fabric textures, water droplets, and animal fur. The model handles abstract styles as well as photorealistic ones. In Google's examples, objects are rendered sharply and text stays legible even at small font sizes.

Imagen 4 is already available in Gemini, Whisk, Vertex AI, and Workspace apps, including Google Slides, Docs, and the new Google Vids video editor. The company is also working on an accelerated version of the model that it says will run up to 10 times faster than Imagen 3.

Google positions Imagen 4 as one of the tools tightly integrated with the Gemini ecosystem. The intended uses include creating illustrations from text prompts, generating materials for presentations, and producing personalized content in Workspace.

Also at this year's I/O, Google introduced a new AI Ultra plan for $250 per month, a real-time translation feature in Google Meet, and Stitch, an AI tool for creating UI designs. In addition, the company announced the integration of Gemini into the Chrome browser and discussed the development of Project Aura smart glasses based on Android XR.