OpenAI has revealed an improved version of its DALL-E program, which creates images from text phrases, for example, "a cat made of land" or "a fox sitting in a field in winter." DALL-E 2 produces higher-resolution images with lower latency than its predecessor.
The first version of DALL-E was presented in January 2021, when OpenAI, the research organization co-founded by Elon Musk and backed financially by Microsoft, unveiled its most ambitious project to date: a multimodal machine learning system that could generate images (albeit somewhat cartoonish ones) from a user's text description.
DALL-E's name combines "Dalí" (the surrealist artist Salvador Dalí) and WALL-E (the robot from Pixar's film of the same name). The first version could generate images or combine several images into a collage. It could also render a subject from different angles with correct perspective, and even infer individual elements (such as shadows) from the written description.
"Unlike 3D rendering, which requires unambiguous and detailed input, DALL-E can often fill in gaps when the caption suggests that the image may have details that are not specified," the OpenAI team said last year.
DALL-E 2 builds on OpenAI's CLIP image recognition system to expand its ability to generate images. Users can now select and edit specific areas of an existing image, add or remove elements along with their shadows, blend two images into one collage, and create variations of an existing image.
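The three operations described above (generation from a prompt, targeted editing, and variations) can be sketched as distinct API calls. The endpoint paths and parameter names below mirror the shape of OpenAI's later public Images API and are assumptions for illustration, not details from this announcement:

```python
from typing import Optional


def build_image_request(operation: str,
                        prompt: Optional[str] = None,
                        image: Optional[str] = None,
                        mask: Optional[str] = None,
                        size: str = "1024x1024",
                        n: int = 1) -> dict:
    """Assemble a request payload for a hypothetical DALL-E-style image endpoint."""
    if operation == "generate":
        if not prompt:
            raise ValueError("generation requires a text prompt")
        return {"endpoint": "/v1/images/generations",
                "prompt": prompt, "size": size, "n": n}
    if operation == "edit":
        # Inpainting: the mask marks the region of the image to repaint.
        if not (prompt and image and mask):
            raise ValueError("editing requires a prompt, an image, and a mask")
        return {"endpoint": "/v1/images/edits",
                "prompt": prompt, "image": image, "mask": mask,
                "size": size, "n": n}
    if operation == "variation":
        # Variations: new images in the style of the source image, no prompt.
        if not image:
            raise ValueError("variations require a source image")
        return {"endpoint": "/v1/images/variations",
                "image": image, "size": size, "n": n}
    raise ValueError(f"unknown operation: {operation}")


req = build_image_request("generate",
                          prompt="a fox sitting in a field in winter")
```

The point of the sketch is that editing and variations take an existing image as input, while plain generation takes only text, which is exactly the split of capabilities the article lists.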
Furthermore, the created images have a resolution of 1024×1024 pixels, up from the 256×256-pixel images generated by the previous version. OpenAI's CLIP was designed to look at images and summarize their content in an understandable way. In building the new system, the developers "reversed" this process, generating images from descriptions instead.
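The idea behind CLIP can be illustrated with a minimal sketch: images and captions are mapped into a shared embedding space, and the caption whose vector is most similar to the image's vector "describes" it. The vectors below are toy stand-ins, not real CLIP embeddings:

```python
import numpy as np

# Toy stand-ins for CLIP embeddings. In the real model, an image encoder and
# a text encoder map their inputs into the same vector space.
image_embedding = np.array([0.9, 0.1, 0.2])  # pretend: photo of a fox
caption_embeddings = {
    "a fox sitting in a field in winter": np.array([0.8, 0.2, 0.1]),
    "a cat made of land":                 np.array([0.1, 0.9, 0.3]),
}


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# CLIP-style matching: pick the caption closest to the image in the shared space.
best_caption = max(caption_embeddings,
                   key=lambda c: cosine_similarity(image_embedding,
                                                   caption_embeddings[c]))
# DALL-E 2 runs this relationship "in reverse": instead of scoring captions
# against a given image, it generates an image whose embedding matches a caption.
```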
DALL-E was not planned as a commercial product, so its capabilities are somewhat limited; the OpenAI team treats it as a research tool. The system is also deliberately restricted so that it cannot be used for misinformation. DALL-E 2 was further protected by removing potentially unacceptable images from the training data, and every generated image carries a watermark indicating that it was created by AI.
The system has other safeguards. It will not generate images from descriptions containing proper names, such as people or architectural landmarks. Prompts involving nudity, obscenity, extremist ideologies, major conspiracy theories, or current geopolitical events are also blocked.
Unlike the first version, which anyone could play with on the OpenAI site, the new one is currently available only to testing partners. They, in turn, are restricted in what they can upload to or generate with DALL-E 2, and are not allowed to share their work on other platforms. To try DALL-E 2 yourself, you can join the waiting list on the developer's website.