ERNIE-ViLG, a text-to-image AI system developed by the Chinese company Baidu, was built to generate images with Chinese cultural features, including objects and celebrities, more accurately than existing services, reports MIT Technology Review.
However, the system refuses to depict many subjects, such as Tiananmen Square, China's second largest square and a symbolic political center. When a demo version of the software was released in late August, users noticed that the AI flagged the names of political leaders, along with words that were potentially controversial in a political context, as "sensitive" and blocked them from generating any results.
Such AI systems often restrict users from creating certain types of content. For example, DALL-E 2 prohibits sexual content, images of the faces of public figures, and images of medical treatment. The ERNIE-ViLG case, however, raises the question of exactly where the line between content moderation and political censorship lies.
The ERNIE-ViLG model is part of Wenxin, a large-scale natural language processing project from Baidu. It was trained on a dataset of 145 million image-text pairs and contains 10 billion parameters: the values a neural network adjusts during training, which the model uses to distinguish between concepts and artistic styles. This means that ERNIE-ViLG has a smaller training dataset than DALL-E 2 and Stable Diffusion, but more parameters than either.
Baidu released a demo version on its own platform in late August, and then on Hugging Face, a popular international AI community. The main difference between ERNIE-ViLG and Western models is that the Baidu model understands prompts written in Chinese and is less likely to make mistakes with culturally specific words.
A Chinese blogger compared the results of different models and found that ERNIE-ViLG produces more accurate images. The model has also been embraced by the Japanese anime community, which has found that it produces more satisfying anime art than other models, likely because its training data contains more anime.
However, unlike DALL-E 2 or Stable Diffusion, ERNIE-ViLG has not published an explanation of its content moderation policy, and Baidu declined to comment for this story. While words like "democracy" and "government" are allowed on their own, phrases that combine them with other words, such as "Middle Eastern democracy" or "British government," are blocked. Beijing's Tiananmen Square also fails to generate in ERNIE-ViLG, probably because of its association with protests that are censored in China.
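Baidu has not disclosed how its filter works, but the behavior users observed (single words pass, certain combinations are blocked) is consistent with a simple phrase-level blocklist. A minimal, hypothetical sketch in Python; the phrase list and function name are illustrative and are not Baidu's actual implementation:

```python
# Hypothetical phrase-level prompt filter: individual words such as
# "democracy" pass, but specific multi-word combinations are blocked.
# The blocklist entries below are examples reported by users, not an
# actual list from Baidu.
BLOCKED_PHRASES = {
    "middle eastern democracy",
    "british government",
    "tiananmen square",
}

def is_prompt_blocked(prompt: str) -> bool:
    """Return True if the prompt contains any blocked phrase."""
    text = prompt.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

print(is_prompt_blocked("democracy"))                 # False: single word passes
print(is_prompt_blocked("Middle Eastern democracy"))  # True: combination blocked
```

A real production filter would likely be far more elaborate (regular expressions, embeddings, or a classifier), but phrase matching of this kind would explain why the same word is permitted alone yet rejected in context.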
In today’s China, even social media companies usually have their own lists of sensitive words. This means that any filter used by ERNIE-ViLG is likely to be different from those used by Tencent-owned WeChat or Sina Corporation-run Weibo.