Google trains its AI models on YouTube content without the authors' consent — CNBC
Google is actively using content from YouTube's vast library to train its own artificial intelligence models, such as Gemini and the new video and audio generator Veo 3, CNBC reports, citing its own sources.
One source told the publication that a selection from a catalog of 20 billion videos is used for training. Google confirmed this, but clarified that only part of the content is involved, and only under agreements with creators and media companies.
A YouTube representative explained that the company has always used its own content to improve its services — the emergence of generative AI has not changed that. "We understand the importance of safeguards, which is why we have developed robust protection mechanisms for creators," the company noted.
But experts fear the copyright implications. They believe that using other people's videos to train AI without the creators' knowledge could trigger an intellectual property crisis. Although YouTube says it has disclosed this practice before, most creators were not even aware that their content was being used for training.
Google doesn't disclose how many videos it has used to train its models, but even 1% of its library would amount to more than 2.3 billion minutes of content — 40 times more than its competitors have.
By uploading videos, creators grant YouTube broad permission to use their content. However, there is no way to opt out of having their videos used to train Google's own models.
Digital rights groups say that years of work by creators are being used to develop AI without compensation or even notice. For example, Vermillio has created a service called Trace ID that determines the similarity of AI-generated videos to the original content. In some cases, the match has reached over 90%.
Some creators are not opposed to training on their content, viewing the new tools as an opportunity for experimentation. But most believe the situation is opaque and needs clearer rules.
YouTube has even struck a deal with Creative Artists Agency to develop a system for managing AI content that mimics famous people. However, the mechanisms for removing or tracking such content are still imperfect.
Meanwhile, there are already calls in the US to provide authors with legal protections that would allow them to control the use of their work in the world of generative AI.
As a reminder, Google recently changed its internal content moderation rules on YouTube: videos that partially violate the rules can now remain online if they are deemed socially important.