Several large tech companies used transcripts of YouTube videos to train their artificial intelligence models without permission, according to a Proof News investigation reported by Engadget.
The transcripts came from a dataset covering more than 173,000 YouTube videos from over 48,000 channels. It was created by the nonprofit EleutherAI and was used by Apple, NVIDIA, and Anthropic, among others.
The dataset does not include any videos or images from YouTube, but contains transcripts of videos from the platform's biggest creators, including Marques Brownlee and MrBeast, as well as major news publishers such as The New York Times, BBC, and ABC News.
Meanwhile, Google said that YouTube CEO Neal Mohan's recent warnings remain in force: using YouTube data to train AI models violates the service's terms and conditions.