OpenAI has created a tool to transcribe YouTube videos and collect data for GPT-4 training

Evgenia Gubina, editor of the IT business section at Mezha.Media. Total work experience: more than 18 years. I am passionate about technology and space, and I adore dogs.

8 April 2024, 11:04 AM

To do this, OpenAI researchers created a speech recognition tool called Whisper. It could transcribe the audio from YouTube videos, turning speech into text.

OpenAI took this step after running short of training data at the end of 2021. It had exhausted the available material but still needed a large amount of data.

According to knowledgeable sources, some OpenAI employees discussed how transcribing videos and using the resulting texts could be against YouTube's rules.

But in the end, the OpenAI team transcribed more than 1 million hours of YouTube videos and fed the resulting texts into GPT-4. Notably, OpenAI president Greg Brockman personally helped collect the videos, according to informed sources.

Recently, YouTube CEO Neal Mohan said in an interview with Bloomberg that using YouTube videos to train OpenAI's Sora video generation model would be a violation of YouTube's terms of service.

Building such AI systems depends on having enough data to train the technology to instantly create text, images, sounds, and videos that resemble what humans create.
