Українська правда

ElevenLabs Enters Speech Recognition Market with Scribe, a Voice-to-Text Model

ElevenLabs, the AI startup that recently enabled podcaster Lex Fridman to translate an interview with Volodymyr Zelenskyy into multiple languages, has launched its first standalone voice-to-text model, called Scribe. It’s the company’s first foray beyond audio generation, and should allow it to compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper in the speech recognition space, TechCrunch reports.

The launch comes shortly after ElevenLabs raised $180 million in funding, bringing its valuation to $3.3 billion. Previously, the company focused primarily on text-to-speech services using a large library of synthetic voices. It now aims to leverage that expertise to improve speech recognition and transcription accuracy.

The Scribe model supports over 99 languages, with 25 languages falling into the "excellent accuracy" category, as defined by a word error rate (WER) of less than 5%. These languages include English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, Vietnamese, and Ukrainian. The remaining languages have high (5%-10% WER), good (10%-20% WER), or moderate (25%-50% WER) accuracy.
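For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not the evaluation code used by ElevenLabs; the function name `word_error_rate` is our own, and real benchmarks usually apply heavier text normalization (punctuation, numerals) than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: lowercase and split on whitespace; no other normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# One wrong word out of six gives a WER of about 16.7%,
# well above the 5% threshold for "excellent accuracy".
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.1%}")
```

Under this definition, a language in the "excellent accuracy" category would average fewer than one wrong word in every 20.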

The company claims that Scribe outperforms Google Gemini 2.0 Flash and Whisper Large V3 on the FLEURS and Common Voice benchmarks.

Scribe includes several advanced features:

  • Speaker diarization to determine who is speaking;
  • Word-level timestamps for precise synchronized subtitling;
  • Automatic tagging of sound events, such as audience laughter;
  • Direct transcription of video content for adding subtitles and captions.
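
To illustrate why word-level timestamps matter for subtitling, here is a rough sketch that groups timed words into SubRip (SRT) cues. The input format is an assumption made for illustration (a list of dictionaries with hypothetical `text`, `start`, and `end` keys, times in seconds) and is not taken from Scribe's actual API response; the helper names are ours as well.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        # A cue spans from the start of its first word to the end of its last.
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(word["text"] for word in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical transcript fragment with per-word timings.
sample = [
    {"text": "Hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 1.0},
]
print(words_to_srt(sample))
```

Splitting on a fixed word count keeps the sketch short; a production subtitler would more likely break cues at pauses, punctuation, or speaker changes.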

Currently, Scribe works only with pre-recorded audio, making it unsuitable for transcribing meetings or conversations live. However, ElevenLabs plans to release a low-latency version for real-time transcription in the near future.

Scribe’s development reflects ElevenLabs’ broader ambitions in voice AI. While the company initially built speech recognition components as part of its AI agent platform, this is its first standalone transcription model.

In an interview with TechCrunch, ElevenLabs CEO Mati Staniszewski emphasized the need to improve speech recognition models:

"We want to understand what’s being said by you in a conversation better. We are working on ways to move away from only generating content and understanding and transcribing speech. Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback."

Scribe is priced at $0.40 per hour of transcribed audio. While this is a competitive rate, some competitors offer lower prices or a different set of features. However, ElevenLabs’ strong position in audio AI and growing capabilities in speech recognition could make the company a serious player in this market.

If Scribe from ElevenLabs is not what you are looking for, we recently wrote about transcription services that work well with the Ukrainian language.
