Meta has created an artificial intelligence language model that can recognize more than 4,000 spoken languages and generate speech in more than 1,100 of them, Engadget reports.
The project, called Massively Multilingual Speech (MMS), is not a ChatGPT clone, and Meta is making it openly available.
“Today, we are publicly sharing our models and code so that others in the research community can build upon our work,” the company wrote. “Through this work, we hope to make a small contribution to preserve the incredible language diversity of the world.”
Meta noted that speech recognition and text-to-speech models typically require training on thousands of hours of audio recordings with accompanying transcription labels. But for languages that are not widely spoken in industrialized countries, this data simply does not exist.
With this in mind, the company took an unconventional approach to data collection: it used audio recordings of translated religious texts. This made it possible to significantly increase the number of languages the model covers.
“We turned to religious texts, such as the Bible, that have been translated in many different languages and whose translations have been widely studied for text-based language translation research,” the company said. “These translations have publicly available audio recordings of people reading these texts in different languages.”
To make the data more usable, Meta used wav2vec 2.0, its self-supervised speech representation learning model, which can train on unlabeled data. The combination of non-traditional data sources and a self-supervised speech model produced strong results.
“Our results show that the Massively Multilingual Speech models perform well compared with existing models and cover 10 times as many languages.” Specifically, Meta compared MMS to OpenAI’s Whisper, and it exceeded expectations. “We found that models trained on the Massively Multilingual Speech data achieve half the word error rate, but Massively Multilingual Speech covers 11 times more languages.”
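For readers unfamiliar with the metric in that comparison, word error rate (WER) is the number of word-level substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal Python sketch of that standard definition. The function name and the example sentences are invented for illustration, and it does not reproduce Meta's evaluation setup, which may apply text normalization that the article does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only; they are not taken from any MMS or Whisper output.
reference = "the cat sat on the mat"
system_a = "the cat sat on a mat"   # one wrong word
system_b = "the cat sit on a mat"   # two wrong words

print(word_error_rate(reference, system_a))
print(word_error_rate(reference, system_b))
```

In this toy case the first transcript contains one wrong word out of six and the second contains two, so the first has half the word error rate of the second. That is the kind of relationship the quoted comparison describes.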
At the same time, the company warns that its new models are not perfect. For example, there is a risk that the speech-to-text model may mistranscribe individual words or phrases.
It was reported earlier that EU regulators had imposed a record $1.3 billion (€1.2 billion) fine on Meta and ordered it to stop transferring EU citizens’ data from Facebook to the US. According to the EU courts, such data transfers put EU citizens at risk of privacy violations.