Just as a language model predicts the next token in a sentence, MusicGen predicts the next segment of a piece of music. It was trained on 20,000 hours of licensed music.
The model stands out because it can be conditioned on both text and melodic cues: a text prompt sets the overall style, which is then aligned with the melody from a reference audio file.
In comparisons with other music models such as Riffusion, Mousai, MusicLM, and Noise2Music, MusicGen scores better on both objective and subjective metrics that measure how well the music matches the text prompt and how plausible the composition sounds. Overall, tests show its performance is roughly on par with Google's MusicLM.
Meta has released the AI model as open source on GitHub, with commercial use permitted. A demo is available on Hugging Face.
Meta previously released a speech AI model that can recognize more than 4,000 spoken languages and synthesize speech in more than 1,100 of them. That model, part of the Massively Multilingual Speech (MMS) project, is not a ChatGPT clone, and Meta provides open access to it as well.