Last week, Meta announced EnCodec, its own audio compression method. It relies on artificial intelligence and, according to the company, compresses audio up to ten times more than a 64 kbit/s MP3 without a loss in quality. Meta says the technology will significantly improve voice quality over poor connections (for example, phone calls with a weak signal). It can also work with music.
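To put the "ten times" figure in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the three-minute clip length and the helper name `stream_size_bytes` are assumptions made for illustration, and the ratio is Meta's claim, not a measured result.

```python
def stream_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream in bytes (1 kbit = 1000 bits)."""
    return bitrate_kbps * 1000 * seconds / 8

MP3_KBPS = 64.0        # baseline MP3 bitrate quoted in the article
RATIO = 10.0           # "up to ten times" more compression, per Meta's claim
CLIP_SECONDS = 180.0   # hypothetical three-minute clip (assumption)

# Bitrate the codec would need to hit to be ten times smaller than the MP3.
target_kbps = MP3_KBPS / RATIO

mp3_bytes = stream_size_bytes(MP3_KBPS, CLIP_SECONDS)
target_bytes = stream_size_bytes(target_kbps, CLIP_SECONDS)

print(f"MP3 at {MP3_KBPS:.0f} kbit/s: {mp3_bytes / 1e6:.2f} MB")
print(f"Target at {target_kbps:.1f} kbit/s: {target_bytes / 1e6:.3f} MB")
```

In other words, a tenfold improvement over a 64 kbit/s MP3 means a stream of roughly 6.4 kbit/s.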
The technology is described in the paper High Fidelity Neural Audio Compression by Meta researchers Alexandre Défossez, Jade Copet, Gabriel Synnaeve and Yossi Adi. The company also covered it on its corporate blog.
The company describes a three-part system. First, the encoder transforms the audio into a latent representation with a lower frame rate. Then the “quantizer” compresses this representation to the specified size while keeping the most important information, which the third part will need later. The compressed data is then transmitted over a network or stored on a drive. Finally, the decoder, a neural network, turns it back into audio in real time using only the CPU.
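The three stages can be illustrated with a deliberately simple toy. This is not EnCodec's architecture: the real encoder and decoder are learned neural networks, while the sketch below substitutes block averaging, uniform scalar quantization and sample-and-hold. The function names, the `hop` and `bits` parameters and the test tone are all assumptions chosen for illustration.

```python
import math

def encode(samples, hop):
    """Toy 'encoder': average each block of `hop` samples into one latent value,
    which lowers the frame rate by a factor of `hop`."""
    blocks = [samples[i:i + hop] for i in range(0, len(samples), hop)]
    return [sum(block) / len(block) for block in blocks]

def quantize(latents, bits):
    """Toy 'quantizer': clamp each latent to [-1, 1] and map it to an
    integer code that fits in `bits` bits."""
    levels = (1 << bits) - 1
    return [round((min(1.0, max(-1.0, z)) + 1.0) / 2.0 * levels) for z in latents]

def decode(codes, bits, hop):
    """Toy 'decoder': convert codes back to values in [-1, 1] and hold
    each value for `hop` output samples."""
    levels = (1 << bits) - 1
    out = []
    for code in codes:
        out.extend([code / levels * 2.0 - 1.0] * hop)
    return out

# Hypothetical input: 10 ms of a 440 Hz sine at 48 kHz (480 samples).
SAMPLE_RATE = 48_000
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(480)]

codes = quantize(encode(signal, hop=8), bits=4)   # what would be sent or stored
restored = decode(codes, bits=4, hop=8)           # what the receiver plays back
payload_bits = len(codes) * 4

print(len(signal), "samples ->", len(codes), "codes,", payload_bits, "bits")
```

Only the integer codes travel over the network or land on disk; the receiver needs nothing but the decoder to rebuild an approximation of the signal.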
Meta explains the approach this way: “The key to lossy compression is to identify changes that will not be perceivable by humans, as perfect reconstruction is impossible at low bit rates. To do so, we use discriminators to improve the perceptual quality of the generated samples. This creates a cat-and-mouse game where the discriminator’s job is to differentiate between real samples and reconstructed samples. The compression model attempts to generate samples to fool the discriminators by pushing the reconstructed samples to be more perceptually similar to the original samples.”
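The cat-and-mouse game can be made concrete with a pair of loss functions. The article does not say which losses Meta uses, so the sketch below assumes the hinge formulation that is common in adversarial training. The function names are hypothetical, and the scores are plain numbers standing in for the output of a discriminator network (higher means "looks real").

```python
def discriminator_hinge_loss(real_scores, fake_scores):
    """Loss for the discriminator: it is penalized unless real samples
    score above +1 and reconstructed samples score below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term

def generator_hinge_loss(fake_scores):
    """Loss for the compression model: it is penalized unless its
    reconstructions score above +1, i.e. unless they fool the discriminator."""
    return sum(max(0.0, 1.0 - s) for s in fake_scores) / len(fake_scores)

# Hypothetical scores: the discriminator cleanly separates real from fake,
# so its own loss is zero while the compression model's loss is high.
real = [2.0, 1.5]
fake = [-2.0, -1.0]
print(discriminator_hinge_loss(real, fake))  # 0.0
print(generator_hinge_loss(fake))            # 2.5
```

Training alternates between the two: lowering one loss tends to raise the other, which is what pushes the reconstructions toward sounding like the originals.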
It is worth noting that using neural networks for audio processing is nothing new, especially for voice recordings. But the Meta team claims to be the first to apply the technology at a 48 kHz sampling rate, which is commonly used for music files.
In terms of usage, Meta says the technology can significantly improve call quality over weak connections. And, of course, EnCodec can be used in metaverses, providing “rich metaverse experiences without requiring major bandwidth improvements.” Over time, it may also reduce the size of music files.
Currently, the EnCodec technology remains at the research stage.