The quality of AI-generated voices has improved rapidly in recent years, but there are still aspects of human speech that cannot be synthetically imitated. Of course, AI actors can voice presentations and commercials, but more complex productions — such as a convincing performance of Hamlet — remain out of reach.
Voice AI startup Sonantic says it has created a synthetic voice that can express subtleties like teasing and flirting. The company says the key to its success is incorporating non-speech sounds into the audio, teaching its AI models to reproduce the small breaths, tiny scoffs, and half-hidden chuckles that give real speech a stamp of biological authenticity.
“We chose love as a general theme,” Sonantic co-founder and CTO John Flynn tells The Verge. “But our research goal was to see if we could model subtle emotions. Bigger emotions are a little easier to capture.”
Sonantic CEO Zeena Qureshi describes the company’s software as “Photoshop for voice.” Its interface lets users type out the speech they want to synthesize, specify the mood of the delivery, and then select from a cast of AI voices, most of which are copied from real human actors. This is by no means a unique offering (rivals like Descript sell similar packages), but Sonantic says it offers more in-depth customization than its rivals do.
Emotional choices for delivery include anger, fear, sadness, happiness, and joy. This week’s update adds flirtatious, coy, teasing, and boasting. A “director mode” allows for even more tweaking: the pitch of a voice can be adjusted, the intensity of delivery dialed up or down, and those little non-speech vocalizations like laughs and breaths inserted.
Sonantic’s choice of a female voice was inspired by Spike Jonze’s 2013 film Her, in which the main character falls in love with an AI assistant, Samantha. The company says it recognizes the ethical issues that accompany the development of new technologies and that it is careful about how and where it uses its AI voices.