Alibaba has unveiled a new AI model that generates "movie-level" videos of people driven by audio. Wan2.2-S2V has 14 billion parameters and is available as open source on GitHub and other platforms.
The new model is capable of generating high-quality video from a single image and an audio clip. Wan2.2-S2V offers versatile character animation, allowing videos with various framing options, including portrait, bust, and full-body perspectives.
Alibaba says the model can dynamically generate character actions and environmental factors based on text prompts. The finished videos can be rendered at 480p or 720p resolution.
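To make the described inputs and outputs concrete, here is a minimal sketch of how such a speech-to-video workflow could be exposed as a command-line tool. This is illustrative only: the argument names and structure are assumptions, not the flags of the released Wan2.2-S2V code.

```python
"""Illustrative sketch of the inputs and outputs described in the article.
Argument names are assumptions, not the released Wan2.2-S2V interface."""
import argparse


def parse_args() -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Speech-to-video generation sketch")
    p.add_argument("--image", required=True,
                   help="single reference image (portrait, bust, or full-body framing)")
    p.add_argument("--audio", required=True, help="driving audio clip")
    p.add_argument("--prompt", default="",
                   help="text prompt controlling character actions and environment")
    p.add_argument("--resolution", choices=["480p", "720p"], default="720p",
                   help="the two output resolutions Alibaba mentions")
    return p.parse_args()


if __name__ == "__main__":
    args = parse_args()
    print(f"Would generate a {args.resolution} video from {args.image} "
          f"driven by {args.audio} with prompt: {args.prompt!r}")
```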
Wan2.2-S2V combines global, text-driven motion control with fine-grained local motions driven by audio, allowing for more natural-looking character performances even in challenging scenarios.
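One plausible way to read this combination is a global, clip-level conditioning signal from the text prompt added to per-frame conditioning from the audio. The sketch below illustrates that general idea under these assumptions; the module, shapes, and fusion scheme are not taken from the released model.

```python
"""Minimal sketch, assuming fusion works roughly as: a global text-derived
motion signal is broadcast over all frames, while per-frame audio features
contribute fine-grained local motion. Not the actual Wan2.2-S2V design."""
import torch
import torch.nn as nn


class GlobalLocalMotionFusion(nn.Module):
    def __init__(self, text_dim: int, audio_dim: int, hidden_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # global: one vector per clip
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)  # local: one vector per frame

    def forward(self, text_emb: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, text_dim); audio_feats: (batch, frames, audio_dim)
        global_motion = self.text_proj(text_emb).unsqueeze(1)  # (batch, 1, hidden)
        local_motion = self.audio_proj(audio_feats)            # (batch, frames, hidden)
        return global_motion + local_motion                    # broadcast over frames


# Example usage with random tensors standing in for text and audio encodings
fusion = GlobalLocalMotionFusion(text_dim=768, audio_dim=512, hidden_dim=1024)
cond = fusion(torch.randn(2, 768), torch.randn(2, 48, 512))
print(cond.shape)  # torch.Size([2, 48, 1024])
```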
The Chinese company notes that another key breakthrough of the model is its innovative frame-processing technique: the model compresses frame histories of arbitrary length into a single compact representation, significantly reducing computational requirements. However, the company does not specify how long the generated videos can be.
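As a rough illustration of the general idea (not Alibaba's actual technique), compressing an arbitrary-length frame history into a fixed-size representation can be sketched as pooling a variable number of frame latents down to a constant number of tokens, so the cost of conditioning on history no longer grows with clip length.

```python
"""Sketch of the general idea only: an arbitrary-length sequence of frame
latents is pooled to a fixed number of compact tokens. The pooling choice
and sizes are assumptions for illustration."""
import torch
import torch.nn.functional as F


def compress_frame_history(frame_latents: torch.Tensor, target_frames: int = 4) -> torch.Tensor:
    # frame_latents: (frames, channels) -- any number of frames in, fixed number out
    x = frame_latents.t().unsqueeze(0)                # (1, channels, frames)
    pooled = F.adaptive_avg_pool1d(x, target_frames)  # (1, channels, target_frames)
    return pooled.squeeze(0).t()                      # (target_frames, channels)


# Example: 73 historical frame latents compressed to 4 compact tokens
history = torch.randn(73, 256)
print(compress_frame_history(history).shape)  # torch.Size([4, 256])
```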