VASA: Microsoft AI generates "talking face" from photo and voice recording

Microsoft has developed a video generator that creates a video from a photo and an audio recording of speech, but there are no plans to release it to the public.

Several photorealistic-looking images of faces with strong facial expressions

(Image: Microsoft Asia)

This article was originally published in German and has been automatically translated.

A research team at Microsoft has developed an AI tool that can generate strikingly realistic video clips from a photo and a voice recording, in which the person in the photo appears to be speaking. They call the framework VASA, with the first version dubbed VASA-1; the name refers to the "visual affective skills" of the generated avatars. The tool not only produces high-quality synchronization between lips and sound, but can also simulate a wide range of expressive facial expressions and natural head movements. VASA can already handle audio files of any length and seamlessly generate talking-face videos on a PC with an Nvidia RTX 4090.

On a project page, Microsoft Asia employees have compiled a whole series of examples to demonstrate the tool's capabilities. It shows numerous square videos of different faces expressively reciting different texts. The team states that all portraits are virtual, AI-generated depictions of people who do not exist; the only exception is an animation of Leonardo da Vinci's Mona Lisa. For some videos, side-by-side comparisons show the generated face reciting the same text with different emotions. Another example shows three female faces speaking a text in complete synchronization.

The aim of the research project is to develop a technique for animating photorealistic avatars in real time, the team writes. However, they admit that the technology can be misused to impersonate real people. Despite this, the group is convinced that the potential benefits justify the research. Because of the risk of misuse, there are currently no plans to publish an online demo, provide developer access or release a product based on it. The team will only consider doing so once it can be sure that the technology will be used responsibly. For now, they just wanted to present the research.

(mho)