University of Washington

Published: 12th July 2017     

In a first, scientists develop new tool that turns audio clips into realistic videos of people speaking

The system converts audio files of an individual's speech into realistic mouth shapes

Edex Live

The researchers generated a video of Barack Obama talking about terrorism, fatherhood and job creation

Scientists have developed new computer algorithms that can turn audio clips into a realistic, lip-synced video of the person speaking those words. The researchers generated a highly realistic video of former US President Barack Obama talking about terrorism, fatherhood, job creation and other topics, using audio clips of those speeches and existing weekly video addresses that were originally on a different topic.
    

"These type of results have never been shown before," said Ira Kemelmacher-Shlizerman, an assistant professor at the University of Washington (UW) in the US. "Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings, as well as futuristic ones such as being able to hold a conversation with a historical figure in virtual reality by creating visuals just from audio," said Kemelmacher-Shlizerman.


In a visual form of lip-syncing, the system converts audio files of an individual's speech into realistic mouth shapes, which are then grafted onto and blended with the head of that person from another existing video. The team chose Obama because the machine learning technique needs an available video of the person to learn from, and there were hours of presidential videos in the public domain.
    

"In the future, video chat tools like Skype or Messenger will enable anyone to collect videos that could be used to train computer models," Kemelmacher-Shlizerman said. Because streaming audio over the internet takes up far less bandwidth than video, the new system has the potential to end video chats that are constantly timing out from poor connections.
    

"When you watch Skype or Google Hangouts, often the connection is stuttery and low-resolution and really unpleasant, but often the audio is pretty good," said Steve Seitz, a professor at UW. "So if you could use the audio to produce much higher-quality video, that would be terrific," he said. By reversing the process (feeding video into the network instead of just audio), the team could also potentially develop algorithms that detect whether a video is real or manufactured, researchers said.


The new machine learning tool makes significant progress in overcoming what is known as the "uncanny valley" problem, which has dogged efforts to create realistic video from audio. When synthesised human likenesses appear almost real but still somehow miss the mark, people find them creepy or off-putting. "People are particularly sensitive to any areas of your mouth that don't look realistic," said Supasorn Suwajanakorn, a doctoral graduate of UW's Allen School of Computer Science & Engineering.
    

"If you do not render teeth right or the chin moves at the wrong time, people can spot it right away and it is going to look fake. So you have to render the mouth region perfectly to get beyond the uncanny valley," Suwajanakorn said.

Copyright - edexlive.com 2021