In the video below, the team has grafted George Bush’s speech and mannerisms onto several other notable persons’ faces, with an astounding, if somewhat creepy, result:
It’s one step toward a grand goal: create fully interactive, 3D digital personas from family photo albums and videos, historic collections, or other existing visuals.
LEARNING ‘IN THE WILD’
As virtual and augmented reality technologies develop, researchers envision using family photographs and videos to create an interactive model of a relative living overseas or a far-away grandparent, rather than simply Skyping in two dimensions.
“You might one day be able to put on a pair of augmented reality glasses and there is a 3D model of your mother on the couch,” says senior author Kemelmacher-Shlizerman. “Such technology doesn’t exist yet—the display technology is moving forward really fast—but how do you actually re-create your mother in three dimensions?”
One day the reconstruction technology could even be taken a step further.
“Imagine being able to have a conversation with anyone you can’t actually get to meet in person—LeBron James, Barack Obama, Charlie Chaplin—and interact with them,” says coauthor Steve Seitz, professor of computer science and engineering. “We’re trying to get there through a series of research steps. One of the true tests is can you have them say things that they didn’t say but it still feels like them? This paper is demonstrating that ability.”
Existing technologies to create detailed 3D holograms or digital movie characters like Benjamin Button often rely on bringing a person into an elaborate studio. They painstakingly capture every angle of the person and the way they move—something that can’t be done in a living room.
Other approaches still require a person to be scanned by a camera to create basic avatars for video games or other virtual environments. But computer vision experts wanted to digitally reconstruct a person based solely on a random collection of existing images.
To reconstruct celebrities like Tom Hanks, Barack Obama, and Daniel Craig, the machine learning algorithms mined a minimum of 200 internet images taken over time in various scenarios and poses—a process known as learning “in the wild.”
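The core idea of learning "in the wild" — pooling hundreds of unconstrained photos into one appearance model — can be illustrated with a toy sketch. This is not the authors' algorithm (their system fits a full 3D shape and expression-dependent texture model); it only shows, under the assumption that the face crops are already detected and aligned, how a large, varied collection lets simple statistics separate a person's stable appearance from photo-to-photo variation:

```python
import numpy as np

def average_face(aligned_faces):
    """Pool many aligned face crops (H x W arrays, values in 0..1) from an
    unconstrained photo collection into a mean appearance plus a per-pixel
    variability map. Purely illustrative: a 2D average, not the paper's
    3D shape-and-texture reconstruction."""
    stack = np.stack(aligned_faces).astype(np.float64)
    mean = stack.mean(axis=0)            # stable appearance
    variability = stack.std(axis=0)      # high where lighting/expression vary
    return mean, variability

# toy usage: 200 fake "photos" of the same 4x4 face under noisy lighting
rng = np.random.default_rng(0)
base = rng.random((4, 4))
photos = [np.clip(base + rng.normal(0, 0.05, (4, 4)), 0, 1)
          for _ in range(200)]
mean, var = average_face(photos)
```

With 200 noisy samples, the mean converges toward the underlying face while the variability map flags the unstable regions — a rough analogue of why the algorithms need a minimum of 200 images rather than a handful.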
“We asked, ‘Can you take internet photos or your personal photo collection and animate a model without having that person interact with a camera?’” says Kemelmacher-Shlizerman. “Over the years we created algorithms that work with this kind of unconstrained data, which is a big deal.”
Suwajanakorn more recently developed techniques to capture expression-dependent textures—the small differences that occur when a person smiles, looks puzzled, or moves their mouth, for example.
By manipulating the lighting conditions across different photographs, he developed a new approach to densely map the differences from one person’s features and expressions onto another person’s face. That breakthrough enables the team to “control” the digital model with a video of another person, and could potentially enable a host of new animation and virtual reality applications.
“How do you map one person’s performance onto someone else’s face without losing their identity?” asks Seitz. “That’s one of the more interesting aspects of this work. We’ve shown you can have George Bush’s expressions and mouth and movements, but it still looks like George Clooney.”
The research, presented this week at the International Conference on Computer Vision in Chile, was funded by Samsung, Google, Intel, and the University of Washington.