29 Jul 2020 | Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, Matthias Nießner
The paper presents a novel approach for real-time facial reenactment of monocular target video sequences, such as those from YouTube. The method transfers the facial expressions of a source actor, captured live with a commodity webcam, onto a target video in a photo-realistic manner. Key contributions include:
1. **Non-rigid Model-Based Bundling**: A global non-rigid model-based bundling approach to recover the shape identity of the target actor using a prerecorded training sequence.
2. **Accurate Tracking and Estimation**: Dense photometric consistency tracking for both source and target videos, ensuring accurate appearance and lighting estimation.
3. **Expression Transfer**: Efficient deformation transfer between the source and target using subspace deformations, preserving person-specific expressions (see the code sketch after this list).
4. **Mouth Synthesis**: A novel image-based mouth synthesis approach that generates realistic mouth interiors by retrieving and warping the best-matching mouth shapes from the target sequence (also sketched below).
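To make contributions 2–4 concrete, here is a minimal sketch in Python/NumPy, under the assumption of a standard linear parametric face model (shape = mean + identity basis · α + expression basis · δ). All names and dimensions below are illustrative placeholders, not the paper's actual implementation: the real system minimizes a dense photometric energy on the GPU and uses a more elaborate retrieval metric with temporal clustering.

```python
import numpy as np

# Illustrative dimensions for a linear face model (placeholders, not the
# paper's exact model): n vertices, 80 identity and 76 expression coefficients.
N_VERTS, N_ID, N_EXPR = 5000, 80, 76

rng = np.random.default_rng(0)
mean_shape = rng.normal(size=N_VERTS * 3)
id_basis = rng.normal(size=(N_VERTS * 3, N_ID))
expr_basis = rng.normal(size=(N_VERTS * 3, N_EXPR))

def reconstruct_shape(alpha, delta):
    """Linear face model: mesh vertices for identity alpha, expression delta."""
    return mean_shape + id_basis @ alpha + expr_basis @ delta

def photometric_energy(rendered, observed, mask):
    """Dense photometric consistency (contribution 2), simplified: sum of
    squared per-pixel color differences over the rendered face region.
    Tracking minimizes this energy w.r.t. model and lighting parameters."""
    diff = (rendered - observed)[mask]
    return float(np.sum(diff ** 2))

def transfer_expression(alpha_target, delta_source):
    """Subspace deformation transfer (contribution 3), simplified: keep the
    target's identity coefficients and apply the source's expression
    coefficients, so the deformation stays in the expression subspace."""
    return reconstruct_shape(alpha_target, delta_source)

def retrieve_mouth_frame(delta_current, target_deltas):
    """Mouth retrieval (contribution 4), simplified: find the prerecorded
    target frame whose expression parameters best match the transferred
    expression; its mouth interior would then be warped into the output."""
    dists = np.linalg.norm(target_deltas - delta_current[None, :], axis=1)
    return int(np.argmin(dists))

# Toy usage: transfer a live source expression onto the target identity and
# pick the target frame that supplies the mouth interior.
alpha_t = rng.normal(size=N_ID)                  # target identity (fixed)
delta_s = rng.normal(size=N_EXPR)                # live source expression
target_deltas = rng.normal(size=(200, N_EXPR))   # target-sequence expressions

mesh = transfer_expression(alpha_t, delta_s)
k = retrieve_mouth_frame(delta_s, target_deltas)
print(mesh.shape, "mouth interior from target frame", k)
```

In the full system, α is recovered once by the non-rigid model-based bundling over the prerecorded training sequence (contribution 1), while δ and the lighting are re-estimated per frame by minimizing the photometric energy.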
The method is demonstrated in a live setup, where a source video stream is used to manipulate a target YouTube video in real time. The paper compares the proposed method to state-of-the-art reenactment techniques, showing superior results in terms of video quality and runtime. The system is designed to facilitate applications in VR/AR, teleconferencing, and on-the-fly dubbing of videos with translated audio.