FSRT: Transformer-based Facial Reenactment

The paper FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features presents a transformer-based encoder-decoder model for face reenactment. A transformer encoder computes a set-latent representation of the source image, and a transformer-based decoder, conditioned on keypoints and facial expression vectors, predicts the output color of each query pixel (a minimal sketch of this pipeline follows the list below).

  • Proposes a transformer-based encoder for computing a set-latent representation of the source image.
  • A transformer-based decoder predicts the output color of each query pixel.
  • Self-supervised learning of latent representations factorizes appearance, head pose, and facial expression, making the approach well suited for cross-reenactment.
  • The method extends to multiple source images, adapting to person-specific facial dynamics.
  • Data augmentation and regularization schemes are included to improve generalization.
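
To make the encoder-decoder pipeline above concrete, here is a minimal, hypothetical PyTorch sketch. The module names, dimensions, the patch embedding, and the way keypoints and expression vectors are folded into the pixel queries are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an FSRT-style encoder-decoder using standard PyTorch modules.
import torch
import torch.nn as nn


class SetLatentEncoder(nn.Module):
    """Encodes a source image into a set-latent representation (a set of tokens)."""

    def __init__(self, patch_size=8, dim=256, depth=4, heads=8):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, src_img):                      # (B, 3, H, W)
        tokens = self.patch_embed(src_img)           # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim)
        return self.encoder(tokens)                  # set-latent representation


class PixelQueryDecoder(nn.Module):
    """Predicts the output color of each query pixel, conditioned on driving
    keypoints and an expression vector, by cross-attending to the set latent."""

    def __init__(self, dim=256, depth=4, heads=8, num_kp=10, expr_dim=64):
        super().__init__()
        # Query token = pixel coordinate + driving keypoints + expression vector.
        self.query_proj = nn.Linear(2 + num_kp * 2 + expr_dim, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, pixel_xy, keypoints, expression, set_latent):
        B, Q, _ = pixel_xy.shape
        cond = torch.cat([keypoints.flatten(1), expression], dim=-1)   # (B, C)
        cond = cond.unsqueeze(1).expand(B, Q, -1)                      # repeat per query
        queries = self.query_proj(torch.cat([pixel_xy, cond], dim=-1))
        decoded = self.decoder(queries, memory=set_latent)
        return self.to_rgb(decoded)                  # predicted RGB per query pixel


# Example usage with dummy tensors.
encoder, decoder = SetLatentEncoder(), PixelQueryDecoder()
src = torch.randn(1, 3, 128, 128)
latent = encoder(src)
pixels = torch.rand(1, 1024, 2)           # normalized query coordinates
kps = torch.rand(1, 10, 2)                # driving keypoints
expr = torch.randn(1, 64)                 # facial expression vector
rgb = decoder(pixels, kps, expr, latent)  # (1, 1024, 3)
```

The design choice mirrored here is that appearance information lives in the encoder's set latent, while head pose and expression enter only through the decoder queries, which is the factorization that makes cross-reenactment possible.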

In my opinion, this work is crucial for enhancing the fidelity of cross-reenactment in face animations and can lead to more personalized and expressive virtual interactions. The potential applications in digital media, telepresence, and customer service are vast, promising a more engaging user experience.
