Deepfake Detection
Audio-Visual Fusion
Dynamic Weight Fusion
Transformers
AVT2-DWF: Improving Deepfake Detection Through Audio-Visual Fusion

The paper introduces AVT2-DWF, a framework built on Audio-Visual dual Transformers with Dynamic Weight Fusion. It aims to improve detection as deepfake methods move from single-modality manipulation to multimodal fusion. Summary points:

  • Dual transformers capture spatial and temporal dynamics.
  • Uses a face transformer with an n-frame-wise tokenization strategy.
  • Incorporates audio transformer encoders.
  • Employs dynamic weight fusion to combine audio and visual information.
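
The summary does not give the fusion formula, so the following is only a minimal sketch of one common way to weight two modalities dynamically, not the paper's actual method. It assumes each modality has already been encoded into a feature vector of the same length, scores each vector with a hypothetical scoring vector (`w_audio`, `w_visual`, which would be learned in practice), and turns the two scores into weights with a softmax. All names here are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def dynamic_weight_fusion(audio_feat, visual_feat, w_audio, w_visual):
    """Fuse two modality features with input-dependent weights.

    Assumption: w_audio and w_visual are hypothetical scoring vectors
    standing in for parameters a real model would learn.
    Returns the fused vector and the (audio, visual) weights.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("audio and visual features must have the same length")

    # One scalar score per modality, computed from the features themselves,
    # so the weights change with every input.
    scores = [dot(audio_feat, w_audio), dot(visual_feat, w_visual)]
    alpha_audio, alpha_visual = softmax(scores)

    # Weighted sum of the two feature vectors.
    fused = [alpha_audio * a + alpha_visual * v
             for a, v in zip(audio_feat, visual_feat)]
    return fused, (alpha_audio, alpha_visual)


if __name__ == "__main__":
    audio = [0.2, 0.9, -0.1]
    visual = [0.7, 0.1, 0.4]
    fused, weights = dynamic_weight_fusion(
        audio, visual, w_audio=[1.0, 0.0, 0.0], w_visual=[0.0, 1.0, 0.0]
    )
    print("weights:", weights)
    print("fused:", fused)
```

Because the weights come from a softmax they are positive and sum to one, so the fused vector is always a convex combination of the audio and visual features. A modality with a higher score contributes more to the result.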