
This paper introduces OT-Attack, an optimal transport-based adversarial attack that aims to improve the transferability of adversarial examples across vision-language models. By computing an optimal mapping between the features of image sets and text sets, OT-Attack guides the generation of adversarial examples and reduces overfitting to the source model.
OT-Attack advances adversarial machine learning by directly targeting the overfitting of adversarial examples to the source model, and it opens new possibilities for evaluating the robustness of multimodal models.
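
To make the idea of optimally mapping image-set features to text-set features more concrete, the following is a minimal sketch of an entropy-regularized optimal transport computation between two feature sets. It is an illustration under stated assumptions, not the authors' implementation: the cosine-based cost, the uniform marginals, the Sinkhorn solver, and the names `cosine_cost`, `sinkhorn_plan`, and `ot_distance` are all hypothetical choices that the text above does not specify.

```python
import numpy as np


def cosine_cost(image_feats, text_feats):
    """Pairwise cost 1 - cosine similarity (an assumed cost, not from the paper)."""
    a = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    b = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Approximate transport plan via Sinkhorn iterations with uniform marginals."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)   # mass on each image feature (assumed uniform)
    nu = np.full(m, 1.0 / m)   # mass on each text feature (assumed uniform)
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = nu / (kernel.T @ u)  # rescale to match column marginals
        u = mu / (kernel @ v)    # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


def ot_distance(image_feats, text_feats, reg=0.1, n_iters=200):
    """Transport cost between an image feature set and a text feature set."""
    cost = cosine_cost(image_feats, text_feats)
    plan = sinkhorn_plan(cost, reg, n_iters)
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(4, 16))  # e.g. features of 4 augmented images
    text_feats = rng.normal(size=(5, 16))   # e.g. features of 5 paired captions
    distance, plan = ot_distance(image_feats, text_feats)
    print(distance, plan.shape)
```

In a sketch like this, the returned plan indicates how strongly each image feature is matched to each text feature, and the scalar distance summarizes how well the two sets align. An attack could plausibly use such a quantity as the objective that the perturbation pushes against, but the exact loss and optimization procedure are not described in the text above.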