Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning

Abstract: This study presents a novel strategy that equips Vision Transformers (ViT) with intra-task mutual attention to substantially improve few-shot learning. The approach exchanges the class (CLS) token and patch tokens between the support and query sets so that each side attends to information relevant to the other, strengthening intra-class similarity and reducing variation caused by nuisance factors such as background changes.
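The abstract only sketches the token-exchange idea, so the following is a minimal, hedged illustration of how such a mutual attention step could look in PyTorch. It assumes support and query images have already been encoded by a ViT into `[CLS] + patch` token sequences of matching batch size; the module name `MutualAttention`, the dimensions, and the residual/normalization details are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of intra-task mutual attention: support and query sequences swap their
# CLS tokens before a standard attention block, so each side attends to patch
# features conditioned on the other side's class summary.
# NOTE: names and layer choices here are assumptions for illustration only.
import torch
import torch.nn as nn


class MutualAttention(nn.Module):
    def __init__(self, embed_dim: int = 384, num_heads: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, support: torch.Tensor, query: torch.Tensor):
        # support, query: (B, 1 + N, D) sequences of [CLS] + patch tokens from a ViT
        s_cls, s_patch = support[:, :1], support[:, 1:]
        q_cls, q_patch = query[:, :1], query[:, 1:]

        # Exchange CLS tokens: each sequence is reassembled with the other side's CLS.
        support_mixed = torch.cat([q_cls, s_patch], dim=1)
        query_mixed = torch.cat([s_cls, q_patch], dim=1)

        # Standard self-attention over the mixed sequences, with a simplified
        # residual connection and layer norm.
        s_out, _ = self.attn(support_mixed, support_mixed, support_mixed)
        q_out, _ = self.attn(query_mixed, query_mixed, query_mixed)
        return self.norm(support_mixed + s_out), self.norm(query_mixed + q_out)
```

In use, the refined CLS tokens from both outputs would feed a similarity-based few-shot classifier (e.g., prototype matching), which is where the stronger intra-class alignment pays off.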
Key Findings:
- The method adopts a pre-trained ViT backbone and uses a meta-learning procedure to fine-tune only a small set of critical parameters, making it both effective and parameter-efficient (see the sketch after this list).
- Experiments show superior performance over traditional baselines on multiple few-shot classification benchmarks, under both 1-shot and 5-shot settings.
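The summary does not specify which parameters are fine-tuned, so the sketch below only illustrates the general recipe of episodic, parameter-efficient fine-tuning of a pre-trained ViT. Treating LayerNorm weights as the "critical parameters", the prototype-style loss, and the `timm` model name are all assumptions for illustration, not the paper's exact training setup.

```python
# Sketch of parameter-efficient episodic fine-tuning of a pre-trained ViT.
# NOTE: the choice of LayerNorm as the tunable subset and the prototype loss
# are illustrative assumptions, not the authors' confirmed recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm  # assumed available; any source of pre-trained ViT features works


def trainable_parameters(model: nn.Module):
    """Freeze everything except LayerNorm parameters (assumed 'critical' subset)."""
    params = []
    for name, p in model.named_parameters():
        if "norm" in name:
            p.requires_grad = True
            params.append(p)
        else:
            p.requires_grad = False
    return params


def episode_loss(model, support_x, support_y, query_x, query_y, n_way: int):
    """Prototype-style classification loss for one few-shot episode."""
    s_feat = model(support_x)                      # (n_way * k_shot, D) features
    q_feat = model(query_x)                        # (n_query, D) features
    protos = torch.stack([s_feat[support_y == c].mean(0) for c in range(n_way)])
    logits = -torch.cdist(q_feat, protos)          # negative distance as similarity
    return F.cross_entropy(logits, query_y)


vit = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=0)
optimizer = torch.optim.AdamW(trainable_parameters(vit), lr=1e-4)
```

Each meta-training step would sample an episode, compute `episode_loss`, and update only the small trainable subset, which is what keeps the tunable parameter count low.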
Relevance and Future Directions:
- By reducing the number of tunable parameters and making effective use of pre-trained models, the method is a promising option for applications that depend on few-shot learning.
- Further exploration could extend this approach to other complex tasks requiring high generalization from limited examples.
By effectively harnessing the capabilities of ViTs, this research advances few-shot learning and offers a robust answer to the long-standing challenge of training with limited data.