The paper FasterViT: Fast Vision Transformers with Hierarchical Attention presents FasterViT, a new hybrid CNN-ViT architecture designed to maximize image throughput for computer vision (CV) applications. It introduces Hierarchical Attention (HAT), which lowers the computational cost of global self-attention while still supporting efficient local and global representation learning.
Highlights of FasterViT:

- Hybrid CNN-ViT design: convolutional blocks handle the early, high-resolution stages for fast local feature learning, while transformer blocks with HAT handle the later stages for global modeling.
- HAT decomposes costly global self-attention into a cheaper multi-level scheme: learned carrier tokens summarize each local window, attend to one another globally, and feed that context back into window-level attention (see the sketch after this list).
- FasterViT reaches a state-of-the-art Pareto front of ImageNet-1K accuracy versus image throughput, with advantages that grow for high-resolution inputs, and is validated on downstream tasks such as detection and segmentation.
- HAT is also presented as a plug-and-play module that can enhance existing backbones.
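To make the carrier-token idea concrete, here is a minimal sketch of windowed attention with carrier tokens in the spirit of HAT. It assumes the input is already partitioned into windows, uses one mean-pooled carrier token per window, and swaps in plain `nn.MultiheadAttention` for the paper's attention blocks; it is an illustration under these assumptions, not the authors' implementation (the official code lives in the NVlabs/FasterViT repository).

```python
# Illustrative sketch of hierarchical attention with carrier tokens.
# Shapes, pooling choice, and module names are assumptions for clarity.
import torch
import torch.nn as nn


class HierarchicalAttentionSketch(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Attention among carrier tokens: cheap global exchange across windows.
        self.carrier_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Attention inside each window, over [window tokens + its carrier token].
        self.local_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_windows, window_size, dim), already window-partitioned.
        b, nw, ws, d = x.shape
        # 1) Summarize each window into one carrier token (mean pooling here).
        carriers = x.mean(dim=2)                                  # (b, nw, d)
        # 2) Carrier tokens attend to each other: global communication
        #    at a fraction of full global self-attention cost.
        carriers, _ = self.carrier_attn(carriers, carriers, carriers)
        # 3) Local attention per window, with the window's carrier appended,
        #    so global context flows back into every local token.
        tokens = torch.cat([x, carriers.unsqueeze(2)], dim=2)     # (b, nw, ws+1, d)
        tokens = tokens.reshape(b * nw, ws + 1, d)
        tokens, _ = self.local_attn(tokens, tokens, tokens)
        # Drop the carrier slot and restore the windowed layout.
        return tokens[:, :ws].reshape(b, nw, ws, d)


if __name__ == "__main__":
    x = torch.randn(2, 16, 49, 64)   # 2 images, 16 windows of 7x7 tokens, dim 64
    out = HierarchicalAttentionSketch(dim=64)(x)
    print(out.shape)                 # torch.Size([2, 16, 49, 64])
```

The point of the sketch is the cost structure: attention is quadratic only within each small window and across the much smaller set of carrier tokens, rather than across all image tokens at once.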
Opinion: FasterViT is a game-changer for resource-sensitive CV tasks, striking a strong balance between accuracy and image throughput. Future work could focus on refining the HAT approach to push image processing speeds further without compromising accuracy.
A deeper dive into FasterViT can be found here.