Yuchen Duan and collaborators present the Vision-RWKV (VRWKV) model in their paper Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures. By adapting the RWKV design from NLP, they address a key limitation of transformers in computer vision: the quadratic cost of self-attention, which becomes prohibitive for high-resolution images.
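To see why an RWKV-style design helps, the sketch below contrasts the two mixing strategies in toy form. This is an illustrative simplification, not the authors' exact VRWKV formulation: standard softmax attention must materialize an N×N score matrix (quadratic in the number of tokens), while an RWKV-like recurrence keeps only a decayed running numerator and denominator per channel, giving one pass over the tokens. The `decay` constant here is a hypothetical fixed value; real RWKV variants learn per-channel decays and add further refinements.

```python
# Illustrative sketch (assumed toy formulation, not the paper's code):
# quadratic softmax attention vs. a linear-time RWKV-like recurrence.
import numpy as np

def softmax_attention(q, k, v):
    # Builds the full N x N attention matrix -> O(N^2) time and memory.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def rwkv_like_mixing(k, v, decay=0.9):
    # RWKV-style recurrence: a decayed running weighted average over tokens,
    # updated in a single pass -> O(N) time, O(1) extra state per channel.
    num = np.zeros_like(v[0], dtype=float)   # running weighted sum of values
    den = np.zeros_like(k[0], dtype=float)   # running sum of weights
    out = np.empty_like(v, dtype=float)
    for t in range(len(v)):
        num = decay * num + np.exp(k[t]) * v[t]
        den = decay * den + np.exp(k[t])
        out[t] = num / den
    return out

tokens, channels = 16, 8
q, k, v = (np.random.randn(tokens, channels) for _ in range(3))
print(softmax_attention(q, k, v).shape)  # (16, 8), via an N x N matrix
print(rwkv_like_mixing(k, v).shape)      # (16, 8), via one linear pass
```

The practical consequence is the one the paper targets: for an image, N grows with resolution (number of patches), so the quadratic term dominates quickly, whereas the recurrent form scales linearly.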
This research stands out because it tackles the computational challenges of applying transformers to computer vision tasks, potentially opening up their use in more resource-constrained environments. Find the code on GitHub.