A novel approach, Visual AutoRegressive (VAR) modeling, has been introduced to the realm of autoregressive learning on images, shifting the method from the traditional raster-scan next-token prediction to inherently scalable image generation via next-scale prediction. VAR’s application marks the first instance where AR models excel over diffusion transformers in this domain. Read more
Key results of VAR include demonstrable improvements such as:
Given these impactful results, VAR provides a solid foundation for future AR model exploration, particularly for visual generation and unified learning.