Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

  • LlamaGen is a new family of image generation models that applies the next-token prediction paradigm of large language models to images.
  • Its image tokenizer uses a downsample ratio of 16 and reaches a reconstruction quality of 0.94 rFID on the ImageNet benchmark.
  • Its class-conditional models range from 111M to 3.1B parameters and achieve 2.18 FID on the ImageNet 256x256 benchmark.
  • The authors also release a text-conditional image generation model with 775M parameters and optimize inference speed.
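
To make the tokenizer and next-token figures concrete, here is a minimal sketch, not the LlamaGen implementation. It assumes the downsample ratio applies per side, so a 256x256 image becomes a 16x16 grid of 256 tokens, and it samples those tokens one at a time. The names `num_image_tokens`, `toy_next_token_logits`, and `sample_image_tokens` are hypothetical, the vocabulary size of 1024 is an arbitrary illustrative value, and the "model" is a stand-in that returns uniform scores.

```python
import math
import random


def num_image_tokens(height, width, downsample_ratio=16):
    """Token count for an image, assuming the ratio applies to each side."""
    if height % downsample_ratio or width % downsample_ratio:
        raise ValueError("image size must be divisible by the downsample ratio")
    return (height // downsample_ratio) * (width // downsample_ratio)


def toy_next_token_logits(prefix, vocab_size):
    """Hypothetical stand-in for a trained model: uniform scores for every token."""
    return [0.0] * vocab_size


def sample_image_tokens(class_token, seq_len, vocab_size,
                        logits_fn=toy_next_token_logits, seed=0):
    """Sample seq_len image tokens left to right, conditioned on a class token."""
    rng = random.Random(seed)
    tokens = [class_token]  # the condition is placed first in the sequence
    for _ in range(seq_len):
        logits = logits_fn(tokens, vocab_size)
        peak = max(logits)
        # Softmax weights (unnormalized); random.choices normalizes them.
        weights = [math.exp(score - peak) for score in logits]
        next_token = rng.choices(range(vocab_size), weights=weights, k=1)[0]
        tokens.append(next_token)
    return tokens[1:]  # drop the class token, keep only image tokens


seq_len = num_image_tokens(256, 256)  # 16 x 16 grid
image_tokens = sample_image_tokens(class_token=0, seq_len=seq_len, vocab_size=1024)
```

In a real system, the sampled token sequence would be passed back through the tokenizer's decoder to produce pixels; that step is omitted here.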

This paper matters because it demonstrates the capabilities of autoregressive models in visual generation, paving the way for further work on multimodal models and large-scale image generation. Future research could apply autoregressive models to other visual tasks and improve training efficiency.