RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Inpainting

3D Scene Generation

Text-to-Image Generators

Depth Diffusion

GAN


RealmDreamer is a method for text-driven 3D scene generation that optimizes a 3D Gaussian Splatting representation to match a text description. The representation is initialized by lifting samples from state-of-the-art text-to-image generators into 3D and computing an occlusion volume.
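To make "lifting into 3D" concrete, here is a minimal sketch of one common way to do it: unprojecting a per-pixel depth map through a pinhole camera. This is an illustration under stated assumptions, not RealmDreamer's implementation. The summary does not say where the depth comes from or which camera model is used, so the depth map, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the function name `lift_depth_to_points` are all hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def lift_depth_to_points(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Unproject a depth map into camera-space 3D points (pinhole model).

    Assumption: depth[v][u] is the distance along the camera's z axis for
    the pixel in column u, row v. Pixels with non-positive depth are
    treated as invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # no valid depth at this pixel
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Each returned point could then serve as the center of an initial Gaussian, with its color taken from the corresponding pixel of the generated image. That mapping is likewise an assumption made for illustration.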

The core of the method treats scene optimization as a 3D inpainting task, using image-conditional diffusion models to fill in unseen regions and a depth diffusion model to establish accurate geometric structure. RealmDreamer produces varied, high-quality 3D scenes without requiring video or multi-view data, and it can also synthesize a scene from a single image.
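Why inpainting is needed at all can be shown with a small sketch: when points lifted from one image are viewed from a shifted camera, some pixels of the new view receive no point, and those holes are what an inpainting model would have to fill. The code below is a simplified illustration, not the paper's occlusion-volume computation. The function name `inpainting_mask`, the pinhole intrinsics, and the restriction to a sideways camera translation (`cam_shift_x`) are assumptions.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def inpainting_mask(
    points: List[Point3D],
    width: int,
    height: int,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_shift_x: float = 0.0,
) -> List[List[bool]]:
    """Return a mask that is True where the new view has no projected point.

    Assumption: the new camera differs from the original one only by a
    translation of cam_shift_x along the x axis (no rotation).
    """
    covered = [[False] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:
            continue  # point is behind the camera or invalid
        x_cam = x - cam_shift_x  # express the point in the new camera frame
        # Pinhole projection, rounded to the nearest pixel.
        u = round(fx * x_cam / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            covered[v][u] = True
    # Uncovered pixels are the regions an inpainting model would fill.
    return [[not c for c in row] for row in covered]
```

In a full pipeline, the rendered image and this mask would be passed to an image-conditional diffusion model. That step is omitted here because the summary does not specify the model or its interface.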

Key aspects include:

Text-Driven 3D Splatting: Initialization from state-of-the-art text-to-image models.
Inpainting & Depth Diffusion: For geometric accuracy and rich structure.
Sharpening via Image Generators: Finetuning the model on sharpened samples from image generators for enhanced detail.
Flexibility: Applicable to varied styles and multiple objects without video or multi-view data.

RealmDreamer could simplify 3D content creation by letting artists and designers generate detailed scenes from text descriptions alone, with possible applications in virtual reality, game development, and AI-driven art.
