
"Relation Rectification in Diffusion Model" by Yinwei Wu et al. is a step towards solving the challenge of generating accurate visual relationships with text-to-image models. To address misalignments in the text encoder, the authors introduce a Heterogeneous Graph Convolutional Network (HGCN).
The paper aims to make the visual relationships in generated images as precise as the textual descriptions driving the synthesis, and it reports both qualitative and quantitative improvements.