arXiv (Cornell University)
An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis
November 2023 • Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, B. Srinivasan
We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference image into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn multiple tokens that are necessary to condition…