An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis

arXiv (Cornell University) • November 2023 • Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, B. Srinivasan

We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference images into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn multiple tokens that are necessary to condition…
