arXiv (Cornell University)
Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks
August 2022 • Qingrong Cheng, Keyu Wen, Xiaodong Gu
Text-to-image synthesis aims to generate a photo-realistic and semantically consistent image from a given text description. Images synthesized by off-the-shelf models usually contain fewer components than the corresponding image and text description, which degrades both image quality and textual-visual consistency. To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to st…