Explanipedia

Vision-Language Matching for Text-to-Image Synthesis via Generative Adversarial Networks

Qingrong Cheng, Keyu Wen, Xiaodong Gu · 2022

Text-to-image synthesis aims to generate a photo-realistic and semantically consistent image from a specific text description. The images synthesized by off-the-shelf models usually contain limited components compared with the corresponding im…

A Unified Two-Stage Group Semantics Propagation and Contrastive Learning Network for Co-Saliency Detection

Zhenshan Tan, Cheng Chen, Keyu Wen, Yuzhuo Qin, Xiaodong Gu · 2022

Co-saliency detection (CoSOD) aims at discovering the repetitive salient objects from multiple images. Two primary challenges are group semantics extraction and noise object suppression. In this paper, we present a unified Two-stage grOup …

Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval

Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu · 2022

Recently, the cross-modal pre-training task has become a hotspot because of its wide application in various downstream research tasks, including retrieval, captioning, and question answering. However, existing methods adopt a one-stream …

Learning Dual Semantic Relations With Graph Attention for Image-Text Matching

Keyu Wen, Xiaodong Gu, Qingrong Cheng · 2020

Image-Text Matching is one major task in cross-modal information processing. The main challenge is to learn the unified visual and textual representations. Previous methods that perform well on this task primarily focus on not only the …
