Journal of King Saud University - Computer and Information Sciences • Vol 37 • No 9
HiCoS-Net: hierarchical cross-modal graph learning with dynamic attention for hard negative-aware image-text matching
October 2025 • Dingcheng Feng, Ning Luo, Shudong Zhang, Lijuan Zhou, Bing Wei
Abstract
Fine-grained image-text matching, which is pivotal to multimodal intelligence, has advanced semantic correspondence inference through inter-modal region-word aggregation. Despite its efficacy, this approach remains limited by its inability to accommodate the semantic associations of hard negative samples. For example, it fails to leverage knowledge shared across multiple samples on analogous topics. This failure results in an inadequate capacity to differentiate hard negativ…
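The abstract does not give HiCoS-Net's own formulation. For reference only, the sketch below shows two widely used generic building blocks that the abstract's terms point to. The first is a simplified SCAN-style, word-to-region attention that aggregates image regions per word into an image-text score. The second is a VSE++-style hardest-negative triplet loss, in which only the most confusable non-matching sample in each direction is penalized. These are swapped-in standard techniques, not the authors' method. The function names, the inverse temperature `lam`, and the `margin` value are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def region_word_similarity(regions: List[Vector], words: List[Vector],
                           lam: float = 9.0) -> float:
    """Image-text score via word-to-region attention (simplified SCAN-style).

    Hypothetical illustration: for each word, softmax-weight the image regions
    by scaled cosine similarity, aggregate them into one attended vector, and
    score the word against it. The image-text score is the mean over words.
    """
    dim = len(regions[0])
    word_scores = []
    for w in words:
        sims = [cosine(w, r) for r in regions]
        # Numerically stable softmax over regions with inverse temperature lam.
        m = max(sims)
        exps = [math.exp(lam * (s - m)) for s in sims]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted sum of region features for this word.
        attended = [sum(weights[k] * regions[k][d] for k in range(len(regions)))
                    for d in range(dim)]
        word_scores.append(cosine(w, attended))
    return sum(word_scores) / len(word_scores)


def hardest_negative_triplet_loss(scores: List[List[float]],
                                  margin: float = 0.2) -> float:
    """VSE++-style max-of-hinges loss over a batch score matrix (batch >= 2).

    scores[i][j] is the similarity of image i and caption j; matched pairs sit
    on the diagonal. For each positive pair, only the hardest (highest-scoring)
    negative caption and the hardest negative image contribute a hinge term.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        hardest_caption = max(scores[i][j] for j in range(n) if j != i)
        hardest_image = max(scores[k][i] for k in range(n) if k != i)
        total += max(0.0, margin - pos + hardest_caption)
        total += max(0.0, margin - pos + hardest_image)
    return total
```

Penalizing only the hardest negative, rather than summing over all negatives, concentrates the training signal on exactly the near-duplicate, same-topic mismatches that the abstract identifies as the weak point of region-word aggregation.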