arXiv (Cornell University)
Enhanced Multimodal Video Retrieval System: Integrating Query Expansion and Cross-modal Temporal Event Retrieval
December 2025 • Vo Van Thinh, Nguyen Minh Khoi, Tran Minh Huy, Nguyen-Tran Anh-Quan, Nguyen Duy Tan, Nguyen Khanh Loi, Phan Anh-Minh
Multimedia information retrieval from videos remains a challenging problem. While recent systems have advanced multimodal search through semantic, object, and OCR queries - and can retrieve temporally consecutive scenes - they often rely on a single query modality for an entire sequence, limiting robustness in complex temporal contexts. To overcome this, we propose a cross-modal temporal event retrieval framework that enables different query modalities to describe distinct scenes within a sequence. To determine de…
Computer Science
Artificial Intelligence
Thresholding (Image Processing)
Visualization (Graphics)
Computer Vision
Machine Learning