Explanipedia

AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions Open

Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, et al. · 2018

TVQA: Localized, Compositional Video Question Answering Open

Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg · 2018

Computer science Economics

Recent years have witnessed an increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA data…

Visual Instance Retrieval with Deep Convolutional Networks Open

Ali Sharif Razavian, Josephine Sullivan, Stefan Carlsson, Atsuto Maki · 2016

Computer science Economics

This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient p…

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval Open

Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury · 2018

Computer science Sociology Economics

Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications. While there are a number of recent successes in developing effective image-text…

Use What You Have: Video Retrieval Using Representations From Collaborative Experts Open

Yang Liu, Samuel Albanie, Arsha Nagrani, Andrew Zisserman · 2019

Computer science

The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets 'in the wild' vary a lot in terms of degree of specificity, w…

Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research Open

Atousa Torabi, Christopher Pal, Hugo Larochelle, Aaron Courville · 2015

Computer science

In this work, we introduce a dataset of video annotated with high quality natural language phrases describing the visual content in a given segment of time. Our dataset is based on the Descriptive Video Service (DVS) that is now encoded on…

Interactive Video Retrieval in the Age of Deep Learning – Detailed Evaluation of VBS 2019 Open

Luca Rossetto, Ralph Gasser, Jakub Lokoč, Werner Bailer, Klaus Schoeffmann, et al. · 2020

Computer science Political science Biology

Despite the fact that automatic content analysis has made remarkable progress over the last decade - mainly due to significant advances in machine learning - interactive video retrieval is still a very challenging problem, with an increasi…

Methods and Challenges in Shot Boundary Detection: A Review Open

Sadiq H. Abdulhussain, Abd Rahman Ramli, M. Iqbal Saripan, Basheera M. Mahmmod, S. A. R. Al-Haddad, et al. · 2018

Computer science Chemistry Mathematics

The recent increase in the number of videos available in cyberspace is due to the availability of multimedia devices, highly developed communication technologies, and low-cost storage devices. These videos are simply stored in databases th…

V3C1 Dataset Open

Fabian Berns, Luca Rossetto, Klaus Schoeffmann, Christian Beecks, George Awad · 2019

Computer science Chemistry Philosophy

In this work we analyze content statistics of the V3C1 dataset, which is the first partition of the Vimeo Creative Commons Collection (V3C). The dataset has been designed to represent true web videos in the wild, with good visual quality an…

Combination of Multiple Global Descriptors for Image Retrieval Open

HeeJae Jun, Byungsoo Ko, Youngjoon Kim, Insik Kim, Jongtack Kim · 2019

Computer science Geography Economics

Recent studies in image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvement. However, training different models for the ensemble is not only difficult but als…

Temporal Matching Kernel with Explicit Feature Maps Open

Sébastien Poullot, Tsukatani Shunsuke, Phuong Anh Nguyen, Hervé Jégou, Shin’ichi Satoh · 2015

Computer science Philosophy Mathematics

Latent topics-based relevance feedback for video retrieval Open

Rubén Fernández-Beltrán, Filiberto Pla · 2015

Computer science Mathematics Biology

This paper presents a novel Content-Based Video Retrieval approach in order to cope with the semantic gap challenge by means of latent topics. Firstly, a supervised topic model is proposed to transform the classical retrieval approach into…

DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval Open

Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, Ioannis Kompatsiaris, Ioannis Patras · 2022

Computer science

In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal …

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Open

Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, et al. · 2023

Computer science Physics Economics

This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 m…

Exploiting Visual Semantic Reasoning for Video-Text Retrieval Open

Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li · 2020

Computer science

Video retrieval is a challenging research topic bridging the vision and language areas and has attracted broad attention in recent years. Previous works have been devoted to representing videos by directly encoding from frame-level feature…

Evaluating Performance and Trends in Interactive Video Retrieval: Insights From the 12th VBS Competition Open

Lucia Vadicamo, Rahel Arnold, Werner Bailer, Fabio Carrara, Cathal Gurrin, et al. · 2024

Computer science Biology Business

This paper conducts a thorough examination of the 12th Video Browser Showdown (VBS) competition, a well-established international benchmarking campaign for interactive video search systems. The annual VBS competitio…

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval Open

Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Franca Debole, Fabrizio Falchi, et al. · 2021

Computer science Mathematics

This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationship…

Activity Image-to-Video Retrieval by Disentangling Appearance and Motion Open

Liu Liu, Jiangtong Li, Li Niu, Ruicong Xu, Liqing Zhang · 2021

Computer science Geography Economics

With the rapid emergence of video data, image-to-video retrieval has attracted much attention. There are two types of image-to-video retrieval: instance-based and activity-based. The former task aims to retrieve videos containing the same …

TRECVID 2019: An evaluation campaign to benchmark video activity detection, video captioning and matching, and video search & retrieval Open

George Awad, Asad A. Butt, K.M. Curtis, Y. Lee, J. Fiscus, et al. · 2020

Computer science Geography Mathematics

The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of informati…

A distributed Content-Based Video Retrieval system for large datasets Open

El Mehdi Saoudi, Said Jai Andaloussi · 2021

Computer science Geography

With the rapid growth in the amount of video data, efficient video indexing and retrieval methods have become one of the most critical challenges in multimedia management. For this purpose, Content-Based Video Retrieval (CBVR) is nowadays …

Multi-Stage Queries and Temporal Scoring in Vitrivr Open

Silvan Heller, Loris Sauter, Heiko Schuldt, Luca Rossetto · 2020

Computer science Philosophy Mathematics

The increase in multimedia data brings many challenges for retrieval systems, not only in terms of storage and processing requirements but also with respect to query formulation and retrieval models. Querying approaches which work well up …

Semantic Reasoning in Zero Example Video Event Retrieval Open

Maaike de Boer, Yijie Lu, Hao Zhang, Klamer Schutte, Chong‐Wah Ngo, et al. · 2017

Computer science Philosophy Physics

Searching in digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and lacks visual example images or videos. Current research in deep neural networks is highly beneficial fo…

Content-based Video Indexing and Retrieval Using Corr-LDA Open

Rahul Radhakrishnan Iyer, Sanjeel Parekh, Vikas Mohandoss, Anush Ramsurat, Bhiksha Raj, et al. · 2016

Computer science Mathematics

Existing video indexing and retrieval methods on popular web-based multimedia sharing websites are based on user-provided sparse tagging. This paper proposes a very specific way of searching for video clips, based on the content of the vid…

ECO: Efficient Convolutional Network for Online Video Understanding Open

Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox · 2018

Computer science Economics

The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video, therefore, it misses important relationships within actions that span several seconds. (2) While ther…

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions Open

Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, et al. · 2017

Computer science Geography Physics

This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resul…

Visual Consensus Modeling for Video-Text Retrieval Open

Shuqiang Cao, Bairui Wang, Wei Zhang, Lin Ma · 2022

Computer science Economics Geography

In this paper, we propose a novel method to mine the commonsense knowledge shared between the video and text modalities for video-text retrieval, namely visual consensus modeling. Different from the existing works, which learn the video an…

A Proposal-Based Approach for Activity Image-to-Video Retrieval Open

Ruicong Xu, Li Niu, Jianfu Zhang, Liqing Zhang · 2020

Computer science Economics

Activity image-to-video retrieval task aims to retrieve videos containing the similar activity as the query image, which is a challenging task because videos generally have many background segments irrelevant to the activity. In this paper…

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains Open

George Awad, Asad A. Butt, K.M. Curtis, Jonathan G. Fiscus, Afzal Godil, et al. · 2021

Computer science

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital v…