Sofian Chaybouti
REVEAL: Relation-based Video Representation Learning for Video-Question-Answering
Video-Question-Answering (VideoQA) requires capturing complex changes in visual relations over time and remains a challenge even for advanced Video Language Models (VLMs), among other reasons because of the need to represent the visual content to a re…
MaskInversion: Localized Embeddings via Optimization of Explainability Maps
Vision-language foundation models such as CLIP have achieved tremendous results in global vision-language alignment, but still show some limitations in creating representations for specific image regions. To address this problem, we prop…
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision. However, the interpretability of these models remains a challenge. T…
EfficientQA : a RoBERTa Based Phrase-Indexed Question-Answering System
State-of-the-art extractive question-answering models achieve superhuman performance on the SQuAD benchmark. Yet, they are unreasonably heavy and need expensive GPU computing to answer questions in a reasonable time. Thus, they cannot be …
MIX : a Multi-task Learning Approach to Solve Open-Domain Question Answering
This paper introduces MIX, a multi-task deep learning approach to solve open-ended question-answering. First, we design our system as a multi-stage pipeline of 3 building blocks: a BM25-based Retriever to reduce the search space, a RoBERTa…