Ismail Ben Ayed
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
State Space Models (SSMs) have emerged as efficient alternatives to Vision Transformers (ViTs), with VMamba standing out as a pioneering architecture designed for vision tasks. However, their generalization performance degrades significant…
Purge-Gate: Backpropagation-Free Test-Time Adaptation for Point Clouds Classification via Token Purging
Test-time adaptation (TTA) is crucial for mitigating performance degradation caused by distribution shifts in 3D point cloud classification. In this work, we introduce Token Purging (PG), a novel backpropagation-free approach that removes …
Language-Aware Information Maximization for Transductive Few-Shot CLIP
Transductive few-shot learning has triggered an abundant literature focusing on vision-only models, but is still at a nascent stage within the recent context of foundational vision-language models (VLMs). Only a few recent methods addresse…
Regularized Low-Rank Adaptation for Few-Shot Organ Segmentation
Parameter-efficient fine-tuning (PEFT) of pre-trained foundation models is increasingly attracting interest in medical imaging due to its effectiveness and computational efficiency. Among these methods, Low-Rank Adaptation (LoRA) is a nota…
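For readers unfamiliar with LoRA, the sketch below shows the generic low-rank update the method builds on: the pre-trained linear layer stays frozen and only a small rank-r correction is trained. This is a minimal illustration of plain LoRA in PyTorch with arbitrary layer sizes, not the regularized variant proposed in this paper.

```python
# Minimal sketch of a generic LoRA layer (illustrative only, not the paper's method).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update W + (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # freeze the pre-trained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: wrap a projection of a frozen foundation model with a rank-4 adapter.
proj = LoRALinear(nn.Linear(768, 768), r=4)
y = proj(torch.randn(2, 768))
```

Zero-initializing B keeps the adapted model identical to the pre-trained one at the start of fine-tuning, which is the usual LoRA convention.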
ViLU: Learning Vision-Language Uncertainties for Failure Prediction
Reliable Uncertainty Quantification (UQ) and failure prediction remain open challenges for Vision-Language Models (VLMs). We introduce ViLU, a new Vision-Language Uncertainty quantification framework that contextualizes uncertainty estimat…
Full Conformal Adaptation of Medical Vision-Language Models
Vision-language models (VLMs) pre-trained at large scale have shown unprecedented transferability capabilities and are being progressively integrated into medical image analysis. Although their discriminative potential has been widely explor…
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation
Recently, test-time adaptation has attracted wide interest in the context of vision-language models for image classification. However, to the best of our knowledge, the problem is completely overlooked in dense prediction tasks such as Ope…
SMART-PC: Skeletal Model Adaptation for Robust Test-Time Training in Point Clouds
Test-Time Training (TTT) has emerged as a promising solution to address distribution shifts in 3D point cloud classification. However, existing methods often rely on computationally expensive backpropagation during adaptation, limiting the…
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Adapting Vision-Language Models (VLMs) to new domains with few labeled samples remains a significant challenge due to severe overfitting and computational constraints. State-of-the-art solutions, such as low-rank reparameterization, mitiga…
AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples
While novel gradient-based attacks are continuously proposed to improve the optimization of adversarial examples, each is shown to outperform its predecessors using different experimental setups, implementations, and computational budgets,…
Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images
This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a pro…
Realistic Test-Time Adaptation of Vision-Language Models
The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data di…
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
Transductive few-shot learning has recently triggered wide attention in computer vision. Yet, current methods introduce key hyper-parameters, which control the prediction statistics of the test batches, such as the level of class balance, …
Are foundation models for computer vision good conformal predictors?
Recent advances in self-supervision and contrastive learning have brought the performance of foundation models to unprecedented levels in a variety of tasks. Fueled by this progress, these models are becoming the prevailing approach for a …
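For context, the snippet below sketches the standard split-conformal recipe that such evaluations typically build on: calibrate a score threshold on held-out data, then include in each test-time prediction set every class whose score passes the threshold. It runs on synthetic probabilities for illustration and is a generic recipe, not the exact protocol of this paper.

```python
# Split-conformal prediction sets from classifier softmax scores (generic sketch).
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate on held-out data with the score 1 - p(true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.ceil((n + 1) * (1 - alpha)) / n            # finite-sample corrected quantile level
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_sets(test_probs, qhat):
    """Keep every class whose score 1 - p(class) is below the calibrated threshold."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

# Example with synthetic calibration/test probabilities standing in for a foundation model.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(rng.dirichlet(np.ones(10), size=5), qhat)
```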
Semantic Anchor Transport: Robust Test-Time Adaptation for Vision-Language Models
Large pre-trained vision-language models (VLMs), such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models may be unreliable under distributional shifts, as their performance is s…
Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging
Test-Time Adaptation (TTA) addresses distribution shifts during testing by adapting a pretrained model without access to source data. In this work, we propose a novel TTA approach for 3D point cloud classification, combining sampling varia…
Few-shot Adaptation of Medical Vision-Language Models
Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable efforts have been dedicated to establishing …
Boosting Vision-Language Models for Histopathology Classification: Predict all at once
The development of vision-language models (VLMs) for histopathology has shown promising new uses and zero-shot performance. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive class…
Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification
Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and ma…
Robust Calibration of Large Vision-Language Adapters
This paper addresses the critical issue of miscalibration in CLIP-based model adaptation, particularly in the challenging scenario of out-of-distribution (OOD) samples, which has been overlooked in the existing literature on CLIP adaptatio…
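As a point of reference, a common post-hoc baseline for the miscalibration problem is temperature scaling, sketched below on held-out logits. This is a generic baseline, not the approach proposed in this paper, and the validation logits here are synthetic placeholders.

```python
# Temperature scaling: fit a single scalar T on held-out data (generic calibration baseline).
import torch

def fit_temperature(logits, labels, iters=200, lr=0.01):
    """Find T > 0 minimizing the NLL of softmax(logits / T) on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)        # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage with hypothetical held-out logits from an adapted classifier.
val_logits, val_labels = torch.randn(100, 10), torch.randint(0, 10, (100,))
T = fit_temperature(val_logits, val_labels)
calibrated_probs = (val_logits / T).softmax(dim=-1)
```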
FDS: Feedback-guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization
Domain Generalization techniques aim to enhance model robustness by simulating novel data distributions during training, typically through various augmentation or stylization strategies. However, these methods frequently suffer from limite…
WATT: Weight Average Test-Time Adaptation of CLIP
Vision-Language Models (VLMs) such as CLIP have yielded unprecedented performance for zero-shot image classification, yet their generalization capability may still be seriously challenged when confronted with domain shifts. In response, we p…
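The general weight-averaging idea behind this kind of test-time adaptation can be sketched as follows: adapt several copies of the model (for instance, one per text template), then average their parameters. In the sketch, adapt_one_copy is a hypothetical placeholder for any per-copy adaptation loop; this is an illustration of the generic idea, not the paper's exact procedure.

```python
# Weight averaging across independently adapted copies of a model (generic sketch).
import copy
import torch

@torch.no_grad()
def average_state_dicts(state_dicts):
    """Element-wise mean of floating-point tensors; non-float buffers are kept from the first copy."""
    avg = copy.deepcopy(state_dicts[0])
    for key, value in avg.items():
        if torch.is_floating_point(value):
            avg[key] = torch.stack([sd[key] for sd in state_dicts], dim=0).mean(dim=0)
    return avg

def weight_average_tta(model, batches, templates, adapt_one_copy):
    """Adapt one copy per text template, then load the averaged weights back into the model."""
    adapted = []
    for template in templates:
        m = copy.deepcopy(model)
        adapt_one_copy(m, batches, template)   # hypothetical per-template adaptation loop
        adapted.append(m.state_dict())
    model.load_state_dict(average_state_dicts(adapted))
    return model
```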
Transductive Zero-Shot and Few-Shot CLIP
Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-…
When is an Embedding Model More Promising than Another?
Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-spe…
Boosting Vision-Language Models with Transduction
Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). …
Low-Rank Few-Shot Adaptation of Vision-Language Models
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, alre…
A Transductive Few-Shot Learning Approach for Classification of Digital Histopathological Slides from Liver Cancer
GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D
We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Autoencoders (MAE). Unlike the conventio…