Maxime Zanella
Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models
Remote Sensing Vision-Language Models (RSVLMs) have shown remarkable potential thanks to large-scale pretraining, achieving strong zero-shot performance on various tasks. However, their ability to generalize in low-data regimes, such as fe…
Language-Aware Information Maximization for Transductive Few-Shot CLIP
Transductive few-shot learning has triggered an abundant literature focusing on vision-only models, but is still at a nascent stage within the recent context of foundational vision-language models (VLMs). Only a few recent methods addresse…
Vocabulary-free few-shot learning for Vision-Language Models
Recent advances in few-shot adaptation for Vision-Language Models (VLMs) have greatly expanded their ability to generalize across tasks using only a few labeled examples. However, existing approaches primarily build upon the strong zero-sh…
Online Gaussian Test-Time Adaptation of Vision-Language Models
Online test-time adaptation (OTTA) of vision-language models (VLMs) has recently garnered increased attention to take advantage of data observed along a stream to improve future predictions. Unfortunately, existing methods rely on dataset-…
Realistic Test-Time Adaptation of Vision-Language Models
The zero-shot capabilities of Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. However, previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data di…
Exploring Foundation Models Fine-Tuning for Cytology Classification
Cytology slides are essential tools in diagnosing and staging cancer, but their analysis is time-consuming and costly. Foundation models have shown great potential to assist in these tasks. In this paper, we explore how existing foundation…
Physically Interpretable Probabilistic Domain Characterization
Characterizing domains is essential for models analyzing dynamic environments, as it allows them to adapt to evolving conditions or to hand the task over to backup systems when facing conditions outside their operational domain. Existing s…
Boosting Vision-Language Models for Histopathology Classification: Predict all at once
The development of vision-language models (VLMs) for histopathology has shown promising new usages and zero-shot performances. However, current approaches, which decompose large slides into smaller patches, focus solely on inductive class…
Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification
Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and ma…
Exploring viability of Test-Time Training: Application to 3D segmentation in Multiple Sclerosis
Test-Time Training (TTT) is an unsupervised domain adaptation technique employing a self-supervised task performed by an attached branch model. While justifications of key design choices are often neglected in the literature, we explore the…
Boosting Vision-Language Models with Transduction
Transduction is a powerful paradigm that leverages the structure of unlabeled data to boost predictive accuracy. We present TransCLIP, a novel and computationally efficient transductive approach designed for Vision-Language Models (VLMs). …
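The abstract above describes transduction: letting the structure of the unlabeled batch inform every prediction. As a purely illustrative sketch (not the TransCLIP algorithm itself, whose details are truncated here), the toy function below smooths each sample's zero-shot class probabilities with those of its most similar neighbors, so an ambiguous sample is pulled toward the consensus of the samples it resembles:

```python
# Toy sketch of transductive refinement for a VLM classifier
# (illustrative only; NOT the actual TransCLIP algorithm).
# Each sample's zero-shot probabilities are blended with those of its
# k nearest neighbors in feature space, so the unlabeled batch's
# structure informs every prediction.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def smooth_predictions(probs, feats, k=2, alpha=0.5):
    """Blend each sample's class probabilities with the mean of its
    k most similar samples' probabilities (similarity = dot product)."""
    n = len(probs)
    out = []
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dot(feats[i], feats[j]),
            reverse=True,
        )[:k]
        neigh_mean = [sum(probs[j][c] for j in neighbors) / k
                      for c in range(len(probs[i]))]
        out.append([(1 - alpha) * p + alpha * q
                    for p, q in zip(probs[i], neigh_mean)])
    return out

# Tiny example: 3 samples, 2 classes. Sample 0 is ambiguous (0.5/0.5),
# but its neighbors are confident about class 0, pulling it that way.
feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
probs = [[0.5, 0.5], [0.9, 0.1], [0.8, 0.2]]
refined = smooth_predictions(probs, feats, k=2)
print(refined[0])  # sample 0 now leans toward class 0
```

The blending weight `alpha` and neighborhood size `k` are hypothetical knobs for this sketch; the actual method's objective and update rule are described in the full paper.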
Low-Rank Few-Shot Adaptation of Vision-Language Models
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further pushed their generalization capabilities, at the expense of just a few labeled samples within the target downstream task. However, this promising, alre…
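The title refers to low-rank adaptation. As a generic sketch of the idea (assumptions: a LoRA-style parameterization, not necessarily the paper's exact formulation), a frozen weight matrix W is augmented with a product of two small matrices B·A of rank r, scaled by alpha/r, so only r·(d_out + d_in) parameters need training instead of d_out·d_in:

```python
# Generic low-rank (LoRA-style) update, as a toy sketch; not claimed
# to match the paper's exact method. W (d_out x d_in) stays frozen;
# only B (d_out x r) and A (r x d_in) would be trained, with
# r << min(d_out, d_in).

def matmul(X, Y):
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_forward(x, W, A, B, alpha=1.0):
    """Compute (W + (alpha / r) * B @ A) @ x for one input vector x."""
    r = len(A)
    delta = matmul(B, A)  # d_out x d_in low-rank correction
    y = []
    for i in range(len(W)):
        row = [w + (alpha / r) * d for w, d in zip(W[i], delta[i])]
        y.append(sum(rw * xv for rw, xv in zip(row, x)))
    return y

# Tiny example: 2x3 frozen weight, rank-1 update B @ A.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [0.0]]     # 2 x 1
A = [[0.0, 0.0, 1.0]]  # 1 x 3
x = [1.0, 2.0, 3.0]
print(lora_forward(x, W, A, B))  # the rank-1 term adds x[2] to output 0
```

With rank 1 here, the adapter stores 5 numbers instead of the 6 of a full 2x3 update; at realistic dimensions (e.g. 768x768 projections) the saving is several orders of magnitude.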
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmente…
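The abstract contrasts soft prompt tuning with test-time augmentation. As a hedged toy sketch of the latter idea only (not the specific method the article proposes), class scores are computed for several augmented views of one image and their softmax probabilities averaged, requiring no gradient step or prompt optimization:

```python
# Toy sketch of test-time augmentation for zero-shot classification
# (illustrative; not the article's specific method). Per-view class
# logits are converted to probabilities and averaged across views.

import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def tta_predict(view_scores):
    """Average class probabilities over views; view_scores is one list
    of class logits per augmented view of the same image."""
    probs = [softmax(s) for s in view_scores]
    n_cls = len(view_scores[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_cls)]

# Three augmented views of one image, two candidate classes.
views = [[2.0, 1.0], [1.5, 1.2], [2.2, 0.8]]
avg = tta_predict(views)
print(avg)  # class 0 dominates across views
```

Real pipelines would obtain the per-view logits from image/text embedding similarities and might weight views by confidence rather than averaging uniformly; both refinements are omitted here for brevity.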
Mixture Domain Adaptation to Improve Semantic Segmentation in Real-World Surveillance (peer reviewed)
Various tasks encountered in real-world surveillance can be addressed by determining posteriors (e.g. by Bayesian inference or machine learning), based on which critical decisions must be taken. However, the surveillance domain (acquisitio…
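The abstract's starting point is "determining posteriors (e.g. by Bayesian inference)". As a minimal reminder of that step only (the article's surveillance setting is far richer), Bayes' rule combines a class prior with an observation likelihood:

```python
# Minimal Bayes-rule sketch for the "determining posteriors" step;
# illustrative only, not the article's method.
# posterior(c | obs) = prior(c) * likelihood(obs | c) / normalizer

def posterior(priors, likelihoods):
    """priors[c] = P(c); likelihoods[c] = P(obs | c)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

# Two classes with equal priors; the observation is 3x more likely
# under class 0 than under class 1.
print(posterior([0.5, 0.5], [0.6, 0.2]))  # approximately [0.75, 0.25]
```

A shift in the surveillance domain (lighting, weather, camera) changes the likelihood term, which is one way to see why posteriors computed for a source domain can become unreliable on a new one.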