Éric Granger
High-Rate Mixout: Revisiting Mixout for Robust Domain Generalization
Ensembling fine-tuned models initialized from powerful pre-trained weights is a common strategy to improve robustness under distribution shifts, but it comes with substantial computational costs due to the need to train and store multiple …
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Finetuning vision foundation models often improves in-domain accuracy but comes at the cost of robustness under distribution shift. We revisit Mixout, a stochastic regularizer that intermittently replaces finetuned weights with their pretr…
VLOD-TTA: Test-Time Adaptation of Vision-Language Object Detectors
Vision-language object detectors (VLODs) such as YOLO-World and Grounding DINO achieve impressive zero-shot recognition by aligning region proposals with text representations. However, their performance often degrades under domain shift. W…
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
Personalized expression recognition (ER) involves adapting a machine learning model to subject-specific data for improved recognition of expressions with considerable interpersonal variability. Subject-specific ER can benefit significantly…
Low-Rank Expert Merging for Multi-Source Domain Adaptation in Person Re-Identification
Adapting person re-identification (reID) models to new target environments remains a challenging problem that is typically addressed using unsupervised domain adaptation (UDA) methods. Recent works show that when labeled data originates fr…
WiSE-OD: Benchmarking Robustness in Infrared Object Detection
Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR impro…
Sleep Brain and Cardiac Activity Predict Cognitive Flexibility and Conceptual Reasoning Using Deep Learning
Despite extensive research on the relationship between sleep and cognition, the connection between sleep microstructure and human performance across specific cognitive domains remains underexplored. This study investigates whether deep lea…
BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change
Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a d…
DART$^3$: Leveraging Distance for Test Time Adaptation in Person Re-Identification
Person re-identification (ReID) models are known to suffer from camera bias, where learned representations cluster according to camera viewpoints rather than identity, leading to significant performance degradation under (inter-camera) dom…
CLIP-IT: CLIP-based Pairing for Histology Images Classification
Multimodal learning has shown promise in medical imaging, combining complementary modalities like images and text. Vision-language models (VLMs) capture rich diagnostic cues but often require large paired datasets and prompt- or text-based…
Sparsity Outperforms Low-Rank Projections in Few-Shot Adaptation
Adapting Vision-Language Models (VLMs) to new domains with few labeled samples remains a significant challenge due to severe overfitting and computational constraints. State-of-the-art solutions, such as low-rank reparameterization, mitiga…
Beyond Patches: Mining Interpretable Part-Prototypes for Explainable AI
As AI systems grow more capable, it becomes increasingly important that their decisions remain understandable and aligned with human expectations. A key challenge is the limited interpretability of deep models. Post-hoc methods like GradCA…
The need for an ecological dataset for perturbed gait analysis - Supplementary materials
PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization
Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier. Standard WSOL methods re…
Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data
Facial Expression Recognition (FER) from videos is a crucial task in various application areas, such as human-computer interaction and health diagnosis and monitoring (e.g., assessing pain and depression). Beyond the challenges of recogniz…
MTLoc: A Confidence-Based Source-Free Domain Adaptation Approach For Indoor Localization
Various deep learning models have been developed for indoor localization based on radio-frequency identification (RFID) tags. However, they often require adaptation to ensure accurate tracking in new target operational domains. To address …
From Cross-Modal to Mixed-Modal Visible-Infrared Re-Identification
Visible-infrared person re-identification (VI-ReID) aims to match individuals across different camera modalities, a critical task in modern surveillance systems. While current VI-ReID methods focus on cross-modality matching, real-world ap…
TeD-Loc: Text Distillation for Weakly Supervised Object Localization
Weakly supervised object localization (WSOL) using classification models trained with only image-class labels remains an important challenge in computer vision. Given their reliance on classification objectives, traditional WSOL methods li…
Image Retrieval Methods in the Dissimilarity Space
Image retrieval methods rely on metric learning to train backbone feature extraction models that can extract discriminant queries and reference (gallery) feature representations for similarity matching. Although state-of-the-art accuracy h…
Visual Modality Prompt for Adapting Vision-Language Object Detectors
The zero-shot performance of object detectors degrades when tested on different modalities, such as infrared and depth. While recent work has explored image translation techniques to adapt detectors to new modalities, these methods are lim…
Weakly Supervised Learning for Facial Behavior Analysis: A Review
In recent years, there has been a shift in facial behavior analysis from laboratory-controlled conditions to challenging in-the-wild conditions, owing to the superior performance of deep learning-based approaches for many real-wor…
TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning
Diffusion models have emerged as highly effective techniques for inpainting; however, they remain constrained by slow sampling rates. While recent advances have enhanced generation quality, they have also increased sampling time, thereby l…
Spatial Action Unit Cues for Interpretable Deep Facial Expression Recognition
Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from…
Source-Free Domain Adaptation for YOLO Object Detection
Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new target domain without using any source domain data for privacy and efficiency reasons. Most state-of-th…
Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals. Multimodal emotion recognition systems can perform well because they can learn complementary an…
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild
Systems for multimodal emotion recognition (ER) are commonly trained to extract features from different modalities (e.g., visual, audio, and textual) that are combined to predict individual basic emotions. However, compound emotions often …
Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos
Weakly-Supervised Video Object Localization (WSVOL) involves localizing an object in videos using only video-level labels, also referred to as tags. State-of-the-art WSVOL methods like Temporal CAM (TCAM) rely on class activation mapping (…